fagin: synteny-based phylostratigraphy and finer classification of young genes

Date
2019-01-01
Authors
Li, Jing
Singh, Urminder
Seetharam, Arun
Wurtele, Eve
Authors
Seetharam, Arun (Research Scientist IV)
Wurtele, Eve (Professor Emeritus)
Department
Genetics, Development and Cell Biology
Bioinformatics and Computational Biology
Genome Informatics Facility
Abstract

Background: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found. Some originate from ultra-rapid mutation of existing genes; many others originate de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species.
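As a rough illustration of the standard approach described above, the following Python sketch assigns a query gene to the oldest clade in which any homolog was detected. It is not part of fagin (which is written in R); the function name, the lineage, and the target-to-clade mapping are all hypothetical and chosen only to show how an absence of hits, that is, negative evidence, yields an "orphan" call.

```python
from typing import Dict, List, Set

def assign_phylostratum(
    lineage: List[str],
    clade_of_target: Dict[str, str],
    targets_with_hit: Set[str],
) -> str:
    """Return the oldest clade in which a homolog of the query gene was detected.

    lineage: clades ordered from youngest (the focal species) to oldest.
    clade_of_target: youngest clade each target species shares with the focal species.
    targets_with_hit: target species in which a similarity search found a homolog.
    """
    rank = {clade: i for i, clade in enumerate(lineage)}
    oldest = 0  # no hits at all: the gene stays in the focal-species stratum (orphan)
    for species in targets_with_hit:
        oldest = max(oldest, rank[clade_of_target[species]])
    return lineage[oldest]

# Illustrative (assumed) lineage and targets for an Arabidopsis thaliana focal species.
LINEAGE = ["Arabidopsis thaliana", "Arabidopsis", "Brassicaceae", "Eukaryota"]
CLADE_OF_TARGET = {
    "Arabidopsis lyrata": "Arabidopsis",
    "Brassica rapa": "Brassicaceae",
    "Saccharomyces cerevisiae": "Eukaryota",
}

if __name__ == "__main__":
    print(assign_phylostratum(LINEAGE, CLADE_OF_TARGET, set()))
    print(assign_phylostratum(LINEAGE, CLADE_OF_TARGET, {"Brassica rapa"}))
```

Note that the orphan call in this sketch rests entirely on the search returning nothing, which is the limitation the synteny-based approach is meant to address.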

Results: We have developed fagin, a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species relative to homologous sequences in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, with further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred that over half of the “Unknown” A. thaliana query genes, and about 20% of those in S. cerevisiae, lack a syntenic homolog because of local indels or scrambled synteny.
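The three main classes can be pictured as a simple decision over the evidence found in a query gene's syntenic interval. The Python sketch below is a minimal illustration under stated assumptions, not fagin's R implementation: the `SyntenicEvidence` type, its field names, and the rule that a coding match takes precedence over a non-coding one are hypothetical, and the subclasses mentioned above are not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class SyntenicEvidence:
    """Hypothetical summary of what was found in one target's syntenic interval."""
    coding_match: bool     # a coding sequence similar to the query protein
    noncoding_match: bool  # a non-coding sequence similar to the query gene

def classify(evidence: SyntenicEvidence) -> str:
    """Map evidence for one query gene in one target genome to a main class."""
    if evidence.coding_match:
        return "AAic"     # has a coding syntenic homolog
    if evidence.noncoding_match:
        return "NTic"     # has a non-coding syntenic homolog
    return "Unknown"      # no syntenic homolog detected

def tally(evidence_by_target: Dict[str, SyntenicEvidence]) -> Counter:
    """Count how many target genomes fall into each class for one query gene."""
    return Counter(classify(ev) for ev in evidence_by_target.values())

if __name__ == "__main__":
    # Made-up evidence for a single query gene against three unnamed targets.
    example = {
        "target_1": SyntenicEvidence(coding_match=False, noncoding_match=True),
        "target_2": SyntenicEvidence(coding_match=False, noncoding_match=False),
        "target_3": SyntenicEvidence(coding_match=True, noncoding_match=True),
    }
    print(tally(example))
```

Because each query gene is classified separately against each target genome, a per-gene tally like this is one plausible way to summarize the pattern across targets.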

Conclusions: fagin augments standard phylostratigraphy and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis. By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well supported as having originated de novo. Analyzing multiple genomes within a species should help distinguish orphan genes that may have arisen through rapid divergence from those that arose de novo. Fagin also delineates whether a gene lacks a syntenic homolog for technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.

Comments

This article is published as Arendsee, Zebulun, Jing Li, Urminder Singh, Priyanka Bhandary, Arun Seetharam, and Eve Syrkin Wurtele. "fagin: synteny-based phylostratigraphy and finer classification of young genes." BMC Bioinformatics 20 (2019): 1-14. doi: 10.1186/s12859-019-3023-y.

Copyright
2019