Raising orphans from a metadata morass: a researcher's guide to re-use of public ’omics data

Date
2017-01-01
Authors
Seetharam, Arun
Arendsee, Zebulun
Hur, Manhoi
Authors
Person
Wurtele, Eve
Professor Emeritus
Person
Seetharam, Arun
Research Scientist IV
Department
Genetics, Development and Cell Biology
Genome Informatics Facility
Abstract

More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other ’omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-’omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality by more use of controlled vocabulary and by metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system.

Comments

This is a manuscript of an article published as Priyanka Bhandary, Arun S. Seetharam, Zebulun Arendsee, Manhoi Hur, Eve Syrkin Wurtele, Raising orphans from a metadata morass: a researcher’s guide to re-use of public ’omics data, Plant Science (2017), doi: 10.1016/j.plantsci.2017.10.014. Posted with permission.

Copyright
2017