Integration of large datasets for plant model organisms

Date
2013-01-01
Authors
Sucaet, Yves
Major Professor
Advisor
Eve S. Wurtele
Julie A. Dickerson
Department
Genetics, Development and Cell Biology
Abstract

This dissertation addresses data integration in bioinformatics. The first chapter surveys the current state of biological pathway databases in general, and of plant pathway databases in particular. Key studies are cited to illustrate the benefits that further research into integration methods may bring.

Different models are explored for interfacing with the various stakeholders of biological data repositories. A public website (http://www.metnetonline.org) was built to address the role of a bioinformatics data warehouse as a server for external third parties. A dedicated API (MetNetAPI: http://www.metnetonline.org/api) accommodates bioinformaticians, and software developers in general, who wish to build advanced applications on top of MetNet. The API, implemented as .NET and Java libraries, was designed to be as user-friendly to programmers as the public website is to end-users. Finally, a hybrid model is examined: the use of XML as a repository for information integration, downstream processing, and data manipulation. An overview of the use of XML in biological applications is included.

MetNetAPI follows a set of design principles; a subset of the API is abstracted and re-implemented to interface with a range of other public databases. The result is a new bioinformatics toolkit that can be used to mix and match data from heterogeneous sources in a transparent manner. One example is the grafting of protein-protein interaction data on top of AraCyc pathways.
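
The grafting idea can be illustrated with a minimal sketch. It is not MetNetAPI code (the API itself is implemented as .NET and Java libraries); the `Pathway` class, the `graft_interactions` function, and the gene identifiers below are hypothetical placeholders chosen only for illustration. The sketch assumes that grafting means annotating a pathway with those interactions whose two partners are both pathway members.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Pathway:
    """Hypothetical stand-in for a pathway record from one data source."""
    name: str
    genes: Set[str]
    interactions: List[Tuple[str, str]] = field(default_factory=list)


def graft_interactions(pathway: Pathway,
                       interactions: List[Tuple[str, str]]) -> Pathway:
    """Return a copy of the pathway annotated with the interactions
    whose two partners both belong to the pathway."""
    kept = [(a, b) for a, b in interactions
            if a in pathway.genes and b in pathway.genes]
    return Pathway(pathway.name, set(pathway.genes), kept)


# Placeholder data standing in for two independent sources.
pathway = Pathway("example pathway", {"GENE_A", "GENE_B", "GENE_C"})
ppi = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_X"), ("GENE_C", "GENE_A")]

grafted = graft_interactions(pathway, ppi)
for a, b in grafted.interactions:
    print(f"{a} interacts with {b} within {grafted.name}")
```

The original pathway object is left untouched, so the same pathway can be combined with interaction data from several sources independently.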

Biological network data is often distributed over a variety of independently modeled databases. This dissertation makes two contributions to the field of bioinformatics. First, a new service, MetNet Online, is now in operation and offers access to the previously created and integrated MetNetDB data repository. The service is geared toward end-users (students and researchers alike) as well as seasoned bioinformatics software developers who wish to build their own applications on top of an already integrated data source. Second, because integrated databases are only useful when they can be synchronized with their respective external sources, a framework was created that allows for a systematic approach to such integration efforts. In closing, this work provides a roadmap for maintaining current integrated biological database projects and for preparing future ones.

Copyright
2013-01-01