Ontology-guided extraction of structured information from unstructured text: Identifying and capturing complex relationships

Pandit, Sushain

File

Pandit_iastate_0097M_11217.pdf (1.28 MB)

Date

2010-01-01

Authors

Pandit, Sushain

Advisor

Vasant Honavar

Authors

Person

Pandit, Sushain

Graduate Student

Organizational Units

Organizational Unit

Computer Science

Computer Science (the theory, representation, processing, communication and use of information) is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through: 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research and educational agendas; and 3. sustained commitment to graduating leaders for academia, industry and government.

History
The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. The faculty consisted of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building that now houses the Computer Science department, then simply called the Computer Science building, was completed. It was later named Atanasoff Hall. From the 1980s to the present, the department has expanded and developed its teaching and research agendas to cover many areas of computing.

Dates of Existence
1969-present

Related Units

College of Liberal Arts and Sciences (parent college)

Department

Computer Science

Abstract

Many applications call for methods to enable automatic extraction of structured information from unstructured natural language text. Due to the inherent challenges of natural language processing, most of the existing methods for information extraction from text tend to be domain specific. This thesis explores a modular ontology-based approach to information extraction that decouples domain-specific knowledge from the rules used for information extraction. Specifically, the thesis describes:

1. A framework for ontology-driven extraction of a subset of nested complex relationships (e.g., Joe reports that Jim is a reliable employee) from free text. The extracted relationships are semantically represented in the form of RDF (resource description framework) graphs, which can be stored in RDF knowledge bases and queried using query languages for RDF.

2. An open source implementation of SEMANTIXS, a system for ontology-guided extraction and semantic representation of structured information from unstructured text.

3. Results of experiments that offer evidence of the utility of the proposed ontology-based approach to extract complex relationships from text.
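To make the notion of a nested relationship in item 1 concrete, the sketch below encodes the abstract's example sentence ("Joe reports that Jim is a reliable employee") as a small set of triples, using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) so that the inner statement can itself be the object of the outer one. This is only an illustration under stated assumptions: the `ex:` names, the `Triple` type, and the helpers `reify`, `match`, and `reported_by` are hypothetical, and nothing here is taken from SEMANTIXS, which may represent nested relationships differently.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement, with terms kept as plain strings."""
    subject: str
    predicate: str
    obj: str


# Terms from the standard RDF reification vocabulary.
RDF_TYPE = "rdf:type"
RDF_STATEMENT = "rdf:Statement"
RDF_SUBJECT = "rdf:subject"
RDF_PREDICATE = "rdf:predicate"
RDF_OBJECT = "rdf:object"


def reify(statement_id: str, inner: Triple) -> List[Triple]:
    """Describe `inner` as a resource so other triples can refer to it."""
    return [
        Triple(statement_id, RDF_TYPE, RDF_STATEMENT),
        Triple(statement_id, RDF_SUBJECT, inner.subject),
        Triple(statement_id, RDF_PREDICATE, inner.predicate),
        Triple(statement_id, RDF_OBJECT, inner.obj),
    ]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def reported_by(graph: List[Triple], reporter: str) -> List[Triple]:
    """Recover the inner statements that `reporter` is said to report."""
    results = []
    for outer in match(graph, subject=reporter, predicate="ex:reports"):
        stmt = outer.obj  # identifier of the reified inner statement
        s = match(graph, subject=stmt, predicate=RDF_SUBJECT)[0].obj
        p = match(graph, subject=stmt, predicate=RDF_PREDICATE)[0].obj
        o = match(graph, subject=stmt, predicate=RDF_OBJECT)[0].obj
        results.append(Triple(s, p, o))
    return results


# Hypothetical encoding of "Joe reports that Jim is a reliable employee".
inner = Triple("ex:Jim", "ex:isA", "ex:ReliableEmployee")
graph = reify("ex:stmt1", inner) + [Triple("ex:Joe", "ex:reports", "ex:stmt1")]

for fact in reported_by(graph, "ex:Joe"):
    print(fact.subject, fact.predicate, fact.obj)
```

The pattern-matching helper stands in for what an RDF query language would do against a real knowledge base; an actual deployment would more likely store the triples in an RDF store and query them there.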

Copyright

2010-01-01

Collections

Theses and Dissertations
