Degree Type

Thesis

Date of Award

2010

Degree Name

Master of Science

Department

Computer Science

First Advisor

Vasant Honavar

Abstract

Many applications require methods for automatically extracting structured information from unstructured natural language text. Because natural language processing is inherently difficult, most existing methods for information extraction from text tend to be domain-specific. This thesis explores a modular, ontology-based approach to information extraction that decouples domain-specific knowledge from the rules used for extraction. Specifically, the thesis describes:

1. A framework for ontology-driven extraction of a subset of nested complex relationships (e.g., "Joe reports that Jim is a reliable employee") from free text. The extracted relationships are represented as RDF (Resource Description Framework) graphs, which can be stored in RDF knowledge bases and queried using RDF query languages.

2. An open source implementation of SEMANTIXS, a system for ontology-guided extraction and semantic representation of structured information from unstructured text.

3. Results of experiments that offer evidence of the utility of the proposed ontology-based approach for extracting complex relationships from text.
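
To make the nested relationship in the first item concrete, the following is a minimal sketch of how "Joe reports that Jim is a reliable employee" could be encoded as RDF triples using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The abstract does not say how SEMANTIXS encodes nested statements, so the use of reification here is an assumption, and the `http://example.org/` namespace, the resource names, and the `reify`, `nested_relationship`, and `query` helpers are all hypothetical. Triples are held as plain Python tuples rather than in an RDF library.

```python
# Illustrative sketch only: the encoding below is an assumption, not the
# representation used by SEMANTIXS. All names under EX are hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
EX = "http://example.org/"                           # hypothetical namespace


def reify(statement_id, subject, predicate, obj):
    """Return the four standard reification triples describing one statement."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


def nested_relationship():
    """Encode 'Joe reports that Jim is a reliable employee' as triples."""
    inner = EX + "stmt1"  # identifier for the inner statement about Jim
    triples = reify(inner, EX + "Jim", EX + "isA", EX + "ReliableEmployee")
    # The outer relationship takes the inner statement as its object.
    triples.append((EX + "Joe", EX + "reports", inner))
    return triples


def query(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    graph = nested_relationship()
    # Find what Joe reports, then look up who the reported statement is about.
    reported = query(graph, s=EX + "Joe", p=EX + "reports")[0][2]
    about = query(graph, s=reported, p=RDF + "subject")[0][2]
    print(reported, about)
```

The wildcard `query` helper stands in for a real RDF query language such as SPARQL; it only shows that once the inner statement has its own identifier, the outer "reports" relationship can be followed to the statement it refers to.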

DOI

https://doi.org/10.31274/etd-180810-2519

Copyright Owner

Sushain Pandit

Language

en

Date Available

2012-04-30

File Format

application/pdf

File Size

92 pages
