Determining the Statistical Significance of Rules for Rule-based Knowledge-extraction Algorithms

Pandit, Sushain; Kodavali, Sateesh Kumar; Sridharan, Krishnakumar

File

Paper_Stat.pdf (265.47 KB)

Date

2009-12-07

Authors

Pandit, Sushain

Kodavali, Sateesh Kumar

Sridharan, Krishnakumar

Organizational Units

Computer Science

Department

Computer Science

Abstract

Domain-specific knowledge bases are often built from domain-specific texts using rule-based knowledge-retrieval algorithms. These algorithms rely on semantic extraction rules: a parser processes the text, and the rules are applied to the resulting parse trees and dependency graphs to identify candidate constructs for triple extraction. The performance of such algorithms depends critically on how much of the total knowledge present in a text fragment these rules can extract in the form of triples. In this paper, we propose a way to statistically analyze the significance of these rules based on the fraction of knowledge that they extract from given text corpora.
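
To make "the fraction of knowledge that a rule extracts" concrete, the sketch below computes that fraction for each rule and attaches a one-sided exact binomial p-value against a baseline extraction rate. It is an illustration only: the abstract does not say which statistical test the paper uses, so the binomial test, the baseline of 0.5, the rule names, and the triple counts are all assumptions introduced here.

```python
from math import comb


def extraction_fraction(extracted: int, total: int) -> float:
    """Fraction of the reference triples in a text that a rule recovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= extracted <= total:
        raise ValueError("extracted must lie between 0 and total")
    return extracted / total


def binomial_p_value(extracted: int, total: int, baseline: float) -> float:
    """One-sided exact binomial test (an assumed choice, not the paper's).

    Returns P(X >= extracted) for X ~ Binomial(total, baseline), i.e. the
    chance of seeing at least this many extracted triples if the rule's
    true extraction rate were only `baseline`.
    """
    if not 0.0 <= baseline <= 1.0:
        raise ValueError("baseline must be a probability")
    return sum(
        comb(total, k) * baseline**k * (1.0 - baseline) ** (total - k)
        for k in range(extracted, total + 1)
    )


# Hypothetical counts: (triples extracted by the rule, triples present in the text).
rule_counts = {"rule_A": (18, 20), "rule_B": (9, 20)}
BASELINE = 0.5  # assumed baseline extraction rate

for name, (extracted, total) in rule_counts.items():
    fraction = extraction_fraction(extracted, total)
    p_value = binomial_p_value(extracted, total, BASELINE)
    print(f"{name}: fraction={fraction:.2f}, p-value={p_value:.6f}")
```

A small p-value suggests that a rule extracts more than the baseline share of the available triples. The sketch also assumes that the number of triples actually present in each text fragment is known, for example from manual annotation.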