Technical Report Number
Domain-specific knowledge bases are often built from domain-specific texts using rule-based knowledge-retrieval algorithms. These algorithms rely on semantic extraction rules: the text is run through a parser, and the rules are applied to the resulting parse trees and dependency graphs to identify candidate constructs for triple extraction. The performance of such algorithms depends critically on how much of the knowledge present in a text fragment the rules can extract, in the form of triples. In this paper, we propose a way to statistically analyze the significance of these rules based on the fraction of knowledge that they extract from given text corpora.
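To make the notion of "fraction of knowledge extracted" concrete, the following is a minimal sketch rather than the method proposed in this report. It assumes that a hand-annotated set of reference triples is available for each text fragment and that each rule's output has already been collected per fragment. The function name `rule_coverage`, the rule names, and the toy triples are all hypothetical and serve only as illustration.

```python
from collections import defaultdict

# A triple is (subject, predicate, object). All data below is illustrative.
Triple = tuple[str, str, str]


def rule_coverage(
    gold: dict[str, set[Triple]],
    extracted_by_rule: dict[str, dict[str, set[Triple]]],
) -> dict[str, float]:
    """Return, for each rule, the fraction of reference triples it recovers.

    gold maps a fragment id to its reference triples (assumed hand-annotated).
    extracted_by_rule maps a rule name to {fragment id: triples the rule produced}.
    """
    total = sum(len(triples) for triples in gold.values())
    if total == 0:
        raise ValueError("the reference set contains no triples")

    hits: dict[str, int] = defaultdict(int)
    for rule, per_fragment in extracted_by_rule.items():
        for fragment_id, triples in per_fragment.items():
            # Count only extracted triples that also appear in the reference set.
            hits[rule] += len(triples & gold.get(fragment_id, set()))

    return {rule: hits[rule] / total for rule in extracted_by_rule}


# Toy corpus of two fragments with two reference triples each.
gold = {
    "f1": {("aspirin", "treats", "headache"), ("aspirin", "is_a", "drug")},
    "f2": {("ibuprofen", "treats", "fever"), ("ibuprofen", "is_a", "drug")},
}

# Hypothetical output of two extraction rules on the same fragments.
extracted = {
    "svo_rule": {
        "f1": {("aspirin", "treats", "headache")},
        "f2": {("ibuprofen", "treats", "fever")},
    },
    "copula_rule": {
        "f1": {("aspirin", "is_a", "drug")},
        "f2": set(),
    },
}

coverage = rule_coverage(gold, extracted)
print(coverage)  # {'svo_rule': 0.5, 'copula_rule': 0.25}
```

Per-rule fractions of this kind are only a starting point; a statistical analysis of rule significance would additionally need to account for variation across fragments and corpora.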