An Efficient Interactive Algorithm for Regular Language Learning

Parekh, Rajesh; Honavar, Vasant

An Efficient Interactive Algorithm for Regular Language Learning

File

TR95_02.pdf (198.32 KB)

Date

1995-02-06

Authors

Parekh, Rajesh

Honavar, Vasant

Organizational Units

Organizational Unit

Computer Science

Computer Science—the theory, representation, processing, communication and use of information—is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through; 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research, and 3. educational agendas, and sustained commitment to graduating leaders for academia, industry and government.

History
The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. Faculty were composed of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building which now houses the Computer Science department, then simply called the Computer Science building, was completed. Later it was named Atanasoff Hall. Throughout the 1980s to present, the department expanded and developed its teaching and research agendas to cover many areas of computing.

Dates of Existence
1969-present

Related Units

College of Liberal Arts and Sciences (parent college)

Department

Computer Science

Abstract

This paper presents an efficient algorithm for learning regular grammars. A knowledgeable teacher provides the learner with a structurally complete set of strings that belong to L(G), the language corresponding to an unknown regular grammar G. This structurally complete set of positive examples implicitly specifies a lattice which represents the hypothesis space of candidate grammars that contains the unknown grammar G. The learner searches this lattice by posing queries about the membership in G of particular strings and uses the teacher's response to eliminate parts of the hypothesis space. The learner maintains at all time, a compact representation of the lattice in the form of two sets $\cal S$ and $\cal G$ that correspond (respectively) to the set of most specific and most general grammars consistent with the set of positive examples provided and the queries answered by the teacher up to that point in time. The correctness of the algorithm is established by proving that at least one element of the lattice, G*, that is equivalent to the unknown grammar G, is contained in the hypothesis space represented by $\cal S$ and $\cal G$ and that the algorithm terminates upon identifying such an element.