Technical Report Number
Computing Methodologies, Theory of Computation
This paper presents an efficient algorithm for learning regular grammars. A knowledgeable teacher provides the learner with a structurally complete set of strings that belong to L(G), the language corresponding to an unknown regular grammar G. This structurally complete set of positive examples implicitly specifies a lattice which represents the hypothesis space of candidate grammars that contains the unknown grammar G. The learner searches this lattice by posing queries about the membership in G of particular strings and uses the teacher's response to eliminate parts of the hypothesis space. The learner maintains at all time, a compact representation of the lattice in the form of two sets $\cal S$ and $\cal G$ that correspond (respectively) to the set of most specific and most general grammars consistent with the set of positive examples provided and the queries answered by the teacher up to that point in time. The correctness of the algorithm is established by proving that at least one element of the lattice, G*, that is equivalent to the unknown grammar G, is contained in the hypothesis space represented by $\cal S$ and $\cal G$ and that the algorithm terminates upon identifying such an element.