Campus Units
Statistics
Document Type
Article
Publication Version
Submitted Manuscript
Publication Date
9-30-2020
Journal or Book Title
arXiv
Abstract
Record linkage joins records that reside in separate files but are believed to refer to the same entity. In this paper we approach record linkage as a classification problem and adapt the maximum entropy classification method from text mining to record linkage, in both the supervised and unsupervised settings of machine learning. The set of links is chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969). On the other hand, the proposed algorithm is scalable and fully automatic, unlike the classical approach, which generally requires clerical review to resolve the undecided cases.
Copyright Owner
The Authors
Copyright Date
2020
Language
en
File Format
application/pdf
Recommended Citation
Lee, Danhyang; Zhang, Li-Chun; and Kim, Jae Kwang, "Maximum Entropy classification for record linkage" (2020). Statistics Publications. 306.
https://lib.dr.iastate.edu/stat_las_pubs/306
Included in
Design of Experiments and Sample Surveys Commons, Probability Commons, Statistical Methodology Commons
Comments
This preprint is made available through arXiv: https://arxiv.org/abs/2009.14797.