Maximum Entropy classification for record linkage

Date
2020-09-30
Authors
Lee, Danhyang
Zhang, Li-Chun
Kim, Jae Kwang
Department
Statistics
Abstract

Record linkage joins records that reside in separate files and are believed to relate to the same entity. In this paper we approach record linkage as a classification problem and adapt the maximum entropy classification method from text mining to record linkage, in both the supervised and unsupervised settings of machine learning. The set of links is chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is scalable and fully automatic, unlike the classical approach, which generally requires clerical review to resolve the undecided cases.

Comments

This preprint is made available through arXiv: https://arxiv.org/abs/2009.14797.

Copyright
2020