Date of Award
Doctor of Philosophy
Image annotation is the challenging task of assigning keywords to an image based on its content. It has a variety of applications in multimedia data mining and computer vision. Traditional machine learning approaches to image annotation require large amounts of labeled data. This requirement is often unrealistic, because obtaining labeled data is generally expensive and time-consuming. However, large amounts of weakly labeled data and tagged images are readily available, particularly on the web and in social network communities. In this thesis we address the problem of image annotation using weak supervision. In particular, we formulate image annotation as a multiple instance multiple label learning problem and propose generative and discriminative models to tackle it. Multiple instance multiple label learning is a generalization of supervised learning in which each training example is a bag of instances and each bag is labeled with a set of labels. We explore both learning frameworks, generative and discriminative, and propose models within each to address the problem of assigning text keywords to images.
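The multiple instance multiple label setting described above can be pictured as a simple data layout. The following is a minimal sketch, not the thesis's own representation: the `Bag` class, the `label_matrix` helper, the region features, and the three-word vocabulary are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Bag:
    """One weakly labeled image (hypothetical container, for illustration)."""
    instances: np.ndarray  # shape (n_regions, n_features); n_regions varies per image
    labels: frozenset      # tags attach to the whole bag, not to individual regions


def label_matrix(bags, vocabulary):
    """Binary bag-by-label matrix Y with Y[i, j] = 1 if bag i carries vocabulary[j]."""
    index = {word: j for j, word in enumerate(vocabulary)}
    Y = np.zeros((len(bags), len(vocabulary)), dtype=int)
    for i, bag in enumerate(bags):
        for word in bag.labels:
            Y[i, index[word]] = 1
    return Y


rng = np.random.default_rng(0)
# Two made-up images with 3 and 2 regions, each region described by 5 features.
bags = [
    Bag(rng.normal(size=(3, 5)), frozenset({"sky", "water"})),
    Bag(rng.normal(size=(2, 5)), frozenset({"tiger"})),
]
vocabulary = ["sky", "tiger", "water"]
Y = label_matrix(bags, vocabulary)
```

The point of the layout is that supervision exists only at the bag level: nothing in `Y` says which region of the first image is "sky" and which is "water".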
The first approach, a generative model, describes the process by which the data was generated and learns the parameters of that process from the data. The model is a non-parametric generalization of a mixture model used in earlier work: we extend that model to a Hierarchical Dirichlet Process, which allows for countably infinite mixture components. Our experimental evaluation shows that the performance of this model does not depend on the number of mixture components, unlike the standard mixture model, which suffers from over-fitting when the number of mixture components is large.
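To give a feel for how a Dirichlet process sidesteps a fixed number of components, here is a minimal sketch of the standard truncated stick-breaking construction of mixture weights. It covers only a single Dirichlet process; a Hierarchical Dirichlet Process adds a second level in which each group draws its own weights around these shared ones. The concentration value, the truncation level, and the function name are assumptions for illustration, and this is not the inference procedure used in the thesis.

```python
import numpy as np


def stick_breaking(gamma, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Each step breaks off a Beta(1, gamma) fraction of the remaining stick.
    """
    v = rng.beta(1.0, gamma, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


rng = np.random.default_rng(0)
# gamma and the truncation level are illustrative choices, not values from the thesis.
weights = stick_breaking(gamma=2.0, truncation=20, rng=rng)
```

Raising the truncation level only adds components that receive progressively less of the stick, which is the intuition behind a model whose behavior does not hinge on a preset component count.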
The second approach is a discriminative model which, unlike the generative model, answers the following question: given an input bag of instances, what is the most likely assignment of labels to the bag? We address this problem by learning one classifier per possible label and forcing the classifiers to share weights through trace-norm regularization. We show that the performance of this model is comparable to that of state-of-the-art multiple instance multiple label classifiers and that, unlike some state-of-the-art models, it is scalable and practical for datasets with a large number of training instances and possible labels.
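The trace norm of a weight matrix is the sum of its singular values, so penalizing it pushes the stacked per-label classifiers toward a low-rank matrix, that is, toward shared structure. The sketch below computes the trace norm and the singular-value soft-thresholding step commonly used to minimize it. The weight matrix is made up, and the abstract does not say which optimizer the thesis uses, so the proximal step is an assumption shown only to illustrate the effect of the penalty.

```python
import numpy as np


def trace_norm(W):
    """Sum of the singular values of W (also called the nuclear norm)."""
    return float(np.linalg.svd(W, compute_uv=False).sum())


def prox_trace_norm(W, tau):
    """Soft-threshold the singular values of W by tau (proximal step for tau * trace norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


# Hypothetical weights: one row per label classifier, one column per feature.
# Rows 0 and 1 are proportional, so those two classifiers already share a direction.
W = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 4.0, 0.0, 2.0],
              [0.0, 0.0, 1.0, 0.0]])

penalty = trace_norm(W)
W_shrunk = prox_trace_norm(W, tau=1.5)  # removes the weak singular direction
```

Here the threshold exceeds the smaller nonzero singular value, so `W_shrunk` has rank one: every classifier is expressed through a single shared direction.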
Finally, we generalize the discriminative model to a semi-supervised setting so that the model can take advantage of both labeled and unlabeled data. We do so by assuming that the data lie on a low-dimensional manifold and introducing a penalty that forces the classifiers to assign similar labels to indirectly similar instances (i.e., instances that are nearby in the manifold space). The manifold is learned by constructing a similarity neighborhood graph over bags, and the graph Laplacian is then used to compute the penalty term.
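The graph-Laplacian penalty described above can be sketched as follows. This assumes each bag has already been summarized by a single feature vector and that neighbors are chosen by Euclidean distance; the abstract does not specify how bag similarity is measured, so both choices, along with the function names and the toy data, are illustrative assumptions.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency linking each row of X to its k nearest rows."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros(d2.shape)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                  # keep an edge if either end chose it


def laplacian_penalty(A, F):
    """trace(F^T L F) with L = D - A, equal to 0.5 * sum_ij A_ij * ||F_i - F_j||^2."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(F.T @ L @ F))


# Four hypothetical bags forming two tight pairs in feature space.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
A = knn_adjacency(X, k=1)

smooth = np.array([[1.0], [1.0], [0.0], [0.0]])  # neighbors receive equal scores
rough = np.array([[1.0], [0.0], [1.0], [0.0]])   # neighbors disagree

smooth_cost = laplacian_penalty(A, smooth)
rough_cost = laplacian_penalty(A, rough)
```

Predictions that agree across graph neighbors incur no penalty, while predictions that differ across an edge are charged the squared difference. Because the graph needs no labels, unlabeled bags can shape the classifiers through this term.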
Yakhnenko, Oksana, "Learning from text and images: generative and discriminative models for partially labeled data" (2009). Graduate Theses and Dissertations. 11077.