Title
K-haplotypes
Degree Type
Creative Component
Semester of Graduation
Summer 2019
Department
Statistics
First Major Professor
Karin Dorman
Degree(s)
Master of Science (MS)
Major(s)
Statistics
Abstract
This article describes a practical, unsupervised hard clustering method for massive amplicon sequence datasets that takes quality scores into account. We adapted three k-means algorithms (Lloyd's [20], MacQueen's [21], and Hartigan's [12]) to optimize the discrete indicator variables that assign each observation to one of K clusters. We compared the performance of our algorithms in terms of speed and accuracy on simulated and real datasets. We also compared our method to DADA2 [2], another model-based amplicon read clustering method.
Copyright Owner
Yudi Zhang
Copyright Year
2019
File Format
Recommended Citation
Zhang, Yudi, "K-haplotypes" (2019). Creative Components. 363.
https://lib.dr.iastate.edu/creativecomponents/363