Semester of Graduation
First Major Professor
Master of Science (MS)
This article describes a practical, unsupervised hard clustering method for massive amplicon sequence datasets that takes quality scores into consideration. We adapted three k-means algorithms (Lloyd's, MacQueen's, and Hartigan's) to optimize over the discrete indicator variables that assign each observation to one of K clusters. We compared the speed and accuracy of the resulting algorithms on simulated and real datasets, and we also compared our method to DADA2, another model-based amplicon read clustering method.
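To make the idea of quality-aware hard clustering concrete, the following is a minimal Lloyd-style sketch in Python. It is not the author's implementation. It assumes equal-length reads, Phred-scaled quality scores, and a simple error model in which a sequencing error is equally likely to produce any of the three other bases. The function names (`site_cost`, `read_cost`, `lloyd_haplotypes`) and the error model are illustrative assumptions.

```python
import math

ALPHABET = "ACGT"

def site_cost(base, qual, hap_base):
    """Negative log-probability of reading `base` when the true base is `hap_base`.

    Assumed error model (not from the source): Phred score -> error
    probability, with errors spread evenly over the three other bases.
    """
    err = min(10 ** (-qual / 10), 0.75)  # clamp so the logs stay finite
    if base == hap_base:
        return -math.log(1 - err)
    return -math.log(err / 3)

def read_cost(read, quals, hap):
    """Total cost of assigning one read to one candidate haplotype."""
    return sum(site_cost(b, q, h) for b, q, h in zip(read, quals, hap))

def lloyd_haplotypes(reads, quals, init_haps, max_iter=100):
    """Alternate hard assignment and center update until assignments stop changing."""
    haps = list(init_haps)
    assign = None
    for _ in range(max_iter):
        # Assignment step: each read goes to its lowest-cost haplotype.
        new_assign = [
            min(range(len(haps)), key=lambda k: read_cost(r, q, haps[k]))
            for r, q in zip(reads, quals)
        ]
        if new_assign == assign:
            break
        assign = new_assign
        # Update step: per position, pick the base with the lowest total cost.
        for k in range(len(haps)):
            members = [i for i, a in enumerate(assign) if a == k]
            if not members:
                continue  # keep the old haplotype for an empty cluster
            haps[k] = "".join(
                min(
                    ALPHABET,
                    key=lambda c: sum(
                        site_cost(reads[i][j], quals[i][j], c) for i in members
                    ),
                )
                for j in range(len(haps[k]))
            )
    return haps, assign

# Toy usage: six 4-base reads, all with Phred quality 30.
reads = ["ACGT", "ACGT", "ACGA", "TTTT", "TTTA", "TTTT"]
quals = [[30] * 4 for _ in reads]
haps, assign = lloyd_haplotypes(reads, quals, init_haps=["ACGA", "TTTA"])
```

In this sketch, low-quality bases contribute less to the mismatch cost, so a disagreement at an unreliable position counts less against a haplotype than one at a high-quality position. MacQueen-style and Hartigan-style variants would update the haplotypes after each single reassignment rather than after a full pass.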
Zhang, Yudi, "K-haplotypes" (2019). Creative Components. 363.