Title

K-haplotypes

Degree Type

Creative Component

Semester of Graduation

Summer 2019

Department

Statistics

First Major Professor

Karin Dorman

Degree(s)

Master of Science (MS)

Major(s)

Statistics

Abstract

This article describes a practical, unsupervised hard clustering method for massive amplicon sequence datasets that takes quality scores into account. We adapt three k-means algorithms (Lloyd's [20], MacQueen's [21], and Hartigan's [12]) to optimize the discrete indicator variables that assign each observation to one of K clusters. We compare the speed and accuracy of our algorithms on simulated and real datasets. We also compare our method to DADA2 [2], another model-based amplicon read clustering method.

Copyright Owner

Yudi Zhang

File Format

PDF
