Degree Type

Dissertation

Date of Award

2017

Degree Name

Doctor of Philosophy

Department

Electrical and Computer Engineering

Major

Bioinformatics and Computational Biology

First Advisor

Julie Dickerson

Second Advisor

Steven Cannon

Abstract

This dissertation is focused on improving RNA-Seq processing in terms of transcript assembly, transcript quantification and detection of differential alternative splicing.

There are two major challenges in solving these three problems.

The first is accurately deriving transcript-level expression values from RNA-Seq reads that often align ambiguously to a set of overlapping isoforms.

To make matters worse, existing gene annotation tends to misguide transcript quantification, because new transcripts are often discovered in new RNA-Seq experiments.

The second challenge is accounting for intrinsic uncertainties or variabilities in RNA-Seq measurement when calling differential alternative splicing from multiple samples across two conditions.

Those uncertainties include coverage bias and biological variations.

Failing to account for these variabilities can lead to higher false positive rates.

To address these challenges, I develop a series of novel algorithms, which are implemented in a software package called Strawberry.

To tackle the read assignment uncertainty challenge, Strawberry assembles aligned RNA-Seq reads into transcripts using a constrained flow network algorithm.

After the assembly, Strawberry uses a latent class model to assign reads to transcripts.
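As a rough illustration of how a latent class model can resolve ambiguous reads, the sketch below runs a plain expectation-maximization loop over read-to-transcript compatibilities. It is a minimal example under stated assumptions, not Strawberry's actual implementation: the function name `em_read_assignment`, the compatibility-list input format, and the omission of transcript effective lengths and bias terms are all simplifications introduced here.

```python
def em_read_assignment(compat, n_transcripts, n_iter=200, tol=1e-9):
    """Estimate relative transcript abundances with a basic EM loop.

    compat[i] is the non-empty list of transcript indices that read i
    aligns to (hypothetical input format). Effective lengths and bias
    terms are ignored in this simplified sketch.
    """
    # Start from a uniform guess over transcripts.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read fractionally among its compatible
        # transcripts, in proportion to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for transcripts in compat:
            denom = sum(theta[t] for t in transcripts)
            for t in transcripts:
                counts[t] += theta[t] / denom
        # M-step: renormalize the expected counts into proportions.
        total = sum(counts)
        new_theta = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Two reads unique to transcript 0, one unique to transcript 1,
# and one read that aligns ambiguously to both.
reads = [[0], [0], [0, 1], [1]]
theta = em_read_assignment(reads, n_transcripts=2)
```

In this toy input the ambiguous read is shared according to the evidence from the uniquely aligned reads, so transcript 0 ends up with the larger estimated proportion.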

These two steps use different optimization frameworks but share the same graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data.

To infer differential alternative splicing, Strawberry extends the single sample quantification model by imposing a generalized linear model on the relative transcript proportions.
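One common way to place a generalized linear model on relative proportions is a multinomial-logit (softmax) link, sketched below. The abstract does not state which link function Strawberry uses, so the softmax link, the function name `isoform_proportions`, and the single 0/1 condition covariate are assumptions made purely for illustration.

```python
import math


def isoform_proportions(beta, condition):
    """Map per-isoform linear predictors to proportions that sum to 1.

    beta is a list of (intercept, condition_effect) pairs, one per
    isoform; the first isoform acts as the reference with (0.0, 0.0).
    condition is 0 or 1. All names here are hypothetical.
    """
    eta = [b0 + b1 * condition for b0, b1 in beta]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


# Two isoforms: equal usage in condition 0, shifted usage in condition 1.
beta = [(0.0, 0.0), (0.0, 1.0)]
p_control = isoform_proportions(beta, condition=0)
p_treated = isoform_proportions(beta, condition=1)
```

Under this parameterization, a nonzero condition effect for an isoform corresponds to a change in its relative usage between the two conditions, which is the quantity a differential alternative splicing test would examine.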

To account for count overdispersion, Strawberry uses an empirical Bayesian hierarchical model.

For coverage bias, Strawberry performs a bias correction step which borrows information across samples and genes before fitting the differential analysis model.

A series of simulated and real data sets are used to evaluate and benchmark Strawberry's results.

Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies.

In terms of detecting differential alternative splicing, Strawberry also outperforms several state-of-the-art methods including DEXSeq, Cuffdiff 2 and DSGseq.

Strawberry and its supporting code, e.g., simulation and validation, are freely available on my GitHub (https://github.com/ruolin).

DOI

https://doi.org/10.31274/etd-180810-5792

Copyright Owner

Ruolin Liu

Language

en

File Format

application/pdf

File Size

168 pages
