Wednesday, October 19, 12pm-1pm, 1116-E Klaus

Effective utilization of paired-reads for genome assembly

Rahul Nihalani

Advisor: Prof. Srinivas Aluru

ABSTRACT

De novo genome assembly is the problem of recovering a genome from short reads sampled from it. Large datasets with billions of reads, long repeat regions in the genome, erroneous reads, and non-uniform read coverage make the problem quite challenging. The de Bruijn graph is a popular framework for approaching the problem, and most modern assemblers use it to perform assembly. Paired-end sequencing is a commonly used technology that helps determine the genomic location of individual reads more accurately and resolve repeats in genome assembly. Most current assemblers perform an initial assembly based on individual reads alone, and use the paired-end information only in a separate scaffolding stage. We propose a technique to embed read-pair information directly into the de Bruijn graph of reads in the form of distance constraints between graph nodes. We present an assembly algorithm that produces longer and more accurate contigs by using these distance constraints when traversing the graph. In particular, we use the collective information represented by the distance constraints along the contig constructed so far to resolve ambiguities that arise when extending it further. We also describe a model that facilitates graph traversal under non-uniform read coverage.
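For readers less familiar with the setting, the following toy Python sketch (not the algorithm presented in the talk) shows how a repeat creates a branching node in a de Bruijn graph, and how a read-pair distance constraint can select the correct successor when extending a contig. It assumes error-free reads from a single strand and ignores coverage. Each constraint is encoded as a hypothetical tuple (u, v, d) meaning "node v lies d positions after node u"; the function names build_de_bruijn, branching_nodes, extend and spell are illustrative only.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Node-centric de Bruijn graph: each (k-1)-mer maps to the set of
    (k-1)-mers that follow it in some k-mer of some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Nodes with more than one successor, where extension is ambiguous."""
    return {node for node, succ in graph.items() if len(succ) > 1}

def extend(graph, start, constraints, max_len):
    """Walk forward from `start`. At a branch, keep the successor that a
    (hypothetical) constraint (u, v, d) places exactly d positions after an
    occurrence of u already in the path; stop if zero or several qualify."""
    path = [start]
    while len(path) < max_len:
        succ = graph.get(path[-1], set())
        if not succ:
            break  # dead end
        if len(succ) == 1:
            path.append(next(iter(succ)))
            continue
        pos = len(path)  # index the next node would occupy
        supported = {
            v
            for (u, v, d) in constraints
            for i, node in enumerate(path)
            if node == u and i + d == pos and v in succ
        }
        if len(supported) != 1:
            break  # ambiguity not resolved by the constraints
        path.append(supported.pop())
    return path

def spell(path):
    """Turn a path of overlapping (k-1)-mers back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

if __name__ == "__main__":
    # Toy genome with the repeat GTTTG occurring twice.
    genome = "CAGTTTGACCGTTTGTA"
    g = build_de_bruijn([genome], k=4)
    print(branching_nodes(g))  # {'TTG'}
    print(spell(extend(g, "CAG", [], max_len=50)))  # CAGTTTG
    pairs = [("AGT", "TGA", 4), ("CGT", "TGT", 4)]
    print(spell(extend(g, "CAG", pairs, max_len=50)))  # CAGTTTGACCGTTTGTA
```

Without constraints the walk stops at the branching node TTG and yields only CAGTTTG; with the two toy constraints it passes through both copies of the repeat and recovers the full sequence. Requiring exactly one supported successor is a deliberate simplification of using the collective constraint information along a contig.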

BIO

B.Tech: LNMIIT Jaipur, India
M.Tech: IIT Bombay, India
Currently a PhD student in CSE with Dr. Srinivas Aluru