Improved gap size estimation for scaffolding algorithms
Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC). ORCID iD: 0000-0001-6031-005X
2012 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 17, 2215-2222 p. Article in journal (Refereed). Published
Abstract [en]

Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates of the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead subsequent analysis, it is important to provide unbiased estimates of contig distance.

Results: In this article, we show that state-of-the-art scaffolding programs use an incorrect model for gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias that arise. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We explain why this estimate is sound and show empirically that it outperforms the gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, for structural variation detection and for library insert-size estimation as commonly performed by read aligners.
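
The kind of bias the abstract refers to can be illustrated with a small simulation. The sketch below is not the model or estimator from the article; it is a toy setup under stated assumptions: fragment lengths are normally distributed, fragments land uniformly on a region made of two equal-length contigs separated by a gap, and a read pair counts as gap-spanning only when each read lies fully inside its contig. The function name and all parameters are hypothetical. The "naive" estimate is the library mean minus the mean observed span inside the contigs, which ignores that longer fragments are more likely to span the gap.

```python
import random


def simulate_naive_gap_estimate(true_gap, mu=500.0, sigma=100.0, read_len=50,
                                contig_len=1000, n_fragments=100_000, seed=0):
    """Toy simulation (assumed setup, not the article's model).

    Returns (naive_gap_estimate, number_of_spanning_pairs).
    """
    rng = random.Random(seed)
    region_len = 2 * contig_len + true_gap
    gap_start = contig_len              # first gap position
    gap_end = contig_len + true_gap     # first position of the second contig
    observed = []
    for _ in range(n_fragments):
        frag_len = rng.gauss(mu, sigma)
        start = rng.uniform(0, region_len)
        end = start + frag_len
        # Discard fragments too short to hold two reads or running off the region.
        if frag_len < 2 * read_len or end > region_len:
            continue
        left_read_in_contig1 = start + read_len <= gap_start
        right_read_in_contig2 = end - read_len >= gap_end
        if left_read_in_contig1 and right_read_in_contig2:
            # Part of the fragment that lies on the two contigs (fragment minus gap).
            observed.append((gap_start - start) + (end - gap_end))
    mean_observed = sum(observed) / len(observed)
    # Naive estimate: assumes spanning fragments have the library mean length.
    return mu - mean_observed, len(observed)


if __name__ == "__main__":
    for gap in (0, 200, 400):
        estimate, n = simulate_naive_gap_estimate(gap)
        print(f"true gap {gap}: naive estimate {estimate:.1f} from {n} pairs")
```

Under these assumptions the naive estimate falls below the true gap, and the shortfall grows as the gap becomes large relative to the mean insert size, because only the longer fragments of the library can span it.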

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2012. Vol. 28, no 17, 2215-2222 p.
National Category
Biochemistry and Molecular Biology
URN: urn:nbn:se:umu:diva-60313
DOI: 10.1093/bioinformatics/bts441
ISI: 000308019200001
OAI: diva2:566647
Available from: 2012-11-09 Created: 2012-10-09 Last updated: 2015-04-29 Bibliographically approved

By author/editor
Street, Nathaniel