Title: Scaffolding of long read assemblies using long range contact information

Authors:
GHURYE, JAY - University of Maryland
POP, MIHAI - University of Maryland
KOREN, SERGEY - National Institutes of Health (NIH)
Bickhart, Derek
CHIN, CHEN-SHAN - Pacific Biosciences Inc

Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/20/2017
Publication Date: 7/12/2017
Citation: Ghurye, J., Pop, M., Koren, S., Bickhart, D.M., Chin, C. 2017. Scaffolding of long read assemblies using long range contact information. Biomed Central (BMC) Genomics. 18(1):527.

Interpretive Summary: Sequencing information that captures the physical shape of the DNA molecule has proven extremely useful in several fields of biology and genomics. Unfortunately, our ability to resolve and interpret this data is still in its infancy, particularly when it is used to improve reference genomes. This manuscript demonstrates a new algorithm, called “SALSA,” that accurately interprets this sequencing data to produce high-quality reference genomes.

Technical Abstract: Long read technologies have revolutionized de novo genome assembly by generating long contigs. Although assembly contiguity has increased, contigs may still not span an entire chromosome, leaving the assembly short of chromosome level. To address this problem, we develop a scaffolding method that boosts assembly contiguity using genome-wide chromatin interaction data. We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. The software is open-source and available from: