|GHURYE, JAY - University Of Maryland|
|POP, MIHAI - University Of Maryland|
|KOREN, SERGEY - National Institutes Of Health (NIH)|
|CHIN, CHEN-SHAN - Pacific Biosciences Inc|
Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/20/2017
Publication Date: 7/12/2017
Citation: Ghurye, J., Pop, M., Koren, S., Bickhart, D.M., Chin, C. 2017. Scaffolding of long read assemblies using long range contact information. Biomed Central (BMC) Genomics. 18(1):527. https://doi.org/10.1186/s12864-017-3879-z.
Interpretive Summary: Sequencing data that capture the physical shape of the DNA molecule have proven extremely useful in several fields of biology and genomics. However, our ability to resolve and interpret these data is still in its infancy, particularly when they are used to improve reference genomes. This manuscript presents a new algorithm, called “SALSA,” that accurately interprets these sequencing data to produce high-quality reference genomes.
Technical Abstract: Long-read technologies have revolutionized de novo genome assembly by generating long contigs. Although assembly contiguity has increased, the contigs may still not span a whole chromosome, leaving the assembly short of chromosome level. To address this problem, we develop a scaffolding method that boosts the contiguity of the assembly using genome-wide chromatin interaction data. We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long-read genome assemblies. The software is open source and available from: https://github.com/machinegun/hi-c-scaffold.
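To illustrate the general idea of scaffolding contigs with long-range contact data, the following is a minimal toy sketch and not the SALSA algorithm itself. It assumes that Hi-C read pairs have already been summarized as a count of links between each pair of contigs, normalizes each count by the product of the two contig lengths, and greedily joins contig ends in order of decreasing score. The function name `greedy_scaffold`, the input layout, and the normalization are illustrative assumptions; contig orientation and misassembly correction are ignored entirely.

```python
def greedy_scaffold(lengths, contacts):
    """Toy greedy scaffolder (illustrative only, not the published method).

    lengths:  dict mapping contig name -> contig length in bases
    contacts: dict mapping (contig_a, contig_b) -> number of Hi-C read
              pairs linking the two contigs (hypothetical input format)
    Returns a list of scaffolds, each an ordered list of contig names.
    """
    # Score each pair by link count normalized by the product of lengths,
    # so that long contigs do not win simply by being large targets.
    scored = sorted(
        ((count / (lengths[a] * lengths[b]), a, b)
         for (a, b), count in contacts.items()),
        reverse=True,
    )

    # Every contig starts as its own one-element chain. Contigs in the
    # same chain share a single list object.
    chains = {name: [name] for name in lengths}

    for _score, a, b in scored:
        chain_a, chain_b = chains[a], chains[b]
        if chain_a is chain_b:
            continue  # already in the same scaffold
        # Only contigs sitting at the end of a chain can be joined.
        if a not in (chain_a[0], chain_a[-1]):
            continue
        if b not in (chain_b[0], chain_b[-1]):
            continue
        # Flip the chains if needed so that a is last and b is first.
        if chain_a[-1] != a:
            chain_a.reverse()
        if chain_b[0] != b:
            chain_b.reverse()
        chain_a.extend(chain_b)
        for name in chain_b:
            chains[name] = chain_a

    # Collect each distinct chain once, in order of first appearance.
    seen, scaffolds = set(), []
    for name in lengths:
        chain = chains[name]
        if id(chain) not in seen:
            seen.add(id(chain))
            scaffolds.append(chain)
    return scaffolds


if __name__ == "__main__":
    lengths = {"c1": 100, "c2": 100, "c3": 100, "c4": 100}
    contacts = {("c1", "c2"): 50, ("c2", "c3"): 40, ("c1", "c3"): 5}
    print(greedy_scaffold(lengths, contacts))
    # [['c1', 'c2', 'c3'], ['c4']]
```

In this made-up example, contigs c1, c2, and c3 are chained because adjacent pairs share many links, while c4 has no links and remains a separate scaffold. A practical scaffolder would also need to determine contig orientation and handle assembly errors, which this sketch does not attempt.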