
Title: The use of PacBio and Hi-C data in denovo assembly of the goat genome

Author
Bickhart, Derek
KOREN, SERGEY - Department Of Defense
PHILLIPPY, ADAM - Department Of Defense
SMITH, TIMOTHY - Department Of Defense
BURTON, JOSH - University Of Washington
LLACHKO, IVAN - University Of Washington
SAYRE, BRIAN - Virginia State University
HUSON, HEATHER - Cornell University
Schroeder, Steven - Steve
Van Tassell, Curtis - Curt
Sonstegard, Tad

Submitted to: Plant and Animal Genome Conference Proceedings
Publication Type: Abstract Only
Publication Acceptance Date: 1/11/2015
Publication Date: 1/11/2015
Citation: Bickhart, D.M., Koren, S., Phillippy, A.M., Smith, T.P., Burton, J.N., Llachko, I., Sayre, B.L., Huson, H.J., Schroeder, S.G., Van Tassell, C.P., Sonstegard, T.S. 2015. The use of PacBio and Hi-C data in denovo assembly of the goat genome. Plant and Animal Genome Conference Proceedings. San Diego, CA, January 10–14, W144.

Technical Abstract: Generating de novo reference genome assemblies for non-model organisms is a laborious task that often requires a large amount of data from several sequencing platforms and cytogenetic surveys. Using PacBio sequence data and new library creation techniques, we present a de novo, high-quality reference assembly for the goat (Capra hircus) that demonstrates a primarily sequencing-based approach to efficiently creating new reference assemblies for eukaryotic species. This goat reference genome was assembled from 38 million PacBio P5-C3 reads, generated from a San Clemente goat, using the Celera Assembler PBcR pipeline with PacBio read self-correction. To generate the assembly, corrected and filtered reads were pre-assembled into a consensus model using PBDAGCON and subsequently assembled using Celera Assembler version 8.2. This method produced 5,902 contigs with a contig N50 of 2.56 megabases. To generate chromosome-sized scaffolds, we used the LACHESIS scaffolding method to identify cis-chromosome Hi-C interactions and link contigs together. We then compared our new assembly to the existing goat reference assembly to identify large-scale discrepancies. This comparison identified 247 disagreements between the two assemblies, consisting of 123 inversions and 124 chromosome-contig relocations. The high quality of these data illustrates how this methodology can be used to efficiently generate new reference genome assemblies without the use of expensive fluorescent cytometry or large quantities of data from multiple sequencing platforms.
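
For readers unfamiliar with the contig N50 statistic quoted in the abstract, the following is a minimal sketch of how it is conventionally computed: sort contig lengths from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembled bases. The function name `n50` and the toy lengths are assumptions made for illustration; they are not taken from the goat assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in base pairs).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths, invented for illustration only (hypothetical data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs cover 150 of 300 bases
```

In practice the lengths would be read from the assembly FASTA file. A larger N50 generally indicates a more contiguous assembly, although it says nothing about whether the contigs are correctly assembled.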