|KOREN, SERGEY - Department Of Defense|
|PHILLIPPY, ADAM - Department Of Defense|
|SMITH, TIMOTHY - Department Of Defense|
|BURTON, JOSH - University Of Washington|
|LLACHKO, IVAN - University Of Washington|
|SAYRE, BRIAN - Virginia State University|
|HUSON, HEATHER - Cornell University|
|Schroeder, Steven - Steve|
|Van Tassell, Curtis - Curt|
Submitted to: Plant and Animal Genome Conference Proceedings
Publication Type: Abstract Only
Publication Acceptance Date: 1/11/2015
Publication Date: 1/11/2015
Citation: Bickhart, D.M., Koren, S., Phillippy, A.M., Smith, T.P., Burton, J.N., Llachko, I., Sayre, B.L., Huson, H.J., Schroeder, S.G., Van Tassell, C.P., Sonstegard, T.S. 2015. The use of PacBio and Hi-C data in de novo assembly of the goat genome. Plant and Animal Genome Conference Proceedings. San Diego, CA, January 10–14, W144.
Technical Abstract: Generating de novo reference genome assemblies for non-model organisms is a laborious task that often requires large amounts of data from several sequencing platforms and cytogenetic surveys. Using PacBio sequence data and new library creation techniques, we present a de novo, high-quality reference assembly for the goat (Capra hircus) that demonstrates a primarily sequencing-based approach to efficiently creating new reference assemblies for eukaryotic species. This goat reference genome was created from 38 million PacBio P5-C3 reads generated from a San Clemente goat, using the Celera Assembler PBcR pipeline with PacBio read self-correction. To generate the assembly, corrected and filtered reads were pre-assembled into a consensus model using PBDAGCON and subsequently assembled using Celera Assembler version 8.2. This method produced 5,902 contigs with a contig N50 of 2.56 megabases. To generate chromosome-sized scaffolds, we used the LACHESIS scaffolding method, which identifies cis-chromosome Hi-C interactions, to link contigs together. We then compared our new assembly to the existing goat reference assembly to identify large-scale discrepancies. This comparison identified 247 disagreements between the two assemblies, consisting of 123 inversions and 124 chromosome-contig relocations. The high quality of these data illustrates how this methodology can be used to efficiently generate new reference genome assemblies without expensive fluorescent cytometry or large quantities of data from multiple sequencing platforms.
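For readers unfamiliar with the contig N50 statistic quoted in the abstract, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the total assembled bases. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the goat assembly or from any tool named in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Illustrative helper only; not part of the Celera Assembler or LACHESIS.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating their lengths.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not from the goat assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In this toy case the contigs total 300 bases, and the two longest contigs (80 + 70) already cover 150, so the N50 is 70. A larger N50 indicates that more of the assembly sits in long contigs.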