Publication: USDA ARS

Publication #311910

Title: The use of PacBio and Hi-C data in denovo assembly of the goat genome

Author

	Bickhart, Derek
	KOREN, SERGEY - Department Of Defense
	PHILLIPPY, ADAM - Department Of Defense
	SMITH, TIMOTHY - Department Of Defense
	BURTON, JOSH - University Of Washington
	LLACHKO, IVAN - University Of Washington
	SAYRE, BRIAN - Virginia State University
	HUSON, HEATHER - Cornell University
	Schroeder, Steven - Steve
	Van Tassell, Curtis - Curt
	Sonstegard, Tad

Submitted to: Plant and Animal Genome Conference Proceedings
Publication Type: Abstract Only
Publication Acceptance Date: 1/11/2015
Publication Date: 1/11/2015
Citation: Bickhart, D.M., Koren, S., Phillippy, A.M., Smith, T.P., Burton, J.N., Llachko, I., Sayre, B.L., Huson, H.J., Schroeder, S.G., Van Tassell, C.P., Sonstegard, T.S. 2015. The use of PacBio and Hi-C data in denovo assembly of the goat genome. Plant and Animal Genome Conference Proceedings. San Diego, CA, January 10–14, W144.

Interpretive Summary:

Technical Abstract: Generating de novo reference genome assemblies for non-model organisms is a laborious task that often requires large amounts of data from several sequencing platforms as well as cytogenetic surveys. Using PacBio sequence data and new library creation techniques, we present a high-quality de novo reference assembly for the goat (Capra hircus) that demonstrates a primarily sequencing-based approach to efficiently creating new reference assemblies for eukaryotic species. This goat reference genome was created from 38 million PacBio P5-C3 reads generated from a San Clemente goat, using the Celera Assembler PBcR pipeline with PacBio read self-correction. Corrected and filtered reads were pre-assembled into a consensus model using PBDAGCON and subsequently assembled using Celera Assembler version 8.2. This method produced 5,902 contigs with a contig N50 of 2.56 megabases. To generate chromosome-sized scaffolds, we used the LACHESIS scaffolding method, which identifies cis-chromosome Hi-C interactions, to link contigs together. We then compared our new assembly to the existing goat reference assembly to identify large-scale discrepancies. This comparison identified 247 disagreements between the two assemblies, consisting of 123 inversions and 124 chromosome-contig relocations. The high quality of these data illustrates how this methodology can be used to efficiently generate new reference genome assemblies without expensive fluorescent cytometry or large quantities of data from multiple sequencing platforms.
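The abstract reports a contig N50 of 2.56 megabases. As a reminder of what that statistic measures, the following is a minimal sketch of an N50 calculation, assuming the common definition (the length of the contig at which the running total of contigs, sorted longest first, reaches half of the total assembly length). The function name `n50` and the toy lengths are hypothetical and are not taken from the goat assembly or from any tool named above.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Assumed definition: sort contigs from longest to shortest and
    return the length of the contig at which the cumulative length
    first reaches at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Not reachable for a non-empty list of positive lengths.
    raise AssertionError("cumulative length never reached half the total")


# Hypothetical toy contig lengths (illustration only, not goat data).
toy_lengths = [8, 5, 4, 3, 2, 2]
print(n50(toy_lengths))  # total is 24; 8 + 5 = 13 >= 12, so this prints 5
```

Under this definition, a contig N50 of 2.56 megabases means that at least half of the assembled bases lie in contigs of 2.56 megabases or longer.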

Animal Genomics and Improvement Laboratory: Beltsville, MD