Research Project: Integrated Research Approaches for Improving Production Efficiency in Salmonids

Location: Cool and Cold Water Aquaculture Research

Title: A long reads-based trio-binning De-novo assembly of the North American Atlantic salmon genome

Gao, Guangtu
Waldbieser, Geoffrey - Geoff
Youngblood, Ramey - Mississippi State University
Pietrak, Michael
Scheffler, Brian
Rexroad, Caird
Peterson, Brian
Palti, Yniv

Submitted to: International Conference on Integrative Salmonid Biology
Publication Type: Abstract Only
Publication Acceptance Date: 9/23/2019
Publication Date: 11/17/2019
Citation: Gao, G., Waldbieser, G.C., Youngblood, R.C., Pietrak, M.R., Scheffler, B.E., Rexroad III, C.E., Peterson, B.C., Palti, Y. 2019. A long reads-based trio-binning De-novo assembly of the North American Atlantic salmon genome. In: Proceedings of International Conference on Integrative Salmonid Biology. Fourth International Conference on Integrative Salmonid Biology, November 17, 2019, Edinburgh, Scotland, UK. P.No.Poster - 1

Interpretive Summary:

Technical Abstract: A high-quality reference genome assembly is available for the European subspecies of Atlantic salmon (GCF_000233375.1). Given the complex genomic differences between European and North American (NA) Atlantic salmon, a separate de novo reference assembly is needed for NA Atlantic salmon. We do not currently have a homozygous salmon of NA origin for a de novo assembly, but this problem may be overcome by the recently published trio-binning assembly approach. Trio-binning uses short Illumina reads from the two parental genomes to partition the long reads obtained from the heterozygous offspring into haplotype-specific sets. Each haplotype is then assembled independently to reconstruct the two parental genomes. To accomplish this, we initially generated 104x genome coverage of PacBio Sequel long-read sequence data from a single male salmon (Chromosome 3/6 Y lineage) from the St. John River broodstock of the USDA breeding program in Maine. We then generated over 40x coverage of Illumina paired-end reads from each parent and used the short reads to separate the maternal and paternal long reads. For each parental genome, contigs were assembled from the partitioned long reads using the Canu pipeline, and the consensus sequence was error-corrected using two iterations of Arrow with the PacBio raw reads. The Canu assemblies contained 12,416 and 13,021 contigs, with N50 contig lengths of 768,218 bp and 796,316 bp, for the female and male parental haplotypes, respectively. The total lengths of the female and male assemblies were 3.32 Gb and 3.31 Gb, respectively. A BUSCO analysis detected 94.1% and 93.1% of conserved Actinopterygii genes in the female and male assemblies, respectively. We are currently adding over 50x genome coverage of PacBio long reads from the heterozygous offspring to further improve the contiguity of the two parental assemblies. The two assemblies will also be further improved with a Bionano optical map and Hi-C proximity ligation sequence data to produce super-scaffolds and correct mis-joined scaffolds.
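
The read-partitioning step of trio-binning described above can be illustrated with a toy sketch. This is not the pipeline used for the assemblies reported here (those were built with Canu); it only shows the underlying idea under simplifying assumptions: k-mers found in one parent's short reads but not the other's are treated as haplotype markers, and each offspring long read is assigned to the parent whose markers it contains more often. The function names (`kmers`, `parent_specific_kmers`, `bin_read`) and the tiny sequences are hypothetical. A practical implementation would also use canonical (strand-independent) k-mers, filter low-frequency k-mers that likely come from sequencing errors, and normalize the counts.

```python
from typing import Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(
    maternal_reads: Iterable[str], paternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat: Set[str] = set()
    for read in maternal_reads:
        mat |= kmers(read, k)
    pat: Set[str] = set()
    for read in paternal_reads:
        pat |= kmers(read, k)
    # k-mers shared by both parents carry no haplotype information.
    return mat - pat, pat - mat


def bin_read(read: str, mat_only: Set[str], pat_only: Set[str], k: int) -> str:
    """Assign an offspring long read to the parent whose marker k-mers it shares most."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "unassigned"  # no markers, or a tie


# Toy data: the parents differ only at the last two bases.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTACGTCC"], k=4)
for long_read in ["TACGTAAG", "TACGTCCG", "ACGTACG"]:
    print(long_read, bin_read(long_read, mat_only, pat_only, k=4))
```

In this sketch, reads with no marker k-mers (or a tie) are left unassigned rather than forced into a bin; how such reads are handled in a real assembly is a design choice of the tool being used.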