Research Project: Integrated Research Approaches for Improving Production Efficiency in Salmonids

Location: Cool and Cold Water Aquaculture Research

Title: A long reads-based De-novo assembly of the rainbow trout Arlee-line genome

Gao, Guangtu
Waldbieser, Geoffrey - Geoff
YOUNGBLOOD, RAMEY - Mississippi State University
WHEELER, PAUL - Washington State University
Scheffler, Brian
THORGAARD, GARY - Washington State University
Palti, Yniv

Submitted to: International Conference on Integrative Salmonid Biology
Publication Type: Abstract Only
Publication Acceptance Date: 9/23/2019
Publication Date: 11/17/2019
Citation: Gao, G., Waldbieser, G.C., Youngblood, R.C., Wheeler, P.A., Scheffler, B.E., Thorgaard, G.H., Palti, Y. 2019. A long reads-based De-novo assembly of the rainbow trout Arlee-line genome. International Conference on Integrative Salmonid Biology. In: Proceedings of International Conference on Integrative Salmonid Biology, November 17, 2019, Edinburgh, Scotland, UK. P.No.ORAL-1.4

Interpretive Summary:

Technical Abstract: Although the most recent rainbow trout genome assembly, from the Swanson line (GCA_002163495.1), has greatly improved the genome reference and is reliable for gene prediction, it contains 420,055 spanned gaps and 7,839 unspanned gaps. Hence, there is still a need to improve the contiguity and completeness of the reference assembly, which is now possible with long-read DNA sequencing technologies. We are also working toward a rainbow trout “pan-genome” reference that will better represent the genetic diversity of this species. The Arlee doubled haploid YY male line has a different genetic background from the Swanson line. It originated from a domesticated strain that was first collected from the northern California coast. For the Arlee genome assembly, we generated 111x genome coverage in long-read sequence data using the PacBio Sequel system. The read length distribution has an N50 of ~33 kb and an average read length greater than 20 kb (Figure 1). Contigs were assembled with the Canu pipeline, and the consensus sequence was error-corrected with two iterations of Arrow using the PacBio reads, followed by one iteration of FreeBayes using Illumina paired-end reads. The Canu assembly contained 1,591 contigs with an N50 contig length of 9,835,815 bp, which is a major improvement in contiguity compared to the current Swanson assembly. The assembly was further improved with a Bionano optical map and Hi-C proximity ligation sequence data to produce super-scaffolds and correct mis-joined scaffolds. This yielded an assembly totaling 2.35 Gb in 945 scaffolds with a scaffold N50 of 46,466,374 bp. After Bionano and Hi-C scaffolding, scaffold lengths ranged from 16,956 bp to 88,658,648 bp. A BUSCO analysis detected 96.6% of conserved Actinopterygii gene content in this assembly. We are currently using the rainbow trout high-density genetic map to guide chromosomal alignment of scaffolds.
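
The abstract reports N50 values for reads, contigs, and scaffolds. For readers less familiar with the statistic, the following is a minimal Python sketch of how an N50 is conventionally computed from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the Arlee assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, not real assembly data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In the toy example the total is 300, so the cumulative sum reaches half (150) at the second-longest contig, giving an N50 of 70. A higher N50 at a similar total size indicates that more of the assembly sits in long, contiguous sequences.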