Submitted to: bioRxiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 1/6/2022
Publication Date: 1/6/2022
Citation: Bickhart, D.M., Koch, L.M., Smith, T.P., Riday, H., Sullivan, M.L. 2022. Chromosome-scale assembly of the highly heterozygous genome of red clover (Trifolium pratense L.), an allogamous forage crop species. bioRxiv. https://doi.org/10.1101/2022.01.06.475143.
Interpretive Summary: Red clover is a widely grown forage legume harvested for hay, grown in pasture for grazing, and sown as a companion crop. As with many crops, genomic resources for red clover have greatly improved over the last decade. However, a high-quality genomic reference sequence, which many types of bioinformatic analyses require, has been lacking. Here we present a new reference genome for red clover generated using the latest sequencing technologies. The new reference genome is a vast improvement over the currently available genome: it reduces the number of contiguous sequences that make up the genome from 40,000 to 150 and takes into account the heterozygous nature of red clover’s genome. The new reference genome is expected to greatly facilitate work in gene discovery, transcriptomics, marker-assisted breeding, and genome structure in red clover.
Technical Abstract: Background: Red clover is used as a forage crop in livestock production due to a variety of favorable traits relative to other crops. Improved varieties of Trifolium pratense L. have been developed mostly through conventional breeding approaches, but genetic progress could be accelerated and gene discovery facilitated using modern genomic methods based on a high-quality reference genome. Existing short-read-based assemblies of the approximately 410 Megabase (Mb) genome have been reported but are fragmented into more than 100,000 contigs with numerous errors in order and orientation within scaffolds, in part because the plant has a gametophytic self-incompatibility system that results in inherently high heterozygosity. Findings: A high-quality long-read-based assembly of red clover is presented that reduces the number of contigs by more than 500-fold from existing assemblies and improves the contig N50 statistic by three orders of magnitude. The per-base quality is also improved, and the 413.5 Mb assembly is nearly 20% longer than the 350 Mb short-read assembly, closer to the predicted genome size. Quality measures are presented, and full-length isoform sequences of RNA transcripts are reported for use in assessing accuracy and for future annotation of the genome. Conclusions: The ARS-RCv1.1 genome assembly of red clover and the related full-length transcript data for this species represent a major improvement of genomic resources for genetic improvement of an important forage crop. The assembly accurately represents the seven main linkage groups present in the genome of an obligate outcrossing, highly heterozygous plant species.
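For readers unfamiliar with the contig N50 statistic cited in the abstract, the following is a minimal sketch of one common way to compute it: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the pre-print. The final lines redo the assembly-length comparison using only the two sizes stated in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, NOT from the red clover assemblies).
toy_fragmented = [10, 20, 30, 40]
toy_contiguous = [100]

fragmented_n50 = n50(toy_fragmented)   # 40 + 30 covers at least half of 100
contiguous_n50 = n50(toy_contiguous)

# Assembly sizes as stated in the abstract, in megabases.
new_assembly_mb = 413.5
short_read_assembly_mb = 350.0
relative_increase = (new_assembly_mb - short_read_assembly_mb) / short_read_assembly_mb

print(fragmented_n50, contiguous_n50, round(relative_increase, 3))
```

With the same total length, fewer and longer contigs give a higher N50, which is why the statistic is often used as a rough measure of assembly contiguity. The relative increase works out to about 18%, consistent with the abstract's "nearly 20% longer".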