Submitted to: GigaByte
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/14/2022
Publication Date: 2/18/2022
Citation: Bickhart, D.M., Koch, L.M., Smith, T.P., Riday, H., Sullivan, M.L. 2022. Chromosome-scale assembly of the highly heterozygous genome of red clover (Trifolium pratense L.), an allogamous forage crop species. GigaByte. 42:1-13. https://doi.org/10.1101/2022.01.06.475143.
Interpretive Summary: Red clover is a widely grown forage legume harvested for hay, grown in pasture for grazing, and sown as a companion crop. As with many crops, genomic resources for red clover have greatly improved over the last decade. However, a high-quality reference genome sequence, which many types of bioinformatic analyses require, has been lacking. Here we present a new reference genome for red clover generated using the latest sequencing technologies. The new reference genome is a vast improvement over the currently available genome: it reduces the number of contiguous sequences (contigs) that make up the genome from 40,000 to 150, and it takes into account the heterozygous nature of red clover’s genome. The new reference genome is expected to greatly facilitate work in gene discovery, transcriptomics, marker-assisted breeding, and genome structure in red clover.
Technical Abstract: Relative to other crops, red clover (Trifolium pratense L.) has several favorable traits that make it an ideal forage crop. Conventional breeding has improved varieties, but modern genomic methods could accelerate progress and facilitate gene discovery. Existing short-read-based genome assemblies of the ~420 megabase pair (Mbp) genome are fragmented into >135,000 contigs and contain numerous order and orientation errors within scaffolds. These problems are probably associated with the plant’s biology: red clover displays gametophytic self-incompatibility, which results in inherently high heterozygosity. Here, we present a high-quality long-read-based assembly of red clover with a more than 500-fold reduction in contigs, improved per-base quality, and a contig N50 increased by three orders of magnitude. The 413.5 Mbp assembly is nearly 20% longer than the 350 Mbp short-read assembly and closer to the predicted genome size. We also present quality measures and full-length isoform RNA transcript sequences for assessing accuracy and supporting future genome annotation. The assembly accurately represents the seven main linkage groups of an allogamous (outcrossing), highly heterozygous plant genome.
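The abstract reports its gains in terms of contig N50 and relative assembly length. As a minimal illustrative sketch (not the authors' pipeline), the Python below computes N50 under its standard definition: the largest contig length L such that contigs of length L or longer together cover at least half of the total assembly. It also reproduces the "nearly 20% longer" comparison from the 413.5 Mbp and 350 Mbp figures given above. The function names and the toy contig lengths are hypothetical and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp).

    N50 is the largest length L such that contigs of length >= L
    together account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length


def percent_longer(new_size, old_size):
    """Percentage by which new_size exceeds old_size (same units)."""
    return 100.0 * (new_size - old_size) / old_size


if __name__ == "__main__":
    toy_contigs = [100, 80, 60, 40, 20]  # hypothetical lengths, not from the paper
    print(n50(toy_contigs))  # 80
    # Assembly sizes in Mbp, taken from the abstract.
    print(f"{percent_longer(413.5, 350):.1f}%")  # 18.1%
```

For the toy input, the 100 bp and 80 bp contigs together cover 180 of 300 bp, crossing the halfway mark, so N50 is 80. Likewise, 413.5 Mbp is about 18.1% longer than 350 Mbp, consistent with "nearly 20%".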