Publication #412038

Research Project: Genetic Improvement and Cropping Systems of Alfalfa for Livestock Utilization, Environmental Protection and Soil Health

Location: Plant Science Research

Title: Genome sequencing and comparative genomic analysis in the autotetraploid alfalfa (Medicago sativa L.)

KAUR, HARPEET - University Of Minnesota
FARMER, ANDREW - National Center For Genome Resources
MUDGE, JOANN - National Center For Genome Resources
SHANNON, LAURA - University Of Minnesota
Samac, Deborah - Debby

Submitted to: Plant and Animal Genome Conference
Publication Type: Abstract Only
Publication Acceptance Date: 12/20/2023
Publication Date: 1/12/2024
Citation: Kaur, H., Farmer, A., Mudge, J., Shannon, L.M., Samac, D.A. 2024. Genome sequencing and comparative genomic analysis in the autotetraploid alfalfa (Medicago sativa L.). Plant and Animal Genome Conference. San Diego, California. January 12-17, 2024.

Interpretive Summary:

Technical Abstract: Alfalfa (Medicago sativa L.) is an important perennial forage legume grown worldwide. It is an outcrossing, highly heterozygous autotetraploid species (2n=4x=32) with a genome size of approximately 800 Mb. The availability of high-quality reference genome sequences is a prerequisite for fundamental and applied research in plant biology. To develop a pangenome, we sequenced the genomes of 10 alfalfa accessions/cultivars with varying fall dormancy (4 to 10) and disease tolerance levels. Contigs were built by integrating PacBio high-fidelity CCS long reads and Illumina short-read data. The contigs were then assembled into scaffolds using Dovetail Omni-C sequencing and BioNano optical mapping data, with total assembly length varying from 2.88 to 3.19 Gb across the 10 genotypes. K-mer analysis revealed the presence of four haplotypes, i.e., autotetraploidy, in each alfalfa genome. Linkage-based genetic maps can be used to produce chromosome-scale, haplotype-resolved genome assemblies. We developed three genetic maps to validate, orient, and phase the assemblies prior to annotation. These maps are based on GBS SNP markers called using three different reference genomes: 1) the ZhongmuNo.1 monoploid genome assembly, 2) the first homolog of the allele-aware XinJiangDaYe genome assembly, and 3) the stable FASTA format of a graph-based pangenome developed using ZhongmuNo.1 as the reference with four additional assemblies. The phased linkage maps with four haplotypes consisted of 2,482, 2,635, and 2,618 SNP markers spanning 1743.66, 2576.59, and 2701.13 cM in 8 linkage groups for ZhongmuNo.1, XinJiangDaYe, and the graph-based pangenome, respectively. Preliminary analysis using these genetic maps has helped phase more than half of the assembled genome of the ‘RegenSY27x’ genotype into four haplotypes.
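As a rough way to compare the three linkage maps, the reported marker counts and map lengths can be turned into an average map length per marker. The short Python sketch below does only that arithmetic on the figures quoted above; the function and variable names are illustrative and are not part of the study's pipeline, and the ratio is a crude density measure that ignores how markers are distributed among linkage groups and haplotypes.

```python
# Map statistics as reported in the abstract: (SNP markers, total length in cM).
maps = {
    "ZhongmuNo.1": (2482, 1743.66),
    "XinJiangDaYe": (2635, 2576.59),
    "graph-based pangenome": (2618, 2701.13),
}


def mean_spacing_cm(n_markers, length_cm):
    """Average map length per marker (cM per marker); a crude density measure."""
    return length_cm / n_markers


# Illustrative summary only; not a statistic reported by the authors.
spacing = {name: mean_spacing_cm(n, cm) for name, (n, cm) in maps.items()}

for name, value in spacing.items():
    print(f"{name}: {value:.3f} cM per marker")
```

On these figures, the ZhongmuNo.1-based map has the fewest markers but also the shortest total length, so it comes out with the smallest average length per marker of the three.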
For genome annotation, RNA was extracted from roots, root nodules, stems, leaves, flowers, and seed pods of each of the 10 genotypes and used for RNA-Seq and long-read Iso-Seq analysis. BUSCO (Benchmarking Universal Single Copy Orthologs) analyses revealed at least 99.40% genome completeness in each of the 10 assemblies. A pipeline to generate the alfalfa pangenome was also developed. The comparative genomic analysis was conducted using three publicly available alfalfa whole-genome assemblies of cultivars ZhongmuNo.1, XinJiangDaYe, and ZhongmuNo.4, all bred and grown in China. We found that 65.3% of gene families among the three genomes were dispensable and 34.7% were shared by all three genomes, representing the core gene families. We discovered 195,837 structural variations, such as insertions, deletions, and inversions, larger than 50 bp. These variations in the species-specific pangenome highlight the importance of sequencing multiple genomes to discover and utilize the total variation present in the U.S. alfalfa germplasm.
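The core/dispensable split described above can be illustrated with a small Python sketch: a gene family counts as core when it is present in every genome compared, and as dispensable when it is present in only some of them. The abstract does not say which clustering tool or presence criteria were used, so the function name, the family identifiers, and the toy presence table below are all hypothetical and are not the study's data.

```python
def classify_gene_families(presence, genomes):
    """Split gene families into core (present in all genomes) and dispensable.

    presence: dict mapping a family ID to the set of genomes in which at least
              one member gene was found.
    genomes:  iterable of all genome names being compared.
    """
    all_genomes = set(genomes)
    core, dispensable = set(), set()
    for family, found_in in presence.items():
        found = set(found_in) & all_genomes
        if not found:
            continue  # family absent from every genome compared; ignore it
        if found == all_genomes:
            core.add(family)
        else:
            dispensable.add(family)
    return core, dispensable


# Hypothetical toy input: family IDs and presence calls are made up.
genomes = ["ZhongmuNo.1", "XinJiangDaYe", "ZhongmuNo.4"]
presence = {
    "FAM_A": {"ZhongmuNo.1", "XinJiangDaYe", "ZhongmuNo.4"},
    "FAM_B": {"ZhongmuNo.1"},
    "FAM_C": {"XinJiangDaYe", "ZhongmuNo.4"},
}

core, dispensable = classify_gene_families(presence, genomes)
core_fraction = len(core) / (len(core) + len(dispensable))
print(sorted(core), sorted(dispensable), f"{core_fraction:.1%}")
```

With only three genomes, a family missing from a single assembly is already dispensable under this definition, which is one reason the core fraction tends to shift as more genomes are added to a comparison.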