Research Project: Applied Agricultural Genomics and Bioinformatics Research

Location: Genomics and Bioinformatics Research

Title: An anchored chromosome-scale genome assembly of spinach (Spinacia oleracea) improves annotation and reveals extensive gene rearrangements in euasterids

Authors:
Hulse-Kemp, Amanda
BOSTAN, HAMED - National Institutes Of Health (NIH)
CHEN, SHIYU - University Of California
ASHRAFI, HAMID - North Carolina State University
IORIZZO, MASSIMO - North Carolina State University
VAN DEYNZE, ALLEN - University Of California

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/27/2021
Publication Date: N/A
Citation: N/A

Interpretive Summary: A new high-quality reference genome for spinach has been completed, and it is of substantially higher quality than the currently available version. This improved genome has enabled a number of discoveries that were not possible with the previous version. While it was previously thought that spinach had not undergone any whole genome duplications in its evolutionary history, we found evidence that it underwent a whole genome triplication followed by massive genome rearrangements. These rearrangements made the duplication events difficult to identify without a higher-quality genome. We also examined the population structure of 75 spinach cultivar lines that have differing leaf shapes. Leaf shape is important for how spinach is classified into market types. There appear to be three major genetic groups, with leaf shape as a major distinguishing factor: one group is primarily composed of the oriental leaf type, another is primarily the smooth leaf type, and the third is a combination of savoy and semi-savoy leaf types. The leaf-type results were also compared with a previous study. The comparison makes clear that genetic studies in spinach must genotype the specific plants in which traits are measured, because cultivar lines can be segregating to varying degrees; this segregation is visible even in leaf characteristics and can affect the outcomes of genetic studies if not handled correctly. Overall, this study provides many valuable details for future spinach breeding and a high-quality reference genome that will better enable genetic studies.

Technical Abstract: Spinach is a member of the order Caryophyllales, a basal eudicot asterid lineage that also includes sugar beet (Beta vulgaris), quinoa (Chenopodium quinoa) and amaranth (Amaranthus hypochondriacus). With the introduction of baby leaf types, spinach has become a staple food in many homes. Production issues focus on yield, nitrogen-use efficiency and resistance to downy mildew (Peronospora effusa). Although genome sequences are available for each of the above species, a chromosome-level assembly, which allows for proper annotation and structural analyses to enhance crop improvement, exists only for quinoa. We independently assembled and annotated the variety Viroflay using a short-read strategy (Illumina) and a long-read strategy (Pacific Biosciences) to develop a chromosome-level, genetically anchored assembly for spinach. Scaffold N50 for the Illumina assembly was 389 kb, whereas that for the Pacific Biosciences assembly was 4.43 Mb, representing 911 Mb (93% of the genome) in 221 scaffolds, 80% of which are anchored and oriented on a sequence-based genetic map that is also described in this work. The two assemblies were 99.5% collinear. Independent annotation of the two assemblies with the same comprehensive transcriptome dataset shows that the quality of the assembly directly affects the annotation, with significantly more genes predicted in the long-read assembly (34,878 vs 26,862). An in-depth analysis of resistance genes shows a bias toward R-gene motifs more typical of monocots. Evolutionary analysis indicates that Spinacia is a paleohexaploid, with a whole genome triplication followed by gene rearrangements. Diversity analysis of 75 lines indicates that variation in genes is ample for the hypothesis-driven, genomics-assisted breeding enabled by this work.
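
For readers unfamiliar with the scaffold N50 statistic quoted in the abstract, the following is a minimal sketch of how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the total assembly length. The function name `scaffold_n50` and the toy lengths are illustrative assumptions only; they are not taken from the study, and this is not the authors' pipeline.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # N50 is the scaffold length at which half the assembly is covered.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, NOT data from the spinach assemblies.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))
```

A larger N50 means that half of the assembled sequence sits in longer scaffolds, which is why the long-read value (4.43 Mb) indicates a far more contiguous assembly than the short-read value (389 kb).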