
Research Project: Enhancing Plant Genome Function Maps Through Genomic, Genetic, Computational and Collaborative Research

Location: Plant, Soil and Nutrition Research

2014 Annual Report

1: Apply computational, genomic, genetic and/or systems biology approaches to develop new models for plant genome structure and organization that advance our understanding of plant evolution and diversity.
1.1: Establish an integrated reference genome resource for plant genomes.
1.2: Analysis and visualization of genotypic, epigenomic, and functionally phenotypic diversity.
1.3: Comparative genomics: analysis of plant genomes (stewardship of reference resource) and visualization informed by evolutionary histories.
2: Analyze and develop genome level regulatory network models that focus on and integrate the processes underlying plant development and responses to environmental change.
2.1: Develop genome-wide functional networks for the model plant genome Arabidopsis.
2.2: Crop GRNs to support functional prediction for agriculturally relevant phenotypes.
3: Collaborate, develop and implement new standards for the management and analysis of plant genomic, genetic and phenotypic information to facilitate integration and interoperability between biological databases.
4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.

We propose to leverage emerging and standard computational and experimental approaches, building on existing and newly developed resources, to support stewardship of plant genome reference sequences, genome annotations and gene networks. This will support development of a common standard platform for comparative genomic analysis and visualization. The enriched genome annotations will include controlled vocabularies to describe metadata and primary data associated with comparative phylogenomics, epigenetics, and population-based phenotypes. The proposed research in gene networks is directed at the development and validation of gene regulatory networks (GRNs). The network view of the underlying molecular processes will enhance the fundamental biological understanding of development and abiotic stress responses and their relationship to agronomic traits. The computationally predicted and experimentally verified sub-networks, combined with the prioritized regulatory gene targets, will provide focal points for further research at the gene-by-gene level. They will be integrated with the suite of genetic resources obtained from Objective 1, including SNPs and orthology mapping, and thus will be a resource for breeders and researchers engaged in molecular breeding approaches and segregation analysis. Genome-wide network reconstructions will be useful in quantifying and characterizing genotype-to-phenotype relationships. We propose to leverage and build upon existing infrastructure to manage and analyze plant genomic, genetic, and phenotypic data. The resources will focus on the delivery of anticipated products from Objectives 1 and 2 with a focus on plant datasets, but much of the software will be species-agnostic, making the resources developed from the project usable by a broader audience working on animals, insects, and fish relevant to agriculture, human health, and a sustainable environment.

Progress Report
A plant's reference genomic sequence can serve as a map or an instruction manual on how to build a plant, describing its development, and as a reference manual on how to operate the plant in different environments. But the features described on a map, or the depth of instructions in a reference guide or manual, can vary in quality. An instruction manual for assembling a bike usually starts with a parts list and a set of directions on how the parts go together. With our colleagues, we have organized more than 30 plant genomes in a standard format that allows us to compare the reference parts (genes) of 33 plants, flies, worms, yeast, and humans. Using this standard format has allowed us to compare the parts lists of these different organisms and to organize them to predict which parts are likely to have the same functions. We use this information to tell us which parts are new between organisms and when the parts began to be used in an evolutionary context. This is important because the appearance of new parts suggests the organism has new modifications or adaptations, many of which may have been environmentally driven. For instance, looking at differences between the parts of a plant and those of humans will tell us which genes are likely to have plant-specific functions. We have used these comparisons to tell which parts are similar and which parts are different. We can also use these lists to tell us which parts may be missing. A missing part can be an artifact of incompleteness or errors in the underlying reference, or the result of a part that no longer provides a fitness or performance advantage. To return to the bike analogy, a bell or a light is advantageous on a bike used in the city, but on a racing bike used on a track it adds weight and profile, decreasing “performance” or “fitness,” so a track bike may no longer have one. This example shows how parts that no longer provide fitness benefits come to be lost or modified.
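The parts-list comparison described above can be sketched with plain set operations. This is only an illustration, not the project's pipeline: the family identifiers, the organism memberships, and the function name `compare_parts` are all hypothetical, and it assumes genes have already been grouped into families shared across organisms.

```python
# Hypothetical gene-family membership per organism.
# Family IDs and memberships are invented for illustration only.
families = {
    "sorghum": {"FAM1", "FAM2", "FAM3", "FAM5"},
    "maize": {"FAM1", "FAM2", "FAM3", "FAM4"},
    "rice": {"FAM1", "FAM2", "FAM4"},
    "human": {"FAM1", "FAM6"},
}


def compare_parts(families, focal, others):
    """Classify gene families of the focal organism against other organisms.

    shared   : families in the focal organism and in at least one other
    specific : families found only in the focal organism
    missing  : families present in every other organism but not the focal one
    """
    focal_set = families[focal]
    other_sets = [families[name] for name in others]
    union_others = set().union(*other_sets)
    common_others = set.intersection(*other_sets)
    return {
        "shared": focal_set & union_others,
        "specific": focal_set - union_others,
        "missing": common_others - focal_set,
    }


# Compare sorghum with the other two plants.
result = compare_parts(families, "sorghum", ["maize", "rice"])

# Families found in all three plants but not in human are candidates
# for plant-specific functions.
plants = ["sorghum", "maize", "rice"]
plant_specific = set.intersection(*(families[p] for p in plants)) - families["human"]

print(sorted(result["shared"]))
print(sorted(result["specific"]))
print(sorted(result["missing"]))
print(sorted(plant_specific))
```

A family reported as missing here may reflect a real gene loss or simply a gap in the reference, which is why the text treats missing parts with caution.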
In addition to looking between species, we can also look at differences within species. The differences within a species can be as small as a single nucleotide change or as large as the loss of a gene, either of which could lead to a loss of function. Using these comparisons, we have been able to identify lists of shared, new, missing and modified genes. These lists lay the foundation for a set of features on a map or a parts list of an organism. By comparing these parts lists, we can take information about the function known in one organism and use it to improve the description of the gene in another organism. We can also use them to identify groups of genes that have led to modifications in different organisms and provide fitness advantages. We have used this information to characterize genes that have been gained and lost in different plant lineages. This information provides insight into genes involved in plant development and response to the environment, and provides candidate gene targets for germplasm improvement.
Once we have the parts list, we want to understand how these parts work together. In a plant or animal, not all of the parts are available in a single cell or tissue. We can use information about when parts or genes are likely to be present together to suggest that they work together. With our colleagues, we used molecular and genetic approaches to identify genes responsible for regulating when and where other genes are likely to be expressed. We have also developed preliminary sets of genes that work together in roots and flowers, in corn, sorghum, and the model plant Arabidopsis. Using these resources in the last year, we have identified candidate genes responsible for branching in flowers, which leads to an increase in seed production in sorghum.
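The idea that genes present together in the same tissues may work together can be illustrated with a simple correlation of expression profiles. The sketch below is an assumption-laden toy, not the method used in the project: the gene names, tissue values, and the 0.9 threshold are invented, and Pearson correlation is just one common way to score co-expression.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression levels across four tissues
# (root, leaf, flower, seed); values are invented for illustration.
expression = {
    "geneA": [9.0, 1.0, 8.0, 2.0],
    "geneB": [8.5, 1.5, 7.0, 2.5],
    "geneC": [1.0, 9.0, 2.0, 8.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            pairs.append((g1, g2, round(r, 3)))
    return pairs


pairs = coexpressed_pairs(expression)
print(pairs)
```

In this toy data, geneA and geneB rise and fall together across tissues and are reported as a candidate pair, while geneC shows the opposite pattern. Correlation alone only suggests a relationship; the text pairs it with molecular and genetic evidence.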
The information about these genes, and the specific changes that can lead to a gain or loss of function, can be used to support breeding of improved varieties of crops like corn, rice, sorghum, and sugarcane, for production of human food and animal feed as well as for bioenergy objectives.
Breakthroughs in imaging and sequencing technologies have led to opportunities to generate reference genome sequences for almost all organisms. They have also created massive challenges in managing, analyzing, sharing and drawing insights from the thousand-trillion data points that are being generated. “Big Data” in biology will require a paradigm shift. Data is no longer sparse. Cultural changes will be required to shift resources from the generation of data to its management and sharing. With our colleagues, we developed a software platform to support the storage, analysis, and sharing of plant genome and phenotype data using national high-performance computing resources. In addition, we developed training material to support the education of emerging and established scientists. The platform currently has more than 1,000 registered users.

1. Characterization of genes involved in corn ear and tassel development. In order to ensure food and energy security, it will be necessary to increase yield. Grain yield in corn, the most important crop in the U.S. and in many countries throughout the world, is directly related to the development of the ear and tassel on the plant. In the last three years, ARS researchers at Ithaca, New York have contributed to a project that has developed a catalog of temporal and spatial ear and tassel gene expression in corn. We have used this catalog to uncover several groups of genes involved in grass flower architecture. The insights derived from the analyses improve our understanding of the genes involved in grass architecture and can be used in corn and other grasses to accelerate germplasm improvement using both marker-assisted breeding and directed engineering approaches.

2. Sorghum genetic variation: improved access and insight. Sorghum is a major climate-resilient crop in the U.S. for feed, forage, and starch-based biofuel. Improvements to sorghum as a bioenergy feedstock have the potential for rapid impact, through an existing seed industry and farmers who are familiar with sorghum as an annual row crop. In the last two years, several projects have reported on the characterization of genetic variation that can be used to support germplasm improvement through marker-assisted breeding or directed engineering approaches, but the usability of the data has been limited by differences in how the data were released. In the last year, ARS researchers at Ithaca, New York have coordinated with domestic and international partners on the processing and release of genetic variation from 378 sorghum accessions in both machine-readable and human-readable standard formats. The human-readable format allows a person to visualize the genetic variation in the context of gene structure and function, providing insight into the functional consequences of the variation. The machine-readable formats make it possible to operate on the data programmatically and support data reuse. Standardization and integration with gene structure and function increase the accessibility and reuse of the data, and with them the value of the initial investment. Improved accessibility of the sorghum variation will support insights into genome evolution, ease reuse for genotype and phenotype analysis, and accelerate germplasm improvement through biotechnology and marker-assisted breeding.
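As a rough illustration of what operating on the variation data programmatically can look like, the sketch below places variants in the context of a gene's structure. The report does not name the release formats, so the tab-separated CHROM/POS/ID/REF/ALT layout (modeled on the common VCF column order), the gene model, the identifiers, and the function names are all assumptions made for the example.

```python
# Hypothetical gene model; exon intervals are 1-based and inclusive.
gene = {
    "id": "gene0001",  # invented identifier
    "chrom": "chr1",
    "start": 1000,
    "end": 2000,
    "exons": [(1000, 1200), (1500, 1700), (1900, 2000)],
}

# Invented variant records in an assumed CHROM, POS, ID, REF, ALT layout.
records = "\n".join(
    "\t".join(fields)
    for fields in [
        ["chr1", "950", ".", "A", "G"],
        ["chr1", "1100", ".", "C", "T"],
        ["chr1", "1300", ".", "G", "A"],
        ["chr2", "1100", ".", "T", "C"],
    ]
)


def parse_record(line):
    """Split one tab-separated record into chromosome, position, REF, ALT."""
    chrom, pos, _variant_id, ref, alt = line.split("\t")[:5]
    return chrom, int(pos), ref, alt


def locate(chrom, pos, gene):
    """Place a variant relative to the gene: exon, intron, or intergenic."""
    if chrom != gene["chrom"] or not (gene["start"] <= pos <= gene["end"]):
        return "intergenic"
    if any(start <= pos <= end for start, end in gene["exons"]):
        return "exon"
    return "intron"


for line in records.splitlines():
    chrom, pos, ref, alt = parse_record(line)
    print(chrom, pos, f"{ref}>{alt}", locate(chrom, pos, gene))
```

Locating a variant is only a first step. Judging its functional consequence would also require the coding frame and the resulting amino acid change, which this sketch leaves out.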

3. Identification of two genes associated with multiple seeds in sorghum. In order to ensure food and energy security, it will be necessary to increase yield. Sorghum is a major climate-resilient crop in the U.S. for feed, forage, and starch-based biofuel. Improvements to sorghum as a bioenergy feedstock have the potential for rapid impact, through an existing seed industry and farmers who are familiar with sorghum as an annual row crop. It is also a major crop in Africa, where improvements will contribute to economic stability and food security. One approach to increasing yield in sorghum and other grasses is to change the branching patterns in the flower architecture. Using genetic and sequencing approaches, ARS researchers at Ithaca, New York and Lubbock, Texas have identified two genes responsible for changes in flower architecture that increase the number of seeds from an individual sorghum plant. The genetic materials can be used to accelerate sorghum improvement through marker-assisted breeding. The knowledge about the two genes can be used to accelerate the improvement of sorghum and other grasses using both marker-assisted breeding and directed engineering approaches.

4. Characterization of core promoter elements in plant genomes. While ARS researchers at Ithaca, New York are starting to build a dictionary of genes, specifically the coding sequence within a plant genome, we know very little about the regulatory sequence that contributes to how these plant genes are transcribed. Studies in humans and flies have identified sequences in promoters that are associated with transcription regulation, called core promoter elements (CPEs). In the last two years we have used computational approaches to scan for these elements in the promoters of eight plant genomes. The positions of the CPEs were found to be largely conserved, with only slight differences, across all eight plant genomes. In addition to the CPEs, DNA free energy profiles were evaluated and were found to differ between regulatory and non-regulatory genome sequence (genes). Expanding the dictionary of plant regulatory elements and their configuration (regulatory architecture) in the genome will provide insights into genome architecture, support molecular breeding objectives, and improve the design of synthetic plant promoters.
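A minimal sketch of scanning promoters for one core promoter element is shown below. The report does not describe the scanning method, so everything here is an assumption for illustration: the simplified TATA-box-like pattern, the toy sequences, the gene names, and the function `scan_promoter`. Real scans often score candidate sites with position weight matrices instead of an exact pattern.

```python
import re

# Simplified TATA-box-like pattern (an assumption for this sketch):
# TATA, then A or T, then A, then A or T. The lookahead lets
# overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def scan_promoter(sequence, tss_index):
    """Return (offset relative to the TSS, matched motif) for each hit.

    tss_index is the 0-based index of the transcription start site (TSS)
    within the sequence, so upstream hits have negative offsets.
    """
    hits = []
    for match in TATA_PATTERN.finditer(sequence.upper()):
        hits.append((match.start() - tss_index, match.group(1)))
    return hits


# Toy promoters: 40 bases upstream of the TSS followed by a short
# stretch of transcribed sequence. Sequences are invented.
upstream_with_motif = "GC" * 5 + "TATAAAT" + "C" * 23
upstream_without_motif = "GC" * 20
promoters = {
    "geneX": (upstream_with_motif + "ATGGCC", len(upstream_with_motif)),
    "geneY": (upstream_without_motif + "ATGGCC", len(upstream_without_motif)),
}

for name, (sequence, tss_index) in promoters.items():
    print(name, scan_promoter(sequence, tss_index))
```

Collecting the offsets across many genes and genomes is what makes it possible to ask whether an element sits at a conserved distance from the transcription start site.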

Review Publications
Olson, A.J., Klein, R.R., Dugas, D.V., Lu, Z., Regulski, M., Klein, P., Ware, D. 2014. Expanding and vetting Sorghum bicolor gene annotations through transcriptome and methylome sequencing. The Plant Genome. 7(2).
Campbell, M.S., Law, M., Holt, C., Stein, J., Moghe, G., Hufnagel, D., Lei, J., Achawanantakun, R., Lawrence, C.J., Ware, D., Shiu, S., Childs, K.L., Sun, Y., Jiang, N., Yandell, M. 2014. MAKER-P: a tool-kit for the creation, management, and quality control of plant genome annotations. Plant Physiology. 164(2):513-524.
Myles, S., Boyko, A.R., Owens, C.L., Brown, P.J., Grassi, F., Aradhya, M.K., Prins, B.H., Reynolds, A., Chia, J., Ware, D., Bustamante, C.D., Buckler, E.S. 2011. Genetic structure and domestication history of the grape. Proceedings of the National Academy of Sciences. 108:3530-3535.
Eveland, A.L., Goldschmidt, A., Pautler, M., Morohashi, K., Liseron-Monfils, C., Lewis, M.W., Kumari, S., Yang, F., Hiraga, S., Unger-Wallace, E., Olson, A., Stanfield, S., Hake, S.C., Schmidt, R.J., Vollbrecht, E., Grotewold, E., Ware, D., Jackson, D. 2013. Regulatory modules controlling maize inflorescence architecture. Genome Research. 24:431-443.