2011 Annual Report
1a. Objectives (from AD-416)
Objective 1: Integrate new maize genetic and genomic data into the database.
Objective 2: Provide community support services, such as helping the community of maize researchers develop and publicize a set of guidelines to follow so that their data can be made available through MaizeGDB; coordinating annual meetings; and conducting elections and surveys.
1b. Approach (from AD-416)
Data integration: To best leverage the cooperative spirit of the maize community, we will encourage the use of a set of Community Curation Tools to enable researchers to deposit their own small datasets into the database directly. To reduce secondary curation of data, we will generate standards for data deposition and define file formats for automated inputs of large datasets, and we will work in concert with maize researchers as they devise methods for initial data storage so that the data transition to MaizeGDB is simplified.
Shift to a sequence-centric paradigm: To allow researchers to visualize a gene within its genomic context and to visualize gene products within the context of relevant metabolic pathways annotated with ontology terms, we will develop new views of the data. We will link sequence data to relevant datasets, especially the centrally important maps such as (1) IBM2, (2) its neighbors, and (3) the new maize diversity map. We also will incorporate a genome browser into the MaizeGDB product to create a view that includes all major genome assemblies and predicted gene structures and displays the official maize genome annotation.
Community coordination: We will conduct critical maize genetics community functions, including coordinating and conducting annual meetings, elections, and surveys and preparing the Maize Newsletter.
ARS scientists working on the Maize Genetics and Genomics Database (MaizeGDB) in Ames, IA, Columbia, MO, and Albany, CA worked to improve tools that make the maize genome sequence useful for investigative researchers. Project personnel added significant data (including but not limited to reference maps, map scores, insertional mutant, locus, gene model, and sequence information for all subspecies of maize) to the database. Most notably, the expression of maize genes across 60 tissue types and developmental stages was mapped to the genome sequence so that researchers can quickly determine the relative expression of genes across various tissues and treatments. MaizeCyc, another resource that has been developed and deployed, allows the gene set of maize to be mapped to highly curated metabolic pathways to determine how expression of genes contributes to phenotypes. At this time, all major published recombination-based genetic maps have been brought into the MaizeGDB resource. The data management and display system called ZeAlign now coordinates integration of individual investigator data with the deluge of next-generation sequencing data and prepares such datasets for automatic upload to the MaizeGDB Genome Browser. We continue to develop and use controlled vocabularies and ontologies to describe developmental, growth, and anatomic aspects of visible phenotype trait variation and, new this year, to apply these to molecular gene expression data. In addition, personnel have put in place an infrastructure that keeps MaizeGDB available for researchers' use even when individual system components fail. Work carried out by the MaizeGDB team has resulted in improved communication among maize researchers worldwide, increased ability to document the results of experiments, and increased availability of information relevant to high-impact research. Collaborations are being monitored through meetings, phone calls, and e-mail communications.
MaizeCyc: a new tool to understand maize gene function. The function of many genes is more apparent to scientists within the context of the various metabolic pathways in which they function. To determine how changes in a particular gene’s expression might affect plant productivity, knowledge about how that gene product affects the plant’s metabolism is required. ARS researchers in Ames, IA, Cold Spring Harbor, NY, Columbia, MO, and Albany, CA in collaboration with researchers working at Oregon State University created and deployed the MaizeCyc metabolic pathway visualization and analysis tool for researchers' use. Researchers can now search and browse information on genes to learn how their expression affects plant metabolism, growth, and development. This tool will enable researchers to understand how genes may function and test their ideas in both the field and the lab to accelerate crop improvement.
ZeAlign tool allows researchers to align many query sequences to the maize genome simultaneously. Researchers generate large sequence sets, e.g., for gene expression studies, that require bioinformatics analyses to determine which genes are being expressed. No freely available resource allowed researchers to map their large datasets to the genome and subsequently load the results to a genome browser for visualization. In addition, to prepare new whole-genome datasets for inclusion in the Maize Genetics and Genomics Database (MaizeGDB) Genome Browser for public access, researchers had to determine how best to map their sequences to the genome and document their procedures, then make special arrangements to have their data included in the MaizeGDB resource. ARS researchers in Ames, IA, Columbia, MO, and Albany, CA created ZeAlign. The ZeAlign system allows researchers to align many sequences to the maize genome simultaneously and returns to them outputs that are appropriate for immediate visualization. The deployment of ZeAlign has allowed researchers to analyze and visualize their own data more easily and has streamlined data deposition at MaizeGDB, allowing both researchers and the personnel at MaizeGDB to reduce time spent on repetitive and mundane tasks.
Infrastructure improvements for the Maize Genetics and Genomics Database (MaizeGDB) ensure worldwide access even if disaster strikes. MaizeGDB is a website of maize genetic and genomic data. The MaizeGDB website routinely receives more than 2 million hits per month from computers in over 100 countries. Because access to the information at MaizeGDB is necessary for researchers to analyze maize data and to form hypotheses to test as a part of their research toward improving maize, it is important to ensure that the MaizeGDB resource is always available. The infrastructure (hardware and software) that underpins the MaizeGDB resource was improved by ARS researchers in Ames, IA so that when any component of the system fails, a backup copy of the resource is automatically engaged and the MaizeGDB resource is back online within a matter of seconds. Researchers are thus assured near-continuous access to MaizeGDB.
Integration of a maize gene atlas into the Maize Genetics and Genomics Database (MaizeGDB). Rapid access to quality gene expression information helps researchers translate the blueprint of the maize reference genome sequence, whose features include some 30,000 gene models. ARS researchers in Ames, IA, Columbia, MO, and Albany, CA integrated the maize B73 gene expression atlas into MaizeGDB, the central integrated repository for public maize genetic and genomic data. The atlas, published in early 2011 by research groups in Wisconsin and Michigan, represents expression of genes across the maize genome surveyed in all major plant tissues. The detailed expression data and sequences have been integrated with the current version of the maize sequence, and a strategy has been developed to update them for future versions of the genome sequence. Plant breeders and basic researchers exploring the maize genome for candidate genes of agronomic importance will be able to visualize the expression of genes across various tissue types. This is a step toward unraveling how the expression of genes defines agronomically important traits.
Yennamalli, R.M., Rader, A.J., Wolt, J.D., Sen, T.Z. 2011. Thermostability in endoglucanases is fold-specific. BMC Structural Biology. 11(10):1-15.
Lawrence, C.J. 2011. MaizeGDB - Past, present, and future. Maydica. 56(1):1-3.
Schaeffer, M.L., Harper, E.C., Gardiner, J.M., Andorf, C.M., Campbell, D.A., Cannon, E.K., Sen, T.Z., Lawrence, C.J. 2011. MaizeGDB: Curation and outreach go hand-in-hand. Database: The Journal of Biological Databases and Curation. 2011:Article bar022. Available: http://database.oxfordjournals.org/content/2011/bar022.long.
Harper, E.C., Schaeffer, M.L., Thistle, J., Gardiner, J., Andorf, C.M., Campbell, D.A., Cannon, E.K., Braun, B.L., Birkett, S., Lawrence, C.J., Sen, T.Z. 2011. The MaizeGDB Genome Browser Tutorial: One example of database outreach to biologists via video. Database: The Journal of Biological Databases and Curation. 2011:Article bar016. DOI: 10.1093/database/bar016.
Green, J.M., Harnsomburana, J., Schaeffer, M.L., Lawrence, C.J., Shyu, C. 2011. Multi-source and ontology-based retrieval engine for maize mutant phenotypes. Database: The Journal of Biological Databases and Curation. 2011:Article bar012. Available: http://database.oxfordjournals.org/content/2011/bar012.