2010 Annual Report
1a. Objectives (from AD-416)
Objective 1: Integrate new maize genetic and genomic data into the database.
Objective 2: Provide community support services, including helping the community of maize researchers develop and publicize a set of guidelines that researchers can follow to ensure that their data can be made available through MaizeGDB; coordinating annual meetings; and conducting elections and surveys.
1b. Approach (from AD-416)
Data integration: To best leverage the cooperative spirit of the maize community, we will encourage the use of a set of Community Curation Tools to enable researchers to deposit their own small datasets into the database directly. To reduce secondary curation of data, we will generate standards for data deposition and define file formats for automated inputs of large datasets, and we will work in concert with maize researchers as they devise methods for initial data storage so that the data transition to MaizeGDB is simplified.
Shift to a sequence-centric paradigm: To allow researchers to visualize a gene within its genomic context and to visualize gene products within the context of relevant metabolic pathways annotated with ontology terms, we will develop new views of the data. We will link sequence data to relevant datasets, especially the centrally important maps such as (1) IBM2, (2) its neighbors, and (3) the new maize diversity map. We will also incorporate a genome browser into the MaizeGDB product to create a view that includes all major genome assemblies and predicted gene structures and displays the official maize genome annotation.
Community coordination: We will conduct critical maize genetics community functions, including coordinating and conducting annual meetings, elections, and surveys and preparing the Maize Newsletter.
Over the course of Fiscal Year 2010, the Maize Genomic Database (MaizeGDB) team worked with Maize Genome Sequencing Consortium personnel to make information about the project to sequence the maize inbred line B73 accessible to researchers. Project personnel added significant data (including but not limited to reference maps, map scores, insertional mutant, locus, gene model, and sequence information for all subspecies of maize) to the database. Releases of the MaizeGDB Genome Browser based upon the assembled genome (B73 RefGen_v1 and now RefGen_v2) were made available. These sequence resources serve as the centerpiece for MaizeGDB’s now-completed transition to a sequence-centric resource. The Genome Browser’s associated Locus Lookup Tool (which allows researchers to identify regions of the genome where genes of interest may lie based upon physical and genetic map data) was adapted to the assembled genomes. Members of the MaizeGDB team assisted researchers at Mexico’s Center for Research and Advanced Studies of the National Polytechnic Institute (in Spanish: Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, or simply CINVESTAV), who sequenced Palomero Toluqueno maize, by making the genome sequence available via a Web interface as well as by formatting and depositing the genomic sequences in GenBank.
A main objective is to fully integrate the genome sequence with other information needed by plant breeders and maize researchers. Our goal is to integrate all new genetic maps and empirically confirmed documentation with linking information to the sequence and other sequence-based databases. At this time, all major published recombination-based genetic maps have been brought into the MaizeGDB resource. We have designed and deployed a data management and display system that will coordinate integration of individual investigator data with a deluge of next-generation sequencing data for the main United States maize lines.
We continue to develop and use controlled vocabularies and ontologies to describe developmental, growth, and anatomical aspects of visible phenotype trait variation and, new this year, to apply these to molecular gene expression data. We have recently begun to collaborate with researchers at CIMMYT (International Maize and Wheat Improvement Center), Mexico, towards an international controlled vocabulary of agronomic traits for maize: the Maize Crop Ontology. In addition, groundwork was laid for creating a high-availability infrastructure for MaizeGDB. Work carried out by the MaizeGDB team has resulted in improved communication among maize researchers worldwide, increased ability to document the results of experiments, and increased availability of information relevant to high-impact research. Collaborations are being monitored through meetings, phone calls, and e-mail communications.
Breeding data from CIMMYT scheduled for transfer to Maize Genomic Database (MaizeGDB). Data on maize breeding collected by researchers at CIMMYT (International Maize and Wheat Improvement Center, Mexico) have not previously been curated into MaizeGDB in a systematic fashion. Through a new collaboration with CIMMYT that was made possible by the United States Agency for International Development, data transfer to MaizeGDB has been initiated. The data to be shared relate to studies on the inheritance of traits in various tropical and other germplasm, with special emphasis on tolerance to drought.
Maize Genomic Database (MaizeGDB) provides access to B73 sequencing project information and materials. Maize researchers need access to information describing the status of the project to sequence the maize inbred line B73 in order to plan and conduct sequence-based research. The Maize Genome Sequencing Consortium’s reports, PowerPoint presentations, and lists of materials utilized in the sequencing efforts were posted and kept up to date at MaizeGDB (see the http://www.maizegdb.org front page as well as http://www.maizegdb.org/sequencing_project.php). Outreach tutorials describing how the genome was sequenced as well as mechanisms for accessing genomic sequence data were created and disseminated. Maize researchers can access descriptions of the sequencing effort as well as maize sequences via MaizeGDB, thus enabling them to utilize the maize genome sequence alongside other genetic and genomic information to further their research. In the words of one stakeholder, "The community recognizes that MaizeGDB is responsible for making the B73 genome sequence accessible for our use."
Versions 2.0 and 3.0 of the Maize Genomic Database (MaizeGDB) Genome Browser were released. The genome assembly of the maize inbred line B73 is recalculated periodically, and data that add value to the sequence, including locations of insertional mutations and variation among cultivars, must be transitioned to newer, improved assemblies as they are created. The assemblies B73 RefGen_v1 and RefGen_v2 were made available via the MaizeGDB resource. Ancillary data generated or made available by the plant genetic databases MaizeSequence.org, PlantGDB, MAGI, PLEXdb, UniformMu, and others were loaded onto the same frame to allow researchers to visualize and interact with the B73 genome sequence (examples of RefGen_v1 and RefGen_v2 at http://gbrowse.maizegdb.org/cgi-bin/gbrowse/maize/ and http://gbrowse.maizegdb.org/cgi-bin/gbrowse/maize_v2/). Researchers can use the maize genome sequence to further their research.
The maize genome sequence linked to information about map probes of proven utility. The maize genome sequence is stored at the National Center for Biotechnology Information as a string of letters with no annotation about known probes useful for modern plant breeders. To add this information, and in collaboration with researchers at the University of Arizona, Tucson, AZ, we aligned consensus genetic maps developed and maintained by ARS researchers at Columbia, MO, to the genome sequence. The result links the genome sequence to some 15,000 loci that are associated with empirically proven probes. These include next-generation probes useful for high-throughput mapping, which were designed by a project that involves ARS researchers at Columbia, MO, Ithaca, NY, and Raleigh, NC. The end product is currently viewable at Maize Genomic Database (MaizeGDB). This work will be useful to public and private plant breeders and basic researchers seeking to isolate favorable alleles and candidate genes for biological processes.
The Maize Genomic Database (MaizeGDB) ancillary POPcorn Project Portal has been improved to allow access to sequence data from a single location. Maize researchers cannot easily leverage all available genetic and genomic data because the online locations of all resources are not easy to find, and the sequence-indexed resources generated by individual projects must be searched independently. In addition, when a project’s funding period ends, the generated data are often lost because they are not moved to long-term repositories. These once-funded project sites degrade over time and sometimes disappear entirely. These challenges are being overcome in collaboration with the community of maize researchers through the launch of POPcorn (Project Portal for corn), a needs-driven resource and data pipeline. POPcorn currently makes available (1) a centralized Web-accessible resource to search and browse ongoing maize genomics projects and (2) a single, stand-alone tool, deployed over the course of the past year, that makes use of Web services and minimal data warehousing to enable researchers to carry out sequence searches at one location that return matches for all participating projects’ related resources. In the coming year, a set of tools will be added that enable collaborators to migrate their data at their projects’ conclusion to MaizeGDB, the long-term model organism database for maize genetic and genomic information. POPcorn aids in the identification of the molecular-level phenotypes manifesting as traits that plant breeders select for and will lead to improvements in food, fuel, and nutrition.
Palomero Toluqueno genome sequence made available to maize researchers worldwide. Researchers at CINVESTAV (Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional) sequenced the genome of Palomero Toluqueno maize. However, the data were not deposited in GenBank for a variety of reasons. By request of our Mexican colleagues, personnel at Maize Genomic Database (MaizeGDB) made the Palomero sequence available for sequence searches and wholesale download at http://www.palomerotoluqueno.org while simultaneously reformatting the data. MaizeGDB personnel completed the project by uploading the Palomero genome sequence to GenBank on behalf of the Mexican scientists. Maize researchers worldwide now have access to these data and can make use of the diversity captured in the Palomero genome sequence. Ties between United States and Mexican scientists were strengthened.
5. Significant Activities that Support Special Target Populations
During the summer months of Fiscal Year 2010, three American Indian students were mentored through a program that aims to increase their representation in the sciences. The students mapped maize knobs (dense chromosomal regions) onto the maize genome assembly; prepared samples of meiocytes (cells soon to become gametes) to determine the cytological map positions of knobs in maize lines Black Mandan, Cudu, B73, and Mo17; and compared and contrasted the National Plant Germplasm System’s maize germplasm maintenance procedures with traditional methods to grow, propagate, and conserve maize germplasm within Navajo communities.
Sen, T.Z., Harper, E.C., Schaeffer, M.L., Andorf, C.M., Seigfried, T.E., Campbell, D.A., Lawrence, C.J. 2010. Choosing a Genome Browser for a Model Organism Database (MOD): Surveying the Maize Community. Database: The Journal of Biological Databases and Curation. doi: 10.1093/database/baq007. p. 1.
Andorf, C.M., Lawrence, C.J., Harper, E.C., Schaeffer, M.L., Campbell, D.A., Sen, T.Z. 2010. The Locus Lookup Tool at MaizeGDB: Identification of Genomic Regions in Maize by Integrating Sequence Information with Physical and Genetic Maps. Bioinformatics. 26(3):434-436.
Sen, T.Z., Andorf, C.M., Schaeffer, M.L., Harper, E.C., Sparks, M., Duvick, J., Brendel, V., Cannon, E., Campbell, D.A., Lawrence, C.J. 2009. MaizeGDB Becomes Sequence-centric. Database: The Journal of Biological Databases and Curation. doi: 10.1093/database/bap020. p. 1.