2010 Annual Report
1a. Objectives (from AD-416)
Objective 1: Implement web-accessible computational and visualization tools, including semantic web technologies, to enable comparison and transfer of agronomically important genetic information among soybean and other legume and related dicot species. Objective 2: Continue to curate and enhance SoyBase and the Soybean Breeder’s Toolbox (SBT), more fully integrating the genetic, phenotypic, physical map, and whole-genome sequence data from soybean and other legumes. Objective 3: Coordinate the quality assembly and annotation of the soybean whole-genome sequence.
1b. Approach (from AD-416)
Soybean ontologies will be prepared to describe selected data types from the Soybean Breeder’s Toolbox (SBT). Data exchange descriptions (“RDF graphs”) will be developed to allow integration of the data into the Virtual Plant Information Network (VPIN). Web services will be developed so that researchers can transparently find, retrieve, or apply analytical methods to data contained in the SBT through a single portal. SoyBase and the SBT will be maintained and updated with new data classes as needed. The Williams 82 physical map, the soybean whole-genome sequence, new sequence-based data types in SoyBase, and comparative data from other legumes will be integrated and displayed. The project works closely with DOE-JGI to enhance the quality of the soybean whole-genome sequence assembly. This work will include analysis of sequence-based genetic markers, comparative analyses with other genomes, and various informatic analyses.
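To make the idea of an “RDF graph” concrete, the following is a minimal sketch of how a single data record could be expressed as subject-predicate-object triples and written out in the N-Triples text format. The example.org URIs, the record, and the helper name to_ntriples are hypothetical placeholders chosen for illustration; they do not describe the actual SBT or VPIN data model.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.

    Assumption for this sketch: an object starting with "http" is a URI;
    anything else is written as a plain string literal.
    """
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith("http"):
            obj_text = f"<{obj}>"
        else:
            # Escape backslashes and double quotes inside string literals.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


# Hypothetical record: a QTL with a label and a link to a trait ontology term.
EXAMPLE_TRIPLES = [
    ("http://example.org/qtl/1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Seed weight 1-1"),
    ("http://example.org/qtl/1",
     "http://example.org/vocab/hasTrait",
     "http://example.org/trait/seed_weight"),
]

if __name__ == "__main__":
    print(to_ntriples(EXAMPLE_TRIPLES))
```

Because each statement is self-describing, records exported this way can be merged with triples from other databases without agreeing on a shared table layout first.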
The team implemented web-accessible computational and visualization tools, including semantic web technologies, to enable comparison and transfer of important genetic information between soybean and other legumes, as well as among related species. Working with National Center for Genome Resources (Legume Information System) collaborators, we evaluated and redesigned the web interfaces, web tools, and databases at http://comparative-legumes.org to better accommodate new legume genomic data sets, including recently completed genome sequences and data from high-throughput sequencing and mapping technologies. We developed a new genomic sequence search tool that delivers users to web interfaces at appropriate locations on four sequenced or partly sequenced legume genomes, and to other specialized genomic and bioinformatic web resources when applicable. We updated the online genetic map viewer software (CMap) and added 17 genetic maps from nine legume species. We also developed views that show how genetic features (specifically, genetic markers) are related among the nine legume species and the genomes of three species with sequenced genomes: soybean, Medicago truncatula, and Lotus japonicus. Sub-objective 1.A of the parent project has been fully met through the annotation of existing data with soybean trait ontology terms and ID numbers. We also added references to external ontology terms to Soybean Breeder’s Toolbox (SBT) data. Sub-objective 1.B is identified as Not Met because of new work with the iPlant Collaborative, which is developing software that will let us complete the objective faster and more simply. The SoyBase team will serve as primary testers of that software.
The team continued to curate and enhance SoyBase and the SBT by more fully integrating the genetic, phenotypic, physical map, and whole-genome sequence data from soybean and other legumes. Sub-objective 2.A was fully met by adding molecular markers to the physical map.
The milestones for Sub-objective 2.B were fully met by implementing the genome browser for display of the whole-genome sequence, implementing links between the various map displays and databases, and designing databases and displays for haplotype data. Sub-objective 3.A was fully met through the integration of genetically mapped markers to further anchor sequence to the genetic map. Database tables were developed and populated with all available marker data, and marker loci have been placed onto sequence map displays. Sub-objective 3.B was also fully met by the incorporation of transposons and other repetitive sequences into a database (SoyTEdb). SoyTEdb.org is now incorporated into the SoyBase genome viewer. We also incorporated predicted genes and repetitive elements into genome browser displays. All of the gene models for the initial release of the soybean genome sequence were made available in the SoyBase genome viewer. Database tables have been developed and populated with all publicly available information about the gene models. We began to establish an ongoing system for revising and updating annotations. Forms have been developed and added to web pages to allow users to contribute annotation or other data.
A new database of transposable elements has been developed for soybean. Transposable elements (TEs) are relatively short DNA sequences that can move from place to place in a genome. Although TEs make up around 50% of the nuclear genome of soybean, in most cases their effect on plant phenotype is unknown, so it is important to be able to differentiate them from the genes that control important traits in soybean. In collaboration with colleagues at Purdue University, ARS scientists at Ames, IA, identified and characterized over 36,000 TEs in the soybean nuclear DNA. Extensive biological information about these TEs was collected and made available to the research community through a comprehensive web site, SoyTEdb.org. This database is critical to the annotation of the whole-genome sequence.
Plant Trait Ontology terms were associated with 1,423 SoyBase QTL records. A recurring problem in science is the inability of researchers to interpret and interrelate findings because of variations in terminology among research organisms and even among laboratories. The incorporation of these terms into the record of each QTL will allow researchers working on any plant species to recognize and retrieve soybean genetic data that might help their analyses. The incorporation of the Plant Trait Ontology terms will also make QTL data more accessible to automated (programmatic) searches for plant data. This will greatly improve the interrelatedness of plant genome databases.
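As a rough illustration of why shared ontology identifiers help automated searches, the sketch below indexes a few QTL records by trait ontology ID so that a program can retrieve them without matching inconsistent free-text trait names. The record fields, QTL names, and term IDs are hypothetical placeholders, not actual SoyBase data or its schema.

```python
from collections import defaultdict

# Hypothetical QTL records. Names and ontology IDs are illustrative only.
# Note that the free-text trait labels differ even when the trait is the same.
QTL_RECORDS = [
    {"qtl": "Seed weight 1-1", "trait_label": "seed wt", "to_id": "TO:0000001"},
    {"qtl": "Sd wt 2-3", "trait_label": "Seed weight", "to_id": "TO:0000001"},
    {"qtl": "Plant height 4-2", "trait_label": "plant ht", "to_id": "TO:0000002"},
]


def index_by_term(records):
    """Group QTL names by their ontology term ID."""
    index = defaultdict(list)
    for record in records:
        index[record["to_id"]].append(record["qtl"])
    return dict(index)


def find_qtl(index, term_id):
    """Return all QTL annotated with the given term ID (empty list if none)."""
    return index.get(term_id, [])


if __name__ == "__main__":
    idx = index_by_term(QTL_RECORDS)
    print(find_qtl(idx, "TO:0000001"))
```

A search on the shared identifier finds both seed-weight records even though their free-text labels differ, which is the kind of retrieval a text match on the label would miss.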
Gene sequences have been linked to biochemical pathways. Correlating genome sequence with plant traits is a grand challenge. ARS scientists at Ames, IA, associated ~4,200 genes with ~400 enzymatic pathways and made this information available through SoyBase. Deployment of the SoyCyc metabolic pathway database will allow researchers to identify which soybean metabolic functions are involved in the plant's response to various stimuli and stresses. This will allow researchers to identify which pathways and genes to focus on for crop improvement and is a first step toward linking genotype with phenotype.
A Fluorescence In Situ Hybridization System was developed for karyotyping soybean. ARS scientists at the Ames location, with researchers in Missouri, have developed methods for precisely identifying soybean chromosomes using a visible-light microscope. For many applications in genetic research, important chromosomal features or genetic abnormalities can be seen directly under a microscope if the individual chromosomes can be identified. However, in soybean, the chromosomes are small and difficult to distinguish. This work describes a protocol that applies fluorescent labels to uniquely identify the 20 chromosomes of soybean. The work also reports breakages and rearrangements of several chromosomes in comparison with a wild soybean variety. These techniques and findings are important because they give soybean researchers a method to study soybean chromosomes directly and to visually identify chromosomal abnormalities that may prevent fertile crosses between some soybean varieties.
High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the Soybean Whole Genome Sequence. ARS scientists at the Ames and Beltsville locations designed more than 7,100 genetic markers and mapped 1,790 of them to assist in the assembly of the soybean whole-genome sequence. The method used a novel approach: selective resequencing of the wild progenitor of soybean, followed by identification of mappable variants between the wild and cultivated lines. The method will be useful for other mapping and sequencing projects, and the results helped produce a high-quality full genome assembly for soybean. The genome sequence, in turn, will be highly valuable to plant breeders and researchers, helping to speed the development of new soybean lines.
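The general idea of using mapped markers to anchor and orient sequence scaffolds can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the project: each marker is assumed to have a known position on a scaffold (in base pairs) and on a linkage group (in centimorgans), scaffolds are ordered by the mean genetic position of their markers, and orientation is inferred only from the first and last marker on each scaffold. The function name, data layout, and example values are hypothetical.

```python
from collections import defaultdict


def anchor_scaffolds(markers):
    """Order and orient scaffolds using mapped markers.

    markers: iterable of (scaffold, bp_position, linkage_group, cM) tuples.
    Returns {linkage_group: [(scaffold, orientation), ...]} with scaffolds
    sorted by mean cM. Orientation is "+" if cM increases along the scaffold,
    "-" if it decreases, and "?" if it cannot be determined.
    """
    hits_by_scaffold = defaultdict(list)
    for scaffold, bp, linkage_group, cm in markers:
        hits_by_scaffold[(linkage_group, scaffold)].append((bp, cm))

    placed = defaultdict(list)
    for (linkage_group, scaffold), hits in hits_by_scaffold.items():
        hits.sort()  # order markers by physical position on the scaffold
        mean_cm = sum(cm for _, cm in hits) / len(hits)
        first_cm, last_cm = hits[0][1], hits[-1][1]
        if len(hits) < 2 or first_cm == last_cm:
            orientation = "?"  # one marker (or no cM change) cannot orient
        elif last_cm > first_cm:
            orientation = "+"
        else:
            orientation = "-"
        placed[linkage_group].append((mean_cm, scaffold, orientation))

    return {
        lg: [(scaffold, orientation) for _, scaffold, orientation in sorted(items)]
        for lg, items in placed.items()
    }


# Hypothetical marker data for illustration only.
EXAMPLE_MARKERS = [
    ("scaf_A", 1000, "LG1", 10.0),
    ("scaf_A", 90000, "LG1", 14.0),
    ("scaf_B", 500, "LG1", 5.0),
    ("scaf_B", 70000, "LG1", 2.0),
    ("scaf_C", 200, "LG1", 20.0),
]

if __name__ == "__main__":
    print(anchor_scaffolds(EXAMPLE_MARKERS))
```

The sketch shows why marker density matters: a scaffold carrying a single mapped marker can be placed on the map but not oriented, so additional markers directly improve the assembly.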
The soybean paleopolyploid genome sequence has been added to SoyBase. The DNA sequence for soybean was released in 2010. It is one of the largest and most complex genomes decoded to date. The DNA sequence has been incorporated into SoyBase with extensive 'under the hood' links to the information that has been collected on soybean traits over the last 30 years (~85 distinct mapped traits). Thus, it is now possible for plant breeders to begin to understand at a molecular level the phenotypic information they have used for years in plant breeding. This is an important addition to the knowledge about soybean and, as has been the case for research on human diseases, will be invaluable as soybean scientists work to develop improved varieties.
Grant, D.M., Nelson, R., Cannon, S.B., Shoemaker, R.C. 2009. SoyBase, The USDA-ARS Soybean Genetics and Genomics Database. Nucleic Acids Research. DOI: 10.1093/nar/gkp798.
Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Hyten, D.L., Song, Q., Mitros, T., Nelson, W., May, G.D., Gill, N., Peto, M.F., Shu, S., Goodstein, D., Thelen, J.J., Cheng, J., Sakurai, T., Umezawa, T., Shinozaki, K., Du, J., Bhattacharyya, M., Sandhu, D., Grant, D.M., Joshi, T., Libault, M., Zhang, X., Nguyen, H., Valliyodan, B., Xu, D., Futrell-Griggs, M., Abernathy, B., Hellsten, U., Barry, K., Grimwood, J., Yu, Y., Wing, R.A., Cregan, P.B., Stacey, G., Specht, J., Rokhsar, D., Shoemaker, R.C., Jackson, S. 2010. Genome Sequence of the Paleopolyploid Soybean (Glycine max (L.) Merr.). Nature. 463:178-183.
Nelson, R., Avraham, S., Shoemaker, R.C., May, G., Ware, D., Gessler, D.D. 2009. Applications and Methods Utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for Bioinformatics Resource Discovery and Disparate Data and Service Integration. BioMed Central (BMC) BioData Mining. 10:309.
Cannon, S.B., May, G.D., Jackson, S.A. 2009. Update on Comparative Genomics of Legumes. Plant Physiology. 151:970-977.
Singer, S.R., Maki, S.L., Farmer, A.D., Ilut, D., May, G.D., Cannon, S.B., Doyle, J.J. 2009. Beyond the Papilionoids – What can We Learn from Chamaecrista? Plant Physiology. 151:1041-1047.