Agricultural Research Service United States Department of Agriculture

Research Project: MODIFICATION OF SOYBEAN SEED COMPOSITION FOR FOOD, FEED, AND OTHER INDUSTRIAL USES

Location: Plant Genetics Research

Title: Shape-to-String Mapping: A Novel Approach to Clustering Time-Index Biomics Data

Authors
Antoine, Wesner - UNIVERSITY OF MISSOURI
Miernyk, Jan

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: July 14, 2007
Publication Date: July 16, 2008
Citation: Antoine, W., Miernyk, J.A. 2008. Shape-to-String Mapping: A Novel Approach to Clustering Time-Index Biomics Data. Bioinformatics. 8:139-153.

Interpretive Summary: Biomics-type profiling studies generate an enormous amount of information, and it is important to summarize these data in a statistically valid way. A key step in reducing the size of datasets is clustering, in which all of the patterns that display a similar change in size over time are combined into a single cluster. After clustering, all of the information in a cluster can be treated as a single pattern. A unique multi-step statistical transformation method was developed for cluster analysis. The main innovation of this method is the application of a root method based upon that used for comparison of primary amino acid sequence data. The method can be used both to describe the relationships within a set of data and to predict how those relationships will vary. This information will be important to researchers in their attempts to understand and interpret the results of genome-type profiling studies, and to develop more efficient crop plants through classical genetics or biotechnology.

Technical Abstract: Herein we describe a qualitative approach for clustering time-index biomics data. The intensity ratios between adjacent time points are transformed into angles. A code is then used to map the numerical time-index data to a qualitative representation that captures the features defining the shape of the expression pattern as a function of time. The problem of clustering time-index biomics data is thereby either solved directly or reduced to a problem similar to the well-studied task of clustering protein sequence data. For datasets with few time points, the words derived from the transformation are adequate to define clusters. Dissimilarities between the newly defined objects can also be estimated, and the resulting distance matrix can be used for further clustering. Results from transcript profiling of developing soybean embryos are used to illustrate the utility of the method. Comparative mapping of the intensity ratios and the angles by multidimensional scaling and Procrustes analysis revealed otherwise cryptic information within the data. To compare the effectiveness of the code in clustering the data, Euclidean distance matrices were calculated from the words and the corresponding gene list using the PHYLogeny Inference Package (PHYLIP) algorithms and the Point Accepted Mutation (PAM) scoring matrix.
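
The shape-to-string idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the angle formula, the bin boundaries, or the alphabet, so everything specific here is an assumption made for illustration: the angle is taken as the arctangent of the log2 intensity ratio over a unit time step, the three-letter code (U = up, F = flat, D = down) and the 15-degree threshold are hypothetical, and a plain Hamming distance stands in for the PHYLIP/PAM-based distances named above. Intensities are assumed to be strictly positive.

```python
import math
from collections import defaultdict


def ratios_to_angles(profile):
    """Angles (degrees) from the log2 intensity ratios of adjacent time points.

    Assumption: unit spacing between time points, so the angle is atan(log2 ratio).
    """
    return [
        math.degrees(math.atan(math.log2(later / earlier)))
        for earlier, later in zip(profile, profile[1:])
    ]


def angle_to_letter(angle, flat=15.0):
    """Hypothetical three-letter code: U (up), D (down), F (flat)."""
    if angle > flat:
        return "U"
    if angle < -flat:
        return "D"
    return "F"


def shape_word(profile):
    """Map one time-index profile to a word describing its shape."""
    return "".join(angle_to_letter(a) for a in ratios_to_angles(profile))


def cluster_by_word(profiles):
    """Group profile names by identical shape words (the few-time-points case)."""
    clusters = defaultdict(list)
    for name, profile in profiles.items():
        clusters[shape_word(profile)].append(name)
    return dict(clusters)


def hamming(word_a, word_b):
    """Stand-in dissimilarity between equal-length words (not the PAM scoring)."""
    return sum(a != b for a, b in zip(word_a, word_b))


# Made-up intensities for three hypothetical genes at four time points.
profiles = {
    "geneA": [1.0, 2.0, 4.0, 4.0],
    "geneB": [10.0, 25.0, 50.0, 52.0],
    "geneC": [8.0, 4.0, 4.0, 8.0],
}

clusters = cluster_by_word(profiles)
words = sorted(clusters)
distance_matrix = [[hamming(a, b) for b in words] for a in words]
print(clusters)
print(words, distance_matrix)
```

In this toy example geneA and geneB differ tenfold in absolute intensity but share the same shape word, so they fall into one cluster, while geneC forms its own. With more time points, a pairwise distance matrix over the words could be passed to a hierarchical clustering routine instead of grouping on exact matches.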

   

 
Project Team
Miernyk, Jan
Oliver, Melvin - Mel
Bilyeu, Kristin
Krishnan, Hari
 
 
Related National Programs
  Plant Biological and Molecular Processes (302)
 
 
Last Modified: 05/23/2013