Submitted to: Advanced Studies in Biology
Publication Type: Peer reviewed journal
Publication Acceptance Date: 1/5/2009
Publication Date: 1/23/2009
Citation: Antoine, W., Miernyk, J.A. 2009. A Multidimensional Scaling-Based Model for Analysis of Time-Index Biomics Data. Advanced Studies in Biology. 1:43-59.
Interpretive Summary: An enormous amount of information is generated during genomics-type profiling studies. It is important to summarize this data in a meaningful way. A key step in reducing the size of data sets is called clustering. During the process of clustering, all of the patterns that display a similar change in size with the change in time are combined into a single cluster. After clustering, all of the information can be treated as a single pattern. A unique three-step statistical method was developed for clustering. The main innovation of this method is that it allows the user to determine the ideal number of clusters for any given data set. The method can be used both to describe the relationships in a set of data, and to predict how the relationships will vary. This information will be important to researchers in their attempts to understand and interpret the information derived from genome-type profiling studies, and to develop more efficient crop plants through classical genetics or biotechnology.
Technical Abstract: An enormous amount of data is generated during time-index biomics profiling studies, and it is important to summarize this data in a biologically meaningful way. In this context, pattern detection techniques and modeling are important tools. We propose use of the multidimensional scaling (MDS) algorithm to detect consensus patterns within clustered time-index data. Retrieved patterns can be used as a reference to describe the expression of individual genes through a model. The model describes expression by proposing a profile match index and a level parameter for each individual pattern. A publicly available transcript profiling dataset from developing soybean embryos was used to illustrate the model. After describing a pattern for each of 11 clusters, the parameters of the model and the goodness of fit were estimated for each gene. The legitimacy of the consensus pattern for each cluster was assessed relative to the badness of fit. The profile match index and the individual fit statistics served to validate the membership of a gene in a cluster, and successfully isolated out-groups within clusters. By treating each cluster as a building block, a description of the gene expression network manifest during physiological change could be established from the patterns detected using the MDS procedure. The described method allows extraction of meaningful information from any time-index profiling data.
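As a rough illustration of the multidimensional scaling step mentioned in the abstract, the following sketch applies classical (Torgerson) MDS to a small, made-up cluster of expression profiles. It is not the authors' model: the function name `classical_mds`, the toy values, and the choice to treat time points as the objects being scaled are all assumptions made here for illustration, and the profile match index and level parameter are not reproduced.

```python
import numpy as np


def classical_mds(profiles, n_components=1):
    """Classical (Torgerson) MDS of the rows of `profiles`.

    Hypothetical helper for illustration only; not taken from the paper.
    Returns an array of shape (n_rows, n_components).
    """
    X = np.asarray(profiles, dtype=float)
    # Squared Euclidean distances between every pair of rows.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    n = d2.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ d2 @ J
    # Keep the largest eigenvalues; clip tiny negatives from round-off.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


# Toy cluster (assumed values): 4 genes measured at 5 time points.
cluster = np.array([
    [1.0, 2.0, 4.0, 3.0, 1.0],
    [1.2, 2.1, 3.8, 3.1, 0.9],
    [0.8, 1.9, 4.2, 2.9, 1.1],
    [1.1, 2.2, 4.1, 3.0, 1.0],
])

# Assumption: scale the time points (columns) so that each time point
# receives one coordinate summarising the cluster at that time.
consensus = classical_mds(cluster.T, n_components=1)[:, 0]
print(consensus)
```

The sign of an MDS axis is arbitrary, so the printed coordinates may come out mirrored; only the relative spacing of the time points is meaningful in this sketch.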