Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/19/2007
Publication Date: 9/19/2007
Citation: Schlueter, J., Lin, J., Schlueter, S., Vasylenko-Sanders, I., Deshpande, S., Yi, J., O'Bleness, M., Roe, B., Nelson, R., Scheffler, B.E., Jackson, S., Shoemaker, R.C. 2007. Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing. Biomed Central (BMC) Genomics. 8:330.
Interpretive Summary: The genome of soybean is currently being decoded through a cooperative arrangement between USDA and DOE. The approach being used requires that millions of small segments of DNA sequence be reassembled into pseudochromosomes. The soybean genome has been duplicated at least twice, so some regions very closely resemble other regions, which may cause problems when the genome is put back together. In this study the authors identified and decoded duplicated regions of the genome. They compared the organization of the duplicated regions, mixed the sequences together, and then assembled the regions. They concluded that the soybean genome is a mosaic of organizational structures. They also concluded that even highly similar regions can still be assembled without serious confusion. This information is important to geneticists and informaticists who are studying the evolution of the hereditary material of important crops such as soybean and who will be assembling the whole genome sequence.
Technical Abstract: Seventeen BACs representing ~2.03 Mb were sequenced as representative homeologous regions from the paleopolyploid soybean (Glycine max (L.) Merr.) genome. Sequence identity comparisons between homeologous BACs show that the soybean genome is a mosaic of retained paleopolyploid regions, some having high sequence homeology on a gene-for-gene basis while others share only the gene that identified the duplicate BACs. In light of the whole-genome shotgun sequencing effort in soybean, the effects of retained paleopolyploidy on sequence assemblies were investigated. While duplicate BACs with high sequence similarity (upwards of 95%) do not cross-assemble, tandem duplications of greater than 95% similarity cause assembly errors. A preliminary assembly of 80,000 sequence traces from the JGI-DOE sequencing effort shows that many repetitive sequences within the soybean genome have not been fully characterized and will need to be screened for during a whole-genome assembly.
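The 95% similarity figure in the abstract can be made concrete with a small illustration. The sketch below is not the authors' method. It assumes two sequences that have already been aligned to equal length, with "-" marking gaps, and it computes percent identity over the ungapped columns. The function names percent_identity and exceeds_threshold are hypothetical, and the toy sequences in the usage note are invented for illustration only.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumption: inputs are equal-length alignment rows with "-" marking gaps.
# The 95% cutoff mirrors the similarity level mentioned in the abstract;
# how the authors actually measured similarity is not stated here.

SIMILARITY_THRESHOLD = 95.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip gapped columns
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def exceeds_threshold(aligned_a: str, aligned_b: str,
                      threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when the pair is more similar than the given threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For example, percent_identity("ACGTACGTAC", "ACGTACGTAA") compares ten columns with one mismatch, so exceeds_threshold returns False for that pair. Real BAC comparisons would start from an alignment produced by a dedicated alignment tool rather than from hand-written strings.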