Submitted to: Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/6/2004
Publication Date: 10/1/2004
Citation: Clough, S.J., Tuteja, J.H., Min, L., Marek, L.F., Shoemaker, R.C., Vodkin, L.O. 2004. Features of a 103-kb, gene-rich region in soybean include an inverted perfect repeat cluster of CHS genes comprising the i locus. Genome. 47(5):819-831.

Interpretive Summary: Genes control the physical properties of an organism, such as seed color in soybean, so sequence data provide a wealth of biological information. Black-seeded soybeans are undesirable to the food processing industry because the pigment darkens both the oil and the protein products, and consumers prefer lighter-colored products. In this manuscript we describe the sequence of a segment of the soybean genome that contains a cluster of genes known to affect seed pigmentation. Within this sequence we identified six copies of CHS (chalcone synthase), a gene that controls soybean seed pigmentation. These six genes occur in an unusual arrangement that could provoke genetic rearrangements or deletions. This finding helps explain the previously puzzling observation that mutations in this region can activate CHS genes, turning a normally yellow soybean seed black. A deeper understanding of this mechanism of CHS regulation could lead to ways of controlling the occurrence of spontaneous black seeds. In addition to clarifying how the CHS genes are organized, we identified 5 genes that had not previously been sequenced from soybean. Two of these genes have no known function but are expressed. This information will be useful to any scientist interested in genetics, gene structure, and gene function.
Technical Abstract: The I locus in soybean (Glycine max) corresponds to a region of chalcone synthase (CHS) gene duplications affecting seed pigmentation. We sequenced and annotated BAC clone 104J7, which harbors the dominant i^i allele from cultivar Williams 82, to gain insight into the genetic structure of this multigenic region and to examine its flanking regions. The 103-kb BAC encompasses a gene-rich region with 11 putatively expressed genes. In addition to six copies of CHS, these genes include a geranylgeranyl transferase type II beta subunit, a beta-galactosidase, a putative spermine/spermidine synthase, and an unknown but expressed gene. Strikingly, the sequence revealed that the 10.91-kb CHS1, CHS3, CHS4 cluster is present as a perfect inverted repeat, with the two copies separated by 5.87 kb. This contiguous arrangement of CHS paralogs could promote folding into multiple secondary structures, which are hypothesized to induce the deletions previously shown to affect CHS expression. BAC 104J7 also contains several gene fragments representing a cation/hydrogen exchanger, a 40S ribosomal protein, a CBL-interacting protein kinase, and the amino terminus of a subtilisin. Chimeric ESTs were identified that may represent read-through transcription from a flanking truncated gene into a CHS cluster, generating aberrant CHS RNA molecules that could play a role in CHS gene silencing.