Location: Plant, Soil and Nutrition Research
Title: Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats
Authors
|Wicker, Thomas - UNIV. OF ZURICH|
|Narechania, Apurva - COLD SPRING HARBOR LAB.|
|Sabot, Francois - UNIVERSITE DE PERPIGNAN|
|Stein, Joshua - COLD SPRING HARBOR LAB.|
|Giang, Vu Thi Ha - LEIBNIZ INST OF PLT GEN.|
|Graner, Andreas - LEIBNIZ INST OF PLT GEN.|
|Stein, Nils - LEIBNIZ INST OF PLT GEN.|
Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: October 31, 2008
Publication Date: October 31, 2008
Citation: Wicker, T., Narechania, A., Sabot, F., Stein, J., Giang, V., Graner, A., Ware, D., Stein, N. 2008. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats. Biomed Central (BMC) Genomics. 9:518.
Interpretive Summary: Using a method that was devised for maize (K-mer frequencies as a universal tool to annotate large repetitive plant genomes), this work presents evidence that k-mer indexing works well for barley and wheat. In addition, we illustrate that the unbiased survey sequence used to construct the index need not come from traditional whole genome shotgun libraries. For barley, the k-mer index was constructed from DNA reads produced by Solexa sequencing, a next-generation sequencing technology. Thus, in this study we demonstrate that the cost savings from eliminating the need for manual repeat curation can be compounded by the ability to use cheaper sequencing to generate survey libraries. We also demonstrate that k-mer frequencies can be used to identify previously unknown repeat sequences that were missed by hand annotation, and that excluding repeat sequences using this method makes detection and annotation of genes much more efficient. This study successfully extends the original effort devised in maize to other large plant genomes. The implication is that the method would be useful for any future sequencing project, especially for highly repetitive plants that may be under consideration for biofuel initiatives. The ability to create cheap, abbreviated libraries of partial genome equivalents using any of the new next-generation sequencing technologies gives the method added flexibility and makes it a cost-effective alternative to traditional repeat detection.
Technical Abstract:
Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short-read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. An MDR (Mathematically Defined Repeat) index can be generated from such short genomic sequence reads and can be exploited to map repetitive regions in genomic sequences.
Results: We generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index, which was then used to graphically and qualitatively characterise repetitive regions in genomic sequences from barley. Comparison of the MDR plots with expert repeat annotation revealed a strong regional overlap between the two methods, and the MDR plots additionally helped identify dozens of novel repeat sequences that were not recognised by hand annotation. The MDR data was also used to identify gene-containing regions by exclusion of repetitive sequences in eight de novo sequenced BAC clones. Gene sequences could indeed be identified in exactly half of the candidate gene islands. By mapping the MDR index onto genomic sequences from the closely related wheat Triticum monococcum, we also showed that MDR data is of only limited use across species boundaries, as only a fraction of the repetitive sequences were recognised.
Conclusion: Combining whole-genome Illumina/Solexa sequencing with MDR analysis of barley proved to be practically as efficient in repeat identification as manual expert annotation of genomic sequences. By circumventing the labour-intensive step of producing a specific repeat library, it provides an elegant and efficient tool for the identification of repetitive and low-copy (i.e. potentially gene-containing) regions in uncharacterised genomic sequences. The restriction that a particular MDR index cannot be used across species is compensated by the low cost of Illumina/Solexa sequencing, which makes any chosen genome accessible for whole-genome sequence sampling.
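To make the idea of a k-mer frequency index more concrete, the following is a minimal Python sketch, not the software used in the study. It counts k-mers in a set of survey reads, maps those counts onto a target sequence, and reports stretches where counts stay low (candidate low-copy regions). The function names, the word length `k`, and the `threshold` and `min_len` parameters are all illustrative assumptions; the sketch also ignores reverse complements and sequencing errors, which a real analysis would need to handle.

```python
from collections import Counter


def build_kmer_index(reads, k):
    """Count every k-mer in the survey reads (forward strand only, for brevity)."""
    index = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip words containing ambiguous bases
                index[kmer] += 1
    return index


def frequency_profile(sequence, index, k):
    """Return, for each k-mer start position in the sequence, its count in the index."""
    sequence = sequence.upper()
    return [index.get(sequence[i:i + k], 0)
            for i in range(len(sequence) - k + 1)]


def low_copy_regions(profile, threshold, min_len):
    """Return half-open (start, end) runs of positions whose count is <= threshold.

    Only runs spanning at least min_len consecutive positions are reported.
    The threshold and min_len values are hypothetical tuning parameters.
    """
    regions = []
    start = None
    for i, count in enumerate(profile):
        if count <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_len:
        regions.append((start, len(profile)))
    return regions


if __name__ == "__main__":
    # Toy data: the reads contain a short tandem repeat; the target sequence
    # starts with that repeat and continues into sequence absent from the reads.
    reads = ["ACGTACGTACGT", "ACGTACGT"]
    index = build_kmer_index(reads, k=4)
    profile = frequency_profile("ACGTACGTTTGGCCAA", index, k=4)
    print(profile)
    print(low_copy_regions(profile, threshold=1, min_len=4))
```

In this toy case the high counts at the start of the profile mark the repeated word, and the run of zeros marks sequence not seen in the reads. With real low-coverage data a count of zero is expected for much of the single-copy genome, so the threshold would have to be chosen relative to sequencing depth.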