Author
KUNDE-RAMAMOORTHY, GOVINDARAJAN - Children's Nutrition Research Center (CNRC)
COARFA, CRISTIAN - Baylor College of Medicine
LARITSKY, ELEONORA - Children's Nutrition Research Center (CNRC)
KESSLER, NOAH - University of Houston
HARRIS, R - Baylor College of Medicine
XU, MINGCHU - Baylor College of Medicine
CHEN, RUI - Baylor College of Medicine
SHEN, LANLAN - Children's Nutrition Research Center (CNRC)
MILOSAVLJEVIC, ALEKSANDAR - Baylor College of Medicine
WATERLAND, ROBERT - Children's Nutrition Research Center (CNRC)
Submitted to: Nucleic Acids Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/29/2013
Publication Date: 1/3/2014
Citation: Kunde-Ramamoorthy, G., Coarfa, C., Laritsky, E., Kessler, N.J., Harris, R.A., Xu, M., Chen, R., Shen, L., Milosavljevic, A., Waterland, R.A. 2014. Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing. Nucleic Acids Research. 42(6):e43.
Interpretive Summary: The new gold standard for genome-wide analysis of DNA methylation (a biological process in which a methyl group is added to cytosine nucleotides in DNA) combines chemical conversion of genomic DNA with next-generation sequencing (called Bisulfite-seq). Chemical conversion of DNA by bisulfite converts all unmethylated cytosines to thymines (i.e. C to T in the DNA sequence). This dramatically reduces the complexity of the sequence, making the mapping of sequence reads very difficult. Various programs to map the sequencing reads to the genome have been developed, but these had not previously been objectively evaluated. We used Bisulfite-seq to generate 4 complete sets of human DNA methylation marks (methylomes) and compared 5 different mapping algorithms. We used an independent method of measuring DNA methylation (bisulfite pyrosequencing) to validate the quantitative methylation estimates of the different mappers. Our results show important differences in the performance of the mappers, providing useful guidance to help investigators choose the ideal mapper for their Bisulfite-seq studies.
Technical Abstract: Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported.
We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered greater than 70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² = 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8–12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage.
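The quantities discussed above can be illustrated with a minimal Python sketch. It is not the code used in the study or by any of the mappers; the function names (`bisulfite_convert`, `percent_methylation`, `r_squared`) and the toy inputs are hypothetical, chosen only to show the C-to-T conversion, a per-site percentage methylation estimate from read counts, and a squared Pearson correlation of the kind used to compare two sets of estimates.

```python
def bisulfite_convert(seq, methylated_positions):
    """In-silico bisulfite conversion of one strand (illustrative only).

    Unmethylated C becomes T; a C at an index listed in
    methylated_positions is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def percent_methylation(c_reads, t_reads):
    """Percentage methylation at one cytosine from mapped read counts.

    c_reads: reads showing C (methylated); t_reads: reads showing T
    (unmethylated). Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return 100.0 * c_reads / total


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists.

    Assumes neither list is constant (otherwise the variance is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Toy usage: the C at index 1 is methylated, the others are not.
converted = bisulfite_convert("ACGTCC", {1})

# Hypothetical per-site estimates from two mappers at the same CpG sites.
mapper_a = [percent_methylation(c, t) for c, t in [(9, 1), (5, 5), (1, 9)]]
mapper_b = [percent_methylation(c, t) for c, t in [(8, 2), (6, 4), (2, 8)]]
concordance = r_squared(mapper_a, mapper_b)
```

The conversion step shows why mapping is hard: after conversion most cytosines read as thymines, so converted reads carry less sequence information for placing them uniquely on the genome.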