Publication #337263

Research Project: Host Specificity and Systematics of Insect Biological Control Agents

Location: Beneficial Insects Introduction Research Unit

Title: Phased genotyping-by-sequencing enhances analysis of genetic diversity and reveals divergent copy number variants in maize

Author
Manching, Heather - University of Delaware
Sengupta, Subhajit - University of Chicago
Hopper, Keith
Polson, Shawn - University of Delaware
Ji, Yuan - University of Chicago
Wisser, Randall - University of Delaware

Submitted to: Genes, Genomes, Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/28/2017
Publication Date: 5/19/2017
Citation: Manching, H., Sengupta, S., Hopper, K.R., Polson, S.W., Ji, Y., Wisser, R.J. 2017. Phased genotyping-by-sequencing enhances analysis of genetic diversity and reveals divergent copy number variants in maize. Genes, Genomes, Genetics. 7(7):2161-2170. doi: 10.1534/g3.117.042036.

Interpretive Summary: High-throughput sequencing of partial genomes has ushered in an era of genotyping-by-sequencing (GBS), in which genome-wide data can be obtained for nearly any species. However, methods are still needed for genotyping large samples taken from heterogeneous populations. We addressed this need by developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and our results in maize indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. By optimizing size selection to sequence shared loci and by using simple filters to exclude repetitive loci and genotypes with low coverage, we established a GBS method that produces high call rates per marker (>85%) with accuracy exceeding 99%. We also developed a new tool for scoring phased genotypes. Phased genotypes in maize revealed inaccurate genotypes caused by divergent copy number variants that were unobservable in the underlying data.

Technical Abstract: High-throughput sequencing of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), in which genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken from heterogeneous populations of heterozygous individuals. Meeting this need requires addressing several issues encountered with GBS: the sequencing of non-overlapping sets of loci across multiple GBS libraries, a common missing data problem that results in low call rates for markers per individual, and a tendency for applicability only in inbred line samples or biparental populations. We resolved these issues while developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and our results in Zea mays indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. By optimizing size selection to sequence shared loci and by using simple in silico filters to exclude variant calls at repetitive loci and genotypes with low read depth coverage, we established a GBS method that produces high call rates per marker (>85%) with accuracy exceeding 99.4%. Furthermore, by capitalizing on the sequence-read structure of GBS data (stacks of reads), we developed a new tool for resolving haplotypes and scoring phased genotypes. This reduces the marker dimensionality of the genotype matrix while increasing the informativeness of the data. Phased-GBS in maize revealed the existence of genotypes that were reproducibly inaccurate (giving an appearance of accuracy) and that were due to divergent copy number variants unobservable in the underlying single nucleotide polymorphism data.
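
As a rough illustration of the kind of in silico filtering described in the abstract, the sketch below masks genotypes with low read depth and then computes the call rate per marker. It is not the authors' pipeline. The function names, the data layout (rows are markers, columns are individuals), and the depth cutoff of 5 reads are assumptions made for this example; the 0.85 call-rate threshold simply echoes the >85% figure reported above, and the repetitive-locus filter is not shown.

```python
from typing import List, Optional

# Hypothetical thresholds for illustration only.
# The source does not state a read depth cutoff; 0.85 echoes the reported call rate.
MIN_DEPTH = 5
MIN_CALL_RATE = 0.85

Genotype = Optional[str]  # None marks a missing genotype call


def mask_low_depth(genotypes: List[List[Genotype]],
                   depths: List[List[int]],
                   min_depth: int = MIN_DEPTH) -> List[List[Genotype]]:
    """Set genotypes whose read depth is below min_depth to None (missing)."""
    return [
        [g if d >= min_depth else None for g, d in zip(g_row, d_row)]
        for g_row, d_row in zip(genotypes, depths)
    ]


def call_rate(marker_row: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one marker."""
    if not marker_row:
        return 0.0
    return sum(g is not None for g in marker_row) / len(marker_row)


def filter_markers(genotypes: List[List[Genotype]],
                   depths: List[List[int]],
                   min_depth: int = MIN_DEPTH,
                   min_call_rate: float = MIN_CALL_RATE) -> List[int]:
    """Return indices of markers whose call rate, after depth masking,
    is at least min_call_rate."""
    masked = mask_low_depth(genotypes, depths, min_depth)
    return [i for i, row in enumerate(masked) if call_rate(row) >= min_call_rate]


# Toy data: 2 markers (rows) x 4 individuals (columns).
genotypes = [
    ["AA", "AG", "GG", "AA"],
    ["CC", "CT", None, "TT"],
]
depths = [
    [10, 8, 6, 7],
    [9, 2, 0, 12],
]
kept = filter_markers(genotypes, depths)
```

In this toy data set, the first marker keeps all four calls, while the second loses one call to the depth mask and already has one missing call, so only the first marker passes the call-rate threshold.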