Zeng, D - UNIV. OF NORTH CAROLINA
Lin, D - UNIV. OF NORTH CAROLINA
Avery, C - UNIV. OF NORTH CAROLINA
North, K - UNIV. OF NORTH CAROLINA
Submitted to: Biostatistics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: February 7, 2006
Publication Date: February 24, 2006
Citation: Zeng, D., Lin, D.Y., Avery, C.L., North, K.E., Bray, M.S. 2006. Efficient semiparametric estimation of haplotype-disease associations in case-cohort and nested case-control studies. Biostatistics 7(3):486-502.
Interpretive Summary: Estimating the effects of multiple sites of DNA sequence variation on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A "haplotype" is a specific set of DNA sequence variations that lie close to each other on the same chromosome. This paper is about the development of statistical methods designed to determine the effects of a haplotype on a disease outcome. An analysis of heart disease is provided as an example.
Technical Abstract: Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.
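The abstract notes that a haplotype can only be measured indirectly through the genotype. The following is a minimal sketch of that ambiguity, not the paper's algorithm. It assumes biallelic SNPs with each genotype coded as a minor-allele count (0, 1, or 2) and each haplotype coded as a tuple of 0/1 alleles. The function name `compatible_haplotype_pairs` and this coding are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of minor-allele counts (0, 1, or 2), one per SNP.
    This coding is an assumption made for illustration.
    """
    for g in genotype:
        if g not in (0, 1, 2):
            raise ValueError("each genotype entry must be 0, 1, or 2")

    # Heterozygous loci (count 1) are the only source of phase ambiguity.
    het = [j for j, g in enumerate(genotype) if g == 1]

    pairs = set()
    for phase in product((0, 1), repeat=len(het)):
        # Homozygous loci are fixed: count 0 -> allele 0, count 2 -> allele 1.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        # Assign the two alleles at each heterozygous locus according to phase.
        for j, allele in zip(het, phase):
            h1[j], h2[j] = allele, 1 - allele
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs((1, 1, 0)):
        print(pair)
```

With k heterozygous loci (k at least 1) there are 2^(k-1) compatible unordered pairs, so the genotype alone does not identify the haplotypes whenever two or more loci are heterozygous. A likelihood-based analysis would typically weight such pairs by their estimated population frequencies rather than pick one.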