Submitted to: Epidemiology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/10/2007
Publication Date: 1/1/2008
Citation: Bureau, A., Diallo, M.S., Ordovas, J.M., Cupples, L.A. 2008. Estimating interaction between genetic and environmental risk factors: efficiency of sampling designs within a cohort. Epidemiology. 19(1):83-93.
Interpretive Summary: Large prospective cohort studies are needed to identify new biochemical and environmental risk factors for disease. As more emphasis is placed on the importance of genetic factors, such cohorts are increasingly being used to study genetic markers of disease. Moreover, it is recognized that a more complete picture can be obtained from the study of gene-environment interactions. However, these studies are very costly in large populations, and being able to select a representative subgroup that captures the needed information would result in substantial savings. We tested different models and sampling designs to achieve this goal, using the Framingham Heart Study and our previously published interaction between smoking, the apolipoprotein E gene and coronary heart disease. We demonstrate that it is possible to achieve an efficiency close to that of the full cohort for estimating the interaction effect with a case-control study containing fewer than half the subjects of the entire cohort.
Technical Abstract: Large prospective cohorts originally assembled to study environmental risk factors are increasingly exploited to study gene-environment interactions. Given the cost of genetic studies in large numbers of subjects, being able to select a sub-sample for genotyping that contains most of the information from the cohort would lead to substantial savings. We consider nested case-control, sub-cohort and case-cohort sampling designs, with and without stratification, and compare their efficiency relative to the entire cohort for estimating the effects of genetic and environmental risk factors and their interactions. Asymptotic calculations show that the case-cohort design achieves the highest relative efficiency among the designs considered over a range of scenarios for the relationships between genes, environmental exposure and disease status. We confirmed these asymptotic results by simulating the sampling designs within the Framingham Offspring Study, using the interaction between apolipoprotein E and smoking on the risk of coronary heart disease as an application. It was possible to achieve an efficiency close to that of the full cohort for estimating the interaction effect with a case-cohort or a nested case-control sample containing fewer than half the subjects of the cohort.
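The idea of relative efficiency can be illustrated with a deliberately simplified sketch. This is not the authors' calculation: it assumes a binary gene carrier status G and binary exposure E, a saturated logistic model for disease, and an unmatched design in which all cases are kept and controls are drawn at random from non-cases (real nested case-control sampling draws controls from time-matched risk sets, and the case-cohort and stratified designs analyzed in the paper need weighted analyses not reproduced here). Under these assumptions the asymptotic variance of the G x E interaction log odds ratio is the Woolf-type sum of reciprocal cell counts, and relative efficiency is the full-cohort variance divided by the sampled-design variance. All function names and the cell counts below are hypothetical, made up for illustration only.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Cell key: (disease, gene carrier, exposed), each coded 0/1.
Cell = Tuple[int, int, int]

# Hypothetical cohort of 3,000 subjects (300 cases); made-up counts, not study data.
HYPOTHETICAL_COUNTS: Dict[Cell, float] = {
    (1, 0, 0): 100, (1, 0, 1): 80, (1, 1, 0): 40, (1, 1, 1): 80,
    (0, 0, 0): 1300, (0, 0, 1): 700, (0, 1, 0): 450, (0, 1, 1): 250,
}


def case_cohort_sample(disease: Sequence[int], subcohort_fraction: float,
                       rng: random.Random) -> List[int]:
    """Indices of a simple random subcohort plus every case outside it."""
    n = len(disease)
    subcohort = set(rng.sample(range(n), round(subcohort_fraction * n)))
    cases = {i for i, d in enumerate(disease) if d == 1}
    return sorted(subcohort | cases)


def case_control_sample(disease: Sequence[int], controls_per_case: int,
                        rng: random.Random) -> List[int]:
    """All cases plus randomly drawn non-cases (unmatched; ignores risk-set timing)."""
    cases = [i for i, d in enumerate(disease) if d == 1]
    noncases = [i for i, d in enumerate(disease) if d == 0]
    k = min(len(noncases), controls_per_case * len(cases))
    return sorted(cases + rng.sample(noncases, k))


def interaction_variance(counts: Dict[Cell, float]) -> float:
    """Asymptotic variance of the G x E interaction log odds ratio in a
    saturated logistic model: the sum of reciprocal counts over the 8 cells."""
    return sum(1.0 / counts[(d, g, e)]
               for d in (0, 1) for g in (0, 1) for e in (0, 1))


def relative_efficiency_case_control(counts: Dict[Cell, float],
                                     controls_per_case: int) -> float:
    """Var(full cohort) / Var(all cases + random controls), plugging in the
    expected number of sampled controls in each (G, E) cell."""
    n_cases = sum(v for (d, _, _), v in counts.items() if d == 1)
    n_noncases = sum(v for (d, _, _), v in counts.items() if d == 0)
    frac = min(1.0, controls_per_case * n_cases / n_noncases)
    sampled = {cell: (v if cell[0] == 1 else v * frac)
               for cell, v in counts.items()}
    return interaction_variance(counts) / interaction_variance(sampled)


if __name__ == "__main__":
    for m in (1, 2, 4):
        re = relative_efficiency_case_control(HYPOTHETICAL_COUNTS, m)
        print(f"{m} control(s) per case: relative efficiency {re:.3f}")
```

With these made-up counts, two controls per case (900 of 3,000 subjects) gives a relative efficiency of about 0.70, and four controls per case (1,500 subjects) about 0.87. The case cells contribute the same variance in every design, so when cases are rare, discarding many non-cases costs relatively little precision; this is the intuition behind keeping all cases in sub-sampling designs.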