
Research Project: Metabolic and Epigenetic Regulation of Nutritional Metabolism

Location: Children's Nutrition Research Center

Title: Classify multicategory outcome in patients with lung adenocarcinoma using clinical, transcriptomic and clinico-transcriptomic data: Machine learning versus multinomial models

Author
DENG, FEI - Shanghai Institute of Technology
SHEN, LANLAN - Children's Nutrition Research Center (CNRC)
WANG, HE - Yale University
ZHANG, LANJING - Princeton University

Submitted to: American Journal of Cancer Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/25/2020
Publication Date: 12/1/2020
Citation: Deng, F., Shen, L., Wang, H., Zhang, L. 2020. Classify multicategory outcome in patients with lung adenocarcinoma using clinical, transcriptomic and clinico-transcriptomic data: Machine learning versus multinomial models. American Journal of Cancer Research. 10(12):4624-4639.

Interpretive Summary: Genomic data have been used in clinical genetic counselling and in risk stratification for genetic diseases, but they are rarely used in clinical decision making or in predicting treatment outcomes. In this study, we developed a generalizable protocol for integrating and analyzing modern clinical and multi-omic data. Using The Cancer Genome Atlas (TCGA) lung cancer dataset, we showed that transcriptomic data alone appeared sufficient to classify multicategory survival outcome. Thus, our study demonstrates the importance of integrating clinical, genomic and other 'omic data for clinical use. These findings may help identify potential nutritional options as a means of disease prevention.

Technical Abstract: Classification of multicategory survival outcome is important for precision oncology. Machine learning (ML) algorithms have been used to accurately classify the multicategory survival outcome of some cancer types, but not yet that of lung adenocarcinoma. We therefore compared the performance of 3 ML models (random forests, support vector machine [SVM], and multilayer perceptron) with that of multinomial logistic regression (Mlogit) models for classifying the 4-category survival outcome of lung adenocarcinoma using TCGA data. Overall, the Mlogit model performed similarly to the SVM and multilayer perceptron models (micro-average area under the curve = 0.82), while the random forests model was inferior. Surprisingly, transcriptomic data alone and clinico-transcriptomic data each appeared sufficient to accurately classify the 4-category survival outcome in these patients, but no model using clinical data alone performed well. Notably, NDUFS5, P2RY2, PRPF18, CCL24, ZNF813, MYL6, FLJ41941, POU5F1B, and SUV420H1 were the top-ranked genes associated with alive without disease and inversely linked to the other outcomes. Similarly, BDKRB2, TERC, DNAJA3, MRPL15, SLC16A13, CRHBP and ACSBG2 were associated with alive with progression, and GAL3ST3, AD2, RAB41, HDC, and PLEKHG1 were associated with dead with disease; both gene sets were also inversely linked to the other outcomes. These cross-linked genes may be used for risk stratification and future treatment development.
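
The micro-average area under the curve reported above can be illustrated with a minimal sketch. It assumes the usual definition of micro-averaging: every (patient, class) pair is pooled into a single binary problem, and the AUC is the probability that a randomly chosen true pair receives a higher predicted probability than a randomly chosen false pair. The function names, the class labels "A" to "D", and the probabilities below are hypothetical placeholders, not the study's code, outcome categories, or data.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(y_true, proba, classes):
    """Pool every (sample, class) pair into one binary problem, then score it.

    y_true  : list of true class labels, one per sample
    proba   : list of rows, each row holding one predicted probability per class
    classes : class labels in the same order as the columns of proba
    """
    labels, scores = [], []
    for true_class, row in zip(y_true, proba):
        for cls, p in zip(classes, row):
            labels.append(1 if cls == true_class else 0)
            scores.append(p)
    return binary_auc(labels, scores)


if __name__ == "__main__":
    # Hypothetical 4-category example; labels and probabilities are made up.
    classes = ["A", "B", "C", "D"]
    y_true = ["A", "B", "C", "D"]
    proba = [
        [0.6, 0.2, 0.1, 0.1],
        [0.3, 0.4, 0.2, 0.1],
        [0.1, 0.5, 0.3, 0.1],
        [0.2, 0.2, 0.2, 0.4],
    ]
    print(micro_average_auc(y_true, proba, classes))
```

Because micro-averaging pools all pairs, classes with more patients contribute more to the score. A macro-average, which averages one AUC per class, would weight each outcome category equally instead.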