Classification and biomarker selection in lower-grade glioma using robust sparse logistic regression applied to RNA-seq data

João Carrilho
Marta B. Lopes

Abstract

Achieving effective diagnosis and treatment in cancer remains a challenge for the development of personalized medicine, mostly due to tumor heterogeneity. Gliomas, in particular, are brain tumors that are highly heterogeneous at the histological, cellular and molecular levels and have a poor prognosis, and the mechanisms behind their heterogeneity and progression remain poorly understood. Recent advances in high-throughput biomedical technologies have made it possible to generate large amounts of molecular data from patients. Combined with statistical and machine learning techniques, these data can be used to define glioma subtypes and targeted therapies, an invaluable contribution to understanding and effectively managing the disease.
In this work, sparse and robust sparse logistic regression models with the elastic net penalty were applied to glioma RNA-seq data from The Cancer Genome Atlas (TCGA) to identify transcriptomic features relevant to the separation between lower-grade glioma (LGG) subtypes and to identify putative outlying observations. In general, all classification models achieved good accuracy, although they selected different sets of genes. Among the selected genes, TXNDC12, TOMM20, PKIA, CARD8 and TAF12 have previously been reported to play a relevant role in glioma development and progression. This highlights the suitability of the present approach for disclosing relevant genes and encourages the biological validation of the selected genes not yet reported in the literature.
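The abstract names the models but not how they are fitted. The following is a minimal sketch under stated assumptions, not the authors' implementation. It fits an elastic-net penalized logistic regression by proximal gradient descent, assuming the glmnet-style penalty λ[(1 − α)/2 · ‖β‖₂² + α‖β‖₁] (λ = `lam`, α = `alpha`). It then imitates the robust variant with a simple trimming loop that repeatedly refits on the fraction `h_frac` of observations with the smallest loss and flags the rest as candidate outliers. This trimming loop is a simplified stand-in inspired by least-trimmed-squares-type estimators and may differ from the robust estimator used in the paper. All function names, parameter values and the synthetic data below are hypothetical.

```python
# Hypothetical illustration: names, defaults and data are not taken from the paper.
import numpy as np


def soft_threshold(z, thr):
    """Proximal operator of thr * ||z||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)


def fit_enet_logistic(X, y, lam=0.05, alpha=0.5, n_iter=2000):
    """Elastic-net logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean logistic loss + lam * ((1 - alpha) / 2 * ||b||_2^2 + alpha * ||b||_1).
    The intercept b0 is left unpenalized.
    """
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    # Fixed step 1/L, where L bounds the Lipschitz constant of the smooth part's gradient.
    Xa = np.column_stack([np.ones(n), X])
    L = np.linalg.norm(Xa, 2) ** 2 / (4 * n) + lam * (1 - alpha)
    t = 1.0 / L
    for _ in range(n_iter):
        eta = np.clip(b0 + X @ b, -30, 30)  # clip linear predictor to keep exp() finite
        resid = 1.0 / (1.0 + np.exp(-eta)) - y  # fitted probability minus label
        grad_b = X.T @ resid / n + lam * (1 - alpha) * b  # loss gradient + ridge term
        b0 -= t * resid.mean()
        b = soft_threshold(b - t * grad_b, t * lam * alpha)  # L1 proximal step
    return b0, b


def per_obs_loss(X, y, b0, b):
    """Per-observation negative log-likelihood: log(1 + exp(eta)) - y * eta."""
    eta = b0 + X @ b
    return np.logaddexp(0.0, eta) - y * eta


def fit_trimmed_enet_logistic(X, y, h_frac=0.75, n_steps=10, **enet_kw):
    """Simplified trimmed fit: repeatedly refit on the h observations with the smallest loss.

    Returns the coefficients and the indices of the n - h observations with the
    largest loss under the final fit (candidate outliers).
    """
    n = X.shape[0]
    h = int(h_frac * n)
    keep = np.arange(n)  # start from all observations
    for _ in range(n_steps):
        b0, b = fit_enet_logistic(X[keep], y[keep], **enet_kw)
        new_keep = np.sort(np.argsort(per_obs_loss(X, y, b0, b))[:h])
        if np.array_equal(new_keep, keep):
            break  # subset is stable
        keep = new_keep
    loss = per_obs_loss(X, y, b0, b)
    flagged = np.sort(np.argsort(loss)[h:])
    return b0, b, flagged


# Synthetic stand-in for an expression matrix: 200 samples, 50 features,
# of which only features 0, 1 and 2 drive the class label.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [3.0, -3.0, 2.0]
y = (X @ beta_true > 0).astype(float)
# Contaminate: flip the labels of the 10 samples farthest from the true decision boundary.
flipped = np.argsort(np.abs(X @ beta_true))[-10:]
y[flipped] = 1.0 - y[flipped]

b0, b, flagged = fit_trimmed_enet_logistic(X, y, lam=0.05, alpha=0.5)
selected = np.flatnonzero(b)
print("selected features:", selected)
print("flipped samples flagged:", np.intersect1d(flipped, flagged).size, "of", flipped.size)
```

In this toy setting the nonzero coefficients play the role of selected genes and `flagged` plays the role of putative outlying samples. With real RNA-seq data, λ and α would normally be tuned (for example by cross-validation), and a more careful trimming scheme could also preserve the class proportions within each trimmed subset.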

How to Cite
Carrilho, J., & Lopes, M. B. (2022). Classification and biomarker selection in lower-grade glioma using robust sparse logistic regression applied to RNA-seq data. Brazilian Journal of Biometrics, 40(4), 371–381. https://doi.org/10.28951/bjb.v40i4.634