Huy Pham
University of Dalat
Pham Quang Huy received his Bachelor’s degree in computer science from the University of Dalat, Vietnam, in 2000, and his Master’s degree in computer science from the Natural Science University of Ho Chi Minh City, Vietnam, in 2005. He is a PhD student in Computer Science at the University of Windsor, working in the area of bioinformatics. His research interests focus on data mining, machine learning and pattern recognition, mostly in the fields of closed frequent itemset mining, association rule mining, chemotherapy response prediction and drug-target interaction prediction.
Introduction. We applied a machine learning approach to identify biomarker genes capable of predicting breast cancer outcomes, including disease-free survival and overall survival at 5 years and in the long term, after a combination of treatments: chemotherapy (CT), hormone therapy (HT), radiation therapy (RT), or no recorded therapy (NONE).
Method. We used data from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC 2016), which contains expression levels of about 24,500 genes for 1,904 patients, to learn a classification model by backward elimination. First, a support vector machine (SVM) with a linear kernel is trained on the current set of features (genes) under a 5-fold cross-validation scheme. Then the feature with the lowest coefficient is removed. This procedure is repeated recursively on the pruned set until the desired number of features is reached. The final gene sets are considered potential biomarkers.
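The backward-elimination procedure described in the Method can be sketched with scikit-learn's recursive feature elimination (RFE) wrapped around a linear-kernel SVM. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (not METABRIC), the sizes (200 samples, 50 features, 10 features kept) and C=1.0 are arbitrary, and scikit-learn's RFE drops the feature with the smallest squared coefficient, which may differ from the exact criterion used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes).
# Assumption: illustrative sizes only; this is NOT the METABRIC data.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, n_redundant=0,
    random_state=0,
)

# Backward elimination: fit a linear SVM, drop the one feature with the
# smallest squared coefficient (step=1), and repeat until 10 remain.
selector = RFE(
    estimator=SVC(kernel="linear", C=1.0),
    n_features_to_select=10,
    step=1,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # column indices of kept features
X_selected = X[:, selected]

# 5-fold cross-validated accuracy. The selection step is placed inside the
# pipeline so that each fold selects features from its own training split.
pipeline = make_pipeline(
    RFE(SVC(kernel="linear", C=1.0), n_features_to_select=10, step=1),
    SVC(kernel="linear", C=1.0),
)
scores = cross_val_score(pipeline, X, y, cv=5)
mean_accuracy = scores.mean()

print("Selected feature indices:", selected.tolist())
print("Mean 5-fold accuracy: %.3f" % mean_accuracy)
```

Putting the selection step inside the cross-validated pipeline is a design choice of this sketch: it keeps held-out samples from influencing which features are retained, at the cost of rerunning the elimination once per fold.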
Results. For each treatment, we obtained a subset of 189 genes. The classification performance for the seven combinations of treatments is presented in Figure 1. For patients treated with CT and RT (CT=YES, HT=NO, RT=YES), we obtained the highest accuracy, about 98%. For patients treated with CT only (CT=YES, HT=NO, RT=NO), the accuracy was the lowest, but still about 90%. In each subset, about 15 to 26 genes are breast-cancer-related and about 55 to 65 genes are cancer-related, according to a list of 8016 cancer-related genes that we collected from various public resources. Among them, many genes are associated with cancer-relevant pathways. They include: FGFR4, EGFR, MUC16, GSTP1, PLA2G2A, GPC3, DUSP1, PLA2G16, RUNX2, CDH1, CYB5A, CTGF, NCOA4, C1QB, CGA, ESR1, KIT, TAT, PYCARD, AIM2, SAA1, CEACAM1, PPP1R1B, PRKAR1A, HPGD, TP63, TGFBR3, PGR, H19. For example, the gene EGFR is involved in the pathways “Inhibition of Signaling by Overexpressed EGFR”, “Signaling by Overexpressed Wild-Type EGFR in Cancer” and “PLCG1 events in ERBB2 signaling”, and the gene PGR is involved in the pathway “Nuclear signaling by ERBB4”.
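Counting how many selected genes appear in a reference list of cancer-related genes amounts to a set intersection. The sketch below shows one way to do it; the function name annotate_overlap, the placeholder symbols GENE_A and GENE_B, and the four-gene reference list are hypothetical and chosen only for illustration. They are not the 189-gene subsets or the 8016-gene list from the study.

```python
def annotate_overlap(selected_genes, reference_genes):
    """Return the selected genes found in the reference list, and their fraction.

    Gene symbols are compared case-insensitively after trimming whitespace.
    """
    selected = {g.strip().upper() for g in selected_genes}
    reference = {g.strip().upper() for g in reference_genes}
    hits = sorted(selected & reference)
    fraction = len(hits) / len(selected) if selected else 0.0
    return hits, fraction


# Hypothetical toy inputs: two real symbols named in the abstract plus two
# placeholders, checked against a tiny stand-in reference list.
toy_selected = ["EGFR", "PGR", "GENE_A", "GENE_B"]
toy_reference = ["EGFR", "PGR", "ESR1", "KIT"]

hits, fraction = annotate_overlap(toy_selected, toy_reference)
print(hits)      # ['EGFR', 'PGR']
print(fraction)  # 0.5
```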
Conclusion. The results suggest that the selected gene sets can be considered candidate biomarkers for breast cancer survivability prediction.
Keywords: Breast cancer, survivability prediction, treatment, machine learning, biomarker.