Original Research

TRANSCRIPTOMICS ANALYSIS OF BREAST CANCER TISSUES: AN IN-SILICO APPROACH USING MACHINE LEARNING FEATURE SELECTION ALGORITHMS

ALPNA SHARMA 1, NISHEETH JOSHI 2, and VINAY KUMAR 3.

Vol 18, No 05 (2023) | DOI: 10.17605/OSF.IO/DG6TK | Author Affiliation: Department of Computer Science, Apaji Institute, Banasthali University, India 1,2; Ex Scientist GOI, Ex Professor, VIPS – Vivekananda Institute of Professional Studies 3. | Licensing: CC 4.0 | Pg no: 1782-1798 | Published on: 31-05-2023

Abstract

Breast cancer is the most frequent cancer in women and the second most common cancer overall among newly diagnosed cases. Local invasion and metastasis precede the majority of cancer fatalities, with metastasis accounting for 90% of deaths, yet very little is known about their molecular causes. Exposing the underlying causes of this condition at the transcriptomic level could therefore lead to novel treatment approaches for breast cancer. To identify differences between epithelial breast cancer tissues (TEC), stromal breast cancer tissues (SCC), normal control epithelial breast tissue samples (EN), and normal control stromal breast tissue samples (SN) at the transcriptomic level, processed total RNA microarray data from GEO for breast cancer patients were analyzed. In the current work, the transcriptional profiles of 64 samples, including 28 TEC, 28 SCC, 5 EN, and 5 SN controls obtained from NCBI BioProject PRJNA107497, were subjected to several bioinformatics analyses. First, exploratory analysis of the gene expression data using principal component analysis (PCA) showed distinct patterns between TEC vs EN and SCC vs SN samples. Subsequently, Welch’s t-test differential gene expression analysis identified 22277 significantly differentially expressed genes (Fold change (>= 1.5), p.adj


Keywords

Breast cancer, Machine Learning, Differential gene expression, KEGG pathway analysis, PCA, Heat maps, Dendrogram