Microarray Gene Expression Data Pre-Processing Using PPCA and Classification using RF-SVM Algorithm
Keywords:
Gene Expression, Cancer Gene Classification, Gene Selection, SVM-RFE with MRMR Filter, RF-SVM
Abstract
Recent research has shown that microarray gene expression data is useful for cancer
classification, and microarray-based gene expression profiling has become one of the most important and promising data sources for
cancer classification in support of effective diagnosis and prognosis. It is vital to identify
the most informative and defective genes in order to improve early cancer diagnosis and to provide effective
chemotherapy. In addition, finding gene selection methods that considerably reduce
dimensionality while retaining the informative genes is an important issue in the field of cancer classification. In
this work, preprocessing is first performed with Probabilistic Principal Component Analysis
(PPCA) in order to detect mutual information in the microarray dataset and to reduce the
noise in the dataset. Then, using the preprocessed dataset, a Support Vector Machine Recursive Feature
Elimination with Minimum Redundancy–Maximum Relevancy (SVM-RFE with MRMR Filter) algorithm is proposed to
minimize the redundancy among the selected genes. It also improves classification accuracy and yields smaller
gene sets on several benchmark cancer gene expression datasets, outperforming other popular
gene selection methods. The RF-SVM (Random Forest-SVM classifier) algorithm (Algorithm 2) is applied to classify the genes, and
our experimental results show that the proposed algorithm classifies more accurately than other existing algorithms.
The SVM-RFE with MRMR Filter algorithm (Algorithm 1), which is applied before classification for feature selection, also performed
well with a small number of predictive genes when tested on both datasets and compared against previously suggested
schemes. Finally, the results show that the proposed RF-SVM (Random Forest-SVM classifier) is a promising approach
for cancer classification problems.
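
To make the minimum redundancy-maximum relevancy idea mentioned above concrete, the following is a minimal sketch of a greedy MRMR-style filter in plain Python. It is not the paper's Algorithm 1 and does not include the SVM-RFE step. The function names (pearson, mrmr_select), the use of absolute Pearson correlation as a stand-in for mutual information, and the toy expression values are all assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(genes, labels, k):
    """Greedily pick k gene indices by relevance minus mean redundancy.

    genes  : list of per-gene expression vectors (one value per sample)
    labels : numeric class labels, one per sample
    Relevance and redundancy are approximated by |Pearson correlation|,
    an assumption of this sketch (mutual information is the usual choice).
    """
    relevance = [abs(pearson(g, labels)) for g in genes]
    selected = []
    candidates = set(range(len(genes)))
    while candidates and len(selected) < k:
        def score(i):
            if not selected:
                return relevance[i]
            redundancy = sum(
                abs(pearson(genes[i], genes[j])) for j in selected
            ) / len(selected)
            return relevance[i] - redundancy
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data, made up for illustration: 4 genes measured on 6 samples.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1.0, 1.2, 0.9, 3.1, 3.0, 2.8],  # gene 0: strongly tracks the class
    [1.0, 1.4, 0.8, 3.3, 2.9, 2.6],  # gene 1: noisy copy of gene 0 (redundant)
    [2.0, 1.0, 2.0, 1.0, 2.5, 3.0],  # gene 2: weaker but less redundant signal
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # gene 3: constant, uninformative
]
selected = mrmr_select(genes, labels, 2)
print(selected)  # [0, 2]
```

On this toy input the second pick is gene 2 rather than gene 1: gene 1 is more relevant to the class on its own, but it is almost perfectly correlated with the already selected gene 0, so its redundancy penalty outweighs its relevance.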