Diabetes prediction using feature selection and classification

Authors

  • Khyati K. Gandhi, PG Student, CE Department, BVM Engg. College, Vallabh Vidhyanagar, kvmehta108@gmail.com
  • Prof. Nilesh B. Prajapati, IT Department, BVM Engg. College, Vallabh Vidhyanagar, nilesh.prajapati@bvmengineering.ac.in

Keywords:

Data mining, Feature selection, F-score, SVM classifier, K-means clustering.

Abstract

Medical data mining is becoming increasingly important in healthcare. The diversity of
medical data collected and stored for diagnosis and prognosis, together with the wide availability of data
mining techniques to process them, puts medical data mining in a unique position to improve patient
care. Medical data are high-dimensional and contain irrelevant and redundant features that reduce
prediction accuracy, so pre-processing is required to prepare the data for the mining task. Feature
selection has been an active and fruitful field of research and development for decades in statistical
machine learning and data mining. It is effective in enhancing learning efficiency, increasing predictive
accuracy, and reducing the complexity of learned results. Feature selection is a pre-processing technique
that selects an optimal feature subset from the full feature set. In this work, the F-score method and
K-means clustering are used for feature selection. The performance of the SVM classifier is empirically
evaluated on the reduced feature subset of the Pima Indian diabetes dataset, a standard dataset
available from the UCI Machine Learning Repository that is used for testing the prediction accuracy of
data mining algorithms in diabetes data classification.
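
As an illustration of the F-score ranking step named in the abstract, the following is a minimal sketch, assuming the commonly used two-class F-score: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The abstract does not give the exact formula, threshold, or the way the scores are combined with K-means clustering and the SVM classifier, so the function names and the toy data below are hypothetical and are not the authors' implementation.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for binary labels (1 = positive, 0 = negative).

    Assumed definition: between-class separation of the means divided by the
    sum of the within-class sample variances.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels):
    """Return feature indices sorted by descending F-score, plus the raw scores."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    scores = [f_score(col, labels) for col in columns]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (7.0, 2.0), (8.0, 5.0), (9.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1]
order, scores = rank_features(rows, labels)
print(order, scores)
```

A higher score indicates a feature whose class means lie further apart relative to its within-class spread. In a pipeline of the kind the abstract describes, the top-ranked features would form the reduced subset passed on to the classifier.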

Published

2014-05-25

How to Cite

Khyati K. Gandhi, & Prof. Nilesh B. Prajapati. (2014). Diabetes prediction using feature selection and classification. International Journal of Advance Engineering and Research Development (IJAERD), 1(5), 850–856. Retrieved from https://ijaerd.org/index.php/IJAERD/article/view/150