An optimization expectation maximization approach for email classification using naive bayes and k-nearest neighbor
Keywords:
Email Classification, Naïve Bayes, K-NN, SSL
Abstract
With the development of the Internet and the emergence of a large number of text resources, automatic text
classification has become a research hotspot. Email is one of the fastest means of communication and has become
a part of everyday life for millions of people, changing the way we work and collaborate. Email accounts for a
large percentage of the total traffic over the Internet. Email data is also growing rapidly, creating a need for
automated analysis. In many security informatics applications it
is important to detect deceptive communication in email. The iterative process in standard EM-based semi-supervised
learning has two steps: first, the classifier constructed in the previous iteration is used to predict
the labels of all unlabeled samples; then, a new classifier is reconstructed from the new training sample set. In this
work, an EM-based semi-supervised learning algorithm using Naïve Bayes is proposed in which unlabeled
documents are divided into two parts, reliable and misclassified. An ensemble technique is used to add only the reliable
unlabeled documents to the training set. In addition, the unlabeled documents are preprocessed before the learning
process of the Naïve Bayes and K-NN classifiers during the first step of EM, which reduces preprocessing time. The
proposed work is thus expected to increase classifier accuracy and decrease execution time.
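
The two-step loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the ensemble decides which documents are reliable, so the sketch assumes one simple rule: an unlabeled document counts as reliable only when Naïve Bayes and K-NN predict the same label. It also assigns hard labels (a self-training variant of EM) rather than the soft posteriors of full EM. All function names, the whitespace tokenizer, and the toy emails are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs, labels):
    # Multinomial Naive Bayes: per-class word counts and class counts.
    vocab = {w for d in docs for w in d}
    word_counts = {c: Counter() for c in sorted(set(labels))}
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    return vocab, word_counts, Counter(labels)

def predict_nb(model, doc):
    vocab, word_counts, class_counts = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(class_counts[c] / n)
        for w in doc:
            if w in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing.
                score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_knn(docs, labels, doc, k=3):
    # k-NN with cosine similarity over raw term-frequency vectors.
    q = Counter(doc)
    ranked = sorted(zip(docs, labels),
                    key=lambda p: cosine(q, Counter(p[0])), reverse=True)
    votes = Counter(y for _, y in ranked[:k])
    return votes.most_common(1)[0][0]

def semi_supervised_fit(labeled, unlabeled, k=3, max_iter=10):
    docs = [tokenize(t) for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = [tokenize(t) for t in unlabeled]  # preprocess once, before the loop
    for _ in range(max_iter):
        model = train_nb(docs, labels)       # rebuild the classifier
        reliable, remaining = [], []
        for d in pool:                       # predict labels for unlabeled docs
            y_nb = predict_nb(model, d)
            y_knn = predict_knn(docs, labels, d, k)
            # Assumed reliability rule: both classifiers agree.
            (reliable if y_nb == y_knn else remaining).append((d, y_nb))
        if not reliable:
            break
        for d, y in reliable:                # add only reliable documents
            docs.append(d)
            labels.append(y)
        pool = [d for d, _ in remaining]
    return train_nb(docs, labels), docs, labels

# Toy data, invented purely for illustration.
labeled = [("win free money now", "spam"),
           ("free prize claim now", "spam"),
           ("meeting agenda for monday", "ham"),
           ("project meeting notes attached", "ham")]
unlabeled = ["claim free money", "monday project agenda"]

model, docs, labels = semi_supervised_fit(labeled, unlabeled)
print(predict_nb(model, tokenize("free money prize")))
```

Tokenizing the unlabeled pool once, outside the loop, mirrors the abstract's point about reducing preprocessing time. Documents on which the two classifiers disagree stay in the pool and are re-examined in later iterations.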