Using Machine Learning for Automatic Text Classification of Unstructured Blog Data

Authors

  • Dr. Nitin Rajvanshi, Lecturer (Sel. Grade), Govt. Women Polytechnic College, Jodhpur

Keywords:

Automatic Blog Text Classification; Feature Extraction; Machine Learning Models; Semi-Supervised Learning; Polysemy; Prior Probability

Abstract

Opportunities for integrating applications of machine intelligence into the daily lives of people are growing with
the increasing popularity of computing systems, the widening diversity of web services, the growing popularity of portable
devices that contain general-purpose operating systems, and ongoing inventions in human-computer interaction, including
speech recognition, handwriting, and sketch-understanding interfaces. Much of machine learning (ML) research
is inspired by problems and their solutions from biology, medicine, finance, astronomy, etc. This paper presents automatic
classification of unstructured blog entries by applying pre-processing steps such as tokenization, stop-word elimination, and
stemming. It uses machine learning techniques for feature-set extraction and for feature-set enhancement with semantic
resources, followed by modeling with an alternative machine learning model, the naïve Bayesian model. Empirical
evaluations and calculations done in this paper indicate that this multi-step classification approach yields good
overall classification accuracy on unstructured blog text datasets with this alternative machine learning model. Automatic
classification of blog entries is generally treated as a semi-supervised machine learning task, in which the blog entries are
automatically assigned to one of a set of pre-defined classes based on the features extracted from their textual content.
The naïve Bayesian classification model clearly outperforms the other classification model when a smaller feature set is
available, which is usually the case when a blog topic is recent and the number of training datasets available is restricted.
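
The pipeline described above (tokenization, stop-word elimination, stemming, then naïve Bayesian classification from class prior probabilities) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The stop-word list, the crude suffix-stripping stemmer, the add-one (Laplace) smoothing, the multinomial word-count model, and the toy training entries are all assumptions introduced here. The semantic feature-set enhancement step is not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and",
              "in", "for", "on", "this", "my"}


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing (assumed)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # used for prior probabilities
        self.term_counts = defaultdict(Counter)   # per-class term frequencies
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(preprocess(doc))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = preprocess(text)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / self.n_docs)
            total = sum(self.term_counts[label].values())
            for t in tokens:
                num = self.term_counts[label][t] + 1
                score += math.log(num / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled blog entries, invented for this sketch.
train_docs = [
    "The team scored two goals in the final match",
    "Players are training hard for the cup",
    "New phone ships with a faster processor",
    "The laptop update improves battery and processor speed",
]
train_labels = ["sports", "sports", "tech", "tech"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("Scoring goals in training"))  # sports
print(model.predict("A faster processor"))         # tech
```

Working in log space avoids numeric underflow when many term probabilities are multiplied, and the smoothing keeps an unseen term from zeroing out a class, which matters most in the small-feature-set setting the abstract highlights.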

Published

2018-03-25

How to Cite

Dr. Nitin Rajvanshi. (2018). Using Machine Learning for Automatic Text Classification of Unstructured Blog Data. International Journal of Advance Engineering and Research Development (IJAERD), 5(3), 95–99. Retrieved from https://ijaerd.org/index.php/IJAERD/article/view/4967