A Similarity Measure for Text Classification and Clustering

Authors

  • Swati Hatekar B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
  • Snehal Pagar B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
  • Ashish Tayde B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
  • Vikas Lande B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
  • Mrs. Poonam Sawdekar B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune

Keywords:

Document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Abstract

Clustering is one of the essential techniques in machine learning and data mining; it groups similar data
together. In a document vector, each component indicates the value of the corresponding feature in the document.
The feature value could be term frequency or a similar measure such as relative term frequency. Similarity
Measurement for Text Process (SMTP) is used to measure the similarity between two documents with respect to a
feature. The presence and absence of the features in both documents are used to estimate the similarity values.
SMTP is extended to estimate the similarity between two sets of documents. The SMTP scheme is applied to text
clustering and classification tasks, with the K-means algorithm used for clustering.
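
As a minimal illustration of the document vectors the abstract describes, the sketch below builds term-frequency and relative term-frequency vectors for a small corpus. It does not implement the SMTP measure. The function names, the whitespace tokenization, and the sample documents are all assumptions made for the example and are not taken from the paper.

```python
from collections import Counter


def build_vocabulary(docs):
    """Return a sorted list of every distinct token across the documents."""
    return sorted({token for doc in docs for token in doc.lower().split()})


def term_frequency_vector(doc, vocabulary):
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


def relative_term_frequency_vector(doc, vocabulary):
    """Divide each term count by the document's total token count."""
    tf = term_frequency_vector(doc, vocabulary)
    total = sum(tf)
    return [count / total if total else 0.0 for count in tf]


# Hypothetical two-document corpus, for illustration only.
docs = ["data mining and text mining", "text clustering"]
vocab = build_vocabulary(docs)
tf_vectors = [term_frequency_vector(d, vocab) for d in docs]
rtf_vectors = [relative_term_frequency_vector(d, vocab) for d in docs]
```

A zero component marks a feature that is absent from a document, which is the presence/absence information the abstract says SMTP takes into account.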

Published

2016-01-25

How to Cite

A Similarity Measure for Text Classification and Clustering. (2016). International Journal of Advance Engineering and Research Development (IJAERD), 3(1), 182-186. https://ijaerd.org/index.php/IJAERD/article/view/1622
