A Similarity Measure for Text Classification and Clustering

Swati Hatekar; Snehal Pagar; Ashish Tayde; Vikas Lande; Mrs. Poonam Sawdekar

Authors

Swati Hatekar B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
Snehal Pagar B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
Ashish Tayde B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
Vikas Lande B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune
Mrs. Poonam Sawdekar B.E. Computer Engineering, D.Y.P.I.E.T., Pimpri, Pune

Keywords:

Document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Abstract

Clustering is one of the essential techniques in machine learning and data mining; it groups similar data
together. In a document vector, each component indicates the value of the corresponding feature in the
document. The feature value could be term frequency or a similar measure such as relative term frequency.
Similarity Measurement for Text Processing (SMTP) is used to measure the similarity between two documents
with respect to a feature. The presence and absence of each feature in both documents are used to estimate the
similarity value. SMTP is then extended to estimate the similarity between two sets of documents. The SMTP scheme is
applied to text clustering and classification tasks. The k-means algorithm is used for clustering.
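
The document-vector representation and the k-means step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the toy documents, the helper names (`build_vocabulary`, `tf_vector`, `relative_tf_vector`, `kmeans`), and the deterministic seeding are hypothetical, and plain squared Euclidean distance is used as a stand-in because the SMTP formula itself is not given in this abstract.

```python
from collections import Counter


def build_vocabulary(docs):
    """Return the sorted list of distinct terms over all tokenized documents."""
    return sorted({term for doc in docs for term in doc})


def tf_vector(doc, vocab):
    """Term-frequency vector: raw count of each vocabulary term in the document."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def relative_tf_vector(doc, vocab):
    """Relative term frequency: each count divided by the document length."""
    length = len(doc)
    return [count / length if length else 0.0 for count in tf_vector(doc, vocab)]


def squared_distance(a, b):
    """Squared Euclidean distance (a stand-in here, NOT the SMTP measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids (an assumption
    made so that this sketch is deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two documents about data mining, two about football.
docs = [
    "data mining clustering data".split(),
    "football match score goal".split(),
    "clustering data mining".split(),
    "goal score football".split(),
]
vocab = build_vocabulary(docs)
vectors = [relative_tf_vector(doc, vocab) for doc in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

In a setup closer to the one the abstract describes, `squared_distance` would be replaced by a dissimilarity derived from SMTP, so that the presence or absence of a feature in each document also influences the assignment step.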
