Survey On A Similarity Measure For Text Classification And Clustering

Rahul Nalawade; Akash Samal; Kiran Avhad

Authors

Rahul Nalawade, Computer Engineering Department, STES’ Sinhgad Academy Of Engineering, Pune
Akash Samal, Computer Engineering Department, STES’ Sinhgad Academy Of Engineering, Pune
Kiran Avhad, Computer Engineering Department, STES’ Sinhgad Academy Of Engineering, Pune

Keywords:

Text Classification, Text Clustering, Clustering Algorithms, Preprocessing, Tokenization, Stemming

Abstract

Computing the similarity between documents is an important operation in text processing. In this paper, a
new similarity measure is proposed. To calculate the similarity between two documents with respect to a feature, the
proposed measure takes the following three cases into account: I) the feature appears in both documents, II) the
feature appears in only one document, and III) the feature appears in neither document. In the first
case, the similarity increases as the difference between the two feature values decreases. In the second
case, a fixed value is contributed to the similarity. In the last case, the feature makes no contribution to the similarity. The
proposed measure is extended to measure the similarity between two sets of documents. The effectiveness of the measure is
evaluated on several data sets for text clustering and classification. The performance obtained with the proposed
measure is better than that achieved with other measures.
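
The three cases described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula, so everything specific here is an assumption made for illustration only: the function names `feature_score` and `similarity`, the Gaussian-style term used for case I, the parameters `fixed_penalty` and `scale`, the averaging over counted features, and the assumption that feature values are non-negative term weights with 0 meaning "absent". It is not the authors' definition of the measure.

```python
import math


def feature_score(a, b, fixed_penalty=-1.0, scale=1.0):
    """Score one feature; returns (score, counted).

    Assumed (hypothetical) form, not taken from the paper:
      case I   (both present): grows toward 1.0 as |a - b| shrinks
      case II  (one present):  a fixed value, `fixed_penalty`
      case III (both absent):  ignored, so counted is False
    """
    if a > 0 and b > 0:
        return 0.5 * (1.0 + math.exp(-((a - b) / scale) ** 2)), True
    if a == 0 and b == 0:
        return 0.0, False
    return fixed_penalty, True


def similarity(doc1, doc2, fixed_penalty=-1.0, scale=1.0):
    """Average the per-feature scores over features present in at least one document."""
    total, counted = 0.0, 0
    for a, b in zip(doc1, doc2):
        score, used = feature_score(a, b, fixed_penalty, scale)
        if used:
            total += score
            counted += 1
    # Two empty documents share no evidence; return 0.0 by convention (an assumption).
    return total / counted if counted else 0.0
```

Under these assumptions, two identical documents score 1.0, a feature present in only one document pulls the score toward `fixed_penalty`, and features absent from both documents leave the score unchanged.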
