
Comparison of String Similarity Algorithms to Measure Lexical Similarity


Affiliations
1 Shrimad Rajchandra Institute of Management and Computer Applications, UTU, Bardoli, India
2 Shrimad Rajchandra Inst. of Management & Comp. Appl., UTU, Bardoli, India
     







Authors

Sagar J. Gandhi
Shrimad Rajchandra Institute of Management and Computer Applications, UTU, Bardoli, India
Mihirraj M. Thakor
Shrimad Rajchandra Institute of Management and Computer Applications, UTU, Bardoli, India
Jikitsha Sheth
Shrimad Rajchandra Inst. of Management & Comp. Appl., UTU, Bardoli, India
Hariom I. Pandit
Shrimad Rajchandra Institute of Management and Computer Applications, UTU, Bardoli, India
Hemin S. Patel
Shrimad Rajchandra Institute of Management and Computer Applications, UTU, Bardoli, India

Abstract


String similarity measures the lexical similarity between two words, and it can be further exploited to identify similarity between questions. Several string similarity algorithms exist in the literature. In this paper the authors have implemented five of them, viz. Dice coefficient, Jaccard similarity, Levenshtein distance, Jaro distance and Cosine similarity. The results of these algorithms are then compared with the ratings of human judges to determine which algorithm most closely resembles the way humans judge the dissimilarity of the given strings. The experiments were conducted on 1000 English word pairs.
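As an illustration of the five measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how words are tokenized or whether scores are normalized, so several choices here are assumptions: Dice, Jaccard and Cosine are computed over character bigrams, Levenshtein is returned as a raw edit distance, and Jaro follows the common textbook definition. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def bigrams(word):
    """Character bigrams of a word (assumed tokenization, not from the paper)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(a, b):
    """Dice coefficient over sets of character bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:  # both words too short to have bigrams
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(a, b):
    """Jaccard similarity over sets of character bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)


def cosine(a, b):
    """Cosine similarity between bigram count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else (1.0 if a == b else 0.0)


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def jaro(a, b):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        # Look for an unused matching character within the window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(p != q for p, q in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


if __name__ == "__main__":
    for w1, w2 in [("night", "nacht"), ("kitten", "sitting")]:
        print(w1, w2, dice(w1, w2), jaccard(w1, w2), cosine(w1, w2),
              levenshtein(w1, w2), jaro(w1, w2))
```

Note that four of these functions return a similarity (higher means more alike) while `levenshtein` returns a distance (lower means more alike), so a comparison against human ratings would presumably need the distance converted to a similarity first, for example by dividing by the longer word's length and subtracting from 1. That normalization is an assumption here and is not stated in the abstract.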
