Efficient Similarity Join Method Using Unsupervised Learning

Bilal Hawashin ¹, Farshad Fotouhi ², William Grosky ³

Affiliations
1 Department of Computer Information Systems, Alzaytoonah University of Jordan, Amman 11733, Jordan
2 Department of Computer Science, Wayne State University, Detroit, MI 48202, United States
3 Department of Computer and Information Science, University of Michigan-Dearborn, Dearborn, MI 48128, United States

This paper proposes an efficient similarity join method based on unsupervised learning, for use when no labeled data is available. In our previous work, we showed that the performance of similarity join can improve when long string attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, are used under supervised learning, where a training set exists. In this work, we use long string attributes during the similarity join under unsupervised learning. Besides being applicable when no labeled data exists, unsupervised learning can also act as a quick preprocessing method for huge datasets. Here, we show that using long attributes during unsupervised learning can further enhance the performance. Moreover, we provide an efficient, dynamically expandable algorithm for databases with frequent transactions.

Keywords

Similarity Join, Unsupervised Learning, Diffusion Maps, Databases, Machine Learning.
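
To make the idea of a label-free similarity join over long string attributes concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes TF-IDF term weighting with cosine similarity and a fixed threshold, and it omits the diffusion-map step named in the keywords. The function names, the small stop-word list, and the threshold value of 0.3 are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "and", "to"}


def tokenize(text):
    """Lowercase a string, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors (as dicts) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many strings each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, so that very common terms keep a small positive weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, threshold=0.3):
    """Return (i, j, score) for each pair with cosine similarity >= threshold.

    No labeled pairs are used: weights come only from term statistics
    of the two input columns themselves.
    """
    vectors = tfidf_vectors(left + right)
    left_vecs, right_vecs = vectors[:len(left)], vectors[len(left):]
    matches = []
    for i, u in enumerate(left_vecs):
        for j, v in enumerate(right_vecs):
            score = cosine(u, v)
            if score >= threshold:
                matches.append((i, j, score))
    return matches
```

Calling `similarity_join(summaries_a, summaries_b)` on two lists of long strings, for example movie summaries from two tables, returns the index pairs judged similar. The nested loop compares every pair, so this sketch is quadratic in the number of rows and is meant only to show the join semantics, not an efficient implementation.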

AIRCC's International Journal of Computer Science and Information Technology