Comparing State-of-the-art Models for Language Detection Methods on Short Texts from Twitter

Devendra Kumar Tayal; Yashima Hooda; Diksha; Aananya Nagpal

Affiliations
1 Department of Computer Science & Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India

Short-text communication on microblogging platforms such as Twitter has become the norm in today’s fast-paced world. Because these platforms have a global reach, posts appear in many languages, including region-specific ones. Language detection is therefore an important preliminary step for many NLP tasks: text can be analysed further only once its natural language has been identified. In this work, we analyse and compare the performance of two major state-of-the-art models, Naive Bayes and Logistic Regression, for identifying the natural language of short-text data. Both models were trained on a dataset from Kaggle, which enabled them to detect 22 languages. They were compared on accuracy, precision, recall, and F1 score, and Logistic Regression was found to perform better on relatively small datasets like ours.
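To make the evaluation setup concrete, the following is a minimal sketch of one of the two model families named above: a multinomial Naive Bayes language detector over character bigrams, scored with accuracy and macro-averaged precision, recall, and F1. It is an illustration under stated assumptions, not the authors' implementation. The feature choice (character bigrams), the add-one smoothing, the macro averaging, the class and function names, and the tiny three-language toy data are all assumptions introduced here; the paper's Kaggle dataset and its 22 languages are not reproduced.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageDetector:
    """Hypothetical multinomial Naive Bayes over character bigrams (add-one smoothing)."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.label_counts = Counter(labels)       # label -> number of training texts
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.label_counts[label] / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data (made up for illustration; not the dataset used in the paper).
train_texts = [
    "the quick brown fox jumps over the lazy dog",
    "what a lovely day to walk in the park",
    "el rápido zorro marrón salta sobre el perro perezoso",
    "qué día tan bonito para caminar por el parque",
    "der schnelle braune fuchs springt über den faulen hund",
    "was für ein schöner tag für einen spaziergang",
]
train_labels = ["en", "en", "es", "es", "de", "de"]
test_texts = ["the dog is in the park", "el perro está en el parque", "der hund ist im park"]
test_labels = ["en", "es", "de"]

detector = NaiveBayesLanguageDetector().fit(train_texts, train_labels)
predictions = [detector.predict(text) for text in test_texts]
scores = macro_scores(test_labels, predictions)
print(predictions, scores)
```

A second model, such as Logistic Regression, could be scored with the same `macro_scores` function on the same held-out texts, so that the two sets of metrics are directly comparable.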

Keywords

Natural Language Processing, Language Detection, Logistic Regression, Naive Bayes.

Research Cell: An International Journal of Engineering Sciences
