

Comparing State-of-the-art Models for Language Detection Methods on Short Texts from Twitter
Short-text communication via microblogging platforms such as Twitter has become the norm in today's fast-paced world. These platforms have a global reach, so the use of multiple languages (including region-specific languages) is common. Language detection is an important step in many NLP tasks, because text can be analysed further only once its natural language has been identified. In this work, we analyse and compare the performance of two major state-of-the-art models, Naive Bayes and Logistic Regression, for identifying the natural language of short-text data. Both models were trained on a dataset from Kaggle, which enables them to detect 22 languages. They were compared on accuracy, precision, recall, and F1 score, and Logistic Regression was found to perform better on relatively small datasets such as ours.
Keywords
Natural Language Processing, Language Detection, Logistic Regression, Naive Bayes.
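To make the comparison described in the abstract concrete, the following is a minimal sketch of one of the two models (Naive Bayes) together with the four evaluation metrics. It is not the authors' implementation. The abstract does not state which features were used, so character bigrams with add-one smoothing are an assumption here. The names `char_ngrams`, `NaiveBayesLanguageDetector`, and `macro_scores` and the four toy training sentences are hypothetical, and macro-averaging over languages is likewise an assumed choice.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageDetector:
    """Multinomial Naive Bayes over character bigrams (assumed feature choice)."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # language -> bigram counts
        self.doc_counts = Counter(labels)         # language -> number of texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        grams = char_ngrams(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior plus add-one smoothed log likelihood of each bigram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        p = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        r = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data for illustration only; not the Kaggle dataset used in the paper.
train_texts = [
    "the weather is nice today",
    "where is the train station",
    "el tiempo es muy bueno hoy",
    "donde esta la estacion de tren",
]
train_labels = ["English", "English", "Spanish", "Spanish"]

detector = NaiveBayesLanguageDetector().fit(train_texts, train_labels)
predictions = [detector.predict(t) for t in train_texts]
print(macro_scores(train_labels, predictions))
```

In a real comparison the scores would be computed on held-out texts rather than on the training sentences, and the same `macro_scores` call would be applied to the Logistic Regression predictions so that both models are judged on identical metrics.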