
Efficient Calculation of Maximum Likelihood Estimation for Authorship Attribution Using Lexical and Syntactic Features


Affiliations
1 University of Madras, Chennai, India
2 Department of CSE, Veltech Dr. RR & Dr. SR Technical University, Chennai, India
3 Veltech Dr. RR & Dr. SR Technical University, Chennai, India
     



This paper compares two models for authorship attribution based on a Bayesian approach. Authorship attribution is the task of determining the actual author of a given text: when two authors, say A1 and A2, each claim to have written a particular essay, the real author must be identified. The usual solution is to compute the maximum likelihood estimate (MLE) for the disputed authors, that is, to train one probabilistic model for author A1 and another for author A2, and then use both models to calculate the MLE. This method is known as the Bayesian approach. It requires an unknown text and a large text sample from each of the two authors. The maximum likelihood can be calculated with unigram, bigram, or trigram models. Unigrams are the usual choice: the number of occurrences of each unigram is counted, the corresponding probabilities are calculated, and the author whose model yields the higher probability is taken to be the actual author. This is the method commonly used for authorship attribution. This paper also uses a second method that considers only singleton unigrams, that is, words that occur exactly once in the disputed text (“the unknown text”). The focus is on vocabulary usage as a means of identifying the original author. An advanced method that uses further grammatical features, such as syntactic features, is also proposed. Both the singleton unigram model and the unigram model are used to find the maximum likelihood estimate.
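The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tokenize`, `train_unigram`, `log_likelihood`, `attribute`), the regular-expression tokenizer, and the add-one smoothing are all assumptions introduced here for illustration, since the abstract does not specify them. The sketch trains one unigram model per candidate author, scores the unknown text under each model, and optionally restricts the score to singleton unigrams.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; the paper's tokenizer is not specified.
    return re.findall(r"[a-z']+", text.lower())


def train_unigram(tokens):
    # A unigram model here is just the word counts plus the total token count.
    return Counter(tokens), len(tokens)


def log_likelihood(words, model, vocab_size):
    counts, total = model
    # Assumption: add-one smoothing, so a word unseen in an author's sample
    # does not force the whole likelihood to zero.
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)


def attribute(unknown, sample_a1, sample_a2, singletons_only=False):
    unk = tokenize(unknown)
    if singletons_only:
        # Keep only words that occur exactly once in the unknown text.
        freq = Counter(unk)
        unk = [w for w in unk if freq[w] == 1]
    t1, t2 = tokenize(sample_a1), tokenize(sample_a2)
    # Shared vocabulary so both models are smoothed over the same word set.
    vocab_size = len(set(t1) | set(t2) | set(unk))
    ll1 = log_likelihood(unk, train_unigram(t1), vocab_size)
    ll2 = log_likelihood(unk, train_unigram(t2), vocab_size)
    # The author whose model gives the higher log-likelihood is chosen.
    return ("A1" if ll1 >= ll2 else "A2"), ll1, ll2


if __name__ == "__main__":
    a1 = "the sea was calm and the sea was blue"
    a2 = "the market rose and the market fell"
    disputed = "the calm sea and the blue sea"
    print(attribute(disputed, a1, a2))
    print(attribute(disputed, a1, a2, singletons_only=True))
```

Log-probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow on long texts. With the toy samples in the usage lines, both the full unigram model and the singleton variant favour A1; real use would need the large per-author samples the abstract calls for.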

Keywords

Unigrams, Singleton Unigrams, Tokenizer, Bayesian Approach, Syntax, POS Tagger, Parser.

Abstract Views: 222

PDF Views: 2





Authors

R. Padmamala
University of Madras, Chennai, India
E. Kannan
Department of CSE, Veltech Dr. RR & Dr. SR Technical University, Chennai, India
V. Prema
Veltech Dr. RR & Dr. SR Technical University, Chennai, India
