Efficient Calculation of Maximum Likelihood Estimation for Authorship Attribution Using Lexical and Syntactic Features

R. Padmamala; E. Kannan; V. Prema

Affiliations
1 University of Madras, Chennai, India
2 Department of CSE, Veltech Dr. RR & Dr. SR Technical University, Chennai, India
3 Veltech Dr. RR & Dr. SR Technical University, Chennai, India

This paper compares two models for authorship attribution based on a Bayesian approach. Authorship attribution is the task of determining the actual author of a particular text: when two authors, say A1 and A2, both claim to have written an essay, the real author must be identified. Such a problem is usually solved by computing the maximum likelihood estimate (MLE) for the disputed authors, i.e., by training one probabilistic model for author A1 and another for author A2 and then using those models to calculate the MLE. This method is known as the Bayesian approach. It requires an unknown text and a large text sample from each of the two authors. To calculate the maximum likelihood, unigram, bigram, or trigram models can be chosen. Usually unigrams are chosen: the number of occurrences of each unigram is counted, the corresponding probabilities are calculated, and the author whose model yields the higher probability is taken to be the actual author. This is the method commonly used for authorship attribution. This paper also uses a second method that considers singleton unigrams, that is, the words that occur only once in the disputed text, or “the unknown text”. The focus is on vocabulary usage as a means of ascertaining the original author. In addition, an advanced method that uses further grammatical features, such as syntactic features, is proposed. Both the singleton unigram model and the unigram model are used to find the maximum likelihood estimate.
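
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regex tokenizer, the add-one smoothing used to handle words unseen in an author's sample, and all function names (`tokenize`, `train_unigram`, `log_likelihood`, `attribute`) are assumptions made for illustration. The `singletons_only` flag restricts scoring to words that occur exactly once in the unknown text, which is one plausible reading of the singleton unigram model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def train_unigram(sample):
    """Count unigram occurrences in one author's training sample."""
    return Counter(tokenize(sample))


def log_likelihood(words, counts, vocab_size):
    """Sum of log unigram probabilities of `words` under one author's model.

    Add-one smoothing is an assumption of this sketch; it keeps unseen
    words from producing a zero probability.
    """
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)


def attribute(unknown, sample_a1, sample_a2, singletons_only=False):
    """Return 'A1' or 'A2', whichever model gives the unknown text the higher likelihood."""
    words = tokenize(unknown)
    if singletons_only:
        # Keep only the words that occur exactly once in the unknown text.
        freq = Counter(words)
        words = [w for w in words if freq[w] == 1]
    model_a1 = train_unigram(sample_a1)
    model_a2 = train_unigram(sample_a2)
    # Shared vocabulary size so both models are smoothed on the same footing.
    vocab_size = len(set(model_a1) | set(model_a2) | set(words))
    score_a1 = log_likelihood(words, model_a1, vocab_size)
    score_a2 = log_likelihood(words, model_a2, vocab_size)
    # Ties (including an empty word list) fall to A1 in this sketch.
    return "A1" if score_a1 >= score_a2 else "A2"
```

Calling `attribute(unknown_text, a1_sample, a2_sample)` scores every token of the unknown text, while passing `singletons_only=True` scores only its singleton words. Log probabilities are summed rather than multiplying raw probabilities so that long texts do not underflow.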

Keywords

Unigrams, Singleton Unigrams, Tokenizer, Bayesian Approach, Syntax, POS Tagger, Parser.

Data Mining and Knowledge Engineering
