
Document Summarization Using Positive Pointwise Mutual Information


Affiliations
1 Department of Computer Science, University of Kerala, Kerala, India
2 School of Engineering, Amrita Vishwa Vidyapeetham Amritapuri Campus, Kollam, India
 

The success of document summarization depends on how well the method identifies the significant sentences in a document. The collection of unique words in a document characterizes its signature and forms the basis of the Term-Sentence-Matrix (TSM). Positive Pointwise Mutual Information (PPMI), which works well for measuring semantic similarity in the Term-Sentence-Matrix, is used in our method to weight each entry of the TSM. The Sentence-Rank-Matrix generated from this weighted TSM is then used to extract a summary from the document. Our experiments show that this method outperforms most existing methods in producing summaries of large documents.

Keywords

Data Mining, Text Mining, Document Summarization, Positive Pointwise Mutual Information, Term-Sentence-Matrix.
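
The pipeline described in the abstract (build a Term-Sentence-Matrix, weight it with PPMI, rank sentences, extract a summary) can be illustrated with a minimal sketch. It assumes the standard PPMI definition, max(0, log2(p(t,s) / (p(t) p(s)))), estimated from raw term counts. The abstract does not say how the Sentence-Rank-Matrix is built, so the scoring step here (summing the PPMI weights in each sentence column) is only a stand-in assumption and not the authors' method. All function names are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_tsm(sentences):
    """Return (terms, counts); counts[i][j] = occurrences of term i in sentence j."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    terms = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(terms)}
    counts = [[0] * len(sentences) for _ in terms]
    for j, toks in enumerate(tokenized):
        for tok in toks:
            counts[index[tok]][j] += 1
    return terms, counts


def ppmi_weight(counts):
    """Replace each count with max(0, log2(p(t,s) / (p(t) * p(s))))."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * len(row) for row in counts]
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weighted = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)  # unseen pair: PMI is -inf, clipped to 0
            else:
                pmi = math.log2(c * total / (row_sums[i] * col_sums[j]))
                out.append(max(0.0, pmi))
        weighted.append(out)
    return weighted


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order.

    Assumption: a sentence's score is the sum of its PPMI weights. This is
    a placeholder for the paper's Sentence-Rank-Matrix, which the abstract
    does not define.
    """
    sentences = split_sentences(text)
    _, counts = build_tsm(sentences)
    weights = ppmi_weight(counts)
    scores = [sum(row[j] for row in weights) for j in range(len(sentences))]
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]
```

Calling `summarize(document_text, k=3)` returns three sentences from `document_text`. Because PPMI rewards terms concentrated in few sentences, this simple column sum tends to favor sentences with many distinctive words; a real system would likely normalize for sentence length.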

Authors

S. Aji
Department of Computer Science, University of Kerala, Kerala, India
Ramachandra Kaimal
School of Engineering, Amrita Vishwa Vidyapeetham Amritapuri Campus, Kollam, India
