
Automated Text Summarization: A Case Study for Marathi Language


Affiliations
1 Department of Information Technology, Pune Institute of Computer Technology, Pune, India
     



The amount of information on the Web grows day by day, causing information overload, and finding relevant, useful information has become a crucial task. This growth has created strong demand for automatic methods and tools for text summarization, an area of Natural Language Processing that is attracting the attention of many researchers. In this paper, we present a survey of text summarization techniques, discuss the key morphology of the Marathi language, and propose a framework for text summarization. Over the last decade, much work has been done on English text summarization, but only a few notable works address Marathi.
The proposed framework summarizes a single document using an extractive method. Before a summary is created, the text is preprocessed by segmentation, tokenization, stop-word removal, and stemming. In the feature extraction step, countable features such as TF-ISF (term frequency-inverse sentence frequency), sentence length, sentence positional value, and SOV (subject-object-verb) verification are used to make the summary more relevant and precise. For stemming, we develop a rule-based as well as a directory-based Marathi stemmer.

Keywords

Stemming, Stop Words, Text Summarization, Tokenization.
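
As an illustration of the pipeline described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes sentences end with a danda or common punctuation, and it combines TF-ISF, position, and length scores with equal weights. The stop-word list, suffix list, and lookup table are tiny hypothetical samples (the lookup table stands in for the directory-based part of the stemmer), the names `summarize`, `preprocess`, and `stem` are invented for this example, and SOV verification is omitted because it would require part-of-speech information.

```python
import math
import re
from collections import Counter

# Illustrative resources only: tiny hypothetical samples, not the paper's lists.
STOP_WORDS = {"आणि", "आहे", "हा", "ही", "हे", "या", "व"}
SUFFIXES = sorted(["ांना", "ाला", "ाने", "ात", "ला", "ने", "चे", "ची", "चा"],
                  key=len, reverse=True)  # try the longest suffix first
STEM_LOOKUP = {"मुले": "मूल"}  # assumed stand-in for a lookup-based stemmer


def split_sentences(text):
    """Segment text on the danda and common end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r"[।.?!]+", text) if s.strip()]


def tokenize(sentence):
    """Return runs of Devanagari characters, excluding the danda marks."""
    # A plain word-character pattern is avoided here because Python's
    # regex engine does not treat Devanagari vowel signs as word characters.
    return re.findall(r"[\u0900-\u0963\u0966-\u097F]+", sentence)


def stem(word):
    """Look the word up first, then strip one known suffix (rule-based)."""
    if word in STEM_LOOKUP:
        return STEM_LOOKUP[word]
    for suffix in SUFFIXES:
        # Keep at least two characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in tokenize(sentence) if t not in STOP_WORDS]


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    docs = [preprocess(s) for s in sentences]
    # Number of sentences in which each term occurs.
    sent_freq = Counter(t for d in docs for t in set(d))
    max_len = max(len(d) for d in docs) or 1

    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        # TF-ISF: term frequency times log(N / sentence frequency),
        # averaged over the tokens of the sentence.
        tf_isf = sum(c * math.log(n / sent_freq[t]) for t, c in tf.items())
        tf_isf = tf_isf / len(d) if d else 0.0
        position = 1 - i / n          # earlier sentences score higher
        length = len(d) / max_len     # relative sentence length
        scores.append(tf_isf + position + length)  # equal weights (assumed)

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=2)` on a Marathi document returns two of its sentences in document order. In practice the weights, the stop-word list, and the suffix rules would need tuning against Marathi data, and real stemming rules would be considerably richer than the handful of suffixes shown here.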

Authors

Umakant Dakulge
Department of Information Technology, Pune Institute of Computer Technology, Pune, India
S. C. Dharmadhikari
Department of Information Technology, Pune Institute of Computer Technology, Pune, India
