Hindi Language Document Summarization Using Context Based Indexing Model

Swati Sargule; Ramesh M. Kagalkar

Affiliations
1 Department of Computer Engineering, Dr. D. Y. Patil School of Engineering and Technology, Lohegaon, Pune, India
2 Department of Computer Engineering, Dr. D. Y. Patil School of Engineering and Technology, Charoli, B. K. Via, Lohegaon, Pune, Maharashtra, India

Hindi Document Summarization (DS) is an Information Retrieval (IR) process in which a summary of a document is extracted to provide an overview of that document. Existing document summarization models generally use the similarity among sentences in the original document to extract the most significant sentences. The documents and their sentences are generally indexed using standard term indexing techniques, which do not take into account the context of the document. Thus, the sentence similarity values are independent of the context. In this paper, a context-sensitive document indexing model is proposed for Hindi text documents, based on the Bernoulli model of randomness. The Bernoulli model is used to determine the probability of the co-occurrence of two terms in a large set of documents.
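
As an illustration of the co-occurrence idea, the following minimal Python sketch computes a Bernoulli (binomial) probability for two terms co-occurring by chance. It is one possible reading, not the paper's stated formula: it assumes each document is represented as a set of tokens, and that the chance of term A appearing in a document is its document frequency divided by the collection size. The function name `cooccurrence_probability` and the toy corpus are hypothetical.

```python
import math


def cooccurrence_probability(docs, term_a, term_b):
    """Probability, under a Bernoulli model of randomness, that term_a
    appears in exactly the observed number of documents containing term_b.

    Assumption (not from the paper): each of the n_b documents containing
    term_b is an independent trial, with success probability
    p = n_a / N, the document frequency of term_a over the collection.
    """
    n_docs = len(docs)
    if n_docs == 0:
        raise ValueError("the document collection is empty")

    # Document frequencies and the joint (co-occurrence) frequency.
    n_a = sum(1 for d in docs if term_a in d)
    n_b = sum(1 for d in docs if term_b in d)
    n_ab = sum(1 for d in docs if term_a in d and term_b in d)

    p = n_a / n_docs
    # Binomial probability of exactly n_ab successes in n_b trials.
    return math.comb(n_b, n_ab) * p ** n_ab * (1 - p) ** (n_b - n_ab)


# Toy corpus: each document is a set of Hindi tokens (illustrative only).
toy_docs = [
    {"भारत", "देश"},
    {"भारत", "देश"},
    {"नदी"},
    {"भारत"},
]
prob = cooccurrence_probability(toy_docs, "भारत", "देश")
```

A low probability suggests that the observed co-occurrence is unlikely to be due to chance, so the two terms are plausibly related in context; how such a value is turned into index weights is not specified in this section.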