
Content Modelling Intelligence System Based on Automatic Text Summarization


Affiliations
1 Department of Information Science & Engineering, JSS ACADEMY OF TECHNICAL EDUCATION, Bangalore-560060, India
 

Abstract

In the age of big data, textual information is growing rapidly and is available in many different languages. Time constraints often prevent us from consuming all of the available information, and in a fast-paced world it is difficult to read every text in full. This is why text summarization is needed: condensing a text makes the information easier to absorb and understand while retaining its substance. Many summarization approaches have been proposed over the years for English and other European languages, but surprisingly few exist for the regional languages of India. This paper surveys extractive text summarization methods for several Indian and foreign languages, including Hindi, Kannada, Telugu, Marathi, German, and French. It proposes a system that uses Optical Character Recognition (OCR) to extract text from an uploaded image. The OCR stage converts existing documents and image files into editable text, and it also performs sentence detection so that the document's structure is preserved. The paper further presents an automatic sentence-extraction method based on the TextRank algorithm, which scores sentences by weighting features such as term frequency, word occurrences, noun weight, and phrases. The results indicate that the approach achieves higher accuracy than existing methods while preserving coherence; the system also provides text-to-speech output and translation from one language to another.
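The TextRank sentence-extraction step mentioned in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation. It assumes a generic TextRank variant in which each sentence is a graph node, edge weights are cosine similarities between term-frequency vectors, and scores come from the weighted TextRank iteration with damping factor d = 0.85. The extra features named in the abstract (noun weight, phrases) are omitted, because they would need a part-of-speech tagger and the abstract does not say how they are combined with TextRank. The function names (`split_sentences`, `tokenize`, `textrank_scores`, `summarize`) are hypothetical, and the whitespace tokenizer and punctuation-based sentence splitter are simplifying assumptions.

```python
import math
import re
import string
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Simplifying assumption: a sentence ends at ., !, ? or the Devanagari
    # danda (।) followed by whitespace. Real systems need a language-aware splitter.
    parts = re.split(r"(?<=[.!?।])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Whitespace tokenization, lowercased, with surrounding punctuation stripped.
    # No stemming or stop-word removal (an assumption, kept simple on purpose).
    words = (w.strip(string.punctuation + "।").lower() for w in sentence.split())
    return [w for w in words if w]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_scores(sentences: list[str], d: float = 0.85, iters: int = 50) -> list[float]:
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Undirected weighted graph: w[i][j] is the similarity of sentences i and j.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # Weighted TextRank update:
        # WS(i) = (1 - d) + d * sum_j w[j][i] / out_weight[j] * WS(j)
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ("TextRank builds a graph of sentences. Each sentence in the graph is a node. "
           "Edges connect similar sentences. Bananas are yellow.")
    print(summarize(doc, k=2))
    # prints ['TextRank builds a graph of sentences.', 'Each sentence in the graph is a node.']
```

In the usage example the last sentence shares no terms with the others, so its score stays at the floor value 1 - d = 0.15 and it is never selected. The selected sentences are returned in their original order to keep the extractive summary readable.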

Keywords

Natural Language Processing, Optical Character Recognition, Summarization, TextRank Algorithm, Text-to-Speech.



Authors

Sanjan S. Malagi
Department of Information Science & Engineering, JSS ACADEMY OF TECHNICAL EDUCATION, Bangalore-560060, India
Rachana Radhakrishnan
Department of Information Science & Engineering, JSS ACADEMY OF TECHNICAL EDUCATION, Bangalore-560060, India
R. Monisha
Department of Information Science & Engineering, JSS ACADEMY OF TECHNICAL EDUCATION, Bangalore-560060, India
S. Keerthana
Department of Information Science & Engineering, JSS ACADEMY OF TECHNICAL EDUCATION, Bangalore-560060, India

