Mining Issues in Traditional Indian Web Documents

Kolla Bhanu Prakash

doi:10.17485/ijst/2015/v8i32/122692

Affiliations
Faculty of Computing, Chirala Engineering College, Chirala - 523157, Andhra Pradesh, India

Abstract

Recent developments in information technology are concentrated mostly in areas where information, content creation and knowledge integration are the driving forces. These developments began by adapting to the complexities of internet and mobile communications. They are now becoming significant sources of knowledge and creators of expertise, and this is where countries like India and China play a major role. Indian tradition is considered to be more than 5000 years old, and evidence of it survives even now in written, oral and physical forms, such as the Mahabharata as text or Mohenjo-Daro and Harappa as structures. This study presents the issues involved in extracting information from traditional Indian documents. It also presents a method of evaluating their content, since the language, script and form of such web documents vary significantly. The approach operates at the pixel level to keep it generic. Results are presented for some basic issues at the text level, together with a discussion of how the approach can be extended to the word and document levels.