Optimized Web Page Generation Using Web Content Mining
In the past few years, the amount of information available on the World Wide Web has grown exponentially. Web pages are a major source of data for information retrieval and data mining, but most HTML documents on the Internet are cluttered with large amounts of less informative, typically unrelated material such as banner ads, navigation bars and copyright notices. This irrelevant information is not part of the pages' main content, and it can seriously harm Web mining and searching. In this paper we develop an automatic HTML generator that uses Web content mining to generate optimized web pages from existing ones. The input to the HTML generator is one or more HTML web pages, which are downloaded either manually by the user or with the download manager built into the generator. The downloaded pages are mined, and useful information, including keywords, is extracted and stored in a specific location. Using these keywords, the pages are clustered with the DBSCAN clustering algorithm to identify the website category. From these mined resources a new optimized web page is created. This page is user-friendly and noise-free, and it may contain text, images, audio, video, structured lists and hyperlink structures. Although only sample web pages from five different categories are considered, the proposed method can be applied to any web pages that can be mined for knowledge extraction.
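The abstract names DBSCAN as the clustering step but does not state the keyword extractor, the distance measure or the parameters. The following is a minimal Python sketch that fills those gaps with assumptions: keywords are taken as the most frequent non-stop-word tokens of each page's text, pages are compared by Jaccard distance between their keyword sets, and DBSCAN is implemented from scratch. The stop-word list, `top_n`, `eps=0.6`, `min_pts=2` and the five sample texts are hypothetical illustrations, not values from the paper.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not specify its keyword extraction.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "are", "with"}


def extract_keywords(text, top_n=10):
    """Return up to top_n most frequent non-stop-word tokens (ties broken alphabetically)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return {word for word, _ in ranked}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(items, eps, min_pts, dist):
    """Plain DBSCAN. Returns one label per item; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(items)

    def region_query(i):
        # All items (including i itself) within eps of item i.
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster_id = 0
    for i in range(len(items)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


# Hypothetical page texts (already stripped of ads, navigation bars, etc.).
pages = [
    "Football league: the team won the match with a late goal.",
    "Football match report: the team scored a goal in the league.",
    "Bake the bread: mix flour, sugar and butter, then heat the oven.",
    "Oven recipe: bake a cake with flour, sugar and butter.",
    "The telescope tracked a comet near the planet.",
]
keyword_sets = [extract_keywords(p) for p in pages]
labels = dbscan(keyword_sets, eps=0.6, min_pts=2, dist=jaccard_distance)
print(labels)  # → [0, 0, 1, 1, -1]
```

Here the two sports pages form one cluster, the two cooking pages another, and the astronomy page is labelled -1 (noise) because no other page shares its keywords. One property of DBSCAN that suits category detection is that the number of clusters does not have to be fixed in advance, unlike k-means.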
Keywords
Web Content Mining, Text Mining, Web Structure Mining, Link Mining, HTML Generator.