
Extracting Template Properties Using Agglomerative Clustering


Affiliations
1 Karpagam University, Coimbatore, India
2 Bharathiyar University, Coimbatore, India
     



The World Wide Web is widely used to publish and access information on the Internet. Most pages on a web site are published by filling common templates with content. Templates are ready-made holders that give readers easy access to the content through a consistent structure, and they give the pages a common look and feel. However, the irrelevant terms that templates contain degrade the accuracy and performance of web applications. Template detection techniques have therefore received considerable attention recently as a way to improve the performance of search engines and of the clustering and classification of web documents. The proposed system presents a new clustering algorithm that groups web pages using similar templates. Web pages in the same cluster have equal priority and are homogeneous, so not all of those pages will be displayed. To prioritize a homogeneous web page, the properties of that particular website are extracted and modified. Changing the properties converts a homogeneous web page into a heterogeneous one.

Keywords

Clustering, Heterogeneous, Homogeneous, Prioritize, Template Extraction.
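
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes each page is represented by its set of root-to-element tag paths, that pages are compared with Jaccard distance, and that clusters are merged bottom-up by average linkage until the closest pair is farther apart than a chosen threshold. The names `tag_paths`, `jaccard_distance`, `agglomerative_clusters`, and the `threshold` value are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths found in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Assumed template feature: the page's set of tag paths."""
    collector = TagPathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def jaccard_distance(a, b):
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(pages, threshold=0.5):
    """Average-linkage agglomerative clustering over page indices.

    Starts with one cluster per page and repeatedly merges the closest
    pair, stopping once the closest pair is farther apart than threshold.
    """
    features = [tag_paths(page) for page in pages]
    clusters = [[i] for i in range(len(pages))]

    def linkage(c1, c2):
        total = sum(jaccard_distance(features[i], features[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

In this sketch, pages that land in the same cluster share most of their tag paths and would count as homogeneous in the abstract's sense. The threshold controls how much structural difference is tolerated before two pages are treated as using different templates.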

Authors

R. Devika
Karpagam University, Coimbatore, India
T. Mohanraj
Bharathiyar University, Coimbatore, India
