Clustering of Web Page for Different Domains Using Data Extraction and Self Organizing Map
Given the rapid growth and success of public information sources on the World Wide Web, it is increasingly attractive to extract data from these sources and make it available for further processing by end users and application programs. Data extracted from Web sites can serve as the springboard for a variety of tasks, including information retrieval (e.g., business intelligence), event monitoring (news and stock markets), and electronic commerce (comparison shopping). Extracting structured data from Web sites is not a trivial task. Most of the information on the Web today is in the form of Hypertext Markup Language (HTML) documents, which are viewed by humans with a browser. A sophisticated method to organize the layout of the information and assist user navigation is therefore particularly important. Data extraction is the process of retrieving data from data sources for further data processing. Online data exists in the form of web records. Depending on the end user's query, web databases generate query results, which are returned as query result pages. The main objective of this paper is to extract and align important data from different domains with the help of HTML tags and their values. After the data is extracted, a Self Organizing Map (SOM) groups the extracted data from the different domains into clusters. Clustering is the process of grouping physical or abstract objects into classes of similar objects.
Keywords
Data Extraction, Data Record Alignment, Clustering, QRR, SOM.
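The pipeline summarized in the abstract (HTML-tag-based extraction followed by SOM clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the tag list `TAGS`, the tag-frequency features, the one-dimensional two-unit map, and all function names and parameters below are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature set: which tags to count (an assumption, not from the paper).
TAGS = ["table", "tr", "td", "a", "p", "img"]


class TagCounter(HTMLParser):
    """Counts how often each opening tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_features(html):
    """Return a tag-frequency vector over TAGS, normalized to sum to 1."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts[t] for t in TAGS) or 1  # avoid division by zero
    return [parser.counts[t] / total for t in TAGS]


def best_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small 1-D SOM with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 1e-3)
        for x in rng.sample(data, len(data)):  # visit samples in random order
            bmu = best_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood around the best-matching unit
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


if __name__ == "__main__":
    pages = [
        "<table><tr><td>Item</td><td>Price</td></tr></table>",
        "<p>Headline</p><p>Story text <a href='#'>more</a></p>",
    ]
    data = [tag_features(p) for p in pages]
    som = train_som(data)
    for page, x in zip(pages, data):
        print(best_unit(som, x), page[:30])
```

Each page is assigned to the index of its best-matching unit, which plays the role of a cluster label. A realistic setup would use richer features (for example, the tag values mentioned in the abstract) and a larger two-dimensional map.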