
Clustering of Web Page for Different Domains Using Data Extraction and Self Organizing Map





Abstract

Given the rapid growth and success of public information sources on the World Wide Web, it is increasingly attractive to extract data from these sources and make it available for further processing by end users and application programs. Data extracted from Web sites can serve as the springboard for a variety of tasks, including information retrieval (e.g., business intelligence), event monitoring (news and stock market), and electronic commerce (shopping comparison). Extracting structured data from Web sites is not a trivial task. Most of the information on the Web today is in the form of Hypertext Markup Language (HTML) documents, which are viewed by humans with a browser. A sophisticated method to organize the layout of the information and assist user navigation is therefore particularly important. Data extraction is the process of retrieving data out of data sources for further data processing. Online data exists in the form of web records: in response to an end-user query, web databases generate query result pages, and the query result records (QRRs) are extracted from these pages. The main objective of this paper is to extract and align important data from different domains using HTML tags and their values. After extraction, a Self-Organizing Map (SOM) groups the extracted data from different domains into clusters. Clustering is the process of grouping physical or abstract objects into classes of similar objects.
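The abstract says records are extracted and aligned "with the help of HTML tags and their values" but does not spell out the procedure. As a rough illustration only, the Python sketch below uses the standard-library `html.parser` to collect (tag path, text value) pairs for each record on a result page and then aligns the records column by column using the tag path alone. The `record_class="record"` wrapper convention and the names `RecordParser`, `extract_records` and `align_records` are hypothetical, and aligning on tags only is a simplification: the paper's approach also makes use of the data values.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class RecordParser(HTMLParser):
    """Collect (tag path, text value) pairs for each query result record.

    Hypothetical assumption: each record is wrapped in an element whose
    class attribute contains `record_class`, and its inner markup is well formed.
    """

    def __init__(self, record_class="record"):
        super().__init__()
        self.record_class = record_class
        self.records = []  # one list of (path, text) pairs per record
        self.stack = []    # tags opened inside the current record
        self.depth = 0     # 0 means "not inside a record"; else 1 + len(stack)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0:
            if self.record_class in classes:
                self.records.append([])  # start a new record
                self.depth = 1
                self.stack = []
        else:
            self.depth += 1
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            # Text directly inside the record element gets the path "#text".
            path = "/".join(self.stack) or "#text"
            self.records[-1].append((path, text))


def extract_records(html, record_class="record"):
    """Return the list of records found on one result page."""
    parser = RecordParser(record_class)
    parser.feed(html)
    parser.close()
    return parser.records


def align_records(records):
    """Align records into a table whose columns are tag paths (tag-only simplification)."""
    columns = []
    for record in records:
        for path, _ in record:
            if path not in columns:
                columns.append(path)
    table = []
    for record in records:
        row = dict.fromkeys(columns, "")
        for path, text in record:
            # Repeated paths within one record are merged into one cell.
            row[path] = (row[path] + " " + text).strip()
        table.append([row[c] for c in columns])
    return columns, table


if __name__ == "__main__":
    page = ('<div class="record"><b>Data Mining</b><span>Han</span><i>$45</i></div>'
            '<div class="record"><b>Web Data Mining</b><br><span>Liu</span></div>')
    cols, rows = align_records(extract_records(page))
    print(cols)
    for r in rows:
        print(r)
```

A record that lacks a field simply gets an empty cell in that column, which is the basic shape an aligned result table needs before the records can be compared or clustered.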

Keywords

Data Extraction, Data Record Alignment, Clustering, QRR, SOM.
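The clustering step named in the abstract and keywords can be illustrated with the standard online Self-Organizing Map algorithm: for each input, find the best-matching unit (BMU) on a grid of nodes, then pull the BMU and its grid neighbours toward the input with a Gaussian neighbourhood whose radius and learning rate shrink over time. This is a minimal, dependency-free sketch, assuming each extracted record has already been turned into a numeric feature vector; the paper's actual features, grid size and training schedule are not given here, so the linear decay, the defaults and the names `train_som`, `best_matching_unit` and `cluster` are all hypothetical.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: _sq_dist(weights[node], x))


def train_som(data, rows=2, cols=2, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on numeric vectors (assumed schedule: linear decay).

    Returns a dict mapping grid coordinates (r, c) to weight vectors.
    """
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.01  # keep the radius above zero
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, centred on the BMU.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def cluster(weights, data):
    """Label every vector with the grid coordinate of its best-matching unit."""
    return [best_matching_unit(weights, x) for x in data]


if __name__ == "__main__":
    # Made-up 2-D feature vectors standing in for six extracted records.
    records = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
               [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
    som = train_som(records, rows=1, cols=2)
    print(cluster(som, records))
```

Each distinct BMU coordinate serves as a cluster label. On a larger grid, neighbouring nodes tend to hold similar inputs, which is what lets a SOM act as a map of the domains rather than just a flat partition.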

Authors

Chhaya Varade
C.S.E., Technocrats Institute of Technology, Bhopal, M.P., India
Bhupesh Gour
Dept. of C.S.E., Technocrats Institute of Technology, Bhopal, M.P., India
Asif Ullah Khan
Dept. of C.S.E., Technocrats Institute of Technology, Bhopal, M.P., India
Shailendra Jain
Dept. of C.S.E., Technocrats Institute of Technology, Bhopal, M.P., India
