Web Forum Crawling Using Index Thread Page Flipping Algorithm

A. Anny Leema; P. Iswarya

Affiliations
Department of Computer Applications, B.S. Abdur Rahman University, Chennai, Tamil Nadu, India

Internet forums are important platforms where users post requests and exchange information from different sources. A key issue for existing forum crawlers is the URL type recognition problem: a crawler that cannot tell what kind of page a URL leads to follows duplicate links and fetches uninformative pages. The Index Thread Page Flipping (ITF) algorithm is used to overcome this issue. It uses both the URL layout and the page layout to recognise whether a URL link is valid or invalid.
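The URL-layout half of this check can be illustrated with a minimal Java sketch. It assumes a hypothetical forum whose index pages look like /forum/12, thread pages like /thread/345 and page-flipping links like /thread/345?page=2. The class name, the URL scheme and the fixed patterns are illustrative assumptions, not the authors' code, and the page-layout check is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class UrlTypeClassifier {
    /** URL types for a hypothetical forum; OTHER marks links treated as invalid. */
    enum UrlType { INDEX, THREAD, PAGE_FLIP, OTHER }

    // Hypothetical URL layout; a real crawler would learn these patterns per site.
    private final Map<UrlType, Pattern> patterns = new LinkedHashMap<>();

    UrlTypeClassifier() {
        patterns.put(UrlType.INDEX, Pattern.compile("/forum/\\d+"));
        patterns.put(UrlType.THREAD, Pattern.compile("/thread/\\d+"));
        patterns.put(UrlType.PAGE_FLIP, Pattern.compile("/(forum|thread)/\\d+\\?page=\\d+"));
    }

    /** Returns the first type whose pattern matches the whole path, or OTHER. */
    UrlType classify(String path) {
        for (Map.Entry<UrlType, Pattern> entry : patterns.entrySet()) {
            if (entry.getValue().matcher(path).matches()) {
                return entry.getKey();
            }
        }
        return UrlType.OTHER;
    }

    /** A link is treated as valid only if its layout matches a known forum URL type. */
    boolean isValid(String path) {
        return classify(path) != UrlType.OTHER;
    }
}
```

Because `matches()` requires the whole path to match, /thread/345?page=2 is classified as a page-flipping link rather than as a plain thread URL, while a link such as /login falls through to OTHER and would not be followed.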

In this project (Phase-I), "Web Forum Crawling Using Index Thread Page Flipping Algorithm", a crawler is developed that determines whether links are valid or invalid, with the goal of crawling only relevant content. Because Internet forums suffer from the URL type recognition problem, the crawler learns to identify the correct paths (URLs) using regular expression patterns, trained on sets created with page type classifiers.
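One simple way to turn a training set into a regular expression is sketched below; it is an assumption for illustration, not the method described by the authors. URLs that a page type classifier has already labelled with the same type are generalised by quoting their literal text and replacing every run of digits with \d+, and the distinct results are joined into one alternation. `UrlPatternLearner`, `generalize` and `learn` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPatternLearner {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Turns one sample URL into a regex: literal text is quoted, each digit run becomes \d+. */
    static String generalize(String url) {
        StringBuilder regex = new StringBuilder();
        Matcher m = DIGITS.matcher(url);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(url.substring(last, m.start())));
            regex.append("\\d+");
            last = m.end();
        }
        regex.append(Pattern.quote(url.substring(last)));
        return regex.toString();
    }

    /** Learns one pattern from training URLs that were labelled with the same page type. */
    static Pattern learn(List<String> trainingUrls) {
        // Identical generalisations collapse into one alternative.
        Set<String> alternatives = new LinkedHashSet<>();
        for (String url : trainingUrls) {
            alternatives.add(generalize(url));
        }
        return Pattern.compile(String.join("|", alternatives));
    }
}
```

For instance, training URLs /thread/101 and /thread/2057 both generalise to the same expression, so the learned pattern also accepts an unseen thread URL such as /thread/999 but rejects /forum/5. Generalising only digit runs is a deliberately crude choice; site-specific tokens would need richer rules.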

The modules implemented are the user interface design module, the page flipping module, the entry URL discovery module, the index/thread URL detection module and the generic crawler module. In the user interface design module, the user must enter a user name and password to connect to the server. In the page flipping module, long forum content is divided into multiple pages that are linked by page-flipping links; generic crawlers process each page individually and ignore the relationships between such pages. In the entry URL discovery module, the entry URL from which crawling starts must be specified, and a set of rules is defined to find it. In the index and thread URL detection module, index URLs and thread URLs are identified by their URL patterns. In the generic crawler module, given a forum, the crawler enters the thread pages and crawls them while avoiding duplicate links and page-flipping links.
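A minimal Java sketch of how the entry URL, index/thread URL detection and page-flipping handling could fit together is given below. It is an illustration under stated assumptions, not the authors' implementation: the forum is an in-memory link graph (no HTTP requests), the URL layout (/forum/N, /thread/N, /thread/N?page=M) is hypothetical, and `ForumCrawlerSketch`, `isValid` and `crawl` are invented names. A visited set skips duplicate links, invalid links are never fetched, and page-flipping links are attached to the thread they belong to instead of being counted as new threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ForumCrawlerSketch {
    // Hypothetical URL layout: index pages (optionally page-flipped), thread pages,
    // and page-flipping links inside a thread.
    private static final Pattern INDEX = Pattern.compile("/forum/\\d+(\\?page=\\d+)?");
    private static final Pattern THREAD = Pattern.compile("/thread/(\\d+)");
    private static final Pattern THREAD_FLIP = Pattern.compile("/thread/(\\d+)\\?page=\\d+");

    /** A link is valid only if it matches one of the known forum URL types. */
    static boolean isValid(String url) {
        return INDEX.matcher(url).matches()
                || THREAD.matcher(url).matches()
                || THREAD_FLIP.matcher(url).matches();
    }

    /**
     * Breadth-first crawl from the entry URL over an in-memory link graph
     * (page URL -> outgoing links). Returns thread id -> pages of that thread.
     */
    static Map<String, List<String>> crawl(String entryUrl, Map<String, List<String>> links) {
        Map<String, List<String>> threads = new LinkedHashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(entryUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // duplicate link: this page was already fetched
            }
            Matcher thread = THREAD.matcher(url);
            Matcher flip = THREAD_FLIP.matcher(url);
            if (thread.matches()) {
                threads.computeIfAbsent(thread.group(1), k -> new ArrayList<>()).add(url);
            } else if (flip.matches()) {
                // Later pages of a thread are attached to that thread, not counted as new threads.
                threads.computeIfAbsent(flip.group(1), k -> new ArrayList<>()).add(url);
            }
            for (String next : links.getOrDefault(url, List.of())) {
                if (isValid(next) && !visited.contains(next)) {
                    queue.add(next); // invalid links (login, profiles, ...) are never fetched
                }
            }
        }
        return threads;
    }
}
```

For example, if index page /forum/1 links to /thread/7 and its second page /forum/1?page=2 links to /thread/7 again, the thread is fetched only once, and /thread/7?page=2 is stored under thread "7". The breadth-first queue is one possible traversal order; the abstract does not specify one.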

The front end for all modules in the project (Phase-I) is developed in Eclipse, and the back end uses SQL Server 2005. The two modules in the project (Phase-I) are implemented using Java Servlets and JSP, with the code-behind written in Java. The main benefit of this project (Phase-I) is that it saves bandwidth and crawling time.

Keywords

Forum Crawling, Index URL, Thread URL, Page Flipping URL.

International Journal of Business Analytics and Intelligence
