A Study on Enhanced Path Sequence Algorithm Using Web Directories
A web directory is not a search engine and does not display lists of web pages based on keywords; instead, it lists web sites by category and subcategory. The categorization is usually based on the whole web site rather than one page or a set of keywords, and sites are often limited to inclusion in only a few categories. Web directories often allow site owners to submit their site for inclusion, and have editors review submissions for fitness.
In contrast to most work on Web usage mining, the usage data analyzed here correspond to user navigation throughout the Web rather than within a particular Web site, and as a result exhibit a high degree of thematic diversity. Because of proxy servers and cached versions of pages served when the client uses ‘Back’, the identified sessions have many missing pages. The Enhanced Path Sequence Algorithm is proposed because pages may still be missing after transactions are constructed, owing to proxy servers and caching.
Three approaches are used to construct transactions. (1) Time Window: a transaction is formed from triplets of IP address, user identification, and the time spent on each web page, up to a limit called the time window. (2) Reference Length: this approach assumes that the amount of time a user spends on a page correlates with whether the page is an auxiliary page or a content page for that user. (3) Maximal Forward Reference: a transaction is the set of pages from the first visited page up to the point where a backward reference occurs.
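Two of these approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the log has already been grouped by IP address and user identification, so each function sees one user's time-ordered visits. The function names, the `(page, timestamp)` input shape, and the rule that a window starts at the first page of each transaction are all assumptions made for illustration.

```python
from typing import List, Tuple

Visit = Tuple[str, float]  # (page, timestamp in seconds); hypothetical input shape


def time_window_transactions(visits: List[Visit], window: float) -> List[List[str]]:
    """Split one user's time-ordered visits into transactions that each
    span at most `window` seconds from their first page (assumed rule)."""
    transactions: List[List[str]] = []
    current: List[str] = []
    start = None
    for page, ts in visits:
        if start is None or ts - start > window:
            if current:
                transactions.append(current)
            current = []
            start = ts
        current.append(page)
    if current:
        transactions.append(current)
    return transactions


def maximal_forward_references(pages: List[str]) -> List[List[str]]:
    """Emit the current forward path each time a backward reference
    (a revisit of a page already on the path) ends a forward run."""
    path: List[str] = []
    result: List[List[str]] = []
    moving_forward = False
    for page in pages:
        if page in path:
            # Backward reference: record the path once, then backtrack.
            if moving_forward:
                result.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        result.append(list(path))
    return result
```

For example, the click sequence A, B, C, D, C, B, E would yield the forward paths A-B-C-D and A-B-E under this sketch. The last page of each path plays the role of the content page, and the pages before it form the navigation path.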
Forward reference pages are treated as content pages, and the pages on the path leading to them are treated as index pages. A primary table is used to store sessions together with pointers to a secondary table, which holds the complete navigation path.