Objectives: To develop a suitable model for studying the behavior of a web-crawled dataset, and to simulate the modeled data to better understand the system.

Methods/Statistical Analysis: The M/M/1 model, a variation of the Single Birth Single Death (SBSD) model, is applied to study the behavior of a web-crawled dataset for the classification problem. KanchiCrawler, a stylized focused web crawler, is implemented to collect the data for this application. The size of the corpus (the population) is 500k. A control corpus (the sample) can be drawn from the corpus by enforcing certain predetermined conditions.

Findings: A 20-state model is developed, starting with an initial test corpus of 25k and growing in increments of 25k up to 500k. The model is obtained by computing the Forward State Transition Probability and the Reverse State Transition Probability for each state. It gives fairly good results, both for testing the algorithmic efficiency of KanchiCrawler and for modeling the web-crawled dataset for the classification problem.

Applications: M/M/1 models are tractable and are often used to model various natural processes. In most situations involving large numbers, M/M/1 models are statistically stable and reflective of reality.
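To make the 20-state structure concrete, the following Python sketch builds the state ladder (25k, 50k, ..., 500k) and computes the long-run state probabilities of a single-birth, single-death chain from per-state forward and reverse transition probabilities. It is only an illustration under stated assumptions: the abstract does not report the actual transition probabilities, so the values 0.3 and 0.4 are placeholders, the function names `corpus_sizes` and `stationary_distribution` are hypothetical, and the standard detailed-balance recursion used here may differ from the computation the authors performed.

```python
def corpus_sizes(step=25_000, n_states=20):
    """Corpus size represented by each state: 25k, 50k, ..., 500k."""
    return [step * k for k in range(1, n_states + 1)]


def stationary_distribution(forward, reverse):
    """Long-run state probabilities of a finite birth-death chain.

    forward[i] is the probability of moving from state i to state i + 1,
    reverse[i] is the probability of moving from state i + 1 back to state i.
    Detailed balance gives pi[i + 1] = pi[i] * forward[i] / reverse[i].
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse must have the same length")
    weights = [1.0]  # unnormalized weight of the first state
    for p, q in zip(forward, reverse):
        if q <= 0:
            raise ValueError("reverse probabilities must be positive")
        weights.append(weights[-1] * p / q)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    sizes = corpus_sizes()
    n = len(sizes)
    # Placeholder probabilities (assumed, not taken from the paper).
    forward = [0.3] * (n - 1)
    reverse = [0.4] * (n - 1)
    pi = stationary_distribution(forward, reverse)
    for size, prob in zip(sizes, pi):
        print(f"{size:>7d}  {prob:.4f}")
```

With constant forward and reverse probabilities, the ratio between neighbouring state probabilities is fixed at forward/reverse, which mirrors the geometric form of the M/M/1 queue with utilization equal to the arrival rate divided by the service rate.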

Keywords

Dataset Modeling, KanchiCrawler, M/M/1 Model, State Transition Probability.