Hybridization of Bag-of-Words and Forum Metadata for Web Forum Question Post Detection
Background/Objective: A web forum is an online problem-solving community. Web forum research has focused on answer mining, under the assumption that the starting post is a question post. This paper proposes methods for mining standard web forum questions. Methods/Statistical Analysis: Popular methods for web forum question post detection are question marks, question words, higher-order n-grams and sequential pattern mining. These methods suffer from low detection rates and implementation complexity. This paper implements a hybridization of a simple bag-of-words model with web forum metadata and the simple rule of question marks and question words. Dimensionality reduction was performed using chi-square and wrapper techniques. Findings: The quality of web forum question posts varies from excellent to mediocre or even spam. Detecting good question posts is non-trivial and requires the use of salient features. Combining the simple rule of question marks and question words with forum metadata performed better than either of the two alone. Integrating the bag-of-words model with the simple rule of question marks and question words and with forum metadata enhances question post detection. Dimensionality reduction using chi-square was found to perform better than other popular filters such as information gain, gain ratio and symmetric uncertainty. Applications/Improvements: Three publicly available datasets of varying degrees of technicality were used for the experiments. The experimental results revealed that an enhanced bag-of-words model can perform better than complex techniques that implement higher-order n-grams with part-of-speech tagging.
Keywords
Bag-of-words, Forum Metadata, Web Forum, Question Detection, Dimensionality Reduction, Web Forum Question
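
The hybrid feature set and chi-square filter described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the question-word list, the `is_first_post` metadata field, the feature-name prefixes and all function names are assumptions made for illustration, and the abstract does not say which metadata fields or classifier were used. The sketch builds binary bag-of-words features, adds the question mark and question word rules plus one metadata flag, and ranks features by the standard chi-square statistic for a 2x2 contingency table.

```python
import re

# Assumed question-word list; the abstract does not enumerate the words used.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "which", "who"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_features(post):
    """Return the set of binary features present in one post.

    `post` is a dict with a "text" key and a hypothetical
    "is_first_post" metadata flag (an assumption, not from the paper).
    """
    tokens = tokenize(post["text"])
    feats = {"bow=" + t for t in tokens}          # bag-of-words
    if "?" in post["text"]:
        feats.add("rule=question_mark")           # question mark rule
    if any(t in QUESTION_WORDS for t in tokens):
        feats.add("rule=question_word")           # question word rule
    if post.get("is_first_post"):
        feats.add("meta=first_post")              # forum metadata
    return feats


def chi_square_scores(feature_sets, labels):
    """Chi-square score of each binary feature against binary labels (0/1)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    scores = {}
    for f in set().union(*feature_sets):
        a = sum(1 for fs, y in zip(feature_sets, labels) if f in fs and y == 1)
        b = sum(1 for fs, y in zip(feature_sets, labels) if f in fs and y == 0)
        c = n_pos - a  # feature absent, question post
        d = n_neg - b  # feature absent, non-question post
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

In a full pipeline, the selected features would then be passed to a classifier; the abstract does not name one, so none is shown here.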