
Hybridization of Bag-of-Words and Forum Metadata for Web Forum Question Post Detection


Affiliations
1 Department of Computer Science, College of Science and Technology, Kaduna Polytechnic, P.M.B 2021, Kaduna, Nigeria
2 Faculty of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
 

Background/Objective: A web forum is a problem-solving online community. Web forum research has focused on answer mining, under the assumption that the starting post of a thread is a question post. This paper proposes methods for mining standard web forum questions.

Methods/Statistical Analysis: Popular methods for web forum question post detection are question marks, question words, higher-order n-grams, and sequential pattern mining. These methods suffer from low detection rates and implementation complexity. This paper implements a hybridization of a simple bag-of-words model with web forum metadata and the simple rule of question marks and question words. Dimensionality reduction was performed using chi-square and wrapper techniques.

Findings: The quality of web forum question posts varies from excellent to mediocre or even spam. Detecting good question posts is non-trivial and requires the use of salient features. Combining the simple rule of question marks and question words with forum metadata performed better than either of the two alone. Integrating the bag-of-words model with the simple rule of question marks and question words and with forum metadata enhances question post detection. Dimensionality reduction using chi-square was found to perform better than other popular filters such as information gain, gain ratio, and symmetrical uncertainty.

Applications/Improvements: Three publicly available datasets of varying technical degrees were used for the experiments. The experimental results revealed that an enhanced bag-of-words model can perform better than complex techniques that implement higher-order n-grams with part-of-speech tagging.
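The hybrid feature set and chi-square filtering summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the question-word list, the metadata key `is_first_post`, the `bow:`/`rule:`/`meta:` feature prefixes, and the function names are all assumptions made for illustration. The wrapper-based selection step and the classifier itself are not shown.

```python
import re
from collections import Counter

# Illustrative question-word list (an assumption, not taken from the paper).
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which"}


def extract_features(text, metadata=None):
    """Return the set of binary features present in one forum post."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Bag-of-words: one binary feature per distinct token.
    features = {"bow:" + token for token in tokens}
    # Simple rules: question mark and question words.
    if "?" in text:
        features.add("rule:question_mark")
    if any(token in QUESTION_WORDS for token in tokens):
        features.add("rule:question_word")
    # Forum metadata: hypothetical boolean flags supplied by the caller.
    for key, value in (metadata or {}).items():
        if value:
            features.add("meta:" + key)
    return features


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table.

    n11: feature present, question post    n10: feature present, other post
    n01: feature absent, question post     n00: feature absent, other post
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_features(posts):
    """Rank features by chi-square score, highest first.

    posts: list of (text, metadata, is_question) tuples.
    """
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for text, metadata, is_question in posts:
        features = extract_features(text, metadata)
        if is_question:
            n_pos += 1
            pos_counts.update(features)
        else:
            n_neg += 1
            neg_counts.update(features)
    scores = {}
    for feature in set(pos_counts) | set(neg_counts):
        n11, n10 = pos_counts[feature], neg_counts[feature]
        scores[feature] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Keeping only the top-ranked features from `rank_features` would correspond to a chi-square filter. The rule and metadata features compete in the same ranking as the bag-of-words tokens, which is one simple way to combine the three feature families.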

Keywords

Bag-of-words, Forum Metadata, Web Forum, Question Detection, Dimensionality Reduction, Web Forum Question

Authors

Adekunle Isiaka Obasa
Department of Computer Science, College of Science and Technology, Kaduna Polytechnic, P.M.B 2021, Kaduna, Nigeria
Naomie Salim
Faculty of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Atif Khan
Faculty of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia




DOI: https://doi.org/10.17485/ijst%2F2015%2Fv8i32%2F123222