Fraudulent claims remain a major problem in motor insurance, even though the insurance industry holds vast amounts of motor claims data. Analyzing this data can lead to more efficient detection of fraudulent claims. The challenge is how to extract insight and knowledge from the data and use it to build a fraud detection system. Because fraudsters constantly change their methods, some approaches used by insurance firms, such as impromptu audits, whistle-blowing, and staff rotation, have become infeasible. Machine learning techniques can aid fraud detection by training a prediction model on historical data. The performance of such models is affected by class imbalance and by how well the features most relevant to fraud are identified. In this paper, we examine various fraud detection techniques and compare their performance. We then summarize the strengths and weaknesses of each technique in classifying claims as fraudulent or non-fraudulent, and finally propose a fraud detection framework based on an ensemble model trained on a dataset that has been balanced using SMOTE and reduced to relevant features only. The proposed approach is expected to improve performance and reduce false positives.
Keywords
Insurance, Fraud, Class Imbalance, SMOTE, Feature Selection, Ensemble Learning
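
The balancing step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority neighbours. This is not the paper's implementation. The function name `smote`, its parameters, and the toy claim data are assumptions made for illustration, and the sketch handles numeric features only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per claim
legit = [[float(i), float(i % 7)] for i in range(40)]
fraud = [[50.0, 1.0], [52.0, 2.0], [51.0, 3.0], [53.0, 1.5]]

# Oversample the fraud class until both classes have the same size
balanced_fraud = fraud + smote(fraud, len(legit) - len(fraud), k=3)
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic claims do not leak into the evaluation data. A maintained implementation such as the `SMOTE` class in the imbalanced-learn library would usually be preferred over a hand-written one.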