
Analysis of Phishing Detection Using Logistic Regression and Random Forest


Affiliations
1 Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India
     



In the present era of technology, cybercriminals are becoming more sophisticated every day. Over the past few years, cybercrime has increased to the extent that most large companies find it difficult to prevent. One such cyber-attack is phishing, in which victims are lured into entering sensitive information such as usernames, passwords, and bank details. Phishing makes it relatively easy for an attacker to obtain such information, especially when the attacker knows something about the victim's profile and can use it to trick the victim more easily. A phishing URL is hard to distinguish from the original because it looks very similar to it. In this paper, we use the information contained in a URL to determine whether it is a phishing URL, so the user does not need to visit the website and expose themselves to malicious code. We also discuss the metadata present in a URL and use it to classify URLs. Two algorithms, random forest and logistic regression, are used to classify the URLs in the dataset as phishing or legitimate. After applying both classifiers to the datasets, we found that random forest achieved better accuracy than logistic regression in classifying whether a URL is legitimate.

Keywords

Classification, Cyber Attack, Logistic Regression, Phishing, Random Forest, URL Phishing.
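The abstract's central idea is that a URL can be classified from its string alone, without visiting the page. A minimal sketch of that first step is shown below: it turns a URL into a small vector of lexical features using only Python's standard library (`urllib.parse.urlsplit` and `ipaddress`). The feature names (`url_length`, `has_ip_host`, `has_at_symbol`, and so on) and the helper names `url_features` and `to_vector` are hypothetical; they are loosely modeled on lexical features commonly used in URL-based phishing datasets and are not the exact feature set used in the paper.

```python
from __future__ import annotations

import ipaddress
from urllib.parse import urlsplit

# Hypothetical, fixed feature order so every URL maps to the same vector layout.
FEATURE_ORDER = (
    "url_length",
    "has_ip_host",
    "has_at_symbol",
    "has_double_slash_in_path",
    "num_dots_in_host",
    "has_hyphen_in_host",
    "uses_https",
)


def _is_ip_literal(host: str) -> bool:
    """Return True if the host is a raw IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict[str, int]:
    """Extract simple lexical features from a URL string without fetching it.

    Assumption: these features are illustrative only, not the paper's feature set.
    """
    parts = urlsplit(url)
    # .hostname is lower-cased and has any userinfo, port, and IPv6 brackets removed.
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "has_ip_host": int(_is_ip_literal(host)),
        # '@' can make the visible text before it misleading (userinfo trick).
        "has_at_symbol": int("@" in url),
        # A '//' inside the path can hide a redirect to another site.
        "has_double_slash_in_path": int("//" in parts.path),
        # Many dots in the host suggests deep subdomain nesting.
        "num_dots_in_host": host.count("."),
        "has_hyphen_in_host": int("-" in host),
        "uses_https": int(parts.scheme == "https"),
    }


def to_vector(features: dict[str, int]) -> list[int]:
    """Flatten a feature dict into a list in FEATURE_ORDER, ready for a classifier."""
    return [features[name] for name in FEATURE_ORDER]


if __name__ == "__main__":
    for sample in (
        "https://www.example.com/account",
        "http://192.168.0.1/paypal.com//login",
    ):
        print(sample, to_vector(url_features(sample)))
```

Vectors like these could be stacked into a feature matrix and passed to classifiers such as scikit-learn's `LogisticRegression` and `RandomForestClassifier` to compare their accuracy; which features and which implementation the paper actually used is not stated in this abstract.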


Authors

S. Gokul
Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India
P. K. Nizar Banu
Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India

