
Analysis of Phishing Detection Using Logistic Regression and Random Forest


Affiliations
Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India



Authors

S. Gokul
Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India
P. K. Nizar Banu
Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India

Abstract


Cybercriminals are becoming more sophisticated every day, and over the past few years cybercrime has increased to the point that even large companies find it difficult to prevent. One such attack is phishing, in which victims are lured into entering sensitive information such as usernames, passwords, and bank details. Phishing makes it easy for an attacker to obtain this information. The attacker typically needs some information about the victim's profile so that the victim can be tricked easily. A phishing URL received by a victim is very hard to tell apart from the legitimate one because it is made to look similar to the original URL. In this paper, we use the information in the URL itself to determine whether the URL is a phishing URL, so the user does not need to visit the website and expose themselves to malicious code. We also discuss the metadata present in the URL and use it to classify URLs. Two algorithms, random forest and logistic regression, are used to classify each URL in the dataset as phishing or legitimate. After applying both classifiers to the given datasets, we found that random forest achieves better accuracy than logistic regression in classifying whether a URL is legitimate.
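Classifying a URL from its text alone, without opening the page, usually starts by turning each URL into a vector of numeric features. The following is a minimal Python sketch of that step, assuming a hypothetical set of lexical features (URL and host length, dots and hyphens in the host, an `@` sign, an IP-address host, HTTPS, path and query counts, and a small list of suspicious keywords). The function names, feature names, and keyword list are illustrative only and are not the feature set reported in the paper.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list, for illustration only (not taken from the paper).
SUSPICIOUS_TOKENS = ("login", "verify", "update", "secure", "account", "bank")


def _is_ip_address(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract simple lexical features from a URL string without fetching the page.

    The feature set is a hypothetical example of URL-based features.
    """
    parsed = urlparse(url)
    # hostname is lower-cased and excludes any port or "user@" credentials.
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(_is_ip_address(host)),
        "uses_https": int(parsed.scheme == "https"),
        # Count non-empty path segments, e.g. "/a/b" -> 2.
        "num_path_segments": len([s for s in parsed.path.split("/") if s]),
        # Count non-empty "key=value" pairs in the query string.
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        # Number of distinct suspicious keywords that occur anywhere in the URL.
        "suspicious_token_count": sum(tok in url.lower() for tok in SUSPICIOUS_TOKENS),
    }


def feature_vector(url: str) -> list:
    """Return the features as a list in a fixed (alphabetical) order for a classifier."""
    feats = url_features(url)
    return [feats[name] for name in sorted(feats)]
```

Vectors built this way could then be stacked into a matrix and passed to scikit-learn's `LogisticRegression` and `RandomForestClassifier`, both of which provide `fit(X, y)` and `predict(X)`, to run the kind of comparison described in the abstract. The dataset, train/test split, and hyperparameters are left open here because the abstract does not specify them.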

Keywords


Classification, Cyber Attack, Logistic Regression, Phishing, Random Forest, URL Phishing.