Applying Machine Learning Algorithms for Detecting Phishing Websites: Applications of SVM, KNN, Decision Trees, and Random Forests

Olaolu Kayode-Ajala

Abstract

Phishing attacks pose a significant risk to both individual and organizational data security. They typically mimic legitimate websites to steal sensitive information. Traditional countermeasures such as blacklists and rule-based systems have proven limited against this constantly changing threat. This research applied four machine learning algorithms, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees, and Random Forests, to automate and improve the detection of phishing websites. The dataset comprised 6,157 benign and 4,898 phishing URLs. Each URL is characterized by 30 features extracted from sources such as the WHOIS database and the webpage's HTML content, covering aspects such as SSL state, URL length, and the presence of specific symbols in the URL. SVM achieved an accuracy of 95%, with a precision of 0.95 for phishing URLs and 0.94 for benign URLs. KNN reached an overall accuracy of 94%, almost matching the SVM model's performance. The Decision Tree and Random Forest models achieved the highest accuracies, 96% and 97%, respectively. These models were also highly precise, with F1-scores above 0.93 for both classes. Feature importance analysis showed that the models rely heavily on features such as "SSL_State," "URL_of_Anchor_External," and "Web_Traffic" for classification, with SSL_State ranking as the most important feature in both the Decision Tree and Random Forest models. These features also have moderate to strong correlations with the target variable, reinforcing their significance in phishing website detection.
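
To make the idea of URL-derived features concrete, the following is a minimal sketch of how a few lexical features (URL length and the presence of specific symbols) might be computed from a raw URL. It is not the study's extraction pipeline: the function name `extract_url_features`, the feature names, and the choice of symbols are assumptions made for illustration, and the `uses_https` flag is only a crude stand-in for the SSL state feature mentioned in the abstract. WHOIS and HTML-based features are not covered.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a handful of illustrative lexical features for one URL.

    Feature names and definitions are hypothetical; the abstract does not
    specify how the study's 30 features were encoded.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Text after the scheme separator, used to spot an extra "//" that
    # can indicate an embedded redirect.
    after_scheme = url.split("://", 1)[1] if "://" in url else url

    return {
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "has_hyphen_in_host": "-" in host,
        "has_double_slash_redirect": "//" in after_scheme,
        # Assumption: scheme check as a rough proxy for SSL state.
        "uses_https": parsed.scheme == "https",
        # Dots beyond the one separating the domain from its suffix.
        "subdomain_count": max(host.count(".") - 1, 0),
    }


if __name__ == "__main__":
    for example in (
        "https://example.com/login",
        "http://secure-login.example.com@evil.example.net//redirect",
    ):
        print(example, extract_url_features(example))
```

A feature dictionary like this could be converted into a numeric vector and passed to any of the four classifiers discussed; how the study actually encoded its features is not stated in the abstract.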

Article Details

How to Cite
Kayode-Ajala, O. (2022). Applying Machine Learning Algorithms for Detecting Phishing Websites: Applications of SVM, KNN, Decision Trees, and Random Forests. International Journal of Information and Cybersecurity, 6(1), 43–61. Retrieved from https://publications.dlpress.org/index.php/ijic/article/view/41