JMCER

Phishing Emails Detection Models: A Comparative Study

  • Received
    August 31, 2023
  • Revised
    September 15, 2023
  • Accepted
    October 16, 2023
  • Published
    October 16, 2023

Authors

  • Rian Sh. Al-Yozbaky
  • Mafaz Alanezi

Abstract:

Today, phishing emails are the largest problem affecting internet services because they upset customers and cost businesses money. Methods based on Natural Language Processing (NLP) principles still have many limitations and perform poorly, especially for non-English languages such as Arabic. This is due to the lack of NLP resources for Arabic and to the richness of its vocabulary, in which many different words can express the same meaning. This paper reviews the models previously proposed by other researchers and presents the RAPH model for phishing detection. The model relies on textual analysis of the content of e-mail messages, which it compares against purpose-built datasets containing the words most commonly used in phishing emails. The results demonstrate the effectiveness of the RAPH model: it correctly detected 98.4% of phishing messages, while its error rate on legitimate messages was 7.5%.
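The abstract describes RAPH only at a high level: message text is analysed and compared against datasets of words common in phishing emails. The Python sketch below illustrates that general keyword-matching idea; it is not the authors' implementation. The keyword set `PHISHING_KEYWORDS`, the keyword-density score in `phishing_score`, the 0.1 threshold in `is_phishing`, and the helper `detection_rates` are all hypothetical. The sketch strips Arabic diacritics before matching, which is one common way to address the spelling variation mentioned above. It also assumes that "error rate on legitimate messages" means the share of legitimate emails misclassified as phishing, FP / (FP + TN), alongside the detection rate TP / (TP + FN).

```python
import re

# HYPOTHETICAL keyword set: the RAPH datasets are not reproduced in the abstract,
# so these English and Arabic entries are illustrative placeholders only.
PHISHING_KEYWORDS = {
    "verify", "account", "password", "urgent", "suspended", "click",
    "تحقق",  # "verify"
    "حساب",  # "account"
    "عاجل",  # "urgent"
    "كلمة",  # "word", as in "كلمة المرور" (password)
}

# Arabic harakat (U+064B..U+0652) and tatweel (U+0640) are removed so that
# vocalized and unvocalized spellings of a word map to the same token.
_ARABIC_MARKS = re.compile(r"[\u064B-\u0652\u0640]")
_TOKEN = re.compile(r"\w+")  # Unicode-aware: matches Arabic and Latin letters


def normalize(text: str) -> str:
    """Lower-case Latin letters and strip Arabic diacritics and tatweel."""
    return _ARABIC_MARKS.sub("", text.lower())


def phishing_score(text: str, keywords=PHISHING_KEYWORDS) -> float:
    """Hypothetical score: fraction of message tokens found in the keyword set."""
    tokens = _TOKEN.findall(normalize(text))
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in keywords)
    return hits / len(tokens)


def is_phishing(text: str, threshold: float = 0.1) -> bool:
    """Flag a message whose keyword density reaches the (assumed) threshold."""
    return phishing_score(text) >= threshold


def detection_rates(y_true, y_pred):
    """Return (detection rate on phishing, error rate on legitimate mail).

    Labels: True = phishing, False = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    legit_error_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, legit_error_rate


if __name__ == "__main__":
    # Toy labelled messages (True = phishing); these are not data from the paper.
    samples = [
        ("Urgent: verify your account password now", True),
        ("Meeting moved to 3 pm tomorrow", False),
        ("تحقق من حساب البنك عَاجِل", True),
        ("Please click the link to join the seminar", False),
    ]
    y_true = [label for _, label in samples]
    y_pred = [is_phishing(text) for text, _ in samples]
    print(detection_rates(y_true, y_pred))  # → (1.0, 0.5)
```

On these toy messages, the seminar invitation is wrongly flagged because the single word "click" pushes it past the threshold. Errors of this kind on legitimate mail are what the reported 7.5% figure measures.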

Keywords

Phishing Email Detection, Natural Language Processing, Python Libraries, RAPH, NLP Features
