An English-Swahili Email Spam Detection Model for Improved Accuracy Using Convolutional Neural Networks

Leshan Sankaine; John G. Ndia; Dennis Kaburu

doi:10.58496/MJCS/2025/036

Published: 2025-06-27

Keywords:

Email Spam, Machine Learning, CNN, Bilingual, NLP, Cybersecurity, Swahili Spam Detection

Leshan Sankaine

School of Computing and Informatics, Mount Kenya University, Thika, Kenya

https://orcid.org/0009-0009-4414-8473

John G. Ndia

School of Computing and Information Technology, Murang’a University of Technology, Murang’a, Kenya

https://orcid.org/0000-0002-9223-8096

Dennis Kaburu

School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Thika, Kenya

https://orcid.org/0000-0003-1850-3418

Abstract

Email has become an essential tool for digital communication, facilitating global networking and information exchange. However, spam emails, particularly those in multilingual contexts, pose a significant threat to cybersecurity. In 2023, cyber-related attacks cost Africa approximately USD 10 billion, with the Kenyan economy suffering losses of USD 383 million, 45% of which resulted from phishing and spam emails. While spam detection has been studied extensively for English, low-resource languages such as Swahili lack sufficient research and datasets. Swahili is spoken by approximately 200 million people, mainly in East Africa. The same speakers also use English as a medium of communication, which highlights the need for research on English-Swahili spam detection. This study proposes a convolutional neural network (CNN)-based model to increase spam detection accuracy in English-Swahili emails. The dataset comprises 8,829 ham emails and 2,749 spam emails, totaling 11,578 messages. The model was trained and evaluated using accuracy, precision, recall, and F1-score metrics. The results indicate 99.4% accuracy, 99.3% precision, 98.2% recall, and a 98.7% F1 score. These findings indicate that the model performs well on bilingual English-Swahili email.
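The figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' evaluation code. It assumes the standard definition of the F1 score as the harmonic mean of precision and recall, and it treats 98.2% as the recall value. The function name `f1_score` and the variable names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Dataset counts as reported in the abstract.
ham, spam = 8829, 2749
total = ham + spam            # should match the reported 11,578 messages
spam_share = spam / total     # fraction of the dataset labelled spam

# Assumption: 99.3% is precision and 98.2% is recall.
f1 = f1_score(0.993, 0.982)

print(total)                  # total number of messages
print(round(spam_share, 3))   # share of spam in the dataset
print(round(f1, 3))           # F1 implied by the precision/recall pair
```

Under these assumptions the implied F1 rounds to 0.987, in line with the reported 98.7%. The counts also show that spam makes up a little under a quarter of the dataset, which is one reason precision, recall, and F1 are reported alongside accuracy.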

Issue

Vol. 5 No. 2 (2025)

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Sankaine, L., Ndia, J. G., & Kaburu, D. (2025). An English-Swahili Email Spam Detection Model for Improved Accuracy Using Convolutional Neural Networks. Mesopotamian Journal of CyberSecurity, 5(2), 590-605. https://doi.org/10.58496/MJCS/2025/036
