A Brief Review on Preprocessing Text in Arabic Language Dataset: Techniques and Challenges

Ahmed Adil Nafea; Muhmmad Shihab Muayad; Russel R Majeed; Ashour Ali; Omar M. Bashaddadh; Meaad Ali Khalaf; Abu Baker Nahid Sami; Amani Steiti

doi:10.58496/BJAI/2024/007

Published: 2024-05-18

Keywords:

Arabic language, Artificial Intelligence, Preprocessing, Natural Language Processing (NLP), Deep Learning, Machine Learning

Ahmed Adil Nafea

Department of Artificial Intelligence, College of Computer Science and IT, University of Anbar, Ramadi, Iraq

https://orcid.org/0000-0003-2293-1108

Muhmmad Shihab Muayad

Department of Computer Networking Systems, College of Computer and Information Technology, University of Anbar, Anbar, Iraq

https://orcid.org/0009-0007-2366-1628

Russel R Majeed

College of Education for Pure Sciences, University of Thi-Qar, Thi-Qar, Iraq

https://orcid.org/0009-0002-4327-9067

Ashour Ali

Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia

https://orcid.org/0000-0001-7266-1623

Omar M. Bashaddadh

Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia

https://orcid.org/0009-0004-7266-4399

Meaad Ali Khalaf

Department of Computer Science, AUL University, Beirut, Lebanon

https://orcid.org/0009-0002-0463-2876

Abu Baker Nahid Sami

Department of Computer Science, University of Anbar, Ramadi, Iraq

https://orcid.org/0009-0007-9507-8466

Amani Steiti

Department of Computer Systems and Networks, Faculty of Information Engineering, University Tishreen, Latakia, Syria

Abstract

Text preprocessing plays an important role in natural language processing (NLP) tasks such as text classification, sentiment analysis, and machine translation. Preprocessing Arabic text still presents unique challenges because of the language's rich morphology, complex grammar, and varied character sets. This brief review examines the techniques used to preprocess Arabic text data. It discusses the challenges specific to Arabic text and presents an overview of the key preprocessing steps, including normalization, tokenization, stemming, stop-word removal, and noise reduction. It also analyzes the effect of preprocessing techniques on NLP tasks and highlights current research trends and future directions in Arabic text preprocessing.
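
To make the preprocessing steps named in the abstract concrete, the following is a minimal Python sketch of how they might be chained, using only the standard library. It is an illustration under stated assumptions, not the pipeline evaluated in the review: the function names (`normalize`, `light_stem`, `preprocess`), the five-word stop list, and the short prefix and suffix lists are all hypothetical, and the letter unifications (alef variants to bare alef, alef maksura to yeh, teh marbuta to heh) are one common but lossy convention. Arabic characters are written as Unicode escapes so the code points are unambiguous.

```python
import re

# Diacritics (tashkeel), superscript alef, and tatweel (kashida).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
URLS = re.compile(r"https?://\S+")
# Assumption: anything outside the basic Arabic letters U+0621..U+064A is noise.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def normalize(text):
    """Strip diacritics and tatweel, then unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> alef
    text = text.replace("\u0649", "\u064A")                # alef maksura -> yeh
    text = text.replace("\u0629", "\u0647")                # teh marbuta -> heh
    return text


# Tiny illustrative stop list, stored in normalized form so lookups match.
STOP_WORDS = {normalize(w) for w in (
    "\u0641\u064A",        # fi (in)
    "\u0645\u0646",        # min (from)
    "\u0639\u0644\u0649",  # ala (on)
    "\u0625\u0644\u0649",  # ila (to)
    "\u0639\u0646",        # an (about)
)}

# Toy affix lists for a light stemmer; real stemmers use far richer rules.
PREFIXES = ("\u0648\u0627\u0644", "\u0628\u0627\u0644", "\u0627\u0644", "\u0644\u0644")
SUFFIXES = ("\u0627\u062A", "\u0648\u0646", "\u064A\u0646", "\u0647")


def light_stem(token):
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            token = token[:-len(s)]
            break
    return token


def preprocess(text):
    text = URLS.sub(" ", text)        # noise reduction: drop URLs
    text = normalize(text)            # normalization
    text = NON_ARABIC.sub(" ", text)  # noise reduction: digits, Latin, punctuation
    tokens = text.split()             # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [light_stem(t) for t in tokens]               # light stemming
```

Calling `preprocess` on a raw sentence returns a list of stemmed tokens. The order of the steps is a design choice: normalizing before stop-word lookup lets a single stop list cover spelling variants, while stemming last avoids stripping affixes from words that would have been dropped anyway.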

Issue

Vol. 2024 (2024)

Section

Articles

How to Cite

Nafea, A. A., Muayad, M. S., Majeed, R. R., Ali, A., Bashaddadh, O. M., Khalaf, M. A., Sami, A. B. N., & Steiti, A. (2024). A Brief Review on Preprocessing Text in Arabic Language Dataset: Techniques and Challenges. Babylonian Journal of Artificial Intelligence, 2024, 46-53. https://doi.org/10.58496/BJAI/2024/007
