Improving Arabic sentiment analysis on social media: a comparative study on applying different pre-processing techniques

Essam Kazem Al-Yasiri; Ahmed Al-Azawei

Authors

Al-Yasiri EK: Department of Software, College of Information Technology, University of Babylon, Iraq
Al-Azawei A: Department of Software, College of Information Technology, University of Babylon, Iraq

Keywords:

Social Networking Sites, Arabic sentiment analysis, Pre-processing techniques, Classifying Arabic text, Data mining algorithms

Abstract

Despite the clear growth of Arabic text on social networking sites (SNSs), it is still difficult to understand or summarize users' opinions or perspectives on a specific topic. Arabic text classification therefore remains one of the most challenging topics, owing to several issues related to the nature of the Arabic language and to words that carry several different meanings. In this paper, after tokenizing the Arabic text, we investigate the role of several pre-processing techniques applied before classifying Arabic text into different categories. Arabic words were converted into vectors using the term frequency-inverse document frequency (TF-IDF) technique. The findings show that applying a Linear Support Vector Machine (LSVC) with stop words and without stemming can outperform the Decision Tree (DT) and Random Forest (RF) methods. The effectiveness of the proposed LSVC was found to be 99.37%. These outcomes are significant for identifying users' opinions on SNSs and can have many implications for the political, social, economic, and business sectors.
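
The TF-IDF step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the stop-word list, the whitespace tokenizer, and the function names `tokenize` and `tfidf` are all assumptions made for illustration, and the plain `log(N / df)` weighting is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's dataset is not reproduced here.
DOCS = [
    "الخدمة ممتازة جدا",
    "الخدمة سيئة جدا",
    "المنتج ممتاز",
]

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"جدا"}


def tokenize(text, stop_words=frozenset()):
    """Split on whitespace and optionally drop stop words."""
    return [tok for tok in text.split() if tok not in stop_words]


def tfidf(docs, stop_words=frozenset()):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d, stop_words) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Plain IDF, log(N / df); libraries often use smoothed variants.
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Compare the feature space with and without stop-word removal.
vocab_with, vecs_with = tfidf(DOCS)
vocab_without, vecs_without = tfidf(DOCS, STOP_WORDS)
```

Vectors of this kind could then be passed to a classifier. In practice, a library such as scikit-learn offers `TfidfVectorizer` together with the `LinearSVC`, `DecisionTreeClassifier`, and `RandomForestClassifier` estimators; whether the authors used that library is not stated in the abstract.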
