An expandable and up-to-date lexicon for sentiment analysis of Arabic Tweets
Keywords:
Sentiment analysis, Lexicon-based approach, Social media, Modern Standard Arabic (MSA)
Abstract
Sentiment analysis is the process of identifying the subjective opinion expressed in a text. It has attracted considerable interest because of its applications in economics, politics, and sociology. Because Twitter is a rich source of people's thoughts and opinions, it is a natural resource for exploring public opinion. Much research has been conducted on English, whereas Arabic has received only a limited number of sentiment analysis studies, especially for Arabic dialects in social media. This paper adopts a lexicon-based approach to sentiment analysis of Arabic tweets, which relies on detecting sentiment words. These words are stored in a sentiment lexicon in which each word is annotated with its sentiment polarity. One of the main difficulties in handling Arabic tweets is the changing nature of Twitter, where new words that carry sentiment emerge and slang evolves. In this paper, an expandable and up-to-date lexicon for Arabic (EULA) is developed to cope with the new words and phrases coined on social media. EULA relies on a pre-built lexicon of MSA sentiment words and a set of rules that expand and enrich it with dialectal polarity words drawn from a small number of labelled tweets and a large number of unlabelled tweets. For evaluation, eight different corpora of Arabic tweets were selected. A pre-processing phase comprising normalization and stemming is applied to reduce the number of unique words to be analyzed. Experiments show that EULA improved the accuracy and F1 score of the lexicon-based approach by more than 20% on average.
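To make the general idea concrete, the following is a minimal Python sketch of a lexicon-based pipeline of the kind the abstract describes: normalization, lexicon lookup, and expansion from labelled tweets. It is not EULA itself. The normalization rules are common Arabic NLP conventions assumed here, the `expand` rule and its `min_count` threshold are hypothetical, the toy lexicon and tweets are invented for illustration, and stemming and the use of unlabelled tweets are omitted.

```python
import re
from collections import Counter

# Assumed normalization rules (common conventions, not taken from the paper).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})


def normalize(text):
    """Strip diacritics and tatweel, then unify common letter variants."""
    return _DIACRITICS.sub("", text).translate(_CHAR_MAP)


def tokenize(text):
    """Whitespace tokenization of the normalized text."""
    return normalize(text).split()


def score(tweet, lexicon):
    """Sum the polarities of known words and map the total to a label."""
    total = sum(lexicon.get(tok, 0) for tok in tokenize(tweet))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def expand(lexicon, labelled, min_count=2):
    """Hypothetical expansion rule: add an unknown word when it occurs in at
    least `min_count` labelled tweets and only in tweets of one polarity."""
    counts = {1: Counter(), -1: Counter()}
    for tweet, label in labelled:
        for tok in set(tokenize(tweet)):
            if tok not in lexicon:
                counts[label][tok] += 1
    expanded = dict(lexicon)
    for label, other in ((1, -1), (-1, 1)):
        for tok, n in counts[label].items():
            if n >= min_count and counts[other][tok] == 0:
                expanded[tok] = label
    return expanded


# Toy seed lexicon and labelled tweets (illustrative data only).
seed = {normalize("جميل"): 1, normalize("قبيح"): -1}
labelled = [
    ("الفيلم جميل و روعه", 1),
    ("يوم روعة", 1),
    ("منظر قبيح", -1),
]
expanded = expand(seed, labelled)

print(score("شي روعة", seed))
print(score("شي روعة", expanded))
```

In this toy run the dialectal word for "wonderful" is absent from the seed lexicon, so the tweet is scored as neutral; after expansion the same tweet is scored as positive. Note that normalization is what lets the two spellings of that word (with ta marbuta and with ha) count as one entry.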
License
Copyright (c) 2018 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.