B. Ihnaini, M. Mahmuddin


Sentiment analysis is the process of identifying the subjective opinion expressed in a text. It has attracted considerable interest because of its applications in economics, politics, and sociology. Since Twitter is a rich source of people's thoughts and opinions, it is a natural resource for exploring public opinion. Much research has been conducted for the English language, whereas Arabic has received only a limited number of sentiment analysis studies, especially for Arabic dialects in social media. This work adopts a lexicon-based approach, which relies on detecting sentiment words, to perform sentiment analysis on Arabic tweets. These sentiment words are stored in a sentiment lexicon in which each word is annotated with its sentiment polarity. One of the main issues in handling Arabic tweets is the changing nature of Twitter, where new words that carry sentiment emerge and many slang words evolve. In this paper, an expandable and up-to-date lexicon for Arabic (EULA) is developed to address the continual invention of new words and phrases in social media. EULA relies on a pre-built lexicon of Modern Standard Arabic (MSA) sentiment words and a set of rules that expand and enrich it with dialectal polarity words drawn from a small number of labeled tweets and a large number of unlabeled tweets. For evaluation, eight different corpora of Arabic tweets were selected. A pre-processing phase that includes normalization and stemming is implemented to reduce the number of unique words to be analyzed. Experiments show that EULA improved the accuracy and F1 score of the lexicon-based approach by more than 20% on average.
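
As a rough illustration of the general lexicon-based idea described above, the following Python sketch normalizes a tweet, looks each token up in a polarity lexicon, and sums the polarities. It is not the EULA method itself: the toy lexicon entries, the function names `normalize` and `classify`, and the particular normalization steps (removing diacritics and tatweel, unifying alef variants) are all assumptions chosen for illustration, and stemming and the lexicon-expansion rules are omitted.

```python
import re

# Hypothetical toy lexicon (an assumption, not the EULA lexicon):
# word -> polarity, +1 for positive and -1 for negative.
TOY_LEXICON = {"جميل": 1, "رائع": 1, "سيء": -1, "حزين": -1}

# Assumed normalization steps, commonly applied to Arabic text:
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")    # tashkeel marks and tatweel
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda or hamza


def normalize(text):
    """Strip diacritics/tatweel and map alef variants to bare alef."""
    text = _DIACRITICS.sub("", text)
    return _ALEF_VARIANTS.sub("\u0627", text)


def classify(tweet, lexicon=TOY_LEXICON):
    """Sum the polarities of known tokens and return a sentiment label."""
    # Normalize the lexicon keys too, so lookups match normalized tokens.
    norm_lexicon = {normalize(word): pol for word, pol in lexicon.items()}
    score = sum(norm_lexicon.get(tok, 0) for tok in normalize(tweet).split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In this sketch, a token that is absent from the lexicon contributes nothing to the score, which is why a lexicon that misses new dialectal or slang words tends to label tweets as neutral; this is the gap that an expandable lexicon is meant to close.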


Sentiment analysis; Lexicon-based approach; Social media; Modern Standard Arabic (MSA)

