A NEW RICH LEXICAL RESOURCE FOR CLASSICAL ARABIC

  • Mustapha Khalfi PhD Student
  • Arsalane Zarghili Professor
  • Ouafae Nahli Researcher
Keywords: Information Extraction, Arabic Lexicon, Machine-readable dictionary, Arabic Lexical Resource, Al Qamus Al Muhit

Abstract

Large lexical resources are increasingly relevant to information systems, and the need for lexical resources in Natural Language Processing (NLP) is paramount. To help meet this need, we built a lexical resource from the well-known dictionary al=qāmūs al=muḥīṭ (AQAM). Using a rule-based approach, we designed a system that extracts morpho-syntactic, semantic and lexical information from the dictionary. The result is a digitized and structured version of AQAM, enriched with explicit morpho-syntactic and lexical information. The resource is further enriched with English translations of the lemmas and their senses, taken from a bilingual English-Arabic dictionary. We then present an overview of an experiment aligning the section of the letter bā’ with Princeton WordNet (PWN) and the Suggested Upper Merged Ontology (SUMO). The experiment proved instructive: mapping an Arabic lexical resource onto English resources reveals commonalities between the two languages but, above all, highlights the non-equivalences between them. All resources produced are represented in XML and distributed under a free license.
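The abstract names a rule-based extraction step, enrichment from a bilingual dictionary, and XML output, but does not spell out the rules or the schema. As a rough illustration only, the Python sketch below applies two toy rules to an invented, simplified entry and serializes the result with the standard-library ElementTree. Everything specific here is an assumption, not the authors' method: the "lemma: body" entry layout, the use of the Arabic comma "،" to separate senses, the abbreviation "ج" (jamʿ, plural) introducing a plural form, the one-word `BILINGUAL` lookup, and the element names `entry`, `lemma`, `translation`, `plural` and `sense`.

```python
import re
import xml.etree.ElementTree as ET

# Toy rules (hypothetical, NOT the authors' rule set): an entry reads
# "<lemma>: <body>", senses in the body are separated by the Arabic comma "،",
# and the abbreviation "ج" followed by a word is assumed to mark a plural form.
ENTRY_RULE = re.compile(r"^\s*(?P<lemma>[^\s:]+)\s*:\s*(?P<body>.+)$")
PLURAL_RULE = re.compile(r"(?<!\S)ج\s+(?P<plural>[^\s،]+)")

# Stand-in for a bilingual Arabic-English dictionary (hypothetical data).
BILINGUAL = {"كتاب": "book"}


def extract_entry(raw: str) -> dict:
    """Apply the toy rules to one raw entry and return its fields."""
    match = ENTRY_RULE.match(raw)
    if match is None:
        raise ValueError(f"entry does not follow the assumed format: {raw!r}")
    body = match.group("body")
    plurals = PLURAL_RULE.findall(body)
    # Drop the plural markers, then split what remains into senses.
    remainder = PLURAL_RULE.sub("", body)
    senses = [s.strip() for s in remainder.split("،") if s.strip()]
    return {"lemma": match.group("lemma"), "plurals": plurals, "senses": senses}


def to_xml(entry: dict, bilingual: dict) -> str:
    """Serialize an entry as XML, adding an English gloss when one is known."""
    root = ET.Element("entry")
    ET.SubElement(root, "lemma").text = entry["lemma"]
    gloss = bilingual.get(entry["lemma"])
    if gloss is not None:
        ET.SubElement(root, "translation", lang="en").text = gloss
    for plural in entry["plurals"]:
        ET.SubElement(root, "plural").text = plural
    for number, sense in enumerate(entry["senses"], start=1):
        ET.SubElement(root, "sense", n=str(number)).text = sense
    return ET.tostring(root, encoding="unicode")


# Invented sample entry, for illustration only (not quoted from AQAM).
SAMPLE = "كتاب: ما يكتب فيه، ج كتب، الرسالة"

if __name__ == "__main__":
    print(to_xml(extract_entry(SAMPLE), BILINGUAL))
```

On the sample, the plural rule pulls out كتب, the two remaining senses are numbered 1 and 2, and the English gloss is attached only because the lemma happens to be in the lookup table. Real dictionary text is far less regular than this, which is why a production rule set would need many more patterns than the two shown here.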

Published
2020-10-31
How to Cite
Khalfi, M., Zarghili, A., & Nahli, O. (2020). A NEW RICH LEXICAL RESOURCE FOR CLASSICAL ARABIC. COMPUSOFT: An International Journal of Advanced Computer Technology, 9(10), 3863-3885. Retrieved from https://ijact.in/index.php/ijact/article/view/1196