At a time when artificial intelligence is reshaping how knowledge is produced and shared, a fundamental question emerges: whose languages are being represented, and whose are being left behind?
This question was at the heart of the 2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026), held on March 28 in Rabat in a hybrid format, as part of the 19th Conference of the European Chapter of the Association for Computational Linguistics.
The workshop brought together a global network of researchers committed to advancing Natural Language Processing (NLP) for languages that remain largely underrepresented in today’s AI systems—languages written in or historically adapted to the Arabic script, spanning regions, cultures, and centuries of knowledge. Some are, to this day, written in multiple scripts that are not handled equally well by computational techniques. Others are primarily oral languages that have only recently acquired a written form. This is the case for dialects of Arabic, which have attracted significant attention in recent years as NLP has started encompassing tasks relying primarily on oral communication.
In NLP, however, the treatment of spoken language is mediated by a written form, and there have been varied attempts to develop a reasoned written form for dialectal Arabic. The workshop opened with a keynote by Dr. Violetta Cavalli-Sforza, who highlighted the issues inherent in developing an orthography for Moroccan Arabic (Darija) and surveyed the major efforts undertaken in that direction, using Arabic and other scripts. She focused on the uses of written forms of oral language and on the undeniable relationship of Darija to formal Arabic, even as Darija has absorbed aspects of the languages of its geographical and historical context.
From Arabic dialects to Urdu, Persian, Pashto, Kurdish, and beyond, the conversations underscored a critical reality: language is not only a medium of communication—it is a vessel of identity, heritage, and intellectual diversity. Ensuring its presence in digital systems is not merely a technical challenge, but a strategic and ethical imperative.
The AbjadNLP initiative reflects a broader shift in AI research, one that recognizes that innovation must be inclusive to be meaningful. By fostering collaboration, resource development, and tool creation for under-resourced languages, the initiative contributes to building a more equitable digital future. Workshop participants were thrilled to observe that the community of researchers working on Abjad languages has grown substantially over time.
