Making language data available and representative worldwide
Oxford Languages are working to make our language data available as widely as possible to support under-resourced languages and global varieties of English.

If you have a smartphone or use some of the biggest search engines, then you have Oxford Languages’ dictionaries at your fingertips.


While the Oxford English Dictionary is our flagship title, we don’t just hold English language data. In support of our mission at OUP—to advance the University’s objective of excellence in research, scholarship, and education by publishing worldwide—we work with cutting-edge technology providers to make Oxford Languages’ data available as widely as possible.


One of our aims is to digitize under-resourced languages to support localization. We service over 60 languages, and in 2024/25 we launched 10 new language datasets, ranging from Indonesian to Sanskrit to Assamese. For such languages, we might develop the content with our partners, or we may acquire and develop it by working with native linguists, local agencies, authors, institutes, foundations, and our in-house development teams.


Under-resourced languages
Sometimes our customers will request a new language dataset for their digital products, but we also look for gaps in the market. Indonesian is in high demand and under-resourced, so in 2024 we added the leading Indonesian Monolingual Dictionary to our language portfolio. Sourcing, developing, and investing in under-resourced languages helps to widen access to these languages, while also digitally preserving culture and history.


Alexandra Feeley
Director of Business and Market Development
“In countries where English is commonly spoken but not the main language, you are forced to use English for technology because the features don’t tend to support native speakers. When I open my phone or my email nowadays, I expect predictive text, to fill in the blanks, to spell check. But when you look at Indian languages or African languages for example, there isn’t that same level of native digitalization.


“This is why we have created resources to allow technologists to develop the tools for those under-resourced languages. If you can experience something in your native language, it becomes an extension of you and it’s then a lot easier to relate to products and to expand your usage of things.”


Some of the under-resourced languages we’re working on include Hebrew and Catalan. When we work on such projects, our teams make sure we best represent the language and how it is spoken by reviewing corpora, including inflection coverage, and providing complete, short definitions.


World Englishes
The Oxford English Dictionary (OED) is widely regarded as the accepted authority on the English language. However, English is not the same language that it was when the First Edition was published in 1928.


Danica Salazar
OED World English Editor


“Since then, it has become a truly global language, spoken by billions of people of immensely varied origins and backgrounds—and as these people continue to contribute to the richness and diversity of the English lexicon, so will the OED continue to adapt its policies and practices in order to ensure that these contributions are represented in the dictionary.”


Collectively, we refer to global varieties of English as ‘World Englishes’. In the lead-up to the centenary of the OED’s First Edition in 2028, our goal is to widen the geographical coverage of the dictionary. Our World English programme recognizes that English is a world language, and so British English is no longer regarded as the dominant form of English but as just one of many varieties. Each quarterly update of the OED now includes examples from different World Englishes. You can find out more in the March 2025 update, which features ‘untranslatable’ words.


As language continues to evolve, we regularly update our datasets to make sure our customers’ dictionary displays, games, mobile applications, and other solutions stay current with modern English. You can find out more about this here.


Another ongoing project is the Oxford Dictionary of African American English (ODAAE), which will apply the depth and rigour of the OED’s historical methodology specifically to the study of African American English. A diverse team of lexicographers and researchers are creating a dictionary that will illuminate the history, meaning, and significance of this body of language. More than 1,350 meanings for 1,100 words are now in draft with 300 words finalized.


John McCullough, Lexicographer at the ODAAE, said:


“What is really important about the ODAAE is our opportunity to represent speakers of African American English in a way that is both accurate and respectful to the enduring legacy of the language, and provide high-quality research evidence that highlights its importance to the cultural and linguistic landscape of English throughout history.


“This is a language variety that has thrived in its expression of Black identity, often despite and in spite of historical marginalization and stigmatization. We are proud of the work we have done to include a wide range of entries that reflect the ways in which AAE is a distinct yet inextricable foundation of American English and continues to linguistically innovate and spearhead cultural change.”


Anansa Benbow, also an ODAAE Lexicographer, said:


“African American English has undeniably influenced global English. I am proud to help document its lexicon through my work on the ODAAE, a project that is about amplifying voices, histories, and identities, as well as honouring and preserving the richness of African American English. It is a project that speaks to the heart of our mission at OUP.”


New technologies help our data go further
The OED Labs initiative is helping to shape the future of the Oxford English Dictionary research experience through new technologies.


We have been piloting an AI search assistant on OED.com for users to search across the dictionary’s content quickly, without needing to understand the many different filters that are available. We are also exploring how we support our lexicographers to use AI to research, revise, and publish OED entries more quickly, as well as developing prototypes to investigate how OED data can further empower research. 


Elinor Hawkes, Senior Product Manager, notes:


“The OED has a long history of embracing new technologies and we’re excited to see what the future holds. Our dictionary data not only includes contemporary and historical definitions, but also data on how, when, where, and by whom words were used. By coupling this rich dataset with emerging technologies, we are able to support new avenues of research better than ever before.”
3 July 2025
You can find out more about Oxford Languages 👇🏿
https://corp.oup.com/spotlights/making-language-data-available-and-representative-worldwide/