Scooped by Charles Tiayon | March 7, 1:16 PM
" WAXAL: A large-scale open resource for African language speech technology March 6, 2026 WAXAL provides a critical, open-access foundation for African speech technology. Featuring a large corpus of ASR and TTS data for 27 native languages under a highly permissive license, WAXAL empowers the African AI ecosystem to build robust speech systems that better reflect the region's unique linguistic diversity. Quick links WAXAL dataset Paper Share Voice-enabled technologies like virtual assistants and automated transcription have transformed how we interact with computers. However, their benefits disproportionately favor a handful of high-resource languages. This divide has left hundreds of millions of people — particularly in Sub-Saharan Africa, home to over 2,000 distinct languages — unable to access essential technology in their native tongues. Several years ago, the team at Google Research set out to help tackle this problem. Watch the film To address this critical need, we introduce WAXAL: a large-scale, openly accessible speech dataset that initially covers 27 Sub-Saharan African languages spoken by over 100 million speakers across more than 26 countries. Developed through a multi-year effort beginning in 2021, in collaboration with African academic and community organizations, WAXAL provides the high-quality, permissively licensed data necessary to build robust speech systems. Setting a foundational milestone, this initial release features approximately 1,846 hours of transcribed natural speech for automatic speech recognition (ASR) and over 565 hours of high-fidelity recordings for text-to-speech (TTS). We are releasing these resources under a Creative Commons license (CC-BY-4.0) to catalyze research and enable inclusive voice-enabled technologies tailored to the unique linguistic characteristics of the continent. 
We intend for the WAXAL collection to continuously evolve and expand to include additional languages as part of our ongoing effort to bridge the digital divide.

Introducing WAXAL

By addressing critical data scarcity for over 100 million speakers, WAXAL aims to empower the regional AI research ecosystem. To support the development of robust speech technologies, the corpus integrates two specialized datasets designed to provide comprehensive coverage for both speech recognition and synthesis tasks.

WAXAL-ASR (Spontaneous Understanding): Comprising approximately 1,846 hours of transcribed audio, this dataset captures natural, unscripted speech. Instead of reading scripts, diverse participants were asked to describe visual stimuli covering 50+ topics in their native language. This image-prompted elicitation captured authentic linguistic variation, including tonal nuances and code-switching, and yielded more natural speech than traditional scripted methods.

Examples from Google’s Open Images used as prompts to elicit natural speech for the ASR dataset.

WAXAL-TTS (High-Fidelity Generation): Designed to facilitate the creation of natural-sounding synthetic voices, this dataset contains over 565 hours of high-quality, phonetically balanced audio. The TTS collection process was highly collaborative: local community members worked in pairs to draft scripts of 10,000–20,000 words, alternating reader and recorder roles. To ensure professional-grade acoustics, some participants used project funding to build custom studio boxes. The resulting recordings were then segmented, matched with the script text, and reviewed for accuracy and quality.

TTS recording box at University of Ghana.

The WAXAL corpus's dual focus on unscripted ASR data and high-fidelity TTS audio is designed to enable the development of full-duplex conversational systems.
Specifically, the ASR component facilitates the modeling of varied, spontaneous speech input typical of real-world scenarios, while the high-quality TTS component provides the clean reference data required for generating clear, natural output. The table below lists the 27 languages currently included in the dataset.

Breakdown of the current WAXAL dataset, showing the 27 initial Sub-Saharan African languages and the availability of ASR and TTS data for each.

Anchoring in the African AI ecosystem

Crucial to the WAXAL project was our commitment to working with, and contributing directly to, the African AI ecosystem. The data collection effort was led entirely by African academic and community organizations, guided by Google experts on world-class data collection practices. This collaborative approach ensured the corpus was built by and for the community it serves; under a shared methodology, each partner focused on a specific subset of languages.

Our partners included Makerere University, which collected ASR and/or TTS data for nine different languages, and the University of Ghana, which focused its efforts on eight languages using the image-prompted ASR data collection methodology outlined above. Additional key collaborators were Digital Umuganda, in partnership with Addis Ababa University, who were instrumental in leading the ASR collection for several regional languages. For the high-quality, studio-recorded voices, Media Trust, Loud n Clear, and the African Institute for Mathematical Sciences Senegal spearheaded the TTS recordings across various regional languages.

This framework is fundamentally rooted in the principle that our partners retain ownership of the collected data, alongside a shared commitment to make all datasets openly available for the broader community. This deep collaboration and open-access philosophy have already enabled notable derivative research and publications.
Through this framework, our partners have already enabled new research, such as the development of a cookbook for community-driven collection of impaired speech. This research resulted in the first open-source dataset for Akan speakers with conditions like cerebral palsy and stammering, and demonstrated that in-person, image-prompted elicitation is more effective than text-based prompts for these populations. This work provides a vital roadmap for developing inclusive speech technologies in low-resource environments.

Furthermore, the initiative supported a major study that introduced a 5,000-hour speech corpus for five Ghanaian languages — Akan, Ewe, Dagbani, Dagaare, and Ikposo. This work established infrastructure for building robust ASR and TTS systems tailored to the linguistic diversity of West Africa by using a controlled crowdsourcing approach to capture natural, spontaneous intonations.

Other essential research has focused on benchmarking four state-of-the-art models (Whisper, XLS-R, MMS, and W2v-BERT) across 13 African languages. This study analyzed how performance scales with increased training data, offering key insights into data efficiency and highlighting that scaling benefits are strongly dependent on linguistic complexity and domain alignment.

Finally, a systematic literature review was published, cataloging 74 datasets across 111 African languages to map the current frontier of speech technology. This review emphasized the urgent need for multi-domain conversational corpora and the adoption of linguistically informed metrics, such as Character Error Rate (CER), to better evaluate performance in morphologically rich and tonal language contexts.

Conclusion and future directions

WAXAL represents a key milestone in bridging the digital divide, offering a high-quality, open-access speech resource for 27 Sub-Saharan African languages.
Developed through deep collaboration with African academic and community organizations, this initiative empowers the continent’s AI ecosystem and preserves linguistic diversity. We hope WAXAL will continue to serve as a vital resource for the digital preservation of African languages and a foundation for future innovations. Google remains committed to this effort, with plans to continuously expand the WAXAL dataset." Tavonga Siyavora, Senior Product Manager, and Abdoulaye Diack, Program Manager, Google Research https://research.google/blog/waxal-a-large-scale-open-resource-for-african-language-speech-technology/ #Metaglossia #metaglossia_mundus #métaglossie
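The Character Error Rate (CER) mentioned in the WAXAL excerpt above is simply the character-level Levenshtein edit distance divided by the length of the reference transcript, which is why it suits morphologically rich and tonal languages better than word-level metrics. A minimal sketch in Python (the function names are illustrative, not taken from any WAXAL tooling):

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via classic dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

In practice, established libraries such as jiwer compute the same quantity with additional text normalization; the sketch above only shows the core definition.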
"Burg Giebichenstein
Kunsthochschule Halle
“Language can only deal meaningfully with a special, restricted segment of reality. The rest, and it is presumably the much larger part, is silence.” George Steiner
Are humans the only beings on the planet that use language to communicate? Can we decipher the nonhuman world around us without harnessing it to our own socialization, syntax, and lexicon? Is interspecies communication even possible? Translation has been described as a precondition underlying all (human) cultural transactions upon which communication is based. It is also inherently political and stands at the forefront of many of today’s questions around identity, gender, post-colonial criticism, feminist critique, machine translation, and canon creation; yet its connection to the nonhuman turn, interspecies communication, and eco-criticism has not yet been fully explored.
Whether we are talking about classic linguistic and literary translation or any number of related fields, including language and literature, cultural studies, performance, and the visual and media arts, the core question that translators and theorists of translation have been debating for centuries remains the same: is it possible to translate without interpreting? Is linguistic and cultural equivalence even possible? These questions become all the more urgent in the limit-case of interspecies communication. Can we apply empathic modes of translation to nonhuman articulations, wherein translation involves a form of metamorphosis, not of text, but of the translator? As such, translators are something of a hybrid species with one foot in each culture and language, whose very existence revolves around traveling between worlds. Translators have something of the mythical being about them, akin to a chameleon or centaur. In this course, we will not be engaging in a scientific exploration of interspecies communication, but examining theories of empathic translation, a process that sees translation not merely as the transformation of a text, but of the translator themself.
Emerging and classical theories of translation can offer a paradigm for engaging with plant and animal articulation: not language as such, but different forms of articulation perceived through the senses, one in which our hearing and seeing, “once intertwined and attentive to the calls and cries of animals, all but disappeared with the invention of the alphabet, retreating into a kind of silence.”
In David Abram's words: “By giving primacy to perception we can see the natural world, not as inert and passive, but as dynamic and participatory. The winds, rivers and birds speak in their own way (if we listen), the sounds of nature not only have informed indigenous languages, but language in general--humans are but one being intertwined with other beings and ‘presences.’ This perspective sees the landscape as a sensuous field, and human perception as but one point of view that is in reciprocity, in expressive communication, with other points of view and ways of being.”
How can theories of translation help us make sense of this new view of a world teeming with language and sentience? What theories abound in reference to the multiplicity of “language,” even as Walter Benjamin would argue for a “universal (human) language”? What practical tools does translation studies offer, and what bridges can it forge between the disciplines? The first half of the seminar focuses on key theoretical concepts relevant to the history and practice of translation. In the second half, students will engage in translation experiments that intersect with their own artistic/design practice. A final project should be considered a first draft of something that could later develop into a larger project.
The course will be taught in English and German.
This seminar is ideally suited to students interested in: Literature, Translation Theory / Translation / Cultural Studies / Critical Theory, Creative Writing / Post-humanism, Trans-humanism, Eco-criticism, the More-than-Human Turn.
Teachers
Dr. Zaia Alexander"
https://www.burg-halle.de/en/course/l/talk-with-the-animals-translation-in-a-more-than-human-world
#Metaglossia
#metaglossia_mundus
#métaglossie
"Last year at the Vatican Library, I had the chance to see a portion of the Bible with an incredible history. It wasn’t the famous Codex Vaticanus but a translation of the Gospels into Persian from the 1740s.
While a translation of the Gospels into the language of a Muslim empire is itself noteworthy, the history behind this particular text is even more remarkable. It represents one of two times when the ruler of Iran (or Persia, as it was called by the West before 1935) praised the Bible and furthered its spread in the region.
At a time when Iran is often associated with hostility toward Christianity, these episodes remind us that God can work through unlikely and even evil leaders. I find encouragement—and a prompting to pray—when I reflect on unexpected ways God used infamous Iranian leaders to spread the gospel. Let me introduce you to two of them.
Nader Shah (1688–1747)

Arguably the most ruthless leader in Iran’s history was Nader Shah, who ruled Persia from 1736 to 1747 and led a constant stream of military campaigns. His sack of Delhi in 1739 perhaps best demonstrated his military might and brutality. After he took the city, a revolt arose, which the shah crushed, resulting in the deaths of up to 20,000 civilians.
The shah, characterized as a “notorious despot and mass murderer who wrought destruction on a large scale and ruined his country,” also brought together Jewish, Catholic, and Armenian scholars in Persia to translate the Old and New Testaments. This included the copy of the Gospels that Catholic missionaries sent to the Vatican Library.
After the missionaries completed translating the Gospels, they went to present the translation to Nader Shah. As they waited an hour for an audience with the shah, they saw 18 people led to his chamber who later were carried out as lifeless bodies, having been strangled. With a trepidation reminiscent of Esther approaching the Persian King Ahasuerus, they entered the shah’s court expecting martyrdom. However, the shah received the Persian translation and rewarded them with silver equivalent to a few years’ wages.
Nader Shah’s motivations for developing a Persian translation of the Bible are unclear. He may have sought to understand Judaism and Christianity in his empire more fully. Perhaps he hoped to syncretize the religions. Whatever his motivations, he was the unlikely catalyst for the first effort to translate the whole Bible into Persian.
Fath-Ali Shah Qajar (1772–1834)

If Nader Shah was one of the most ruthless leaders of Iran, Fath-Ali Shah Qajar was perhaps one of the most opulent. He ruled for a relatively stable period of more than three decades, from 1797 to 1834. He’s easily recognizable in portraits with his long beard, thin waist, and bejeweled attire.
In 1812, evangelical missionary Henry Martyn completed a translation of the New Testament into Persian. Martyn, who knew William Wilberforce, Charles Simeon, and William Carey, worked tirelessly in Shiraz, Persia, to translate the New Testament.
When he finished, he attempted to present a beautiful bound copy to Fath-Ali Shah. Martyn reached the shah’s encampment but couldn’t enter his court to present the New Testament. However, one secretary read to the shah three tracts Martyn had written to present the gospel to Muslims. Martyn died four months later, at the young age of 31, while trying to return to England.
While Martyn didn’t live to see it, the British ambassador to Persia presented his Persian New Testament to Fath-Ali Shah in 1814. After reviewing the New Testament, the shah sent a letter commending it. He asserted that Martyn had translated the text “in a style most befitting sacred books, that is, in an easy and simple diction.” He said he’d command his attendants to read him the New Testament from beginning to end and support its distribution around Persia. Those who were “virtuously engaged” in spreading the New Testament and teaching its meaning, the shah said, would be “deservedly honored with . . . royal favor.”
While there are certainly elements of diplomatic flattery in this letter, the shah’s approval had far-reaching consequences. Throughout the 19th century, missionaries like Peter Gordon and William Glen distributed hundreds of copies across Persia with a relative degree of freedom.
God’s Sovereignty and Iranian Leaders

These two stories of Persian leaders supporting the Bible’s translation and distribution are surprising in light of current religious restrictions in Iran. But they’re not so surprising in light of biblical history.
In the Old Testament, the Lord sovereignly uses Persian leaders to protect his people and further his covenant plan for redemption. King Ahasuerus circulates a letter that saves the Jewish people from certain destruction (Est. 8:11–13). Nehemiah receives a letter of support from the Persian King Artaxerxes to help rebuild the walls of Jerusalem (Neh. 2). King Cyrus sends incredible amounts of gold and silver to support the rebuilding of the temple in Jerusalem (Ezra 1:2–4).
God sovereignly works to move kings and rulers—even the most pagan kings and the most ruthless rulers—to do his will. In Ezra 1:1, we see that the Lord “stirred up the spirit of Cyrus king of Persia.” The connection between God’s sovereignty and his directing of a Persian king is crystal clear in Isaiah 44:24–45:25. This passage first emphasizes that it’s the Lord “who made all things, who alone stretched out the heavens” (v. 24). Turning to Cyrus, the Lord states that he “shall fulfill all [God’s] purpose” (v. 28). In the next verse, Cyrus is referred to as God’s anointed and the one “whose right hand [God has] grasped” (45:1).
Let’s pray for the next ruler of Iran. Pray that, as the Lord has done before in history, he’d use the next leader to protect his people and further the spread of the gospel message. Both Christians and Muslims have suffered greatly in Iran in recent decades, yet the gospel is still advancing.
We should pray for an end to suffering in Iran. But we can also trust that amid uncertainty, missiles, and war, our sovereign God guides the hand and thwarts the will of rulers.
" https://www.thegospelcoalition.org/article/iran-leaders-praised-bible/ #Metaglossia #metaglossia_mundus #métaglossie
"Adil Semeykhanuly has reportedly been sentenced to six-and-a-half years for his “negative interpretation” of Kazakh poet Abai Kunanbaev.
by Serikzhan Bilash and Tilek Niyazbek
A respected Kazakh language editor and cultural researcher, Adil Semeykhanuly, has reportedly been sentenced to six and a half years in prison in China’s Xinjiang region after more than a year of detention and house arrest, according to Kazakh language media and colleagues familiar with the case.
Colleagues say Adil Semeykhanuly received a 6½-year prison sentence over allegations of a “negative interpretation” of the teachings of the Kazakh poet Abai Kunanbaev (1845–1904). Semeykhanuly, a long-time editor at the “Шынжаң” (Shynzhan) newspaper and a recognised scholar of Abai, was first detained in January 2024. Sources say he spent seven to eight months in custody before being placed under house arrest due to insufficient evidence, according to relatives.
On 20 August 2025, he was reportedly sentenced on charges that he “negatively propagated the teachings of Abai” and “formed a separate public opinion,” accusations observers describe as politically broad and vague.
Kazakh outlets report that four other Kazakh intellectuals working in the same media environment were also arrested and later sentenced:
Tegis Zäybekuly — Deputy Editor of the Kazakh Editorial Department; arrested in October 2024. His sentence remains unknown.
Murat Ybyraiuly — Translator–Reporter; arrested in August 2023, charges not publicly disclosed; sentenced to 5.5 years.
Oñalğan Múlikuly — Translator–Reporter; arrested in January 2023; sentenced in 2024 to 7 years.
Janibek Jaudatuly — Translator; arrested in January 2023; sentenced in 2024 to 7.5 years.
Colleagues describe the series of arrests as part of an intensifying crackdown on Kazakh-language publishing, translation work, and cultural expression in Xinjiang.
Semeykhanuly’s participation in a 2005 Chinese delegation to Kazakhstan for the 160th anniversary of Abai in Semey was reportedly cited as one of the incidents that Chinese authorities scrutinized. He was widely regarded as a mentor to young journalists and a prolific contributor of cultural essays.
Monument to Abai in Beijing, with Chinese and Kazakh flags.
A survey of Chinese public court databases, government bulletins, and legal notices found no official records confirming the arrests or sentences of Semeykhanuly or the four other intellectuals. The absence of public documentation is common in politically sensitive cases in Xinjiang, where legal processes remain opaque. The news has, however, been confirmed by Kazakh sources.
The case also stands in contrast to cultural diplomacy between the two nations: China maintains an Abai monument in Beijing, and state media frequently describe the poet as a “bridge of friendship.” Kazakhstan hosts at least five Confucius Institutes established in partnership with Chinese universities. These public gestures of mutual cultural respect sit uneasily alongside the sentencing of an Abai scholar over an alleged “negative interpretation” of the poet’s teachings.
Human rights organizations continue to report systemic pressure on Uyghur, Kazakh, and other Turkic intellectuals, including charges related to ideology, cultural activity, or perceived separatism. Families often report limited access to information and fear repercussions for speaking publicly.
As of publication, Chinese authorities have not acknowledged the reported arrests or sentences. Requests for comment were sent to Xinjiang regional authorities and the Chinese Embassy in Kazakhstan."
by Serikzhan Bilash | Mar 9, 2026 | News China
https://bitterwinter.org/kazakh-scholar-sentenced-in-xinjiang-for-misinterpreting-a-poet/
#Metaglossia
#metaglossia_mundus
#métaglossie
"Literature can bring culture and emotions to life and open up new perspectives — and the same is true for learning German. In this episode, we explore how texts, stories and theatre can enrich the experience of learning the language. We also talk about a theatre project our studio guest, Jonas Teupert, lecturer and director of the German program at the University of Melbourne, staged together with students. Also present is the student who played the lead role in the production: Anindo Minifie. Published 9 March 2026 2:29pm By Julia Grewe" https://www.sbs.com.au/language/german/en/podcast-episode/how-literature-and-theatre-bring-language-to-life-episode-6/7k6cxjh80 #Metaglossia #metaglossia_mundus #métaglossie
"Held on the theme: “Terminology development in the Ghanaian language,” the workshop and lecture was attended by students for 21 colleges of education, graduate students from several universities, traditional leaders, entrepreneurs, policymakers among other stakeholders.
Prof Appah noted that Ghana had adequate human, linguistic and institutional resources for the cause but was obstructed by inadequate funding.
He made a case for the introduction of a government-sponsored national terminology programme and a register to streamline the development of terminologies.
Through the programme, government would provide funding for research and other critical activities for the gathering, development, and dissemination of the terminologies, he proposed.
Prof. Appah made a direct call on the Ghana Tertiary Education Commission (GTEC), Ghana National Research Fund, and GetFund to help fund their activities on creating terminologies.
While appealing to government, he entreated the Linguistic Association of Ghana to demonstrate their seriousness by forming a research team to start work.
Prof Appah stressed that local terminologies would help to decolonise education in Ghana, demystifying complex concepts taught in a foreign language and clearing impediments to understanding.
“The people and teachers of the languages we teach who don’t speak English are not participating in knowledge creation and so if you don’t have the capacity to think, practice, read, and access knowledge in your own language, then you lack linguistic sovereignty,” he added.
The UG principal proposed a teacher education and assessment reform that would promptly adopt newly developed terminologies.
Dr Vincent Erskine Aziaku, Head of Department of Ghanaian languages and Linguistics, explaining the purpose of the workshop, maintained that Ghana remained under colonisation as it continued to depend on a foreign language.
The problem, he noted, had been the lack of terminologies, intimating that “terminology development is the only way we can succeed in having our language.”
Dr Samuel Owoahene Acheampong, Faculty of Ghanaian Languages Education, University of Education, Winneba (UEW), underscored the need for standardisation to ensure coherence and consistency in the terminologies.
He appealed to government to put together a standardisation council to verify all terminologies to ensure authors were not producing contradictory contents.
Mr Scoon Boakye Appiah, Founder and CEO of AyaPrep, an education technology company, entreated stakeholders to leverage technology to promote the use of Ghanaian language in teaching and learning.
GNA
Edited by Alice Tettey/Linda Asante Agyei
Provided by SyndiGate Media Inc."
https://www.msn.com/en-xl/africa/ghana/stakeholders-advocate-national-terminology-programme-for-ghanaian-languages/ar-AA1WNFfh
#Metaglossia
#metaglossia_mundus
#métaglossie
The Sarojini Naidu Centre for Women’s Studies (SNCWS), Jamia Millia Islamia, in collaboration with The Book Review Literary Trust successfully organised a one-day national symposium on “Writing, Reviewing, Translating: Women, Words, and Worlds” on February 17 at Mir Anis Hall, JMI.
Chandra Chari, Founder Editor of The Book Review Literary Trust addressed the gathering about the origins and objectives of The Book Review journal and its sustained commitment to fostering critical literary culture in India. She underscored the importance of book reviewing as a vital intellectual practice and emphasised the role of women in shaping contemporary literary discourse.
The first session, titled “Reviewing, Writing, Publishing Women – A Critical Exploration of Gendered Literary Landscapes,” was moderated by Dr. Aakriti Mandhwani. The panel featured Dr. Semeen Ali, Rachna Kalra, Dr. Malvika Maheshwari, Dr. Sucharita Sengupta, and Dr. Kanupriya Dhingra. The speakers reflected on questions of identity and authorship, editorial gatekeeping, the politics of literary knowledge, and the sustainability of women’s writing in South Asia. Discussions highlighted the need to move beyond reductive categorisations of “women’s writing,” to encourage mentorship and alternative platforms, and to view reviewing as both scholarship and resistance.
A session “Writing the City,” moderated by Dr Faiz Ullah, explored literary engagements with urban spaces, particularly Delhi. Speakers Ananya Vajpeyi, Ekta Chauhan, and Aishwarya Jha reflected on the city as a site of memory, transformation, and affect. The discussion examined urban villages, shifting cityscapes, nostalgia, and the interplay between lived experience and literary imagination.
It was followed by a session titled “Writing/Translating Women,” which was moderated by Dr. Amina Hussain, Assistant Professor, SNCWS. The panel included renowned Hindi author Mridula Garg, noted translator Prof. Arjumand Ara, Dr. Deeba Zafir, and Dr. Firdous Azmat Siddiqui. The speakers addressed the epistemic marginalisation of women’s writing, the complexities of translation, intersectional concerns of caste and class, and representations of Muslim women in literature and history. The session emphasised that writing must provoke critical reflection, that translation demands ethical responsibility, and that marginal voices must be represented with nuance and sensitivity.
The symposium reaffirmed Jamia Millia Islamia’s commitment to fostering inclusive and critical academic spaces that foreground women’s voices in literature, scholarship, and translation, and to promoting dialogue that bridges disciplines..."
TNN | Mar 7, 2026, 12:30 IST https://share.google/RqcgYC2ybChNrLPSg
#Metaglossia
#metaglossia_mundus
#métaglossie
"Modern, AI-native platforms designed around Arabic constraints are now seen as essential for governing quality, ensuring consistency, and speeding up the localization of all written assets.
RIYADH: Faced with a globalized workforce and cross-border operations, companies across the Middle East are now embedding live translation into the fabric of daily work, adopting a hybrid human-artificial intelligence strategy to break down language barriers.
For years, multilingual translation in the region was a logistical feature reserved for annual shareholder meetings, flagship conferences, or international trade shows. Today, it has become a daily operational necessity.
Nour Al-Hassan, founder and CEO of Tarjama and Arabic.AI, said in an interview with Arab News that “as companies in the Middle East expand globally, multilingual communication is no longer occasional, it is part of everyday work.”
Tarjama is a MENA-based language technology company that launched Arabic.AI, an advanced, specialized platform for the Arabic language, to deliver high-quality, culturally nuanced, and industry-specific translation and content solutions.
Tarjama and Arabic.AI's founder and CEO Nour Al-Hassan. (Supplied)
Al-Hassan’s sentiments were echoed by Edward Crook, vice president of strategy at AI-powered neural machine translation service, DeepL, who told Arab News: “In the UAE and Saudi Arabia, 84 percent of professionals have integrated AI translation into their daily workflows, signalling a rapid shift from using language AI tools for big events to making them a staple of daily operations.”
Oddmund Braaten, CEO of multilingual event technology company Interprefy, told Arab News that such language support was previously used only for major external sessions, with in-person interpreters brought in for specific language pairs.
“What has changed is that live translation is now part of everyday operations,” he said, adding that the organization runs recurring virtual trainings and frequent internal briefings, and multilingual access is built in by default.
“This has allowed Arabic- and English-speaking teams, along with additional language groups, to participate on equal terms,” Braaten explained.
According to the CEO, internal training uses remote simultaneous interpretation with live captions, especially for technical content. For larger audiences, “AI speech translation is added to extend language coverage.”
This same language experience is maintained for both in-person and remote participants.
As companies across the UAE, Saudi Arabia, Qatar, and Bahrain diversify their economies and attract talent from across the globe, the demand for inclusive communication has moved from the event stage to the weekly team huddle, the training webinar, and the internal strategy update.
This transition from occasional to essential is underscored by new research. A study by Interprefy revealed that 82 percent of Middle Eastern business event organizers now report high demand for multilingual services.
Crucially, 61 percent see clear value in using live translation for webinars, 55 percent for business meetings, and 54 percent for internal “all hands” sessions.
This aligns with broader regional adoption trends that were observed from 2024 onwards. According to a separate survey by DeepL, 84 percent of professionals in Saudi Arabia and the UAE have already integrated AI translation tools into their daily workflows.
The drivers are enhancing productivity, developing new language skills, and, for 46 percent of professionals, successfully expanding business into new markets. Crook added that the primary drivers are “both internal and external: from developing language skills, to boosting time efficiency, and managing supplier relationships.”
To meet this surging, everyday demand, businesses are increasingly adopting a pragmatic, hybrid approach. They are moving beyond a one-size-fits-all model to a two-track system that balances nuance with scale, and cost with critical accuracy.
According to Interprefy, for sensitive negotiations, confidential board discussions, legal proceedings, or complex technical workshops, the expertise of professional human interpreters remains irreplaceable.
This ensures subtlety, cultural nuance, and absolute accuracy where the stakes are highest.
This approach is mirrored by companies such as Tarjama. As Al-Hassan explained: “Tarjama combines professional human translation with its AI-driven CleverSo platform to deliver a hybrid model that mirrors the evolution of translation tools, enhancing productivity without replacing human expertise.”
For the constant stream of daily interactions, such as project check-ins, company-wide broadcasts, training modules, and supplier communications, AI-powered live translation and captions provide scalable, instantaneous, and cost-effective understanding.
This layer ensures that language is never a barrier to participation, collaboration, or swift decision-making in fast-moving environments.
He explained how this hybrid model is practically implemented, noting it usually takes one of three forms.
In some cases, professional interpreters cover the main spoken languages, while AI speech translation is added for languages spoken by a small number of participants. In other situations, professional interpreters are combined with live captions or subtitles. In a third scenario, all three are used together.
This foundational shift extends beyond spoken communication to the very systems that manage a company’s multilingual content. As organizations generate more material for diverse audiences, they require specialized technology that handles the region’s dominant language with native fluency.
For the written word, financial statements and marketing campaigns, Arabic-first Translation Management Systems are becoming critical. As highlighted in a 2025 report by Tarjama on its CleverSo platform, generic systems built for Latin scripts struggle with right-to-left layout, segmentation, and Arabic user interface needs, leading to inaccurate translations that hurt conversion and trust.
Al-Hassan emphasized the need for specialized systems, stating: “High expectations for Arabic quality, multiple dialects, and regulatory requirements mean generic tools are not enough. Businesses now need specialized systems that fit into daily workflows and handle language with consistency, security, and cultural awareness.”
Modern, AI-native platforms designed around Arabic constraints are now seen as essential for governing quality, ensuring consistency, and speeding up the localization of all written assets, from scanned PDFs to mobile app strings.
Regarding quality for high-stakes content, Al-Hassan added: “Our approach is built on a fundamental principle: quality cannot be inspected in; it must be designed in from the very beginning.”
The trend is set to intensify. With 77 percent of Saudi and Emirati professionals believing AI will positively impact daily work efficiency by 2029, the integration of intelligent language tools is becoming a benchmark for competitive, inclusive, and globally agile businesses.
Crook confirmed this outlook, saying that some “77 percent believe AI will be the fundamental driver of workplace efficiency by 2029.”
When justifying the investment in everyday multilingual communication, business leaders point to measurable returns. Braaten shared that leaders justify it by removing friction, reducing risk, and enabling effective contribution at scale. The returns are visible in productivity, with fewer follow-up meetings and faster team alignment, as well as in employee inclusion and retention.
Oddmund Braaten, CEO of multilingual event technology company Interprefy. (Supplied)
He also noted that 85 percent of organizers report attendee frustration when multilingual support is not available.
Clients of companies like Tarjama quantify the return on investment on two levels. Al-Hassan stated that internally, they measure faster turnaround times and lower costs, while externally, they look at quicker market entry and faster campaign launches. For most, the real value combines improved internal efficiency with accelerated growth across markets.
As AI translation becomes ubiquitous, Tarjama sees the next competitive frontier in consultancy-driven localization of complex business, government, and advisory content, addressing challenges around regulatory compliance and scalable market launches.
This shift is operational, as explained by Braaten who gave an example of a GCC-headquartered organization now using live translation more frequently. According to the CEO, the firm — with teams and stakeholders across the Middle East and Europe — is now “delivering ongoing professional training rather than just a limited number of annual events.”
Al-Hassan describes this as a shift from “conference-scale” to “workflow-scale” translation, where “translation is built into business systems” and “content moves through the workflow and becomes multilingual as part of the process.”"
Miguel Hadchity 01 February 2026
https://www.arabnews.com/node/2631332/amp
#Metaglossia
#metaglossia_mundus
#métaglossie
"No, USCIS does not require certified translations to be notarized. What's more important is the certification statement by the translator or translation agency confirming that the translated document is accurate and complete to the best of their knowledge.
Some people still go ahead and notarize their documents. While it's not an official requirement, it simply adds an extra level of authenticity to your document.
In a notarized translation, a notary public verifies the identity of the person signing the translator's certification. The notary does not verify the translation itself; they only confirm that the translator signed the certification in their presence.
Common USCIS Translation Mistakes That Cause Delays
Even small translation errors can affect your USCIS application. If authorities cannot verify the information on your documents, they may issue a Request for Evidence (RFE), which can delay your application.
Here are common mistakes that may affect your application:
1. Partial translations
Translating some parts of your documents can jeopardize your application. USCIS requires a full translation of the entire document, including stamps, annotations, and seals, so that officers can review it in context.
2. Missing certification statement
Every translated document must include a signed certificate confirming the translation is complete and accurate. Without the statement, USCIS may treat the document as invalid.
3. Incorrect name spellings
If your names don't match across your documents, it can raise concerns about your identity or relationship claims, especially if you're applying for a family visa.
4. Formatting inconsistencies
Translated documents that do not follow the structure and formatting of the original document can make it difficult for officers to locate key information. This can also result in delays or RFEs.
5. Using unauthorized translators
If a translation is found inaccurate or the translator's credentials are in question, USCIS may reject the application or request additional documents. To avoid this, it's always best to use professional translation services that specialize in USCIS document translation.
FAQs
Can a family member translate my documents for USCIS?
USCIS does not explicitly ban family members from translating your documents. However, it raises concerns about bias. It's best to use an independent translator or translation agency to avoid any issues.
Do I need to submit both the original document and the translation?
Absolutely. When submitting your documents to USCIS, always submit the translation alongside the original documents so officers can verify the information.
What should the translator's certification statement include?
The certification statement must state that the translation is complete and accurate and that the translator is fluent in both languages. It should also include the translator's name, signature, and date.
Will USCIS reject my application if the translation is incorrect?
There's a high chance that they will. Alternatively, they may issue a Request for Evidence (RFE) requesting additional documents or a corrected translation, which can increase the processing time for your document. It's always important to submit accurate documents from the start.
How long does it take to get a certified translation for USCIS?
The turnaround time depends on the provider you're using. Translayte provides USCIS translations in 12 hours or less, depending on the length of your document and the language pair.
Media Contact:
Sophia Orji
Content Manager
Email: sophia.orji@translayte.com
Website: https://translayte.com
SOURCE: BDXL Ltd"
https://www.bignewsnetwork.com/news/278905951/uscis-translation-requirements-2026
#Metaglossia
#metaglossia_mundus
#métaglossie
"BENNINGTON -- The Bennington Writing Seminars, the MFA in Writing program at Bennington College, announced the launch of a new dual-genre concentration in Literary Translation. Applicants and current students studying Fiction, Nonfiction, or Poetry will be able to add Literary Translation as a secondary concentration, lengthening the program from four to five terms.
“Bennington College has a great history as a center for the translation of literature,” said Bennington Writing Seminars Executive Director Mark Wunderlich, “and we are happy to now offer instruction in literary translation in our graduate writing program. Students will now be able to spend two terms studying with some of the finest translators in the field and leave with a fully translated work.”
Bennington College alum and Bennington Writing Seminars faculty member and National Book Award-winning translator Bruna Dantas Lobato designed the program to enable students to engage with a global literary community.
“We translate literature to engage with the world and its many languages, to be in conversation with and open to modes of thinking and being besides our own,” she said. “Literary translation is the rewriting of a literary text in a new language and all the transformations that act entails, as the text travels to a new cultural, linguistic, and aesthetic context. Translation broadens and deepens our understanding of humanity and language, shows us there are more possibilities beyond our reach, and pushes us to challenge our own perspective. It is thanks to translation and translators that readers aren’t cut off from the rest of the world, living in intellectual isolation.”
The dual-genre option will extend the program from four to five terms: three in the student's main genre and two in Literary Translation. Dual-genre applicants may apply to study Fiction, Nonfiction, or Poetry as their main genre, and Literary Translation, Fiction, Nonfiction, or Poetry as their secondary genre.
Applications are now open, and the coursework begins next January.
Lobato is a writer and translator. Her fiction has appeared in The New Yorker, Guernica, A Public Space, The Dial, and The Common. She was awarded the 2023 National Book Award in Translated Literature for The Words that Remain by Stênio Gardel. Originally from Natal, Brazil, she lives in Iowa and teaches at Grinnell College. Her debut novel, "Blue Light Hours," is out now from Grove Atlantic."
https://www.benningtonbanner.com/local-news/bennington-college-launches-literary-translation-program/article_b3a9cf80-c81a-474e-9327-1951d9b23f16.html
#Metaglossia
#metaglossia_mundus
#métaglossie
"The High-Performance Language Technologies (HPLT) project is developing very large-scale multilingual resources for large language models and machine translation.
Massive text collections for pre-training are the ‘crude oil’ of the large language model (LLM) era. The process of ‘refining’ high-quality datasets from web data at scale presupposes computational infrastructure and technological muscle that is often characteristic of corporate environments, as evidenced, for example, by some notable generally available pre-training datasets: C4,¹ FineWeb 1 & 2,²,³ MADLAD-400,⁴ or Nemotron-CC.⁵ With a few notable exceptions, this line of work tends to capitalise on the English language.
Here, we present the open-source results⁶,⁹,¹⁰ of the European R&D consortium HPLT – a project that has been funded under the auspices of the Horizon Europe programme in 2022–2025. Together with a myriad of additional results, HPLT has produced massive pre-training datasets of high-quality texts in close to 200 distinct language–script combinations. Its 2025 monolingual data release, HPLT 3.0, comprises some 30 trillion sub-word tokens in total, of which close to half represent languages other than English. We make this resource publicly available under the most permissive terms of use possible. We further share a state-of-the-art and open-source data preparation pipeline, an innovative multilingual evaluation framework, as well as hundreds of language models pre-trained on HPLT data.
Fig. 1
Furthermore, the project has produced novel bilingual datasets for more than 50 language pairs, hundreds of associated machine translation models, open-source pipelines for data preparation, model training, and evaluation, as well as synthesised additional pre-training data for underrepresented languages by machine translation of very high-quality English documents. In our view, it is the totality of generally available and very large-scale resources and the documentation of the underlying processes that bears promise of ‘democratising’ the current LLM and MT landscape.
Organisation
The HPLT consortium comprised partners from five different universities (Charles University in Prague and the Universities of Edinburgh, Helsinki, Oslo, and Turku), two national HPC centres (CESNET in the Czech Republic and Sigma2 in Norway), and a language engineering company (Prompsit) from all around Europe. The project has received about €4.1m from the Horizon Europe programme and £960,000 from UK Research and Innovation, and ran from September 2022 through December 2025. The project was coordinated by Jan Hajič (Charles University), with technical coordination by Kenneth Heafield (Edinburgh) and Stephan Oepen (Oslo) in its first and second halves, respectively.
Data curation
HPLT has gathered and processed more than ten petabytes of raw web data. The project has released more than 30 trillion tokens (word-like units) of high-quality textual data, accompanied by rich metadata, for close to 200 distinct languages. The process of extracting, cleaning, annotating, and filtering texts from raw web archives, schematically depicted in Fig. 1, comprises about a dozen modules.
Raw web archives were drawn from three sources: the Internet Archive (IA), host of the iconic Wayback Machine; the non-profit Common Crawl Foundation (CC); and the ArchiveBot volunteer infrastructure for long-term web archiving. Sub-tasks such as the extraction of ‘running text’ from marked-up document formats, language identification at the document and paragraph levels, ‘fuzzy’ near-deduplication, annotation with a wealth of text-quality and regulatory-compliance signals, and final filtering based on all available information each directly impact the practical utility of the final datasets. Here, text quality and overall volume are separate and typically antithetical dimensions for optimisation, creating a rich space for different design choices and trade-offs. This remains an active area of research. The open-source HPLT processing pipelines are highly flexible and parameterisable, with default values representing the current state of knowledge.
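The article names ‘fuzzy’ near-deduplication as one pipeline stage without detailing it. As a minimal illustration of how such a step can work in general — this is a textbook MinHash sketch, not HPLT's actual implementation, and the example documents are invented — near-duplicate web pages can be detected by comparing compact signatures instead of full texts:

```python
import hashlib

def shingles(text, n=5):
    """Character n-gram shingles of a whitespace-normalized document."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def minhash_signature(text, num_hashes=64):
    """MinHash signature: for each seeded hash, keep the minimum shingle hash."""
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "big")
            for s in shingles(text)))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Invented example documents: two near-duplicates and one unrelated text.
a = "The quick brown fox jumps over the lazy dog near the river bank."
b = "The quick brown fox jumps over the lazy dog near the river bend."
c = "Completely unrelated text about web crawling and token statistics."

sim_ab = estimated_jaccard(minhash_signature(a), minhash_signature(b))
sim_ac = estimated_jaccard(minhash_signature(a), minhash_signature(c))
print(sim_ab > sim_ac)  # near-duplicates score far higher than unrelated text
```

In a production pipeline the signatures would typically be bucketed with locality-sensitive hashing so that only candidate pairs, not all pairs, are compared.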
Monolingual statistics
To put the HPLT monolingual data into perspective, Table 1 (below) presents document and token counts (see note) for the English and multilingual (non-English) partitions of the data, as well as counts for a small sample of individual languages. For ease of comparison, these statistics are accompanied with average document lengths and per-language proportions, and contrasted with corresponding figures for three other publicly available multilingual datasets mentioned above.
Table 1 note: For the purpose of comparable statistics across languages and different datasets, all token counts are computed using the Gemma-3 tokenizer,⁸ a SentencePiece model with a vocabulary of 256K sub-words, providing good coverage for all target languages.
As is evident from these numbers, HPLT 3.0 is by far the largest publicly available such dataset, and its multilingual breadth compares favourably to other widely used resources. In Gemma-3 tokens, the multilingual HPLT 3.0 partition is about 2–3 times larger than FineWeb and the earlier version HPLT 2.0, respectively, and five times larger than the older MADLAD-400 dataset. In terms of average document length, which often correlates with text quality, HPLT 3.0 and 2.0 pattern alike, markedly ahead of FineWeb but well behind MADLAD-400. For a small selection of European languages, the table shows languages ranging from a ‘mere’ billion available tokens to hundreds of billions.
In-depth analytics
Training data quality arguably is the most important factor in model quality, but in-depth data inspection at scale is a challenging endeavour. HPLT has developed an open-source tool, HPLT Analytics, to compute a broad range of fine-grained statistics and enable interactive visualisation and exploration. The datasets are internally structured in documents, paragraph-like segments, and tokens. Descriptive frequency and length statistics, combined with basic correlation analysis with metadata like internet domains or predicted text register labels, can reveal distributional trends or outliers. Annotations are predominantly available at the document level, but in some cases also for smaller units. Contrasting the distributions of document versus segment language predictions, for example, allows insights into both degrees of in-document ‘code switching’ and uncertainty in language identification, typically among closely related languages.
Multilingual evaluation
As an additional tool to gauge data quality and experimentally inform design choices in training data preparation (as well as in language model training), the project has developed a framework for automated large-scale multilingual evaluation, dubbed HPLT-e. In its current state of development, the framework comprises 127 language understanding and generation tasks across the nine European languages highlighted in Table 1.
This selection allowed both the availability of native speakers in the project team and a minimum level of diversity in terms of language resources, families, and scripts. Tasks in HPLT-e are often drawn from pre-existing benchmark suites, but emphasise natively constructed (rather than translated) tasks, and each is extended with three to seven human-written prompts to mitigate the methodological challenge of prompt sensitivity.
Similar to Penedo et al.,²,³ we pretrain separate ‘smallish’ (2B-parameter) GPT-like models per language using an otherwise fixed pretraining setup, and evaluate them at regular checkpoint intervals in a zero-shot regime, carefully selecting tasks that meet a range of evaluation-signal criteria, i.e. can be expected to act as informative and reliable indicators of training data quality. Such criteria include monotonicity and relative stability of model performance as pretraining progresses, ranking consistency across pretraining intervals, and multiple indicators of limited prompt sensitivity.
Fig. 2 shows a comparison of the four datasets introduced above using HPLT-e. To aggregate scores across different prompts, tasks, and languages, per-task scores are maximised across prompts and min-max normalised relative to a task-specific random baseline. Per-task scores are then averaged across task categories within each language and, finally, across languages. An alternative approach to overall aggregation is Borda’s count, using Vote’n’Rank,⁷ which is essentially the average of per-language counts of a model outranking all the others. Models trained on all four datasets for up to 100B tokens show a monotonic performance improvement on our selected tasks. Models pretrained on (the comparatively smaller) MADLAD-400 achieve the highest multilingual score, followed by HPLT 3.0, while HPLT 2.0 and FineWeb perform on par.
These results are corroborated by rank-based aggregation across tasks and languages, which yields the same ordering: MADLAD-400 first, then HPLT 3.0, with HPLT 2.0 and FineWeb on par.
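The aggregation scheme described above (maximise over prompts, normalise against a task-specific random baseline, then average within and across languages) can be sketched in a few lines. This is a simplified illustration with invented scores, not HPLT-e code, and it averages over tasks directly rather than over task categories:

```python
def normalize(score, random_baseline, maximum=1.0):
    """Min-max normalize a task score relative to its random baseline."""
    return (score - random_baseline) / (maximum - random_baseline)

def aggregate(results):
    """results: {language: {task: (random_baseline, [per-prompt scores])}}.
    Take the max over prompts, normalize each task score, average
    within each language, then average across languages."""
    per_language = []
    for tasks in results.values():
        per_task = [normalize(max(prompt_scores), baseline)
                    for baseline, prompt_scores in tasks.values()]
        per_language.append(sum(per_task) / len(per_task))
    return sum(per_language) / len(per_language)

# Hypothetical two-language, two-task example: (baseline, prompt scores).
results = {
    "fi": {"qa": (0.25, [0.40, 0.55]), "nli": (0.33, [0.50])},
    "no": {"qa": (0.25, [0.45, 0.35]), "nli": (0.33, [0.60])},
}
print(round(aggregate(results), 3))
```

Normalising against the random baseline matters because a 4-way multiple-choice task and a binary task have different chance levels, so raw accuracies are not comparable across tasks.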
Language models
While training data creation has taken centre stage in the HPLT work plan, the project has also developed a wealth of language models of different sizes and architectures supporting various languages and language groups.
In addition to large language models trained from scratch for Finnish and Norwegian, a common theme in this work was strong emphasis on smaller, specialised models that are efficient to run. In total, publicly available project results comprise hundreds of language models, including the following sub-groups:
55 monolingual encoder-only (BERT-like) models for a typologically diverse set of languages. When fine-tuned as embedders for ‘classic’ language understanding tasks, these models uniformly show superior performance to standard multilingual models.
57 monolingual encoder–decoder (T5-like) models, again for a typologically broad set of languages. These models exhibit competitive performance in both embedding and generation benchmarks, thus, offering a novel platform for experimentation.
38 monolingual decoder-only (GPT-like) reference models, each with 2.15B parameters and trained to 100B tokens. These models can serve a number of purposes, including as baselines for mono- and multilingual training, references for the comparison of HPLT and other data, and tools for contrasting HPLT data quality across different languages.
Two larger (13B-parameter), continually pretrained generative models for Finnish and Norwegian, built on the fully open-source OLMo 2 platform. These models compare favourably to language-specific adaptations of the Mistral NeMo model, suggesting that fully transparent foundation models can yield results competitive with their merely open-weight counterparts.
Mining for bilingual text
A further wealth of open-source results from HPLT relates to machine translation (MT), notably large collections of parallel texts derived by mining the monolingual datasets for translational correspondences at the sentence or document levels. These resources are created using the additional processing block labelled Bitextor Pipeline in Fig. 1. The pipeline applies a multi-stage text extraction procedure that identifies documents with identical content in different languages using various matching and alignment techniques implemented as an open-source toolbox.¹ Heavy parallel computing makes it possible to run such bitext mining at the scale of the monolingual web crawls coming from HPLT. Traditionally, parallel texts are provided as sentence-aligned bitexts that can be fed directly into machine translation training. HPLT provides three releases of parallel text corpora covering 57 language pairs. The data is collected in an English-centric manner, aligning documents with their English counterparts in our dataset. Pivoting on those English documents, we can then also derive multilingual parallel text collections spanning 1,446 language pairs. In total, HPLT provides 2.7 million sentence alignments, released from our repository of parallel corpora, OPUS.²
Fig. 2
Machine Translation
Mirroring the interplay of data creation and model building in the LLM track, HPLT has worked intensely on the development and evaluation of new translation models for 100 language pairs, combined with novel infrastructures for automated training at scale and integration of benchmarking results into the OPUS dashboard. A special focus is efficiency, emphasising the need for compact translation models that can run locally on edge devices. Specialised models that are several orders of magnitude smaller than common general-purpose language models enable fast inference without losing translation performance, and enable secure deployments that are independent of external services and online connections. Translation models trained with HPLT data show competitive performance, especially for lesser-resourced languages. To further reduce computational costs, we also developed a pipeline for systematic multilingual knowledge distillation that supports the transfer from expensive teacher models to compact student models as small as 20 megabytes.
Computational infrastructure
All work in HPLT has been exceedingly compute- and storage-intensive, made possible through a combination of resources covered by the project grant and of additional substantial resources allocated to consortium members from national (Czech, Finnish, and Norwegian) quotas and through the EuroHPC system. ‘Bulk’ storage for very large-scale web data, in total close to 21 petabytes, was distributed over facilities in the Czech Republic (CESNET), Norway (Sigma2), and Finland (LUMI). Exclusive access to dedicated compute nodes tightly integrated with the storage systems made possible a first stage of lightweight document and metadata extraction (see Fig. 1), reducing the data volume for further processing by about a factor of three.
In addition to some experimentation on national superclusters, the EuroHPC LUMI system served as the main ‘workhorse’ for HPLT, where the consortium used combined allocations of around 60 million CPU and about 11.5 million GPU hours over the 40-month project duration, which is the theoretical equivalent – on average – of more than 2,000 active CPUs at all times."
6th March 2026
https://www.innovationnewsnetwork.com/hplt-high-performance-large-language-models-for-europe/67406/
#Metaglossia
#metaglossia_mundus
#métaglossie
"What is language validation?
Language validation, also known as linguistic validation, is a crucial part of the translation process in which a person fluent in the target language confirms that translated content is technically accurate and captures the cultural nuances of your original training content. Without this step, you may risk employees misunderstanding the translated version in their native language.
Without proper language validation, your training program could include inaccuracies that confuse learners and erode their trust. For example, a football analogy in translated content can confuse learners, since American football is a completely different sport from football…well, everywhere else.
Poor translations can also cause bigger problems. For example, imagine your sales training course uses the common American idiom, “You ROCK!” Americans interpret that as a supportive encouragement, but a direct translation will sound more like calling the sales leader an inanimate, hard collection of minerals one might throw at an enemy. Sure, the translation is technically correct, but it doesn’t make sense. Validators who speak the target native language catch these simple errors and correct them before the final version creates a lot of confusion.
How to choose the right validation method for your document
Different assets require different levels of validation. You may worry that a professional translator is your only option, but the truth is, it depends.
To help you pick the best validation method for your assets, we put together a handy framework that covers validation approaches for low-, mid-, and high-risk documents.
Low-risk documents
Low-risk documents are informational or supplemental materials where minor translation errors won’t impact learner performance, create compliance issues, or lead to legal problems.
Examples include:
Internal training announcements
Course welcome pages
Module introductions

Translation validation tips for low-risk documents
Here are some easy translation validation tips and options to help you translate low-risk documents.
1. Use free online translation tools for forward translation
Use Google Translate, DeepL Translator, or Reverso for a quick, free validation check. Translate your low-risk training content into the target language and scan the output to surface obvious issues like missing information and incorrect terminology. Google Translate is especially useful to validate simple text, like headings, short instructions, and summaries, where the impact of errors is low. Still, the output might contain grammatical errors and awkward literal translations.
2. Use a second tool for backward translation
Reverse translation, also known as back translation, is a quality assurance method where a translated text is retranslated into the original language to ensure the back translation holds up to the original.
Translate a section of your training into the target language. Articulate customers can use Localization for this first pass and expect a highly accurate first draft. Then, use a second online translation tool to translate it back into the original language. Compare the two versions to spot meaning shifts, missing details, or overly literal phrasing.
This approach works best for summaries, introductions, and labels. But mid- and high-risk documents require a more robust approach.
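The comparison step above — checking whether a back translation still matches the original — can even be roughed out programmatically. This is an illustrative sketch only: the strings are invented stand-ins for output you would get from two independent translation tools, and a simple sequence-similarity ratio is used as a crude flag, not a substitute for a human validator:

```python
import difflib

def back_translation_ratio(original, back_translated):
    """Rough textual similarity between an original and its back
    translation; noticeably low values flag possible meaning shifts."""
    return difflib.SequenceMatcher(
        None, original.lower(), back_translated.lower()).ratio()

# Invented stand-ins for the round-trip output of two translation tools.
original = "Submit the completed form before the end of the month."
faithful = "Submit the completed form before the month ends."
shifted  = "Send the form at the start of next month."

ok  = back_translation_ratio(original, faithful)
bad = back_translation_ratio(original, shifted)
print(ok > bad)  # the faithful back translation scores higher
```

In practice a reviewer would still read any flagged pair, since surface similarity cannot tell a harmless rewording from a subtle meaning change.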
Mid-risk documents
Mid-risk documents are assets that guide actions or influence behavior. This means translation errors could impact learner performance, but are not as likely to cause legal or safety issues.
Examples include:
- Procedural guidelines
- Internal playbooks or messaging frameworks
- Quizzes or assessments that reinforce procedural knowledge

Even if you don’t have an expert, you’ll benefit from submitting these to a competent reviewer to ensure clarity, tone, and cultural appropriateness.
Translation validation tips for mid-risk documents
Reverse translation tricks aren’t enough for mid-risk documents like procedural guidelines, internal playbooks, and messaging frameworks, where the impact of error is higher. So, you’ll need to bring in a competent, though not necessarily expert, validator.
Here are a few validator options when skipping the review process won’t suffice.
1. Consult fluent speakers in the target languages
A colleague fluent in the source and target languages can help validate mid-risk documents because they understand the organization’s intent and the target language’s cultural nuances.
For example, an employee based in Canada who speaks fluent Brazilian Portuguese can review step-by-step guidelines for accuracy, tone, inconsistent terminology, and awkward literal translations.
However, being based outside Brazil, they may miss locally preferred expressions or subtler cultural nuances.
2. Have a local employee review content
An employee who speaks the target language and lives in the target country (e.g., a Brazilian Portuguese native speaker who lives in Brazil) can flag language that sounds awkward, overly formal, or unnatural to local employees. On the other hand, because their expertise centers on the target country’s linguistic and cultural norms, they may miss subtle deviations from the original source language.
3. Ask a trusted community source
A friend or family member who speaks the target language can provide feedback on mid-risk documents, specifically on confusing or complex language, awkward phrasing, and whether instructions are clear from an outsider’s perspective.
However, they lack the organization-specific knowledge needed to ensure content aligns with the business’s needs and training intent. They may also fail to pick up on nuances of the target language and cultural references.
High-risk documents
High-risk documents are materials where translation errors could create serious problems, including legal, safety, compliance, or financial risks.
Examples include:
- Compliance courses
- Workplace safety or regulatory training modules
- Legal documents

These documents require professional translators or validators to ensure accuracy, maintain regulatory compliance, and protect learners and businesses from risk.
High-risk documents: When to use a professional validator
Consider using a professional translator or validator when handling high-risk content like compliance course content, legal documents, and workplace safety modules.
Technically inaccurate or culturally misaligned translations of these content types can lead to legal, safety, and compliance issues that put your organization and employees at risk.
For example, say the original version of the organization’s mandatory data privacy and security training course forbids employees from sharing customers’ personal information outside of the company without written consent. But the Japanese translation erroneously allows this.
Employees sharing sensitive data could pose security risks to customers, and the company could also face severe backlash and costly regulatory fines.
A professional translator helps prevent these problems by ensuring mandatory instructions, policies, and safety procedures are translated correctly and that content aligns with local regulations like the GDPR for EU and UK-based employees.
Pros and cons of linguistic validation methods
Now that you have a list of validation options, let’s compare them by considering the pros and cons of each.

Free online translation tools
Pros:
- Best for low-risk content (e.g., course introductions, announcements, optional resources)
- Free and fast
Cons:
- Lacks context and cultural nuance
- Grammatical or phrasing issues

Reverse translation
Pros:
- Best for low-risk content
- Helps surface meaning drift and missing information
Cons:
- Time-consuming
- Requires using multiple translation tools
- Lacks context and cultural nuance

Native-speaking colleague
Pros:
- Best for mid-risk content (e.g., procedural guidelines, internal playbooks, messaging frameworks)
- Fluent in source language and target language
- Deep understanding of organization tone and terminology
- Can review accuracy, tone, terminology, and awkward phrasing
Cons:
- May miss local phrasing or cultural nuance

Local employee
Pros:
- Best for mid-risk content
- Deep understanding of local language and workplace norms
- Can validate tone, clarity, and natural/local usage
Cons:
- May miss subtle misalignment with the original source language

Trusted community source
Pros:
- Best for mid-risk content
- Can validate confusing or complex language and awkward phrasing
Cons:
- Lacks organizational knowledge and training context
- May miss professional or industry-specific nuance
- May miss target language nuance

Professional translator or validator
Pros:
- Best for high-risk content (e.g., compliance courses, legal documents, workplace safety modules)
- High accuracy and cultural alignment
- Ensures consistency across programs
Cons:
- Higher cost
- Longer turnaround time

When choosing a translation validation method, think about your content’s risk level first. Then factor in which free or low-cost resources you have available that will deliver the quality you need to avoid major translation errors.
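For readers who like a programmatic summary, the risk-tier framework above can be condensed into a small lookup table. This is only a sketch restating the article's recommendations; the function name and data structure are my own.

```python
# The article's risk-based framework as a simple lookup table.
VALIDATION_OPTIONS = {
    "low": ["free online translation tools", "reverse (back) translation"],
    "mid": ["fluent-speaking colleague", "local employee",
            "trusted community source"],
    "high": ["professional translator or validator"],
}

def recommend_validators(risk_level: str) -> list[str]:
    """Return the validation options suited to a document's risk level."""
    try:
        return VALIDATION_OPTIONS[risk_level.lower()]
    except KeyError:
        raise ValueError(
            f"risk_level must be one of {sorted(VALIDATION_OPTIONS)}"
        ) from None

print(recommend_validators("high"))  # → ['professional translator or validator']
```

In practice the lookup would sit alongside whatever metadata your LMS or authoring tool stores about each asset, so the risk classification travels with the document.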
What to look for in a quality translation
A quality translation isn’t just technically accurate. It also captures the original meaning of the source content, aligns with organizational goals, and fits the workplace training context.
Here are five things to look for when verifying translation quality:
- Preservation of original meaning: Make sure the translated course or content captures the intended meaning of the source material, including tone, context, style, and cultural references.
- Adherence to rules for grammar and mechanics: Check that the grammar, spelling, and sentence structure are correct and consistent in the target language.
- Alignment with company brand and training goals: Translation quality also depends on how well the content aligns with your organization’s brand voice, audience, and training goals.
- Workplace training and e-learning relevance: The translated content should use language, examples, and references that align with e-learning and workplace training.
- Consistent terminology usage: Quality translation ensures specific words and phrases are translated the same way in every language to prevent confusion. This is easier when you have a custom translation glossary, as is the case with Articulate Localization, a localization solution embedded in Articulate’s course authoring platform.

With a top-notch translation, you save hours on rewrites, boost learner confidence, and ensure a better learning experience for your global workforce.
Validate training with the right resources
Just because you don’t have access to a professional validator or linguist doesn’t mean your goal of translating course content is unattainable. Quite the opposite, actually.
With the right resources, you can ensure your training course or program delivers essential information to learners in a way that makes sense to them linguistically and culturally.
Knowing which resources to use when makes the process even easier. While generic tools work for low-risk content like module introductions, mid- and high-risk documents require more robust options, especially when compliance and safety are concerned.
So weigh your options carefully. And should you reach the point of needing a professional validator, check out our blog post on how to find the right one."
https://www.articulate.com/blog/translation-validation-tips-when-you-dont-have-a-professional-validator/ #Metaglossia #metaglossia_mundus #métaglossie
"Tilde, the language technology company from Latvia, has adapted its large language model TildeOpen LLM for translation and integrated it into a machine translation platform that provides reliable high-quality translations into 34 European languages.
Until now, the model was mainly a significant scientific achievement in the development of artificial intelligence for European languages, but it had not yet been adapted for everyday use by a wider audience. Now, it is available to the public for both private translation needs and daily work.
Starting today, anyone can use the translation platform, which provides exceptionally high-quality and secure translation into 34 European languages, including Latvian, Lithuanian, and Estonian, and provides for accurate use of terminology and more natural, fluent sentences, reducing the post-editing workload of the machine-translated texts.
TildeOpen provides quality that is competitive with much larger global models, such as ChatGPT-4.1, even though it is about 60 times smaller. Detailed results of the comparative tests are available in TildeBench, a ranking of large language models.
Organisations can deploy TildeOpen on premises or in Europe-based clouds, thus maintaining full control of their data. Unlike many global AI solutions, the data is never transferred outside Europe. This is especially important for public bodies and enterprises that handle sensitive information. At the same time, the model can be customised to suit individual needs, thus providing particularly accurate and reliable translations.
TildeOpen was published as an open-source foundational model for European languages on the Hugging Face platform in autumn 2025. It was developed in Tilde’s research laboratory on behalf of the European Commission. The model has 30 billion parameters and is trained on hundreds of billions of words in European languages, including 29 billion Latvian text units. This is the largest known amount of data used in the development of Latvian artificial intelligence. The model was developed after winning the Large AI Grand Challenge contest organised by the European Commission, using the LUMI supercomputer in Finland."
6 March, 2026
Avots: Press release
https://labsoflatvia.com/en/news/tildes-artificial-intelligence-marks-a-new-era-for-translation-in-european-languages
#Metaglossia
#metaglossia_mundus
#métaglossie
"The 2026 Finnish State Award for Foreign Translators has been presented to the distinguished Danish translator Siri Nordborg Møller (b. 1981), whose work has greatly broadened the international visibility of Finnish literature. The EUR 15,000 award has been granted annually since 1975 by the Ministry of Education and Culture based on a proposal from FILI – Finnish Literature Exchange.
Over the course of her remarkable career, Siri Nordborg Møller has translated more than 130 Finnish books. She began her work as a translator in 2006 while studying for a Master’s degree in Finnish at the University of Copenhagen. Since then, she has translated a wide range of acclaimed fiction, including seven books by Leena Krohn; works by Matias Riikonen, Johanna Sinisalo and Pajtim Statovci; crime novels by Satu Rämö, Max Seeck and Arttu Tuominen; poetry by Sirkka Turkka; graphic novels by JP Ahonen and Tommi Musturi; nonfiction by Mia Kankimäki; and many internationally appealing children’s and young adult books, such as Aino Havukainen and Sami Toivonen’s Tatu and Patu series and works by Siiri Enoranta, Vilja‑Tuulia Huotarinen, Siri Kolu and Timo Parvela. She is currently working on Arttu Tuominen’s Delta crime series and Naraka, a fantasy novel by Elina Pitkäkangas.
According to Nordborg Møller, versatility has always been central to her work. She aims to accept every translation assignment offered to her so she can present Danish readers with as broad a range of Finnish literature as possible.
“Siri Nordborg Møller has an exceptionally wide-ranging body of work, and she has been highly productive as a translator. Through her active contribution she has introduced the richness and diversity of Finnish literature to readers in Denmark and increased the international presence of Finnish writing. Translators play a vital role in bringing Finnish literature to new audiences abroad,” says Minister of Science and Culture Mari‑Leena Talvitie.
Finnish literature is currently being published in Danish in impressive numbers. “According to FILI’s statistics, Danish had the third‑highest number of Finnish titles published last year, after German and Estonian. Altogether, 28 Finnish books appeared in Danish, seven of them translated by Siri Nordborg Møller,” says FILI’s Director Tiia Strandén.
Siri Nordborg Møller notes that every book brings its own inspiration and challenge. From time to time, she encounters novels whose language resonates so strongly that the translation process becomes pure joy.
“Such works include Anu Kaaja’s Katie‑Kate, Katri Lipson’s The Ice Cream Man, Matias Riikonen’s Matara, Johanna Sinisalo’s Not Before Sundown, Siiri Enoranta’s Summer Storm, Vilja‑Tuulia Huotarinen’s Light Light Light and all of Leena Krohn’s works,” Nordborg Møller says.
She has also translated a substantial body of children’s and young adult literature. Picture books with minimal text may appear simple to translate, but they often contain names and wordplay that demand great ingenuity.
“The Tatu and Patu books are in a class of their own. Their humour and chaos have to be conveyed to Danish readers with energy and wit. The most challenging of all was Tatu and Patu: Monster‑Monster and Other Strange Stories, written entirely in rhyme. Translating it was enormous fun, but at times it felt almost impossible.”
Inquiries and requests for interview: Hannele Jyrkkä, Communications Manager, FILI – Finnish Literature Exchange hannele.jyrkka@finlit.fi, tel. +358 50 322 2387"
Ministry of Education and Culture
Publication date4.3.2026 10.06 Type:Press release
https://valtioneuvosto.fi/en/-/1410845/siri-nordborg-m-ller-wins-finnish-state-award-for-foreign-translators
#Metaglossia
#metaglossia_mundus
#métaglossie
"BRUSSELS — Dozens of wannabe EU translators who were forced last year to resit a grueling entry exam because of a technical blunder have now been incorrectly disqualified, they said.
Some of the nearly 10,000 would-be Eurocrats who did the online test last year and who had to repeat the exercise a few months later because of a “set-up defect” were told they were being disregarded because they hadn’t completed all the exams. They say this was an error and that they’ve done everything that was requested.
“I did sit all of them! So I do not understand! How can they be so careless? What do we do?” wrote one applicant on a Facebook group for candidates. Messages in this group and a separate private Whatsapp chat suggest dozens of people are affected. POLITICO has chosen not to name the people who wrote messages because the Facebook group is private.
The tests are run by the European Personnel Selection Office (EPSO), an interinstitutional body that organizes recruitment for institutions including the European Commission, the European Parliament and the Council of the EU. The exams are a gateway to a career in the EU civil service.
“I regret to inform you that your participation [in the process] has come to an end, since you failed to sit at least one of the tests scheduled for the competition,” according to letters sent to two candidates POLITICO spoke to, and screenshotted by several others on the Facebook group for linguist candidates.
There are scores of messages from candidates online who received that message and say they did take part in all of the required exams. Some of those candidates say they contacted TestWe, the platform that runs the online tests, which confirmed to them they had completed all of their tests.
“This is just SOOOO ridiculous,” wrote another person on Facebook, who said she had also been falsely identified as not completing all of the tests.
Two candidates who were affected told POLITICO they are aware of dozens of people who received the email.
“I was already very annoyed when I had to resit the test,” said one candidate who sat the Spanish-language competition last year and asked to remain anonymous. “Now we see all these errors, all these inconsistencies. I have proof of all the exams I sat. I just don’t think it’s fair.”
“We had to wait 1 year for this crap,” one frustrated person with an anonymous username wrote on the Facebook group.
Another candidate who took part in the Greek language competition, and who asked not to be named because they are considering taking legal action, said: “I took it for granted that this was just a mix up with the emails they sent. But it’s been more than a week now and we don’t have any news.”
In a statement to POLITICO, a Commission spokesperson said that EPSO took all complaints seriously.
It is “currently carrying out thorough checks on each individual complaint, in close cooperation with its service provider, TestWe,” the spokesperson said. “At this stage, it would be premature to indicate how many candidates may be concerned. All cases are being examined individually, and candidates who have not yet submitted a request can still do so. EPSO is committed to reducing delays as much as possible while ensuring that each request is handled carefully.”
‘Now or never’
The translator tests include exams on language knowledge and verbal and numerical reasoning. Successfully passing those tests and getting onto the EPSO reserve list allows people to apply for specific open positions within the institutions.
The competitions to get on the reserve list only take place once every several years.
“You feel that if you lose this chance, most probably, with all the transformations in the industry like AI, it’s now or never for many of the candidates,” said the Greek-language candidate.
To complicate things further, the reserve lists featuring the successful candidates for some languages — Dutch, Maltese and Danish — of the most recent competitions have already been published, leading candidates to worry that those people have an advantage for jobs.
“The ones who did not have this issue will actually engage in the recruitment process and might have more chances, and that could create an issue as well,” the Greek candidate added.
“How is it so difficult to arrange a test?” wrote another anonymous user on the Facebook group.
This article has been updated"
March 4, 2026 4:02 am CET
By Mari Eccles
https://www.politico.eu/article/eu-translators-botched-their-entry-exam-again/
#Metaglossia
#metaglossia_mundus
#métaglossie
"Philly has a cozy community of translators who translate from many different languages, opening up texts from other countries and cultures for English speakers.
Philly literary translators are trying to increase awareness in the field. (Courtesy Paul Dry Books)
When Russia invaded Ukraine in 2022, Marianna Suleymanova’s skills as a translator were put to use translating an anti-war journal called “Roar Review.”
“I thought, ‘Well, I think I can help people here understand a little bit of what’s going on,’” she said.
Now, she’s one part of a group of Philly-based translators who turn literature from other languages and cultures into English.
Suleymanova’s skills stem from her time growing up in Tashkent, Uzbekistan when it was still a part of the Soviet Union, where she spoke Russian and English. She moved to the United States at 16 and eventually went to work.
“I translated at NASA, and I translated for other industries,” she said.
Suleymanova practices literary translation in addition to her full-time career. Though it isn’t her day job, she says it is her true passion – allowing her to bring the voices of Russian-language writers to English speakers.
“I think it’s important to have this alliance across languages,” she said. “Russian speakers are not who I do this for. It’s for Americans and English speakers that I do this for, wherever they may be, whether it’s in Australia. People can read my pieces anywhere they’re on the internet.”
And Suleymanova is not alone. Philly’s literary translating community is vast.
“It will continue to be important”
Literary translation is different from literal translation – authors who are literary translators tend to try and preserve the original voice and tone of the text across languages.
Many books popular with American readers are themselves translations – including The Girl with the Dragon Tattoo (translated from the original Swedish written by Stieg Larsson) and Pinocchio (translated from Carlo Collodi’s original Italian).
And literary translation has been around for a long time, said Emily Hunsberger, a Philly translator who translates from Spanish.
“Work has been translated in so many languages,” she said. “It’s kind of a conduit for us to read each other’s stories and learn from each other and see what’s universal among us and what things challenge our understanding because it’s so different from the culture that we’re used to.”
Despite this, Hunsberger said there hasn’t been as much visibility and awareness around literary translation in the U.S.
“I took a world literature course when I was in high school, many years ago, and all the books we read were translated works,” she said. “But never once did we learn the names of the translators. Never once did we talk about what translation is, or theories or challenges or dissecting what the act of translation is.”
She said this is part of what inspired her to get into the field. Hunsberger owns a multilingual translation company. She helps to translate content from Spanish and Portuguese to English or from English to Spanish.
She explained that this piqued her interest in literary translation.
“I had always thought that I’d love to do literary translation,” she said. “In my younger years, I did a lot of creative writing, and I took a translation course when I was in undergrad, but I had left it on the back burner. I knew how to translate, but I didn’t really know about the publishing industry or how it worked.”
She entered the field after moving to Philly in 2020, and said she quickly recognized the value it brought to the city.
“Philadelphia has so many eclectic, unique spaces for art,” she said. “And I feel like I’ve talked to a lot of people in different disciplines of art who feel like Philadelphia is a place where you can practice your art, and so I feel like literary translation is just another one of those disciplines where this is a great place to be based, to be doing your art.”
And beyond its artistic impact, literary translation is important to open others up to different worldviews, said Stephanie Schechner, a retired teacher from Widener University who translates from French.
“Because Americans don’t study languages as much as other parts of the world, translation becomes an essential way for Americans to get access to voices that represent other points of view,” she said. “That helps open the world to people who cannot read things in the original text.”
Schechner’s work focuses on a lesbian, working class French author, who goes by Mireille Best.
Schechner said she felt like translating works from an author like this would be important, as it can show Americans who are feeling like their voices aren’t heard that there are models for their experiences around the world.
“I think knowing that there were people ahead of us in the past who were fighting for their right to be individuals can give young people some hope,” she said.
Philly’s literary translation community
Philly’s literary translation community is small, but dedicated.
Hunsberger explained that Philly has an informal collective of translators called Transversal, which has helped connect Philly-area translators.
“Transversal does not have any formal or nonprofit status,” she said. “We don’t have a board or anything, everything is just very organic and informal, and anyone who’s part of the collective can organize a gathering or anything they want to.”
Transversal was started by UPenn graduate students Liz Rose, Hilah Kohen and Kate Meng Brassel several years ago.
The group now holds in-person meetups and co-working sessions, allowing for connection between members.
Schechner said the group has employed creative strategies to facilitate connection between translators in different languages – including structured work sessions.
“We do what’s called a Pomodoro,” she said. “You work for 25 minutes and we set a timer, then we take a five minute break in the middle, and say hello to each other. Then, we work for 25 more minutes, and then chat briefly at the end and then we leave. We’re just creating a space and an accountability where people could sit with each other and be in community.”
Sean Gasper Bye has worked in literary translation for many years, including time as the interim executive director at the American Literary Translator’s Association – the only national organization in the U.S. dedicated to supporting literary translators.
He said he initially got into the Philly literary translation community after moving back to the area from New York City.
“I had always thought that Philly had the makings of a great translation town, because it has such strong cultural infrastructure,” he said. “I feel like people in Philly are very worldly, are very interested in culture and are readers.”
He said with the creation of Transversal and conversations with other translators, a solid community was formed. He said local collaborators, like bookshops, have also been receptive to events and partnerships.
Suleymanova emphasized that the morale of Philly’s translation community helps to keep her motivated.
“To look across the table and see people that are as hell bent as you are about bringing these stories across borders and languages, it could feel like you have a team in this, even if somebody’s working from entirely a different language,” she said.
Hunsberger said she is excited to see community partnerships and interest around the topic growing, and hopes to continue with the momentum.
“We want to continue, this year, with doing more of that kind of community outreach and bringing in the people interested in translation, or who are already involved in literary translation in Philadelphia who we haven’t managed to meet up with yet, and doing more things with these other organizations that are doing important work in the community,” she said.
‘It’s quite solitary’
There are obstacles literary translators have to face.
“What we do is very niche,” Suleymanova said. “There’s not a lot of spotlight on it. It’s quite solitary. It takes years for this work to see the light of day.”
Philly’s literary translators try to get together to combat these issues and offer each other support. There are even events different translators will often host.
Hunsberger said there have been bigger events the community has put together as well – including a Literary Translation Workshop she hosted late last year.
“We co-sponsored the practical literary translation workshop that I led at The Head & The Hand Books last November,” she said. “It was called the ‘Translingual Remix,’ and it was meant to be small, because it was a two-hour time block, and if you’re going to have translators working on a piece, reading, and sharing, you can’t have too many people in the room.
“So it was intended to be a small workshop, but it was really, really cool, because the languages that people were bringing of who signed up was Hungarian, Ukrainian, Bangla, Yiddish, Italian and Spanish.”
Schechner said it can also be hard to find time for literary translation, as it often doesn’t pay enough to be a primary career.
“Many translators are otherwise employed to pay their bills, or they’re in school, and carving out translation time for almost all of us is tricky,” she said.
There can also be a lack of recognition for translators and the effort it takes to rewrite books into English, said Mahmud Rahman, a translator who translates from Bengali.
“We want translators to be more recognized, and some of us feel that the name of the translator should go on the cover of the book,” he said. “Some publishers do that. Many do not, and it’s a constant tug of war, because essentially, when you’re translating a book into a language, you’re essentially recreating it, and it’s more your work.”
Rahman emphasized Philly’s literary community oftentimes does not get recognition compared to other big cities like New York.
Bye explained there is also a lot of thought that goes into literary translating – work he says cannot be replicated by a machine.
“It’s easy to think that we’re just kind of walking dictionaries who sort of swap one word in for another, and that it can be done quite mechanically,” he said. “And that’s really not the case.”
While artificial intelligence is a concern, Hunsberger said that the machines can’t replicate much of literary translation.
“The point that machines can’t really get at this point in time when it comes to literature or translation itself, is that what you would get from one translator would be different than what you get from another translator,” she said. “Because there’s also an artistic component.”
“Communication and connection”
Despite these challenges, Hunsberger said Philly’s literary translation community is special.
“I think it’s almost like an infinite well of conversation and connection,” she said.
Bye said practicing literary translation also helps to challenge our traditional ways of thinking.
“Something that is really special about translation, is that you have access to these works that came up in a different cultural context, a different historical context, a different literary context, and you can see them breaking our rules or not paying attention to our rules, because those aren’t the rules over there,” he said.
He said that Polish writing, for example, oftentimes focuses less on the genre of story and more on the writing quality – which he says may not be the same in America.
If you are interested in literary translation or joining Philly’s Transversal group, you can send them an email at transversalphl@gmail.com."
by Violet Comber-Wilen
March 3, 2026
https://billypenn.com/2026/03/03/literary-translation-services-philadelphia-language-translation/
#Metaglossia
#metaglossia_mundus
#métaglossie
"This is one of those publishing stories where the author—and in this case the translator too—waited for years, certain that their work deserved a wider audience, and was forced to stand patiently by while other authors and their books found readers, until it was finally her chance to do the same.
When Kanako Nishi won the Naoki Prize in early 2015, she was just over ten years into her career; she had written over a dozen books including novels, short stories, essay collections, and children’s books. A couple of her books had even been made into feature-length films. Her work had already received the Oda Sakunosuke Prize and the Kawai Hayao Story Prize. But winning the prestigious Naoki Prize for her epic Saraba! solidified her place as a literary luminary. Readers were undeterred by the novel’s length—732 pages, divided into two hardcover volumes—and it went on to sell over 460,000 copies, one of the top five bestselling titles of that year.
This is when I met Kanako.
I had been contacted by an editor who was looking for someone to write an opinion essay about the murder of two Japanese hostages by ISIS militants. I reached out to a mutual friend who had recommended Kanako’s books to me years earlier to ask if she wanted to respond. The resulting essay, “Merry Christmas,” was our first collaboration. I was astonished by the way that this short piece managed to pack in a macro and micro outlook on religion and geopolitical events, all through the prism of a childhood memory.
Kanako was born in 1977 in Tehran, where her father was posted for work, but the Iranian Revolution there prompted her family to return to Osaka before she was two years old. When she was in elementary school, they moved again, this time to Cairo, where the family lived for four years. Perhaps because of this peripatetic early childhood, or perhaps the result of an encounter with Toni Morrison’s The Bluest Eye at a pivotal moment later in her adolescence, Kanako’s reading habits were deeply influenced by international literature alongside her formal education in Japanese literary tradition.
She’s a fan of Chimamanda Ngozi Adichie and John Irving as well as Jun’ichiro Tanizaki and Kuniko Mukoda. Saraba! is an excellent example of the I-novel, an early twentieth-century literary form in which the author writes in a naturalistic, confessional style. Kanako dared to take on this genre, which had been dominated by male writers such as Osamu Dazai and Yukio Mishima, but turned the convention on its head: while hewing to its semi-autobiographical style by giving her protagonist the same formative experiences that she had in Iran and Egypt, even the same birthday as hers, she flipped the gender, making the narrator male instead of mirroring herself as female.
In 2016, fellow Japanese literary translators Lucy North and Ginny Tapley Takemori and I formed the collective Strong Women, Soft Power to promote the work of Japanese women writers in translation. As part of a series initiated by a conclave of translators that year at the London Book Fair and published here on Literary Hub, we compiled a list (just ten!) of books by Japanese women writers we’d love to see in English—needless to say, Saraba! was included. It’s worth mentioning that books by eight of these ten authors have now been translated into English and published, with a ninth first-time-in-English title forthcoming.
The English-language publishing market began to catch up with the Japanese literary landscape, where women writers had already been the cultural zeitgeist for some time. In the first three years after the National Book Awards relaunched the category for Translated Literature, Yoko Tawada won in 2018 for The Emissary, translated by Margaret Mitsutani; Yoko Ogawa was shortlisted in 2019 for The Memory Police, translated by Stephen Snyder; and Yu Miri won in 2020 for Tokyo Ueno Station, translated by Morgan Giles. Add to that the phenomenal success of Convenience Store Woman by Sayaka Murata, translated by Ginny Tapley Takemori—proving that novella-length books could succeed as standalone volumes—and editors were clamoring for more Japanese women writers in translation.
Perhaps it seemed like a risky moment to introduce a new voice with such a monumental and door-stopping tome like Saraba! But Kanako and I persisted. In 2020 I received a fellowship from the National Endowment for the Arts for the book. The attention from these grants often leads to a publishing contract but, unfortunately, not for us. I kept submitting the book to editors, all the while still translating her work, placing stories and essays in Freeman’s, Granta, Words Without Borders, and here on Literary Hub.
All the while, Kanako’s renown and reputation continued to grow. She was invited to international literary festivals—occasionally I got to go along too. We won a Pushcart Prize for the story, “My Ass,” published in Brick. But I still struggled to find a home for her books with an English-language publisher, even as more and more Japanese women writers like Hiroko Oyamada, Mieko Kawakami, and Emi Yagi were coming onto the scene in translation.
In 2023, Kanako published her first memoir くもをさがす (Kumo o sagasu; Looking For Spiders and Clouds), which is about moving with her family to Vancouver, BC, shortly before the pandemic and then in 2021 being diagnosed with breast cancer. Kumo o sagasu is a literary memoir, an illness diary, a lyrical exploration of cultural difference and similarity all in one volume. The book won the Yomiuri Literature Prize for Nonfiction, the Japan Booksellers’ Award, and has been the bestselling nonfiction book of the Reiwa era (2019 to present). This catapulted her to another level of success and visibility. However, given the aforementioned considerations when introducing an author in translation, it seemed like even more of a risk to attempt to do so with a work of nonfiction, however successful it may have been in Japan.
*
I often say that a book getting published in translation is like capturing lightning in a bottle. It can seem miraculous for any book to make it to publication but for an author’s work in another language, there are even more variables—the publisher and agent in the original country, often another agent working in the U.S. or U.K., the English-language editor, and of course, the translator, who may enter into the process at almost any point along this prospective line.
In our case, Kanako and I were lucky enough to meet Alexa Frank, a rising-star editor at HarperVia, an imprint of HarperCollins that is dedicated to publishing international voices. Alexa is herself a Japanese translator—a huge advantage for us, since Japanese is a market few editors can access personally without having to rely solely on reader reports or book synopses provided by the agent or publisher. Alexa was of course familiar with Kanako’s work and knew how much potential there was in introducing her books to English-language readers.
Sometimes a writer will be discovered soon after they debut in their original language, and their books can be translated and published around the world along a similar timeline and in the same order as in their country. Other times there is more of a lag, a few years or a few books into an author’s career. Kanako’s case is even more unusual—with a backlist of books spanning twenty years, where do you even start? At the beginning? Or work backward, from her most recent publication? Go with her bestselling title? Or the book that is most beloved by readers in Japan? There is something of an art to curating a writer’s work in another language. Then again, why shouldn’t readers have access to an author’s full oeuvre so that they can appreciate the breadth of a writer’s talents, and judge for themselves?
After much deliberation, Alexa proposed introducing Kanako Nishi in English by starting with her debut novel Sakura, a meticulously plotted comic and tragic family drama narrated by the middle child Kaoru, who is caught between a hero brother and a turbulent sister, all of them protected by their doting parents and everyone united by their love for the eponymous family dog, Sakura.
With this selection, readers can have an experience similar to readers in Japan in encountering this fresh new voice, inflected with Kanako’s distinctively Osakan warmth and wit (the adorable Shiba Inu mutt at the center of the story certainly may be part of the appeal too). This early novel showcases the writer as she broke onto the scene decades ago, while already hinting at some of the increasingly complex themes she would revisit as her talents developed and her confidence grew. We already know that there are plenty more daring and ambitious books that await translation—hopefully you’ll have the opportunity to appreciate how Kanako’s literary style evolves and to keep up with her prodigious body of work as well."
https://lithub.com/when-persistence-pays-off-on-translating-and-publishing-the-work-of-kanako-nishi/
#Metaglossia
#metaglossia_mundus
#métaglossie
"An expert in Guernésiais, Guernsey's native language, has shared concerns about the ability of artificial intelligence (AI) to translate reliably.
Teacher Yan Marquis said there was limited data on the language, no standardised spelling, and a great deal of cultural nuance.
All these things could mean that AI tools such as Microsoft Copilot and ChatGPT often produced inaccurate translations, he said.
He added that, with fewer people speaking the language, there was also a risk incorrect AI translations would become a more common sight.
Marquis says: "AI is fantastic. You look at some of the results for a translation to English, it's very impressive.
"But the thing about Guernésiais is there are not the resources there to analyse the language and paint a picture of it.
"People are becoming more reliant on AI and, at the same time, there are less speakers. Both those things together mean that incorrect translations will not be noticed."
The Guernsey Language Commission offers a translation tool that can be used for free.
Marquis says: "It isn't instant - and people often want an instant reply - but it is worth waiting.
"I am pleased that people often come to me with translations for tattoos and birthday cards.
"It's nice people want that personalised local touch, but I have seen some instances where the translation has been wrong due to AI. It's a shame it didn't get checked.
"I have not seen anything that I am happy with or I am impressed with in terms of translations as the resource just isn't there.
"It's very hard unless you know someone who speaks the language."
AI uses already existing information to provide answers and, where it does not have any, it can "hallucinate".
AI hallucinations are instances where AI models, particularly large language models (LLMs), generate false, misleading, or illogical information while presenting it confidently as fact.
These errors often stem from limitations in training data, pattern recognition errors, or lack of grounding in real-world knowledge, often blending fabricated details with accurate information."
Courtney Sargent
Guernsey
https://www.bbc.com/news/articles/cy8ly211pn3o
#Metaglossia
#metaglossia_mundus
#métaglossie
"Call for framework contract for translation services - EU4Rivers
05.03.2026 New opportunity
Austrian Development Agency (ADA), Austria’s federal agency for development cooperation and humanitarian aid, has been delegated by the European Union (EU) to implement the “EU4RIVERS” project in Albania, from October 2023 to September 2028. The lead Beneficiary is the Albanian Water Resources Management Agency (AMBU).
For the EU4Rivers project, ADA intends to award a Service Contract for:
Contract Reference No: 6532-00/2023/SUB 02.2026
Title: "Framework service contract for text translation and interpretation services" / EU4Rivers Project
Place: Tirana, Albania
For more information, please refer to Tender documents and Annexes.
For complete information and application, please refer to:
https://www.entwicklung.at/ada/calls
The deadline for submission of applications is 18 March, 23:59.
Should you/your company be interested in this contract, please submit your offer to eu4rivers@ada.gv.at."
https://www.eeas.europa.eu/delegations/albania/call-framework-contract-translation-services-eu4rivers_en
#Metaglossia
#metaglossia_mundus
#métaglossie
"The Neo-Latin theater play "Cenodoxus" (1602) by Jakob Bidermann is now only known to some researchers in Latin and German studies. But from 1930 to 1960, the story about the battle between heavenly and hellish powers for the soul of the Parisian scholar Cenodoxus was at the height of its popularity in German-speaking countries: actors in science and culture praised the play as a Latin "Hamlet" or "Faust."
The Austrian writer Hugo von Hofmannsthal was already working on a production for the Salzburg Festival in the 1920s, to be directed by Max Reinhardt. However, Hofmannsthal was unable to complete this project before his death in 1929.
"Depending on the report, Max Reinhardt, Richard Metzl or Joseph Gregor are said to have continued the project," says Dr. Julia Jennifer Beine, Latinist and head of the interdisciplinary Junior Research Group "Sustainability in Translation" at Julius-Maximilians-Universität Würzburg (JMU).
"As I quickly realized during my research, some of the newspaper articles from the late 1920s and early 1930s about the 'Cenodoxus' production for the Salzburg Festival were very contradictory. I then began to work out the narrative surrounding the production and made inquiries at further archives," says the JMU researcher.
Beine consulted more than 20 archives for her research into the genesis of the Salzburg "Cenodoxus" production during a research stay in Vienna.
The result: There were a total of three attempts at a "Cenodoxus" production for the Salzburg Festival, first mainly by Hofmannsthal (1920–1925), then by Metzl (1930–31) and finally by Gregor (1933), with Reinhardt always directing.
"In the surviving testimonies, some of those involved claim the rediscovery of 'Cenodoxus' for themselves, but conceal the involvement of others, especially Gregor," says Beine. "If someone consciously wants to stage themselves as the great discoverer of 'Cenodoxus' and heir to Hofmannsthal, they put themselves alone in the limelight and leave no room for others."
The first to fall out of the narratives is the translator of the second planned production, Ljuba Metzl.
Ljuba Metzl's translation of Cenodoxus
But how did Ljuba Metzl come to translate the play?
"She was the daughter of Richard Metzl, who—according to his own account—continued to work on the adaptation as Reinhardt's assistant after Hofmannsthal's death," explains Beine.
Ljuba Metzl could have received the commission through this family connection. A newspaper article calls her a "talented young philologist." Beine found enrollment sheets in the archives of the University of Vienna that prove that Ljuba Metzl studied there for three semesters from the winter semester of 1930–31.
However, around 1930 it was not so easy for Ljuba Metzl to obtain the model for her translation.
"At that time, the drama was difficult to access. The Latin text was only available in printed editions from the 17th century in certain libraries, as was a German translation by Bidermann's pupil Joachim Meichel from 1635," explains Beine. This translation was not published by the publisher Reclam until December 1930, making it generally accessible.
The Salzburg Study Library owned a copy of the Latin text. Richard Metzl, and therefore also Ljuba Metzl, received photographs of the text through their contact with Joseph Gregor.
"Back then, it was a huge effort to get access to a text," says the JMU researcher. Gregor also used the photographs later for his own adaptation of the drama, which he brought to the stage of Vienna's Burgtheater in 1933—in a version in which he had almost completely rewritten the original. It is unclear to what extent Gregor also used Ljuba Metzl's translation, as reported in a newspaper article.
A lost translation
Gregor also published the influential work "Weltgeschichte des Theaters" (1933). It also mentions "Cenodoxus."
"Gregor quotes a passage of the play in German without citing its origin, which does not come from his own adaptation and also not from Meichel's translation and which is very close to the original in terms of content," says the Würzburg Latinist. The suspicion is that it is a version by Ljuba Metzl.
But, "Her translation seems to have been lost—at least I couldn't find a manuscript in the archives. It is therefore impossible to determine who wrote the passage in 'Weltgeschichte des Theaters,'" says Beine.
The JMU researcher's discovery shows how important archives are for questioning common narratives of authorship: "Joseph Gregor was a big name in theater studies for decades. His narratives still influence research literature today. Ljuba Metzl and her story, on the other hand, are virtually unknown," says the Latinist.
Biography of Ljuba Metzl
Ljubow "Ljuba" Louise Ludmilla Metzl was born in Berlin on 18 June 1911. She attended the Reformrealgymnasium in Salzburg and then went on to study at the Faculty of Arts at the University of Vienna. After completing three semesters, Ljuba Metzl's name can no longer be found in the university's enrollment sheets from the summer semester of 1932.
"In a contemporary article, she is referred to as "Ljuba Metzl-Binder," which indicates that she had married in the early 1930s. It is not clear whether this was the reason for dropping out of university," explains Beine.
Richard Metzl was probably persecuted due to the anti-Semitic ideology of the National Socialists and fled Germany with his family in August 1938. He died of an unknown cause in Paris in October 1941. There is as yet no further information about the fate of his family and his daughter."
University of Würzburg; edited by Stephanie Baum, reviewed by Robert Egan
https://phys.org/news/2026-03-ljuba-metzl-theater-history.html
#Metaglossia
#metaglossia_mundus
#métaglossie
"Foreign-language documents are increasingly common in U.S. litigation. Contracts, emails, text messages, medical records, corporate filings, and financial documents frequently originate in languages other than English. In cross-border disputes and immigration matters, such materials are often central to a party’s claims or defenses.
But once these materials enter U.S. courtrooms, they must satisfy strict procedural requirements. When they do not, courts may exclude the evidence entirely – sometimes at critical stages of litigation.
For legal teams handling multilingual clients or international matters, understanding how courts treat foreign-language evidence is not optional. It is essential to litigation strategy.
English Is the Language of the Court
In federal courts and most state courts, English is the language of the official record. Judges and juries must be able to evaluate all materials presented in motions, hearings, and trial.
As a result, foreign-language documents must generally be accompanied by a complete and accurate English translation before they can be considered. Submitting the original document alone is insufficient, even if both parties understand the language involved.
The Federal Rules of Evidence reinforce the importance of reliable translation. Rule 604 requires that interpreters be qualified and give an oath or affirmation to provide a true translation. Although Rule 604 addresses live interpretation, courts apply similar principles of accuracy and reliability to written translations submitted as evidence.
Federal appellate courts have reinforced this requirement. In United States v. Rivera-Rosario, 300 F.3d 1 (1st Cir. 2002), the court addressed the use of translated recordings and emphasized the importance of reliable English translations for jury consideration. While disputes may arise over competing translations, the court made clear that English translations, not the original foreign-language material, are what juries ultimately rely upon.
Why Courts Exclude Foreign-Language Evidence
Foreign-language evidence is most often rejected due to procedural deficiencies rather than substantive disputes over the underlying facts. Common reasons include:
Lack of Certification
Courts frequently require a signed certification stating that a translation is complete and accurate. Without proper certification, opposing counsel may challenge admissibility, arguing that the translation lacks sufficient reliability.
Incomplete or Selective Translation
Submitting only excerpts of a foreign-language document can raise fairness concerns. If a translation omits context or qualifying language, courts may refuse to rely on it. Selective translation can also invite credibility challenges.
Missing Translator Affidavit
In certain proceedings, particularly immigration cases, a translator's affidavit is required. U.S. Citizenship and Immigration Services (USCIS), for example, requires certified English translations for all foreign-language documents submitted in support of applications or petitions.
Authentication Problems
Translation does not cure authentication defects. Even a flawless English translation will not be admitted if the underlying document cannot be properly authenticated under evidentiary rules.
The Litigation Consequences Can Be Significant
Improper handling of foreign-language evidence can have real strategic consequences.
Key exhibits may be excluded at summary judgment or trial. Courts may delay proceedings while compliant translations are obtained. Briefing schedules may be disrupted. Litigation costs may increase due to supplemental submissions or expert disputes. Appeals may raise procedural objections related to translation reliability. In some cases, courts permit late submission of certified translations. In others, they decline to consider non-compliant evidence altogether. The difference often depends on timing and the materiality of the document at issue.
Special Considerations in Immigration and Cross-Border Matters
Immigration proceedings are particularly strict regarding translation requirements. USCIS regulations require that any foreign-language document submitted must be accompanied by a full English translation and a certification from the translator confirming competence and accuracy.
Similarly, in international commercial disputes, translated contracts and communications may be central to breach-of-contract claims or defenses. Any ambiguity in translation can become the subject of cross-examination or competing expert testimony.
In these contexts, legal teams often rely on properly certified professional translation to ensure compliance with evidentiary standards and minimize procedural risk.
A Representative Scenario
The following fictional example reflects situations courts regularly encounter.
A U.S. distributor and a Latin American manufacturer become involved in a dispute over the termination of a Spanish-language supply agreement. During summary judgment briefing, the distributor submits internally translated excerpts of the contract prepared by bilingual staff. The translation is not certified, and only selected clauses are included.
The manufacturer challenges the accuracy of the translation and argues that qualifying provisions were omitted. The court declines to rely on the disputed excerpts and orders submission of a properly certified, complete translation before ruling on the motion.
The delay extends briefing deadlines and increases costs for both parties. More importantly, the court expresses concern about the reliability of the initially submitted materials.
This type of procedural complication is avoidable with early planning and adherence to evidentiary standards.
Practical Best Practices for Legal Teams
To reduce evidentiary challenges related to foreign-language materials, legal teams should adopt structured internal procedures.
Identify Foreign-Language Documents Early
Flag non-English documents during client intake and discovery. Waiting until dispositive motions or trial preparation increases risk.
Use Qualified Professionals for Court Submissions
Machine translation tools may assist with internal review, but they are not appropriate for filing evidence. Court submissions require accuracy, completeness, and proper certification that only a qualified legal translation services company can provide.
Obtain Proper Certification and Affidavits
Ensure that translations include signed certifications and, where necessary, sworn affidavits that meet applicable court or agency requirements.
Maintain Terminology Consistency
Consistent translation of defined terms across multiple documents reduces confusion and strengthens credibility.
Coordinate Strategically
In some cases, parties may stipulate to translations to avoid disputes. Early communication can prevent unnecessary evidentiary challenges.
The Bottom Line
Foreign-language evidence is increasingly common in U.S. courts, but compliance standards remain strict. Courts expect complete, accurate, and properly certified English translations before foreign-language materials can be admitted into the record.
For legal teams, translation should be treated as part of litigation planning – not as an administrative afterthought. Early identification of foreign-language materials, use of qualified professionals, and adherence to certification requirements can prevent costly procedural setbacks and protect the integrity of the evidentiary record."
https://www.legalreader.com/when-courts-reject-foreign-language-evidence-what-legal-teams-need-to-know/
#Metaglossia
#metaglossia_mundus
#métaglossie
"The Taiwan Academy in Los Angeles, in collaboration with the American Literary Translators Association, also known as ALTA, has announced Grace Ting as recipient of the 2026 “Literature from Taiwan” Emerging Translator Mentorship Program.
Ting will receive nine months of professional mentorship from acclaimed translator Lin King to undertake the English translation of “Island Where the Red Spider Lilies Bloom (Higanbana ga Saku Shima),” the Akutagawa Prize-winning novel by Taiwan-born, Japan-based author Li Kotomi. Originally written in Japanese, the novel made history as the first work by a Taiwanese author to win the prestigious prize, and was later translated into Chinese by the author.
Ting, a Taiwanese American scholar specializing in gender studies, has extensive expertise in Japanese literature, queer theory and feminism. Fluent in English, Chinese and Japanese, she is currently pursuing advanced studies in literature, translation and creative writing. Her journey with Li Kotomi’s work began through academic and creative intersections. Set on a fictional island, the story features three distinct invented languages that reflect the story’s gender politics. Translating such a complex linguistic landscape requires immense imagination and creativity. Li’s Chinese translation will serve as a vital reference text for the English interpretation.
Mentor Lin King, a distinguished voice in the new generation of Mandarin-to-English literary translators, received the PEN America Robert J. Dau Short Story Prize for Emerging Writers. Her translations include “The Boy from Clearwater” and “Taiwan Travelogue.” The latter won the National Book Award for Translated Literature in late 2024 and the ALTA First Translation Prize in 2025.
ALTA is dedicated to promoting literary translation. Since 2020, the Taiwan Academy has partnered with ALTA to mentor emerging translators such as Jenna Tang, a collaboration that previously facilitated the publication of the English edition of Fang Si-Chi’s “First Love Paradise.” Translators play a critical role in bringing Taiwanese literature to the global stage, as every translation is a creative reimagining across languages. The results of the mentorship will be presented at the ALTA annual conference this fall.
For information, visit la.us.taiwan.culture.tw and literarytranslators.org."
https://beverlypress.com/2026/03/taiwanese-american-scholar-selected-for-alta-emerging-translator-program/
#Metaglossia
#metaglossia_mundus
#métaglossie
"Sometimes, even a vocabulary of more than a million words can't truly express or capture the meaning of life. Yet with just around 137 words, Toki Pona, known as the world's simplest language, is said to convey meaning as well as other languages do, only with far fewer words and phrases, as per IFL Science.
Canadian linguist Sonja Lang invented Toki Pona in 2001. "It was my attempt to understand the meaning of life in 120 words. There are now thousands of speakers and 137 essential words," mentioned Lang on Toki Pona's official website. Lang's invention was recognized as a world language in 2022, with ISO 639-3 adopting the code "tok" for Toki Pona. According to the science-based media channel, Toki Pona drew inspiration from other languages from around the world, including Dutch, English, Finnish, Mandarin, and Cantonese, among others. Although many other conlangs already exist, Toki Pona stands out due to its most uncomplicated vocabulary.
This one-of-a-kind language caught the attention of the popular language and etymology enthusiast on YouTube, Rob Watts (@RobWords), who tells people "where the words we use come from and why we say the things we say." At the beginning of the video, Watts asked, "Can a language really function with just 120 words?" and proceeded to explain the minimalism of Toki Pona. He highlighted that the name Toki Pona itself means "good/simple language," and that its dictionary has nearly 11,000 sample translations. Watts pointed out that Toki Pona has no articles like 'a,' 'an,' or 'the,' and is also free of grammatical tenses. Its verbs don't have different forms as in English and many other languages.
Most importantly, one need not wonder which grammatical gender to use in Toki Pona. For instance, learning German or French gets complicated because people find it difficult to tell masculine, feminine, and neuter words apart. Toki Pona has just three pronouns - mi (I or we), sina (you or y'all) and ona (he, she, it and they) - and the same forms serve as possessives. "So the biggest way in which Toki Pona saves on vocabulary is what’s remarkable about it. Toki Pona has very few words but each one of them can be translated into a multitude of things. They are broad concepts. It’s like how our word 'fruit' covers loads of different varieties of fruit and can be used to refer to any one of them," said Watts.
"All content words can serve as objects, verbs and modifiers - like adjectives - depending on where they are in a sentence. And you can just line up the modifiers to gradually refine the idea," Watts explained. In fact, Lang spoke about the nuances of using Toki Pona in her book, "The Language of Good," which, according to Watts, serves as the Bible for Toki Pona speakers.
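The head-then-modifier pattern Watts describes can be sketched in a few lines of Python. This is purely an illustrative toy, not an official Toki Pona tool: the one-word English glosses below are simplified from common dictionaries, and real Toki Pona words cover far broader concepts than a single gloss can capture.

```python
# Toy glosser for head-first Toki Pona noun phrases: the first word is
# the head concept, and each following word modifies it, like an adjective.
# Glosses are simplified assumptions, not definitive translations.
WORDS = {
    "tomo": "structure",
    "tawa": "moving",
    "telo": "liquid",
    "nasa": "strange",
    "toki": "language",
    "pona": "good",
}

def gloss(phrase: str) -> str:
    """Gloss a phrase by placing the modifiers before the head, English-style."""
    head, *modifiers = phrase.split()
    return " ".join([WORDS[w] for w in modifiers] + [WORDS[head]])

print(gloss("tomo tawa"))  # "moving structure" - commonly used for a car
print(gloss("telo nasa"))  # "strange liquid" - commonly used for alcohol
print(gloss("toki pona"))  # "good language" - the language's own name
```

The same mechanism lets a six-word vocabulary here express several distinct concepts; the full language does this with its roughly 137 essential words.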
While many praise Toki Pona's simplicity for distilling life's essence, research reveals potential cognitive trade-offs in minimalist vocabularies. A 2014 study examining the cognitive effects of small vocabulary sizes, akin to Toki Pona's minimalist lexicon, shows that preschoolers with smaller vocabularies process words less efficiently than peers with larger ones. Children aged 30-46 months with larger expressive vocabularies showed faster and more accurate identification of familiar words, better disambiguation of novel words to unfamiliar objects, and greater sensitivity to one-feature mispronunciations of known words.
Additionally, a study of Saudi high-school EFL learners (n = 108) found that students with an average vocabulary size of about 2,025 word families were able to understand up to 90% of written texts, showing a strong quantitative link between vocabulary size and text comprehension in real-world language learning.
Calling it a "quirky" language, Watts delved much more into the other aspects of Toki Pona and in every way this language seemed to have simple, straightforward solutions using just a few words. So, for those who wish to learn a new language but are worried about its grammatical complexities, Toki Pona is a boon.
https://scoop.upworthy.com/worlds-simplest-language-with-less-than-150-words-is-enough-to-understand-the-meaning-of-life-ex2
#Metaglossia
#metaglossia_mundus
#métaglossie
|
" WAXAL: A large-scale open resource for African language speech technology
March 6, 2026
WAXAL provides a critical, open-access foundation for African speech technology. Featuring a large corpus of ASR and TTS data for 27 native languages under a highly permissive license, WAXAL empowers the African AI ecosystem to build robust speech systems that better reflect the region's unique linguistic diversity.
Quick links
WAXAL dataset
Paper
Share
Voice-enabled technologies like virtual assistants and automated transcription have transformed how we interact with computers. However, their benefits disproportionately favor a handful of high-resource languages. This divide has left hundreds of millions of people — particularly in Sub-Saharan Africa, home to over 2,000 distinct languages — unable to access essential technology in their native tongues. Several years ago, the team at Google Research set out to help tackle this problem.
Watch the film
To address this critical need, we introduce WAXAL: a large-scale, openly accessible speech dataset that initially covers 27 Sub-Saharan African languages spoken by over 100 million people across more than 26 countries. Developed through a multi-year effort beginning in 2021, in collaboration with African academic and community organizations, WAXAL provides the high-quality, permissively licensed data necessary to build robust speech systems. As a foundational milestone, this initial release features approximately 1,846 hours of transcribed natural speech for automatic speech recognition (ASR) and over 565 hours of high-fidelity recordings for text-to-speech (TTS). We are releasing these resources under a Creative Commons license (CC-BY-4.0) to catalyze research and enable inclusive voice-enabled technologies tailored to the unique linguistic characteristics of the continent. We intend for the WAXAL collection to continuously evolve and expand to include additional languages as part of our ongoing effort to bridge the digital divide.
Introducing WAXAL
By addressing critical data scarcity for over 100 million speakers, WAXAL aims to empower the regional AI research ecosystem. To support the development of robust speech technologies, the corpus integrates two specialized datasets designed to provide comprehensive coverage for both speech recognition and synthesis tasks.
WAXAL-ASR (Spontaneous Understanding): Comprising approximately 1,846 hours of transcribed audio, this dataset captures natural, unscripted speech. Instead of reading scripts, diverse participants were asked to describe visual stimuli covering 50+ topics in their native language. This image-prompted elicitation captured authentic linguistic variation, including tonal nuances and code-switching, and yielded more natural speech than traditional read-from-script collection.
Examples from Google’s Open Images used as prompts to elicit natural speech for the ASR dataset.
WAXAL-TTS (High-Fidelity Generation): Designed to facilitate the creation of natural-sounding synthetic voices, this dataset contains over 565 hours of high-quality, phonetically balanced audio. The TTS collection process was highly collaborative: local community members worked in pairs to draft scripts of 10,000–20,000 words, alternating reader and recorder roles. To ensure professional-grade acoustics, some participants used project funding to build custom studio boxes. The resulting recordings were then segmented, matched with the script text, and reviewed for accuracy and quality.
TTS recording box at University of Ghana.
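The segment-match-review step described above can be illustrated with a simple quality-control pass that pairs each segmented clip with its script line and flags pairs whose speaking rate looks implausible. This is a minimal sketch, not the project's actual pipeline: the function name, thresholds, and characters-per-second heuristic are all illustrative assumptions, not details from the WAXAL paper.

```python
def pair_clips_with_script(clip_durations, script_lines,
                           min_chars_per_sec=5, max_chars_per_sec=30):
    """Pair each segmented audio clip with its script line and flag
    suspect pairs for human review.

    Crude heuristic: the speaking rate (script characters per second of
    audio) should fall in a plausible band; anything outside it suggests
    a mis-segmented clip or a clip matched to the wrong line.
    Thresholds are illustrative, not from the WAXAL methodology.
    """
    if len(clip_durations) != len(script_lines):
        raise ValueError("clip/script count mismatch: re-check segmentation")
    paired = []
    for dur, line in zip(clip_durations, script_lines):
        rate = len(line) / dur if dur > 0 else float("inf")
        plausible = min_chars_per_sec <= rate <= max_chars_per_sec
        paired.append({
            "duration_sec": dur,
            "text": line,
            "needs_review": not plausible,
        })
    return paired
```

In practice a pass like this only triages: flagged pairs still go to the human reviewers mentioned above, who confirm accuracy and audio quality.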
The WAXAL corpus's dual focus on unscripted ASR data and high-fidelity TTS audio is designed to enable the development of full-duplex conversational systems. Specifically, the ASR component facilitates the modeling of varied, spontaneous speech input typical of real-world scenarios, while the high-quality TTS component provides the clean reference data required for generating clear, natural output. The table below lists the 27 languages currently included in the dataset:
Breakdown of the current WAXAL dataset, showing the 27 initial Sub-Saharan African languages and the availability of ASR and TTS data for each.
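Per-language totals like those in the table are typically computed from a per-utterance manifest listing each clip's language, path, duration, and transcript. As a minimal sketch, assuming a hypothetical TSV layout (the real WAXAL release may organize its metadata differently):

```python
import csv
import io
from collections import defaultdict

def hours_per_language(manifest_tsv):
    """Sum audio duration per language, in hours, from a TSV manifest.

    Expects columns: language, audio_path, duration_sec, transcript.
    (Hypothetical layout; the actual WAXAL manifest format may differ.)
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(manifest_tsv), delimiter="\t")
    for row in reader:
        totals[row["language"]] += float(row["duration_sec"]) / 3600.0
    return dict(totals)

# Tiny illustrative manifest (fabricated rows, real column semantics only).
manifest = "language\taudio_path\tduration_sec\ttranscript\n"
manifest += "Wolof\tclips/wo_0001.wav\t7200\t...\n"
manifest += "Akan\tclips/ak_0001.wav\t3600\t...\n"
print(hours_per_language(manifest))  # {'Wolof': 2.0, 'Akan': 1.0}
```

The same aggregation, split by dataset (ASR vs. TTS), would reproduce an availability table like the one above.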
Anchoring in the African AI ecosystem
Crucial to the WAXAL project was our commitment to working with, and contributing directly to, the African AI ecosystem. The data collection effort was led entirely by African academic and community organizations, guided by Google experts on world-class data collection practices. This collaborative approach ensured the corpus was built by and for the community it serves: using a shared methodology, each partner focused on a specific subset of languages. Our partners included Makerere University, which collected ASR and/or TTS data for nine different languages, and the University of Ghana, which focused its efforts on eight languages, using the ASR image-prompted data collection methodology outlined above. Additional key collaborators were Digital Umuganda, in partnership with Addis Ababa University, who were instrumental in leading the ASR collection for several regional languages. For the high-quality, studio-recorded voices, Media Trust, Loud n Clear and African Institute for Mathematical Sciences Senegal spearheaded the TTS recordings across various regional languages.
This framework is fundamentally rooted in the principle that our partners retain ownership of the collected data, paired with a shared commitment to make all datasets openly available to the broader community. This deep collaboration and open-access philosophy have already enabled notable derivative research and publications.
One example is the development of a cookbook for community-driven collection of impaired speech. This research resulted in the first open-source dataset for Akan speakers with conditions such as cerebral palsy and stammering, and demonstrated that in-person, image-prompted elicitation is more effective than text-based prompts for these populations. This work provides a vital roadmap for developing inclusive speech technologies in low-resource environments.
Furthermore, the initiative supported a major study that introduced a 5,000-hour speech corpus for five Ghanaian languages — Akan, Ewe, Dagbani, Dagaare, and Ikposo. This work established infrastructure for building robust ASR and TTS systems tailored to the linguistic diversity of West Africa by using a controlled crowdsourcing approach to capture natural, spontaneous intonations.
Other essential research has focused on benchmarking four state-of-the-art models (Whisper, XLS-R, MMS, and W2v-BERT) across 13 African languages. This study analyzed how performance scales with increased training data, offering key insights into data efficiency and highlighting that scaling benefits are strongly dependent on linguistic complexity and domain alignment.
Finally, a systematic literature review was published, cataloging 74 datasets across 111 African languages to map the current frontier of speech technology. This review emphasized the urgent need for multi-domain conversational corpora and the adoption of linguistically informed metrics, such as Character Error Rate (CER), to better evaluate performance in morphologically rich and tonal language contexts.
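CER is often preferred over the more common word error rate (WER) for morphologically rich and agglutinative languages, because a single wrong morpheme counts as a whole-word error under WER but only a few character errors under CER. Both metrics reduce to a normalized Levenshtein edit distance; the sketch below uses the standard definitions and is not code from any of the cited studies.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word Error Rate: token-level edit distance / reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```

For a tonal language where tone is marked orthographically with diacritics, a single missed diacritic in a long word shifts CER only slightly while flipping an entire word under WER, which is why the review above argues for character-level evaluation.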
Conclusion and future directions
WAXAL represents a key milestone in bridging the digital divide, offering a high-quality, open-access speech resource for 27 Sub-Saharan African languages. Developed through deep collaboration with African academic and community organizations, this initiative empowers the continent’s AI ecosystem and preserves linguistic diversity. We hope WAXAL will continue to serve as a vital resource for the digital preservation of African languages and a foundation for future innovations. Google remains committed to this effort, with plans to continuously expand the WAXAL dataset."
Tavonga Siyavora, Senior Product Manager, and Abdoulaye Diack, Program Manager, Google Research
https://research.google/blog/waxal-a-large-scale-open-resource-for-african-language-speech-technology/
#Metaglossia
#metaglossia_mundus
#métaglossie