Exploring Translators’ Perceptions of Tra...

Your new post is loading...

Scooped by Charles Tiayon

May 20, 1:47 AM

Scoop.it!

From www.researchgate.net - May 20, 1:47 AM

Charles Tiayon's insight:

Exploring Translators’ Perceptions of Translation Decisions in Courtroom Translation

"Courtroom translation plays a pivotal role in ensuring justice and fairness in multilingual legal settings. Translators in this context face unique challenges, and their translation choices can significantly impact the understanding of legal rights, and the credibility of testimonies. Previous research has investigated translation decisions but has often ignored important aspects of translators’ perceptions regarding these decisions in the translatorial action. This study presents an analysis of translators’ perceptions on factors influencing translation decisions in the translation of trail records of the International Military Tribunal for the Far East (IMTFE) from English to Chinese. The discussion draws on semi-structured interviews which are thematically coded and qualitatively examined. The study shows that translators prioritized following the translation instructions provided by project initiators, which included ensuring accuracy and fidelity to the source text, maintaining the original meaning and stylistic..."

https://www.researchgate.net/publication/391108718_Exploring_Translators%27_Perceptions_of_Translation_Decisions_in_Courtroom_Translation

#metaglossia_mundus

No comment yet.

As AI giants duel, the Global South builds its own brainpower - Features

Researchers across Africa, Asia and the Middle East are building their own language models designed for local tongues, cultural nuance and digital independence

"In a high-stakes artificial intelligence race between the United States and China, an equally transformative movement is taking shape elsewhere. From Cape Town to Bangalore, from Cairo to Riyadh, researchers, engineers and public institutions are building homegrown AI systems, models that speak not just in local languages, but with regional insight and cultural depth.

The dominant narrative in AI, particularly since the early 2020s, has focused on a handful of US-based companies like OpenAI with GPT, Google with Gemini, Meta’s LLaMa, Anthropic’s Claude. They vie to build ever larger and more capable models. Earlier in 2025, China’s DeepSeek, a Hangzhou-based startup, added a new twist by releasing large language models (LLMs) that rival their American counterparts, with a smaller computational demand. But increasingly, researchers across the Global South are challenging the notion that technological leadership in AI is the exclusive domain of these two superpowers.

Instead, scientists and institutions in countries like India, South Africa, Egypt and Saudi Arabia are rethinking the very premise of generative AI. Their focus is not on scaling up, but on scaling right, building models that work for local users, in their languages, and within their social and economic realities.

“How do we make sure that the entire planet benefits from AI?” asks Benjamin Rosman, a professor at the University of the Witwatersrand and a lead developer of InkubaLM, a generative model trained on five African languages. “I want more and more voices to be in the conversation”.

Beyond English, beyond Silicon Valley

Large language models work by training on massive troves of online text. While the latest versions of GPT, Gemini or LLaMa boast multilingual capabilities, the overwhelming presence of English-language material and Western cultural contexts in these datasets skews their outputs. For speakers of Hindi, Arabic, Swahili, Xhosa and countless other languages, that means AI systems may not only stumble over grammar and syntax, they can also miss the point entirely.

“In Indian languages, large models trained on English data just don’t perform well,” says Janki Nawale, a linguist at AI4Bharat, a lab at the Indian Institute of Technology Madras. “There are cultural nuances, dialectal variations, and even non-standard scripts that make translation and understanding difficult.” Nawale’s team builds supervised datasets and evaluation benchmarks for what specialists call “low resource” languages, those that lack robust digital corpora for machine learning.

It’s not just a question of grammar or vocabulary. “The meaning often lies in the implication,” says Vukosi Marivate, a professor of computer science at the University of Pretoria, in South Africa. “In isiXhosa, the words are one thing but what’s being implied is what really matters.” Marivate co-leads Masakhane NLP, a pan-African collective of AI researchers that recently developed AFROBENCH, a rigorous benchmark for evaluating how well large language models perform on 64 African languages across 15 tasks. The results, published in a preprint in March, revealed major gaps in performance between English and nearly all African languages, especially with open-source models.

Similar concerns arise in the Arabic-speaking world. “If English dominates the training process, the answers will be filtered through a Western lens rather than an Arab one,” says Mekki Habib, a robotics professor at the American University in Cairo. A 2024 preprint from the Tunisian AI firm Clusterlab finds that many multilingual models fail to capture Arabic’s syntactic complexity or cultural frames of reference, particularly in dialect-rich contexts.

Governments step in

For many countries in the Global South, the stakes are geopolitical as well as linguistic. Dependence on Western or Chinese AI infrastructure could mean diminished sovereignty over information, technology, and even national narratives. In response, governments are pouring resources into creating their own models.

Saudi Arabia’s national AI authority, SDAIA, has built ‘ALLaM,’ an Arabic-first model based on Meta’s LLaMa-2, enriched with more than 540 billion Arabic tokens. The United Arab Emirates has backed several initiatives, including ‘Jais,’ an open-source Arabic-English model built by MBZUAI in collaboration with US chipmaker Cerebras Systems and the Abu Dhabi firm Inception. Another UAE-backed project, Noor, focuses on educational and Islamic applications.

In Qatar, researchers at Hamad Bin Khalifa University, and the Qatar Computing Research Institute, have developed the Fanar platform and its LLMs Fanar Star and Fanar Prime. Trained on a trillion tokens of Arabic, English, and code, Fanar’s tokenization approach is specifically engineered to reflect Arabic’s rich morphology and syntax.

India has emerged as a major hub for AI localization. In 2024, the government launched BharatGen, a public-private initiative funded with 235 crore (€26 million) initiative aimed at building foundation models attuned to India’s vast linguistic and cultural diversity. The project is led by the Indian Institute of Technology in Bombay and also involves its sister organizations in Hyderabad, Mandi, Kanpur, Indore, and Madras. The programme’s first product, e-vikrAI, can generate product descriptions and pricing suggestions from images in various Indic languages. Startups like Ola-backed Krutrim and CoRover’s BharatGPT have jumped in, while Google’s Indian lab unveiled MuRIL, a language model trained exclusively on Indian languages. The Indian governments’ AI Mission has received more than180 proposals from local researchers and startups to build national-scale AI infrastructure and large language models, and the Bengaluru-based company, AI Sarvam, has been selected to build India’s first ‘sovereign’ LLM, expected to be fluent in various Indian languages.

In Africa, much of the energy comes from the ground up. Masakhane NLP and Deep Learning Indaba, a pan-African academic movement, have created a decentralized research culture across the continent. One notable offshoot, Johannesburg-based Lelapa AI, launched InkubaLM in September 2024. It’s a ‘small language model’ (SLM) focused on five African languages with broad reach: Swahili, Hausa, Yoruba, isiZulu and isiXhosa.

“With only 0.4 billion parameters, it performs comparably to much larger models,” says Rosman. The model’s compact size and efficiency are designed to meet Africa’s infrastructure constraints while serving real-world applications. Another African model is UlizaLlama, a 7-billion parameter model developed by the Kenyan foundation Jacaranda Health, to support new and expectant mothers with AI-driven support in Swahili, Hausa, Yoruba, Xhosa, and Zulu.

India’s research scene is similarly vibrant. The AI4Bharat laboratory at IIT Madras has just released IndicTrans2, that supports translation across all 22 scheduled Indian languages. Sarvam AI, another startup, released its first LLM last year to support 10 major Indian languages. And KissanAI, co-founded by Pratik Desai, develops generative AI tools to deliver agricultural advice to farmers in their native languages.

The data dilemma

Yet building LLMs for underrepresented languages poses enormous challenges. Chief among them is data scarcity. “Even Hindi datasets are tiny compared to English,” says Tapas Kumar Mishra, a professor at the National Institute of Technology, Rourkela in eastern India. “So, training models from scratch is unlikely to match English-based models in performance.”

Rosman agrees. “The big-data paradigm doesn’t work for African languages. We simply don’t have the volume.” His team is pioneering alternative approaches like the Esethu Framework, a protocol for ethically collecting speech datasets from native speakers and redistributing revenue back to further development of AI tools for under-resourced languages. The project’s pilot used read speech from isiXhosa speakers, complete with metadata, to build voice-based applications.

In Arab nations, similar work is underway. Clusterlab’s 101 Billion Arabic Words Dataset is the largest of its kind, meticulously extracted and cleaned from the web to support Arabic-first model training.

The cost of staying local

But for all the innovation, practical obstacles remain. “The return on investment is low,” says KissanAI’s Desai. “The market for regional language models is big, but those with purchasing power still work in English.” And while Western tech companies attract the best minds globally, including many Indian and African scientists, researchers at home often face limited funding, patchy computing infrastructure, and unclear legal frameworks around data and privacy.

“There’s still a lack of sustainable funding, a shortage of specialists, and insufficient integration with educational or public systems,” warns Habib, the Cairo-based professor. “All of this has to change.”

A different vision for AI

Despite the hurdles, what’s emerging is a distinct vision for AI in the Global South – one that favours practical impact over prestige, and community ownership over corporate secrecy.

“There’s more emphasis here on solving real problems for real people,” says Nawale of AI4Bharat. Rather than chasing benchmark scores, researchers are aiming for relevance: tools for farmers, students, and small business owners.

And openness matters. “Some companies claim to be open-source, but they only release the model weights, not the data,” Marivate says. “With InkubaLM, we release both. We want others to build on what we’ve done, to do it better.”

In a global contest often measured in teraflops and tokens, these efforts may seem modest. But for the billions who speak the world’s less-resourced languages, they represent a future in which AI doesn’t just speak to them, but with them."

Sibusiso Biyela, Amr Rageh and Shakoor Rather

20 May 2025

https://www.natureasia.com/en/nmiddleeast/article/10.1038/nmiddleeast.2025.65

#metaglossia_mundus

From www.natureasia.com - May 20, 2025 10:49 PM

Scoop.it!

Reactions (3)

No comment yet.

AWA, l’IA qui parle votre langue : quand la tech donne une voix aux oubliés du numérique

AWA, l’IA qui parle votre langue : quand la tech donne une voix aux oubliés du numérique

"...Dans un monde où l’intelligence artificielle redéfinit les usages, Alioune Badara Mbengue, jeune entrepreneur et fondateur de la startup Andakia, a choisi de relever un défi audacieux : faire parler les machines dans les langues africaines. En donnant naissance à AWA, une interface vocale intelligente capable de comprendre et de répondre en wolof – et bientôt en pulaar et haoussa –, il ne crée pas seulement une technologie innovante, il initie une révolution culturelle et sociale...

Quelle a été la motivation derrière le projet AWA ?

En 2015, nous avons conçu Mbal-it, une poubelle intelligente s’exprimant en langues nationales pour sensibiliser au tri sélectif. Cette expérience nous a révélé le fossé technologique concernant les interfaces vocales en langues africaines...

Quels sont les défis que vous avez rencontrés au niveau technique et linguistique ?

AWA, c’est un écosystème de plusieurs modules : reconnaissance vocale, LLM, voix de synthèse, etc. Chacun a ses propres défis. Par exemple, les données audio annotées disponibles en wolof sur internet ne dépassent même pas 100h au départ, alors qu’il en faut des milliers, par exemple Whisper, le système de reconnaissance vocale développé par OpenAI, a été entraîné sur un ensemble de données massif de 680 000 heures d’audio multilingue et multi tâche collecté sur le web. Il a donc fallu collecter, nettoyer et annoter des centaines d’heures supplémentaires. Pour le LLM, on a dû construire un corpus textuel solide, et pour la synthèse vocale, enregistrer plusieurs centaines d’heures de qualité en studio.

Au-delà des données, la langue elle-même pose des défis : différence entre wolof parlé et écrit, mots wolofisés, absence de norme claire… On travaille avec des linguistes, et même aujourd’hui certaines formulations font débat. Enfin, il y a la question de la puissance de calcul et de l’ingénierie nécessaire pour faire tourner tous ces modules ensemble...

Concrètement, quels sont les cas d’usage d’AWA et ses bénéfices pour les populations ?

AWA se positionne comme une interface vocale entre les populations non lettrées et la technologie. Ce que l’IA ou le numérique offrent aujourd’hui à ceux qui maîtrisent le français, l’anglais ou le digital, AWA peut le rendre accessible à ceux qui ne savent ni lire ni écrire dans ces langues...

Mais l’enjeu dépasse les services ponctuels : imaginons que l’État digitalise l’accès aux services administratifs (extrait de naissance, demande de papiers, etc.). Sans interface inclusive, des millions de citoyens resteraient exclus car incapables d’utiliser un formulaire en ligne. AWA lève cette barrière...

Notre objectif est clair : couvrir un maximum de locuteurs en Afrique pour que l’IA ne soit pas un privilège réservé aux élites francophones ou anglophones, mais un outil accessible à tous..."

01/07/2025

https://www.socialnetlink.org/2025/07/01/awa-lia-qui-parle-votre-langue-quand-la-tech-donne-une-voix-aux-oublies-du-numerique/

#metaglossia_mundus

From www.socialnetlink.org - July 2, 2025 10:59 PM

Scoop.it!