[2405.09017] A Japanese-Chinese Parallel Corpus...

Scooped by Charles Tiayon

May 18, 12:32 AM

From arxiv.org - May 16, 12:31 AM

Charles Tiayon's insight:

"Using crowdsourcing, we collected more than 10,000 URL pairs (parallel top page pairs) of bilingual websites that contain parallel documents and created a Japanese-Chinese parallel corpus of 4.6M sentence pairs from these websites. We used a Japanese-Chinese bilingual dictionary of 160K word pairs for document and sentence alignment. We then used high-quality 1.2M Japanese-Chinese sentence pairs to train a parallel corpus filter based on statistical language models and word translation probabilities. We compared the translation accuracy of the model trained on these 4.6M sentence pairs with that of the model trained on Japanese-Chinese sentence pairs from CCMatrix (12.4M), a parallel corpus from global web mining. Although our corpus is only one-third the size of CCMatrix, we found that the accuracy of the two models was comparable and confirmed that it is feasible to use crowdsourcing for web mining of parallel data..."

#metaglossia_mundus

Language careers | Department for General Assembly and Conference Management

United Nations language staff come from all over the globe and make up a uniquely diverse and multilingual community. What unites them is the pursuit of excellence in their respective areas, the excitement of being at the forefront of international affairs and the desire to contribute to the realization of the purposes of the United Nations, as outlined in the Charter, by facilitating communication and decision-making.

United Nations language staff in numbers

The United Nations is one of the world's largest employers of language professionals. Several hundred such staff work for the Department for General Assembly and Conference Management in New York, Geneva, Vienna and Nairobi, or at the United Nations regional commissions in Addis Ababa, Bangkok, Beirut, Geneva and Santiago. Learn more at Meet our language staff.

What do we mean by “language professionals”?

At the United Nations, the term “language professional” covers a wide range of specialists, such as interpreters, translators, editors, verbatim reporters, terminologists, reference assistants and copy preparers/proofreaders/production editors. Learn more at Careers.

What do we mean by “main language”?

At the United Nations, “main language” generally refers to the language of an individual's higher education. For linguists outside the Organization, on the other hand, “main language” is usually taken to mean the “target language” into which an individual works.

How are language professionals recruited?

The main recruitment path for United Nations language professionals is through competitive examinations for language positions: successful examinees are placed on rosters for recruitment and are hired as and when job vacancies arise. Language professionals from all regions who meet the eligibility requirements are encouraged to apply. Candidates are judged solely on their academic and other qualifications and on their performance in the examination. Nationality/citizenship is not a consideration. Learn more at Recruitment.

What kind of background do United Nations language professionals need?

Our recruits do not all have a background in languages. Some have a background in other fields, including journalism, law, economics and even engineering or medicine. Such backgrounds are of great benefit to the United Nations, which deals with a large variety of subjects.

Why does the Department have an outreach programme?

Finding the right profile of candidate for United Nations language positions is challenging, especially for certain language combinations. The United Nations is not the only international organization looking for skilled language professionals, and it deals with a wide variety of subjects, often politically sensitive. Its language staff must meet high quality and productivity standards. This is why the Department has had an outreach programme focusing on collaboration with universities since 2007. The Department hopes to build on existing partnerships, forge new partnerships, and attract the qualified staff it needs to continue providing high-quality conference services at the United Nations. Learn more at Outreach.

#metaglossia_mundus

From www.un.org - November 15, 2023 9:45 PM

‘Piffle’ and other words to help you survive the general election

The key drivers

The language barrier

A way around visa requirements

Easy targets

Irene Solaiman on tackling racial bias, language equity, consent, and why there are so many AI girlfriends.

What the AI boom is getting wrong (and right), according to Hugging Face’s head of global policy

I want to start off with the major AI scandal of the month, which is the lawsuit between Scarlett Johansson and OpenAI. It seems like an alarming case for AI policy professionals. Is there anything that surprised you about the case?

You’ve said before that you actually had your voice cloned without your consent, as part of a product demo that backfired. Do you think there’s a bigger problem with how the tech world thinks about consent?

The Gemini case was focused on image, but I imagine it’s even more complicated in text, which is where even more AI development is happening.

There’s also the question of low-resource languages. We’ve seen models really struggle with basic operations in languages like Bengali and Tamil, simply because there isn’t enough online text to train on. How do you think about that kind of bias?

A Multilingual AI Platform

Localized Solutions For Global Impact

Achievements And Future Plans

Numerous runners-up

N.L.-based authors get recognition