Language Tech Market News
The Home of Language Intelligence
Curated by LT-Innovate
Scooped by LT-Innovate

@TAUS Launches Matching Data Service

TAUS has launched Matching Data: a new technique of selecting language data for the training and tuning of machine translation (MT) engines. This new approach is a perfect fit for the new generation of Neural MT, which is much more sensitive to the quality of the training data. Matching Data empowers MT developers as well as Language Service Providers to efficiently compile customized corpora for building their own domain-specific translation solutions based on an example data set.

Crowdsourcing Natural Language/Speech for Business #ML

The company's "crowd" might train effective NLP models on the following:
Accents and Dialects
Situations and Environments
Domain Language
Sentiment
Crowdsourcing may also help businesses looking to build their own AI models label the data on which they intend to train those models.


@Flitto Language Data to Feed #MT Systems 

The Korean online translation service, Flitto, is focused on providing other companies with the language data they need to train their machine translation programs. Headquartered in Seoul, Flitto launched in 2012 as a translation crowdsourcing platform. It still provides translation services, ranging from a mobile app to professional translators, for about 7.5 million users. About 80% of its revenue, however, now comes from the sale of language data, called “corpus,” to customers such as Baidu, Microsoft, Tencent, NTT DoCoMo and the South Korean government’s Electronics and Telecommunications Research Institute.

LT-Innovate's insight:

Relevant, clean language data is crucial for successful MT. In Asia, companies are doing the collection/cleaning/distribution work. In Europe, we have TAUS for business and the EC for public sector data. Is everything ready for de-siloisation now that massive ML is here?

Flitto Exports Its Translation Data to Baidu

Flitto, a South Korean startup, is exporting Chinese and Japanese translation data to Baidu, the biggest search portal in China. The data will be used to enhance Baidu Translate, Baidu's translation service for Chinese. Flitto plans to expand its global partnerships, building on its platform that covers both AI (artificial intelligence) translation and human translation.

LT-Innovate's insight:

The constraints on accessing translation data suggest there should be a new effort to create a proper market for such data. Meanwhile, deals like this will occasionally occur.

1.5 TB Dataset of Anonymized User Interactions Released by Yahoo

The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate. The dataset stands at a massive ~110B lines (1.5TB bzipped) of user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015. In addition to the interaction data, we are providing the demographic information (age segment and gender) and the city in which the user is based for a subset of the anonymized users.

@Flitto Publishes Sales Figures on Language Data for Translation

S. Korea's Flitto has collected language data through the collective intelligence of people online. "We have made efforts to resolve problems related to the collective intelligence method," he said. "We ask people to repeatedly correct errors to find the answers eventually. Our quality control system works satisfactorily."
This has enabled the startup to sharpen its competitiveness and offer high-quality language data at one-tenth the price of other companies, he said.

Translation Company Offers to Partner With Facebook to Combat Hate Speech

Day Translations Inc., a US translation company, has offered to partner with Facebook to help address the challenges of identifying "fake news" and "hate speech" on the social media platform. By the end of 2018, Facebook plans to assemble an international team of 20,000 native speakers of different languages in order to accurately detect hate speech.

LT-Innovate's insight:

How many hate monitors per language? How many languages? Once the data is compiled, how soon before the service could be replaced by a machine?

Lack of Speech Data Delays Samsung's English Version of 'Bixby' #VA

The English version of Samsung Electronics’ voice-assistant service Bixby has been delayed because the firm has not accumulated enough big data. Bixby is now available only in Korean, although Samsung’s mobile chief, Koh Dong-jin, said in April, “Bixby’s English version and Chinese version will be unveiled in May and in June, respectively.” “Developing Bixby in other languages is taking more time than we expected mainly because of the lack of the accumulation of big data,” a Samsung spokesperson told The Korea Herald.

LT-Innovate's insight:

Ironic that English data is the problem, since most people think English is well supplied. The article also mentions general language communication problems affecting Samsung's US launch.

Multilingual Edit-a-thon to Improve Indian Content on Wikipedia

75 volunteer Wikipedia editors representing 18 Indic languages edited and/or translated Wikipedia content related to India's geographical indications (GIs) in their own languages. This multilingual edit-a-thon ran from January 25 to January 31. The event focused on improving existing Wikipedia content related to GIs of India, translating it into Indic languages, and creating new articles.

India to Pool Language Data for Phone Translation

The Department of Electronics & Information Technology will notify a policy to develop local language machine-based translation and other technologies through common pooling of Indian language resources. As many as 100 central government websites will provide local language content through machine translation. 
