language technology and business internationalization
Language technology as a helpful tool for business internationalization
Rescooped by Andoni Sagarna Izagirre from Translation Memory

2018 Machine Translation Market by Key Companies: Trends, Size, Growth to USD 1,483 Million, and Forecasts to 2025

Research Reports Inc. has added a new report on the global machine translation market, which was valued at approximately USD 435 million in 2016 and is anticipated to grow at a healthy rate of more than 14.60% over the forecast period 2017-2025.

Via Sergey Rybkin
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

How Research Should Improve Current #NLP’s Generalization Problem

We should use more inductive biases, but we have to work out what are the most suitable ways to integrate them into neural architectures such that they really lead to expected improvements. We have to enhance pattern-matching state-of-the-art models with some notion of human-like common sense that will enable them to capture the higher-order relationships among facts, entities, events or activities. But mining common sense is challenging, so we need new, creative ways of extracting common sense. Finally, we should deal with unseen distributions and unseen tasks, otherwise “any expressive model with enough data will do the job.” Obviously, training such models is harder and results will not immediately be impressive. As researchers we have to be bold with developing such models, and as reviewers we should not penalize work that tries to do so.
Via LT-Innovate
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

Battling Population, Gender and Accent Bias In Speech Recognition Tech


"We see three different biases that are developing in the voice ecosystem — they are underrepresented demographics. So these are people in areas where the market is not powerful enough for bigger companies to reach. There are also gender biases and accent biases. So for instance, if you are a fluent, native English speaker in, say, Ireland, it's going to be a lot harder to use Siri than it is if you are a North American male," said Michael Henretty, Mozilla's Common Voice project lead.


Via LT-Innovate
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

Amazon Patents a Real-time Speech Accent "Translator"


Amazon has applied for a patent for an audio system that detects the accent of a speaker and changes it to the accent of the listener, perhaps helping eliminate communication barriers in many situations and industries. The patent doesn’t mean the company has made it (or necessarily that it will be granted), but there’s also no technical reason why it can’t do so.


Via LT-Innovate
Scooped by Andoni Sagarna Izagirre

[1807.07187] Efficient Training on Very Large Corpora via Gramian Estimation

Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang, Xinyang Yi, Lichan Hong, Ed Chi, John Anderson (submitted on 18 Jul 2018)

We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with sampling of random observed and unobserved pairs, with a number of samples that grows quadratically with the corpus size, making it expensive to scale to very large corpora. We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty to all pairs of examples and expressing this term as the matrix inner product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance-reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement in training time and generalization quality compared to traditional sampling methods.

Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG). Cite as: arXiv:1807.07187 [stat.ML].
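The core identity behind the abstract (a sum over all pairs rewritten as the inner product of two small Gramians) can be checked numerically. The sizes and random embeddings below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_queries, n_items, d = 100, 200, 8
U = rng.standard_normal((n_queries, d))  # query-side embeddings
V = rng.standard_normal((n_items, d))    # item-side embeddings

# Naive penalty: sum over ALL pairs of <u_i, v_j>^2 -- O(n * m * d) work.
naive = np.sum((U @ V.T) ** 2)

# Gramian form: the same sum equals the matrix inner product of the two
# d x d Gramians, <U^T U, V^T V>_F -- only O((n + m) * d^2) work.
G_u = U.T @ U
G_v = V.T @ V
gramian = np.sum(G_u * G_v)

assert np.isclose(naive, gramian)
```

The quadratic-in-corpus-size cost collapses to linear, which is what lets the method train without sampling unobserved pairs; the paper then maintains stochastic estimates of these Gramians during SGD rather than recomputing them.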
Scooped by Andoni Sagarna Izagirre

Bots News | House of Bots

Get the latest news on bot startups, funding, and bot developers from House of Bots. HOB is one of India's leading bot stores and bot news sites.
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

Baidu Announces ClariNet, an All-in-One #NN for #TTS 


Baidu’s ClariNet consists of four components:
Encoder: encodes textual features into an internal hidden representation.
Decoder: decodes the encoder representation into a log-mel spectrogram in an autoregressive manner.
Bridge-net: an intermediate processing block that processes the hidden representation from the decoder and predicts a log-linear spectrogram; it also upsamples the hidden representation from frame level to sample level.
Vocoder: a Gaussian autoregressive WaveNet that synthesizes the waveform, conditioned on the upsampled hidden representation from the bridge-net.
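As a rough illustration of how data flows through those four blocks, here is a shape-level sketch with random arrays standing in for the actual networks; every dimension (hidden size 64, 80 mel bins, 256x upsampling) is invented for the example and is not a ClariNet hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(0)
chars, frames, hidden, mels, upsample = 20, 50, 64, 80, 256

# 1. Encoder: textual features -> internal hidden representation.
enc = rng.standard_normal((chars, hidden))

# 2. Decoder: attends over the encoder output and emits the log-mel
#    spectrogram frame by frame (random stand-ins here).
log_mel = rng.standard_normal((frames, mels))
dec_hidden = rng.standard_normal((frames, hidden))

# 3. Bridge-net: processes the decoder's hidden representation and
#    upsamples it from frame level to sample level.
bridge = np.repeat(dec_hidden, upsample, axis=0)

# 4. Vocoder: a Gaussian autoregressive WaveNet would condition on the
#    upsampled representation; a linear projection stands in for it.
waveform = np.tanh(bridge @ rng.standard_normal(hidden))

assert bridge.shape == (frames * upsample, hidden)
assert waveform.shape == (frames * upsample,)
```

The point of the sketch is the hand-off of shapes: frame-rate representations from the decoder must be brought up to audio sample rate before the vocoder can condition on them.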


Via LT-Innovate
Scooped by Andoni Sagarna Izagirre

[1802.00400] A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Yanshan Wang, Sijia Liu, Naveed Afzal, Majid Rastegar-Mojarad, Liwei Wang, Feichen Shen, Paul Kingsbury, Hongfang Liu (submitted on 1 Feb 2018; last revised 18 Jul 2018, v3)

Word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications, as they provide vector representations of words that capture their semantic properties and linguistic relationships. Many biomedical applications use different textual resources (e.g., Wikipedia and biomedical articles) to train word embeddings and apply them to downstream biomedical applications. However, there has been little work on evaluating the word embeddings trained from these resources. In this study, we provide an empirical evaluation of word embeddings trained from four different resources: clinical notes, biomedical publications, Wikipedia, and news. We performed the evaluation both qualitatively and quantitatively. For the qualitative evaluation, we manually inspected the five most similar medical words for a given set of target medical words, and then analyzed the word embeddings through visualization. For the quantitative evaluation, we conducted both intrinsic and extrinsic evaluations. Based on the evaluation results, we can draw the following conclusions. First, the word embeddings trained on clinical notes and biomedical publications capture the semantics of medical terms better, find more relevant similar medical terms, and are closer to human experts' judgments than those trained on Wikipedia and news. Second, there is no consistent global ranking of word-embedding quality across downstream biomedical NLP applications. However, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained on biomedical-domain corpora do not necessarily perform better than those trained on general-domain corpora for every downstream biomedical NLP task.

Subjects: Information Retrieval (cs.IR). Cite as: arXiv:1802.00400 [cs.IR].
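The "extra features" finding can be illustrated with a toy example: append a document's mean word vector to its bag-of-words counts before feeding a classifier. The tiny embedding table below is invented for illustration; in the study the vectors come from clinical notes, biomedical publications, Wikipedia, or news:

```python
import numpy as np

# Invented 2-d embedding table; real tables have tens of thousands of
# entries with hundreds of dimensions.
emb = {"heart": np.array([0.9, 0.1]), "attack": np.array([0.8, 0.2]),
       "cardiac": np.array([0.85, 0.15]), "arrest": np.array([0.7, 0.3])}
vocab = sorted(emb)

def features(tokens):
    # Sparse part: bag-of-words counts over the vocabulary...
    bow = np.array([tokens.count(w) for w in vocab], dtype=float)
    # ...dense part: the mean word vector, appended as extra features.
    vecs = [emb[t] for t in tokens if t in emb]
    dense = np.mean(vecs, axis=0) if vecs else np.zeros(2)
    return np.concatenate([bow, dense])

x = features(["heart", "attack"])
assert x.shape == (len(vocab) + 2,)
```

Any downstream classifier then sees both representations at once, which is the pattern the authors report helping on most tasks.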
Scooped by Andoni Sagarna Izagirre

Language Resources Used in Multi-Lingual Question Answering Systems - E-LIS repository

Olvera-Lobo, María-Dolores and Gutierrez-Artacho, Juncal. "Language Resources Used in Multi-Lingual Question Answering Systems." Online Information Review, 2011, vol. 35, no. 4, pp. 543-557.

Purpose - In the field of information retrieval, some multi-lingual tools are being created to help users overcome language barriers. Nevertheless, these tools are not fully developed, and more research is needed to improve and apply them. One of their main problems is the choice of linguistic resources that offer better coverage and solve translation problems in the context of multi-lingual information retrieval. This paper aims to address that issue.

Design/methodology/approach - The research focuses on the resources used by multi-lingual question-answering systems, which respond to users' queries with short answers rather than just a list of documents related to the search. The main publications on multi-lingual QA systems were analysed to identify the typology, the advantages and disadvantages, and the actual use and trend of each linguistic resource and tool used in this new kind of system.

Findings - Five of the resources most used in cross-language QA systems were identified and studied: databases, dictionaries, corpora, ontologies and thesauri. The three most popular traditional resources (automatic translators, dictionaries, and corpora) are gradually leaving a widening gap for others, such as ontologies and the free encyclopaedia Wikipedia.

Originality/value - The perspective offered by the translation discipline can improve the effectiveness of QA systems.

Keywords: cross-lingual question-answering systems, translation, linguistic resources, evaluation. URI: http://hdl.handle.net/10760/32919
Scooped by Andoni Sagarna Izagirre

languagehat.com : Quantitative Methods in Historical Linguistics.

Linguistics generally has seen an increase in the use of corpora and quantitative methods in recent years. Yet journal publications in historical linguistics are less likely to use such methods. Part of the explanation is no doubt the advantage that linguistics of extant languages holds in the greater availability of annotated text corpora and of people who can answer questionnaires or take part in experiments. Yet this can only be part of the explanation. […] It is reasonable to look to cultural explanations for this. After all, the technical barriers keep getting lower and the availability of resources keeps increasing. So what is special about historical linguistics?

For one thing, historical linguistics (at least if we consider the historical-comparative method) has a very long, very stable, and very successful history. The methodological core of the historical-comparative method has proved remarkably stable over time. Furthermore, there is a history of failed attempts at using quantitative methods in historical linguistics. In some cases, such techniques have been tested and simply failed to work, as one would expect in any scientific endeavour. In other cases, the lack of extensive quantitative modelling by historical linguists has enticed scholars from other fields, with experience in statistical models, to step in and fill that gap. These endeavours have met with mixed reactions from mainstream historical linguistics.

What seems to be missing is a positive case for using quantitative methods in historical linguistics, on the premises of historical linguistics. That, in our view, is the only way quantitative techniques can properly cross the chasm into adoption in mainstream historical linguistics. Such a positive case must go well beyond training manuals or statistics classes. Instead, the intellectual footwork of integrating numbers with the core questions that historical linguistics faces must be done.
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

@iFlytek Partners to Build National AI-assisted Translation Platform in China


Chinese AI giant iFlytek plans to work with China International Publishing Group (CIPG) to build a national translation platform based on its AI technologies. iFlytek has been providing real-time Chinese-to-English translation and interpretation services for government organizations and companies both in China and abroad. Its latest machine can translate Chinese into 33 languages.


Via LT-Innovate
Rescooped by Andoni Sagarna Izagirre from Translation Memory

3rd INTERNATIONAL SUMMER SCHOOL IN TRANSLATION TECHNOLOGY

Workshops on translation technologies at KU Leuven...

Via Sergey Rybkin
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

@Mozilla 'Common Voice' Expands Language Datasets


Mozilla launched the first fruits of its Common Voice datasets in English back in November: a collection that contained some 500 hours of speech, comprising 400,000 recordings from 20,000 individuals. Today, Mozilla officially kick-starts the process of collecting voice data for three more languages — French, German, and — a little randomly — Welsh. Another 40 tongues are currently being prepared for data collection, with the likes of Brazilian Portuguese, Chinese (Taiwan), Indonesian, Polish, and Dutch already halfway toward being ready to start crowdsourcing voice data.


Via LT-Innovate
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

Global Language Processing Market Forecasts to 2025


Top Language Processing Manufacturers covered in this report: Addstructure, Apple, Dialogflow, DigitalGenius, Google, IBM, Klevu, Microsoft, Mindmeld, NetBase, Satisfi Labs, Twiggle, Inbenta


Via LT-Innovate
Rescooped by Andoni Sagarna Izagirre from Translation Memory

Global machine translation market analysis to 2024 examined in new market research report - WhaTech

Global Machine Translation Market Growth, Trends, Status and In-depth Analysis 2024 Covering Major Key Players - IBM Corporation, Lionbridge Technologies Inc., Moravia IT, Systran International, Google Inc., Microsoft Corporation etc.

Via Sergey Rybkin
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

Email Text Analytics to Gauge Corporate Morale

In an ideal world, employees would be honest with their bosses, and come clean about all the problems they observe at work. But in the real world, many employees worry that the messenger will be shot; their worst fears stay bottled up. Text analytics might allow firms to gain insights from their employees while intruding only minimally on their privacy. The lesson: Figure out the truth about how the workforce is feeling not by eavesdropping on the substance of what employees say, but by examining how they are saying it.

Via LT-Innovate
Scooped by Andoni Sagarna Izagirre

The Real Problems with Neural Machine Translation

TLDR: No! Your machine translation model is not "prophesying", but let's look at the six major issues with neural machine translation (NMT). So I saw a Twitter thread today with the editor-in-chief of Motherboard tweeting, "Google Translate is popping out bizarre religious texts and no one is sur...
Scooped by Andoni Sagarna Izagirre

[1708.01677] A network approach to topic models

Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann (submitted on 4 Aug 2017; last revised 19 Jul 2018, v2)

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a collection of documents. Despite their success, in particular of the most widely used variant, Latent Dirichlet Allocation (LDA), and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods (using a stochastic block model (SBM) with non-parametric priors) we obtain a more versatile and principled framework for topic modeling, which, for example, automatically detects the number of topics and hierarchically clusters both the words and documents. The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. More importantly, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.

Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph). Journal reference: Science Advances 4, eaaq1360 (2018). DOI: 10.1126/sciadv.aaq1360. Cite as: arXiv:1708.01677 [stat.ML].
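The paper's starting point, representing a corpus as a bipartite document-word network, is easy to sketch in plain Python; the community detection itself (a hierarchical SBM) would then run on this graph and is beyond a few lines. The three toy documents are invented for illustration:

```python
from collections import Counter

docs = {
    "d1": "machine translation neural network".split(),
    "d2": "neural network training corpus".split(),
    "d3": "historical linguistics corpus".split(),
}

# Bipartite network: one node set is documents, the other is word types;
# an edge (d, w) carries multiplicity = occurrences of word w in doc d.
edges = Counter((d, w) for d, words in docs.items() for w in words)

# Words never link to words and documents never link to documents,
# which is exactly what makes the graph bipartite.
doc_nodes = {d for d, _ in edges}
word_nodes = {w for _, w in edges}
assert doc_nodes.isdisjoint(word_nodes)
assert edges[("d1", "neural")] == 1
```

Community detection on such a graph groups documents and words simultaneously, which is how the SBM approach recovers topics and document clusters in one pass.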
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

@ScreenSystems to Launch New Subtitling Tech with @Speechmatics 


Screen Systems (UK) has been working with the Cambridge company Speechmatics to create "the holy grail" of subtitling: software that uses artificial intelligence to deliver what Mr Wales calls "the most accurate speech-to-text engine ever seen." The new product is being launched at a trade show in Amsterdam this September.
But aren't there other tech companies out there with the same mission? Yes, Mr Wales admitted, "but not to the level we are. We've been doing subtitling for over 40 years, so we're the experts. We're not scared of the competition, because we realise how difficult this industry is."


Via LT-Innovate
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

New Grammar Checker for Google Docs


"Our AI can catch several different types of corrections, from simple grammatical rules like how to use articles in a sentence (like “a” versus “an”), to more complicated grammatical concepts such as how to use subordinate clauses correctly. Machine learning will help improve this capability over time to detect trickier grammar issues," Google wrote in a blog post.
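The "a" versus "an" case Google mentions shows why even "simple" rules invite machine learning: a letter-based heuristic takes a few lines, but the real rule depends on the following word's sound. A naive sketch (heuristic and examples are illustrative only, not Google's method):

```python
VOWEL_LETTERS = set("aeiou")

def article_for(word):
    # Naive heuristic: "an" before a vowel *letter*. The correct rule
    # depends on the vowel *sound*: "an hour", "a university" both
    # break the letter-based version, which is where learned models help.
    return "an" if word[0].lower() in VOWEL_LETTERS else "a"

assert article_for("apple") == "an"
assert article_for("sentence") == "a"
assert article_for("hour") == "a"  # wrong by sound -- the rule's limit
```

Subordinate-clause agreement, the harder class of error in the quote, resists this kind of hand-written rule entirely, hence the machine-translation-style approach.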


Via LT-Innovate
Scooped by Andoni Sagarna Izagirre

Nouns are Better than N-Grams

A standard workflow for many varieties of text analysis is to tokenize, then remove stop words from the list of tokens, and then stem the remaining tokens. These tokens are also sometimes used to generate n-grams.
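That standard workflow can be sketched in a few lines of Python; stemming is omitted here (in practice one would apply, e.g., a Porter stemmer after the stop-word step), and the stop-word list is a tiny illustrative stand-in:

```python
import re

STOP_WORDS = {"a", "the", "of", "is", "to", "and"}  # tiny illustrative list

def tokenize(text):
    # Lowercase and keep alphabetic runs as tokens.
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    # Sliding window of n consecutive tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("The nouns of a text carry the topic."))
assert tokens == ["nouns", "text", "carry", "topic"]
assert ngrams(tokens, 2) == [("nouns", "text"), ("text", "carry"),
                             ("carry", "topic")]
```

The article's point is that a part-of-speech filter keeping only nouns can replace much of this pipeline; the sketch shows the baseline it argues against.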
Rescooped by Andoni Sagarna Izagirre from Language Tech Market News

Cogito Uses Spoken Sentiment Analysis to Improve Call-centre Service


Cogito has announced a $37 million Series C investment. The company has raised over $64 million since it emerged from the MIT Human Dynamics Lab back in 2007, aiming to use the artificial intelligence technology available at the time to understand sentiment and apply it in a business context.
While it took some time for the technology to catch up with the vision and find the right use case, company CEO and founder Joshua Feast says they are now helping customer service representatives understand the sentiment and emotional context of the person on the line and giving them behavioral cues on how to proceed.


Via LT-Innovate
Scooped by Andoni Sagarna Izagirre

History of Corpus Linguistics

This is a brief introduction to Corpus Linguistics theory and application. GELC/CNPq. Applied Linguistics and Language Studies Graduate Program at PUCSP, Bra...
Rescooped by Andoni Sagarna Izagirre from Translation Memory

Datafication in the Translation Industry: Will West Meet East?

In this blog post, we tackle all aspects of the need for data in the localization and translation industry and touch upon the different scenes observed in Europe and China.

Via Sergey Rybkin
Scooped by Andoni Sagarna Izagirre

Building an Audio Collection for All the World's Languages - The Rosetta Project

The Rosetta Project is pleased to announce the Parallel Speech Corpus Project, a year-long volunteer-based effort to collect parallel recordings in languages representing at least 95% of the world's speakers. The resulting corpus will include audio recordings in hundreds of languages of the same set of texts, each accompanied by a transcription. This will provide a platform for creating new educational and preservation-oriented tools, as well as technologies that may one day allow artificial systems to comprehend, translate, and generate these languages.

Huge text and speech corpora of varying degrees of structure already exist for many of the most widely spoken languages in the world. English is probably the most extensively documented, followed by other majority languages like Russian, Spanish, and Portuguese. Given some degree of access to these corpora (though many are not publicly accessible), research, education and preservation efforts in the ten languages which represent 50% of the world's speakers (Mandarin, Spanish, English, Hindi, Urdu, Arabic, Bengali, Portuguese, Russian and Japanese) can be relatively well resourced. But what about the other half of the world? The next 290 most widely spoken languages account for another 45% of the population, and the remaining 6,500 or so are spoken by only 5%, this latter group representing the "long tail" of human languages.

Equal documentation of all the world's languages is an enormous challenge, especially in light of the tremendous quantity and diversity represented by the long tail. The Parallel Speech Corpus Project will take a first step toward universal documentation of all human languages, with the goal of documenting the top 300 and providing a model that can then be extended out to the long tail. Eventually, researchers, educators and engineers alike should have access to every living human language, creating new opportunities for expanding knowledge and technology and helping to preserve our threatened diversity.

This project is made possible through the support and sponsorship of speech technology expert James Baker and will be developed in partnership with his ALLOW initiative. We will be putting out a call for volunteers soon. In the meantime, please contact rosetta@longnow.org with questions or suggestions.