There is a beautiful simplicity in statistical machine translation, such as Google Translate. Essentially, the more data you have, the better the probability of a high-quality translation as an end result. But what do you do when you don't have enoug...
Cambridge, Massachusetts - Computer scientists at MIT and Israel’s Technion have discovered an unexpected source of information about the world’s languages: the habits of native speakers of those languages when writing in English.
The work could enable computers chewing through relatively accessible documents to approximate data that might take trained linguists months in the field to collect. But that data could in turn lead to better computational tools.
“These [linguistic] features that our system is learning are of course, on one hand, of nice theoretical interest for linguists,” says Boris Katz, a principal research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory and one of the leaders of the new work. “But on the other, they’re beginning to be used more and more often in applications. Everybody’s very interested in building computational tools for world languages, but in order to build them, you need these features. So we may be able to do much more than just learn linguistic features. … These features could be extremely valuable for creating better parsers, better speech-recognizers, better natural-language translators, and so forth.”
In fact, Katz explains, the researchers’ theoretical discovery resulted from their work on a practical application: About a year ago, Katz proposed to one of his students, Yevgeni Berzak, that he try to write an algorithm that could automatically determine the native language of someone writing in English. The hope was to develop grammar-correcting software that could be tailored to a user’s specific linguistic background.
With help from Katz and from Roi Reichart, an engineering professor at the Technion who was a postdoc at MIT, Berzak built a system that combed through more than 1,000 English-language essays written by native speakers of 14 different languages. First, it analyzed the parts of speech of the words in every sentence of every essay and the relationships between them. Then it looked for patterns in those relationships that correlated with the writers’ native languages.
Like most machine-learning classification algorithms, Berzak’s assigned probabilities to its inferences. It might conclude, for instance, that a particular essay had a 51 percent chance of having been written by a native Russian speaker, a 33 percent chance of having been written by a native Polish speaker, and only a 16 percent chance of having been written by a native Japanese speaker.
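As a rough illustration of the idea described above, and not the researchers' actual system, the sketch below counts adjacent part-of-speech tag pairs in a few invented tag sequences and uses a naive Bayes model to assign a probability to each candidate native language. The tag sequences, the choice of tag bigrams as features, the add-one smoothing, and the function names `bigrams`, `train`, and `predict_proba` are all assumptions made for this example; the article does not say which features or classifier were used.

```python
import math
from collections import Counter

def bigrams(tags):
    """Return adjacent part-of-speech tag pairs for one tag sequence."""
    return list(zip(tags, tags[1:]))

def train(labeled_essays):
    """Count tag bigrams per native language (hypothetical toy training)."""
    counts = {}
    for language, tags in labeled_essays:
        counts.setdefault(language, Counter()).update(bigrams(tags))
    return counts

def predict_proba(counts, tags):
    """Naive Bayes with add-one smoothing and a uniform prior over languages."""
    vocab = {bg for c in counts.values() for bg in c}
    log_scores = {}
    for language, c in counts.items():
        # The extra +1 reserves probability mass for bigrams never seen in training.
        total = sum(c.values()) + len(vocab) + 1
        log_scores[language] = sum(
            math.log((c[bg] + 1) / total) for bg in bigrams(tags)
        )
    # Turn log scores into probabilities that sum to 1 (shifted for stability).
    peak = max(log_scores.values())
    exp = {lang: math.exp(s - peak) for lang, s in log_scores.items()}
    z = sum(exp.values())
    return {lang: v / z for lang, v in exp.items()}

# Invented toy data: part-of-speech tag sequences, not real essays.
training = [
    ("Russian", ["PRON", "VERB", "NOUN", "ADP", "NOUN"]),
    ("Russian", ["NOUN", "VERB", "NOUN"]),
    ("Japanese", ["PRON", "NOUN", "VERB"]),
    ("Japanese", ["NOUN", "ADP", "NOUN", "VERB"]),
]
model = train(training)
probs = predict_proba(model, ["PRON", "VERB", "NOUN"])
for language, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{language}: {p:.2f}")
```

With real data, the same structure would rank all 14 languages for each essay, and the highest-probability language would be the system's guess.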