Bits 'n Pieces on Big Data
Innovative information and insight into Big Data (if you like the content, please consider donating to my bitcoin address #1MhtqfDaAsy4TpYwjS2Kq2DMKrecupbx8c)
Curated by onur savas
Scooped by onur savas

Teaching machines to read between the lines (and a new corpus with entity salience annotations)

Language understanding systems are largely trained on freely available data, such as the Penn Treebank, perhaps the most widely used linguistic resource ever created. Google has previously released a large amount of linguistic data itself, both to contribute to the language understanding community and to encourage further research in these areas.

Now, Google is releasing a new dataset, based on another great resource: the New York Times Annotated Corpus, a set of 1.8 million articles spanning 20 years. Of the articles in the NYTimes Corpus, 600,000 have hand-written summaries, and more than 1.5 million are tagged with the people, places, and organizations mentioned in the article. The Times encourages use of the metadata for all kinds of things, and has set up a forum to discuss related research.

Scooped by onur savas

Google Research Blog: 11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts
