Bits 'n Pieces on Big Data
Innovative information and insight into Big Data (if you like the content, please consider donating to my bitcoin address #3Pjof6N9xRAYXXSPZ4EAFLfHGn51ZdPcxi)
Curated by onur savas
Scooped by onur savas

The ClueWeb12 Dataset

"The ClueWeb12 dataset was created to support research on information retrieval and related human language technologies. The dataset consists of 870,043,929 English web pages, collected between February 10, 2012 and May 10, 2012. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset."

onur savas's insight:

You have to sign a data license agreement with Carnegie Mellon University to obtain the dataset. Uncompressed, it takes up 27.3 TB of space.
