The ClueWeb12 Dataset | Bits 'n Pieces on Big Data | Scoop.it

The ClueWeb12 dataset was created to support research on information retrieval and related human language technologies. The dataset consists of 870,043,929 English web pages, collected between February 10, 2012 and May 10, 2012. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset.