Bits 'n Pieces on Big Data
Innovative information and insight into Big Data (if you like the content, please consider donating to my bitcoin address #3Pjof6N9xRAYXXSPZ4EAFLfHGn51ZdPcxi)
Curated by onur savas
Scooped by onur savas

1001 Datasets and Data Repositories

1001 Datasets and Data Repositories (a list of lists of lists): a rough list, still being compiled.

Bitcoin Transaction Network Dataset

"Bitcoin (bitcoin.org) is a digital, cryptographically secure currency. Transactions between public-key "addresses" maintained in a distributed, verified public ledger form a transaction network that can be studied by network scientists. This code processes binary-format Bitcoin .dat files generated by the Bitcoin client (bitcoin.org, tested on v0.5.3.1 or lower) into human-readable flat-file formats, retaining all available information. Furthermore, we provide a data model to facilitate storage and querying in a relational database."
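
The description mentions a data model for storing and querying the transaction network in a relational database. The Python below is a rough sketch of that idea only. It assumes a hypothetical one-table schema (`tx_edge`, one row per sender-to-receiver transfer, amounts in satoshi), which is not the data model that ships with the code and may differ from it substantially. It loads a few made-up edges into an in-memory SQLite database and runs two typical network queries.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# Hypothetical, simplified schema: one row per transfer from a sending
# address to a receiving address. Not the dataset's actual data model.
SCHEMA = """
CREATE TABLE tx_edge (
    tx_hash   TEXT    NOT NULL,
    from_addr TEXT    NOT NULL,
    to_addr   TEXT    NOT NULL,
    amount    INTEGER NOT NULL  -- value in satoshi
);
"""

Edge = Tuple[str, str, str, int]  # (tx_hash, from_addr, to_addr, amount)


def load_edges(conn: sqlite3.Connection, edges: Iterable[Edge]) -> None:
    """Insert transfer edges into the tx_edge table."""
    conn.executemany("INSERT INTO tx_edge VALUES (?, ?, ?, ?)", edges)


def out_degree(conn: sqlite3.Connection) -> Dict[str, int]:
    """Number of distinct addresses each address has sent coins to."""
    rows = conn.execute(
        "SELECT from_addr, COUNT(DISTINCT to_addr) FROM tx_edge "
        "GROUP BY from_addr ORDER BY from_addr"
    )
    return dict(rows.fetchall())


def total_received(conn: sqlite3.Connection) -> Dict[str, int]:
    """Total amount (in satoshi) received by each address."""
    rows = conn.execute(
        "SELECT to_addr, SUM(amount) FROM tx_edge "
        "GROUP BY to_addr ORDER BY to_addr"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up example edges, for illustration only.
    load_edges(conn, [
        ("t1", "A", "B", 50_000),
        ("t1", "A", "C", 20_000),
        ("t2", "B", "C", 10_000),
        ("t3", "A", "B", 5_000),
    ])
    print(out_degree(conn))      # {'A': 2, 'B': 1}
    print(total_received(conn))  # {'B': 55000, 'C': 30000}
```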

BG Benchmark

BG is a benchmark to evaluate the performance of a data store for interactive social networking actions and sessions. These actions and sessions either read or update a very small amount of the entire data set.

One may use BG to compute either a Social Action Rating (SoAR) or a Socialites rating of a data store. These ratings measure the number of concurrent actions a system can sustain while a fixed percentage of requests (say 95%) observe a latency equal to or lower than a pre-specified threshold (say 100 msec), with the amount of unpredictable data below a fixed threshold (say 0.01%), for some fixed duration of time (say 10 minutes). The values in parentheses are inputs to BG. BG's output is the SoAR and Socialites rating of its target data store.
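
The rating criterion above combines a latency condition and a staleness condition. Below is a minimal Python sketch of that check. It assumes the inputs are available as a list of per-request latencies plus counts of unpredictable reads versus total reads. The `SLA` class and both functions are hypothetical illustrations, not BG's code. The search helper assumes that once a load level fails the SLA, every higher load also fails, and it finds the highest passing load by binary search. That is one plausible way to think about a rating such as SoAR.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SLA:
    """Hypothetical container for the BG inputs described above."""
    min_fraction_fast: float = 0.95    # fraction of requests that must be fast
    max_latency_ms: float = 100.0      # latency threshold in milliseconds
    max_unpredictable: float = 0.0001  # 0.01% unpredictable reads, as a fraction


def meets_sla(latencies_ms: Sequence[float], unpredictable_reads: int,
              total_reads: int, sla: SLA) -> bool:
    """True if one measurement window satisfies both SLA conditions."""
    if not latencies_ms or total_reads <= 0:
        return False
    fast = sum(1 for x in latencies_ms if x <= sla.max_latency_ms)
    if fast / len(latencies_ms) < sla.min_fraction_fast:
        return False
    return unpredictable_reads / total_reads <= sla.max_unpredictable


def highest_passing_load(passes: Callable[[int], bool], max_load: int) -> int:
    """Largest load in [1, max_load] for which passes(load) is True.

    Assumes monotonicity: once a load fails, all higher loads fail.
    Returns 0 if even a load of 1 fails.
    """
    lo, hi = 0, max_load
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if passes(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    sla = SLA()

    # Toy stand-in for a benchmark run: every request takes 2 ms per
    # concurrent thread, and no reads are unpredictable.
    def run_with_threads(threads: int) -> bool:
        latencies = [2.0 * threads] * 1000
        return meets_sla(latencies, 0, 1000, sla)

    print(highest_passing_load(run_with_threads, 1000))  # 50
```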

Twitter Data Grants

With more than 500 million Tweets a day, Twitter has an expansive set of data from which we can glean insights and learn about a variety of topics, from health-related information such as when and where the flu may hit to global events like ringing in the new year. To date, it has been challenging for researchers outside the company who are tackling big questions to collaborate with us to access our public, historical data. The Twitter Data Grants program aims to change that by connecting research institutions and academics with the data they need.

Time-resolved contact data to estimate potential infection routes in hospitals

A research project that aims to uncover fundamental patterns in social dynamics and coordinated human activity through a data-driven approach.
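
Potential infection routes in time-resolved contact data are usually traced along time-respecting paths, meaning chains of contacts whose timestamps increase along the chain. The Python below is a generic sketch of that idea, not the project's method. It rests on two assumptions. First, contacts are given as hypothetical `(timestamp, person_a, person_b)` triples; the actual file format of the hospital data is not described here. Second, a person reached at time t can only pass something on in a contact strictly after t.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (timestamp, person_a, person_b).
Contact = Tuple[int, str, str]


def earliest_arrival(contacts: Iterable[Contact], source: str,
                     t0: int) -> Dict[str, int]:
    """Earliest time each person can be reached from `source` via
    time-respecting paths, starting at time t0.

    Contacts are undirected. Transmission through a contact at time t
    requires the sender to have been reached strictly before t.
    """
    arrival = {source: t0}
    # Scanning contacts in time order means the first time a person is
    # reached is also the earliest possible time.
    for t, a, b in sorted(contacts):
        for sender, receiver in ((a, b), (b, a)):
            if (sender in arrival and arrival[sender] < t
                    and receiver not in arrival):
                arrival[receiver] = t
    return arrival


if __name__ == "__main__":
    contacts = [(60, "C", "D"), (20, "A", "B"), (40, "B", "C"), (40, "C", "D")]
    print(earliest_arrival(contacts, "A", 0))
    # {'A': 0, 'B': 20, 'C': 40, 'D': 60}
```

In this toy example D is reached only at time 60. The C-D contact at time 40 happens at the same moment C is reached, so under the strict rule above it cannot carry the infection onward.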

One Hundred Million Creative Commons Flickr Images for Research

The Flickr Creative Commons dataset, part of Yahoo Webscope’s datasets, is made available to researchers. It is one of the largest public multimedia datasets ever released: 99.3 million images and 0.7 million videos, all from Flickr and all under Creative Commons licensing.

The dataset (about 12 GB) consists of a photo_id, a JPEG or video URL, and corresponding metadata such as the title, description, camera type, and tags. About 49 million of the photos are also geotagged. What is not included, such as comments, favorites, and social network data, can be queried from the Flickr API.
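
Because the dataset is a large flat metadata file, a single streaming pass is a natural first step. The sketch below assumes a hypothetical, simplified tab-separated layout with six columns: photo_id, URL, title, comma-separated tags, latitude, and longitude, with empty coordinates for photos that are not geotagged. The real files carry more fields in their own order, so the column unpacking would need adjusting to the dataset's documentation.

```python
import io
from collections import Counter
from typing import Iterable, Tuple

# Tiny made-up sample in the hypothetical six-column layout.
SAMPLE = (
    "1\thttp://example.com/1.jpg\tSunset\tbeach,sunset\t37.77\t-122.42\n"
    "2\thttp://example.com/2.jpg\tCat\tcat\t\t\n"
    "3\thttp://example.com/3.jpg\tBeach day\tbeach\t40.71\t-74.00\n"
)


def summarize(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Count records, geotagged records, and tag frequencies in one pass."""
    total = geotagged = 0
    tags: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _photo_id, _url, _title, tag_field, lat, lon = line.split("\t")
        total += 1
        if lat and lon:
            geotagged += 1
        tags.update(tag for tag in tag_field.split(",") if tag)
    return total, geotagged, tags


if __name__ == "__main__":
    total, geotagged, tags = summarize(io.StringIO(SAMPLE))
    print(total, geotagged, tags.most_common(1))  # 3 2 [('beach', 2)]
```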

onur savas's insight:

"A back-of-the-envelope estimation reports 10% of all photos in the world were taken in the last 12 months, and that was calculated three years ago."

Yelp Dataset Challenge | Yelp

How well can you guess a review's rating from its text alone? Can you take all of the reviews of a business and predict when it will be busiest, or when the business is open? Can you predict whether a business is good for kids? Has Wi-Fi? Has parking? What makes a review useful, funny, or cool? Can you figure out which business a user is likely to review next? How much of a business's success is really just location, location, location? Which businesses deserve their own subcategory (e.g., Szechuan or Hunan versus just "Chinese restaurants"), and can you learn this from the review text? What makes a tip useful? There are myriad deep machine learning questions to tackle with this rich dataset.
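
The first question, guessing a review's rating from its text alone, invites a quick baseline before any serious modelling. The sketch below is a deliberately naive approach, not a recommended model. It averages the star ratings of training reviews that contain each word, then scores new text by averaging those per-word values. It assumes reviews arrive as JSON lines with `stars` and `text` fields; those field names are an assumption to check against the dataset's documentation.

```python
import json
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TOKEN = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def train(json_lines: Iterable[str]) -> Tuple[Dict[str, float], float]:
    """Average star rating of reviews containing each word, plus the
    overall average rating used as a fallback."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    all_stars: List[float] = []
    for line in json_lines:
        review = json.loads(line)
        stars = float(review["stars"])  # assumed field name
        all_stars.append(stars)
        for word in set(tokenize(review["text"])):  # assumed field name
            sums[word] += stars
            counts[word] += 1
    if not all_stars:
        raise ValueError("no training reviews")
    word_mean = {word: sums[word] / counts[word] for word in sums}
    return word_mean, sum(all_stars) / len(all_stars)


def predict(text: str, word_mean: Dict[str, float], global_mean: float) -> float:
    """Average the per-word ratings of known words; fall back to the mean."""
    scores = [word_mean[w] for w in tokenize(text) if w in word_mean]
    return sum(scores) / len(scores) if scores else global_mean


if __name__ == "__main__":
    reviews = [
        '{"stars": 5, "text": "Great food, great service"}',
        '{"stars": 1, "text": "Terrible service"}',
    ]
    word_mean, global_mean = train(reviews)
    print(predict("terrible service", word_mean, global_mean))  # 2.0
```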

onur savas's insight:

It is targeted at academic research, though. The deadline is Thursday, July 31, 2014.

Yahoo! Webscope

The Yahoo Webscope Program is a reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists. All datasets have been reviewed to conform to Yahoo’s data protection standards, including strict controls on privacy. We offer data in the following categories: Graph and Social Data, Ratings and Classification Data, Advertising and Market Data, Competition Data, Computing Systems Data, Image Data, and Language Data.

onur savas's insight:

It is only available to academics, though.

YouTube Multiview Video Games Dataset

This dataset contains about 120k instances, each described by 13 feature types, with class information; it is especially useful for exploring multiview topics (co-training, ensembles, clustering, etc.).
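
One simple way to use several feature types together is a per-view ensemble. You train one classifier on each view and combine their predictions by majority vote. The sketch below uses a nearest-centroid classifier per view purely for brevity; this is an arbitrary choice, not something the dataset prescribes. The view names and numbers in the example are hypothetical. Co-training or multiview clustering would need different code.

```python
from collections import Counter
from math import dist
from typing import Dict, List, Sequence

Instance = Dict[str, List[float]]  # view name -> feature vector for that view

# Made-up toy data with three hypothetical views.
TOY_X: List[Instance] = [
    {"view_a": [0.0, 0.0], "view_b": [0.0], "view_c": [0.0]},
    {"view_a": [0.2, 0.0], "view_b": [0.2], "view_c": [0.2]},
    {"view_a": [1.0, 1.0], "view_b": [1.0], "view_c": [1.0]},
    {"view_a": [0.8, 1.0], "view_b": [0.8], "view_c": [0.8]},
]
TOY_Y = ["rpg", "rpg", "fps", "fps"]


class NearestCentroid:
    """Predicts the class whose mean feature vector is closest (Euclidean)."""

    def fit(self, X: Sequence[List[float]], y: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts = Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for k, value in enumerate(x):
                acc[k] += value
        self.centroids = {label: [s / counts[label] for s in acc]
                          for label, acc in sums.items()}
        return self

    def predict(self, x: List[float]) -> str:
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def fit_views(instances: Sequence[Instance],
              labels: Sequence[str]) -> Dict[str, NearestCentroid]:
    """Train one classifier per view (every instance must have every view)."""
    views = instances[0].keys()
    return {view: NearestCentroid().fit([inst[view] for inst in instances], labels)
            for view in views}


def predict_vote(models: Dict[str, NearestCentroid], instance: Instance) -> str:
    """Majority vote over the per-view predictions."""
    votes = Counter(model.predict(instance[view]) for view, model in models.items())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    models = fit_views(TOY_X, TOY_Y)
    # view_b alone points to "fps", but the other two views outvote it.
    test = {"view_a": [0.1, 0.1], "view_b": [0.9], "view_c": [0.2]}
    print(predict_vote(models, test))  # rpg
```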

Distributing the Edit History of Wikipedia Infoboxes

onur savas's insight:

The edit history is also available from Wikipedia itself, but this 5.5 GB (gzipped) dataset is more manageable.
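
Even gzipped, 5.5 GB is more than one would usually load into memory at once, so streaming the file line by line is a sensible default. The sketch below assumes, hypothetically, that the dump is gzipped JSON lines with one edit record per line and a field to group on. The field name `attribute` in the example is made up, so check the dataset's documentation for its real layout.

```python
import gzip
import io
import json
from collections import Counter
from typing import BinaryIO, Iterator, Union


def iter_records(source: Union[str, BinaryIO]) -> Iterator[dict]:
    """Yield one JSON record per non-empty line of a gzipped file,
    decompressing on the fly instead of loading the whole file."""
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_field(source: Union[str, BinaryIO], field: str) -> Counter:
    """Count records by the value of `field` (None when it is missing)."""
    return Counter(record.get(field) for record in iter_records(source))


if __name__ == "__main__":
    # In-memory stand-in for the real file; with the actual dump, pass the
    # local file path instead of a BytesIO object.
    data = gzip.compress(
        b'{"attribute": "population"}\n'
        b'{"attribute": "area"}\n'
        b'{"attribute": "population"}\n'
    )
    print(count_by_field(io.BytesIO(data), "attribute"))
    # Counter({'population': 2, 'area': 1})
```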
