Bits 'n Pieces on Big Data R&D
Information and insight into Big Data R&D
Curated by onur savas
Scooped by onur savas

Data Science

Data Science at IQSS combines expertise in software engineering, statistical innovation and data curation.

onur savas's insight:

Various computational social science projects, which will handle big data in the upcoming years.


Researchers hope deep learning algorithms can run on FPGAs and supercomputers

The NSF has funded projects that will investigate how deep learning algorithms run on FPGAs and across systems using the high-performance RDMA interconnect. Another project, led by Andrew Ng and two supercomputing experts, wants to put the models on supercomputers and give them a Python interface.

Hadoop at a Crossroads?

A few facts and opinions and a couple of announcements, with a prediction on where the "Hadoop stack" might be going.

World's largest event dataset now publicly available in BigQuery

onur savas's insight:

Gdelt Project: http://gdeltproject.org/


Experimental evidence of massive-scale emotional contagion through social networks

"We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness. We provide experimental evidence that emotional contagion occurs without direct interaction between people (exposure to a friend expressing an emotion is sufficient), and in the complete absence of nonverbal cues."

One Hundred Million Creative Commons Flickr Images for Research

The Flickr Creative Commons dataset, part of Yahoo Webscope’s datasets, is available to researchers. The dataset is one of the largest public multimedia datasets ever released: 99.3 million images and 0.7 million videos, all from Flickr and all under Creative Commons licensing.


The dataset (about 12GB) consists of a photo_id, a JPEG URL or video URL, and some corresponding metadata such as the title, description, camera type, and tags. In addition, about 49 million of the photos are geotagged. What’s not there, such as comments, favorites, and social network data, can be queried from the Flickr API.

onur savas's insight:

"A back-of-the-envelope estimation reports 10% of all photos in the world were taken in the last 12 months, and that was calculated three years ago."



From mobile phone data to the spatial structure of cities : Scientific Reports : Nature Publishing Group

Pervasive infrastructures, such as cell phone networks, enable to capture large amounts of human behavioral data but also provide information about the structure of cities and their dynamical properties. In this article, we focus on these last aspects by studying phone data recorded during 55 days in 31 Spanish cities. We first define an urban dilatation index which measures how the average distance between individuals evolves during the day, allowing us to highlight different types of city structure. We then focus on hotspots, the most crowded places in the city. We propose a parameter free method to detect them and to test the robustness of our results. The number of these hotspots scales sublinearly with the population size, a result in agreement with previous theoretical arguments and measures on employment datasets. We study the lifetime of these hotspots and show in particular that the hierarchy of permanent ones, which constitute the ‘heart’ of the city, is very stable whatever the size of the city. The spatial structure of these hotspots is also of interest and allows us to distinguish different categories of cities, from monocentric and “segregated” where the spatial distribution is very dependent on land use, to polycentric where the spatial mixing between land uses is much more important. These results point towards the possibility of a new, quantitative classification of cities using high resolution spatio-temporal data.
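
To make the idea of an average distance between individuals that evolves during the day more concrete, here is a minimal Python sketch. It is not the paper's exact definition: the function names, the toy coordinates, and the choice to summarize the day as the ratio of the largest to the smallest hourly mean distance are all assumptions made for illustration.

```python
from itertools import combinations
from math import hypot


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of (x, y) points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    total = sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in pairs)
    return total / len(pairs)


def distance_profile(positions_by_hour):
    """Map each hour to the mean pairwise distance between individuals seen then."""
    return {
        hour: mean_pairwise_distance(points)
        for hour, points in sorted(positions_by_hour.items())
    }


def dilatation_ratio(profile):
    """Largest hourly mean distance divided by the smallest (assumed summary)."""
    values = [v for v in profile.values() if v > 0]
    return max(values) / min(values)


# Hypothetical toy data: three individuals, observed at 03:00 and at 12:00.
positions_by_hour = {
    3: [(0.0, 0.0), (6.0, 8.0), (0.0, 8.0)],   # night: people spread out
    12: [(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)],  # midday: people concentrated
}

profile = distance_profile(positions_by_hour)
print(profile)
print(dilatation_ratio(profile))
```

On real phone data the positions would be cell-tower locations weighted by user counts, and the all-pairs loop would need to be replaced by something cheaper; the sketch only shows the shape of the computation.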

StreetScore

"This is a collection of map visualizations of perceived safety of street views from cities in the United States. We will be releasing a map of perceived safety for a new city each week. These maps are based on StreetScore — a machine learning algorithm designed to predict how safe a street view looks to a human observer (see FAQ). The StreetScore algorithm was created by Nikhil Naik as part of a collaboration between the Macro Connections group and the Camera Culture group at MIT Media Lab. Jade Philipoom created the visualizations presented in the StreetScore website. "

onur savas's insight:

As of today (6/4/14), NYC, Boston, Chicago and Detroit are available.


Finding the Zebra in a Herd of Ponies- A new look at anomaly detection

The second publication in the O’Reilly Practical Machine Learning series, subtitled A New Look at Anomaly Detection by Ted Dunning and me, is being released this week. In the previous book, which focused on practical approaches to recommendation, we started with the idea that everyone thinks “I want a pony”. Here in the second book, what we want is to find the outlier, the zebra in a herd of ponies, the fish swimming against the school of fish, the rare event.

CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises

Locating timely and useful information during crises is critical for making potentially life-saving decisions. As the use of Twitter to broadcast useful information during such situations becomes more widespread, the problem of locating it becomes more difficult. CrisisLex is a lexicon of terms that frequently appear in crisis-relevant tweets. CrisisLex can be used to collect crisis-related messages from Twitter, and to automatically identify new terms that describe a specific crisis.
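
As a rough illustration of how a lexicon can be used to collect crisis-related messages, the Python sketch below keeps only the messages that contain at least one lexicon term. The terms and messages are made up for the example and are not taken from CrisisLex, and the whole-word, case-insensitive matching rule is an assumption rather than the project's documented method.

```python
import re


def build_matcher(lexicon):
    """Compile one case-insensitive pattern that matches any term as whole words."""
    if not lexicon:
        raise ValueError("lexicon must contain at least one term")
    # Longer terms first, so multi-word terms are tried before their prefixes.
    alternatives = sorted((re.escape(term) for term in lexicon), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def filter_messages(messages, lexicon):
    """Return the messages that mention at least one lexicon term."""
    matcher = build_matcher(lexicon)
    return [message for message in messages if matcher.search(message)]


# Hypothetical lexicon and messages, for illustration only.
toy_lexicon = ["flood", "evacuation", "power outage"]
messages = [
    "Flood warning issued for the river district",
    "Great coffee this morning",
    "Power outage reported downtown, crews on site",
    "Flooded with emails today",
]

relevant = filter_messages(messages, toy_lexicon)
for message in relevant:
    print(message)
```

Whole-word matching keeps "Flooded with emails today" out of the results, at the cost of also missing genuine inflected forms; a real pipeline would have to decide how to handle stemming and hashtags.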


RecSys 2014 (Silicon Valley) - RecSys

Foster City, Silicon Valley, USA, 6th-10th October 2014

 

The ACM Recommender System conference (RecSys) is the premier international forum for the presentation of new research results, systems and techniques in the broad field of recommender systems.

onur savas's insight:

Also of interest is the First Workshop on Recommendation Systems for Television and online Video (RecSysTV), which is being held in conjunction with this conference: http://boxfish.com/recsys


GraphLab | GraphLab Conference 2014

Monday, July 21, 2014 from 8:00 AM to 7:00 PM (PDT) at Hotel Nikko San Francisco 


Data Mining Reddit Posts Reveals How to Ask For a Favor--And Get it | MIT Technology Review

There’s a secret to asking strangers for something and getting it. Now data scientists say they’ve discovered it by studying successful requests on the web.
onur savas's insight:

The paper: http://arxiv.org/abs/1405.3282


The Emerging Pitfalls Of Nowcasting With Big Data | MIT Technology Review

Statisticians have boasted of the benefits of big data. Now they’re discovering the weaknesses.
onur savas's insight:

Ref: arxiv.org/abs/1408.0699 : Nowcasting Economic And Social Data: When And Why Search Engine Data Fails, An Illustration Using Google Flu Trends


For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

Technology revolutions come in measured, sometimes foot-dragging steps. The lab science and marketing enthusiasm tend to underestimate the bottlenecks to progress that must be overcome with hard work and practical engineering.


A Chrome app to let data scientists work together

By downloading the app for the Chrome browser, you instantly get the IPython open-source software for interactive computing, as well as multiple Python libraries. After that, multiple people can explore and process data in a browser tab in a way that’s integrated with Google Drive.


Twitter: Big Data Opportunities | Science


Google Replaces MapReduce With New Hyper-Scale Cloud Analytics System

Google says its old distributed computing system does not handle petabyte-scale analytics well enough.

KDD Workshop on Learning Emergencies from Social Information 2014

Mobile phone data and the content generated by hundreds of millions of users on social media such as Twitter or Facebook present continuous data streams of human social activities, and offer an unprecedented opportunity to mine and understand the structure and dynamics of social and information behavior in various situations. In this workshop we will call attention to researching situations following large-scale emergencies, including natural disasters, terrorist attacks, and so on. These emergency events are now among the largest threats to national security. Over the last decade, natural disasters have affected more than 2.4 billion people. There is an indisputably increasing need for new tools to strengthen disaster resilience at all levels of society.

 How can we deal with data collected from heterogeneous and potentially biased sources? How can we properly understand social dynamics during emergencies? How can we turn such understanding into tools for decision makers? To better prepare for future emergencies, it is valuable to deeply understand the context within which the research can be applied. 

Data Mining Reveals the Factors Driving the Price of Bitcoins | MIT Technology Review

Two years ago a single bitcoin was worth around $5. Today it is worth around $600. Now one economist has worked out exactly what forces are behind this dramatic increase.
onur savas's insight:

Ref: arxiv.org/abs/1406.0268 : What Are the Main Drivers of the Bitcoin Price? Evidence from Wavelet Coherence Analysis


Open data saves New York City drivers from parking tickets

Here’s a great example of how making government data open can directly benefit you.


Bitcoin Transaction Network Dataset

"Bitcoin (bitcoin.org) is a digital, cryptographically secure currency. Transactions between public-key "addresses" maintained in a distributed, verified public ledger form a transaction network that can be studied by network scientists. This code processes binary-format Bitcoin .dat files generated by the Bitcoin client (bitcoin.org, tested on v0.5.3.1 or lower) into human-readable flat-file formats, retaining all available information. Furthermore, we provide a data model to facilitate storage and querying in a relational database."


Cloud computing beckons scientists

Price and flexibility appeal as data sets grow.

RMOA: Massive online data stream classifications with R & MOA

For those of you who don't know MOA: it stands for Massive On-line Analysis and is an open-source framework for building and running machine learning or data mining experiments on evolving data streams. The MOA website (http://moa.cms.waikato.ac.nz) indicates that it contains machine learning algorithms for classification, regression, clustering, outlier detection and recommendation engines.

onur savas's insight:
MOA is recommended especially for R users who work with a lot of data or run into RAM issues when building models on large datasets, since it uses a limited amount of memory.

The Secret Science of Retweets | MIT Technology Review

There’s a secret to persuading strangers to retweet your messages. And a machine learning algorithm has discovered it.
onur savas's insight:

The paper is at http://arxiv.org/abs/1405.3750
