Rescooped by onur savas from Big Data and NoSQL Daily
onto Bits 'n Pieces on Big Data R&D

Apache Spark for Big Analytics


Via Simon Hunanyan
Simon Hunanyan's curator insight, December 23, 2013 10:09 PM

Spark, an Apache incubator project, is an open source distributed computing framework for advanced analytics in Hadoop. It can run up to 100X faster than MapReduce. Spark includes a machine learning library (MLlib), a graph engine (GraphX), a streaming analytics engine (Spark Streaming), and more.

Currently, Spark supports programming interfaces for Scala, Java and Python. An R interface is under development and is expected to be released in the first half of 2014.

Bits 'n Pieces on Big Data R&D
Information and insight into Big Data R&D
Curated by onur savas
Scooped by onur savas

The Ultimate Challenge For Recommendation Engines | MIT Technology Review

If you share an online movie account with other people in your household, you probably receive some inappropriate recommendations. That may soon change.
Scooped by onur savas

Large-Scale High-Precision Topic Modeling on Twitter (KDD 2014)

Scooped by onur savas

How This Algorithm Detected The Ebola Outbreak Before Humans Could

By mining the social web, machines saw the outbreak coming a week before the world knew about it.
onur savas's insight:

... and see the global outbreaks on HealthMap: http://healthmap.org/en/

Scooped by onur savas

A (very) brief review of published human subjects research conducted with social media companies | Simply Statistics


More and more human subjects research is being performed by large tech companies. This is a short list of published research.

Scooped by onur savas

Fighting spam with BotMaker | Twitter Blogs

Spam on Twitter is different from traditional spam primarily because of two aspects of our platform: Twitter exposes developer APIs to make it easy to interact with the platform and real-time conte...
Scooped by onur savas

Data Science at IQSS, Harvard


Data Science at IQSS combines expertise in software engineering, statistical innovation and data curation.

onur savas's insight:

Various computational social science projects that will handle big data in the coming years.

Scooped by onur savas

Researchers hope deep learning algorithms can run on FPGAs and supercomputers

The NSF has funded projects that will investigate how deep learning algorithms run on FPGAs and across systems using the high-performance RDMA interconnect. Another project, led by Andrew Ng and two supercomputing experts, wants to put the models on supercomputers and give them a Python interface.
Scooped by onur savas

Hadoop at a Crossroads?

A few facts and opinions and a couple of announcements, with a prediction on where the "Hadoop stack" might be going.
Scooped by onur savas

World's largest event dataset now publicly available in BigQuery

onur savas's insight:

GDELT Project: http://gdeltproject.org/

Scooped by onur savas

Experimental evidence of massive-scale emotional contagion through social networks


"We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness. We provide experimental evidence that emotional contagion occurs without direct interaction between people (exposure to a friend expressing an emotion is sufficient), and in the complete absence of nonverbal cues."
Scooped by onur savas

One Hundred Million Creative Commons Flickr Images for Research


The Flickr Creative Commons dataset, part of Yahoo Webscope’s datasets, is available to researchers. It is one of the largest public multimedia datasets ever released: 99.3 million images and 0.7 million videos, all from Flickr and all under Creative Commons licensing.


The dataset (about 12GB) consists of a photo_id, a JPEG URL or video URL, and some corresponding metadata such as the title, description, camera type, and tags. In addition, about 49 million of the photos are geotagged! What’s not there, like comments, favorites, and social network data, can be queried from the Flickr API.

onur savas's insight:

"A back of the envelope estimation reports 10% of all photos in the world were taken in the last 12 months, and that was calculated three years ago. "


Scooped by onur savas

From mobile phone data to the spatial structure of cities : Scientific Reports : Nature Publishing Group

Pervasive infrastructures, such as cell phone networks, enable the capture of large amounts of human behavioral data but also provide information about the structure of cities and their dynamical properties. In this article, we focus on these latter aspects by studying phone data recorded during 55 days in 31 Spanish cities. We first define an urban dilatation index which measures how the average distance between individuals evolves during the day, allowing us to highlight different types of city structure. We then focus on hotspots, the most crowded places in the city. We propose a parameter-free method to detect them and to test the robustness of our results. The number of these hotspots scales sublinearly with the population size, a result in agreement with previous theoretical arguments and measures on employment datasets. We study the lifetime of these hotspots and show in particular that the hierarchy of permanent ones, which constitute the 'heart' of the city, is very stable whatever the size of the city. The spatial structure of these hotspots is also of interest and allows us to distinguish different categories of cities, from monocentric and “segregated”, where the spatial distribution is very dependent on land use, to polycentric, where the spatial mixing between land uses is much more important. These results point towards the possibility of a new, quantitative classification of cities using high resolution spatio-temporal data.
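The two quantitative ideas in this abstract, a dilatation index built from the average distance between individuals over the day and a sublinear scaling of hotspot counts with population, can be illustrated with a minimal Python sketch. It assumes each person's position in a given hour is available as planar (x, y) coordinates and normalizes each hour's average pairwise distance by the daily minimum; that normalization, the function names, and the toy data are illustrative assumptions, not the paper's exact definitions. The scaling exponent is estimated by an ordinary least-squares fit on log-log data, where a slope below 1 indicates sublinear growth.

```python
import math
from itertools import combinations


def average_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of (x, y) positions."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def dilatation_series(positions_by_hour):
    """Each hour's average pairwise distance divided by the daily minimum.

    Assumption: normalizing by the daily minimum is an illustrative choice,
    not necessarily the definition used in the paper.
    """
    averages = {hour: average_pairwise_distance(pts)
                for hour, pts in positions_by_hour.items()}
    baseline = min(averages.values())
    if baseline == 0:
        raise ValueError("all individuals share one position in some hour")
    return {hour: avg / baseline for hour, avg in averages.items()}


def fit_scaling_exponent(populations, hotspot_counts):
    """Least-squares slope of log(hotspots) against log(population).

    A slope below 1 indicates that hotspots grow sublinearly with population.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(n) for n in hotspot_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Hypothetical toy data: positions in km at 3 a.m. (spread out) and noon (clustered).
    hours = {3: [(0, 0), (4, 0), (0, 3)], 12: [(0, 0), (1, 0), (0, 1)]}
    print(dilatation_series(hours))  # hour 3 is about 3.51x the noon value
    # Hypothetical cities where hotspot counts grow as population ** 0.5.
    print(fit_scaling_exponent([100, 10_000, 1_000_000], [10, 100, 1000]))  # about 0.5
```

Dividing by the daily minimum makes the index dimensionless, so cities of different physical size can be compared on the same scale; a real analysis would work from geolocated phone records and the paper's own hotspot detection method rather than hand-made coordinates.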
Scooped by onur savas

StreetScore


"This is a collection of map visualizations of perceived safety of street views from cities in the United States. We will be releasing a map of perceived safety for a new city each week. These maps are based on StreetScore — a machine learning algorithm designed to predict how safe a street view looks to a human observer (see FAQ). The StreetScore algorithm was created by Nikhil Naik as part of a collaboration between the Macro Connections group and the Camera Culture group at MIT Media Lab. Jade Philipoom created the visualizations presented in the StreetScore website. "

onur savas's insight:

As of today (6/4/14), NYC, Boston, Chicago and Detroit are available.

Scooped by onur savas

Robots With Their Heads in the Clouds

The five elements of cloud robotics
Scooped by onur savas

Chikungunya threat inspires new DARPA challenge

Defense Department announces prize for infectious disease forecasting model
onur savas's insight:

The DARPA Challenge: https://www.innocentive.com/ar/challenge/9933617?cc=DARPApress&utm_source=DARPA&utm_campaign=9933617&utm_medium=press

 

Scooped by onur savas

Teaching machines to read between the lines (and a new corpus with entity salience annotations)


Language understanding systems are largely trained on freely available data, such as the Penn Treebank, perhaps the most widely used linguistic resource ever created. Google has previously released lots of linguistic data itself, to contribute to the language understanding community as well as to encourage further research into these areas.

Now, Google is releasing a new dataset, based on another great resource: the New York Times Annotated Corpus, a set of 1.8 million articles spanning 20 years. Of these, 600,000 articles have hand-written summaries, and more than 1.5 million are tagged with the people, places, and organizations mentioned in the article. The Times encourages use of the metadata for all kinds of things, and has set up a forum to discuss related research.

Scooped by onur savas

Facebook uses ethnography to deliver more relevant ads



“As researchers focusing on Facebook’s advertising, we led research trips with a cross-functional team of product managers, marketers, and engineers to Indonesia, Turkey, and South Africa to develop a solid understanding of cultural differences across these countries. [...] Forming a richer understanding of how businesses and people connect with each other, both on and off of Facebook, around the world will help us develop better ad solutions that drive a positive feedback cycle: we will make better experiences for the people who use Facebook and for the businesses and brands who want to connect with their core customers and prospects.”

Scooped by onur savas

Enabling a new future for cloud computing - US National Science Foundation (NSF)


The National Science Foundation (NSF) today announced two $10 million projects to create cloud computing testbeds--to be called "Chameleon" and "CloudLab"--that will enable the academic research community to develop and experiment with novel cloud architectures and pursue new, architecturally-enabled applications of cloud computing.

 

onur savas's insight:

Chameleon: http://www.chameleoncloud.org/

 

CloudLab: http://www.cloudlab.us/

 

Scooped by onur savas

The Emerging Pitfalls Of Nowcasting With Big Data | MIT Technology Review

Statisticians have boasted of the benefits of big data. Now they’re discovering the weaknesses.
onur savas's insight:

Ref: arxiv.org/abs/1408.0699 : Nowcasting Economic And Social Data: When And Why Search Engine Data Fails, An Illustration Using Google Flu Trends

Scooped by onur savas

For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights


Technology revolutions come in measured, sometimes foot-dragging steps. The lab science and marketing enthusiasm tend to underestimate the bottlenecks to progress that must be overcome with hard work and practical engineering.

Scooped by onur savas

A Chrome app to let data scientists work together


By downloading the app for the Chrome browser, you instantly get the IPython open-source software for interactive computing, as well as multiple Python libraries. After that, multiple people can explore and process data in a browser tab in a way that’s integrated with Google Drive.

Scooped by onur savas

Twitter: Big Data Opportunities | Science

Scooped by onur savas

Google Replaces MapReduce With New Hyper-Scale Cloud Analytics System

Google says its old distributed computing system does not handle petabyte-scale analytics well enough.
Scooped by onur savas

KDD Workshop on Learning Emergencies from Social Information 2014


Mobile phone data and the content generated by hundreds of millions of users on social media such as Twitter or Facebook present continuous data streams of human social activities and offer an unprecedented opportunity to mine and understand the structure and dynamics of social and information behavior in various situations. In this workshop we will call attention to research on situations following large-scale emergencies, including natural disasters, terrorist attacks, and so on. These emergency events are now among the largest threats to national security. Over the last decade, natural disasters have affected more than 2.4 billion people. There is an indisputably increasing need for new tools to strengthen disaster resilience at all levels of society.

 How can we deal with data collected from heterogeneous and potentially biased sources? How can we properly understand social dynamics during emergencies? How can we turn such understanding into tools for decision makers? To better prepare for future emergencies, it is valuable to deeply understand the context within which the research can be applied. 
Scooped by onur savas

Data Mining Reveals the Factors Driving the Price of Bitcoins | MIT Technology Review

Two years ago a single bitcoin was worth around $5. Today it is worth around $600. Now one economist has worked out exactly what forces are behind this dramatic increase.
onur savas's insight:

Ref: arxiv.org/abs/1406.0268 : What Are the Main Drivers of the Bitcoin Price? Evidence from Wavelet Coherence Analysis
