Scooped by onur savas
onto Bits 'n Pieces on Big Data R&D

Twitter open sources Storm-Hadoop hybrid called Summingbird

Twitter has open sourced a “streaming MapReduce” system called Summingbird that makes Hadoop and Storm play nicer together so applications that require both batch and stream processing can do their jobs with as little complexity as possible.
webDEVILopers's curator insight, September 4, 2013 8:56 AM

In the case of Twitter, Hadoop handles batch processing, Storm handles stream processing, and the hybrid system is called Summingbird. It’s not a tool for every job, but it sounds pretty handy for those it’s designed to address. Hybrid systems like this are actually becoming more common as companies realize they can’t survive in a real-time world with Hadoop alone.
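The hybrid idea described above can be illustrated with a minimal sketch: the same associative aggregation runs over a batch of historical events and over a stream of recent ones, and the two partial results are merged at query time. This is plain Python, not Summingbird's API; the function names and sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable


def count_words(events: Iterable[str]) -> Counter:
    """Aggregate word counts over a collection of text events."""
    counts: Counter = Counter()
    for text in events:
        counts.update(text.lower().split())
    return counts


def merge(batch_view: Counter, realtime_view: Counter) -> Counter:
    """Combine partial results; Counter addition is associative,
    so it does not matter how the events were split up."""
    return batch_view + realtime_view


# Hypothetical data: older events handled in batch, newer ones as a stream.
historical = ["storm handles streams", "hadoop handles batch"]
recent = ["storm and hadoop together"]

batch_view = count_words(historical)
realtime_view = count_words(recent)
combined = merge(batch_view, realtime_view)

print(combined["storm"], combined["hadoop"], combined["handles"])
```

Because the aggregation is associative, the merged view equals what a single pass over all events would produce, which is what lets one piece of logic serve both the batch and the streaming side.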

Bits 'n Pieces on Big Data R&D
Information and insight into Big Data R&D
Curated by onur savas

Where The Cloud Meets The Grid

Companies build or rent grid machines when data length doesn't fit into HDFS, or the latency of parallel interconnects is too slow in the cloud. This review ex…

How the Department of Defense is using big data to combat sex trafficking

The Defense Advanced Research Projects Agency (DARPA) shows its humanitarian side.

LabHack

LabHack is the first Air Force hackathon aimed at building solutions to real-world challenges that AFRL researchers encounter every day.

The LinkedIn Economic Graph Challenge

How would you use data from LinkedIn to solve the world's economic challenges and create economic opportunities for people? Submit your proposals to the LinkedIn Economic Graph Challenge.

Mirador

Mirador is a tool for visual exploration of complex datasets. It enables users to discover correlation patterns and derive new hypotheses from the data.

Assessing the international spreading risk associated with the 2014 West African Ebola outbreak

The 2014 West African Ebola Outbreak is so far the largest and deadliest recorded in history. The affected countries, Sierra Leone, Guinea, Liberia, Nigeria, and recently Senegal have been struggling...

Course: Tackling the Challenges of Big Data (November 4, 2014 - December 16, 2014)

Survey state-of-the-art topics in Big Data: data collection, data storage and processing, extracting structured data from unstructured data, systems issues, analytics and visualization.

Privacy, Anonymity, and Big Data in the Social Sciences

Quality social science research and the privacy of human subjects require trust.

On Facebook when the earth shakes...

On Sunday, August 24th, at 3:20 a.m. Pacific time, an earthquake of magnitude 6.0 occurred in the Bay Area, 3.7 miles (6.0 km) northwest of American Canyon near the West Napa Fault. It was the largest earthquake in the Bay Area since the 1989 Loma Prieta earthquake.

During a crisis, people turn to Facebook (FB) to stay connected to their friends and family. They use it to receive social support and keep the people they care about informed on how they are doing.

Data scientists at FB used aggregate, anonymized Facebook activity data from cities located within 300 kilometers (around 200 miles) of the epicenter and analyzed how Facebook activity in those cities following the earthquake differed from usual activity. A recent post from Jawbone (https://jawbone.com/blog/napa-earthquake-effect-on-sleep/) showed the effect of the earthquake on people's sleep.

Chikungunya threat inspires new DARPA challenge

Defense Department announces prize for infectious disease forecasting model
onur savas's insight:

The DARPA Challenge: https://www.innocentive.com/ar/challenge/9933617?cc=DARPApress&utm_source=DARPA&utm_campaign=9933617&utm_medium=press

Teaching machines to read between the lines (and a new corpus with entity salience annotations)

Language understanding systems are largely trained on freely available data, such as the Penn Treebank, perhaps the most widely used linguistic resource ever created. Google has previously released a good deal of linguistic data of its own, both to contribute to the language understanding community and to encourage further research in these areas.

Now, Google is releasing a new dataset, based on another great resource: the New York Times Annotated Corpus, a set of 1.8 million articles spanning 20 years. 600,000 articles in the NYTimes Corpus have hand-written summaries, and more than 1.5 million of them are tagged with people, places, and organizations mentioned in the article. The Times encourages use of the metadata for all kinds of things, and has set up a forum to discuss related research.

Facebook uses ethnography to deliver more relevant ads

“As researchers focusing on Facebook’s advertising, we led research trips with a cross-functional team of product managers, marketers, and engineers to Indonesia, Turkey, and South Africa to develop a solid understanding of cultural differences across these countries. [...] Forming a richer understanding of how businesses and people connect with each other—both on and off of Facebook—around the world will help us develop better ad solutions that drive a positive feedback cycle: we will make better experiences for the people who use Facebook and for the businesses and brands who want to connect with their core customers and prospects.”

Facebook Gives Lessons In Network-Datacenter Design

Every now and again, the hyperscale datacenter operators tell the world about some new tricks they have come up with to solve a particular set of pesky pro

Studying the Big Data Forest While Ignoring Its Trees

Google recently announced a promising new tool that might help researchers benefit from big data analysis while protecting individuals’ privacy. This concept might sound counterintuitive at first, but it’s based on a solid mathematical foundation called “differential privacy.”

onur savas's insight:

The project is called RAPPOR: https://github.com/google/rappor
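
To make the "forest, not trees" idea concrete, here is a minimal sketch of classic randomized response, the survey technique that RAPPOR's name refers to. It is not RAPPOR's actual encoding or its API; the function names, the probability `p_truth`, and the simulated population are assumptions made for this illustration.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list, p_truth: float) -> float:
    """Invert the noise: observed = p_truth * true_rate + (1 - p_truth) / 2."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth


rng = random.Random(0)
true_rate = 0.30  # hypothetical share of users with some sensitive attribute
population = [rng.random() < true_rate for _ in range(100_000)]

# Each individual report is noisy, so no single answer can be trusted...
reports = [randomized_response(x, 0.5, rng) for x in population]

# ...but the aggregate rate can still be recovered approximately.
estimate = estimate_true_rate(reports, 0.5)
print(round(estimate, 2))
```

Any one report is deniable because it may be a coin flip, yet the population-level estimate stays close to the true rate once enough reports are collected.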

 

#HBRLive: The Internet of Things, Privacy, and The New Deal on Data

HBR senior editor Scott Berinato speaks with MIT's Alex “Sandy” Pentland, who has pioneered the use of sensor technology to collect deeply detailed big data about people and their interactions.

Chicago uses big data to save itself from urban ills

The Windy City is using sophisticated predictive models to pinpoint risk areas and help it reduce everything from rat problems to cases of lead poisoning

The Socialist Origins of Big Data

Evgeny Morozov on how the ideas behind Project Cybersyn, a futuristic experiment in cybernetics from nineteen-seventies Chile, still shape technology.

1001 Datasets and Data Repositories

1001 Datasets and Data Repositories (a list of lists of lists): a rough list of lists, to be compiled.

All-pairs similarity via DIMSUM | Twitter Blogs

Given a dataset of sparse vector data, we solve the problem of finding all similar vector pairs according to a similarity function.
onur savas's insight:

Twitter uses this algorithm to find users, hashtags, and ads that are very similar to one another, so they may be recommended and shown to users and advertisers. Source code is also provided:


Scalding github pull-request: https://github.com/twitter/scalding/pull/833
Spark github pull-request: https://github.com/apache/spark/pull/336
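
For readers unfamiliar with the problem statement, the following is a minimal brute-force sketch of all-pairs cosine similarity over sparse vectors. It is the naive baseline, not the DIMSUM algorithm itself (see the linked post and pull requests for that); the function names, the threshold, and the toy hashtag data are hypothetical.

```python
import math
from itertools import combinations


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {dimension: weight}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(vectors: dict, threshold: float) -> list:
    """Compare every pair of vectors and keep those at or above the threshold."""
    result = []
    for (a, u), (b, v) in combinations(vectors.items(), 2):
        score = cosine(u, v)
        if score >= threshold:
            result.append((a, b, score))
    return result


# Toy data: each hashtag is a sparse vector over the users who tweeted it.
vectors = {
    "#bigdata": {"user1": 1.0, "user2": 1.0, "user3": 1.0},
    "#hadoop": {"user1": 1.0, "user2": 1.0},
    "#cooking": {"user4": 1.0},
}

pairs = similar_pairs(vectors, threshold=0.5)
for a, b, score in pairs:
    print(a, b, round(score, 3))
```

The number of comparisons here grows quadratically with the number of vectors, which is why an approach built for scale is needed on data of Twitter's size.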

The Ultimate Challenge For Recommendation Engines | MIT Technology Review

If you share an online movie account with other people in your household, you probably receive some inappropriate recommendations. That may soon change.

How This Algorithm Detected The Ebola Outbreak Before Humans Could

By mining the social web, machines saw the outbreak coming a week before the world knew about it.
onur savas's insight:

... and the global outbreaks in the HealthMap: http://healthmap.org/en/

A (very) brief review of published human subjects research conducted with social media companies | Simply Statistics

More and more human subjects research is being performed by large tech companies. This is a short list of published research.
