Bits 'n Pieces on Big Data R&D
Information and insight into Big Data R&D
Curated by onur savas
Scooped by onur savas!

Twitter open sources Storm-Hadoop hybrid called Summingbird

Twitter has open sourced a “streaming MapReduce” system called Summingbird that makes Hadoop and Storm play nicer together so applications that require both batch and stream processing can do their jobs with as little complexity as possible.
Webdevilopers's curator insight, September 4, 2013 8:56 AM

In the case of Twitter, Hadoop handles batch processing, Storm handles stream processing, and the hybrid system is called Summingbird. It’s not a tool for every job, but it sounds pretty handy for those it’s designed to address. Hybrid systems like this are actually becoming more common as companies realize they can’t survive in a real-time world with Hadoop alone.

Running a Multi-Node Storm Cluster

Setting up a distributed Storm cluster on RHEL 6 with process supervision
onur savas's insight:

Storm is a promising technology for running real-time distributed analytical processes. This tutorial covers how to run a Storm cluster on multiple nodes.

In-Stream Big Data Processing

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea has gained a lot of traction, and a whole bunch of solutions like Twitter’s Storm, Yahoo’s S4, Cloudera’s Impala, Apache Spark, and Apache Tez have appeared and joined the army of Big Data and NoSQL systems. This article is an effort to explore techniques used by developers of in-stream data processing systems, trace the connections of these techniques to massive batch processing and OLTP/OLAP databases, and discuss how one unified query engine can support in-stream, batch, and OLAP processing at the same time.
