Scalable Distributed System Design
Scooped by Yuan

Comparing Pig Latin and SQL for Constructing Data Processing Pipelines


HBase vs Cassandra: why we moved

"This is such an important point I will reiterate: the beauty of Cassandra is that you can choose the trade-offs you want on a case by case basis such that they best match the requirements of the particular operation you are performing. Cassandra proves you can go beyond the popular interpretation of the CAP Theorem and the world keeps on spinning!"


Big Data Security: The Evolution of Hadoop’s Security Model

In his new article, Kevin T. Smith explains why Big Data security matters and traces the evolution of Hadoop's security model.

MapReduce Patterns, Algorithms, and Use Cases

In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical...

A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

In 2010, when the world became enchanted by the capabilities of cloud systems and new databases designed to serve them, a group of researchers from Yahoo decided to look into NoSQL. They developed the YCSB framework to assess the performance of new tools and find the best cases for their use. The results were published in the paper,

Probabilistic Data Structures for Web Analytics and Data Mining

Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets oft...

MapReduce 2.0 in Apache Hadoop 0.23

Cloudera offers enterprises a powerful new data platform built on the popular Apache Hadoop open-source software package.

The Lambda architecture: principles for architecting realtime Big Data systems

query = function(all data)
I’ve started reading “Big Data - Principles and best practices of scalable realtime data systems” by Nathan Marz and James Warren. Throughout 2012 Manning have been...

In-Stream Big Data Processing

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-strea...