Distributed Architectures
13.6K views | +0 today
Follow
Distributed Architectures
distributed architectures, big data, elasticsearch, hadoop, hive, cassandra, riak, redis, hazelcast, paxos, p2p, high scalability, distributed databases, among other things...
Curated by Nico
Your new post is loading...
Your new post is loading...
Scoop.it!

Interview with the Github Elasticsearch Team

Interview with the Github Elasticsearch Team | Distributed Architectures | Scoop.it

Transcript condensed and edited from the original audio. Thanks to both Tim Pease and Grant Rodgers of Github for taking the time to answer these questions!

Nico's insight:

Very nice story, with drama, suspens and happy ending

No comment yet.
Scoop.it!

Don't use Hadoop - your data isn't that big

So, how much experience do you have with Big Data and Hadoop?” they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I’m basically a big data neophite - I know the concepts, I’ve written code, but never at scale.


The next question they asked me. “Could you use Hadoop to do a simple group by and sum?” Of course I could, and I just told them I needed to see an example of the file format.


They handed me a flash drive with all 600MB of their data on it (not a sample, everything). For reasons I can’t understand, they were unhappy when my solution involved pandas.read_csv rather than Hadoop.


[...]

Nico's insight:

"But my data is more than 5TB! -  Your life now sucks - you are stuck with Hadoop". Quite true.

No comment yet.
Scoop.it!

Building the Perfect Cassandra Test Environment

Building the Perfect Cassandra Test Environment | Distributed Architectures | Scoop.it

A month back, one of our clients asked us to set up 15 individual single-node Cassandra instances, each of which would live in 64MB of RAM and each of which would reside on the same machine. My first response was “Why!?”


[...]

Nico's insight:

Sometimes you don't need to scale up but to scale down. Nice recipe.

No comment yet.
Scoop.it!

helenos

helenos | Distributed Architectures | Scoop.it

Helenos is a web based GUI tool to manage your data stored in Apache Cassandra

Nico's insight:

A sexier frontend to Cassandra than its command line

No comment yet.
Scoop.it!

Beating the CAP Theorem Checklist

Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment advocates a way to beat the CAP theorem. Your idea will not work. Here is why it won't work:

( ) you are assuming that software/network/hardware failures will not happen

( ) you pushed the actual problem to another layer of the system

( ) your solution is equivalent to an existing one that doesn't beat CAP

( ) you're actually building an AP system

( ) you're actually building a CP system

( ) you are not, in fact, designing a distributed system


[...]

Nico's insight:

Funny list of wrong assumptions in a distributed system

No comment yet.
Scoop.it!

Ordasity - a library for building stateful clustered services on the JVM.

Ordasity - a library for building stateful clustered services on the JVM. | Distributed Architectures | Scoop.it

Ordasity is a library designed to make building and deploying reliable clustered services on the JVM as straightforward as possible. It's written in Scala and uses Zookeeper for coordination.


Ordasity's simplicity and flexibility allows us to quickly write, deploy, and (most importantly) operate distributed systems on the JVM without duplicating distributed "glue" code or revisiting complex reasoning about distribution strategies.

Nico's insight:

Its API seems pretty straight forward. It needs a Zookeeper cluster though.

No comment yet.
Scoop.it!

RAFT - In Search of an Understandable Consensus Algorithm

If like many humans you've found even Paxos Made Simple a bit difficult to understand, you might enjoy RAFT as described in In Search of an Understandable Consensus Algorithm by Stanford's Diego Ongaro and John Ousterhout. The video presentation of the paper is given by John Ousterhout. Both the paper and the video are delightfully accessible.

Nico's insight:

Not everyone has to implement consensus algorithm, but as users Raft is probably a nice win over Paxos considering the current implementations; if you pardon my french, Zookeeper tends to be described as a PITA.

No comment yet.
Scoop.it!

Building a Distributed System with Akka Remote Actors

Building a Distributed System with Akka Remote Actors | Distributed Architectures | Scoop.it

At AddThis, we deployed our first production system written in Scala almost two years ago. Since then, a growing stack of new applications are built using this exciting language. Among the many native Scala libraries we have tried and adopted, Akka stands out as the most indispensable

Nico's insight:

The more I read about Akka, the more I like it.

No comment yet.
Scoop.it!

Scaling Dropbox

Scaling Dropbox | Distributed Architectures | Scoop.it
Being clever about system architecture in advance is hard. Scaling successfully is more about being clever with metrics and introspection, creating efficient build and provisioning processes and being comfortable with radical change.
Nico's insight:

Non technical article on how to plan your scalling. I think at Scoop.it we are on the same track: keep it simple and pragmatic.

No comment yet.
Scoop.it!

Cluster Specification — Akka Documentation

Cluster Specification — Akka Documentation | Distributed Architectures | Scoop.it

Akka Cluster provides a fault-tolerant, elastic, decentralized peer-to-peer cluster with no single point of failure (SPOF) or single point of bottleneck (SPOB). It implements a Dynamo-style system using gossip protocols, automatic failure detection, automatic partitioning [*], handoff [*], and cluster rebalancing [*]. But with some differences due to the fact that it is not just managing passive data, but actors - active, sometimes stateful, components that also have requirements on message ordering, the number of active instances in the cluster, etc.


[*] Not Implemented Yet

  • Actor partitioning
  • Actor handoff
  • Actor rebalancing
  • Stateful actor replication
Nico's insight:

This looks awesome. And very promising since the "not implemented yet".

No comment yet.
Scoop.it!

Hive Tuning - 2013 July 23 Toronto Hadoop User Group

Hive Deep Dive, Hive 0.11 Tuning tips, Hive 0.11 performance optimizations, and Tez
Nico's insight:

I've only skim read the slides, but there are nice hints about Hive internals related to performance. To bookmark and use when shit happens.

No comment yet.
Scoop.it!

Samza - Proposal at the ASF Incubator

Samza - Proposal at the ASF Incubator | Distributed Architectures | Scoop.it

Samza is a stream processing system for running continuous computation on infinite streams of data.


Proposal


Samza provides a system for processing stream data from publish-subscribe systems such as Apache Kafka. The developer writes a stream processing task, and executes it as a Samza job. Samza then routes messages between stream processing tasks and the publish-subscribe systems that the messages are addressed to.


Nico's insight:

Probably yet another hadoop component which will require a deployment of many processes

No comment yet.
Scoop.it!

Lightweight transactions in Cassandra 2.0

Lightweight transactions in Cassandra 2.0 | Distributed Architectures | Scoop.it

When discussing the tradeoffs between availability and consistency, we say that a distributed system exhibits strong consistency when a reader will always see the most recently written value.


It is easy to see how we can achieve strong consistency in a master-based system, where reads and writes are routed to a single master. However, this also has the unfortunate implication that the system must be unavailable when the master fails until a new master can take over.

Fortunately, you can also achieve strong consistency in a fully distributed, masterless system like Cassandra with quorum reads and writes. Cassandra also takes the next logical step and allows the client to specify per operation if he needs that redundancy — often, eventual consistency is completely adequate.


But what if strong consistency is not enough? What if we have some operations to perform in sequence that must not be interrupted by others, i.e., we must perform them one at a time, or make sure that any that we do run concurrently will get the same results as if they really were processed. This is linearizable consistency, or in ACID terms, a serial isolation level.

Nico's insight:

Very light, but they are transactions, because they do a little more than atomic operations.

No comment yet.
Scoop.it!

Why generalists are better at scaling the web - Scalable Startups

Specialists are in demand but when it comes to scaling the web companies will need generalists to the lead the way.


Recently at Surge 2011, the annual  conference on scalability  and performance, Google’s CIO Ben Fried gave an illuminating keynote address. His main insight was that generalists are the people that will lead engineering teams in successfully scaling the web.


In a world where the badge of Specialist or Expert is prized, this was refreshing perspective from an industry bigwig. As tech professionals, or any professional for that matter, we don’t welcome the label of generalist. The word suggests a jack-of-all-trades and master of none. But the generalist is no less an expert than the specialist. Generalists can get their hands greasy with the tools to fix bugs in the machine but they are especially good at mobilizing the machine itself; with their talents of broad vision, and perspective they can direct an entire team to accomplish tasks efficiently. This ability to see big-picture can not be underestimated especially during times of crisis or pressure to meet targets. For a team to scale the web effectively, you’re going to need a good mix of both types of personalities.

Nico's insight:

I always had issues with the "expert" thing. Hands get dirty in too many technologies to be an expert in anything.

No comment yet.
Scoop.it!

What’s under the hood in Cassandra 2.0

What’s under the hood in Cassandra 2.0 | Distributed Architectures | Scoop.it

The headlining features in 2.0 are lightweight transactionsCQL enhancements, and triggers. But 2.0 also features a lot of internal optimizations and improvements!

Nico's insight:

An nice list of performance improvements. I'd still wait few versions to deploy it into production (just my guts talking, not a real hint).

No comment yet.
Scoop.it!

Atomiticity, isolation, linearizability, durability in Cassandra

Atomiticity, isolation, linearizability, durability in Cassandra | Distributed Architectures | Scoop.it

Rick Branson - @rbranson
are you sure these aren't considered phantom reads? Reading up a bit more on this.


Aphyr - @aphyr
Yes; there's only a single read in this example, so phantom reads don't apply.


Rick Branson -‏ @rbranson
a CQL row isn't a single atom, its a sparse hash containing multiple atoms. Your one read is actually multiple reads.


[...]


Kelly Sommers - @kellabyte
That pretty much makes a lot of statements on this page false, right?
http://www.datastax.com/dev/blog/row-level-isolation


Aphyr - @aphyr
That's what I've been trying to convince @spyced and @PatrickMcFadin of, heh. :)

Nico's insight:

Aphyr's work on testing very carefull distributed database is awesome. And this spawn very deep discussion on twitter, with a lot of smart people.

No comment yet.
Scoop.it!

Seven reasons why I like Spark - Strata

Seven reasons why I like Spark - Strata | Distributed Architectures | Scoop.it

A large portion of this week’s Amp Camp at UC Berkeley, is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data toolkit. Here’s why:

[...]

Nico's insight:

"The Spark codebase is small, extensible, and hackable."

"hackable" ? I should love it then !

I tried so many time to dig into Hive, and so many times I actually broke it and made it behave incoherently. I'm done trying to improve it. Too much frustration.


No comment yet.
Scoop.it!

Designing Distributed Systems With ZooKeeper

Designing Distributed Systems With ZooKeeper | Distributed Architectures | Scoop.it

Let’s face it—designing distributed systems can be tough. There’s just no one-size-fits-all tool for creating distributed services: Every distributed application has a unique set of tolerances with regard to reliability, scalability, response time, and other performance factors. At Gilt, our toolbox for supporting distributed service development includes Apache ZooKeeperRabbitMQKafka and a smattering of distributed data stores. We made these technology choices based on years of hands-on development at Gilt, decades of cumulative experience across our engineering team and (literally) endless internal debate. 

Nico's insight:

They show an interesting use of Ordasity

No comment yet.
Scoop.it!

Eventsourced

Eventsourced | Distributed Architectures | Scoop.it

The Eventsourced library adds scalable actor state persistence and at-least-once message delivery guarantees to Akka. With Eventsourced, stateful actors persist received messages by appending them to a log (journal)

  • project received messages to derive current state
  • usually hold current state in memory (memory image)
  • recover current (or past) state by replaying received messages (during normal application start or after crashes)
  • never persist current state directly (except optional state snapshots for recovery time optimization)
Nico's insight:

Interesting kind of persistence for Akka actors

No comment yet.
Scoop.it!

Riak Pipe: “UNIX pipes for Riak” | Architects Zone

Riak Pipe: “UNIX pipes for Riak” | Architects Zone | Distributed Architectures | Scoop.it

Riak Pipe is most simply described as “UNIX pipes for Riak.” In much the same way you would pipe the output of one program to another on the command line, Riak Pipe allows you to pipe the output of a function on one vnode to the input of a function on another. This talk covers the basic structure of Riak Pipe,  with an emphasis on the structures and practices used to prevent overload. An analysis of the strengths and weaknesses of the approaches chosen, and potentials for future improvement, will also be presented.  

Nico's insight:

Seems awesome, but there is the Erlang language barrier.

No comment yet.
Scoop.it!

Advanced repair techniques in Cassandra

Advanced repair techniques in Cassandra | Distributed Architectures | Scoop.it

Anti-entropy repair in Cassandra can sometimes be a pain point for those doing deletes in their cluster, since it must be run before gc_grace expires to ensure deleted data is not resurrected.Reliable hints can go a long way to alleviating this, but if you lose a node at any point, you’ll still need to repair (though it’s worth mentioning that if you only delete via TTL, and only inserted with a TTL to begin with, you can skip repair if your cluster has synchronized time, which it should for a variety of reasons.)

Nico's insight:

"Repair" is a known pain in Cassandra. Riak has a more automated one : Active Anti-Entropy

No comment yet.
Scoop.it!

Big Data Debate: Will HBase Dominate NoSQL? -- InformationWeek

Big Data Debate: Will HBase Dominate NoSQL? -- InformationWeek | Distributed Architectures | Scoop.it
HBase offers both scalability and the economy of sharing the same infrastructure as Hadoop, but will its flaws hold it back? NoSQL experts square off.


HBase is modeled after Google BigTable and is part of the world's most popular big data processing platform, Apache Hadoop. But will this pedigree guarantee HBase a dominant role in the competitive and fast-growing NoSQL database market?

Michael Hausenblas of MapR argues that Hadoop's popularity and HBase's scalability and consistency ensure success. The growing HBase community will surpass other open-source movements and will overcome a few technical wrinkles that have yet to be worked out.


Jonathan Ellis of DataStax, the support provider behind open-source Cassandra, argues that HBase flaws are too numerous and intrinsic to Hadoop's HDFS architecture to overcome. These flaws will forever limit HBase's applicability to high-velocity workloads, he says.


Read what our two NoSQL experts have to say, and then weigh in with your opinion in the comments section below.

Nico's insight:

One funny pro-HBase argument is that there's worst: MongoDB. :D


I don't like Jonathan Ellis status though: "where he sets the technical direction and leads Apache Cassandra as project chair.". I hope that's not true. At the ASF the project chair is supposed to be an administrative task, the lead is done by a PMC (Project Management Community), aka the devs, by consensus. He may have influence, but it's social, not by title or rule.

No comment yet.
Scoop.it!

Britta hired at Elasticsearch

Britta hired at Elasticsearch | Distributed Architectures | Scoop.it

I’d like to welcome Britta Weber (“brwe“) to our team (long overdue…). Britta is almost done with her PhD in computer science, developing Machine Learning algorithms to process electron microscopy image data.

Nico's insight:

Elasticsearch is going to rock at data mining. See a piece of her work: https://github.com/elasticsearch/elasticsearch/issues/3307

No comment yet.
Scoop.it!

Metadata: Spanner: Google's Globally-Distributed Database

Metadata: Spanner: Google's Globally-Distributed Database | Distributed Architectures | Scoop.it

The Spanner paper by Google (appeared in OSDI'12) is cryptic and hard to understand. When I first read it, I thought I understood the main idea, and that the benefit of TrueTime was to enable lock-free read-only transactions in Spanner. Then, I slowly realized things didn't check; it was possible to achieve lock-free read-only transactions without TrueTime as well. I did another read, and thought for some time, and had a better understanding of how TrueTime benefits Spanner, and how to improve its shortcomings. 

I will first provide a summary of the Spanner work (borrowing sentences and figures from the Spanner paper), and then talk about what TrueTime is actually good for. 

Nico's insight:

Spanner. I still don't understand it all, but we should keep reading about it, it will probably bring ideas on other distributed databases.

No comment yet.
Scoop.it!

Billing Incident Post-Mortem

Billing Incident Post-Mortem | Distributed Architectures | Scoop.it

Twilio experienced an incident with its billing system on July 18, 2013. Although we’ve shared how the incident unfolded, and the impact on our customers, we’d like to detail the root cause, how we fixed it, and what we’re doing to ensure this doesn’t happen in the future.

Nico's insight:

Interesting story about what happens when the network connectivity breaks down between the nodes of a distributed database, here Redis.

No comment yet.