Distributed Architectures
distributed architectures, big data, elasticsearch, hadoop, hive, cassandra, riak, redis, hazelcast, paxos, p2p, high scalability, distributed databases, among other things...
Curated by Nico

Spark SQL Programming Guide


Spark SQL is currently an Alpha component. Therefore, the APIs may be changed in future releases.

Contents: Overview, Getting Started, Running SQL on RDDs, Using Parquet, Writing Language-Integrated Relational Queries, Hive Support.
Nico's insight:

Even if it's quite basic, this seems a lot more fun than manipulating data with Hive.
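
For a taste, here's a minimal PySpark sketch of the RDD-to-SQL workflow (assuming the Spark 1.0 alpha API; the data and table names are invented for the example):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local", "sql-demo")
    sqlCtx = SQLContext(sc)

    # Turn an RDD of dicts into a queryable table (inferSchema and
    # registerAsTable are the alpha-era names; later releases renamed them).
    people = sc.parallelize([{"name": "Ada", "age": 36},
                             {"name": "Linus", "age": 14}])
    table = sqlCtx.inferSchema(people)
    table.registerAsTable("people")

    adults = sqlCtx.sql("SELECT name FROM people WHERE age >= 18")
    print(adults.collect())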


Manhattan, our real-time, multi-tenant distributed database for Twitter scale | Twitter Blogs


As Twitter has grown into a global platform for public self-expression and conversation, our storage requirements have grown too. Over the last few years, we found ourselves in need of a storage system that could serve millions of queries per second, with extremely low latency in a real-time environment. Availability and speed of the system became the most important factors. Not only did it need to be fast; it needed to be scalable across several regions around the world.


Apache Mahout, Hadoop's original machine learning project, is moving on from MapReduce

The Apache Mahout project will now support Apache Spark and another data engine called H2O as it tries to retain its status as the go-to set of machine learning libraries for Hadoop.
Nico's insight:

Mahout on Spark, that's good news!


A Different View on Hadoop: Network Performance | AppNeta

A question developers often ask is, “Can I make it run faster?” But what are you looking at to make it faster? The code? The network? Alan explores what you need to tune to answer the question: what is Hadoop's network performance?

Elasticsearch in Production

Elasticsearch easily lets you develop amazing things, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: a system with high demands on network stability, a huge appetite for memory, and a system that assumes all users are trustworthy. These articles cover some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters.
Nico's insight:

The first article I've read that describes the distributed characteristics of Elasticsearch quite accurately.


Another good article from them: https://www.found.no/foundation/elasticsearch-as-nosql/


Apache Helix


Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.


Nico's insight:

Based on ZooKeeper, it seems to implement the usual cluster schemes. There is a nice list of recipes.

And it seems nicely built: there's no master XML stuff to write, it's just plain Java; see for instance the state machine config.

This project seems young though.


Why We Built Marvel | Blog | Elasticsearch


Yesterday, Steven Schuurman announced our latest product, Marvel. Judging by the twitter storm alone, people are just as excited about it as we are. Today, I would like to take the opportunity to tell more about how it works and how it came to be. Marvel is the result of all of our experiences helping users and providing support to customers. Most importantly, the product has come from our own needs for its capabilities and insights.


[...]

Nico's insight:

Marvel is the first real monitoring tool for Elasticsearch, since it is not just a client-side plugin: data is collected even when you're not connected, like every proper monitoring tool.


Hydra is Now Open Source


"Today we are happy to announce that Hydra—the core of our data processing platform—is now open source and available on github. It’s freely available under the Apache License for anyone to use, and we look forward to seeing just what people do with it!"

Nico's insight:

An interesting distributed job manager which supports sharding, with a nice hierarchical data abstraction. It is Java-based and involves ZooKeeper and RabbitMQ.


The "Client Round Robin Anti-Pattern - Riak Developer Guidance

The "Client Round Robin Anti-Pattern  - Riak Developer Guidance | Distributed Architectures | Scoop.it

One of the features that is often available in Riak client software (including the CorrugatedIron .NET client, the riak-js client and others) is the ability to send requests to the Riak cluster in a round-robin style. What this means is that each IP of each node within the Riak cluster is entered into a config file for the client. The client then goes through that list to send off requests to read, write or delete data in the database.
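
In other words (a toy sketch of the pattern being critiqued, not any real client's API):

    import itertools

    # Every node's address is listed statically in the client config...
    RIAK_NODES = ["10.0.0.1:8098", "10.0.0.2:8098", "10.0.0.3:8098"]
    _cycle = itertools.cycle(RIAK_NODES)

    def next_node():
        # ...and each request simply goes to the next node in the list,
        # with no regard for node health, load or data locality.
        return next(_cycle)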


Nico's insight:

I don't agree with this article. For me, the client is part of the cluster, since that's where consistency, availability and partition tolerance actually matter.


YARN... Either it is really complicated or I have brain damage : Edward Capriolo

I noticed YARN is getting all this incredible press lately. I see articles with subjects I cannot parse, like "YARN is Hadoop's data center OS". A while back I took a slam at YARN, but periodically I like to re-investigate my rants and determine how right I was.


[...]

Nico's insight:

After a very quick tour, I was not keen to look into it further. This article confirms my fears about this white elephant.


Presto | Distributed SQL Query Engine for Big Data


Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.


Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

Nico's insight:

Yet another Hive-compatible, in-memory distributed query engine, this time by Facebook.


Spark is a really big deal for big data, and Cloudera gets it


Cloudera has partnered with a startup called Databricks to integrate and support the Apache Spark data-processing platform within Cloudera’s Hadoop software. Spark, which is designed for speed and usability, is one of several technologies pushing Hadoop beyond MapReduce.

Nico's insight:

Spark really seems to be the future of data processing.


Raft Consensus Algorithm

What is Raft?

Raft is a consensus algorithm that is designed to be easy to understand. It's equivalent to Paxos in fault-tolerance and performance. The difference is that it's decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems. We hope Raft will make consensus available to a wider audience, and that this wider audience will be able to develop a variety of higher quality consensus-based systems than are available today.
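
For instance, the leader-election piece boils down to a simple quorum rule (a toy sketch, nowhere near a full Raft implementation):

    def wins_election(votes_granted, cluster_size):
        """A candidate becomes leader for a term once a strict majority
        of the cluster (itself included) has granted its vote."""
        return votes_granted >= cluster_size // 2 + 1

    assert wins_election(3, 5)        # 3 of 5 is a majority
    assert not wins_election(2, 5)    # a minority partition cannot elect a leader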

 

Where can I learn more?

[...]

Nico's insight:

A good centralized place for everything about Raft


Making Spark Easier to Use in Java with Java 8


One of Spark’s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala and Python, but its Java API was verbose due to the lack of function expressions. With the addition of lambda expressions in Java 8, we’ve updated Spark’s API to transparently support these expressions, while staying compatible with old versions of Java. This new support will be available in Spark 1.0.

Nico's insight:

Yet another reason to find some time to test Spark.


Awesome visualisation of Raft distributed consensus protocol by @benbjohnson


ZeroMQ: The Design of Messaging Middleware


ØMQ is a messaging system, or "message-oriented middleware" if you will. It is used in environments as diverse as financial services, game development, embedded systems, academic research, and aerospace.


Messaging systems work basically as instant messaging for applications. An application decides to communicate an event to another application (or multiple applications), it assembles the data to be sent, hits the "send" button, and the messaging system takes care of the rest. Unlike instant messaging, though, messaging systems have no GUI and assume no human beings at the endpoints capable of intelligent intervention when something goes wrong. Messaging systems thus have to be both fault-tolerant and much faster than common instant messaging.


ØMQ was originally conceived as an ultra-fast messaging system for stock trading and so the focus was on extreme optimization. The first year of the project was spent devising benchmarking methodology and trying to define an architecture that was as efficient as possible.

Later on, approximately in the second year of development, the focus shifted to providing a generic system for building distributed applications and supporting arbitrary messaging patterns, various transport mechanisms, arbitrary language bindings, etc.


During the third year, the focus was mainly on improving usability and flattening the learning curve. We adopted the BSD Sockets API, tried to clean up the semantics of individual messaging patterns, and so on.


This article will give insight into how the three goals above translated into the internal architecture of ØMQ, and provide some tips for those who are struggling with the same problems.


Since its third year, ØMQ has outgrown its codebase; there is an initiative to standardize the wire protocols it uses, and an experimental implementation of a ØMQ-like messaging system inside the Linux kernel, etc. These topics are not covered here. However, you can check online resources for further details.
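
As a taste of that "send button" simplicity, here is a minimal request/reply pair with the pyzmq binding (a sketch, with both ends in one process for brevity):

    import zmq

    ctx = zmq.Context()

    server = ctx.socket(zmq.REP)          # reply end
    server.bind("tcp://127.0.0.1:5555")

    client = ctx.socket(zmq.REQ)          # request end
    client.connect("tcp://127.0.0.1:5555")

    client.send(b"ping")                  # "hit the send button"
    print(server.recv())                  # b'ping'
    server.send(b"pong")
    print(client.recv())                  # b'pong'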

Nico's insight:

I love the first paragraph about "Application vs. Library":


"The lesson learned is that when starting a new project, you should opt for the library design if at all possible. It's pretty easy to create an application from a library by invoking it from a trivial program; however, it's almost impossible to create a library from an existing executable. A library offers much more flexibility to the users, at the same time sparing them non-trivial administrative effort."


Big, big +1. My life as an integrator would be a lot easier.


hazelcast : "Enhancement: add hazelcast.cluster.majority.size"

hazelcast : "Enhancement: add hazelcast.cluster.majority.size" | Distributed Architectures | Scoop.it

"It would be great in my opinion if Hazelcast had a way to configure a "hazelcast.cluster.majority.size" similar to the "hazelcast.initial.min.cluster.size" that would cause an instance to go dormant when the cluster size dropped below the configured threshold (generally n/2+1). I wrote something into my software that does this to prevent unfortunate side-effects of partitioning with the way I am using HC but would much rather it was part of the core. Personally I would be perfectly happy if all operations blocked when the configured number of nodes were no longer reachable."

Nico's insight:

I find it weird that a distributed system doesn't implement this simple feature, which would avoid the worst state of a distributed cluster: a split brain.

At Scoop.it we still use it, but we had to add additional checks so we don't use it when not enough nodes are up.
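
Roughly, those checks boil down to the classic quorum rule the issue asks for (a sketch of the logic, not Hazelcast's API):

    CLUSTER_SIZE = 5
    MAJORITY = CLUSTER_SIZE // 2 + 1      # n/2 + 1 = 3

    def ensure_quorum(reachable_members):
        # Refuse to operate from a minority partition: at most one side of
        # a split can hold a majority, so two independent "clusters" cannot
        # both keep accepting writes.
        if reachable_members < MAJORITY:
            raise RuntimeError("minority partition, going dormant")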


Elasticsearch 1.0.0 Released


Today we are proud to announce the release of Elasticsearch 1.0.0 GA, based on Lucene 4.6.1. This release is the culmination of 9 months of work with almost 8,000 commits by 183 contributors! A big thank you to everybody who has made this release possible. You can download Elasticsearch 1.0.0 here. The main features available in 1.0 are:

  • Snapshot/Restore API

  • Aggregations

  • Distributed Percolation

  • cat API

  • Federated search

  • Doc values

  • Circuit breaker


[...]

Nico's insight:

I cannot wait to finally be able to do snapshot/restore of indices.

Great job, guys!
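
For reference, snapshot and restore look roughly like this with the elasticsearch-py client (the repository name, location and index are invented for the example):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Register a shared-filesystem repository, then snapshot one index.
    es.snapshot.create_repository(repository="backups", body={
        "type": "fs", "settings": {"location": "/mnt/backups"}})
    es.snapshot.create(repository="backups", snapshot="snap-1",
                       body={"indices": "my-index"},
                       wait_for_completion=True)

    # Restoring later is the symmetric call (the index must be closed or absent):
    # es.snapshot.restore(repository="backups", snapshot="snap-1")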


Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems – Basho Technologies


A recent email thread on the Riak users mailing list highlighted one of the key weaknesses of distributed systems: clock consistency.


The first email:

"Occasionally, riak seems to not store an object I try to save. I have run tcpdump on the node receiving the request to ensure it is receiving the http packets with the correct JSON from the client. When the issue occurs the node is in fact receiving the request with the correct JSON."


Riak is designed to accommodate server and network failures without ever losing committed writes, so this led to a quick response from Basho’s engineers.


After some discussion, a vital piece of information was revealed:

"One other thing that might be worth mentioning here is the writes I’m mentioning are actually updates to existing objects. The object exists, an attempt to write an update for the object appears to be received by a node, but the object maintains it’s original value."


Riak was dropping updates rather than writes, which is a horse of a different color. To see why updates are much more problematic for any distributed database, read on.


[...]

Nico's insight:

A nice list of ways to ensure data consistency, from the simplest last-write-wins to self-healing CRDTs.
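
The CRDT end of that spectrum is smaller than it sounds; for instance a state-based grow-only counter (an illustrative sketch, not Riak's actual data type code):

    class GCounter:
        """Grow-only counter: each node bumps its own slot; merging takes
        the per-node max, so replicas converge to the same value without
        trusting any clock or delivery order."""

        def __init__(self, node_id):
            self.node_id = node_id
            self.slots = {}

        def incr(self, n=1):
            self.slots[self.node_id] = self.slots.get(self.node_id, 0) + n

        def merge(self, other):
            for node, count in other.slots.items():
                self.slots[node] = max(self.slots.get(node, 0), count)

        def value(self):
            return sum(self.slots.values())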


Getting Real About Distributed System Reliability

There is a lot of hype around distributed data systems, some of it justified. It's true that the internet has centralized a lot of computation onto services like Google, Facebook, Twitter, LinkedIn (my own employer), and other large web sites. It's true that this centralization puts a lot of pressure on system scalability for these companies. It's true that incremental and horizontal scalability is a deep feature that requires redesign from the ground up and can't be added incrementally to existing products. It's true that, if properly designed, these systems can be run with no planned downtime or maintenance intervals in a way that traditional storage systems make harder. It's also true that software that is explicitly designed to deal with machine failures is a very different thing from traditional infrastructure. All of these properties are critical to large web companies, and are what drove the adoption of horizontally scalable systems like Hadoop, Cassandra, Voldemort, etc. I was the original author of Voldemort and have worked on distributed infrastructure for the last four years or so. So insofar as there is a “big data” debate, I am firmly in the “pro-” camp. But one thing you often hear is that this kind of software is more reliable than the traditional alternatives it replaces, and this just isn't true. It is time people talked honestly about this.

You hear this assumption of reliability everywhere. Now that scalable data infrastructure has a marketing presence, it has really gotten bad. If Hadoop or Cassandra or what-have-you can tolerate machine failures, then they must be unbreakable, right? Wrong.


[...]

Nico's insight:

Nice reminder:

 - availability doesn't only depend on the quality of the software, it also depends on the quality of the ops,

 - and most of your nodes going down at once is not that unlikely, since machine clones will fail on the same bug (see the back-of-the-envelope sketch below).
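
A back-of-the-envelope illustration of that last point (failure probabilities invented for the example):

    # If nodes failed independently at p = 0.01, losing all 3 replicas of a
    # datum would happen with probability p**3 = 1e-06. A bug shared by all
    # machine clones fails them together at ~p, four orders of magnitude worse.
    p = 0.01
    print(p ** 3)   # independent failures: 1e-06
    print(p)        # correlated failure via a common bug: 0.01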


Hacking Cassandra — Tadas Vilkeliskis


At Chartbeat we are thinking about adding probabilistic counters to our infrastructure, HyperLogLog (HLL) in particular. One of the challenges with something like this is to make it redundant while keeping reasonably good performance. Since HyperLogLog is a relatively new approach to cardinality approximation, there are not many off-the-shelf solutions, so why not try and implement HLL in Cassandra?


[...]

Nico's insight:

Having HLL in Cassandra would be awesome
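
The core of HLL is compact enough to sketch; here is a toy Python version with the raw estimator only (real implementations add bias corrections for small and large cardinalities):

    import hashlib

    P = 10                    # 2**10 = 1024 registers
    M = 1 << P
    registers = [0] * M

    def _rho(w, bits):
        """1-based position of the leftmost 1-bit in a `bits`-wide word."""
        return bits - w.bit_length() + 1 if w else bits + 1

    def add(item):
        h = int(hashlib.sha1(item.encode()).hexdigest(), 16) & ((1 << 64) - 1)
        idx = h & (M - 1)     # low P bits choose the register
        registers[idx] = max(registers[idx], _rho(h >> P, 64 - P))

    def estimate():
        alpha = 0.7213 / (1 + 1.079 / M)
        return alpha * M * M / sum(2.0 ** -r for r in registers)

    for i in range(100000):
        add("user-%d" % i)
    print(estimate())         # ~100000, within a few percent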


In-Stream Big Data Processing


The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing is the immediate need in many practical applications. In recent years, this idea got a lot of traction and a whole bunch of solutions like Twitter’s Storm, Yahoo’s S4, Cloudera’s Impala, Apache Spark, and Apache Tez appeared and joined the army of Big Data and NoSQL systems. This article is an effort to explore techniques used by developers of in-stream data processing systems, trace the connections of these techniques to massive batch processing and OLTP/OLAP databases, and discuss how one unified query engine can support in-stream, batch, and OLAP processing at the same time.

Nico's insight:

Very good data processing 101.


snakebite - A pure python HDFS client


Snakebite is a Python library that provides a pure Python HDFS client and a wrapper around Hadoop's minicluster. The client uses protobuf for communicating with the NameNode and comes in the form of a library and a command line interface. Currently, the snakebite client supports most actions that involve the NameNode and reading data from DataNodes.


Note: all methods that read data from a data node are able to check the CRC during transfer, but this is disabled by default because of performance reasons. This is the opposite behaviour from the stock Hadoop client.


Snakebite requires Python 2 (Python 3 is not supported yet) and python-protobuf 2.4.1 or higher.

Snakebite has been tested mainly against Cloudera CDH4.1.3 (Hadoop 2.0.0) in production. Tests pass on Hortonworks HDP 2.0.3.22-alpha.
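
Usage is pleasantly minimal; a sketch (host, port and paths are made up, and I believe CRC checking is opt-in per read call, per the note above):

    from snakebite.client import Client

    # Talks protobuf straight to the NameNode; no JVM, no JARs.
    client = Client("namenode.example.com", 8020)

    # Most client methods return generators, so results stream lazily.
    for entry in client.ls(["/user/data"]):
        print(entry["path"], entry["length"])

    # Opt in to CRC verification on reads (off by default for speed).
    for file_contents in client.cat(["/user/data/part-00000"], check_crc=True):
        for chunk in file_contents:
            print(chunk)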


ZooKeeper vs. Doozer vs. Etcd

While devo.ps is fast approaching a public release, the team has been dealing with an increasingly complex infrastructure. We recently faced an interesting issue: how do you share configuration across a cluster of servers? More importantly, how do you do so in a resilient, secure, easily deployable and speedy fashion?


That’s what got us to evaluate some of the options available out there: ZooKeeper, Doozer and etcd. These tools all solve similar sets of problems, but their approaches differ quite significantly. Since we spent some time evaluating them, we thought we’d share our findings.


How to avoid the split-brain problem in elasticsearch


We've all been there - we started to plan for an elasticsearch cluster and one of the first questions that comes up is "How many nodes should the cluster have?". As I'm sure you already know, the answer to that question depends on a lot of factors, like expected load, data size, hardware etc. In this blog post I'm not going to go into the detail of how to size your cluster, but instead will talk about something equally important - how to avoid the split-brain problem.
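
The usual guard here is discovery.zen.minimum_master_nodes, set to a quorum of master-eligible nodes. A sketch with the elasticsearch-py client (the setting can equally live in elasticsearch.yml):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # With 3 master-eligible nodes, a quorum is 3 // 2 + 1 = 2: a partitioned
    # minority can no longer elect its own master, so no split-brain.
    master_eligible = 3
    es.cluster.put_settings(body={
        "persistent": {
            "discovery.zen.minimum_master_nodes": master_eligible // 2 + 1
        }
    })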
