Across the planet, new technologies and business models are decentralizing power and placing it in the hands of communities and individuals. "We are seeing technology-driven networks replacing bureaucratically-driven hierarchies," says VC and futurist Fred Wilson, speaking about what to expect in the next ten years. Watch the entire 25-minute video (it's worth it!), then check out the 21 innovations below.
PlainElastic.Net as an Elasticsearch client. PlainElastic is a very simple, lightweight library for Elasticsearch. It uses plain JSON for indexing and querying, which gives me more freedom (and more tooling) for building JSON from user input for both indexing and queries.
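PlainElastic.Net itself is a .NET library. This Python sketch only illustrates the underlying idea of building the plain JSON request body yourself from user input; it is not PlainElastic's API. The search body uses Elasticsearch's standard `match` query, while the document field names in `build_index_doc` are hypothetical.

```python
import json


def build_match_query(field, user_text, size=10):
    """Build an Elasticsearch search body as a plain JSON string.

    json.dumps escapes quotes and backslashes in user_text, so user input
    cannot break out of the string value the way it could with naive
    string concatenation.
    """
    body = {
        "size": size,
        "query": {"match": {field: user_text}},
    }
    return json.dumps(body)


def build_index_doc(title, tags):
    """JSON body for indexing one document (hypothetical field names)."""
    return json.dumps({"title": title, "tags": tags})
```

The resulting string can be sent as the request body to an index's `_search` endpoint with any HTTP client. Serializing through a JSON encoder, rather than splicing user text into a template, is what keeps the "plain JSON" approach safe.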
Beanstalkd is a job queue rather than a message queue: when you put jobs on a queue (called a "tube" in Beanstalkd), they stay there until a worker processes each one successfully. A worker that reserves a job deletes it when done; if the worker doesn't finish within the job's time-to-run (TTR), Beanstalkd returns the job to the queue so it can be retried. A worker can also "bury" a job, i.e. mark it as failed and set it aside until it is kicked back onto the queue. If jobs arrive faster than workers can handle them, the queue simply builds up a backlog while it works through it all; nothing will be lost or missed. Beanstalkd can also run with a binlog, so if the server goes down or Beanstalkd crashes, it can pick up again with the queue intact.
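The job lifecycle described above can be modeled in a few lines. This is a toy, in-process sketch assuming a single tube and an explicit clock; the method names mirror Beanstalkd's protocol commands (`put`, `reserve`, `delete`, `release`, `bury`, `kick`), but it is not a Beanstalkd client and has no priorities, delays, or binlog.

```python
from __future__ import annotations

import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    id: int
    body: str
    ttr: float             # time-to-run: seconds a worker may hold the job
    deadline: float = 0.0  # set when the job is reserved


class ToyTube:
    """In-memory model of the job states in a single Beanstalkd tube.

    Not a Beanstalkd client: no network, priorities, delays, or binlog.
    The current time is passed in explicitly so behaviour is deterministic.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.ready: deque[Job] = deque()
        self.reserved: dict[int, Job] = {}
        self.buried: dict[int, Job] = {}

    def put(self, body: str, ttr: float = 60.0) -> int:
        job = Job(next(self._ids), body, ttr)
        self.ready.append(job)
        return job.id

    def _expire(self, now: float) -> None:
        # A reserved job not deleted, released, or buried within its TTR
        # goes back to the ready queue, so another worker can retry it.
        for job_id, job in list(self.reserved.items()):
            if now >= job.deadline:
                del self.reserved[job_id]
                self.ready.append(job)

    def reserve(self, now: float) -> Job | None:
        self._expire(now)
        if not self.ready:
            return None
        job = self.ready.popleft()
        job.deadline = now + job.ttr
        self.reserved[job.id] = job
        return job

    def delete(self, job_id: int) -> None:
        # Success: the job leaves the queue for good.
        del self.reserved[job_id]

    def release(self, job_id: int) -> None:
        # The worker explicitly hands the job back for another attempt.
        self.ready.append(self.reserved.pop(job_id))

    def bury(self, job_id: int) -> None:
        # Failure: the job is set aside and not retried automatically.
        self.buried[job_id] = self.reserved.pop(job_id)

    def kick(self, bound: int) -> int:
        # Move up to `bound` buried jobs back to ready (e.g. after a fix).
        job_ids = list(self.buried)[:bound]
        for job_id in job_ids:
            self.ready.append(self.buried.pop(job_id))
        return len(job_ids)
```

A worker that crashes after `reserve` simply never calls `delete`; once the TTR passes, the next `reserve` hands the same job out again, which is why no job is lost. Buried jobs stay out of circulation until something calls `kick`.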
ZooKeeper Resilience at Pinterest. Apache ZooKeeper is an open source distributed coordination service that’s popular for use cases like service discovery, dynamic configuration management and distributed locking. While it’s versatile and useful, it has failure modes that can be hard to prepare for and recover from, and when it backs site-critical functionality, those failures can significantly affect site availability.
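One generic way to limit the blast radius of a coordination-service outage, assuming configuration values are read from ZooKeeper, is to remember the last value read successfully and serve it while the service is unreachable. This sketch shows that pattern only; it is not a description of what Pinterest built. The `fetch` callable is a hypothetical stand-in for a real ZooKeeper read and is assumed to raise `ConnectionError` on failure.

```python
import json
import os
import tempfile
from typing import Callable


class LastKnownGoodConfig:
    """Read config from a coordination service, falling back to a local copy.

    `fetch` is a hypothetical stand-in for a ZooKeeper read; it is assumed
    to raise ConnectionError when the service cannot be reached.
    """

    def __init__(self, fetch: Callable[[], dict], cache_path: str) -> None:
        self._fetch = fetch
        self._cache_path = cache_path

    def get(self) -> dict:
        try:
            value = self._fetch()
        except ConnectionError:
            # Service unavailable: serve the last value we saw. If no read
            # has ever succeeded, the missing file raises FileNotFoundError.
            with open(self._cache_path) as f:
                return json.load(f)
        self._save(value)
        return value

    def _save(self, value: dict) -> None:
        # Write to a temp file, then replace the cache file with it, so a
        # crash mid-write never leaves a half-written cache behind.
        directory = os.path.dirname(self._cache_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(value, f)
        os.replace(tmp_path, self._cache_path)
```

The trade-off is explicit: callers may see stale data instead of an error. That is usually acceptable for configuration or service discovery, but not for distributed locking, where acting on a stale view of who holds a lock is unsafe.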
High-dimensional biological data shares many qualities with other forms of data. Typically it is wide (far fewer samples than variables), complicated by experimental design, and made up of complex relationships driven by both biological and analytical sources of variance. Luckily, the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used […]
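A common first step toward a network view of wide data is a correlation network: variables become nodes, and strongly correlated pairs become edges. The blurb's own toolchain is R with RCytoscape; this is a language-neutral sketch in Python, assuming thresholded Pearson correlation as the edge rule and an arbitrary default threshold of 0.8. The output is a tab-separated edge table of the kind a network tool can import.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(data, threshold=0.8):
    """data maps variable name -> values, one per sample (same sample order).

    Returns (source, target, r) for every pair with |r| >= threshold.
    NaN correlations fail the comparison and are dropped.
    """
    edges = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


def to_tsv(edges):
    """Tab-separated edge table with a header row, for a table-based network import."""
    rows = ["source\ttarget\tr"]
    rows += [f"{a}\t{b}\t{r}" for a, b, r in edges]
    return "\n".join(rows)
```

Note that the number of pairs grows quadratically with the number of variables, and with few samples individual correlations are noisy, so in wide data a fixed threshold will admit some spurious edges.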
The number of SQL options for Hadoop has expanded substantially over the last 18 months. Most get a lot of attention when announced, but a few slip under the radar. One of these low-flying options is Apache Tajo. I learned about Tajo in November 2013 at a Hadoop User Group meeting.
The new Intel Xeon processor E7 v2 product family is designed to make data more valuable to your business through in-memory computing, one of the more recent advances in data management and analytics, in which the entire data set is held in main memory rather than on traditional hard disk storage. In-memory databases and analytics solutions enable significant performance gains when analyzing complex and diverse datasets. We’re talking about analysis in seconds or minutes rather than hours or days. This is how you get to real-time insight.
Data science is a broad church. I am a data scientist — or so I’ve been told — but what I do is actually quite different from what other “data scientists” do. For example, there are those practicing “investigative analytics” and those implementing “operational analytics.” (I’m in the second camp.)
This post targets other VoltDB developers who are going to be dealing with the various unconventional ways Volt now uses native memory in the Java portions of the database. It will also be of interest to other Java developers looking to step outside what is typically considered Java's comfort zone for interacting with native memory.
To solve a planning or optimization problem, some solvers scale poorly: as the problem gains variables and constraints, they use far more RAM and CPU. They can hit hardware memory limits at a few thousand variables and a few million constraint matches. One common way their users work around such hardware limits is to use MapReduce. Let’s see what happens if we MapReduce a planning problem, such as the Traveling Salesman Problem.
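A toy version of the experiment makes the core issue visible. This sketch assumes the cities are partitioned into vertical strips by x-coordinate (the "map" step) and that each strip is solved exactly by brute force; both choices are made here for illustration and are not taken from the original post. The "reduce" step can only chain the partial tours together.

```python
import math
from itertools import permutations


def tour_length(tour):
    """Length of a closed tour that returns to its starting city."""
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def solve_exact(cities):
    """Brute-force optimal tour; only practical for roughly ten cities or fewer."""
    if len(cities) <= 3:
        return list(cities)  # every ordering of <= 3 cities is optimal
    first, rest = cities[0], cities[1:]
    best = min(permutations(rest), key=lambda p: tour_length([first, *p]))
    return [first, *best]


def map_reduce_tsp(cities, partitions):
    # Map: split the cities into vertical strips by x-coordinate and
    # solve each strip independently, as separate workers would.
    ordered = sorted(cities)
    size = math.ceil(len(ordered) / partitions)
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    sub_tours = [solve_exact(chunk) for chunk in chunks]
    # Reduce: naively chain the sub-tours into one tour. The edges that link
    # strips were never part of any sub-problem, so nobody optimized them.
    return [city for tour in sub_tours for city in tour]
```

The stitched result is always a valid tour, so it can never beat the global optimum; whatever quality it loses sits in the edges between partitions, which no mapper ever sees. That is the general weakness of partitioning a problem whose constraints span the partitions.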
Apache Spark, an in-memory data-processing framework, is now a top-level Apache project. That’s an important step for Spark’s stability as it increasingly replaces MapReduce in next-generation big data applications.
This article provides an overview of tools and libraries available for embedded data analytics and statistics, both stand-alone software packages and programming languages with statistical capabilities. The authors also discuss how to combine and integrate these embedded analytics technologies to handle big data.