Big Data Platform
Collection, Analysis, and Service of Big Data
Curated by Dylan.Yeo
Scooped by Dylan.Yeo

How Scaling Really Works in Apache HBase

At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.
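To make that division of labor concrete, here is a toy Python sketch. It assumes the standard HBase layout: a table is split into regions, each region covers a contiguous, sorted range of row keys, and each region is served by a single RegionServer. The master only decides which server hosts which region. A client looks up the region that owns a row key (in real HBase this lookup goes through the meta catalog table and is cached on the client) and then talks to that RegionServer directly. The class names, the round-robin placement, and the server names `rs1`/`rs2` are hypothetical simplifications, not HBase's API.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    start_key: bytes  # inclusive lower bound; b"" means "from the first row"
    server: str       # hypothetical RegionServer name


class ToyMaster:
    """Hypothetical master: decides placement only, never serves data."""

    def assign(self, start_keys: List[bytes], servers: List[str]) -> List[Region]:
        # Round-robin placement; HBase's real balancer is more sophisticated.
        return [
            Region(key, servers[i % len(servers)])
            for i, key in enumerate(sorted(start_keys))
        ]


class ToyRegionLocator:
    """Hypothetical client-side cache of region locations."""

    def __init__(self, regions: List[Region]):
        self.regions = sorted(regions, key=lambda r: r.start_key)
        self.start_keys = [r.start_key for r in self.regions]

    def locate(self, row_key: bytes) -> Region:
        # The owning region is the last one whose start key is <= row_key.
        # Assumes the first region starts at b"" so every key is covered.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.regions[idx]


if __name__ == "__main__":
    regions = ToyMaster().assign([b"", b"g", b"p"], ["rs1", "rs2"])
    locator = ToyRegionLocator(regions)
    for key in (b"apple", b"house", b"zebra"):
        # The client sends the read/write straight to this server, not the master.
        print(key, "->", locator.locate(key).server)
```

The master appears only in `assign`. Once the client holds the region locations, none of the per-row traffic passes through the master, which is the point the article goes on to make.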

Scooped by Dylan.Yeo

How UpStream uses R for Attribution Analysis

Major retailers like Williams Sonoma use UpStream Software for marketing analytics, including revenue attribution, targeting, and optimization. In the accompanying video, Tess Nesbitt (senior statistician at UpStream) describes how she uses Revolution R Enterprise and Hadoop to measure the impact of various marketing channels (for example direct mail, email offers, and catalogs) on consumer retail sales.

Scooped by Dylan.Yeo

Free Datascience books

I've been impressed in recent months by the number and quality of free data science/machine learning books available online. By free I don't mean that someone paid for a PDF version of an O'Reilly book and then posted it online for others to use (or steal); I mean genuinely published books with a free online version sanctioned by the publisher. That is, "the publisher has graciously agreed to allow a full, free version of my book to be available on this site."

Scooped by Dylan.Yeo

Announcing Evan's Awesome A/B Tools

Today I am happy to announce a new suite of online statistics calculators, which I am hereby christening Evan's Awesome A/B Tools. I am calling these tools awesome because they are intuitive, visual, and easy to use. Unlike other online statistical calculators you've probably seen, they'll help you understand what's going on "under the hood" of common statistical tests, and by providing ample visual context, they make it easy for you to explain p-values and confidence intervals to your boss. (And they're free!)
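For a sense of what such calculators compute, here is a minimal Python sketch of one standard approach. It runs a two-sided two-proportion z-test on conversion counts and adds a normal-approximation (Wald) confidence interval for the difference in rates. This textbook method was chosen for illustration and is not necessarily what Evan's tools use. The function name and example numbers are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int, z_crit: float = 1.96
) -> Tuple[float, float, Tuple[float, float]]:
    """Two-sided z-test for p_b - p_a, plus a Wald CI for the difference.

    z_crit=1.96 gives an approximate 95% interval. Relies on the normal
    approximation, so it assumes reasonably large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Under the null hypothesis both variants share one conversion rate,
    # so the test statistic uses the pooled proportion.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("all-or-nothing conversions: test is undefined")
    z = diff / se_pooled

    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # The confidence interval uses the unpooled standard error.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return z, p_value, ci
```

Take 200 of 1,000 visitors converting on variant A and 250 of 1,000 on variant B. This gives z of about 2.68 and a p-value of about 0.007. The interval for the lift runs roughly from 0.01 to 0.09, so it excludes zero.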

Scooped by Dylan.Yeo

Apache HBase Internals: Locking and Multiversion Concurrency Control

Apache HBase provides a consistent and understandable data model to the user while still offering high performance.  In this blog, we’ll first discuss the guarantees of the HBase data model and how they differ from those of a traditional database.  Next, we’ll motivate the need for concurrency control by studying concurrent writes and then introduce a simple concurrency control solution.  Finally, we’ll study read/write concurrency control and discuss an efficient solution called Multiversion Concurrency Control.
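The read/write side of that story can be illustrated with a toy, single-process Python sketch. It rests on three assumptions:

- Each write is tagged with an increasing write number.
- A shared read point advances only past writes that have fully completed.
- A reader ignores any cell version newer than the read point it captured.

This mirrors the general MVCC idea, but `ToyMvccStore` and its methods are hypothetical. The per-row locks that serialize concurrent writers are omitted, and none of this is HBase's actual code.

```python
import threading
from typing import Dict, List, Set, Tuple


class ToyMvccStore:
    """Hypothetical in-memory store illustrating MVCC read points."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # guards the bookkeeping below
        self._next_write_no = 1
        self._read_point = 0
        self._in_flight: Set[int] = set()
        # (row, column) -> list of (write_no, value) versions
        self._cells: Dict[Tuple[str, str], List[Tuple[int, str]]] = {}

    def begin_write(self) -> int:
        with self._lock:
            write_no = self._next_write_no
            self._next_write_no += 1
            self._in_flight.add(write_no)
            return write_no

    def put(self, write_no: int, row: str, column: str, value: str) -> None:
        # The new version is stored immediately but stays invisible to
        # readers until the read point reaches write_no.
        with self._lock:
            self._cells.setdefault((row, column), []).append((write_no, value))

    def complete_write(self, write_no: int) -> None:
        with self._lock:
            self._in_flight.discard(write_no)
            # Move the read point to just below the oldest write still in
            # flight, so it never skips past an unfinished write.
            if self._in_flight:
                self._read_point = min(self._in_flight) - 1
            else:
                self._read_point = self._next_write_no - 1

    def get_row(self, row: str) -> Dict[str, str]:
        # For simplicity the whole read holds the lock; the read-point
        # filter is what hides partially applied writes.
        with self._lock:
            read_point = self._read_point
            result: Dict[str, str] = {}
            for (r, column), versions in self._cells.items():
                if r != row:
                    continue
                visible = [(wn, v) for wn, v in versions if wn <= read_point]
                if visible:
                    result[column] = max(visible)[1]  # newest visible version
            return result


if __name__ == "__main__":
    store = ToyMvccStore()
    w1 = store.begin_write()
    store.put(w1, "row1", "a", "1")
    store.put(w1, "row1", "b", "1")
    store.complete_write(w1)
    w2 = store.begin_write()
    store.put(w2, "row1", "a", "2")  # half of a two-column update
    print(store.get_row("row1"))     # the half-written update is not visible
```

A later write that finishes first (say w3) still stays hidden until the earlier in-flight w2 completes. That ordering is what keeps readers from ever observing a partially applied update.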

Scooped by Dylan.Yeo

Rest.li: RESTful Service Architecture at Scale | LinkedIn Engineering

Today we are announcing the open-sourcing of Rest.li, a piece of infrastructure developed and used here at LinkedIn. Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs. We feel that Rest.li fills a niche for building RESTful service architectures at scale, offering a developer workflow for defining data and REST APIs that promotes uniform interfaces, consistent data modeling, type-safety, and compatibility-checked API evolution.

Scooped by Dylan.Yeo

The Science of What We Do (and Don't) Know About Data Visualization

Visualization is easy, right? After all, it's just some colorful shapes and a few text labels. But things are more complex than they seem, largely due to the ways we see and digest charts, graphs, and other data-driven images. While scientifically backed studies do exist, there are actually many things we don't know about how and why visualization works. To help you make better decisions when visualizing your data, here's a brief tour of the research.

luiy's curator insight, April 29, 2013 5:18 AM

We have only scratched the surface here; there are many other metaphors used in visualization, whether obvious or not. Barbara Tversky and Jeff Zacks found in the early 2000s that lines imply transitions whereas bars imply individual values. The seemingly simple choice between a bar and a line chart has implications for how we perceive the data.

Bizarrely, so does gravity. In our work on metaphors, Ziemkiewicz and I found that people interpreted round shapes as unstable because, they said, they might roll away. But to roll, there must be a force that causes the movement. After studying this effect some more, we found that the points in a scatterplot attract each other, and that they are seemingly pulled down by gravity. We remember points not where they are in the plot, but shift them towards clusters in our memory, and let them drift slightly downwards.

Findings and distinctions in visualization can be subtle, but they can have a profound impact on how well we can read the information and how we interpret it. There is much more to be learned about how visualization works and how best we can represent, analyze, and communicate data.

Scooped by Dylan.Yeo

With Big Data, Context is a Big Issue | Wired.com

The Age of Context demands that contextual data be applied to everyday situations in useful ways. How do we make use of this data? Since we’ve gotten good at collecting data, now it’s all about putting it into context and making sense out of it – mining for the nuggets of insights that answer the “So What?” question. Data is meaningless and even cumbersome without context – the key holistic and interpretive lens through which data is filtered and turned into real information.

MTD's curator insight, April 30, 2013 11:48 AM

Data is nothing (in a sense) without context: the where, when, how - and why - of choice and action is critical. 

Scooped by Dylan.Yeo

GraphLab picks up $6.75m from Madrona and NEA to bolster its ‘Hadoop for graphs’

GraphLab-the-company wants to capitalize on the success of GraphLab-the-open-source-project by building a commercial product for applying advanced machine learning to massive graph datasets; at a high level, it describes its platform as “Hadoop but for graphs.” The company promises to continue actively supporting the open-source project.

Scooped by Dylan.Yeo

How-To: Run a MapReduce Job in CDH4 using Advanced Features

In this post, we’ll delve deeper into MapReduce programming and cover some of the framework’s more advanced features. In particular, we’ll explore:

- Combiner functions: a feature that lets you aggregate map outputs before they are passed to the reducer, which for certain types of jobs can greatly reduce the amount of data written to disk and sent over the network.
- Counters: a way to track how often user-defined events occur across an entire job. For example, you can count the number of bad records your MapReduce job encounters across all your data and have the total reported back to you, without any complex instrumentation on your part.
- Custom Writables: a way to go beyond the basic data types that Hadoop provides as keys and values for your mappers and reducers.
- MRUnit: a framework that facilitates unit testing of MapReduce programs.
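The combiner and counter ideas can be simulated in memory with a small Python sketch. This is not Hadoop's Java API: in Hadoop itself you would register a combiner with `Job.setCombinerClass` and increment counters through `context.getCounter(...)`. The job is a word count over hypothetical tab-separated `user_id<TAB>text` records. Lines that do not split into exactly two fields are treated as bad records and counted. Summing is associative and commutative, which is what makes it safe to apply the same aggregation once per map task and again in the reducer.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def mapper(line: str, counters: Counter) -> List[Pair]:
    # Hypothetical input format: "user_id<TAB>text". Anything else is bad.
    parts = line.split("\t")
    if len(parts) != 2:
        counters["BAD_RECORDS"] += 1  # user-defined counter instead of a crash
        return []
    return [(word, 1) for word in parts[1].split()]


def combiner(pairs: List[Pair]) -> List[Pair]:
    # Pre-aggregates ONE map task's output before the shuffle.
    partial: Counter = Counter()
    for word, n in pairs:
        partial[word] += n
    return list(partial.items())


def reducer(word: str, counts: List[int]) -> Pair:
    return word, sum(counts)


def run_job(splits: List[List[str]], use_combiner: bool = True):
    counters: Counter = Counter()
    shuffled: Dict[str, List[int]] = defaultdict(list)
    shuffled_records = 0
    for split in splits:  # one simulated map task per input split
        map_output: List[Pair] = []
        for line in split:
            map_output.extend(mapper(line, counters))
        if use_combiner:
            map_output = combiner(map_output)
        shuffled_records += len(map_output)  # records that would cross the network
        for word, n in map_output:
            shuffled[word].append(n)
    results = dict(reducer(w, c) for w, c in sorted(shuffled.items()))
    return results, counters, shuffled_records


if __name__ == "__main__":
    splits = [["u1\tthe cat the", "garbage"], ["u2\tthe dog"]]
    for flag in (False, True):
        print(flag, run_job(splits, use_combiner=flag))
```

With and without the combiner, the word totals should be identical. Only the number of records pushed through the simulated shuffle changes: 5 without the combiner and 4 with it.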

Scooped by Dylan.Yeo

Demo Code: Avro Kafka Storm Integration

This is demo code that shows how to use Kafka messaging with Avro as the message format. It uses a Storm topology, via KafkaSpout, as the Kafka consumer. The user story for this test is streaming events from the Internet that we want to process in Storm.
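Since the demo's messages are Avro-encoded, it may help to see what Avro's binary encoding looks like on the wire. The sketch below follows three rules from the Avro specification:

- A `long` is written as a zig-zag varint.
- A `string` is written as a `long` byte length followed by UTF-8 bytes.
- A record is its fields concatenated in schema order.

It applies them to a hypothetical `PageView` record with `timestamp: long` and `url: string`. The schema and function names are illustrative assumptions, not the demo's own code. A real producer or Storm bolt would use an Avro library rather than hand-rolled encoding.

```python
from typing import Dict, Tuple, Union


def zigzag_encode(n: int) -> int:
    # Maps signed to unsigned (0->0, -1->1, 1->2, -2->3, ...).
    # Assumes n fits in a signed 64-bit long, as Avro requires.
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    return (z >> 1) ^ -(z & 1)


def encode_long(n: int) -> bytes:
    # Varint: 7 bits per byte, low-order group first, high bit = "more bytes".
    z = zigzag_encode(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_long(buf: bytes, pos: int) -> Tuple[int, int]:
    # Returns (value, position just after the varint).
    shift = 0
    z = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return zigzag_decode(z), pos


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_string(buf: bytes, pos: int) -> Tuple[str, int]:
    length, pos = decode_long(buf, pos)
    end = pos + length
    return buf[pos:end].decode("utf-8"), end


# Hypothetical schema: {"type": "record", "name": "PageView", "fields":
#   [{"name": "timestamp", "type": "long"}, {"name": "url", "type": "string"}]}
def encode_page_view(timestamp: int, url: str) -> bytes:
    return encode_long(timestamp) + encode_string(url)


def decode_page_view(buf: bytes) -> Dict[str, Union[int, str]]:
    timestamp, pos = decode_long(buf, 0)
    url, pos = decode_string(buf, pos)
    return {"timestamp": timestamp, "url": url}
```

Note that the encoded bytes carry no field names or type tags. The consumer must know the writer's schema to decode a message, which is why Avro-over-Kafka setups have to agree on how schemas are shared.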
