EEDSP
Digital Signal Processing, Data Analytics, Big Data, HPC, Deep Learning, GPGPU, Distributed and Parallel Computing
Curated by Shiwon Cho

Big Data with Golang Instead of MapReduce

This is one of those software engineering ideas that I would normally warn you about. So many people use MapReduce that it seems foolhardy to use something else. But in this case, it turned out well. The project was a success, and we were able to accomplish our goals more quickly and with fewer resources than it would have taken with a MapReduce cluster.
Background
I work on 3D Warehouse for Trimble SketchUp (formerly Google). One of our focuses over the last year has been analytics, both for our customers and for our own internal use. Most business intelligence providers are expensive: anywhere from 100-500K per year. Even at that price point, it's still cheaper than engineering time, so that was originally the path we took.

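Trends in Distributed System Processing Models (From MapReduce to Borg)

In this post, I would like to examine the Borg paper published by Google from the perspective of processing models for large-scale distributed systems. In short, recent distributed systems in cloud environments, Borg included, are undergoing what amounts to an important paradigm shift.
From that processing-model perspective, I first summarize recent trends in distributed systems for cloud environments, and then use Borg to consider the latest currents in distributed systems.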

How-to: Translate from MapReduce to Apache Spark

Venerable MapReduce has been Apache Hadoop's workhorse computation paradigm since its inception. It is ideal for the kinds of work for which Hadoop was originally designed: large-scale log processing and batch-oriented ETL (extract-transform-load) operations.

As Hadoop’s usage has broadened, it has become clear that MapReduce is not the best framework for all computations. Hadoop has made room for alternative architectures by extracting resource management into its own first-class component, YARN. And so, projects like Impala have been able to use new, specialized non-MapReduce architectures to add interactive SQL capability to the platform, for example.

Today, Apache Spark is another such alternative, and is said by many to succeed MapReduce as Hadoop’s general-purpose computation paradigm. But if MapReduce has been so useful, how can it suddenly be replaced? After all, there is still plenty of ETL-like work to be done on Hadoop, even if the platform now has other real-time capabilities as well.



OptaPlanner - Can MapReduce solve planning problems?

To solve a planning or optimization problem, some solvers tend to scale out poorly: as the problem gains more variables and more constraints, they use a lot more RAM and CPU power. They can hit hardware memory limits at a few thousand variables and a few million constraint matches. One way their users typically work around such hardware limits is to use MapReduce. Let’s see what happens if we MapReduce a planning problem, such as the Traveling Salesman Problem.
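
As a rough illustration of what "MapReduce-ing" a Traveling Salesman Problem could look like, here is a minimal Python sketch. It is not taken from the linked article: the strip-based partitioning, the nearest-neighbor heuristic, and the names `nearest_neighbor_tour`, `tour_length`, and `mapreduce_tsp` are all assumptions chosen for brevity. The map step solves each partition independently and the reduce step simply concatenates the partial tours.

```python
import math


def nearest_neighbor_tour(cities):
    """Greedy heuristic: start at the first city, always visit the closest unvisited one."""
    if not cities:
        return []
    remaining = list(cities)
    tour = [remaining.pop(0)]
    while remaining:
        last = tour[-1]
        nearest = min(remaining, key=lambda city: math.dist(last, city))
        remaining.remove(nearest)
        tour.append(nearest)
    return tour


def tour_length(tour):
    """Length of the closed tour (returns to the starting city)."""
    return sum(
        math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour))
    )


def mapreduce_tsp(cities, partitions):
    """Hypothetical partitioned solve: split cities into strips by x coordinate,
    solve each strip on its own (map), then concatenate the sub-tours (reduce)."""
    ordered = sorted(cities)  # sort by x, then y
    size = max(1, math.ceil(len(ordered) / partitions))
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    sub_tours = [nearest_neighbor_tour(chunk) for chunk in chunks]  # map step
    return [city for sub_tour in sub_tours for city in sub_tour]   # reduce step


if __name__ == "__main__":
    cities = [(0, 0), (6, 0), (1, 0), (5, 0), (3, 4), (2, 7)]
    whole = nearest_neighbor_tour(sorted(cities))
    split = mapreduce_tsp(cities, partitions=2)
    print("single solve:", round(tour_length(whole), 2))
    print("partitioned: ", round(tour_length(split), 2))
```

Each partition is solved without any knowledge of the others, so comparing the two printed tour lengths on your own data gives a feel for how much the partition boundaries cost.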


MapReduce Patterns, Algorithms, and Use Cases

In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical...

Big Data Analytics Beyond Hadoop | Javalobby

Google’s seminal paper on Map-Reduce [1] was the trigger that led to a lot of developments in the big data space. Though the Map-Reduce paradigm was known in functional programming literature, the paper provided scalable implementations of the paradigm on a cluster of nodes.


An example of MapReduce with rmr2

R can be connected with Hadoop through the rmr2 package. The core of this package is the mapreduce() function, which lets you write custom MapReduce algorithms. The aim of this article is to show h...

Making Hadoop MapReduce Work with a Redis Cluster | Datastream

Redis is a very cool open-source key-value store that can add instant value to your Hadoop installation. Since keys can contain strings, hashes, lists, sets and sorted sets, Redis can be used as a front end to serve data out of Hadoop, caching your ‘hot’ pieces of data in-memory for fast access when they are needed again. By using a Java client called Jedis, you can ingest and retrieve data with Redis. Combining this simple client with the power of MapReduce will let you write and read data to and from Redis in parallel.


Sharing the knowledge : BIG DATA : HADOOP MAP REDUCE ALGORITHM

MapReduce consists of two methods: the map function and the reduce function. Let's look at the map method first. The map method accepts input in the form of key-value pairs. Let's see how it really works. The input to the map function will be in the below format
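
The excerpt stops before showing the input format, so the following is only a minimal in-memory Python sketch of the general idea, not the linked article's code. It assumes word count as the job and uses hypothetical names (`map_fn`, `reduce_fn`, `run_mapreduce`); the input records are assumed to be (line offset, line text) pairs.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: takes one (key, value) pair and emits intermediate (word, 1) pairs.
    The key (assumed here to be a line offset) is ignored."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: takes a key and all values emitted for it, returns one result pair."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle": group values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    records = [(0, "big data with hadoop"), (21, "hadoop map reduce")]
    print(run_mapreduce(records, map_fn, reduce_fn))
```

A real Hadoop job distributes the map and reduce calls across machines; the grouping step in `run_mapreduce` plays the role of the shuffle.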


Data deduplication tactics with HDFS and MapReduce - hadoopsphere.com

hadoopsphere.com: Data deduplication tactics with HDFS and MapReduce

Configuring and tuning MapReduce's shuffle - Ramblings of a distributed computing programmer

Once you have outgrown your small Hadoop cluster it’s worth tuning some of the shuffle configurables to ensure that your performance keeps up with the physical growth of your cluster. The figure below shows key configurables in the shuffle stage, and identifies those that should be tuned.


ScalaTest a MapReduce using Akka

For people in a hurry, here are the MapReduce with ScalaTest and Akka code and steps. I was trying to learn Scala and I wanted to kill several birds in one shot. Let me tell you, I am not disappointed, ...

Tetris: Multi-Resource Packing for Cluster Schedulers

Tetris is a cluster scheduler that packs, i.e., matches multi-resource task requirements with resource availabilities of machines.
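
To make "packing multi-resource task requirements onto machines" concrete, here is a small Python sketch of a greedy packer. It is an illustration under stated assumptions, not the Tetris implementation: it scores each candidate task by the dot product of its demand vector with the machine's free-resource vector, ignores everything else a production scheduler must handle (fairness, job completion time, task duration), and all names (`alignment`, `fits`, `pack`) are hypothetical.

```python
def alignment(demand, available):
    """Dot product of a task's demand vector and a machine's free-resource vector."""
    return sum(d * a for d, a in zip(demand, available))


def fits(demand, available):
    """A task fits only if every resource dimension is satisfied."""
    return all(d <= a for d, a in zip(demand, available))


def pack(tasks, machines):
    """Greedy packing sketch.

    tasks:    dict of task name -> demand tuple, e.g. (cpu_cores, mem_gb)
    machines: dict of machine name -> free-resource tuple in the same units
    Returns (placement, unplaced): task -> machine, and the tasks left over.
    """
    free = {name: list(capacity) for name, capacity in machines.items()}
    pending = dict(tasks)
    placement = {}
    for machine, available in free.items():
        while True:
            candidates = [
                (alignment(demand, available), name)
                for name, demand in pending.items()
                if fits(demand, available)
            ]
            if not candidates:
                break
            _, best = max(candidates)  # highest score; ties broken by task name
            demand = pending.pop(best)
            for i, amount in enumerate(demand):
                available[i] -= amount  # consume the machine's resources
            placement[best] = machine
    return placement, pending


if __name__ == "__main__":
    tasks = {"a": (2, 4), "b": (3, 2), "c": (2, 4)}
    machines = {"m1": (4, 8)}
    print(pack(tasks, machines))
```

In the usage example, tasks `a` and `c` together fill machine `m1` exactly in both dimensions, so `b` is left unplaced.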

MapReduce for C: Run Native Code in Hadoop

MapReduce for C (MR4C), an open source framework that allows you to run native code in Hadoop.


As MapReduce fades, Apache Spark is now a top-level project

Apache Spark, an in-memory data-processing framework, is now a top-level Apache project. That’s an important step for Spark’s stability as it increasingly replaces MapReduce in next-generation big data applications.

Databricks raises $14M from Andreessen Horowitz, wants to take on MapReduce with Spark

A team of professors behind the open source Spark and Shark in-memory big data projects has raised $13.9 million to commercialize the products via a company called Databricks.

MapReduce C++ Library | Craig Henderson

The MapReduce C++ Library implements a single-machine platform for programming using the Google MapReduce idiom.


Big Data at Torbit: Custom MapReduce-like System

Tylor Arndt on Torbit’s “build-your-own-MapReduce”: The final system begins with a web-service against which client systems interface. To ensure resiliency, an instance of the web-service runs on each cluster host.

The Family of MapReduce and Large Scale Data Processing Systems

A paper from an Australian team providing a comprehensive survey of the family of MapReduce large scale data processing mechanisms and techniques to improve the performance and capabilities of...

The Architecture of a Credit Card Analysis Platform: Using Project Voldemort, Elastic MapReduce, Pangool

Ivan de Prado and Pere Ferrera on HighScalability.com: The solution we developed has an infrastructure cost of just a few thousand dollars per month thanks to the use of the cloud (AWS), Hadoop and Voldemort.

Why The Time is Right for MapReduce Design Patterns

What is a MapReduce design pattern? Well, it’s all of the things above, but in the context of MapReduce. It is a rather constraining framework where you have to express your solutions in terms of “map” and “reduce”. In return, you get the benefits of abstracted parallelism and fault tolerance. The paradigm may be limiting, but it is far easier to work with: the list of different ways to solve problems is relatively short in comparison to object-oriented patterns.


Building a Naive Bayes Classifier in the Browser Using Map-Reduce | Architects Zone

The last decade of JavaScript performance improvements in the browser provides exciting possibilities for distributed computing...