EEDSP
Digital Signal Processing, Data Analytics, Big Data, HPC, Deep Learning, GPGPU, Distributed and Parallel Computing
Curated by Shiwon Cho

Deep Learning and GPU Acceleration in Hadoop 3.0 - Hortonworks

Recently, Raj Verma (President & COO of Hortonworks) spoke with Jim McHugh of Nvidia during the DataWorks Summit keynote in San Jose. Jim began by talking about how the parallel processing used in gaming is also essential to Deep Learning.


Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment.

 



HBase ZK-less Region Assignment : Apache HBase

Recently, we changed how HBase assigns regions. This architectural change is referred to as ZK-less region assignment, i.e. assigning regions without involving ZooKeeper. The change allows us to achieve greater scale as well as faster startups and assignment. It is simpler, has less code, and improves the speed at which assignments run, so we can do faster rolling restarts. The master has also been re-architected to handle more regions. This feature will be on by default in HBase 2.0.0.



R & hadoop: Install Hadoop 2.5 in ubuntu 14.04 as well as RHadoop.

This article describes the step-by-step approach to install Hadoop/YARN 2.4.0 and R


The Future of Apache Storm: Secure, Highly-Available, Multi-Tenant

YARN and Apache Storm: A Powerful Combination
YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside …

The Future of Apache Ambari

It’s been a busy year for Apache Ambari. Keeping up with the rapid innovation in the open community certainly is exciting. We’ve already seen six releases this year to maintain a steady drumbeat of new features and usability guardrails. We have also seen some exciting announcements of new folks jumping into the Ambari community.
With all these releases and community activities, let’s take a break to talk about how the broader Hadoop community is affecting Ambari and how this is influencing what you will see from Ambari in the future.
Take a Look Around
To talk about the future of Ambari, we have to recognize what is happening outside of Ambari in the Hadoop community. We have to talk about Apache Hadoop YARN.
YARN is the operating system for data processing, making it possible to bring multiple workloads and processing engines to the data stored in Apache Hadoop 2.…

New in CDH 5.1: Hue’s Improved Search App

Hue 3.6 (now packaged in CDH 5.1) has brought the second version of the Search App up to even higher standards. The user experience has been greatly improved, as the app now provides a very easy way to build custom dashboards and visualizations.


4 reasons why Spark could jolt Hadoop into hyperdrive

Apache Spark might push MapReduce to the back burner faster than some people might like, but it will also boost the overall Hadoop ecosystem. The project’s co-creator Matei Zaharia explains why Spark is so popular now and where it fits into the big data ecosystem.

Facebook HydraBase adds reliability to Hadoop's HBase

Facebook revamps its HBase implementation for faster recovery from downtime and higher reliability

10 parameters for Big Data networks - hadoopsphere.com

Big Data and Hadoop clusters involve heavy volumes of data and, in many instances, high velocity in bursty traffic patterns. With these clusters making inroads into enterprise data centers, network designers have a few more requirements to take care of. Listed below are 10 parameters to evaluate while designing a network for a Big Data and Hadoop cluster.


Intel kills a Hadoop and feeds another

I seriously doubt you could have missed the 2nd part of this, but here’s the shortest executive summary:
• Intel has killed its own distribution of Hadoop — is there anyone that would disagree this is...


Big Data Basics: Hadoop, MapReduce, Hive, Pig, & Spark

Learn about some of the most popular big data analysis frameworks in commercial use - and look at some real code! - Free Course

Deeplearning4j - Open-source, distributed deep learning for the JVM

Open-Source Deep-Learning Software for Java and Scala on Hadoop and Spark

Tachyon Overview - Tachyon 0.6.4 Documentation

Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory, thereby avoiding trips to disk to load datasets that are frequently read. This enables different jobs, queries, and frameworks to access cached files at memory speed.

Heterogeneous Storage Policies in HDP 2.2 - Hortonworks

The Hadoop Distributed File System (HDFS) is the reliable and scalable data storage core of the Hortonworks Data Platform (HDP). In HDP, HDFS and YARN combine to form the distributed operating system for your data platform, providing resource management for diverse workloads and scalable data storage for the next generation of analytical applications. In this …

How-to: Translate from MapReduce to Apache Spark

Venerable MapReduce has been Apache Hadoop’s work-horse computation paradigm since its inception. It is ideal for the kinds of work for which Hadoop was originally designed: large-scale log processing, and batch-oriented ETL (extract-transform-load) operations.

As Hadoop’s usage has broadened, it has become clear that MapReduce is not the best framework for all computations. Hadoop has made room for alternative architectures by extracting resource management into its own first-class component, YARN. And so, projects like Impala have been able to use new, specialized non-MapReduce architectures to add interactive SQL capability to the platform, for example.

Today, Apache Spark is another such alternative, and is said by many to succeed MapReduce as Hadoop’s general-purpose computation paradigm. But if MapReduce has been so useful, how can it suddenly be replaced? After all, there is still plenty of ETL-like work to be done on Hadoop, even if the platform now has other real-time capabilities as well.



Hadoop YARN Installation: The definitive guide

This article guides you in the installation of the new generation Hadoop based on YARN. It is based on the most recent version of Hadoop at the time of this writing (2.2.0) and includes HDFS, YARN and MapReduce configurations for both single-node and cluster environments.

Hadoop Cluster – Architecture and Core Components

Hadoop Architecture and Core Components - This article explains the architecture of Hadoop and its core components.

Apache Hadoop Operations at Scale

This blog curates Hadoop Summit San Jose sessions and tracks aimed at Hadoop operations and management at scale.

Apache Spark Resource Management and YARN App Models

A concise look at the differences between how Spark and MapReduce manage cluster resources under YARN

The most popular Apache YARN application after MapReduce itself is Apache Spark. At Cloudera, we have worked hard to stabilize Spark-on-YARN (SPARK-1101), and CDH 5.0.0 added support for Spark on YARN clusters.

In this post, you’ll learn about the differences between the Spark and MapReduce architectures, why you should care, and how they run on the YARN cluster ResourceManager.


Hydra Takes On Hadoop

The social-networking company AddThis open-sourced Hydra under the Apache License, Version 2.0, in a recent announcement. Hydra grew from an in-house platform created to process semi-structured social data as live streams and to run efficient queries on those data sets.