BigStuff
Rescooped by Erwin J. van Eijk from Bigdata Analytics Platform
onto BigStuff

storm at twitter

Talk given at Facebook's Analytics@Webscale conference. Covers Storm basics, system overview, architecture at Twitter, and current use cases. Featuring Hadoop


Via Sylvain Kalache, Taehui Hong
Erwin J. van Eijk's insight:

I like Storm.

Rescooped by Erwin J. van Eijk from BigData Hadoop Ecosystem

How-to: Translate from MapReduce to Apache Spark

The key to getting the most out of Spark is to understand the differences between its RDD API and the original Mapper and Reducer API.
Venerable MapReduce has been Apache Hadoop's workhorse computation paradigm since its inception.

Via Charles Gerth
Scooped by Erwin J. van Eijk

Spark 1.1.0 released | Apache Spark

Apache Spark 1.1 released with contributions from 171 developers! http://t.co/vif7h0Uh13

Announcing Apache Pig 0.13.0

The Apache Pig community released Pig 0.13.0 earlier this month. Pig uses a simple scripting language to perform complex transformations on data stored in Apache Hadoop.

Securing Hadoop: What Are Your Options?

The open source community, including Hortonworks, has invested heavily in building enterprise-grade security for Apache Hadoop.

Using Ambari to Manage YARN Enabled Applications

Apache Ambari is an open operational framework to provision, manage and monitor Hadoop clusters.

Hadoop Map Reduce Vs. Apache Spark & Scala

Watch the sample class recording: http://www.edureka.co/apache-spark-scala-training?utm_source=youtube&utm_medium=referral&utm_campaign=mapreduce-vs-spark Ap...

Apache Spark is Hadoop's speedy Swiss Army knife:

Introduction to Spark and use cases : https://www.youtube.com/watch?v=o8Jy7ii4Uks (@ApacheSpark is Hadoop's speedy Swiss Army knife: http://t.co/72vCleRaZt)...

Apache Hive on Apache Spark: Motivations and Design Principles

Apache Hive on Apache Spark: Motivations and Design Principles http://t.co/HJ5sR2ibCk via @sharethis

How-to: Install a Virtual Apache Hadoop Cluster with Vagrant and ...

There are some tutorials and repositories available for installing a local virtualized cluster, but none of them did what I wanted to do: install the bare cluster using Vagrant, and install the Hadoop stack using the Cloudera ...

Making Apache Spark YARN Ready

Spark on YARN
Hadoop 2 and its YARN-based architecture have ushered in a new wave of innovation in and around Hadoop. One technology benefiting from this maturation is Apache Spark.
Erwin J. van Eijk's insight:

I'm becoming quite a fan of Spark. Combined with YARN, it's pretty good.


Discover HDP 2.1: Apache Storm for Stream Data Processing in Hadoop

We recently hosted the sixth of our seven Discover HDP 2.1 webinars, entitled Apache Storm for Stream Data Processing in Hadoop. Over 200 people attended the webinar and joined in the conversation.

Big data startup Databricks is now certifying applications for Spark ...

Databricks, the company behind the commercialization of the Apache Spark data-processing framework, is certifying third-party software to run on the platform. Spark is gaining popularity as a faster, easier alternative to ...

Palantir: big data needs to get even more abstract(ions) — Tech ...

Computers became way more productive when operating systems allowed computers to run different programs without having to be re-jiggered. Palantir thinks the same thing is about to happen to data.

Spark on Mesos - YouTube

From #Mesoscon 2014, Paco Nathan from Databricks talks about handling big data in a distributed, multi-tenant environment with Spark on Mesos.

The 10 worst big data practices | JavaWorld

... measured in single-digit kilobytes. Hadoop does best on large sets of relatively flat data. I'm sure you can create an extract that's more denormalized.
Erwin J. van Eijk's insight:

10 out of 10 right.


Seminar Series: Hadoop and Modern Data Architecture

A transformation is occurring in the data center. Enterprises are turning to a modern data architecture in order to derive maximum value from both big and small data across their organization. They are building new analytic apps that unlock...

Announcing HDP Tech Preview Component: Apache Kafka

We are excited to announce that Apache Kafka 0.8.1.1 is now available as a technical preview with Hortonworks Data Platform 2.1. Kafka was originally developed at LinkedIn and incubated as an Apache project in 2011.
Erwin J. van Eijk's insight:

Kafka rocks.


An Exciting Year for Spark - insideBIGDATA

Apache Spark has had an amazing year, and the people behind the open source large-scale data processing engine have pulled some data to show just how fast it has grown in the last 12 months.
Rescooped by Erwin J. van Eijk from Scalastic

Quick Start - Spark 1.0.2 Documentation

Item 3 of the @TheASF Apache Spark quick start throws an exception. Yay, usable software (Win8/JDK8). #DidYouEverTest http://t.co/LRRyK6ycRN
Erwin J. van Eijk's curator insight, September 7, 8:50 AM

I like spark, can't help it.

Rescooped by Erwin J. van Eijk from Scalastic

Evaluating Apache Spark and Twitter Scalding

In general, there are two classes of frameworks to consider when building a machine learning system: in-memory and disk-based.

Apache Spark and Cassandra | Planet Cassandra

What is Apache Spark? The standard description of Apache Spark is that it’s ‘an open source data analytics cluster computing framework’. Another way to define Spark is as a VERY fast in-memory data-processing framework – like lightning fast.

USENIX researchers get a grip on Hadoop performance | IT News

Now that big data technologies like Apache Hadoop are moving into the enterprise, system engineers must start building models that can estimate how much work these distributed data processing systems can do and how ...

Project Rhino Goal: At-Rest Encryption for Apache Hadoop ...

Although network encryption has been provided in the Apache Hadoop platform for some time (since Hadoop 2.0.2-alpha/CDH 4.1), at-rest encryption, the encryption of data stored on persistent storage such as disk, is not.

Now Available: HDP Advanced Security (via XA Secure)

Two months ago, we announced the acquisition of XA Secure, and at that time we stated that the software would be generally available by the end of June.

Congratulations to Leslie Lamport on winning the 2013 Turing Award

Hortonworks would like to congratulate Leslie Lamport on winning the 2013 Turing Award given by the Association for Computing Machinery. This award is essentially the equivalent of the Nobel Prize for computer science.