Huge Data Handling

Scooped by Franck Berkowicz

Gigaom Research webinar: Apache Hadoop: Is one cluster enough?

In this Gigaom Research webinar, the panel will discuss how the multi-cluster approach can be implemented in real systems, and whether and how it can be made to work. The panel will also talk about best practices for implementing the approach in organizations. Join Gigaom Research and our sponsor WANdisco on October 15.

Scooped by Franck Berkowicz

Discover How Telcos Do Hadoop

Discover how the world's largest telcos adopt a Modern Data Architecture based on Hortonworks Data Platform for improved service, security and sales

Scooped by Franck Berkowicz

20 short tutorials all data scientists should read (and practice)

We are now at 20, up from 17. I hope I find the time to write a one-page survival guide for UNIX, Python and Perl. Here's one for R. The links to core data sci…

Scooped by Franck Berkowicz

Introducing the Pig Cheat Sheet

John Matson
There’s an old saying that explains why you should never wrestle with a pig. The adage, sometimes attributed to playwright George Bernard Shaw, warns that “you get dirty, and besides, the...

Rescooped by Franck Berkowicz from BigData Hadoop Ecosystem

How-to: Make Hadoop Accessible via LDAP

Integrating Hue with LDAP can help make your secure Hadoop apps as widely consumed as possible.

Via Charles Gerth

Scooped by Franck Berkowicz

Adding ACID to Apache Hive - Hortonworks

Discussion on a general approach for incorporating ACID transactions in Apache Hive for Hadoop.

Scooped by Franck Berkowicz

Apache Tez: A New Chapter in Hadoop Data Processing - Hortonworks

First in a series of posts around Apache Tez, its architecture and performance benefits for MapReduce in Hadoop.

Scooped by Franck Berkowicz

Cheat Sheet: How To Work with Hive Functions in Hadoop - Hortonworks

A cheat sheet for working with Hive User Defined Functions (UDFs) in Hadoop for data processing from Hortonworks and Qubole.

Scooped by Franck Berkowicz

How to Configure YARN and MapReduce 2 in Hortonworks Data Platform 2.0

This post shows how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment. This will cover YARN and MapReduce 2.

Scooped by Franck Berkowicz

ORCFile in HDP 2: Better Compression, Better Performance - Hortonworks

The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the forms of higher compression and better query performance.

Scooped by Franck Berkowicz

Cisco UCS Common Platform Architecture (CPA) for Big Data with Cloudera [Design Zone for Data Centers]

Franck Berkowicz's insight:

This document may be a useful source of network design techniques.

Scooped by Franck Berkowicz

A Few Useful Things to Know about Machine Learning

Franck Berkowicz's insight:

I never read enough of these...

Scooped by Franck Berkowicz

If you like machine learning, Scala and Hadoop, Tresata has a treat for you

Hadoop-based analytics startup Tresata last week open sourced a set of machine learning libraries built on Scalding and designed to run in Hadoop and make use of the Apache Mahout project.
Franck Berkowicz's insight:

Open source Scala Machine Learning library

Rescooped by Franck Berkowicz from BigData Hadoop Ecosystem

NoSQL for Telco - Ericsson Research Blog

Can NoSQL technologies meet the specific requirements that apply to the telco domain?

Via Charles Gerth

Scooped by Franck Berkowicz

HydraBase – The evolution of HBase@Facebook

When we revamped Messages in 2010 to integrate SMS, chat, email and Facebook Messages into one inbox, we built the product on open-source Apache HBase, a distributed key value data store running on top of HDFS, and extended it to meet our requirements. At the time, HBase was chosen as the underlying durable data store because it provided the high write throughput and low latency random read performance necessary for our Messages platform. In addition, it provided other important features, including horizontal scalability, strong consistency, and high availability via automatic failover. Since then, we’ve expanded the HBase footprint across Facebook, using it not only for point-read, online transaction processing workloads like Messages, but also for online analytics processing workloads where large data scans are prevalent. Today, in addition to Messages, HBase is used in production by other Facebook services, including our internal monitoring system, the recently launched Nearby Friends feature, search indexing, streaming data analysis, and data scraping for our internal data warehouses.
Franck Berkowicz's insight:

Inter-Datacenter HBase HA from Facebook

Scooped by Franck Berkowicz

Impala and ANSI-92 SQL on Hadoop

The origins of Impala can be found in F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business. One of many differences between MapReduce and Impala is in Impala the intermediate d...

Scooped by Franck Berkowicz

The Apache Software Foundation Announces Apache™ Hadoop™ 2 : The Apache Software Foundation Blog

Franck Berkowicz's insight:

The official birth certificate we've been waiting for

Scooped by Franck Berkowicz

Next Generation Hadoop: It's Not Just Batch! | Javalobby

In my JavaOne talk last week, I presented changes that are happening in Hadoop: It’s shaking off its batch-based shackles and enabling a new Hadoop...

Rescooped by Franck Berkowicz from Code: Big Data

Hadoop Falcon and Data Lifecycle Management

Apache Falcon is an open source data processing and management solution for the Hadoop ecosystem. It simplifies the management of data by enabling users to define infrastructure endpoints (e.g., clusters, HBase, databases, HCatalog), logical tables/feed/datasets (e.g., location, permissions, source, retention limits, replication targets) and processing rules (e.g., inputs, outputs, schedule, business logic) as configurations.

Via Jose Menes

Scooped by Franck Berkowicz

Apache Hadoop YARN, NameNode HA, HDFS Federation

Introduction to new features available in new versions of Apache Hadoop: YARN, NameNode HA, HDFS Federation.

Scooped by Franck Berkowicz

Hadoop 2.0 and YARN

In this talk, Abhijit Lele from Hortonworks discusses YARN architecture and how to get started developing for the next generation of Hadoop.

Scooped by Franck Berkowicz

In-Stream Big Data Processing

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea got a lot of traction and a whole bunch of solutions like Twitter’s Storm, Yahoo’s S4, Cloudera’s Impala, Apache Spark, and Apache Tez appeared and joined the army of Big Data and NoSQL systems. This article is an effort to explore techniques used by developers of in-stream data processing systems, trace the connections of these techniques to massive batch processing and OLTP/OLAP databases, and discuss how one unified query engine can support in-stream, batch, and OLAP processing at the same time.

Scooped by Franck Berkowicz

Building LinkedIn University Pages | LinkedIn Engineering

We're seeking intelligent problem solvers who are inspired and motivated to change the world.
Franck Berkowicz's insight:

Might be useful soon.

Scooped by Franck Berkowicz

Announcing Parquet 1.0: Columnar Storage for Hadoop | Twitter Blogs

In March we announced the Parquet project, the result of a collaboration between Twitter and Cloudera intended to create an open-source columnar storage format library for Apache Hadoop. We’r......
Franck Berkowicz's insight:

Impressive columnar storage format for Hadoop. I've spent some time testing a pre-1.0 version combined with Impala 1.0. Really fast and impressive.

 
