Big Data & Hadoop
53 views | +0 today
Scooped by Hien Luu

David Petraeus At Bilderberg to Craft “Big Data” Spy Grid

Paul Joseph Watson | Former CIA director helping to bolster the same surveillance system that brought him down (http://t.co/6rGcat2tdT).
Rescooped by Hien Luu from Big Data & Hadoop

How R Grows


Saturday morning I was drinking my coffee wondering how much effort goes into R worldwide. (It’s my job.) I noticed that there were 4,469 packages on CRAN, and it occurred to me that tabulating the packages by publication date would give some indication of how much effort is being expended to improve packages and keep them up to date. With very little work at all I was able to read the table on the Available CRAN packages by date of publication page and produce this plot.

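The tabulation the post describes — counting packages by publication year — can be sketched as follows. The dates below are made-up placeholders, not real CRAN data, and the post itself used R rather than Python:

```python
from collections import Counter
from datetime import date

# Illustrative publication dates standing in for the CRAN
# "packages by date of publication" table the author scraped.
pub_dates = [
    date(2010, 3, 14), date(2011, 7, 2), date(2012, 1, 9),
    date(2012, 11, 30), date(2013, 4, 21), date(2013, 5, 5),
]

# Tabulate packages by publication year, as in the post's plot.
by_year = Counter(d.year for d in pub_dates)
for year in sorted(by_year):
    print(year, by_year[year])
```

With the real CRAN table the same two lines of counting logic yield the growth curve the author plotted.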
Rescooped by Hien Luu from Big Data & Hadoop

So What? – Monitoring Hadoop beyond Ganglia | Architects Zone


Over the last couple of months I have been talking to more and more customers who are either bringing their Hadoop clusters into production or who have already done so and are now getting serious about operations. This leads to some interesting discussions about how to monitor Hadoop properly, and one question pops up quite often: do they need anything beyond Ganglia? If so, what?

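One common starting point beyond Ganglia's host metrics is the JSON that Hadoop daemons expose through their `/jmx` web servlet. A minimal sketch of consuming it is below; the payload is a trimmed, illustrative sample rather than output from a real cluster, and the exact beans and ports vary by Hadoop version and daemon:

```python
import json

# Sample of the JSON a Hadoop daemon's /jmx servlet returns
# (e.g. the NameNode web UI plus "/jmx"); trimmed and
# illustrative, not captured from a real cluster.
jmx_payload = json.dumps({
    "beans": [
        {"name": "Hadoop:service=NameNode,name=FSNamesystemState",
         "CapacityRemaining": 123456789,
         "NumLiveDataNodes": 4},
    ]
})

# Pull out a couple of NameNode health metrics that a generic
# host-level view would not surface on its own.
beans = json.loads(jmx_payload)["beans"]
fs = next(b for b in beans if b["name"].endswith("FSNamesystemState"))
print(fs["NumLiveDataNodes"], fs["CapacityRemaining"])
```

In practice the payload would come from an HTTP GET against each daemon, and the point of the article stands: these service-level metrics still need to be correlated with job- and application-level context.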
Rescooped by Hien Luu from Big Data & Hadoop

What’s New in Hue 2.3


Hue 2.3 comes only two months after 2.2 but contains more than 100 improvements and fixes. In particular, two new apps were added (including an Apache Pig editor) and the query editors are now easier to use.

Scooped by Hien Luu

WANdisco Hadoop Expert Presenting at Big Data Cloud Today! | Press Releases | News | WANdisco

With WANdisco's patented active-active replication technology, enterprises have non-stop access to their crucial data, regardless of location, allowing faster software development cycles and continuous access to Big Data.
Rescooped by Hien Luu from Big Data & Hadoop

How Scaling Really Works in Apache HBase


At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.

 

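The point that the master is not on the request path can be made concrete with a simplified routing model: clients learn region boundaries (from the META table), then send reads and writes directly to the region server owning the key. The names and ranges below are illustrative, not HBase's actual internals:

```python
import bisect

# Simplified model of HBase region routing. Each region owns a
# contiguous row-key range; the owning region is the last one
# whose start key is <= the row key. The master never sees the
# request. Server names and boundaries here are made up.
region_start_keys = ["", "g", "p"]          # sorted region start keys
region_servers = ["rs1:60020", "rs2:60020", "rs3:60020"]

def server_for(row_key: str) -> str:
    # Binary-search the sorted start keys for the owning region.
    i = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_servers[i]

print(server_for("apple"))   # rs1:60020
print(server_for("mango"))   # rs2:60020
print(server_for("zebra"))   # rs3:60020
```

Real clients cache this boundary information and refresh it when regions split or move, which is how the cluster scales without funneling traffic through the master.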
Rescooped by Hien Luu from Big Data & Hadoop

Using Impala to Query HBase

Impala can use the HBase client API via Java Native Interface (JNI) to query data stored in HBase. This querying does not read HFiles directly.

 

Querying depends on creating a Hive table and an HBase table and then establishing a mapping from HBase to Hive. For this mapping to function, each row key must either be a string or mapped to a string column. Once this mapping is established, string value columns in the Hive or Impala tables can be used to construct predicates. While it is always possible to scan the entire table, using string value column-based predicates to find a specific subset of information is almost always more efficient and useful.

 

Query predicates are applied to row keys as start and stop keys, thereby limiting the scope of a particular lookup. If row keys are not mapped to string columns, then ordering is typically incorrect and comparison operations do not work; for example, evaluating greater-than (>) or less-than (<) predicates will not give the expected results.

 

Predicates on non-key columns can be sent to HBase as SingleColumnValueFilters, providing some performance gains. In such a case, HBase returns fewer rows than if those same predicates were applied by Impala. While there is some improvement, it is not as great as when start and stop rows are used, because the number of rows HBase must examine is not limited. As long as a row key predicate applies to only a single row, HBase will locate and return that row directly; conversely, if a non-key predicate is used, even if it matches only a single row, HBase must still scan the entire table to find the correct result.

 

Once the mapping between HBase and Hive is established, Impala supports doing joins of HBase and non-HBase tables. This allows you to construct queries that act upon data stored in different structures.

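The difference between a row-key predicate (which bounds the scan) and a non-key predicate (which filters a full scan) can be sketched with a toy in-memory table. This is an illustration of the concept only — the names are invented and this is not the Impala planner's or HBase's actual code:

```python
# Toy model: row_key -> {column: value}, sorted like an HBase table.
table = {
    "a1": {"city": "Oslo"},
    "b2": {"city": "Lima"},
    "c3": {"city": "Oslo"},
}

def scan(start=None, stop=None, column_filter=None):
    """Return (matching_keys, rows_examined) for a bounded scan."""
    rows_examined = 0
    out = []
    for key in sorted(table):
        if start is not None and key < start:
            continue            # start key: seek past, never examined
        if stop is not None and key >= stop:
            break               # stop key: scan ends early
        rows_examined += 1
        col, val = column_filter if column_filter else (None, None)
        if col is None or table[key].get(col) == val:
            out.append(key)
    return out, rows_examined

# Row-key predicate "key >= 'b'" becomes a start key: 2 rows examined.
print(scan(start="b"))                        # (['b2', 'c3'], 2)
# Non-key predicate: every row examined, filter applied per row.
print(scan(column_filter=("city", "Oslo")))   # (['a1', 'c3'], 3)
```

The second call mirrors the SingleColumnValueFilter case described above: fewer rows come back, but the whole table is still walked.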
Scooped by Hien Luu

With Impala now GA, Cloudera's CEO sizes up the SQL-on-Hadoop market

Cloudera’s Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his read on the competitive landscape.