Big Data Technology, Semantics and Analytics
Trends, success and applications for big data including the use of semantic technology
Curated by Tony Agresta
Scooped by Tony Agresta!

What's the Scoop on Hadoop?

If you are an investor in the field of Big Data, you must have heard the terms “Big Data” and “Hadoop” a million times.  Big Data pundits use the terms interchangeably and conversations might lead you to believe that...
Tony Agresta's insight:

"Hadoop is not great for low latency or ad-hoc analysis and it’s terrible for real-time analytics."

In a webcast today, Matt Aslett from 451 Research and Justin Makeig from MarkLogic presented a wealth of information about Hadoop, including how it's used today and how MarkLogic extends it. When the video becomes available, I'll post it. In the meantime, the quote above from the Forbes article echoes what the speakers discussed.

Today, Hadoop is used to store, process and integrate massive amounts of structured and unstructured data. It is typically one part of a data architecture that may also include relational databases, NoSQL, search and even graph databases. Organizations can bulk load data into the Hadoop Distributed File System (HDFS) and process it with MapReduce. YARN is a technology that's starting to gain traction; it enables multiple applications to run on top of HDFS and process data in many ways. But it's still at an early stage.
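
For readers who have not seen the MapReduce pattern, here is a minimal sketch of its three steps (map, shuffle, reduce) as a word count. It is plain Python running in a single process, not Hadoop's own API, and the function names (map_phase, shuffle, reduce_phase, run_job) are my own for illustration. On a real cluster the framework performs the shuffle and spreads the work across many machines reading from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is simulated in memory.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["big data hadoop", "hadoop stores big data"]
    print(run_job(lines))
```

The point of the sketch is the batch shape of the work: every record is read, grouped and summarized before any answer comes back, which is why this model suits bulk processing better than low-latency queries.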

What's missing? Real-time applications. And that's an understatement: reliability and security have also been question marks, and support for SQL-based analytics is limited. Complex configuration also makes Hadoop difficult to deploy.

MarkLogic allows users to deploy an Enterprise NoSQL database into an existing Hadoop implementation and offers many advantages including:

  • Real time access to your data
  • Less data movement
  • Mixed workloads within the same infrastructure
  • Cost effective long term storage
  • The ability to leverage your existing infrastructure

Because all of your MarkLogic data, including indexes, can be stored in HDFS, you can combine local storage for active, real-time results with lower-cost tiered storage (HDFS) for data that's less relevant or needs additional processing. MarkLogic allows you to partition your data, then rebalance and migrate the partitions interactively.
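
To make the tiering idea concrete, here is a small sketch of one possible placement rule: recent partitions stay on local storage and older ones move to HDFS. This is not MarkLogic's API or its actual policy. The function names, the tier labels "local" and "hdfs", the date-based partitions and the 90-day cutoff are all assumptions chosen only for illustration.

```python
from datetime import date, timedelta


def assign_tier(partition_end, today, active_days=90):
    """Return 'local' for a recent partition, 'hdfs' for an older one.

    active_days is a hypothetical cutoff, not a product default.
    """
    age = today - partition_end
    return "local" if age <= timedelta(days=active_days) else "hdfs"


def plan_tiers(partitions, today, active_days=90):
    """Map each partition name to the storage tier it would be placed on.

    partitions: dict of partition name -> last date covered by that partition.
    """
    return {
        name: assign_tier(end, today, active_days)
        for name, end in partitions.items()
    }


if __name__ == "__main__":
    partitions = {
        "2013-Q2": date(2013, 6, 30),
        "2012-Q4": date(2012, 12, 31),
    }
    print(plan_tiers(partitions, today=date(2013, 7, 19)))
```

In practice the rule could just as well be based on query frequency or business relevance instead of age; the sketch only shows that the placement decision is made per partition.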

What does this mean for you? You can optimize costs, performance and availability while also satisfying the needs of the business in the form of real-time analytics, alerting and enterprise search. You can take data offline and then bring it back instantly, since it's already indexed. You can still process your data using batch programs in Hadoop, but now all of this is done in a shared infrastructure.

To learn more about MarkLogic and Hadoop, visit this Resource Center

When the video is live, I'll send a link out.

Bryan Borda's curator insight, July 19, 2013 11:39 AM

Excellent information on advantages to using NoSQL technology with a Hadoop infrastructure.  Take advantage of the existing Hadoop environment by adding powerful NoSQL features to enhance the value.

Scooped by Tony Agresta!

MarkLogic Server - Technology Preview: HDFS Storage

An introduction to the HDFS Storage feature available as a technology preview from MarkLogic.
Tony Agresta's insight:

Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.

  • Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone.
  • Save money by leveraging common infrastructure. Using MarkLogic and Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.
  • Enterprise-class support for Hadoop. Our partnership with Hortonworks provides a strong, supported platform for building enterprise-class Big Data Applications with Apache Hadoop.
