If you are an investor in the field of Big Data, you must have heard the terms “Big Data” and “Hadoop” a million times. Big Data pundits use the terms interchangeably and conversations might lead you to believe that...
Scooped by Tony Agresta
"Hadoop is not great for low latency or ad-hoc analysis and it’s terrible for real-time analytics."
In a webcast today, Matt Aslett from 451 Research and Justin Makeig from MarkLogic presented a wealth of information about Hadoop, including how it's used today and how MarkLogic extends it. I'll post the video when it becomes available. In the meantime, the quote above from the Forbes article echoes what the speakers discussed.
Today, Hadoop is used to store, process and integrate massive amounts of structured and unstructured data. It is typically one part of a data architecture that may also include relational databases, NoSQL, search and even graph databases. Organizations can bulk load data into the Hadoop Distributed File System (HDFS) and process it with MapReduce. YARN is starting to gain traction as a way to let multiple applications run on top of HDFS and process data in many ways, but it's still at an early stage.
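For readers new to the MapReduce model mentioned above, here is a minimal sketch of the idea in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample lines are my own illustrative assumptions. It simply shows the batch pattern: map each record to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the whole toy job in memory (hypothetical helper, not a Hadoop call)."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    lines = ["Hadoop stores data", "Hadoop processes data in batch"]
    print(run_job(lines))
```

On a real cluster the map and reduce steps run in parallel across many machines over files in HDFS, and the whole job runs as a batch, which is why this model is a poor fit for low-latency queries.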
What's missing? Real-time applications. And that's an understatement, since reliability and security have also been question marks, along with limited support for SQL-based analytics. Complex configuration also makes Hadoop difficult to put to work.
MarkLogic allows users to deploy an Enterprise NoSQL database into an existing Hadoop implementation and offers many advantages, including:
- Real-time access to your data
- Less data movement
- Mixed workloads within the same infrastructure
- Cost-effective long-term storage
- The ability to leverage your existing infrastructure
Since all of your MarkLogic data, including indexes, can be stored in HDFS, you can combine local storage for active, real-time results with lower-cost tiered storage (HDFS) for data that's less relevant or needs additional processing. MarkLogic also lets you partition your data, then rebalance and migrate those partitions interactively.
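To make the tiering idea concrete, here is a toy sketch in plain Python. It is not MarkLogic's API or its actual placement logic. The function names, the partition names, the dates, and the 30-day "hot" window are all assumptions made up for illustration. It only shows the kind of rule you might apply: recently used partitions stay on local storage, and older ones move to HDFS.

```python
from datetime import date, timedelta


def assign_tier(last_accessed, today, hot_days=30):
    """Return 'local' for recently used partitions and 'hdfs' for the rest.

    The 30-day default is an arbitrary illustrative threshold.
    """
    age = today - last_accessed
    return "local" if age <= timedelta(days=hot_days) else "hdfs"


def plan_tiers(partitions, today, hot_days=30):
    """Map each partition name to the storage tier it should live on."""
    return {
        name: assign_tier(last_accessed, today, hot_days)
        for name, last_accessed in partitions.items()
    }


if __name__ == "__main__":
    # Hypothetical partitions keyed by name, with their last access dates.
    partitions = {
        "recent-orders": date(2014, 1, 20),
        "archived-orders": date(2013, 6, 30),
    }
    print(plan_tiers(partitions, today=date(2014, 1, 31)))
```

In practice the rule could key on anything the business cares about, such as document date or query frequency. The point is that the placement decision is separate from the data itself, so it can change as needs change.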
What does this mean for you? You can optimize costs, performance and availability while also meeting the needs of the business for real-time analytics, alerting and enterprise search. You can take data offline and then bring it back instantly, since it's already indexed. You can still process your data using batch programs in Hadoop, but now all of this is done in a shared infrastructure.
To learn more about MarkLogic and Hadoop, visit this Resource Center
When the video is live, I'll send a link out.