Big Data Technology, Semantics and Analytics
Trends, success and applications for big data including the use of semantic technology
Curated by Tony Agresta
Scooped by Tony Agresta!

DBTA: Unleashing the Power of Hadoop for Big Data Analytics

Tony Agresta's insight:

Great paper that covers how you can make Hadoop really powerful.   Not all data is created equal.  Some is needed in real time.  Some requires less expensive storage options.  Some you may need to quickly migrate from HDFS to MarkLogic.  This paper is the perfect road map to understand how you can unleash the power of Hadoop.  No registration required to download  a copy. 

No comment yet.
Scooped by Tony Agresta!

Updated Database Landscape map – June 2013 — Too much information

Tony Agresta's insight:

MarkLogic is uniquely positioned on this database landscape map.    Here's what makes the position very different from other vendors:

1.  Search - MarkLogic is directly connected to all the major enterprise search vendors.   Gartner recently recognized this in its Enterprise Search Magic Quadrant.   Notice that other NoSQL technologies are nowhere close to this connection point.

2.  General Purpose - MarkLogic provides an enterprise NoSQL database along with Application Services and Search.   With support for many development languages and REST and Java APIs, MarkLogic has clear links to SAP, EnterpriseDB and a host of other database providers.

3.   Graph and Document - MarkLogic has long been recognized as a document store and is used widely all over the world for this purpose.  Notice the subtle connection to Graph as well, linking MarkLogic to other vendors in this space like Neo4j.  MarkLogic 7 promises to deliver a world-class triple store to index subjects, predicates and objects in XML documents or load other triples through the MarkLogic Content Pump.  For the first time, the only Enterprise NoSQL technology with search will include semantics.  Updated APIs and support for SPARQL are part of this release.

4.  Big Tables - MarkLogic's ability to handle big data has long been known.  The olive green line is designated for vendors like MarkLogic, Cassandra, HBase, etc.   MarkLogic's partnership with Intel around the Intel Distribution for Apache Hadoop and the fact that MarkLogic ships with Hadoop connectors further confirm this position.

5.   Key Value Stores - Data can be stored as keys without the database schema required by relational databases.  In MarkLogic's case, huge quantities of data can be indexed in real time, with data stored in memory and on disk, making search results instant and complete.   In a recent analysis of more than 50 MarkLogic customers, the ability to quickly get up and running and deliver new information products to market was a business driver they mentioned over and over again.

The fact is, no one else on the list has all of these qualities.   Because of this unique position, visually you see MarkLogic distanced from other clusters or long lists of technology vendors.  
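
For readers new to the triple-store idea mentioned in point 3, here is a minimal Python sketch of storing subject-predicate-object triples and matching a pattern against them. It is an in-memory illustration only and says nothing about how MarkLogic implements or exposes triples; the `match` function, the predicate names and the sample triples are all hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Illustrative triples only; not taken from any real dataset.
TRIPLES: List[Triple] = [
    ("doc1", "hasAuthor", "alice"),
    ("doc1", "hasTopic", "hadoop"),
    ("doc2", "hasAuthor", "bob"),
    ("doc2", "hasTopic", "hadoop"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly what a SPARQL query of the form
#   SELECT ?doc WHERE { ?doc :hasTopic "hadoop" }
# asks for: every subject that has the given predicate and object.
docs_about_hadoop = [s for (s, _, _) in match(TRIPLES, predicate="hasTopic", obj="hadoop")]
print(docs_about_hadoop)
```

A real triple store would answer such patterns from indexes rather than by scanning a list, which is the part this sketch leaves out.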

To learn more, you can go to MarkLogic Resources.

Scooped by Tony Agresta!

MarkLogic Server - Technology Preview: HDFS Storage

An introduction to the HDFS Storage feature available as a technology preview from MarkLogic.
Tony Agresta's insight:

Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.

  • Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone.
  • Save money by leveraging common infrastructure. Using MarkLogic and Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.
  • Enterprise-class support for Hadoop. Our partnership with Hortonworks provides a strong, supported platform for building enterprise-class Big Data Applications with Apache Hadoop.

No comment yet.
Scooped by Tony Agresta!

Database Revolution; Future of Information Publishing

Tony Agresta's insight:

Here’s a brilliant presentation from Mike Bowers, Principal Engineer at The Church of Jesus Christ of Latter-day Saints.  It accomplishes two major objectives:

  • Mike reviews the strengths and weaknesses of the five major classes of databases today (relational, dimensional, object, graph and document). 
  • He then dissects the major NoSQL databases on the market including MarkLogic, Mongo, Riak, Cloudant/CouchDB and Cassandra.  How do they stack up? Are they enterprise ready? 

If developer productivity, application performance and enterprise readiness are concerns that your company has, this video is a “must see”.  Here are some sound bites I took away from the presentation.  Please note these comments only begin to scratch the surface of Mike’s message.


Over 80% of the data being created today is unstructured, and organizations need to store, search and analyze hundreds of different data formats at light speed.   The ability to handle data variability, data variety and data relevance has jumped to the top of the agenda for both business and IT. But how can organizations discern meaning from this data?   How do they create context around unstructured data with so many formats in play?  How do they make it discoverable?


Relational Models are not well suited to handle the problem since they were designed to organize your data in rows, columns and tables.  The variety and complexity of unstructured data coupled with the overriding need to scale out on commodity hardware prevent them from leveraging over 80% of the data today.


Mike shows a great example of how the document database (NoSQL database) takes unstructured data in the form of a story, identifies the data elements in the story (topic, location, author), semantically links these elements to show relationships between the elements and then identifies the hierarchy within the story (title, subtitle, body, etc…).   Armed with all of this, the unstructured data lives with context.  The original document persists but now all of the elements are discoverable in a variety of ways.  
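
As a rough illustration of a story that "lives with context", the Python sketch below models one document as a nested structure and flattens it into path/value pairs so that each element can be looked up on its own. This is not code from Mike's presentation or from any database product; the field names, the placeholder values and the `flatten` and `find` helpers are assumptions made purely for the example.

```python
from typing import Any, Dict, Iterator, List, Tuple

# A hypothetical story: identified data elements plus the document hierarchy.
# All values are placeholders.
story: Dict[str, Any] = {
    "elements": {
        "topic": "example topic",
        "location": "example city",
        "author": "example author",
    },
    "structure": {
        "title": "Example title",
        "subtitle": "Example subtitle",
        "body": "Example body text.",
    },
}


def flatten(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield a (path, value) pair for every leaf element of a nested document."""
    for key, value in doc.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def find(docs: List[Dict[str, Any]], path: str, value: Any) -> List[Dict[str, Any]]:
    """Return the documents whose element at `path` equals `value`."""
    return [doc for doc in docs if (path, value) in flatten(doc)]


# The original document is kept whole, yet every element is addressable.
print(sorted(path for path, _ in flatten(story)))
print(find([story], "/elements/author", "example author") == [story])
```

The lookup here scans every document; the point of a document database is that paths like these are indexed at load time instead.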


Given the reality that unstructured data is growing so rapidly and needs to be integrated and analyzed alongside structured data to complete the picture, what does an application need from a NoSQL database?  Basically what every database needs - five core capabilities:  1) inserts, updates and deletes 2) the ability to query the data 3) the ability to search the data 4) the ability to bulk process the data and 5) the ability to do all of this consistently.  With extraordinary data volumes, this has to be done at scale in an affordable way.  


The only enterprise NoSQL database that handles all of this today is MarkLogic.   Mike evaluates search relevance, advanced search using facets, geospatial search, entity enrichment, data consistency, developer productivity using Java, the ability to retrieve multiple documents, integration with the BI stack using SQL, real-time data ingestion, indexing and much more.   Imagine if you had to ask your programmers to develop an application to handle data locks, threading bugs, serialization, deadlocks and race conditions.  Imagine if you had to write the code to ensure all parts of your data transactions succeeded.  How would you ensure all of the data has been committed consistently? Do the commits meet all of your data rules?  How do you ensure your data survives system failures and can be recovered after an inadvertent deletion?


The vast majority of NoSQL databases lack these capabilities but MarkLogic has all of them.  If you are evaluating database technology today, I would highly recommend watching this video – at least twice.  


Learn more at

To see related videos, visit


Ian Sykes's comment, January 2, 2013 11:39 AM
Hi Dominic Happy New Year. Yes I was impressed by this and included it on my Blog. Certainly clarifies a lot for 2013.
Adrian Carr's curator insight, April 30, 2013 1:38 PM

this is a great presentation - full of great insight into the market

Edwin's curator insight, March 19, 2014 10:30 PM

Future of Database development

Scooped by Tony Agresta!

Shopping in the Big Data marketplace


"The Big Data marketplace is diverse and growing. It has a host of companies selling products that are called big data solutions but some of that is opportunism and some is deliberate obfuscation of the concept."


After reading this article, I couldn't help but add some more depth. "Shopping in the Big Data Marketplace" fails to mention the leader in this space, MarkLogic, which has been providing powerful, accessible and trusted technology longer than any of the vendors mentioned in the article. This isn't hype.


There's one sure way to validate this: read about the customers who use MarkLogic today, including the BBC, MModal, Condé Nast, Zynx and countless government agencies. A more complete view is available here:


Why did these organizations decide to use MarkLogic? It's an enterprise grade, hardened Big Data platform that works. Some of the capabilities provided out of the box include:


 - MarkLogic distributes Hadoop
 - User-Defined functions
 - In-database Analytic Functions
 - Visualization Widgets
 - Real-time, flexible indexes
 - Multilingual full-text search
 - Schemaless design
 - Scale out on commodity hardware with auto-sharding
 - Tiered storage
 - Alerting and event processing
 - Geospatial query
 - Java API
 - JSON Support
 - MarkLogic Content Pump
 - BI Tools Integration
 - Application Builder
 - Information Studio
 - Connector for Hadoop
 - Monitoring and Management
 - Support for Linux, Windows, Solaris, Mac OS X, SUSE, VMware
 - Superclusters
 - Transactions
 - Role-based Security
 - Automated failover
 - Replication
 - Journal archiving
 - Point-in-time recovery
 - Database rollback
 - Backup/Restore
 - Distributed transactions
 - Common Criteria for Information Technology Security Evaluation (ISO 15408)


Here's a link to a video you can watch about how the BBC Olympics Website was built on this proven, trusted technology:






Scooped by Tony Agresta!

What's the Scoop on Hadoop?

If you are an investor in the field of Big Data, you must have heard the terms “Big Data” and “Hadoop” a million times.  Big Data pundits use the terms interchangeably and conversations might lead you to believe that...
Tony Agresta's insight:

"Hadoop is not great for low latency or ad-hoc analysis and it’s terrible for real-time analytics."

In a webcast today with Matt Aslett from 451 Research and Justin Makeig from MarkLogic, a wealth of information was presented about Hadoop, including how it's used today and how MarkLogic extends Hadoop.  When the video becomes available, I'll post it, but in the meantime, the quote from the Forbes article echoes what the speakers discussed today.

Today, Hadoop is used to store, process and integrate massive amounts of structured and unstructured data and is typically part of a database architecture that may include relational databases, NoSQL, Search and even Graph Databases.  Organizations can bulk load data into the Hadoop Distributed File System (HDFS) and process it with MapReduce.   YARN is a technology that's starting to gain traction, enabling multiple applications to run on top of HDFS and process data in many ways. But it's still at an early stage.
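
If MapReduce is unfamiliar, the processing model can be sketched in a few lines of plain Python. What follows is a single-process illustration of the map, shuffle and reduce phases using the classic word-count example. It does not use Hadoop's APIs or HDFS, and the function names and input lines are made up for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big plans", "data at rest"])
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many machines as batch jobs, which is why this model suits bulk processing better than low-latency queries.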

What's missing?  Real-time applications.  And that's an understatement: reliability and security have also been question marks, as has limited support for SQL-based analytics.   Complex configuration also makes Hadoop difficult to apply.

MarkLogic allows users to deploy an Enterprise NoSQL database into an existing Hadoop implementation and offers many advantages including:

  • Real time access to your data
  • Less data movement
  • Mixed workloads within the same infrastructure
  • Cost effective long term storage
  • The ability to leverage your existing infrastructure

Since all of your MarkLogic data can be stored in HDFS including indexes, you can combine local storage for active, real time results with lower cost tiered storage (HDFS) for data that's less relevant or needs additional processing.  MarkLogic allows you to partition your data, rebalance and migrate partitioned data interactively.

What does this mean for you?  You can optimize costs, performance and availability while also satisfying the needs of the business in the form of real time analytics, alerting and enterprise search. You can take data "off line" and then bring it back instantly since it's already indexed.  You can still process your data using batch programs in Hadoop but now all of this is done in a shared infrastructure. 

To learn more about MarkLogic and Hadoop, visit this Resource Center

When the video is live, I'll send a link out.

Bryan Borda's curator insight, July 19, 2013 11:39 AM

Excellent information on advantages to using NoSQL technology with a Hadoop infrastructure.  Take advantage of the existing Hadoop environment by adding powerful NoSQL features to enhance the value.

Scooped by Tony Agresta!

The new reality for Business Intelligence and Big Data

You know about Big Data and its potential, how it creates greater understanding of our world, reduces waste and misuse of resources, and dramatically increases efficiency.
Tony Agresta's insight:

Data discovery tools allow you to reveal hidden insights in your data when you don't know what to look for in advance.  These highly interactive tools allow you to visualize disparate data in various forms - charts, timelines, graphs, geo-spatial and tables – and explore relationships in data to uncover patterns that static dashboards cannot.  


With the explosion of big data, organizations are now using these tools with structured, semi-structured and unstructured data.  This approach allows them to consolidate data without having to build complex schemas, search the data instantly, deliver new content products dynamically and analyze all of their data in real time.  A transformational shift in data analysis is underway allowing organizations to do this with documents, e-mails, video and other sources.   Imagine if you could load data into Hadoop, enrich it, ingest the data into an enterprise NoSQL database in real-time, index everything for instant search & discovery and analyze that data using Tableau or Cognos.   As the only Enterprise NoSQL Database on the market, MarkLogic allows you to do just that.


You can learn more here:


No comment yet.
Rescooped by Tony Agresta from MarkLogic - Enterprise NoSQL Database!

What is MarkLogic?




Via Dominic Spitz
Dominic Spitz's curator insight, January 23, 2013 10:55 AM

→ Enterprise NoSQL Database Technology:

For more than a decade, MarkLogic has delivered a powerful and trusted enterprise-grade NoSQL database that enables organizations to turn all data into valuable and actionable information. Key features include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, government-grade security, and more.

→ Best Big Data Search:

Big Data is only valuable if you can find the information that is critical to your organization’s success. Searching across different data types — text, images, date/time, geospatial, and currencies — from multiple, disparate systems is difficult, and often requires weeks of data manipulation and normalization. Only when you have the power to search across all your data are you able to get the full value out of your information. With MarkLogic, you can easily add new information, index “as-is,” and get up to date sub-second search results.

→ Application Development:


Application development teams continually face challenges around costs, resources, time-to-market, feature completeness, and quality. Tools that promote rapid development and best practices are critical for meeting the stringent requirements of application initiatives. With the growing challenge of managing unstructured information and Big Data, development efficiency and effectiveness are more important than ever. You can quickly develop advanced database applications that search through terabytes of information to store and retrieve any type of data, using MarkLogic’s rich application development tools and APIs.

→ Analytics & Business Intelligence:


Big Data opens up the potential for predictive analysis, competitive and customer intelligence, sentiment analysis, social media surveillance, and so much more. However, not every Big Data technology platform supports real-time Business Intelligence.

→ Get Inside the Brains of MarkLogic - MarkLogic Architecture:

Fast, agile, and scalable – that’s what makes MarkLogic the leading database for unstructured information. It fuses together database internals, search-style indexing, and application server behaviors into a unified system. It uses XML documents as its data model, and stores the documents within a transactional repository. It indexes the words and values from each of the loaded documents, as well as the document structure.

→ Real-time Your Hadoop:

Get the best of real-time Big Data Applications with the benefits of batch processing and archival storage by combining Hadoop with MarkLogic. The MarkLogic Connector for Hadoop allows you to easily move data between MarkLogic and Hadoop within applications, creating opportunities to identify new business insights.


Scooped by Tony Agresta!

AllAnalytics - Mark Pitts - Hadoop: Take Care Before You Whoop It Up

Despite all the hype, Hadoop addresses only a specific set of problems. Make sure they're the ones you need to solve.
Tony Agresta's insight:

This is why MarkLogic has integrated Hadoop into our Enterprise NoSQL approach.  You can read more about the advantages here:

No comment yet.
Scooped by Tony Agresta!

96 – The percentage of Canadian executives who think real time Big Data tools are important | Infomart


It's the title of this post that caught my eye. There's no shortage of Big Data tools on the market today. But how many handle the real time aspect of big data with enterprise reliability? Here's an excerpt from a talk given recently at MarkLogic's NY Summit:


We are in the middle of a database revolution. NoSQL is disrupting the database world through innovation: schema-less databases enable extreme agility in software development and rapid changes to huge data sets. But are all NoSQL approaches the same? Not at all. MarkLogic, for example, allows for real-time ingestion of data and alerting. Indexes are derived instantly. ACID transactions are supported without the need to write additional code.


What is an ACID Transaction? From Wikipedia: ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
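
The bank-transfer example can be made concrete with a short Python sketch. It uses the standard-library `sqlite3` module purely as a convenient stand-in for any transactional database; it is not MarkLogic code, and the `accounts` table, account names and amounts are invented for the illustration. The credit and the debit run inside one transaction, so if the debit would overdraw the source account, the credit is rolled back as well.

```python
import sqlite3
from typing import Dict


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("checking", 100), ("savings", 50)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    """Move funds as a single transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # Violates the CHECK constraint if the source would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_bank()
ok = transfer(conn, "checking", "savings", 30)       # both updates apply
failed = transfer(conn, "checking", "savings", 500)  # neither update applies
print(ok, failed, balances(conn))
```

The credit is deliberately issued before the debit so that the failed transfer shows atomicity: the first update had already succeeded when the second one failed, and the rollback undoes it.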


So, evaluate NoSQL approaches carefully. Real Time support of ACID transactions may be exactly what you need to compete at a time when instant and reliable access to data is essential.


No comment yet.