Big Data Technology, Semantics and Analytics
Trends, success and applications for big data including the use of semantic technology
Curated by Tony Agresta
Scooped by Tony Agresta!

What's the Scoop on Hadoop?

If you are an investor in the field of Big Data, you must have heard the terms “Big Data” and “Hadoop” a million times.  Big Data pundits use the terms interchangeably and conversations might lead you to believe that...
Tony Agresta's insight:

"Hadoop is not great for low latency or ad-hoc analysis and it’s terrible for real-time analytics."

In a webcast today with Matt Aslett from 451 Research and Justin Makeig from MarkLogic, a wealth of information was presented about Hadoop, including how it's used today and how MarkLogic extends Hadoop. When the video becomes available, I'll post it, but in the meantime, the quote from the Forbes article echoes what the speakers discussed today.

Today, Hadoop is used to store, process and integrate massive amounts of structured and unstructured data, and it's typically part of a database architecture that may include relational databases, NoSQL, search and even graph databases.  Organizations can bulk load data into the Hadoop Distributed File System (HDFS) and process it with MapReduce.  YARN is a technology that's starting to gain traction, enabling multiple applications to run on top of HDFS and process data in many ways, but it's still early stage.
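For readers newer to the model, here's a toy Python sketch of the map-shuffle-reduce flow described above (the function names are illustrative, not Hadoop APIs):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values per key."""
    return {key: sum(values) for key, values in groups.items()}

records = ["big data", "big hadoop"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'big': 2, 'data': 1, 'hadoop': 1}
```

In a real cluster each phase runs in parallel across HDFS blocks; the toy version just makes the data flow visible.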

What's missing?  Real-time applications.  And that's an understatement: reliability and security have also been question marks, support for SQL-based analytics is limited, and complex configuration makes Hadoop difficult to apply.

MarkLogic allows users to deploy an Enterprise NoSQL database into an existing Hadoop implementation and offers many advantages including:

  • Real time access to your data
  • Less data movement
  • Mixed workloads within the same infrastructure
  • Cost effective long term storage
  • The ability to leverage your existing infrastructure

Since all of your MarkLogic data can be stored in HDFS including indexes, you can combine local storage for active, real time results with lower cost tiered storage (HDFS) for data that's less relevant or needs additional processing.  MarkLogic allows you to partition your data, rebalance and migrate partitioned data interactively.
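As a rough illustration of the tiered-storage idea (the 90-day window and tier names below are invented for the example):

```python
from datetime import date, timedelta

# Hypothetical policy: recent partitions stay on fast local storage,
# older ones move to cheaper HDFS-backed tiers.
ACTIVE_WINDOW = timedelta(days=90)

def assign_tier(partition_date, today):
    """Pick a storage tier based on how old the partition is."""
    if today - partition_date <= ACTIVE_WINDOW:
        return "local-ssd"
    return "hdfs-archive"

today = date(2013, 7, 1)
tiers = {
    "2013-06": assign_tier(date(2013, 6, 1), today),
    "2012-01": assign_tier(date(2012, 1, 1), today),
}
print(tiers)  # {'2013-06': 'local-ssd', '2012-01': 'hdfs-archive'}
```

Because the archived partitions remain indexed, "bringing data back online" is a matter of reassigning the tier, not re-ingesting.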

What does this mean for you?  You can optimize costs, performance and availability while also satisfying the needs of the business in the form of real time analytics, alerting and enterprise search. You can take data "off line" and then bring it back instantly since it's already indexed.  You can still process your data using batch programs in Hadoop but now all of this is done in a shared infrastructure. 

To learn more about MarkLogic and Hadoop, visit this Resource Center

When the video is live, I'll send a link out.

Bryan Borda's curator insight, July 19, 2013 11:39 AM

Excellent information on advantages to using NoSQL technology with a Hadoop infrastructure.  Take advantage of the existing Hadoop environment by adding powerful NoSQL features to enhance the value.

Scooped by Tony Agresta!

Social Media & Big Data in the Insurance Industry

According to a global industry survey, insurers feel less prepared to deal with threats arising from social media and big data than they do about more familiar ones.
Tony Agresta's insight:

Insurance companies' increased use of social media means bigger data is on the way.  In turn, the need for technology to manage this data will increase.


For example, insurance companies are using social media to increase visibility for their brands and develop stronger customer relationships.  Chubb Insurance follows influencers and industry news on its Twitter page, providing educational information to Chubb followers in an attempt to build awareness and trust.


The use of social media in insurance extends beyond CRM.  Companies are monitoring social media sites in an attempt to detect posts related to insurance claims, activities that could indicate a claimant has gone beyond what a physician would deem acceptable.
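A simple sketch of what that kind of monitoring might look like (the keyword list and helper function are hypothetical, purely for illustration):

```python
# Terms that might contradict an open injury claim; flagged posts go
# to a human investigator, never to an automatic decision.
SUSPICIOUS_TERMS = {"marathon", "skiing", "lifting", "hiking"}

def flag_posts(posts):
    """Return (post, matched_terms) pairs for posts containing watch terms."""
    flagged = []
    for post in posts:
        words = set(post.lower().split())
        hits = sorted(words & SUSPICIOUS_TERMS)
        if hits:
            flagged.append((post, hits))
    return flagged

posts = ["Ran a marathon this weekend!", "Resting at home"]
flagged = flag_posts(posts)
print(flagged)  # [('Ran a marathon this weekend!', ['marathon'])]
```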


Traditional uses of social media to assess sentiment apply as well.   Customer service channels are better informed with real time feeds on positive and negative sentiment about their products and the industry as a whole.


Prospects shop for insurance products online using communities and social networks.  Understanding when this happens helps insurance companies target their sales and marketing efforts.  Sharing bite-size pieces of information directly with consumers allows insurance companies to overcome one of their main obstacles: distrust.


Social media has become an effective way to communicate with policy holders about events that may affect claims.  Most of this happens after catastrophic events, but proactive approaches relating to health and wellness are another application of social communication in support of reduced risk and lower costs.


Big data technology to manage these applications allows Insurers to ingest massive volumes of data, wrap context and meaning around the unstructured content, search it in real time and deliver the facts to the right channels at the right time.

Scooped by Tony Agresta!

Semantic Technologies in MarkLogic - World Class Triple Store in Version 7

Tony Agresta's insight:

This video is a fantastic overview from MarkLogic's Stephen Buxton, John Snelson and Micah Dubinko covering semantic processing, use cases for triple stores that include richer search & graph applications, and the expanded architecture in MarkLogic 7.  It's an hour in length but well worth the time if you're interested in understanding how you can use documents, facts derived from text and values to build groundbreaking applications.  Databases as we know them will change forever with the convergence of Enterprise NoSQL, search and semantic processing.  This video provides you with the foundation to understand this important change in database technology.

Scooped by Tony Agresta!

RDF 101 - Cambridge Semantics

Semantic University tutorial and introduction to RDF.
Tony Agresta's insight:

This post is a bit technical but recommended reading since it covers one of the most important aspects of data science today: the ability to discern meaning from unstructured data.

To truly appreciate the significance of this technology, consider the following questions.  Since over 80% of the data being created today is unstructured, in forms that include text, videos, images and documents, how can organizations interpret the meaning behind this data?  How can they pull out the facts in the data and show relationships between those facts, leading to new insights?


This post provides you with the foundation on how semantic processing works.   Cutting to the chase, the technologies are referred to as RDF (Resource Description Framework), SPARQL and OWL. They allow us to create underlying data models that understand relationships in unstructured data, even across web sites, document repositories and disparate applications like Facebook and LinkedIn.


These data models store data that has properties extracted from the unstructured data.  Consider the following sentence:   "The Monkeys are destroying Tom's garden."  Semantic processing of this text would deconstruct the sentence identifying the subject, object and predicate while also building relationships between the three.  The subject is "monkeys" and they are taking an action on "the garden". The garden is therefore the object and the predicate is "destroying".  
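The resulting facts are naturally stored as (subject, predicate, object) triples, the core idea behind RDF. Here's a minimal Python sketch, with a wildcard match in the spirit of a SPARQL pattern (the `match` helper is invented for illustration):

```python
# Facts extracted from "The monkeys are destroying Tom's garden."
triples = [
    ("monkeys", "destroying", "garden"),
    ("garden", "ownedBy", "Tom"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL query."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What is acting on the garden?"
print(match(triples, o="garden"))  # [('monkeys', 'destroying', 'garden')]
```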


Most importantly, there is a connection made between the monkeys and the garden allowing us to show relationships between specific facts pulled from text.   How can this help us? 


Assume for a second you're working for a government agency tracking a suspicious person who is on a watch list.  Crawling the web looking for that person's name is one way to identify additional information about the person, and technology to do this exists today.  When the name is detected, identifying relationships between the person being investigated and other subjects (or objects in the text) can lead you to new people who may also be of interest.  For example, if Sam is on the watch list, a sentence like this would be of interest:  "Sam works with Steve at ABC Home Builders Corp.”  Relationships between the suspect (Sam) and someone new in the investigation (Steve) could be identified.  The fact that they both work for the same employer allows analysts to connect the subjects through this employer.


These interesting facts help investigators make connections within e-mail, phone conversations, in house data and other sources, all of which can be displayed visually in a graph to show the subjects and how they are linked. 
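A toy sketch of that kind of link analysis, using the Sam/Steve example (all names and the graph-building code are illustrative):

```python
from collections import defaultdict

# Facts as they might be extracted from text.
facts = [
    ("Sam", "worksAt", "ABC Home Builders Corp"),
    ("Steve", "worksAt", "ABC Home Builders Corp"),
    ("Steve", "calls", "Dana"),
]

# Build an undirected graph so connections can be explored in either direction.
graph = defaultdict(set)
for person, _, other in facts:
    graph[person].add(other)
    graph[other].add(person)

def one_hop(node):
    """People and entities directly connected to a node."""
    return sorted(graph[node])

# Starting from the watch-listed Sam, the shared employer surfaces Steve.
print(one_hop("ABC Home Builders Corp"))  # ['Sam', 'Steve']
```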


Data models to store, search and analyze this data will become one of the primary tools to interpret massive amounts of data being collected today.  This technology allows computers to understand relationships in unstructured data and display those relationships to analysts in the form of visual diagrams that clearly show connections to other data including phone calls, events, accounts, and more.  The implications of this extend far beyond counter terrorism to include social networking, marketing, fraud, cyber security and sales to name a few.

We are at an inflection point in big data: data stored in silos can now be consolidated with external data from the open web.  Most importantly, the unstructured data can be interpreted as we form connections that are integral to understanding how things are related to each other.  Data visualization technology is the vehicle to display those connections, allowing analysts to explore any form of data in a single application.

Learn more about this technology and other advances in Enterprise NoSQL here.

Scooped by Tony Agresta!

NSA collecting phone records of millions of Verizon customers daily

Exclusive: Top secret court order requiring Verizon to hand over all call data shows scale of domestic surveillance under Obama administration
Tony Agresta's insight:

In my opinion, this demonstrates one of the most important aspects of big data analysis and should continue.  Call data, including outbound call numbers, inbound call numbers, call duration, start time and end time, are vital pieces of information necessary to analyze and protect us.  US lives are at risk.  The article by Glenn Greenwald in The Guardian states:


"Such metadata is what the US government has long attempted to obtain in order to discover an individual's network of associations and communication patterns. The request for the bulk collection of all Verizon domestic telephone records indicates that the agency is continuing some version of the data-mining program begun by the Bush administration in the immediate aftermath of the 9/11 attack."


Let's get to the heart of the matter.  This type of big data analysis has prevented attacks and has proven to work.  At this moment, all of the facts of these investigations are not disclosed (and many classified examples will probably never be revealed), but many news outlets are reporting that a planned bomb attack on the NY subway was averted because of phone and e-mail intercept and analysis.


For intelligence analysts to do this accurately and completely, they need to analyze the haystack of data represented by e-mail, phone and other forms of communication.  For example, if an e-mail from someone is intercepted because that person is corresponding about bomb recipes, our government should have access to the calls this person made and received. They should be allowed to analyze ALL of the connections between other callers to discern whether or not a network of clandestine activity exists.  If warranted, our government should be able to look at the content of the unstructured data to determine if there are other people, events or places referenced in the text.  Initiating these investigations by analyzing all e-mail traffic or all phone calls made to known or suspected terrorists protects every one of us.
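As a purely illustrative sketch of the kind of connection analysis described (the phone numbers are fake):

```python
from collections import defaultdict

# Call metadata records: (caller, callee, duration in seconds).
calls = [
    ("555-0001", "555-0002", 120),
    ("555-0002", "555-0003", 30),
    ("555-0004", "555-0001", 600),
]

# Build an undirected call graph from the metadata alone -- no content needed.
graph = defaultdict(set)
for caller, callee, _ in calls:
    graph[caller].add(callee)
    graph[callee].add(caller)

suspect = "555-0001"
direct = graph[suspect]
# Contacts-of-contacts, excluding the suspect and direct contacts.
second_degree = set().union(*(graph[n] for n in direct)) - direct - {suspect}

print(sorted(direct))         # ['555-0002', '555-0004']
print(sorted(second_degree))  # ['555-0003']
```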


Techniques and processes to ingest, search and manage massive amounts of data like phone records and e-mail traffic are being used today. This should not surprise anyone.  Connecting the dots between caller metadata and known or suspected terrorists is one very effective way to maintain the safety of US citizens.   If this means the NSA needs to look more closely at other calls that suspects have made, so be it.  These preventative measures are in place to safeguard our freedom and prevent lives from being lost.   Haven't we seen enough of that?


Scooped by Tony Agresta!

It's the Shape, Not the Size - of the Data - that Matters

Community Blogs, comments and opinions by industry professionals
Tony Agresta's insight:

Great insights from Amir Halfon on Enterprise NoSQL and some of the challenges organizations face today with respect to relational databases.  Here's a quote from the blog post:

"The answer lies within a different category of technology called Enterprise NoSQL, which has been designed and built with transactions and enterprise features from the ground up, just like relational databases. But unlike those, an Enterprise NoSQL database models the data as hierarchical trees rather than rows and columns. These trees are aggressively indexed in-memory as soon as the data is ingested, and then used for both element retrieval and full text search, unifying two concepts that have traditionally been separate - the database and the search engine."

Scooped by Tony Agresta!

MongoDB 2.4 addresses search concerns - SD Times: Software Development News

New release brings tools to help scale the NoSQL poster child, as well as default settings that will prevent newbie mistakes
Tony Agresta's insight:

We agree with key points made in this article including: 

"Stirman said that the search capabilities added to MongoDB 2.4 are not a panacea. It does not have 'everything you could ever need in search, but we think that for a lot of people, the feature will be good enough. There will still be users who have more sophisticated needs for search, and they will integrate with a separate search technology,' he said."

Integrating with other search technology is never easy.  Maintaining two technologies always presents challenges. 

I would highly suggest you read this blog post written by the Bloor Group on MarkLogic:

You will find a great list of search capabilities in MarkLogic here:

Scooped by Tony Agresta!

Nate Silver's big-data insights -- FCW

In his latest book, statistician and predictive analytics expert Nate Silver describes his approach to forming forecasts out of data.
Tony Agresta's insight:

“Big data is not a cure-all, and it is inherently filled with noise and uncertainty, but it does have tremendous potential if people approach it the right way. ‘The world is not lacking for techniques, it's more about the right goals and right attitudes,’ Silver said.”  Having goals associated with big data analysis is a must.   Applying technology and techniques to achieve those goals is not far behind. 

Different approaches to analysis, some of which are presented in this article, complement one another and allow you to reach those goals faster. Let's take three classic approaches (dashboards, predictive models and data visualization) and the problem of fraud detection.  Let's say our goals include improved fraud detection for incoming insurance claims and more efficient allocation of resources to investigate those claims.  If analysts can prioritize the workload for investigators, they can find fraud faster and reduce costs.

BI dashboards typically show key metrics which may lead the analyst to spot trends that they want to model using predictive analysis.   They also point analysts to independent data that may have some explanatory power in the model.   For example, a BI dashboard showing recent insurance claims by postal code may show a spike in certain areas which could lead to deeper analysis where geographic indicators (city, zip+4) are selected as attributes to predict fraudulent claims.   While knowing that the insurance claim has a higher likelihood of being fraudulent is important, understanding the ring of people linked to that claim is potentially more important. Are those people linked to other claims that have been investigated and found to be fraudulent?  Do these people share the same address?  Are they using the same doctor or pharmacy?  Have they worked together in the past?  

Data visualization allows you to explore those relationships and picks up where predictive models leave off.  In this case, all of the major types of analysis were used to achieve the goal of identifying suspicious claims and ultimately identifying a fraud ring.
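A minimal sketch of that linking step, assuming claims carry shared attributes like address and doctor (the data and helper are invented):

```python
from collections import defaultdict

# Scored claims with the attributes a ring might share.
claims = [
    {"id": "C1", "address": "12 Elm St", "doctor": "Dr. Ray"},
    {"id": "C2", "address": "12 Elm St", "doctor": "Dr. Ray"},
    {"id": "C3", "address": "9 Oak Ave", "doctor": "Dr. Lee"},
]

def shared_groups(claims, key):
    """Group claim IDs by an attribute and keep only shared values."""
    groups = defaultdict(list)
    for claim in claims:
        groups[claim[key]].append(claim["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}

print(shared_groups(claims, "address"))  # {'12 Elm St': ['C1', 'C2']}
print(shared_groups(claims, "doctor"))   # {'Dr. Ray': ['C1', 'C2']}
```

Claims C1 and C2 share both an address and a doctor, exactly the kind of overlap an investigator would want surfaced and then explored visually.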

Different approaches to analysis can complement one another.  Business Intelligence and dashboards provide one level of visibility.  They point the analyst to key trends and relationships that may require a model to be built.  Results of those models (scores or yes/no indictors) can be used with data discovery tools to understand relationships, identify patterns of behavior, show connections between seemingly disparate data and rapidly draw conclusions.   Identifying goals up front will allow analysts to formulate questions they want to ask of the data.  Using different types of analysis helps address challenges with big data. 

To learn more about how you achieve your goals using Enterprise NoSQL, you can go here:

Scooped by Tony Agresta!


Tony Agresta's insight:

Followers may be interested in this white paper from The Bloor Group, which summarizes the differences between database technologies.  It's meaty.

Here are a few additional points that Bloor has written about MarkLogic's Enterprise NoSQL approach:

  • MarkLogic is also a true transactional database. Most NoSQL databases have compromised the ACID (Atomicity, Consistency, Isolation and Durability) properties that are important for transaction processing. MarkLogic is fully equipped to be a transactional database, and if you simply wanted to use it for order processing, there would be no problem in doing so.
  • The database has been built to enable rapid search of its content in a similar manner to the way that Google’s search capabilities have been built to enable rapid search of the Internet.
  • As some of MarkLogic’s implementations have grown to above the single petabyte level, fast search of massive amounts of data is one of its most important features. To enable its search capability MarkLogic indexes everything on ingest; not just the data, but also the XML metadata. This provides it with the ability to search both text and structure. For example, you might want to quickly find someone’s phone number from a collection of emails.
  • With MarkLogic you could pose a query such as: “Find all emails sent by Jonathan Jones, sort in reverse order by time and locate the latest email that contains a phone number in its signature block.”
  • You may be able to deduce from this that MarkLogic knows what an email is, knows how to determine who the sender is, knows what a signature block is and knows how to identify a phone number within the signature block. If you were looking for a mobile phone number, you would simply add the word “mobile” in front of “phone number”. It should be clear from this that very few databases could handle such a query, because most databases are straitjacketed by whatever version of SQL they implement and, even if it were possible to bend SQL in such a way as to formulate this kind of query, most databases cannot dig into the data structures they hold in the way that MarkLogic can.
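This is not MarkLogic syntax, but a plain-Python sketch of the query the bullet describes can make the idea concrete (the email records and phone pattern are invented):

```python
import re

# Toy phone-number pattern; a real system would recognize many formats.
PHONE = re.compile(r"\b\d{3}-\d{4}\b")

emails = [
    {"sender": "Jonathan Jones", "sent": "2013-05-01", "signature": "JJ"},
    {"sender": "Jonathan Jones", "sent": "2013-06-02", "signature": "JJ 555-1234"},
    {"sender": "Ann Smith", "sent": "2013-06-03", "signature": "555-9999"},
]

# Emails from Jonathan Jones, reverse time order, with a phone number
# in the signature block.
matches = sorted(
    (e for e in emails
     if e["sender"] == "Jonathan Jones" and PHONE.search(e["signature"])),
    key=lambda e: e["sent"],
    reverse=True,
)
latest = matches[0] if matches else None
print(latest["sent"])  # 2013-06-02
```

The point of the bullet is that a database which indexes structure (sender, signature block) as well as text can answer this directly, without application code like the above.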

With the release of MarkLogic 6 last fall, MarkLogic also provided SQL support through integration with Tableau and Cognos, in-database analytic functions, JSON support, Java and REST APIs and more.  For more information on this release, you can go here:

Scooped by Tony Agresta!

Big Data “Hype” Coming To An End | SiliconANGLE

Tony Agresta's insight:

"Organizations have fascinating ideas, but they are disappointed with a difficulty in figuring out reliable solutions,” writes Sicular from The Gartner Group.


"Their disappointment applies to more advanced cases of sentiment analysis, which go beyond traditional vendor offerings.  Difficulties are also abundant when organizations work on new ideas, which depend on factors that have been traditionally outside of their industry competence, e.g. linking a variety of unstructured data sources.”


Today, organizations are coming to the realization that free or low cost open source technology to handle big data requires intense development cycles that burn costs and time.  Solving demanding challenges in these four areas has proven difficult:


  • Search & Discovery
  • Content Delivery
  • Analytics and Information Products
  • Data Consolidation


Organizations need to work with proven technology that's reliable and durable.  They need to work with technology that handles ACID transactions, enterprise security, high availability, replication, real-time indexing and alerting, without having to write 10,000+ lines of code.


Major financial institutions, healthcare payors, government agencies, media giants, energy companies, and state & local organizations have standardized on big data technology proven to increase developer productivity, create new revenue streams and address mission critical operations in a post 9-11 era. 


Adrian Carr's curator insight, February 11, 2013 11:11 AM

IT does it again.  Build a technology up until we start to believe it will solve all the world's problems.  It generates huge "science projects" and then everything comes tumbling down.  Finally a voice of reason says... maybe we set expectations unrealistically... One more trough of disillusionment!

Scooped by Tony Agresta!

Future Big Data: How Analytics Will Impact NFL | SiliconANGLE

Tony Agresta's insight:

The well-known book and movie documenting the success of Billy Beane and the Oakland A's is probably the best example of using data to provide a competitive advantage in sports.


Analyzing player tendencies like pitch sequences, at-bats and defensive moves is interesting.  When you connect them to other players and teams, they become even more interesting and can lead to sets of rules that shape how you coach every detail.  Expectations and patterns taught to players provide them with guidelines on how to react and increase the odds of winning.


Could this be applied to football?  It seems like that's what watching film of past games is all about.  If a team could tag the plays with meaningful content about the outcome, the situation, the players on the field, the time, location, weather and then make it discoverable, coaches could identify patterns and tendencies that were previously undetected. 


There are probably other applications of big data in football.  Analyzing new recruits by looking at unstructured data from the open web is one.  Real-time Twitter streams during games linked to advertising are likely another.


So who wins?   49ers, 27-24.  Sorry Ray Lewis. Just one ring for you.

Scooped by Tony Agresta!

Giving big data publishing the royal treatment

U.K. royal society jumps into 21st century with the MarkLogic NoSQL database, opening 170 years of content to public view
Tony Agresta's insight:


A NoSQL database from MarkLogic provides the Royal Society of Chemistry (RSC) with the ability to unlock a treasure trove of assets.  Now the RSC can publish three times as many journals and four times as many articles. It also gave the Society the ability to develop new educational applications to make chemistry accessible to a wider audience.


Modern approaches to information products replete with full text search have the power to transform your business.  Built on an enterprise hardened NoSQL database that can ingest data in real time using a "schemaless" design, they provide a brilliant user experience displaying search results, allowing users to filter, save searches and more.


Unstructured data assets hidden in the recesses as dark data can be consolidated with new forms of data streaming from the internet.


"The accumulated content includes more than 1 million images, millions of science data files, and hundreds of thousands of articles from more than 200,000 authors. On top of that, add the recent capture of social media, video, and other digital content."   


This reminds me of another application built on MarkLogic called AuthorMapper from Springer Media. Using geospatial search, users can zoom into countries, identify articles of interest, read the abstracts, search by date range, apply full text search and more, all within a durable, reliable Enterprise NoSQL platform that offers real-time alerts in a scale-out environment.  Can MongoDB do this?  No, it can't.


New information products like these can be brought to market quickly since developers can build them with Java, REST and other common interfaces.  Extending them to include interactive analytics in the form of data visualization is built into MarkLogic.  Don't be fooled by NoSQL pretenders, only to discover you need to write hundreds of thousands of lines of code.


You can watch a video of the RSC at under the Customers tab.  Click on videos and it’s in the lower right.  You can experiment with the Springer application at





Rescooped by Tony Agresta from Big Data, IoT and other stuffs!

When Pirates Meet Advanced Analytics

Seagoing criminals are always changing tactics. To catch them, uncover their hidden patterns.

Via Toni Sánchez
Tony Agresta's comment, January 15, 2013 9:30 AM
This article is spot on: with so many different sources of data, identifying suspects can be challenging. The volume and variety of data stored in different systems makes the job of the intelligence officer almost impossible. But new approaches in big data technology allow data scientists to consolidate the data, including unstructured data. Tight integration with data visualization tools that talk to Enterprise NoSQL databases makes it easier than ever to profile all of the data in support of identifying connections between people, events, locations and more. This approach has proven to work in some of the largest intelligence agencies in the world. It has been applied in local law enforcement, fraud detection, loss prevention, cyber security and social networking. Real-time alerts make it possible to immediately notify analysts as big data streaming into the NoSQL database meets predefined conditions. But to do this you need an enterprise approach to NoSQL: one that is hardened, scales, meets security requirements, and has disaster recovery and high availability as part of the foundational technology, providing production-grade implementations in less time.
Scooped by Tony Agresta!

Gus Hunt on the importance of Network Graphs & Big Data

Tony Agresta's insight:

What do people care about most when trying to identify relationships in big data?  Connection points between people, places, organizations, events, things, concepts and time.  How can this be done against massive volumes of data?  Through the use of network graphs driven by semantic processing of unstructured data and consolidated information.  We are at high noon in the information age, the cusp of grasping all of the data and turning it into intelligence.  This video by Gus Hunt, the CTO of the CIA, summarizes the requirements and challenges to get this done. 

Scooped by Tony Agresta!

State Street's Chief Scientist on How to Tame Big Data Using Semantics

Semantic databases are the next frontier in managing big data, says State Street's David Saul.
Tony Agresta's insight:

Here’s a good article on how financial institutions will use semantics to understand and manage risk.  It sounds as though facts about people, transactions and the market, for example, can be derived from all types and sources of data, including unstructured data in documents.  The relationships and connections between these facts can be stored, searched and analyzed.  Adding in the dimension of time would allow you to see when the relationships were formed.  Looking at the connection points in the form of a graph would allow analysts to identify networks and reveal individuals central to the graph who take on new importance.

Imagine if you could search a series of data sources that include information about customers for transactions over a certain level.   Imagine if you could identify all the associated people (employees and other customers) linked to the transactions.  What would you see if you could take all of these related facts and graph them in the form of a social network to visually show the connection points between people, addresses, institutions, lending officers and more?  
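As a toy sketch of that scenario (amounts, names and the threshold are all invented):

```python
# Transactions with the people linked to each one.
transactions = [
    {"id": "T1", "amount": 950_000, "parties": ["CustA", "OfficerX"]},
    {"id": "T2", "amount": 12_000, "parties": ["CustB", "OfficerY"]},
    {"id": "T3", "amount": 2_400_000, "parties": ["CustC", "OfficerX"]},
]

THRESHOLD = 500_000

# Filter transactions over the threshold, then collect everyone linked to them.
large = [t for t in transactions if t["amount"] > THRESHOLD]
linked_people = sorted({p for t in large for p in t["parties"]})

print([t["id"] for t in large])  # ['T1', 'T3']
print(linked_people)             # ['CustA', 'CustC', 'OfficerX']
```

Note that OfficerX appears in both large transactions: in a graph view, that person would sit at the center of the network, exactly the kind of individual the article suggests would take on new importance.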

The combination of semantics to extract meaning from unstructured data, search, data visualization and analysis could reveal high-risk transactions along with links to other individuals.  How are they connected?  Integrating data from the open web and third-party sources might reveal important insights involving past employers, educational institutions, property owned and residential addresses.  Technology to support this scenario using massive amounts of consolidated intelligence is not far away.

Scooped by Tony Agresta!

New Forms of Analytics Incorporate New Forms of Data

Discover how MarkLogic NoSQL database solutions help you make better decisions,
faster, with MarkLogic SolutionTracks—a series of brief, easy-to-follow, whiteboard tutorials.
Tony Agresta's insight:

In decades past, analytical approaches incorporated data from transactional systems, spreadsheets and other sources.  Challenges still exist today.  The time it takes to build and update data warehouses sucks the air out of the analytical process, especially for those needing intelligence in real time. Most analysts interested in exploring data at the speed of the human mind have to wait and wait...


Today, increased competition, demands from the business and monumental masses of new forms of data exacerbate these challenges.  Timely analysis has become even more difficult.   In theory, new information products could support new revenue streams and increased customer satisfaction but many organizations struggle to achieve analytical nirvana. They simply can't get to all of the data.


Fortunately, solutions do exist to consolidate documents and other forms of unstructured data with traditional data in real time.   Dashboards no longer need to be static.   Predictive analysis can include explanatory data to provide lift in your models.  Interactive analysis allows business owners to explore data at faster speeds.


Enterprise NoSQL databases ingest all of your data "as is", make it available through search and real time database connectivity, derive important statistical measures and provide the ability to build user defined functions running close to the database.  Alerts inform analysts when conditions are met using pre-built queries. 


Enterprise-hardened analytical applications won't lose your data, and they provide high availability, backup & recovery and government-grade security.   This warrants a closer look.

Using BI Tools and In Database Analytics with MarkLogic

No comment yet.
Scooped by Tony Agresta!

Updated Database Landscape map – June 2013 — Too much information

Updated Database Landscape map – June 2013 — Too much information | Big Data Technology, Semantics and Analytics |
Tony Agresta's insight:

MarkLogic is uniquely positioned on this database landscape map.    Here's what makes the position very different from other vendors:

1.  Search - MarkLogic is directly connected to all the major enterprise search vendors, a position recently confirmed by Gartner in its Enterprise Search Magic Quadrant.   Notice that other NoSQL technologies are nowhere close to this connection point.

2.  General Purpose - MarkLogic provides an enterprise NoSQL database, Application Services and Search.   With support for many development languages and REST and Java APIs, MarkLogic has clear links to SAP, EnterpriseDB and a host of other database providers.

3.   Graph and Document - MarkLogic has long been recognized as a document store and is used widely all over the world for this purpose.  Notice the subtle connection to Graph as well, linking MarkLogic to other vendors in this space such as Neo4j.  MarkLogic 7 promises to deliver a world-class triple store to index subjects, predicates and objects in XML documents, or to load other triples through the MarkLogic Content Pump.  For the first time, the only Enterprise NoSQL technology with search will include semantics.  Updated APIs and support for SPARQL are part of this release.

4.  Big Tables - MarkLogic's ability to handle big data has long been known.  The olive green line designates vendors like MarkLogic, Cassandra, HBase, etc.   MarkLogic's partnership with Intel on its Distribution for Apache Hadoop, and the fact that MarkLogic ships with Hadoop connectors, provide additional confirmation of this position.

5.   Key Value Stores - Data can be stored as keys without the database schema required by relational databases.  In MarkLogic's case, huge quantities of data can be indexed in real time, with data stored in memory and on disk, making search results instant and complete.   In a recent analysis of over 50 MarkLogic customers, the ability to get up and running quickly and deliver new information products to market was a business driver they mentioned over and over again.
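To illustrate the subject-predicate-object model behind point 3 (a toy sketch only, not MarkLogic's actual API), a triple pattern match works roughly like a SPARQL basic graph pattern, where an unbound position acts as a variable:

```python
# Illustrative in-memory triple store; triple values are invented.
triples = [
    ("MarkLogic", "type", "NoSQLDatabase"),
    ("MarkLogic7", "supports", "SPARQL"),
    ("MarkLogic7", "supports", "TripleIndex"),
]

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What does MarkLogic7 support?" leaves the object unbound.
features = match(s="MarkLogic7", p="supports")
```

A production triple store indexes all three positions so queries like this resolve against billions of triples rather than a Python list.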

The fact is, no one else on the list has all of these qualities.   Because of this unique position, you see MarkLogic visually distanced from other clusters and long lists of technology vendors.

To learn more, you can go to MarkLogic Resources.

Scooped by Tony Agresta!

Graph Analysis Powers the Social and Semantic Spheres of Big Data —

Graph Analysis Powers the Social and Semantic Spheres of Big Data — | Big Data Technology, Semantics and Analytics |
Why predictive modeling of human behavior demands an end-to-end, low-latency database architecture
Tony Agresta's insight:

Here are some key points from the article in addition to some insights about graph analysis and big data:


  • Semantic graphs map relationships among words, concepts and other constructs in the human language allowing for unstructured data to be used in a graph showing important connections.
  • Graph analysis is not new.   It has long been used as a form of data visualization to explore connections and identify patterns and relationships that would otherwise go undetected.
  • Some vendors have taken their graph capabilities to new levels. For example, Centrifuge Systems allows users to draw the graphs, search the graph space, interact with charts and display important measures about the graph network.   Analysts can easily pinpoint portions of the graph that require additional analysis.  Hotspots of interesting activity jump out from the graph based on the number of connections and important performance measures.
  • While social graphs may be the most popular, this approach is especially useful in detecting fraud networks, cyber data breaches, terrorist activity and more. 
  • One of the most important points is that graphs can incorporate diverse streams of big data including both structured and unstructured.  Imagine the ability to analyze banking wire transfer data in the same graph with unstructured data that includes names, locations, and employers - intelligence that has been discovered through the semantic processing of unstructured data.   That's a powerful combination of sources linking data from the open web with transactional information. When done in real-time, this can be used in anti-money laundering, fraud prevention and homeland defense.
  • "Data scientists explicitly build semantic graph models as ontologies, taxonomies, thesauri, and topic maps using tools that implement standards such as the W3C-developed Resource Description Framework (RDF)."


While this may be beyond the scope of many NoSQL and Hadoop databases, MarkLogic 7 is embracing triple stores as the company continues to innovate on its Enterprise NoSQL approach. No one else combines values, triple store data derived from semantic processing, and documents with real-time indexing and search - the bar for Enterprise NoSQL is about to be raised again.


You can read more about this on Semantic Web:

No comment yet.
Scooped by Tony Agresta!

Data Analysis and Unstructured Data: Expanding Business Intelligence (BI) by Thinking Outside of the Box -

Data Analysis and Unstructured Data: Expanding Business Intelligence (BI) by Thinking Outside of the Box - | Big Data Technology, Semantics and Analytics |
Tony Agresta's insight:

New forms of business intelligence incorporate both structured and unstructured data into your analysis.   Where does this apply today?  Customer service, intelligence analysis in government, fraud analysis in financial services, healthcare, consumer packaged goods, retail and other markets can benefit from this approach.  The open web provides organizations with limitless data containing valuable information on sentiment, people, events, employers, relationships and more.   The ability to extract meaning from unstructured sources combined with structured data yields new insights that can be used to improve decisions. 


Let's take a look at healthcare, for example.


In an article by Dennis Amorosano entitled "Unstructured data a common hurdle to achieving guidelines", Mr. Amorosano writes "... of the 1.2 billion clinical documents produced in the United States each year, approximately 60 percent contain valuable information trapped in unstructured documents that are unavailable for clinical use, quality measurement and data mining. These paper documents have until now been the natural byproduct of most hospital workflows, as healthcare is one of the most document-intensive industries."


Forbes published an article last year entitled "The Next Revolution in Healthcare", in which the author points out that the best healthcare institutions in the world still rely heavily on calculating risk to patients using clinical data.  At the same time, "the real tragedy is that the information needed to properly assess the patient’s risk and determine treatment is available in the clinician’s notes, but without the proper tools the knowledge remains unavailable and hence, unused."


The good news is that new analytic solutions are available that leverage both forms of data.   BI connectivity brings the power of familiar Business tools to your applications that include unstructured data. Some of the benefits to this approach include:


  • Combining BI and NoSQL provides capabilities not available using relational stores and EDWs - real-time analysis and extended query features.
  • BI tools layer on top of NoSQL databases that use sophisticated security models to protect sensitive data. Users see only the information for which they have permissions.
  • Analysts can learn faster using data discovery tools that allow for rapid investigation of both unstructured and structured data within the same application.  A more complete view of your analysis offers tremendous advantages in patient diagnosis, claims analysis and personalized care.


To learn more about how analytics technology is working with Enterprise NoSQL Databases ideally suited to ingest, store, search and analyze all types of data, you can visit this page:


No comment yet.
Scooped by Tony Agresta!

Get Your Facts Straight: We've Had Enterprise-Grade Security Longer

Visit to read Get Your Facts Straight: We've Had Enterprise-Grade Security Longer.
Tony Agresta's insight:

This blog post says it all about enterprise-grade security for MarkLogic's Enterprise NoSQL database.   It includes a clear definition of enterprise-grade security including the points referenced below.  The fact is, MarkLogic has had this for over 10 years.  

  • Internal Authentication
  • Permission Management
  • Data Auditing
  • Client to Node Encryption
  • External Security Firm Validation
  • Transparent Data Encryption
  • Compartment Security

One of the primary reasons government agencies have used MarkLogic for years is because of this fact.   They would never trust mission critical operations and data to any technology that wasn't enterprise hardened.

Read more here:

No comment yet.
Scooped by Tony Agresta!

The new reality for Business Intelligence and Big Data

The new reality for Business Intelligence and Big Data | Big Data Technology, Semantics and Analytics |
You know about Big Data and its potential, how it creates greater understanding of our world, reduces waste and misuse of resources, and dramatically increases efficiency.
Tony Agresta's insight:

Data discovery tools allow you to reveal hidden insights in your data when you don't know what to look for in advance.  These highly interactive tools allow you to visualize disparate data in various forms - charts, timelines, graphs, geo-spatial and tables – and explore relationships in data to uncover patterns that static dashboards cannot.  


With the explosion of big data, organizations are now using these tools with structured, semi-structured and unstructured data.  This approach allows them to consolidate data without having to build complex schemas, search the data instantly, deliver new content products dynamically and analyze all of their data in real time.  A transformational shift in data analysis is underway, allowing organizations to do this with documents, e-mails, video and other sources.   Imagine if you could load data into Hadoop, enrich it, ingest it into an enterprise NoSQL database in real time, index everything for instant search & discovery and analyze that data using Tableau or Cognos.   As the only Enterprise NoSQL database on the market, MarkLogic allows you to do just that.
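As a conceptual sketch of that enrich-then-index flow (field names and enrichment logic are invented for illustration, not MarkLogic's API):

```python
# Toy ingest pipeline: enrich raw records, then index them for
# instant lookup. All field names here are hypothetical.

def enrich(record):
    # Derive a simple feature during ingestion, e.g. flag large amounts.
    record["large"] = record["amount"] > 10_000
    return record

raw = [
    {"id": 1, "amount": 25_000, "text": "wire transfer to acme"},
    {"id": 2, "amount": 120,    "text": "coffee"},
]

# "Index everything": a toy inverted index from word -> record ids,
# the structure that makes search over ingested data instant.
index = {}
for rec in map(enrich, raw):
    for word in rec["text"].split():
        index.setdefault(word, set()).add(rec["id"])

hits = index.get("wire", set())
```

Real systems do this at scale with term lists and range indexes, but the shape of the work (enrich on the way in, index for instant retrieval) is the same.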


You can learn more here:


No comment yet.
Rescooped by Tony Agresta from visual data!

[INFOGRAPHIC] BIG DATA: What Your IT Team Wants You To Know

[INFOGRAPHIC] BIG DATA: What Your IT Team Wants You To Know | Big Data Technology, Semantics and Analytics |

The purpose of Big Data is to supply companies with actionable information on a wide variety of fronts. But this is proving far more difficult than it looks, with over half of Big Data projects left uncompleted.

Among the most often reported reasons for project failure is a lack of expertise in data analysis. Reports show that data processing, management and analysis are all difficult in any phase of a project, with IT teams citing each of those reasons more than 40% of the time.

However, failures in Big Data projects may not lie solely with faulty project management. In a recent survey, a staggering 80% cited a lack of appropriate talent among Big Data’s biggest challenges. The field’s relative infancy is making it hard to find the staff needed to see projects through, resulting in underutilized data and missed project goals.

IT teams are quickly recognizing a chasm between executives and the frontline staffers whose job it is to apply findings from Big Data. In the end, it may not be the anticipated cure-all for 21st-century business management. It is only as good as the system that runs it.

Via Peter Azzopardi, Berend de Jonge, Lauren Moss
Tony Agresta's insight:

Very interesting infographic.  Why do they fail?  For all of the reasons above and then some...    Over 80% of the data being collected today is unstructured and not readily stored in relational database technology burdened by complex extract, transform and load processes.  There's also pre-existing data, sometimes referred to as "dark data", including documents that need to be made discoverable for a host of reasons - compliance and regulatory requirements among them.   Log activity and e-mail traffic used to detect cyber threats and mitigate risk through analysis of file transfers are yet another set of data requiring immediate attention.


Social and mobile are clearly channels that need to be addressed as organizations continue to mine data from the open web in support of CRM, product alerts, real time advertising options and more.  


To accomplish all of this, organizations need a platform with enterprise-hardened technology that can ingest all of these forms of data in real time, without having to write complex schemas.   Getting back to the point - why do most projects fail?   If companies attempt to do this with technology that is not reliable, not durable and does not leverage the skills of their existing development organization, the project will fail.


We have seen this time and time again.   MarkLogic to the rescue.   With over 350 customers and 500 big data applications, our Enterprise NoSQL approach mitigates the risk.  Why?  Our technology stack includes connectors to Hadoop, integration with leading analytics tools using SQL, Java and REST APIs, JSON support, real-time data ingestion, the ability to handle any form of data, alerting, in-database analytics functions, high availability, replication, security and a lot more.


When you match this technology with a world-class services organization with proven implementation skills, we can guarantee your next Big Data project will work.  We have done it hundreds of times with the largest companies in the world and very, very big data.

Olivier Vandelaer's curator insight, January 30, 2013 2:45 AM

Looking at the infographic, it clearly reminds me of the start of the "Enterprise Data Warehouse": failures from "Inaccurate scope", "Technical Roadblocks" and "Siloed data and no collaboration". It looks so familiar.

Adrian Carr's curator insight, January 30, 2013 10:27 AM

This is a great infographic - it shows that whilst everyone is doing it (it being "Big Data" - whatever that is...), talent is rare, technology is hard to find and the projects never end.  A far cry from the speed with which companies such as the BBC deployed MarkLogic to serve all data for the sport websites through the Olympics.  Now that was big data, delivered by a talented team in a short space of time.

Scooped by Tony Agresta!

MarkLogic Server - Technology Preview: HDFS Storage

An introduction to the HDFS Storage feature available as a technology preview from MarkLogic.
Tony Agresta's insight:

Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.

  • Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone.
  • Save money by leveraging common infrastructure. Using MarkLogic and Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.
  • Enterprise-class support for Hadoop. Our partnership with Hortonworks provides a strong, supported platform for building enterprise-class Big Data Applications with Apache Hadoop.
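To make the MapReduce batch-processing model concrete, here is the classic word count as a minimal single-process sketch in plain Python (real Hadoop jobs distribute these same phases across a cluster):

```python
from itertools import groupby
from operator import itemgetter

documents = ["big data on hadoop", "hadoop stores big data"]

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group pairs by key, as the framework does between phases.
mapped.sort(key=itemgetter(0))

# Reduce phase: sum the counts for each word.
counts = {word: sum(n for _, n in group)
          for word, group in groupby(mapped, key=itemgetter(0))}
```

The map and reduce functions stay simple; the framework's value is running them in parallel over HDFS-scale inputs.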

No comment yet.
Scooped by Tony Agresta!

With MarkLogic Search Technology, Factiva Enables Standardized Search And Improved Experiences Across Dow Jones Digital Network, -

With MarkLogic Search Technology, Factiva Enables Standardized Search And Improved Experiences Across Dow Jones Digital Network, - | Big Data Technology, Semantics and Analytics |
With MarkLogic Search Technology, Factiva Enables Standardized Search And Improved Experiences Across Dow Jones Digital Network,
Tony Agresta's insight:

Let's not forget that enterprise search is a critical component of most big data applications.  Users evaluating NoSQL technology should investigate the degree to which the technology supports search.


Here are some of the benefits that Dow Jones found in the MarkLogic Enterprise approach:


  • One powerful, unified search platform to service the search needs of both consumer and enterprise customers.
  • The enhancements they make will be scalable and efficiently accessible to everyone.
  • Support for dynamic taxonomy coding of content. For example, the same person (Barack Obama and President Obama) or company (BP and British Petroleum) may be referred to and discovered without requiring the user to specify all variations of what they are seeking.
  • Improved ability to use "proximity search" elements, allowing users to search for words in close proximity to other words, thereby increasing the power and relevance of the search results.
  • Fast response to very precise searches
  • Personalized search that customizes results using search history patterns and interests
  • Narrowing of the search result set to improve direct access to relevant content within structured or unstructured data
  • Time savings through easy detection of related sources of content
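The proximity idea can be sketched with a toy matcher that checks whether two terms fall within N tokens of each other (an illustration only, not MarkLogic's query syntax):

```python
def near(text, a, b, distance=3):
    """True if lowercase words a and b appear within `distance` tokens."""
    tokens = text.lower().split()
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

doc = "Factiva standardized search across the Dow Jones network"
result = near(doc, "search", "dow", distance=3)
```

Production search engines answer the same question from positional indexes rather than rescanning the text, which is what makes proximity queries fast at scale.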


If this topic interests you, I would suggest looking at the Search capabilities in MarkLogic 6. You can find them here:  Search Developer Guide.   Some of the guide is technical but a lot of it summarizes the types of applications you can build with MarkLogic.  It will provide you with ideas on what you can include in your search applications.



No comment yet.
Scooped by Tony Agresta!

Insights for 2013: Understanding Your Customers & The Full Value of Digital - Analytics Blog

Insights for 2013: Understanding Your Customers & The Full Value of Digital - Analytics Blog | Big Data Technology, Semantics and Analytics |
Tony Agresta's insight:

With digital customer interactions exploding across more channels than ever before, marketing mix optimization has become even more complex.   What if customer interactions could be stored in one database, ingested from many sources?  What if customer conversion metrics could be calculated in that database in real time?  What if you could interactively analyze campaigns to identify the over- and underperforming ones?   With so many clicks, opens, web page visits and more, this may seem daunting.


But it is possible.   Today, most CRM systems allow users to store this data but aren’t all that friendly when it comes to analysis.   Big data technology integrated with interactive data analysis tools allows you to ingest both structured and unstructured data, providing marketing analysts with the ability to search the data, isolate sets of data for analysis and work with applications that display the ad or video alongside data visualization widgets measuring response, performance and trends over time.   Notifications triggered by pre-built alerts tell you when conversion rates exceed predefined thresholds or fall below expectations.
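A minimal sketch of that alerting logic (campaign names, metrics and thresholds are all invented for illustration):

```python
# Hypothetical conversion metrics per campaign.
campaigns = {
    "spring_email": {"clicks": 400, "conversions": 30},
    "video_promo":  {"clicks": 900, "conversions": 9},
}

LOW, HIGH = 0.02, 0.05  # illustrative alert thresholds

# Flag campaigns whose conversion rate falls outside the band.
alerts = []
for name, m in campaigns.items():
    rate = m["conversions"] / m["clicks"]
    if rate < LOW:
        alerts.append((name, "underperforming", rate))
    elif rate > HIGH:
        alerts.append((name, "exceeding target", rate))
```

In a real deployment the thresholds would be pre-registered queries evaluated as data arrives, so analysts are notified the moment a campaign crosses the line rather than at the next reporting cycle.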


Not many companies have applied enterprise NoSQL technology to marketing applications like this, but all of the technology components needed have been deployed in applications that share some of the same characteristics.  Look at the excellent work the BBC did at last year's Summer Olympics.   They ingested real-time data feeds from over 20 venues, Twitter and authored articles. They consolidated this with photos, bios and player statistics - all on one website and all with sub-second response.    The user experience was remarkable – stickiness personified.


Measuring advertising and campaign performance along with historical visitor traffic data is not a far cry from this type of application.  The end game is improved loyalty, optimized ad spend, faster decision making and increased sales.


No comment yet.