Big Data Technology, Semantics and Analytics
Trends, success and applications for big data including the use of semantic technology
Curated by Tony Agresta
Scooped by Tony Agresta!

Importance of NoSQL to Discovery - A Data Analysis Road Map You Can Apply Today

When you use the analytical process known as discovery, I recommend that you look for tools and environments that allow you to connect to NoSQL platforms
Tony Agresta's insight:

The convergence of data visualization and NoSQL is becoming a hotter topic every day.  We're at the very beginning of this movement  as organizations integrate many forms of data with technology to visualize relationships and detect patterns across and within data sets.  There aren't many vendors that do this well today and demand is growing.  Some organizations are trying to achieve big data visualization through data science as a service.   Some software companies have created connectors to NoSQL (and other) data sources to reach this goal.  As you would expect, deployment options run the gamut. 

Examples of companies that offer data visualization generated from a variety of data sources, including NoSQL, are Centrifuge Systems, which displays results in the form of relationship graphs; Pentaho, which provides a full array of analytics including data visualization and predictive analytics; and Tableau, which supports dozens of data sources along with great charting and other forms of visualization. Regardless of which you choose (and there are others), the process you apply to select and analyze the data will be important.

In the article, John L Myers discusses some of the challenges users face with data discovery technology (DDT). Since DDT operates from the premise that you don’t know all the answers in advance, it’s more difficult to pinpoint the sources needed in the analysis. Analysts discover insights as they navigate through the data visualizations. This challenge isn’t far from what predictive modelers face as they decide which variables to feed into models. They often don’t know what the strongest predictors will be, so they apply their experience to carefully select data. They sometimes transform specific fields, allowing an attribute to exhibit greater explanatory power. BI experts have long struggled with the same issue as they try to decide which metrics and dashboards will be most useful to the business.

Here are some guidelines that may help you solve the problem.   They can be used to plan your approach to data analysis.

  • Start by writing down a hypothesis you want to prove before you connect to specific sources.  What do you want to explore?  What do you want to prove?  In some cases, you'll want to prove many things. That's fine.   Write down your top ones.
  • For each hypothesis create a list of specific questions you want to ask the data that could prove or disprove the hypothesis.   You may have 20 or 30 questions for each hypothesis.
  • Find the data sources that have the data you need to answer the questions.  What data will you need to arrive at a conclusion? 
  • Begin to profile each field to see how complete the data is. In other words, take an inventory of the data, checking for missing values, data quality errors or values that make the specific source a good one. This may point back to changes in data collection needed by your current systems or processes.
  • Go a layer deeper in your charting and profiling beyond histograms to show relationships between variables you believe will be helpful as you attempt to answer your list of questions and prove or disprove your hypothesis.  Show some relationships between two or more variables using heat maps, cross tabs and drill charts.
  • Reassess your original hypothesis.  Do you have the necessary data?  Or do you need to request additional types of data?
  • Once you are set on the inventory of data and you have the tools to connect to those sources, create a set of visualizations to resolve the answers to each of the questions.  In some cases, it may be 4 or 5 visualizations for each question.  Sometimes, you will be able to answer the question with one visualization.
  • Assemble the results for each question to prove or disprove the hypothesis.    You should arrive at a nice storyboard approach that, when assembled in the right order, allows you to articulate the steps in the analysis and draw conclusions needed to run your business.     
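
As a rough illustration of the profiling and cross-tab steps in the list above, here is a minimal Python sketch. The sample records and the field names (`region`, `channel`, `revenue`) are hypothetical, and plain standard-library code is used so that no particular analytics tool is assumed.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative only.
records = [
    {"region": "East", "channel": "web", "revenue": 120.0},
    {"region": "East", "channel": "store", "revenue": None},
    {"region": "West", "channel": "web", "revenue": 80.0},
    {"region": None, "channel": "web", "revenue": 45.5},
]

def profile_completeness(rows):
    """Return the share of non-missing values for each field (0.0 to 1.0)."""
    fields = sorted({key for row in rows for key in row})
    total = len(rows)
    return {
        field: sum(1 for row in rows if row.get(field) is not None) / total
        for field in fields
    }

def cross_tab(rows, row_field, col_field):
    """Count co-occurrences of two fields, skipping rows where either is missing."""
    counts = Counter()
    for row in rows:
        a, b = row.get(row_field), row.get(col_field)
        if a is not None and b is not None:
            counts[(a, b)] += 1
    return dict(counts)

completeness = profile_completeness(records)
region_by_channel = cross_tab(records, "region", "channel")
```

A field with low completeness is a signal to revisit data collection before relying on it to answer one of your questions.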

If you take these steps up front and work with a tool that allows you to easily connect to a variety of data sources, you can quickly test your theory, profile and adjust the variables used in your analysis and create meaningful results the organization can use. But if you go into the exercise without any data planning, without any goals in mind, you are bound to waste cycles trying to decide what to include in your analysis and what not to include. Granted, you won't be able to account for every data analysis issue your department or company has. The purpose of this exercise is to frame the questions you want to ask of the data in support of a more directed approach to data visualization.

Intelligence-led decisions should be well received by your colleagues and applied more readily with this type of up-front planning. The steps you take to analyze the data will run more smoothly. You will be able to explain and better defend the data visualization path you've taken to arrive at conclusions. In other words, the story will be clearer when you present it.

Consider the types of visualizations supported by the analytics technology when you do this. Will you need temporal analysis?   Will you require relationship graphs that show connections between people, events, organizations and more?    Do you need geospatial visualizations to prove your hypothesis?  A little bit of planning when using data discovery and NoSQL technology will go a long way in meeting your analytical needs. 

Scooped by Tony Agresta!

Is MarkLogic the Adult at the NoSQL Party? - Datanami

If big data is a big party, Hadoop would be at the center and be surrounded by Hive, Giraph, YARN, NoSQL, and the other exotic technologies that generate so much excitement.
Tony Agresta's insight:

MarkLogic continues to enhance its approach to NoSQL, further confirming it's the adult at the party. MarkLogic 7 includes enhancements to enterprise search as well as Tiered Storage and integration with Hadoop.

MarkLogic Semantics, also part of MarkLogic 7, lets organizations enrich the content experience for users by combining semantic facts, documents and values in the same search experience. By doing this, organizations can surface semantic facts stored in MarkLogic when users are searching for a topic or person of interest. For example, if a user searches all unstructured data on a topic, facts about authors, publication dates, related articles and other facts about the topic would be part of the search results.

This could be applied in many ways.  Intelligence analysts may be interested in facts about people of interest.  Fraud and AML analysts could be interested in facts about customers with unusual transaction behavior.   Life Sciences companies may want to include documents, facts about the drug manufacturing process and values about pharma products as part of the search results.

Today, traditional search applications are being replaced by smarter, content rich semantic search.    This addition to MarkLogic continues to confirm that all of this can be done within a single, unified architecture saving organizations development time, money and resources while delivering enterprise grade technology used in the most mission critical applications today. 

Scooped by Tony Agresta!

DBTA: Unleashing the Power of Hadoop for Big Data Analytics

Tony Agresta's insight:

Great paper that covers how you can make Hadoop really powerful.   Not all data is created equal.  Some is needed in real time.  Some requires less expensive storage options.  Some you may need to quickly migrate from HDFS to MarkLogic.  This paper is the perfect road map to understand how you can unleash the power of Hadoop.  No registration required to download  a copy. 

Scooped by Tony Agresta!

Who are the Big Data Influencers?

Tony Agresta's insight:

Here's a nice list of bloggers, tweeters and general influencers in the Big Data space.    The post makes the following key points worth noting:

  • You can now reach out to the influencer using Twitter, email, phone or any other appropriate way with the increased conviction that follows from knowing that you are being highly relevant to them. In fact – in most cases they are likely to thank you for bringing the relevant material to their attention and in many cases they will share their “find” with others.
  • The net effect of this solution is that your evangelists spend time on being relevant and building relationships with influencers – rather than spending time looking for opportunities to engage; and just as a sales team that works off a steady stream of hot leads performs better than one that has to find their own leads, your evangelists will help win significantly more hearts, minds and market share.

Scooped by Tony Agresta!

Gus Hunt on the importance of Network Graphs & Big Data

Tony Agresta's insight:

What do people care about most when trying to identify relationships in big data?  Connection points between people, places, organizations, events, things, concepts and time.  How can this be done against massive volumes of data?  Through the use of network graphs driven by semantic processing of unstructured data and consolidated information.  We are at high noon in the information age, the cusp of grasping all of the data and turning it into intelligence.  This video by Gus Hunt, the CTO of the CIA, summarizes the requirements and challenges to get this done. 

Scooped by Tony Agresta!

State Street's Chief Scientist on How to Tame Big Data Using Semantics

Semantic databases are the next frontier in managing big data, says State Street's David Saul.
Tony Agresta's insight:

Here’s a good article on how financial institutions will use semantics to understand and manage risk. It sounds to me as though facts about people, transactions and the market, for example, can be derived from all types and sources of data, including unstructured data in documents. The relationships and connections between these facts can be stored, searched and analyzed. Adding in the dimension of time would allow you to see when the relationships were formed. Looking at the connection points in the form of a graph would allow analysts to identify networks that reveal individuals central to the graph who take on new importance.

Imagine if you could search a series of data sources that include information about customers for transactions over a certain level.   Imagine if you could identify all the associated people (employees and other customers) linked to the transactions.  What would you see if you could take all of these related facts and graph them in the form of a social network to visually show the connection points between people, addresses, institutions, lending officers and more?  
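
The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction records, the field names (`customer`, `officer`, `amount`) and the 10,000 cut-off are all invented for the example, and a real deployment would draw on many more entity types.

```python
from collections import defaultdict

# Hypothetical transactions; names, fields and amounts are invented for illustration.
transactions = [
    {"id": "t1", "customer": "Alice", "officer": "Dan", "amount": 15000},
    {"id": "t2", "customer": "Bob", "officer": "Dan", "amount": 500},
    {"id": "t3", "customer": "Carol", "officer": "Erin", "amount": 22000},
    {"id": "t4", "customer": "Alice", "officer": "Erin", "amount": 12000},
]

THRESHOLD = 10000  # arbitrary cut-off chosen for the example

def build_link_graph(rows, threshold):
    """Link customers and officers that share a transaction above the threshold."""
    graph = defaultdict(set)
    for row in rows:
        if row["amount"] > threshold:
            graph[row["customer"]].add(row["officer"])
            graph[row["officer"]].add(row["customer"])
    return graph

graph = build_link_graph(transactions, THRESHOLD)

# People with the most connections are candidates for a closer look.
ranked = sorted(graph, key=lambda person: (-len(graph[person]), person))
```

The adjacency sets in `graph` are the kind of structure a visualization tool could draw as a network, with the best-connected people appearing at its center.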

The combination of semantics to extract meaning from unstructured data, search, data visualization and analysis could reveal high-risk transactions along with links to other individuals. How are they connected? Integrating data from the open web and third-party sources might reveal important insights involving past employers, educational institutions, property owned and residential addresses. Technology to support this scenario using massive amounts of consolidated intelligence is not far away.

Scooped by Tony Agresta!

Semantic Technologies in MarkLogic - World Class Triple Store in Version 7

Tony Agresta's insight:

This video is a fantastic overview from MarkLogic's Stephen Buxton, John Snelson and Micah Dubinko covering semantic processing, use cases for triple stores that include richer search & graph applications and the expanded architecture in MarkLogic 7. It's an hour in length but well worth the time if you're interested in understanding how you can use documents, facts derived from text and values to build groundbreaking applications. Databases as we know them will change forever with the convergence of enterprise NoSQL, search and semantic processing. This video provides you with the foundation to understand this important change in database technology.

Scooped by Tony Agresta!

RDF 101 - Cambridge Semantics

Semantic University tutorial and introduction to RDF.
Tony Agresta's insight:

This post is a bit technical but recommended reading, since it covers one of the most important aspects of data science today: the ability to discern meaning from unstructured data.

To truly appreciate the significance of this technology, consider the following questions -  Since over 80% of the data being created today is unstructured in forms that include text, videos, images, documents, and more, how can organizations interpret the meaning behind this data?   How can they pull out the facts in the data and show relationships between those facts leading to new insights?


This post provides you with the foundation on how semantic processing works.   Cutting to the chase, the technologies are referred to as RDF (Resource Description Framework), SPARQL and OWL. They allow us to create underlying data models that understand relationships in unstructured data, even across web sites, document repositories and disparate applications like Facebook and LinkedIn.


These data models store data that has properties extracted from the unstructured data.  Consider the following sentence:   "The Monkeys are destroying Tom's garden."  Semantic processing of this text would deconstruct the sentence identifying the subject, object and predicate while also building relationships between the three.  The subject is "monkeys" and they are taking an action on "the garden". The garden is therefore the object and the predicate is "destroying".  
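
To make the subject, predicate and object breakdown concrete, here is a minimal Python sketch of a triple as a data structure. It is a simplification: real RDF identifies resources with IRIs rather than plain strings, and the `Triple` class, the `facts_about` helper and the second "owns" fact are assumptions added for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A single fact expressed as subject, predicate, object."""
    subject: str
    predicate: str
    object: str

# The example sentence, broken down by hand into one fact.
fact = Triple(subject="monkeys", predicate="destroying", object="garden")

# A second, hypothetical fact capturing the ownership implied by "Tom's garden".
ownership = Triple(subject="Tom", predicate="owns", object="garden")

def facts_about(entity, triples):
    """Return every triple in which the entity appears as subject or object."""
    return [t for t in triples if entity in (t.subject, t.object)]

garden_facts = facts_about("garden", [fact, ownership])
```

Because both facts mention the garden, looking it up returns both, which is how separate statements pulled from text start to connect.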


Most importantly, there is a connection made between the monkeys and the garden allowing us to show relationships between specific facts pulled from text.   How can this help us? 


Assume for a second you’re working for a government agency tracking a suspicious person who is on a watch list. Crawling the web looking for that person's name is one way to identify additional information about the person. Technology to do this exists today. When the name is detected, identifying relationships between the person being investigated and other subjects (or objects in the text) can lead you to new people that may also be of interest. For example, if Sam is on the watch list, a sentence like this would be of interest: "Sam works with Steve at ABC Home Builders Corp." Relationships between the suspect (Sam) and someone new in the investigation (Steve) could be identified. The fact that they both work for the same employer allows analysts to connect the subjects through this employer.
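
The Sam and Steve example can be sketched as a small query over extracted facts. The triples below are written by hand, the `works_for` predicate name is an assumption, and the third fact is invented to show that unrelated people are not linked; the extraction step itself is out of scope here.

```python
from collections import defaultdict

# Hand-written facts of the kind semantic extraction might produce from
# "Sam works with Steve at ABC Home Builders Corp."; the third is invented.
triples = [
    ("Sam", "works_for", "ABC Home Builders Corp"),
    ("Steve", "works_for", "ABC Home Builders Corp"),
    ("Pat", "works_for", "XYZ Roofing"),  # hypothetical unrelated person
]

watch_list = {"Sam"}

def linked_through_employer(facts, suspects):
    """Map each suspect to the other people who share an employer with them."""
    staff = defaultdict(set)
    for person, predicate, employer in facts:
        if predicate == "works_for":
            staff[employer].add(person)
    links = defaultdict(set)
    for employer, people in staff.items():
        for suspect in people & suspects:
            links[suspect] |= people - {suspect}
    return dict(links)

new_leads = linked_through_employer(triples, watch_list)
```

Here `new_leads` connects Sam to Steve through their shared employer, while Pat stays out of the result.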


These interesting facts help investigators make connections within e-mail, phone conversations, in house data and other sources, all of which can be displayed visually in a graph to show the subjects and how they are linked. 


Data models to store, search and analyze this data will become one of the primary tools to interpret massive amounts of data being collected today.  This technology allows computers to understand relationships in unstructured data and display those relationships to analysts in the form of visual diagrams that clearly show connections to other data including phone calls, events, accounts, and more.  The implications of this extend far beyond counter terrorism to include social networking, marketing, fraud, cyber security and sales to name a few.

We are at an inflexion point in big data – data stored in silos can now be consolidated with external data from the open web.  Most importantly, the unstructured data can be interpreted as we form connections that are integral in understanding how things are related to each other.  Data visualization technology is the vehicle to display those connections allowing analysts to explore any form of data in a single application. 

Learn more about this technology and other advances in Enterprise NoSQL here.

Scooped by Tony Agresta!

NSA collecting phone records of millions of Verizon customers daily

Exclusive: Top secret court order requiring Verizon to hand over all call data shows scale of domestic surveillance under Obama administration
Tony Agresta's insight:

In my opinion, this demonstrates one of the most important aspects of big data analysis and should continue. Call data, including outbound call numbers, inbound call numbers, duration of call, start time and end time, are vital pieces of information necessary to analyze and protect us. US lives are at risk. The article by Glenn Greenwald in The Guardian states:


"Such metadata is what the US government has long attempted to obtain in order to discover an individual's network of associations and communication patterns. The request for the bulk collection of all Verizon domestic telephone records indicates that the agency is continuing some version of the data-mining program begun by the Bush administration in the immediate aftermath of the 9/11 attack."


Let's get to the heart of the matter. This type of big data analysis has prevented attacks and has proven to work. At this moment, not all of the facts of these investigations have been disclosed (and many classified examples will probably never be revealed), but many news outlets are reporting that a planned bomb attack on the NY subway was averted because of phone and e-mail intercepts and analysis.


For intelligence analysts to do this accurately and completely, they need to analyze the haystack of data represented by e-mail, phone and other forms of communication. For example, if an e-mail from someone is intercepted because that person is corresponding about bomb recipes, our government should have access to the calls this person made and received. They should be allowed to analyze ALL of the connections between other callers to discern whether or not a network of clandestine activity exists. If warranted, our government should be able to look at the content of the unstructured data to determine if there are other people, events or places referenced in the text. Initiating these investigations by analyzing all e-mail traffic or all phone calls made to known or suspected terrorists protects every one of us.


Techniques and processes to ingest, search and manage massive amounts of data like phone records and e-mail traffic are being used today. This should not surprise anyone.  Connecting the dots between caller metadata and known or suspected terrorists is one very effective way to maintain the safety of US citizens.   If this means the NSA needs to look more closely at other calls that suspects have made, so be it.  These preventative measures are in place to safeguard our freedom and prevent lives from being lost.   Haven't we seen enough of that?


Scooped by Tony Agresta!

Visa Says Big Data Identifies Billions of Dollars in Fraud

Visa’s chief enterprise risk officer, Ellen Richey, says “you see the criminal capability evolving on the technology side.” She gives CIO Journal an inside look at how the company has used Big Data to make its network more secure...
Tony Agresta's insight:

The approach Visa takes in identifying fraud is grounded in 16 different predictive models and allows for new independent variables to be added to the model. This improves accuracy while allowing the models to be kept up to date. Here's an excerpt from the WSJ article:


"The new analytic engine can study as many as 500 aspects of a transaction at once. That’s a sharp improvement from 2005, when the company’s previous analytic engine could study only 40 aspects at once. And instead of using just one analytic model, as it did in 2005, Visa now operates 16 models, covering different segments of its market, such as geographic regions."


The article also states that the analytics engine has the card number and not the personal information about the transaction - likely stored in a different system. I wonder if Visa, at some point in the process, also takes the fraud transactions and analyzes them visually to identify connections and linkages based on address, other geographic identifiers, 3rd party data, employer data and more? Are two or more of the fraud cases in some way connected? Does this represent a ring of activity presenting higher risk to merchants, customers and Visa?
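
One way to ask whether fraud cases are connected is to group cases that share any attribute value, directly or through a chain of other cases. The Python sketch below does this with a simple union-find. It is not a description of Visa's process; the case records and attribute names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fraud cases with invented attributes.
cases = {
    "case1": {"address": "12 Elm St", "employer": "Acme"},
    "case2": {"address": "12 Elm St", "employer": "Globex"},
    "case3": {"address": "9 Oak Ave", "employer": "Globex"},
    "case4": {"address": "77 Pine Rd", "employer": "Initech"},
}

def find_rings(case_attrs):
    """Group cases that are connected, directly or indirectly, by a shared value."""
    parent = {case: case for case in case_attrs}

    def find(node):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    first_seen = {}
    for case, attrs in case_attrs.items():
        for key, value in attrs.items():
            # The first case seen with this value becomes the anchor for later matches.
            owner = first_seen.setdefault((key, value), case)
            parent[find(case)] = find(owner)

    groups = defaultdict(list)
    for case in case_attrs:
        groups[find(case)].append(case)
    # Only groups with two or more cases suggest a possible ring.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

rings = find_rings(cases)
```

In this sample, the first two cases share an address and the second and third share an employer, so all three fall into one candidate ring while the fourth stands alone.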


The tools on the market to do this work are expanding.   The data used to analyze this activity (including unstructured data) is being stored in databases that allow for the visual analysis of big data.  Graph databases replete with underlying intelligence extracted from text that identify people, places and events can be used to extend the type of analysis that Visa is doing and prioritize investigations.   Through more efficient allocation of investigation resources, fraud prevention can jump to a higher level.

luiy's curator insight, April 27, 2013 2:37 PM

“From the strategic point of view, we are achieving an amazing improvement, year over year, in our ability to detect fraud,” says Richey. “It’s not just our ability to analyze our transactions, but our ability to add new kinds of data, such as geo-location, to that analysis. With every new type of data, we increase the accuracy of our models. And from a strategic point of view we can think about taking an additional step change of fraud out of our system.”

In the future, Big Data will play a bigger role in authenticating users, reducing the need for the system to ask users for multiple proofs of their identity, according to Richey, and 90% or more of transactions will be processed without asking customers those extra questions, because algorithms that analyze their behavior and the context of the transaction will dispel doubts. “Data and authentication will come together,” Richey said.

The data-driven improvement in security accomplishes two strategic goals at once, according to Richey. It improves security itself, and it increases trust in the brand, which is critical for the growth and well-being of the business, because consumers won’t put up with a lot of credit-card fraud. “To my mind, that is the importance of the security improvements we are seeing,” she said. “Our investments in data and analysis are baseline to our ability to thrive and grow as a company.”

Scooped by Tony Agresta!

Get Your Facts Straight: We've Had Enterprise-Grade Security Longer

Tony Agresta's insight:

This blog post says it all about enterprise-grade security for MarkLogic's Enterprise NoSQL database.   It includes a clear definition of enterprise-grade security including the points referenced below.  The fact is, MarkLogic has had this for over 10 years.  

  • Internal Authentication
  • Permission Management
  • Data Auditing
  • Client to Node Encryption
  • External Security Firm Validation
  • Transparent Data Encryption
  • Compartment Security

One of the primary reasons government agencies have used MarkLogic for years is because of this fact.   They would never trust mission critical operations and data to any technology that wasn't enterprise hardened.

Read more here:

Scooped by Tony Agresta!


Tony Agresta's insight:

Followers may be interested in this white paper from The Bloor Group, which summarizes the differences between database technologies. It's meaty.

Here are a few additional points that Bloor has written about MarkLogic's Enterprise NoSQL approach:

  • MarkLogic is also a true transactional database. Most NoSQL databases have compromised the ACID (Atomicity, Consistency, Isolation and Durability) properties that are important for transaction processing. MarkLogic is fully equipped to be a transactional database, and if you simply wanted to use it for order processing, there would be no problem in doing so.
  • The database has been built to enable rapid search of its content in a similar manner to the way that Google’s search capabilities have been built to enable rapid search of the Internet.
  • As some of MarkLogic’s implementations have grown to above the single petabyte level, fast search of massive amounts of data is one of its most important features. To enable its search capability MarkLogic indexes everything on ingest; not just the data, but also the XML metadata. This provides it with the ability to search both text and structure. For example, you might want to quickly find someone’s phone number from a collection of emails.
  • With MarkLogic you could pose a query such as: “Find all emails sent by Jonathan Jones, sort in reverse order by time and locate the latest email that contains a phone number in its signature block.”
  • You may be able to deduce from this that MarkLogic knows what an email is, knows how to determine who the sender is, knows what a signature block is and knows how to identify a phone number from within the signature block. If you were looking for a mobile phone number then you would simply add the word “mobile” in front of phone number. It should be clear from this that very few databases could handle such a query, because most databases are straitjacketed by whatever version of SQL they implement and, even if it were possible to bend SQL in such a way as to formulate this kind of query, most databases cannot dig into data structures they hold in the way that MarkLogic can.
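
To show the logic of the example query in the list above, here is a minimal Python sketch over in-memory records. It is not MarkLogic's query language or API. The email records are invented, the signature block is assumed to be available as its own field, and the phone pattern is deliberately simplistic.

```python
import re
from datetime import datetime

# Hypothetical emails; in this sketch the signature block is already a separate field.
emails = [
    {"sender": "Jonathan Jones", "sent": datetime(2013, 1, 5, 9, 0),
     "signature": "Jonathan Jones\nAcme Corp"},
    {"sender": "Jonathan Jones", "sent": datetime(2013, 2, 1, 14, 30),
     "signature": "Jonathan Jones\nPhone: 555-010-1234"},
    {"sender": "Jonathan Jones", "sent": datetime(2013, 3, 10, 8, 15),
     "signature": "J. Jones"},
    {"sender": "Mary Smith", "sent": datetime(2013, 4, 2, 11, 0),
     "signature": "Mary Smith\nPhone: 555-010-9999"},
]

# Simplified pattern for numbers written like 555-010-1234; real formats vary widely.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def latest_phone(messages, sender):
    """Return the phone number from the sender's most recent email that has one."""
    own = [m for m in messages if m["sender"] == sender]
    for message in sorted(own, key=lambda m: m["sent"], reverse=True):
        match = PHONE.search(message["signature"])
        if match:
            return match.group()
    return None

number = latest_phone(emails, "Jonathan Jones")
```

The sketch filters by sender, sorts newest first and stops at the first signature containing a phone number, which mirrors the three parts of the query. A database that indexes structure on ingest would not need to scan every record this way.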

With the release of MarkLogic 6 last fall, MarkLogic also provided SQL support through integration with Tableau and Cognos, in-database analytic functions, JSON support, Java and REST APIs and more. For more information on this release, you can go here:

Scooped by Tony Agresta!

Big Data “Hype” Coming To An End | SiliconANGLE

Tony Agresta's insight:

“Organizations have fascinating ideas, but they are disappointed with a difficulty in figuring out reliable solutions,” writes Sicular from The Gartner Group.


“Their disappointment applies to more advanced cases of sentiment analysis, which go beyond traditional vendor offerings. Difficulties are also abundant when organizations work on new ideas, which depend on factors that have been traditionally outside of their industry competence, e.g. linking a variety of unstructured data sources.”


Today, organizations are coming to the realization that free or low cost open source technology to handle big data requires intense development cycles that burn costs and time.  Solving demanding challenges in these four areas has proven difficult:


  • Search & Discovery
  • Content Delivery
  • Analytics and Information Products
  • Data Consolidation


Organizations need to work with proven technology that's reliable and durable. They need to work with technology that handles ACID transactions, enterprise security, high availability, replication, real time indexing and alerting - without having to write 10,000+ lines of code.


Major financial institutions, healthcare payors, government agencies, media giants, energy companies, and state & local organizations have standardized on big data technology proven to increase developer productivity, create new revenue streams and address mission critical operations in a post-9/11 era.


Adrian Carr's curator insight, February 11, 2013 11:11 AM

IT does it again. Build a technology up until we start to believe it will solve all world problems. It generates huge "science projects" and then everything comes tumbling down. Finally a voice of reason says...maybe we set expectations unrealistically...One more trough of disillusionment!

Scooped by Tony Agresta!

Semantics: The Next Big Issue in Big Data

State Street's David Saul argues big data is better when it's smart data.
Tony Agresta's insight:

Banking, like many industries, faces challenges in the area of data consolidation.  Addressing this challenge can require the use of semantic technology to accomplish the following:


  • A common taxonomy across banking divisions allowing everyone to speak the same language
  • Applications that integrate data including structured data with unstructured data and semantic facts about trading instruments, transactions that pose risk and derivatives
  • Ways to search all of the data instantly and represent results using different types of analysis, data visualization or through relevance rankings that highlight risk to the bank.


"What's needed is a robust data governance structure that puts underlying meaning to the information.  You can have the technology and have the standards, but within your organization, if you don't know who owns the data, who's responsible for the data, then you don't have good control."


Some organizations have built data governance taxonomies to identify the important pieces of data that need to be surfaced in rich semantic applications focused on risk or CRM, for example. Taxonomies and ontologies describe how data is classified and the relationships between the types of data. In turn, they can be used to create facts about the data which can be stored in modern databases (enterprise NoSQL) and used to drive smart applications.


Lee Fulmer, a London-based managing director of cash management for JPMorgan Chase says the creation of [data governance] standards is paramount for fueling adoption, because even if global banks can work out internal data issues, they still have differing regulatory regimes across borders that will require that the data be adapted.


"The big paradigm shift that we need, that would allow us to leverage technology to improve how we do our regulatory agenda in our banking system.  If we can come up with a set of standards where we do the same sanction reporting, same format, same data transcription, same data transmission services, to the Canadians, to the Americans, to the British, to the Japanese, it would reduce a huge amount of costs in all of our banks."


Semantic technology is becoming an essential way to govern data, create a common language, build rich applications and, in turn, reduce risk, meet regulatory requirements and reduce costs. 


Scooped by Tony Agresta!

Triplestores Rise In Popularity


Triplestores:  An Introduction and Applications

Tony Agresta's insight:

Triplestores are gaining in popularity.  This article does a nice job of describing what triplestores are and how they differ from graph databases.  But there isn't much in the article on how triplestores are used.  So here goes:

Some organizations are finding that when they apply a combination of semantic facts (triples) with other forms of unstructured and structured data, they can build extremely rich content applications.   In some cases, content pages are constructed dynamically.    The context based applications deliver targeted, relevant results creating a unique user experience.  Single unified architectures that can store and search semantic facts, documents and values at the same time require fewer IT and data processing resources resulting in shorter time to market.  Enterprise grade technology provides the security, replication, availability, role based access and the assurance no data is lost in the process.  Real time indexing provides instant results.

Other organizations are using triplestores and graph databases to visually show connections useful in uncovering intelligence about their data.  Visualization tools connect to triplestores and NoSQL databases easily, allowing users to configure graphs to show how the data is connected.  There's wide applicability for this but common use cases include identifying fraud and money laundering networks, counter-terrorism, social network analysis, sales performance, cyber security and IT asset management.  The triples, documents and values provide the fuel for the visualization engine, allowing for comprehensive data discovery and faster business decisions.

Other organizations focus on semantic enrichment and then ingest resulting semantic facts into triplestores to enhance the applications mentioned above.  Semantic enrichment extracts meaning from free flowing text and identifies triples.
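
To make the idea of a semantic fact concrete, here is a minimal sketch of a triple store in Python. The class name `TinyTripleStore`, the sample facts and the predicate names are all hypothetical and chosen only for illustration; a real triplestore would add indexing, persistence and a query language such as SPARQL.

```python
from typing import List, Optional, Set, Tuple

# A triple is one semantic fact: (subject, predicate, object).
Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one fact; duplicates are ignored because a set is used."""
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


# Made-up facts of the kind semantic enrichment might extract from text.
store = TinyTripleStore()
store.add("Alice", "works_for", "Acme Bank")
store.add("Bob", "works_for", "Acme Bank")
store.add("Alice", "lives_in", "London")

# Who works for Acme Bank?
colleagues = store.match(predicate="works_for", obj="Acme Bank")
```

Leaving any position as `None` turns it into a wildcard, which is the same pattern-matching idea that triple queries are built on.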

Today, the growth in open data - pre-built triple stores - is allowing organizations to integrate semantic facts to create richer content applications.   There are hundreds of sources of triple stores that contain tens of billions of triples, all free.

What's most important about these approaches?  Your organization can easily integrate all forms of data in a single unified architecture.  The data is driving smart websites, rich search applications and powerful approaches to data visualization.   This is worth looking at more closely since the end results are more customers, lower costs, greater insights and happier users.


The Age of Big Data - Predicting Crime After Shocks

Tony Agresta's insight:

The BBC documentary follows people who mine Big Data, including the Los Angeles Police Department (LAPD), which uses data to predict crime.  It's proven that historical patterns can be used to predict future behavior.  With a database of over 13 million crimes spanning 80 years and real time continuous updates, the LAPD has applied mathematical algorithms and pattern recognition to identify crime hotspots.  Targeted police work has resulted in a 26% decrease in burglaries and a 12% decrease in property crimes.


How does this work?  In the same way that earthquake aftershocks can be predicted, data miners analyzed historical crime statistics including location and timing.  By tracking the history, timing and location of crimes, they found that the probability of another crime occurring in certain locales was higher.  In this case, the rate of crime and geospatial distribution of events were excellent predictors of future behavior, pinpointing small geographic areas to which police resources could be directed.
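
The aftershock idea can be illustrated with a small Python sketch. This is not the LAPD's algorithm, which the documentary does not spell out; it is an assumed, simplified model in which each past crime raises the risk of its map grid cell and that contribution decays exponentially with time. The function names, the one-kilometer cell size and the seven-day decay constant are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, float, float]  # (x_km, y_km, day the crime occurred)
Cell = Tuple[int, int]              # grid cell index


def hotspot_scores(
    events: Iterable[Event],
    now: float,
    cell_km: float = 1.0,      # assumed grid resolution
    decay_days: float = 7.0,   # assumed decay time constant
) -> Dict[Cell, float]:
    """Sum time-decayed contributions of past events per grid cell."""
    scores: Dict[Cell, float] = defaultdict(float)
    for x, y, day in events:
        if day > now:
            continue  # ignore events that have not happened yet
        cell = (math.floor(x / cell_km), math.floor(y / cell_km))
        # Recent events count almost fully; older ones fade away.
        scores[cell] += math.exp(-(now - day) / decay_days)
    return dict(scores)


def top_cells(scores: Dict[Cell, float], k: int = 3) -> List[Cell]:
    """Return the k highest-scoring cells, highest first."""
    return sorted(scores, key=lambda c: scores[c], reverse=True)[:k]


# Made-up events: two recent crimes close together, one old crime far away.
events = [(0.2, 0.3, 9.0), (0.8, 0.1, 10.0), (5.5, 5.5, 1.0)]
scores = hotspot_scores(events, now=10.0)
ranked = top_cells(scores)
```

In this toy data the cell containing the two recent, nearby crimes ranks first, which mirrors the observation that recent crime rate and location together indicate where the next event is more likely.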


Today, these predictive aftershocks are becoming more accurate through the use of real time data feeds, alerts, geospatial analysis and temporal analysis.  Over 150 cities in the US are starting to apply these techniques allowing police officers to anticipate, focus, apprehend and therefore lower risk.   

Thanks to KD Nuggets for providing the link to the BBC video which is very well done.


What's the Scoop on Hadoop?

If you are an investor in the field of Big Data, you must have heard the terms “Big Data” and “Hadoop” a million times.  Big Data pundits use the terms interchangeably and conversations might lead you to believe that...
Tony Agresta's insight:

"Hadoop is not great for low latency or ad-hoc analysis and it’s terrible for real-time analytics."

In a webcast today with Matt Aslett from 451 Research and Justin Makeig from MarkLogic, a wealth of information was presented about Hadoop including how it's used today and how MarkLogic extends Hadoop.  When the video becomes available, I'll post it but in the meantime, the quote from the Forbes article echoes what the speakers discussed today.

Today, Hadoop is used to store, process and integrate massive amounts of structured and unstructured data and is typically part of a database architecture that may include relational databases, NoSQL, Search and even Graph Databases.  Organizations can bulk load data into the Hadoop Distributed File System (HDFS) and process it with MapReduce.  YARN is a technology that's starting to gain traction, enabling multiple applications to run on top of HDFS and process data in many ways. But it's still early stage.
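
For readers new to MapReduce, the processing model can be sketched in a few lines of plain Python. This is an in-process illustration of the map, shuffle and reduce phases using a word count; it does not use the Hadoop API, and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big insight", "data at rest"])
```

On a real cluster the map and reduce steps run in parallel across many machines over data in HDFS, which is why the model suits batch work better than low-latency queries.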

What's missing?  Real-time applications.  And that's an understatement: reliability and security have also been question marks, as has limited support for SQL-based analytics.  Complex configuration also makes Hadoop difficult to deploy.

MarkLogic allows users to deploy an Enterprise NoSQL database into an existing Hadoop implementation and offers many advantages including:

  • Real time access to your data
  • Less data movement
  • Mixed workloads within the same infrastructure
  • Cost effective long term storage
  • The ability to leverage your existing infrastructure

Since all of your MarkLogic data can be stored in HDFS including indexes, you can combine local storage for active, real time results with lower cost tiered storage (HDFS) for data that's less relevant or needs additional processing.  MarkLogic allows you to partition your data, rebalance and migrate partitioned data interactively.

What does this mean for you?  You can optimize costs, performance and availability while also satisfying the needs of the business in the form of real time analytics, alerting and enterprise search. You can take data "off line" and then bring it back instantly since it's already indexed.  You can still process your data using batch programs in Hadoop but now all of this is done in a shared infrastructure. 

To learn more about MarkLogic and Hadoop, visit this Resource Center

When the video is live, I'll send a link out.

Bryan Borda's curator insight, July 19, 2013 11:39 AM

Excellent information on advantages to using NoSQL technology with a Hadoop infrastructure.  Take advantage of the existing Hadoop environment by adding powerful NoSQL features to enhance the value.


Social Media & Big Data in the Insurance Industry

According to a global industry survey, insurers feel less prepared to deal with threats arising from social media and big data than they do about more familiar ones.
Tony Agresta's insight:

Insurance companies' increased use of social media means bigger data is on the way.  In turn, the need for technology to manage this data will increase.


For example, insurance companies are using social media to increase visibility for their brand and develop stronger customer relationships.  Chubb Insurance follows influencers and industry news on its Twitter page.  The company provides educational information to Chubb followers in an attempt to build awareness and trust.


The use of social media in insurance extends beyond CRM.  Companies are listening to social media sites in an attempt to detect posts related to insurance claims.  They are detecting activities that could indicate a claimant has gone beyond what a physician would deem acceptable.


Traditional uses of social media to assess sentiment apply as well.   Customer service channels are better informed with real time feeds on positive and negative sentiment about their products and the industry as a whole.


Prospects shop for insurance products on line using communities and social networks.   Understanding when this happens helps insurance companies target their sales and marketing efforts.  Sharing bite size pieces of information directly with consumers allows insurance companies to overcome one of their main obstacles, distrust.


Social media has become an effective way to communicate with policy holders about events that may affect claims. Most of this is done after catastrophic events, but proactive approaches relating to health and wellness are another application of social communication in support of reduced risk and lower costs.


Big data technology to manage these applications allows Insurers to ingest massive volumes of data, wrap context and meaning around the unstructured content, search it in real time and deliver the facts to the right channels at the right time.


Big Data & Predictive Analytics | SmartData Collective

Predictive analysts usually think of doing predictive modeling on structured data pulled from a database.
Tony Agresta's insight:

This article covers some important information about predictive analysis.  The definition of the "unit of analysis" and the use of unstructured data as inputs to models are two examples.

The article does not touch on how big data platforms can enhance the modeling process.  When it comes to really big data, the exercise of building and deploying predictive models may be best suited to a platform approach.  Key ingredients would include:

  • Hadoop to preprocess the data
  • Semantic enrichment of content
  • Real time indexing and scoring allowing business users to get instant access to results
  • Search applications that use model scores displayed with other relevant data to create a richer search experience
  • In-database analytics
  • Tools to pump data in and out of the database
  • Application services to quickly build applications that use the scores.  

There’s more to predictive modeling than just the modeling technology.  Optimizing the entire process may require the use of an Enterprise NoSQL database and platform.  This could end up being a very valuable part of your technology stack.  Applying this in support of data discovery, business intelligence and predictive analytics will improve chances of success.

There's a good video on this topic here:  Analytics and Information Products


New Forms of Analytics Incorporate New Forms of Data

Discover how MarkLogic NoSQL database solutions help you make better decisions,
faster, with MarkLogic SolutionTracks—a series of brief, easy-to-follow, whiteboard tutorials.
Tony Agresta's insight:

In decades past, analytical approaches incorporated data from transactional systems, spreadsheets and other sources.  Challenges still exist today.  The time it takes to build and update data warehouses sucks the air out of the analytical process, especially for those needing intelligence in real time. Most analysts interested in exploring data at the speed of the human mind have to wait and wait...


Today, increased competition, demands from the business and monumental masses of new forms of data exacerbate these challenges.  Timely analysis has become even more difficult.   In theory, new information products could support new revenue streams and increased customer satisfaction but many organizations struggle to achieve analytical nirvana. They simply can't get to all of the data.


Fortunately, solutions do exist to consolidate documents and other forms of unstructured data with traditional data in real time.  Dashboards no longer need to be static.  Predictive analysis can include explanatory data to provide lift in your models.  Interactive analysis allows business owners to explore data at faster speeds.


Enterprise NoSQL databases ingest all of your data "as is", make it available through search and real time database connectivity, derive important statistical measures and provide the ability to build user defined functions running close to the database.  Alerts inform analysts when conditions are met using pre-built queries. 


Enterprise hardened analytical applications won't lose your data, and they provide high availability, backup and recovery, and government grade security.  This warrants a closer look.

Using BI Tools and In Database Analytics with MarkLogic


Updated Database Landscape map – June 2013 — Too much information

Tony Agresta's insight:

MarkLogic is uniquely positioned on this database landscape map.    Here's what makes the position very different from other vendors:

1.  Search - MarkLogic is directly connected to all the major enterprise search vendors.   Recent recognition of this was confirmed by Gartner in its Enterprise Search Magic Quadrant.   Notice that other NoSQL technologies are nowhere close to this connection point.

2.  General Purpose - MarkLogic provides an enterprise NoSQL database, Application Services and Search.   With support for many development languages and REST and Java APIs, MarkLogic has clear links to SAP, Enterprise DB and a host of other database providers.

3.   Graph and Document - MarkLogic has long been recognized as a document store and used widely all over the world for this purpose.  Notice the subtle connection to Graph as well, connecting MarkLogic to other vendors in this space like Neo4j.  MarkLogic 7 promises to deliver a world class triple store to index subjects, predicates and objects in XML documents or load other triples through the MarkLogic Content Pump.  For the first time, the only Enterprise NoSQL technology with search will include semantics.  Updated APIs and support for SPARQL are part of this release.

4.  Big Tables - MarkLogic's ability to handle big data has long been known.  The olive green line is designated for vendors like MarkLogic, Cassandra, HBase, etc.   MarkLogic's partnership with Intel for Distribution of Apache Hadoop and the fact that MarkLogic ships with Hadoop connectors provide additional confirmation for this position.

5.   Key Value Stores - Data can be stored as keys without the database schema required by relational databases.  In MarkLogic's case, huge quantities of data can be indexed in real time with data stored in memory and on disk, making search results instant and complete.   After a recent analysis of more than 50 MarkLogic customers, the ability to quickly get up and running and deliver new information products to market was a business driver they mentioned over and over again.

The fact is, no one else on the list has all of these qualities.   Because of this unique position, visually you see MarkLogic distanced from other clusters or long lists of technology vendors.  

To learn more, you can go to MarkLogic Resources.


Graph Analysis Powers the Social and Semantic Spheres of Big Data

Why predictive modeling of human behavior demands an end-to-end, low-latency database architecture
Tony Agresta's insight:

Here are some key points from the article in addition to some insights about graph analysis and big data:


  • Semantic graphs map relationships among words, concepts and other constructs in the human language allowing for unstructured data to be used in a graph showing important connections.
  • Graph analysis is not new.   It has been used as a form of data visualization to explore connections and identify patterns and relationships that would otherwise have gone undetected.
  • Some vendors have taken their graph capabilities to new levels. For example, Centrifuge Systems allows users to draw the graphs, search the graph space, interact with charts and display important measures about the graph network.   Analysts can easily pinpoint portions of the graph that require additional analysis.  Hotspots of interesting activity jump out from the graph based on the number of connections and important performance measures.
  • While social graphs may be the most popular, this approach is especially useful in detecting fraud networks, cyber data breaches, terrorist activity and more. 
  • One of the most important points is that graphs can incorporate diverse streams of big data including both structured and unstructured.  Imagine the ability to analyze banking wire transfer data in the same graph with unstructured data that includes names, locations, and employers - intelligence that has been discovered through the semantic processing of unstructured data.   That's a powerful combination of sources linking data from the open web with transactional information. When done in real-time, this can be used in anti-money laundering, fraud prevention and homeland defense.
  • "Data scientists explicitly build semantic graph models as ontologies, taxonomies, thesauri, and topic maps using tools that implement standards such as the W3C-developed Resource Description Framework (RDF)."
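
The points above about combining structured and unstructured sources in one graph, and about hotspots standing out by their number of connections, can be sketched in Python as follows. The account identifiers, names and helper functions are hypothetical, and counting connections is only the simplest possible measure; commercial graph tools offer far richer analysis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def build_graph(*edge_sources: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Merge edges from several sources into one undirected graph."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for edges in edge_sources:
        for a, b in edges:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def hotspots(graph: Dict[str, Set[str]], min_connections: int) -> List[str]:
    """Return nodes with at least min_connections distinct neighbors."""
    return sorted(
        node for node, adjacent in graph.items()
        if len(adjacent) >= min_connections
    )


# Structured source: made-up wire transfers between accounts.
wire_transfers = [("acct_1", "acct_2"), ("acct_1", "acct_3"), ("acct_4", "acct_1")]
# Unstructured source: made-up facts extracted from text by semantic processing.
extracted_facts = [
    ("acct_1", "J. Doe"),
    ("J. Doe", "Shell Co Ltd"),
    ("acct_3", "Shell Co Ltd"),
]

graph = build_graph(wire_transfers, extracted_facts)
flagged = hotspots(graph, min_connections=3)
```

Here `acct_1` is flagged because it touches three other accounts and a named person, a link that only appears once both sources sit in the same graph.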


While this may be beyond the scope of many NoSQL and Hadoop databases, MarkLogic 7 is embracing triple stores as the company continues to innovate on its Enterprise NoSQL approach. No one else has values, triple store data derived from semantic processing and documents with real time indexing and search. The bar for Enterprise NoSQL is about to be raised again.


You can read more about this on Semantic Web:


Data Analysis and Unstructured Data: Expanding Business Intelligence (BI) by Thinking Outside of the Box

Tony Agresta's insight:

New forms of business intelligence incorporate both structured and unstructured data into your analysis.   Where does this apply today?  Customer service, intelligence analysis in government, fraud analysis in financial services, healthcare, consumer packaged goods, retail and other markets can benefit from this approach.  The open web provides organizations with limitless data containing valuable information on sentiment, people, events, employers, relationships and more.   The ability to extract meaning from unstructured sources combined with structured data yields new insights that can be used to improve decisions. 


Let's take a look at healthcare, for example.


In an article by Dennis Amorosano entitled "Unstructured data a common hurdle to achieving guidelines", Mr. Amorosano writes "... of the 1.2 billion clinical documents produced in the United States each year, approximately 60 percent contain valuable information trapped in unstructured documents that are unavailable for clinical use, quality measurement and data mining. These paper documents have until now been the natural byproduct of most hospital workflows, as healthcare is one of the most document-intensive industries."


Forbes published an article last year entitled "The Next Revolution in Healthcare" in which the author points out that the best healthcare institutions in the world still rely heavily on calculating risk to patients using clinical data.  At the same time "the real tragedy is that the information needed to properly assess the patient’s risk and determine treatment is available in the clinician’s notes, but without the proper tools the knowledge remains unavailable and hence, unused."


The good news is that new analytic solutions are available that leverage both forms of data.   BI connectivity brings the power of familiar business tools to your applications that include unstructured data. Some of the benefits of this approach include:


  • Combining BI and NoSQL provides capabilities not available using relational stores and EDWs - real-time analysis and extended query features.
  • BI tools layer on top of NoSQL databases that use sophisticated security models to protect sensitive data. Users see only the information for which they have permissions.
  • Analysts can learn faster using data discovery tools that allow for rapid investigation of both unstructured and structured data within the same application.  A more complete view of your analysis offers tremendous advantages in patient diagnosis, claims analysis and personalized care.


To learn more about how analytics technology is working with Enterprise NoSQL Databases ideally suited to ingest, store, search and analyze all types of data, you can visit this page:



The new reality for Business Intelligence and Big Data

You know about Big Data and its potential, how it creates greater understanding of our world, reduces waste and misuse of resources, and dramatically increases efficiency.
Tony Agresta's insight:

Data discovery tools allow you to reveal hidden insights in your data when you don't know what to look for in advance.  These highly interactive tools allow you to visualize disparate data in various forms - charts, timelines, graphs, geo-spatial and tables – and explore relationships in data to uncover patterns that static dashboards cannot.  


With the explosion of big data, organizations are now using these tools with structured, semi-structured and unstructured data.  This approach allows them to consolidate data without having to build complex schemas, search the data instantly, deliver new content products dynamically and analyze all of their data in real time.  A transformational shift in data analysis is underway allowing organizations to do this with documents, e-mails, video and other sources.   Imagine if you could load data into Hadoop, enrich it, ingest the data into an enterprise NoSQL database in real-time, index everything for instant search & discovery and analyze that data using Tableau or Cognos.   As the only Enterprise NoSQL Database on the market, MarkLogic allows you to do just that.


You can learn more here:


Rescooped by Tony Agresta from visual data!

[INFOGRAPHIC] BIG DATA: What Your IT Team Wants You To Know


The purpose of Big Data is to supply companies with actionable information on any variety of aspects. But this is proving to be far more difficult than it looks with over half of Big Data projects left uncompleted.

Among the most often reported reasons for project failures is a lack of expertise in data analysis. Reports show that data processing, management and analysis are all difficult in any phase of the project, with IT teams citing each of those reasons more than 40% of the time.

However, failures in Big Data projects may not lie solely with faulty project management. In a recent survey, a staggering 80% of Big Data’s biggest challenges are from a lack of appropriate talent. The field’s relative infancy is making it hard to find the necessary staff to see projects through, resulting in underutilized data and missed project goals.

IT teams are quickly recognizing a chasm between executives and frontline staffers whose job it is to apply findings from Big Data. In the end, it may not be the anticipated cure-all for 21st century business management. It is only as good as the system that runs it.

Via Peter Azzopardi, Berend de Jonge, Lauren Moss
Tony Agresta's insight:

Very interesting infographic.  Why do they fail?  For all of the reasons above and then some...    Over 80% of the data being collected today is unstructured and not readily stored in relational database technology burdened by complex extract, transform and load.  There's also pre-existing data, sometimes referred to as "dark data" that includes documents which need to be included and made discoverable for a host of reasons - compliance and regulatory issues are one.   Log activity and e-mail traffic used to detect cyber threats and mitigate risk through analysis of file transfers is yet another set of data that requires immediate attention.


Social and mobile are clearly channels that need to be addressed as organizations continue to mine data from the open web in support of CRM, product alerts, real time advertising options and more.  


To accomplish all of this, organizations need a platform with enterprise hardened technology that can ingest all of these forms of data in real time, without having to write complex schemas.   Getting back to the point - Why do most projects fail?   If companies attempt to do this with technology that is not reliable, not durable and does not leverage the skills of their existing development organization, the project will fail.


We have seen this time and time again.   MarkLogic to the rescue.   With over 350 customers and 500 big data applications, our Enterprise NoSQL approach mitigates the risk.  Why?  Our technology stack includes connectors to Hadoop, integration with leading analytics tools using SQL, Java and Rest APIs, JSON support, real time data ingestion, the ability to handle any form of data, alerting, in database analytics functions, high availability, replication, security and a lot more.  


When you match this technology with a world-class services organization with proven implementation skills, we can guarantee your next Big Data project will work.  We have done it hundreds of times with the largest companies in the world and very, very big data.

Olivier Vandelaer's curator insight, January 30, 2013 2:45 AM

Looking at the infographic, it clearly reminds me of the start of "Enterprise Data Warehouse": failures by "Inaccurate scope", "Technical Roadblocks" & "Siloed data and no collaboration". It looks so familiar.

Adrian Carr's curator insight, January 30, 2013 10:27 AM

This is a great infographic - it shows that whilst everyone is doing it (it being "Big Data" - whatever that is...), talent is rare, technology is hard to find and the projects never end.  A far cry from the speed with which companies such as the BBC deployed MarkLogic to serve all data for the sport websites through the Olympics.  Now that was big data, delivered by a talented team in a short space of time.