Big Data Technology, Semantics and Analytics
Trends, successes and applications for big data, including the use of semantic technology
Curated by Tony Agresta

Semantic Publishing for News & Media: Enhancing Content and Audience Engagement - Ontotext

Borislav Popov, head of Ontotext Media & Publishing, will show you how news & media publishers can use semantic publishing technology to more efficiently generate content while increasing audience engagement through personalisation and recommendations.
Tony Agresta's insight:

This webinar is recommended for anyone interested in how to apply core semantic technology to structured and unstructured data. In this webinar you will learn:


  • The importance of text analysis, entity extraction and semantic indexing - all directly linked to a graph database
  • The significance of training text mining algorithms to achieve accurate extraction and classification
  • The power of semantic recommendations - delivering highly relevant content using a blend of semantic analysis, reader profiles and past browsing history
  • How "Semantic Search" can be applied to isolate the most meaningful content


This webinar will show live demonstrations of semantic technology for news and media.  But if you are in government, financial services, healthcare, life sciences or education, I would still recommend the webinar.  The concepts are directly applicable and most of the technology can be adapted to meet your needs. 




SWJ: 5 years in, most cited papers | www.semantic-web-journal.net

Tony Agresta's insight:

The Semantic Web Journal was launched 5 years ago. There's a wealth of information here on semantic technologies.


Below is the abstract for the number two entry - GraphDB (formerly OWLIM). You can download the paper on the site.


"An explosion in the use of RDF, first as an annotation language and later as a data representation language, has driven the requirements for Web-scale server systems that can store and process huge quantities of data, and furthermore provide powerful data access and mining functionality. This paper describes OWLIM (now called GraphDB), a family of semantic repositories that provide storage, inference and novel data-access features delivered in a scalable, resilient, industrial-strength platform."


Sizing AWS instances for the Semantic Publishing Benchmark | LDBCouncil

Tony Agresta's insight:

This post is a bit technical, but I would encourage all readers to look it over, especially the conclusions section. The key takeaway is the ability of a graph database (GraphDB 6.1 in this case) to perform updates (inserts) at the same time queries are being run against the database. The results here are impressive.
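Conceptually, the benchmark exercises something like the sketch below - a minimal, hedged Python example (the endpoint URLs and repository name are placeholders for a local GraphDB-style installation following the RDF4J protocol) that fires inserts and queries at the same database concurrently:

```python
# Minimal sketch of concurrent writes and reads against a SPARQL endpoint.
# Assumptions: a local GraphDB-style repository named "test"; endpoint URLs
# follow the RDF4J protocol (query at /repositories/{id}, updates at
# /repositories/{id}/statements). Adjust for your installation.
import threading

import requests

QUERY_URL = "http://localhost:7200/repositories/test"
UPDATE_URL = "http://localhost:7200/repositories/test/statements"

def writer(n):
    # Insert n simple triples, one update request at a time.
    for i in range(n):
        update = f"""
            PREFIX ex: <http://example.org/>
            INSERT DATA {{ ex:doc{i} ex:title "Document {i}" }}
        """
        requests.post(UPDATE_URL, data={"update": update}).raise_for_status()

def reader(n):
    # Run n SELECT queries while the writer is inserting.
    query = "SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }"
    for _ in range(n):
        r = requests.get(QUERY_URL, params={"query": query},
                         headers={"Accept": "application/sparql-results+json"})
        r.raise_for_status()

threads = [threading.Thread(target=writer, args=(100,)),
           threading.Thread(target=reader, args=(100,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The benchmark itself measures far more (query mixes, update rates, instance sizing), but this is the essential concurrency pattern under test.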


Triplestores Rise In Popularity


Triplestores: An Introduction and Applications

Tony Agresta's insight:

Triplestores are gaining in popularity. This article does a nice job of describing what triplestores are and how they differ from graph databases. But there isn't much in the article on how triplestores are used. So here goes:

Some organizations are finding that when they apply a combination of semantic facts (triples) with other forms of unstructured and structured data, they can build extremely rich content applications. In some cases, content pages are constructed dynamically. These context-based applications deliver targeted, relevant results, creating a unique user experience. A single unified architecture that can store and search semantic facts, documents and values at the same time requires fewer IT and data processing resources, resulting in shorter time to market. Enterprise-grade technology provides security, replication, availability, role-based access and the assurance that no data is lost in the process. Real-time indexing provides instant results.

Other organizations are using triplestores and graph databases to visually show connections, which is useful in uncovering intelligence about data. Visualization tools connect easily to triplestores and NoSQL databases, allowing users to configure graphs that show how the data is connected. There's wide applicability for this, but common use cases include identifying fraud and money-laundering networks, counter-terrorism, social network analysis, sales performance, cyber security and IT asset management. The triples, documents and values provide the fuel for the visualization engine, allowing for comprehensive data discovery and faster business decisions.

Other organizations focus on semantic enrichment - extracting meaning from free-flowing text to identify triples - and then ingest the resulting semantic facts into triplestores to enhance the applications mentioned above.

Today, the growth in open data - pre-built collections of triples - is allowing organizations to integrate semantic facts to create richer content applications. There are hundreds of such sources containing tens of billions of triples, all free.
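As a quick illustration, a few lines of Python are enough to pull facts from one of these public sources (DBpedia's SPARQL endpoint is used here as one well-known example):

```python
# Query a public Linked Open Data SPARQL endpoint (DBpedia) for a handful
# of facts. No authentication or payment is required for endpoints like this.
import requests

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
LIMIT 5
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"})
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```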

What's most important about these approaches? Your organization can easily integrate all forms of data in a single unified architecture. The data drives smart websites, rich search applications and powerful approaches to data visualization. This is worth looking at more closely, since the end results are more customers, lower costs, greater insights and happier users.


Semantic Web vs. Semantic Technologies - Cambridge Semantics

The Semantic Web is one class of Semantic Technologies, but is closely related to others, such as NLP. This Semantic University Lesson explores those relationships.
Tony Agresta's insight:

A short summary defining semantic technologies that also includes a synopsis of what's meant when the term "semantic web" is used.


Operational Semantics - From Text Mining to Triplestores – The Full Semantic Circle

In the not too distant past, analysts were all searching for a “360 degree view” of their data.  Most of the time this phrase referred to integrated RDBMS data, analytics interfaces and customers. But with the onslaught…
Tony Agresta's insight:

Semantic pipelines allow for the identification, extraction, classification and storage of semantic knowledge, creating a knowledge base of all your data. Most organizations have struggled to create these pipelines, primarily because the plumbing hasn't existed. But now it does.


This post discusses how free-flowing text streams into graph databases using concept extraction processes. A well-coordinated feed of data is written to the underlying graph database, while updates are tracked on a continuous basis to ensure database integrity.


Other important pipeline plumbing includes tools for disambiguation (used to resolve the definition of entities inside the text), classification of the entities, structuring relationships between entities and determining sentiment.


Organizations that deploy well-functioning semantic pipelines have an advantage over their competitors: they have instant access to a complete knowledge base of their data. Research functions spend less time searching and more time analyzing. Alerting notifies critical business functions to take immediate action. Service levels improve thanks to accurate, well-structured responses. Sentiment is detected, allowing more time to react to changing market conditions.



In general, the REST Client API calls out to a GATE-based annotation pipeline and sends back enriched data in RDF form. Organizations typically customize these pipelines, which can consist of any GATE-developed set of text mining algorithms for scoring, machine learning, disambiguation or a wide range of other text mining techniques.
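A client call to such an annotation service might look something like the sketch below. This is an illustrative assumption, not Ontotext's documented API - the endpoint path, payload fields and pipeline name are all placeholders:

```python
# Hypothetical client call to a GATE-based annotation pipeline that returns
# enriched data as RDF. The URL, payload fields and pipeline name are
# assumptions for illustration; consult the real service documentation.
import requests

ANNOTATE_URL = "http://localhost:8080/annotate"   # placeholder endpoint

text = "Ontotext released a new version of GraphDB."
resp = requests.post(
    ANNOTATE_URL,
    json={"document": text, "pipeline": "news-default"},  # assumed fields
    headers={"Accept": "text/turtle"},                    # ask for RDF back
)
resp.raise_for_status()
print(resp.text)  # RDF (Turtle) describing the entities found in the text
```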

It is important to note that these text mining pipelines create RDF in a linear fashion and feed GraphDB™. Once the RDF is enriched in this fashion and stored in the database, these annotations can then be modified, edited or removed. This is particularly useful when integrating with Linked Open Data (LOD) sources. Updates to the database are populated automatically when the source information changes.

For example, let’s say your text mining pipeline is referencing Freebase as its Linked Open Data source for organization names. If an organization name changes or a new subsidiary is announced in Freebase, this information will be updated as reference-able metadata in GraphDB™.

In addition, this tightly-coupled integration includes a suite of enterprise-grade APIs, the core of which is the Concept Extraction API. This API consists of a Coordinator and Entity Update Feed. Here’s what they do:

  • The Concept Extraction API Coordinator module accepts annotation requests and dispatches them to a group of Concept Extraction Workers. The Coordinator communicates with GraphDB™ in order to track changes leading to updates in each worker's entity extractor. The API Coordinator acts as a traffic cop, allowing approved, unique entities to be inserted into GraphDB™ while preventing duplicates from taking up valuable real estate.
  • The Entity Update Feed (EUF) plugin is responsible for tracking and reporting on updates to every entity (concept) within the database that has been modified in any way (added, removed or edited). This information is stored in the graph database and is query-able via SPARQL, so reports can be run notifying a user of any and all changes (see the sketch after this list).
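Because the update feed is query-able via SPARQL, a change report can be a simple query. The sketch below invents its vocabulary - the euf: properties and repository name are assumptions; the real schema is defined by the plugin:

```python
# Hypothetical SPARQL report over the Entity Update Feed. The property names
# (euf:changedEntity, euf:changeDate) and the repository are illustrative
# assumptions; the actual vocabulary is defined by the plugin.
import requests

query = """
PREFIX euf: <http://example.org/entity-update-feed#>
SELECT ?entity ?date WHERE {
  ?update euf:changedEntity ?entity ;
          euf:changeDate ?date .
}
ORDER BY DESC(?date)
LIMIT 20
"""
r = requests.get("http://localhost:7200/repositories/news",   # placeholder
                 params={"query": query},
                 headers={"Accept": "application/sparql-results+json"})
r.raise_for_status()
for b in r.json()["results"]["bindings"]:
    print(b["date"]["value"], b["entity"]["value"])
```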

 

Other APIs include Document Classification, Disambiguation, Machine Learning, Sentiment Analysis & Relation Extraction. Together, this complete set of technology allows for tight integration and accurate processing of text while efficiently storing resulting RDF statements in GraphDB™.

As mentioned, the value of this tightly-coupled integration is in the rich metadata and relationships which can now be derived from the underlying RDF database. It's this metadata that powers high-performance search, discovery and website applications – results are complete, accurate and instantaneous.

See more at: http://www.ontotext.com/text-mining-triplestores-full-semantic-circle/


How Graph Analytics Can Connect You To What's Next In Big Data

Everything in our digital universe is connected. Every single day, you wake up and start a series of interactions with people, products and machines. Sometimes, these things influence you, and sometimes you play the role of the influencer. This is how our world is connected, in a network of relationships [...]
Tony Agresta's insight:

This recent article by Scott Gnau of Teradata does a great job of discussing graph analytics. For example, Scott writes:


"The ability to track relationships between people, products, processes and other 'entities' remains crucial to breaking up sophisticated fraud rings."


He also notes that graph analytics:


"Allow companies to detect, in near real-time, the cyber-threats hidden in the flood of diverse data generated from IP, network, server and communication logs – a huge problem, as we know, that exists today."


But what's powering the analysis?  Where is the data stored that drives the visual display of the graph?  Today, graph databases are becoming more and more popular.  One type of graph database is the native RDF triplestore.


Triplestores store semantic facts in the form of subject - predicate - object using the Resource Description Framework (RDF). These facts might be created using natural language processing pipelines or imported from Linked Open Data. In either case, RDF is a standard model for data publishing and data interchange on the Web, with features that facilitate data merging even when the underlying schemas differ. RDFS and OWL are its schema languages; SPARQL is its query language, similar in role to SQL.


RDF specifically supports the evolution of schemas over time without requiring data transformations or reloading of data. A central concept is the Uniform Resource Identifier (URI). These are globally unique identifiers, the most popular variant of which are the widely used URLs. All data elements (objects, entities, concepts, relationships, attributes) are identified with URIs, allowing data from different sources to merge without collisions. All data is represented in triples (also referred to as "statements"), which are simple enough to allow for easy, correct transformation of data from any other representation without loss. At the same time, triples form interconnected data networks (graphs), which are sufficiently expressive and efficient to represent even the most complex data structures.
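A small sketch using the Python rdflib library makes the merging point concrete: because both "sources" below identify Alice with the same URI, their triples combine into one graph without collisions (the ex: URIs are invented for illustration):

```python
# Demonstrates how shared URIs let data from different sources merge without
# collisions. Uses the rdflib library (pip install rdflib).
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")

source_a = Graph()
source_a.add((EX.alice, RDF.type, EX.Person))
source_a.add((EX.alice, EX.worksFor, EX.AcmeCorp))

source_b = Graph()
source_b.add((EX.alice, EX.email, Literal("alice@example.org")))

merged = source_a + source_b   # union of triples; ex:alice stays one node
for s, p, o in merged:
    print(s, p, o)
```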


When Scott talks about tracking "entities", one way to do this is with graph visualization (graph analytics) that sits on top of a native RDF triplestore. The data in the triplestore contains the relationships between the entities, which can be displayed as relationship graphs (a form of data visualization). Graphs can get very complex very fast, and triplestores can hold billions of RDF statements, all of which are theoretically eligible for display in the visual graph. Users therefore apply link analysis techniques to filter the data, configure the graph, change the size of the entities (nodes) and the links (edges), search the graph space, and interact with charts, tables, timelines and geospatial views.


SPARQL, the powerful query language used with triplestores, is more than adequate for creating subsets of RDF statements, which can then be stored in smaller, more nimble triplestores that make graph analysis easier.
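For example, a SPARQL CONSTRUCT query can carve out just the neighborhood of a single entity as a smaller RDF subgraph for the visualization layer. The endpoint, repository name and URIs below are placeholders:

```python
# Carve a small, visualization-friendly subgraph out of a large triplestore
# with a SPARQL CONSTRUCT query. Endpoint and URIs are placeholders.
import requests

query = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ex:acmeCorp ?p ?o . ?s ?p2 ex:acmeCorp . }
WHERE {
  { ex:acmeCorp ?p ?o }
  UNION
  { ?s ?p2 ex:acmeCorp }
}
"""
r = requests.get("http://localhost:7200/repositories/fraud",   # placeholder
                 params={"query": query},
                 headers={"Accept": "text/turtle"})
r.raise_for_status()
print(r.text)   # a small RDF subgraph, ready to load into a graph viewer
```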


So, while graph analysis is getting hot, the power is in the blend of the visual layer and the underlying RDF triplestore. It's also fair to say that creating the triples by analyzing free-flowing text is an important part of this solution.


To read more about native triplestores, graph visualization and text mining, get the  white paper published by Ontotext called "The Truth About Triplestores" which outlines all of this and goes deeper into text mining and other semantic technology.


To learn more about graph visualization, you can go to Cambridge Intelligence or Centrifuge Systems.



Understanding The Various Sources of Big Data – Infographic

Big data is everywhere and it can help organisations in any industry in many different ways.
Tony Agresta's insight:

If you have not had the chance to review some of the free sources of big data that can enhance your content applications, take a look at the Linked Open Data Graph. It's updated daily, and you can learn more by searching for the CKAN API. This graph represents tens of billions of semantic facts about a diverse set of topics. These facts have been used to enhance many content-driven web sites, allowing users to learn more about music, geography, populations and much more.
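For instance, any CKAN-backed catalogue exposes the same action API, so a dataset search takes only a few lines of Python. The catalogue URL below is an assumption - point it at whichever CKAN instance you use:

```python
# Search a CKAN-backed open data catalogue for RDF datasets. The catalogue
# URL is an assumption; any CKAN instance exposes the same action API.
import requests

CKAN_URL = "https://datahub.io/api/3/action/package_search"  # placeholder site

resp = requests.get(CKAN_URL, params={"q": "rdf", "rows": 5})
resp.raise_for_status()
for dataset in resp.json()["result"]["results"]:
    print(dataset["name"], "-", (dataset.get("notes") or "")[:60])
```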

Henry Pan's curator insight, November 9, 2013 11:19 AM

Too many data sources...


Semantic Technologies in MarkLogic - World Class Triple Store in Version 7

Tony Agresta's insight:

This video is a fantastic overview from MarkLogic's Stephen Buxton, John Snelson and Micah Dubinko covering semantic processing, use cases for triplestores that include richer search and graph applications, and the expanded architecture in MarkLogic 7. It's an hour in length but well worth the time if you're interested in understanding how you can use documents, facts derived from text, and values to build groundbreaking applications. Databases as we know them will change forever with the convergence of Enterprise NoSQL, search and semantic processing. This video provides you with the foundation to understand this important change in database technology.


RDF 101 - Cambridge Semantics

Semantic University tutorial and introduction to RDF.
Tony Agresta's insight:

This post is a bit technical but recommended reading, since it covers one of the most important problems in data science today: the ability to discern meaning from unstructured data.


To truly appreciate the significance of this technology, consider the following questions -  Since over 80% of the data being created today is unstructured in forms that include text, videos, images, documents, and more, how can organizations interpret the meaning behind this data?   How can they pull out the facts in the data and show relationships between those facts leading to new insights?

 

This post provides you with the foundation on how semantic processing works.   Cutting to the chase, the technologies are referred to as RDF (Resource Description Framework), SPARQL and OWL. They allow us to create underlying data models that understand relationships in unstructured data, even across web sites, document repositories and disparate applications like Facebook and LinkedIn.

 

These data models store properties extracted from the unstructured data. Consider the following sentence: "The monkeys are destroying Tom's garden." Semantic processing of this text would deconstruct the sentence, identifying the subject, object and predicate while also building relationships between the three. The subject is "monkeys" and they are taking an action on "the garden". The garden is therefore the object, and the predicate is "destroying".
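Expressed as an RDF triple, that deconstruction looks like the following sketch (the URIs are invented for illustration; rdflib is used for serialization):

```python
# The sentence "The monkeys are destroying Tom's garden" as a single RDF
# triple: subject ex:monkeys, predicate ex:destroying, object ex:tomsGarden.
# URIs are invented for illustration; uses rdflib (pip install rdflib).
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.monkeys, EX.destroying, EX.tomsGarden))
print(g.serialize(format="turtle"))
```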

 

Most importantly, there is a connection made between the monkeys and the garden allowing us to show relationships between specific facts pulled from text.   How can this help us? 

 

Assume for a second you're working for a government agency tracking a suspicious person who is on a watch list. Crawling the web for that person's name is one way to identify additional information about them; technology to do this exists today. When the name is detected, identifying relationships between the person being investigated and other subjects (or objects) in the text can lead you to new people who may also be of interest. For example, if Sam is on the watch list, a sentence like this would be of interest: "Sam works with Steve at ABC Home Builders Corp." A relationship between the suspect (Sam) and someone new to the investigation (Steve) can be identified. The fact that they both work for the same employer allows analysts to connect the two subjects through that employer.
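Once facts like these are stored as triples, the connection can be surfaced with a query. A hedged sketch, with all URIs and properties invented for the example: find everyone who shares an employer with a watch-list subject.

```python
# Hypothetical SPARQL query: find people who share an employer with anyone
# on a watch list. All URIs, properties and the endpoint are invented
# for illustration.
import requests

query = """
PREFIX ex: <http://example.org/>
SELECT ?suspect ?contact ?employer WHERE {
  ?suspect ex:onWatchList true ;
           ex:worksFor ?employer .
  ?contact ex:worksFor ?employer .
  FILTER (?contact != ?suspect)
}
"""
r = requests.get("http://localhost:7200/repositories/investigations",  # placeholder
                 params={"query": query},
                 headers={"Accept": "application/sparql-results+json"})
r.raise_for_status()
for b in r.json()["results"]["bindings"]:
    print(b["suspect"]["value"], "->", b["contact"]["value"],
          "via", b["employer"]["value"])
```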

 

These interesting facts help investigators make connections within e-mail, phone conversations, in-house data and other sources, all of which can be displayed visually in a graph to show the subjects and how they are linked.

 

Data models to store, search and analyze this data will become one of the primary tools to interpret massive amounts of data being collected today.  This technology allows computers to understand relationships in unstructured data and display those relationships to analysts in the form of visual diagrams that clearly show connections to other data including phone calls, events, accounts, and more.  The implications of this extend far beyond counter terrorism to include social networking, marketing, fraud, cyber security and sales to name a few.


We are at an inflection point in big data – data stored in silos can now be consolidated with external data from the open web. Most importantly, the unstructured data can be interpreted as we form connections that are integral to understanding how things are related to each other. Data visualization technology is the vehicle to display those connections, allowing analysts to explore any form of data in a single application.


Learn more about this technology and other advances in Enterprise NoSQL here.

