Semantic University tutorial and introduction to RDF.
Scooped by Tony Agresta
This post is a bit technical, but it is recommended reading since it covers one of the most important aspects of data science today: the ability to discern meaning from unstructured data.
To truly appreciate the significance of this technology, consider the following questions. Since over 80% of the data being created today is unstructured, in forms that include text, videos, images, documents, and more, how can organizations interpret the meaning behind this data? How can they pull out the facts in the data and show relationships between those facts, leading to new insights?
This post provides you with a foundation in how semantic processing works. Cutting to the chase, the technologies are referred to as RDF (Resource Description Framework), SPARQL (the query language for RDF data) and OWL (Web Ontology Language). They allow us to create underlying data models that capture relationships in unstructured data, even across web sites, document repositories and disparate applications like Facebook and LinkedIn.
These data models store data that has properties extracted from the unstructured data. Consider the following sentence: "The Monkeys are destroying Tom's garden." Semantic processing of this text would deconstruct the sentence, identifying the subject, predicate and object while also building relationships between the three. The subject is "monkeys" and they are taking an action on "the garden". The garden is therefore the object and the predicate is "destroying". Together, these three parts form what RDF calls a triple.
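The subject, predicate and object from the sentence above can be sketched as data. This is a minimal illustration in plain Python rather than a real RDF library: the `Triple` type, the `match` helper and the `ownedBy` predicate (inferred from "Tom's garden") are all assumptions made for the example, and the triples are written by hand instead of being extracted by a language parser.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One fact: a subject, the predicate linking it, and an object."""
    subject: str
    predicate: str
    obj: str


# Hand-written facts for "The Monkeys are destroying Tom's garden."
# A real system would extract these from the text automatically.
triples = [
    Triple("monkeys", "destroying", "garden"),
    Triple("garden", "ownedBy", "Tom"),  # hypothetical predicate name
]


def match(
    facts: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the facts that fit a pattern; None acts as a wildcard."""
    return [
        t
        for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# What is being destroyed, and by whom?
for fact in match(triples, predicate="destroying"):
    print(fact.subject, "->", fact.obj)
```

The wildcard pattern in `match` loosely mirrors how a SPARQL query leaves parts of a triple as variables, though real SPARQL offers far more than this sketch.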
Most importantly, a connection is made between the monkeys and the garden, allowing us to show relationships between specific facts pulled from text. How can this help us?
Assume for a second you're working for a government agency tracking a suspicious person who is on a watch list. Crawling the web looking for that person's name is one way to identify additional information about the person. Technology to do this exists today. When the name is detected, identifying relationships between the person being investigated and other subjects (or objects in the text) can lead you to new people who may also be of interest. For example, if Sam is on the watch list, a sentence like this would be of interest: "Sam works with Steve at ABC Home Builders Corp." Relationships between the suspect (Sam) and someone new in the investigation (Steve) could be identified. The fact that they both work for the same employer allows analysts to connect the subjects through this employer.
These interesting facts help investigators make connections within e-mail, phone conversations, in-house data and other sources, all of which can be displayed visually in a graph to show the subjects and how they are linked.
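The Sam and Steve example can be sketched the same way: once facts are stored as triples, finding people linked through a shared employer is a small lookup. This is a hedged illustration only. The `worksFor` predicate and the `linked_through` function are hypothetical names chosen for the example, and the facts are typed in by hand rather than extracted from crawled text.

```python
from collections import defaultdict

# Hand-written facts derived from:
# "Sam works with Steve at ABC Home Builders Corp."
facts = [
    ("Sam", "worksFor", "ABC Home Builders Corp."),
    ("Steve", "worksFor", "ABC Home Builders Corp."),
]


def linked_through(facts, person):
    """Map each other subject to the (predicate, object) pairs shared with person."""
    # Index every (predicate, object) pair by the subjects that assert it.
    index = defaultdict(set)
    for subject, predicate, obj in facts:
        index[(predicate, obj)].add(subject)

    # Any other subject sharing a pair with `person` is a connection.
    links = defaultdict(list)
    for pair, subjects in index.items():
        if person in subjects:
            for other in sorted(subjects - {person}):
                links[other].append(pair)
    return dict(links)


print(linked_through(facts, "Sam"))
```

Each entry in the result is an edge an analyst could draw in a graph view: here, Steve is tied to Sam through the shared employer node.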
Data models to store, search and analyze this data will become one of the primary tools for interpreting the massive amounts of data being collected today. This technology allows computers to identify relationships in unstructured data and display those relationships to analysts in the form of visual diagrams that clearly show connections to other data including phone calls, events, accounts, and more. The implications of this extend far beyond counterterrorism to include social networking, marketing, fraud, cybersecurity and sales, to name a few.
We are at an inflexion point in big data: data stored in silos can now be consolidated with external data from the open web. Most importantly, the unstructured data can be interpreted as we form connections that are integral to understanding how things are related to each other. Data visualization technology is the vehicle to display those connections, allowing analysts to explore any form of data in a single application.