Big Data Technology, Semantics and Analytics
Trends, success and applications for big data including the use of semantic technology
Curated by Tony Agresta
Scooped by Tony Agresta

The Hype Around Graph Databases And Why It Matters

Organizations are struggling with a fundamental challenge – there’s far more data than they can handle.  Sure, there’s a shared vision to analyze structured and unstructured data in support of better decision making but is this a reality for most companies?  The big data tidal wave is transforming the database [...]
Tony Agresta's insight:

This recently published article explains graph databases and why they matter. It gives examples of how graph databases power the search engines we all use, and describes some interesting features that integrate the results of text analysis into the graph database.


Operational Semantics - From Text Mining to Triplestores – The Full Semantic Circle

In the not too distant past, analysts were all searching for a “360 degree view” of their data.  Most of the time this phrase referred to integrated RDBMS data, analytics interfaces and customers. But with the onslaught…
Tony Agresta's insight:

Semantic pipelines allow for the identification, extraction, classification and storage of semantic knowledge, creating a knowledge base of all your data. Most organizations have struggled to create these pipelines, primarily because the plumbing hasn't existed. But now it does.


This post discusses how free-flowing text streams into graph databases using concept extraction processes. A well-coordinated feed of data is written to the underlying graph database, while updates are tracked continuously to ensure database integrity.


Other important pipeline plumbing includes tools for disambiguation (used to resolve the definition of entities inside the text), classification of the entities, structuring relationships between entities and determining sentiment.


Organizations that deploy well-functioning semantic pipelines have an advantage over their competitors: instant access to a complete knowledge base of their data. Research functions spend less time searching and more time analyzing. Alerting notifies critical business functions to take immediate action. Service levels improve through accurate, well-structured responses. Sentiment is detected, allowing more time to react to changing market conditions.



In general, the REST Client API calls out to a GATE-based annotation pipeline and sends back enriched data in RDF form. Organizations typically customize these pipelines, which can consist of any GATE-developed set of text mining algorithms for scoring, machine learning, disambiguation or any of a wide range of other text mining techniques.
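To make the flow concrete, here is a minimal sketch of the kind of request a REST client might assemble when submitting text to an annotation pipeline and asking for RDF back. The endpoint URL, payload fields and response format below are all invented for illustration; the actual Ontotext API will differ.

```python
import json
from urllib import request

# Hypothetical endpoint -- a real deployment would use the service's URL.
ANNOTATE_URL = "https://example.org/annotate"

def build_annotation_request(text, mime_type="text/plain"):
    """Build (but do not send) a request asking a text mining pipeline
    to annotate `text` and return the enrichment as RDF."""
    payload = {
        "document": text,                        # the raw text to enrich
        "documentType": mime_type,
        "responseFormat": "application/rdf+xml", # ask for RDF back
    }
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        ANNOTATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_annotation_request("Ontotext released GraphDB 6.1.")
print(req.get_method())  # POST
```

The point of the sketch is only the shape of the exchange: plain text in, RDF-formatted annotations out.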

It is important to note that these text mining pipelines create RDF in a linear fashion and feed GraphDB™. Once the RDF has been enriched in this fashion and stored in the database, the annotations can be modified or removed. This is particularly useful when integrating with Linked Open Data (LOD) sources: updates to the database are applied automatically when the source information changes.

For example, let’s say your text mining pipeline is referencing Freebase as its Linked Open Data source for organization names. If an organization name changes or a new subsidiary is announced in Freebase, this information will be updated as reference-able metadata in GraphDB™.
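The Freebase scenario above amounts to a label change in the database. As a purely illustrative sketch (the URIs and predicate are invented, and in practice the synchronization is performed automatically by the tooling rather than hand-written), a SPARQL Update for such a rename could be generated like this:

```python
# Template for a SPARQL Update that swaps an organization's label.
# All names here are invented for the sketch.
UPDATE_TEMPLATE = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DELETE {{ <{org}> rdfs:label "{old}" }}
INSERT {{ <{org}> rdfs:label "{new}" }}
WHERE  {{ <{org}> rdfs:label "{old}" }}
"""

def rename_update(org_uri, old_label, new_label):
    """Return a SPARQL Update string replacing an organization's label."""
    return UPDATE_TEMPLATE.format(org=org_uri, old=old_label, new=new_label)

print(rename_update("http://example.org/org/AcmeCorp",
                    "Acme Corp", "Acme Corporation"))
```

Applied against a triplestore's update endpoint, a statement like this is how a single renamed entity propagates into the graph.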

In addition, this tightly-coupled integration includes a suite of enterprise-grade APIs, the core of which is the Concept Extraction API. This API consists of a Coordinator and Entity Update Feed. Here’s what they do:

  • The Concept Extraction API Coordinator module accepts annotation requests and dispatches them to a group of Concept Extraction Workers. The Coordinator communicates with GraphDB™ to track changes, leading to updates in each worker's entity extractor. The Coordinator acts as a traffic cop, allowing approved, unique entities to be inserted into GraphDB™ while preventing duplicates from taking up valuable real estate.
  • The Entity Update Feed (EUF) plugin tracks and reports on every entity (concept) in the database that has been modified in any way (added, removed or edited). This information is stored in the graph database and is queryable via SPARQL, so reports can be run notifying a user of any and all changes.
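As a conceptual illustration only (this is not the plugin's actual API), the bookkeeping an entity update feed performs can be modeled as a small change log, with one history per entity and a report of everything that changed:

```python
from collections import defaultdict

class EntityUpdateLog:
    """Toy stand-in for an entity update feed: records every change
    (added / edited / removed) per entity so it can be reported later."""
    def __init__(self):
        self._events = defaultdict(list)

    def record(self, entity, action):
        assert action in ("added", "edited", "removed")
        self._events[entity].append(action)

    def changed_entities(self):
        """All entities touched in any way, like a change report."""
        return sorted(self._events)

    def history(self, entity):
        return list(self._events[entity])

log = EntityUpdateLog()
log.record("ex:AcmeCorp", "added")
log.record("ex:AcmeCorp", "edited")
log.record("ex:OldOrg", "removed")
print(log.changed_entities())  # ['ex:AcmeCorp', 'ex:OldOrg']
```

In the real system this information lives in the graph itself and is queried with SPARQL rather than method calls.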

 

Other APIs include Document Classification, Disambiguation, Machine Learning, Sentiment Analysis & Relation Extraction. Together, this complete set of technology allows for tight integration and accurate processing of text while efficiently storing resulting RDF statements in GraphDB™.

As mentioned, the value of this tightly-coupled integration is in the rich metadata and relationships that can now be derived from the underlying RDF database. It's this metadata that powers high-performance search and discovery and website applications: results are complete, accurate and instantaneous.

See more at: http://www.ontotext.com/text-mining-triplestores-full-semantic-circle/


Diseasome | Map: explore the human disease network. Dataset, interactive map and printable poster of gene-disease relationships.

Diseasome, explore the human disease-gene network map. Printable posters and datasets available.
Tony Agresta's insight:

Here's a graph that shows how human diseases are connected. In this graph, nodes are diseases.  The links show how they are connected to one another.  The larger the node, the more diseases it is connected to. 


You can also see the genes associated with the diseases which indicate the common genetic origin of the diseases. 


This graph allows you to filter diseases by category. To search for a specific category, use the "hide all" filter and then select the disease category of interest. The matching diseases are highlighted on the graph; you can then zoom in to see related diseases and gene associations.


How Graph Analytics Can Connect You To What's Next In Big Data

Everything in our digital universe is connected. Every single day, you wake up and start a series of interactions with people, products and machines. Sometimes, these things influence you, and sometimes you play the role of the influencer. This is how our world is connected, in a network of relationships [...]
Tony Agresta's insight:

This recent article by Scott Gnau of Teradata does a great job of discussing graph analytics. For example, Scott writes:


"The ability to track relationships between people, products, processes and other 'entities' remains crucial to breaking up sophisticated fraud rings."


He also talks about the fact that graph analytics:


"Allow companies to detect, in near real-time, the cyber-threats hidden in the flood of diverse data generated from IP, network, server and communication logs – a huge problem, as we know, that exists today."


But what's powering the analysis?  Where is the data stored that drives the visual display of the graph?  Today, graph databases are becoming more and more popular.  One type of graph database is the native RDF triplestore.


Triplestores store semantic facts in the form of subject-predicate-object using the Resource Description Framework (RDF). These facts might be created using natural language processing pipelines or imported from Linked Open Data. In either case, RDF is a standard model for data publishing and data interchange on the Web, with features that facilitate data merging even when the underlying schemas differ. RDFS and OWL are its schema languages; SPARQL is its query language, similar to SQL.


RDF specifically supports the evolution of schemas over time without requiring data transformations or reloading. A central concept is the Uniform Resource Identifier (URI). URIs are globally unique identifiers, the most popular variant of which is the widely used URL. All data elements (objects, entities, concepts, relationships, attributes) are identified with URIs, allowing data from different sources to merge without collisions. All data is represented in triples (also referred to as "statements"), which are simple enough to allow easy, correct transformation of data from any other representation without loss. At the same time, triples form interconnected data networks (graphs) that are expressive and efficient enough to represent even the most complex data structures.
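A minimal illustration of the no-collision merge property, using invented namespace URIs and plain Python tuples in place of a real RDF store:

```python
# Two sources describe the same company using the same URI. Merging the
# triple sets (sets of (subject, predicate, object) tuples) dedupes the
# identical statement and never collides, because the URIs are global.
DBP = "http://dbpedia.org/resource/"
EX = "http://example.org/ontology/"  # invented vocabulary for the sketch

source_a = {
    (DBP + "Ontotext", EX + "type", EX + "Company"),
    (DBP + "Ontotext", EX + "locatedIn", DBP + "Bulgaria"),
}
source_b = {
    (DBP + "Ontotext", EX + "type", EX + "Company"),  # same fact again
    (DBP + "Ontotext", EX + "makes", DBP + "GraphDB"),
}

merged = source_a | source_b  # plain set union: no schema mapping needed
print(len(merged))  # 3 distinct statements, not 4
```

Because both sources use the same URI for the same thing, the duplicate statement collapses and the distinct ones simply coexist in one graph.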


When Scott talks about tracking "entities", one way to do this is with graph visualization (graph analytics) that sits on top of a native RDF triplestore. The data in the triplestore contains the relationships between the entities, which can be displayed using relationship graphs (a form of data visualization). Graphs can get very complex very fast, and triplestores can hold billions of RDF statements, all of which are theoretically eligible to be displayed in the visual graph. Users therefore apply link analysis techniques to filter the data, configure the graph, change the size of the entities (nodes) and links (edges), search the graph space and interact with charts, tables, timelines and geospatial views.
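One common link analysis reduction is degree-based filtering: compute how many connections each node has and keep (or enlarge) only the well-connected ones. A toy sketch with invented fraud-ring data:

```python
from collections import Counter

# Invented edge list: pairs of connected entities.
edges = [
    ("fraud_ring", "account_1"), ("fraud_ring", "account_2"),
    ("fraud_ring", "phone_A"), ("account_1", "phone_A"),
    ("account_2", "address_X"),
]

# Degree = number of edges touching each node.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Keep only nodes connected to at least 2 others -- the kind of filter
# that shrinks a huge graph down to the entities worth drawing.
hubs = {node for node, d in degree.items() if d >= 2}
print(sorted(hubs))
```

The same degree values are what a visualization front end would use to scale node sizes.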


SPARQL, the powerful query language used with triplestores, is more than adequate for creating subsets of RDF statements, which can then be stored in smaller, more nimble triplestores that make graph analysis easier.
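Conceptually, such a subsetting query is a pattern match that emits the triples fitting a template. A toy in-memory version of that idea, with invented example triples and None acting as a wildcard:

```python
def match(triples, s=None, p=None, o=None):
    """Tiny stand-in for a SPARQL-style pattern match: return the subset
    of triples matching (s, p, o), where None is a wildcard."""
    return {
        (ts, tp, to) for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }

# Invented mini-graph.
triples = {
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Bob", "ex:worksFor", "ex:Acme"),
    ("ex:Acme", "ex:locatedIn", "ex:Sofia"),
}

# "Everything about employment" -- a subset small enough to analyze easily.
subset = match(triples, p="ex:worksFor")
print(sorted(subset))
```

A real SPARQL CONSTRUCT does this at scale and can ship the resulting statements into a separate, smaller store.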


So, while graph analysis is getting hot, the power is in the blend of the visual layer and the underlying RDF triplestore. It's also fair to say that creating the triples by analyzing free-flowing text is an important part of this solution.


To read more about native triplestores, graph visualization and text mining, get the white paper published by Ontotext called "The Truth About Triplestores", which outlines all of this and goes deeper into text mining and other semantic technology.


To learn more about graph visualization, you can go to Cambridge Intelligence or Centrifuge Systems.



Ontotext Releases GraphDB™ 6.1 RDF Triplestore - Ontotext

GraphDB 6.1 is the latest version of Ontotext's flagship RDF Triplestore product.
Tony Agresta's insight:

GraphDB 6.1, a native triplestore, is now available from Ontotext. You can get a free copy of the Lite, Standard or Enterprise edition here: http://www.ontotext.com/products/ontotext-graphdb/ It's worth trying, especially since it comes with the Knowledge Path Series, which guides you through the entire evaluation: http://www.ontotext.com/graphdb-knowledge-path-series/


Semantic Biomedical Tagger - S4 - Ontotext Wiki

Tony Agresta's insight:

Take a look at the types of entities the Semantic Biomedical Tagger (SBT) can identify in complex text. The tagger has a built-in capability to recognize 133 biomedical entity types and semantically link them to a knowledge base system, in this case Linked Life Data (LLD). The SBT can load entity names from the LLD service or any other RDF database with a SPARQL endpoint.


What does this mean for you? You can analyze free-flowing text that contains complex biomedical terms. Ontotext can analyze the text, identify entities and match those entities to our Linked Life Data service, enriching the terms identified by the Biomedical Tagger. Entity names can then be loaded into GraphDB (an RDF database) or any other RDF database in support of search and discovery or analytics applications. Your documents become discoverable at the entity level, allowing analysts and researchers to find precisely what they are looking for, instantly.
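At its conceptual core, entity-level tagging maps surface forms found in text to knowledge base URIs. A heavily simplified gazetteer sketch (the terms and URIs below are invented; the real SBT is far more sophisticated, with disambiguation and 133 entity types):

```python
# Toy gazetteer: surface form -> invented knowledge base URI.
GAZETTEER = {
    "aspirin": "lld:chem/aspirin",
    "myocardial infarction": "lld:disease/mi",
    "cox-1": "lld:protein/cox1",
}

def tag(text):
    """Return (surface form, URI) pairs for gazetteer terms in `text`."""
    lowered = text.lower()
    return [(term, uri) for term, uri in GAZETTEER.items() if term in lowered]

hits = tag("Aspirin inhibits COX-1 and lowers myocardial infarction risk.")
print(hits)
```

Each hit is exactly the kind of entity-level handle that makes a document findable by concept rather than by keyword alone.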



Ontotext Announces Strategic Hires for Ontotext USA - Ontotext

Strategic hires for Ontotext USA indicate Ontotext's expansion in the North American marketplace.
Tony Agresta's insight:

Ontotext has long had a presence in North America but recently expanded operations for a number of reasons: support for the growing install base in this region, expansion into key US markets and the building out of alliances. Success in EMEA and wide adoption of Ontotext technology have driven this growth. Recently, Ontotext released version 6.0 of its native RDF triplestore, GraphDB. GraphDB is widely regarded as the most powerful RDF triplestore in the industry, with support for inferencing, optimized data integration through owl:sameAs, an enterprise replication cluster, connectors to Lucene, Solr and Elasticsearch, query optimization, SPARQL 1.1 support, RDF Rank to order query results by relevance or other measures, simultaneous high-performance loading, querying and inference, and much more.


A free version of the Lite edition has been available for quite some time, but Ontotext recently also made the Standard and Enterprise versions available for testing (http://www.ontotext.com/products/ontotext-graphdb/).


Organizations have gravitated toward Ontotext over other NoSQL vendors and pure triplestore players because of the broad portfolio of semantic technology Ontotext provides beyond GraphDB, including natural language processing, semantic enrichment, semantic data integration, and curation and authoring tools. Ontotext's experience working with Linked Open Data sets extends back to the beginning of the LOD movement. Blended with GraphDB, these tools and technologies offer a powerful combination of semantic technologies delivered by a single vendor, lowering maintenance costs, shortening time to delivery and providing proven deployment options.


Ontotext Delivers Semantic Publishing Solutions to the World’s Largest Media & Publishing Companies

Washington DC (PRWEB) August 27, 2014 -- Ontotext Media & Publishing delivers semantic publishing solutions to the world’s largest media and publishing companies including automated content enrichment, data management, content and user analytics and natural language processing. Recently, Ontotext Media and Publishing has been enhanced to include contextually-aware reading recommendations based on content and user behavior, delivering an even more powerful user experience.
Tony Agresta's insight:

Semantic recommendations are all about personalized, contextual suggestions based on a blend of search history, user profiles and, most importantly, semantically enriched content. This refers to content that has been analyzed using natural language processing: entities are extracted from the text, classified and indexed inside a graph database. When a visitor comes to a website or information portal, semantic recommendations understand more than just past browsing history; they understand which other articles have relevant, contextual information of interest to the reader. This, in turn, creates a fantastic user experience, because visitors get much more than they originally thought would be available in search results. This news release says more about semantic recommendations and Ontotext Media and Publishing. By the way, the same technology can be used for any website, any information product, any search and discovery application. The basic premise is that once all of your content has been semantically enriched, search engines deliver highly relevant results.
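The scoring idea behind such recommendations can be sketched in a few lines: articles that share more extracted entities with the reader's history rank higher. The article data below is invented for illustration:

```python
# Invented articles, each with the set of entities extracted from it.
articles = {
    "graphdb-6-release": {"GraphDB", "RDF", "Ontotext"},
    "fraud-detection": {"graph analytics", "fraud"},
    "triplestore-benchmark": {"RDF", "SPARQL", "GraphDB"},
}

def recommend(read_entities, top_n=2):
    """Rank articles by how many entities they share with the reader's
    history and return the top matches."""
    scored = sorted(
        articles.items(),
        key=lambda kv: len(kv[1] & read_entities),  # entity overlap
        reverse=True,
    )
    return [name for name, _ in scored[:top_n]]

history = {"RDF", "GraphDB"}  # entities from articles the reader viewed
print(recommend(history))
```

A production system would blend this content-based signal with profiles and behavior, but the semantic enrichment is what makes the overlap computable at all.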


Thought Leaders in Big Data: Atanas Kiryakov, CEO of Ontotext (Part 1)

This segment is part 1 in the five-part series: Thought Leaders in Big Data: Atanas Kiryakov, CEO of Ontotext.
Tony Agresta's insight:

This interview is with Atanas Kiryakov, founder and CEO of Ontotext. He is an expert in semantic technology and discusses use cases for text mining, graph databases, semantic enrichment and content curation. This is a five-part series, and I would recommend it to anyone interested in taking the next step in big data: semantic analysis of text leading to contextual search and discovery applications.


Ontotext Improves Its RDF Triplestore, GraphDB™ 6.0: Enterprise Resilience, Faster Loading Speeds and Connectors to Full-Text Search Engines Top the List of Enhancements

Sofia, Bulgaria (PRWEB) August 20, 2014 -- Today, Ontotext released GraphDB™ 6.0 including enhancements to the high availability enterprise replication cluster, faster loading speeds, higher update rates and connectors for Lucene, SOLR and Elasticsearch. GraphDB™ 6.0 is the next major release of OWLIM – the triplestore known for its outstanding support for OWL 2 and SPARQL 1.1 that already powers some of the most impressive RDF database showcases.
Tony Agresta's insight:

This press release from PRWeb summarizes the latest enhancements to GraphDB from Ontotext, including improvements in load speeds, the enterprise high-availability replication cluster and connectors to Lucene, Solr and Elasticsearch.


LMI Named a Winner in Destination Innovation Competition - Semanticweb.com

Tony Agresta's insight:

More news about Open Policy was just published on SemanticWeb.com.    With Ontotext inside..."LMI has developed a tool—OpenPolicy™—to provide agencies with the ability to capture the knowledge of their experts and use it to intuitively search their massive storehouse of policy at hyper speeds. Traditional search engines produce document-level results. There’s no simple way to search document contents and pinpoint appropriate paragraphs. OpenPolicy solves this problem. The search tool, running on a semantic-web database platform, LMI SME-developed ontologies, and web-based computing power, can currently host tens of thousands of pages of electronic documents. Using domain-specific vocabularies (ontologies), the tool also suggests possible search terms and phrases to help users refine their search and obtain better results.”


Is MarkLogic the Adult at the NoSQL Party? - Datanami

If big data is a big party, Hadoop would be at the center and be surrounded by Hive, Giraph, YARN, NoSQL, and the other exotic technologies that generate so much excitement.
Tony Agresta's insight:

MarkLogic continues to enhance its approach to NoSQL, further confirming it's the adult at the party. MarkLogic 7 includes enhancements to enterprise search as well as tiered storage and integration with Hadoop.

MarkLogic Semantics, also part of MarkLogic 7, lets organizations enrich the content experience by including semantic facts, documents and values in the same search. By doing this, organizations can surface semantic facts stored in MarkLogic when users search for a topic or person of interest. For example, if a user searches all unstructured data on a topic, facts about authors, publication dates, related articles and other facts about the topic become part of the search results.
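A generic sketch of the idea (not MarkLogic's actual API; the documents and facts are invented): one query returns both the matching documents and the semantic facts about the matched entity:

```python
# Invented document store and fact store.
documents = {
    "doc1": "Report on unusual transactions by Acme Ltd",
    "doc2": "Quarterly revenue summary",
}
facts = [
    ("Acme Ltd", "incorporatedIn", "Panama"),
    ("Acme Ltd", "director", "J. Smith"),
    ("Beta Inc", "director", "A. Jones"),
]

def semantic_search(term):
    """Return matching documents plus facts about the matched entity,
    mimicking a combined document/fact search result."""
    needle = term.lower()
    docs = [d for d, text in documents.items() if needle in text.lower()]
    related = [f for f in facts if needle in f[0].lower()]
    return {"documents": docs, "facts": related}

result = semantic_search("Acme Ltd")
print(result["documents"], len(result["facts"]))
```

The payoff is the combined result: the analyst searching for "Acme Ltd" sees not only the matching report but also who directs the company and where it is incorporated.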

This could be applied in many ways.  Intelligence analysts may be interested in facts about people of interest.  Fraud and AML analysts could be interested in facts about customers with unusual transaction behavior.   Life Sciences companies may want to include documents, facts about the drug manufacturing process and values about pharma products as part of the search results.

Today, traditional search applications are being replaced by smarter, content-rich semantic search. This addition to MarkLogic confirms that all of this can be done within a single, unified architecture, saving organizations development time, money and resources while delivering enterprise-grade technology for today's most mission-critical applications.




Semantic Technologies in MarkLogic - World Class Triple Store in Version 7

Tony Agresta's insight:

This video is a fantastic overview from MarkLogic's Stephen Buxton, John Snelson and Micah Dubinko covering semantic processing, use cases for triplestores that include richer search and graph applications, and the expanded architecture in MarkLogic 7. It's an hour in length but well worth the time if you're interested in understanding how you can use documents, facts derived from text and values to build groundbreaking applications. Databases as we know them will change forever with the convergence of enterprise NoSQL, search and semantic processing. This video provides the foundation for understanding this important change in database technology.


Semantic Publishing for News & Media: Enhancing Content and Audience Engagement - Ontotext

Borislav Popov, head of Ontotext Media & Publishing, will show you how news & media publishers can use semantic publishing technology to more efficiently generate content while increasing audience engagement through personalisation and recommendations.
Tony Agresta's insight:

This webinar is recommended for those interested in how to apply core semantic technology to structured and unstructured data.   In this webinar you will learn:


  • The importance of text analysis, entity extraction and semantic indexing, all directly linked to a graph database
  • The significance of training text mining algorithms for accuracy in extraction and classification
  • The power of semantic recommendations: delivering highly relevant content using a blend of semantic analysis, reader profiles and past browsing history
  • How "semantic search" can be applied to isolate the most meaningful content


This webinar will include live demonstrations of semantic technology for news and media. But if you are in government, financial services, healthcare, life sciences or education, I would still recommend it: the concepts are directly applicable, and most of the technology can be adapted to meet your needs.




SWJ: 5 years in, most cited papers | www.semantic-web-journal.net

Tony Agresta's insight:

The Semantic Web Journal was launched 5 years ago. There's a wealth of information here on semantics.


Below is the abstract for the number two entry - GraphDB (formerly OWLIM). You can download the paper on the site.


"An explosion in the use of RDF, first as an annotation language and later as a data representation language, has driven the requirements for Web-scale server systems that can store and process huge quantities of data, and furthermore provide powerful data access and mining functionality. This paper describes OWLIM (now called GraphDB), a family of semantic repositories that provide storage, inference and novel data-access features delivered in a scalable, resilient, industrial-strength platform."


Text Mining and Knowledge Graphs in the Cloud: The Self Service Semantic Suite (S4) - Ontotext

This Ontotext webinar is designed to provide a summary of the value of Semantic Technology for smarter data management, as well as a brief technical introduction to the Self-Service Semantic Suite (S4) by Ontotext, which provides on-demand capabilities in the Cloud for text analytics, RDF data management and access to knowledge graphs.
Tony Agresta's insight:

I thought some of my followers might want to attend this webinar, which will be given by our CTO, Marin Dimitrov. Marin will be talking about important use cases for text mining and knowledge graphs running in the cloud. This is worth attending.


How Big Data Is Changing Medicine

Used to be that medical researchers came up with a theory, recruited subjects, and gathered data, sometimes for years. Now, the answers are already there in data collections on the cloud. All researchers need is the right question.
Tony Agresta's insight:

Through semantic analysis of free-flowing text and indexing of the results, fine-grained details about diseases, treatments, symptoms, clinical trials and current research can be made accessible to medical practitioners in real time. How does this work? It typically involves creating a text mining or natural language processing "pipeline" that analyzes the text, identifies entities (even complex biomedical terms), classifies them, develops relationships between them and then indexes everything.
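The steps above can be sketched end to end as a toy pipeline: find entities in text, emit document-to-entity triples, and build an entity index so each document becomes discoverable at the entity level. The terms, URIs and documents below are invented for illustration:

```python
# Invented mini-gazetteer: term -> knowledge base URI.
ENTITIES = {
    "diabetes": "ex:disease/diabetes",
    "metformin": "ex:drug/metformin",
}

def enrich(doc_id, text):
    """Identify known entities in `text` and emit triples linking the
    document to each entity it mentions."""
    lowered = text.lower()
    return [
        (doc_id, "ex:mentions", uri)
        for term, uri in ENTITIES.items() if term in lowered
    ]

# "Index everything": entity URI -> documents mentioning it.
index = {}
corpus = {
    "trial-42": "Metformin outcomes in type 2 diabetes patients",
    "note-7": "Patient history, no medication",
}
for doc, text in corpus.items():
    for _, _, uri in enrich(doc, text):
        index.setdefault(uri, []).append(doc)

print(index.get("ex:drug/metformin"))  # ['trial-42']
```

A practitioner searching the entity "metformin" now lands directly on the relevant trial document rather than on every page containing a near-miss keyword.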


The way we have done this successfully is by using proven text mining algorithms and tuning them to highly specific domains like life sciences, healthcare and biotech. We use curation tools and trained curators to read the text, annotate it and gain agreement on the annotations. The results are then used to refine the text mining algorithms, test and validate.


This process may seem cumbersome, but when done by trained professionals it is not. It has the added benefit of being done once and then applied for long periods without interruption. Results are highly accurate.


Seeing is believing.  You can try it for yourself here: 


  1. Go to:  https://console.s4.ontotext.com/#/home
  2. Click on "Demo for Free"
  3. Paste text into the box from an article or research paper on healthcare or life sciences - make sure the article is replete with complex biomedical terms that you don't think any automated algorithm could figure out.
  4. Select the Biomedical Tagger (by the way, you can also do this for general news or tweets)
  5. Click Execute
  6. Analyze the results


Pretty cool.


Organizations that don't semantically enrich their content are operating at a disadvantage. The benefits are real: saving patients' lives, finding new treatment strategies, developing drugs faster and much more.


If you would like to learn more about semantics, visit www.ontotext.com, where there's a wealth of information, demos, customer stories and news about this important subject.



Sizing AWS instances for the Semantic Publishing Benchmark | LDBCouncil

Tony Agresta's insight:

This post is a bit technical, but I would encourage all readers to look it over, especially the conclusions section. The key takeaway is the ability of a graph database (GraphDB 6.1 in this case) to perform updates (inserts into the database) while queries are being run against it. The results here are impressive.


Ontotext Receives "Innovative Enterprise of the Year 2014" Award - Ontotext

The Applied Research and Communications Fund together with Enterprise Europe Network – Bulgaria and KIC InnoEnergy awarded the ‘Innovative Enterprise of the Year 2014’ to Ontotext.  The contest is supported by the Bulgarian Ministry of Economy and Energy…
Tony Agresta's insight:

The growth in unstructured data and the need to discover contextual insights in your data are fueling the growth of natural language processing, text mining, graph databases and discovery interfaces.  The vertical applications of this technology are widespread: patient data, lab results, insurance claims, clinical trials and research can all be analyzed and made accessible in one solution designed to improve patient outcomes, expedite claims processing or quickly surface current, relevant research in support of new drug development.


The media and publishing world applies semantic technology in a different way.  Entity extraction is still used to identify and disambiguate specific people, places, events and other attributes within free-flowing text, but this is often combined with a digital footprint of visitor behavior and past searches to deliver highly targeted, relevant articles and facts, all of which are stored within a centralized knowledge base.


Other core use cases include curating new content, automated tagging, enrichment using Linked Open Data and enhanced authoring tools designed to prompt authors with relevant content they can use to add color to their current articles.


There is no limit to the applications of semantic technology, including manufacturing (fast access to manuals and plans), customer service (analysis of customer call notes), financial services (targeted know-your-customer and compliance-based search) and semantic ad targeting (analyzing online news, then serving targeted ads that pinpoint places to visit, hotels and restaurants).


Ontotext has been doing this longer than anyone - 15 years - and has built a complete portfolio of semantic tools to analyze text, extract and classify entities, enrich the data, resolve identities, optimize the storage of tens of billions of facts and make ALL of your data discoverable.  For these reasons, Ontotext has been recognized as the Innovative Enterprise of the Year for 2014.


To learn more about semantic technology and try it for free, visit www.ontotext.com


Not All Graph Databases Are Created Equally - An Interview with Atanas Kiryakov - Ontotext

Graph databases help enterprise organizations transform the management of unstructured data and big data.
Tony Agresta's insight:

Atanas Kiryakov is a 15-year veteran of semantic technology and graph databases.  He will be interviewed on September 30th at 11 AM EDT.  I suggest you sign up for this webinar, which will focus on the following:


  • Significant use cases for semantic technology - How are they transforming business applications today?
  • The importance of graph databases - What makes them unique?
  • Creating text mining pipelines - How are they used in conjunction with graph databases?
  • The Semantic Platform - What other tools make up a complete semantic platform and how are they used?


You can review the webinar using the link above and sign up.  Details about the webinar itself will be e-mailed to you around the middle of September.



Why are graph databases hot? Because they tell a story... - Ontotext

Graph databases, text mining and inference allow you extract meaning from text, perform semantic analysis and aid in knowledge management and data discovery
Tony Agresta's insight:

Inference is the ability to derive new facts from existing facts.  For example, if you know that Susan lives in Texas and that Texas is in the USA, you can infer that Susan lives in the USA.  Inference can handle much more complex scenarios, the results of which can be stored inside a graph database.  As these new facts are "materialized", they can inform websites, search applications and various forms of analysis.  This is where the real power of inference comes into play.
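The Susan/Texas/USA example can be sketched as a tiny forward-chaining loop over triples. This is a hand-written illustration of the idea, not how a production rule engine is implemented; the single rule below stands in for the RDFS/OWL rule sets a real triplestore would apply.

```python
# Minimal forward-chaining materialization over a set of triples.
# Rule: (?x livesIn ?y) and (?y locatedIn ?z)  =>  (?x livesIn ?z)
facts = {
    ("Susan", "livesIn", "Texas"),
    ("Texas", "locatedIn", "USA"),
}

def materialize(facts):
    """Apply the rule repeatedly until no new facts appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        new = {
            (x, "livesIn", z)
            for (x, p1, y1) in inferred if p1 == "livesIn"
            for (y2, p2, z) in inferred if p2 == "locatedIn" and y1 == y2
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

kb = materialize(facts)
print(("Susan", "livesIn", "USA") in kb)  # the newly materialized fact
```

The loop terminates once a pass adds nothing new, which is exactly what "materializing" inferred facts means: they now sit in the store alongside the asserted ones and are queryable like any other triple.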


Doing this "at scale" requires a high-performance graph database that can infer new facts while users are simultaneously querying the database and new facts are being loaded - all within an enterprise-resilient environment.  This blog post explains more about graph databases, inference and how the semantic integration of data can improve productivity and results.
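A rough sketch of that concurrency requirement, assuming nothing about any particular database's internals: a shared triple set guarded by a lock, with one thread loading facts while queries read consistent snapshots.

```python
# Concurrent load-and-query in miniature: a writer thread inserts triples
# while the main thread queries. The lock is a crude stand-in for the
# indexing and transaction isolation a real graph database provides.
import threading

store = set()
lock = threading.Lock()

def load(triples):
    for t in triples:
        with lock:
            store.add(t)

def count_predicate(pred):
    with lock:  # take a consistent snapshot while counting
        return sum(1 for (_, p, _) in store if p == pred)

writer = threading.Thread(
    target=load,
    args=([("doc%d" % i, "mentions", "Ontotext") for i in range(1000)],),
)
writer.start()
while writer.is_alive():
    count_predicate("mentions")  # queries proceed while inserts run
writer.join()
print(count_predicate("mentions"))
```

Each query sees some valid intermediate state rather than a torn one; scaling that guarantee to billions of triples and many clients is what distinguishes an enterprise triplestore from this toy.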


Text Mining & Graph Databases - Two Technologies that Work Well Together - Ontotext

Graph databases, also known as triplestores, have a very powerful capability - they can store hundreds of billions of semantic facts (triples) on any subject imaginable.  The number of free semantic facts available today from sources such as DBpedia, GeoNames and others is high and continues to grow every day.  Some estimates put the total between 150 and 200 billion right now.  As a result, Linked Open Data can be a good source of information with which to load your graph database.

Linked Open Data is one source of data.  When does it become really powerful?  When you create your own semantic triples from your own data and use them in conjunction with Linked Open Data to enrich your database.  This process, commonly referred to as text mining, extracts the salient facts from free-flowing text and typically stores the results in a database.  With this done, you can analyze your enriched data, visualize it, aggregate it and report on it.

In a recent project Ontotext undertook on behalf of FIBO (the Financial Industry Business Ontology), we enhanced the FIBO ontologies with Linked Open Data, allowing us to query company names and stock prices at the same time to show the lowest trading prices for all public stocks in North America in the last 50 years.  To do this, we needed to combine semantic data sources, something that's easy to do with the Ontotext Semantic Platform.

We have found that the optimal way to apply text mining is in conjunction with a graph database.  Many of our customers use our text mining to do just that.  Some vendors only sell graph databases and leave it up to you to figure out how to mine the text.  Other vendors only sell the text mining part and leave it up to…
Tony Agresta's insight:

Here's a summary of how text mining works with graph databases.  It describes the major steps in the text mining process and ends with how entities, articles and relationships are indexed inside the graph database.  The blend of these two major classes of technology makes all of your unstructured data discoverable.  Search results are informed by much more than just the metadata associated with a document or e-mail - they are informed by the meaning inside the document, the text itself, which contains important insights about people, places, organizations, events and their relationships to other things.
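The hand-off between the two technologies can be sketched end to end: extract entity mentions from a document, index them as triples next to the document itself, then query across both. The capitalized-word heuristic below is a deliberately naive stand-in for a real text mining pipeline, and the predicate names are invented for illustration.

```python
# Sketch of text mining feeding a graph store: documents go in, entity
# mentions come out as (doc, "mentions", entity) triples, and queries can
# then reach inside the text rather than just its metadata.
import re

def extract_entities(text):
    # Naive stand-in: treat capitalized tokens as candidate entities.
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))

def index_document(doc_id, text, store):
    store.add((doc_id, "hasText", text))
    for entity in extract_entities(text):
        store.add((doc_id, "mentions", entity))

store = set()
index_document("doc1", "Ontotext builds semantic tools in Sofia", store)

# Query: which documents mention "Sofia"?
hits = [d for (d, p, o) in store if p == "mentions" and o == "Sofia"]
print(hits)
```

In a real pipeline the extractor would also type and disambiguate each entity against an ontology, so "Sofia" the city and "Sofia" the person resolve to different nodes in the graph.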


September 30th at 11 AM EDT: Not All Graph Databases Are Created Equally - An Interview with Atanas Kiryakov - Ontotext

Graph databases help enterprise organizations transform the management of unstructured data and big data.
Tony Agresta's insight:

Graph databases store semantic facts used to describe entities and their relationships to other entities.  This educational webinar, hosted by Ontotext, will take an interview format with Atanas Kiryakov, an expert in the field.  If you want to learn about use cases for graph databases and how you can extract meaning from free-flowing text and store the results in a graph database, this webinar is a must.


Understanding The Various Sources of Big Data – Infographic

Big data is everywhere and it can help organisations in any industry in many different ways.
Tony Agresta's insight:

If you have not had a chance to review some of the free sources of big data that can enhance your content applications, take a look at the Linked Open Data graph.  It's updated daily, and you can learn more by searching for the CKAN API.  The graph represents tens of billions of semantic facts about a diverse set of topics.  These facts have been used to enhance many content-driven websites, allowing users to learn more about music, geography, populations and much more.

Henry Pan's curator insight, November 9, 2013 11:19 AM

Too many data source......


Triplestores Rise In Popularity


Triplestores:  An Introduction and Applications

Tony Agresta's insight:

Triplestores are gaining in popularity.  This article does a nice job of describing what triplestores are and how they differ from graph databases.  But there isn't much in the article on how triplestores are used, so here goes:

Some organizations are finding that when they combine semantic facts (triples) with other forms of unstructured and structured data, they can build extremely rich content applications.  In some cases, content pages are constructed dynamically.  These context-based applications deliver targeted, relevant results, creating a unique user experience.  A single unified architecture that can store and search semantic facts, documents and values at the same time requires fewer IT and data-processing resources, resulting in shorter time to market.  Enterprise-grade technology provides security, replication, availability, role-based access and the assurance that no data is lost in the process.  Real-time indexing provides instant results.

Other organizations are using triplestores and graph databases to visually show connections, useful in uncovering intelligence about your data.  These tools connect to triplestores and NoSQL databases easily, allowing users to configure graphs that show how the data is connected.  There's wide applicability for this, but common use cases include identifying fraud and money-laundering networks, counter-terrorism, social network analysis, sales performance, cyber security and IT asset management.  The triples, documents and values provide the fuel for the visualization engine, allowing for comprehensive data discovery and faster business decisions.

Other organizations focus on semantic enrichment and then ingest the resulting semantic facts into triplestores to enhance the applications mentioned above.  Semantic enrichment extracts meaning from free-flowing text and identifies triples.

Today, the growth in open data - pre-built triplestores - is allowing organizations to integrate semantic facts to create richer content applications.  There are hundreds of such sources containing tens of billions of triples, all free.

What's most important about these approaches?  Your organization can easily integrate all forms of data in a single unified architecture.  The data drives smart websites, rich search applications and powerful approaches to data visualization.  This is worth looking at more closely, since the end results are more customers, lower costs, greater insights and happier users.
