Inside the architecture of Google's Knowledge Graph and Microsoft's Satori.
Both are based on ideas put forward by a team from Yahoo Research in a 2009 paper called “A Web of Concepts,” which proposed extracting conceptual information from the wider Web to create a more knowledge-driven approach to search. The team defined three key elements to creating a true “web of concepts”:
1. Information extraction: pulling structured data (addresses, phone numbers, prices, stock numbers and such) out of Web documents and associating it with an entity
2. Linking: mapping the relationships between entities (connecting an actor to films he’s starred in and to other actors he has worked with)
3. Analysis: discovering categorizing information about an entity from the content (such as the type of food a restaurant serves) or from sentiment data (such as whether the restaurant has positive reviews).
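To make the three elements concrete, here is a minimal Python sketch of how they might fit together on a toy scale. It is not how Knowledge Graph or Satori is implemented. The `Entity` class, the phone-number pattern, the cuisine keyword table, and the sample restaurant and city are all hypothetical, invented purely for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A toy entity record (hypothetical structure, for illustration only)."""
    name: str
    attributes: dict = field(default_factory=dict)  # extracted structured data
    links: set = field(default_factory=set)         # (relation, other entity name)
    categories: set = field(default_factory=set)    # labels inferred from content


# Assumed pattern for US-style phone numbers such as 555-123-4567.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Assumed keyword table mapping words on a page to a cuisine label.
CUISINE_KEYWORDS = {"sushi": "Japanese", "taco": "Mexican", "pasta": "Italian"}


def extract(entity, text):
    """1. Information extraction: attach the first phone number found in the text."""
    match = PHONE_RE.search(text)
    if match:
        entity.attributes["phone"] = match.group(0)


def link(source, relation, target):
    """2. Linking: record a named relationship from one entity to another."""
    source.links.add((relation, target.name))


def analyze(entity, text):
    """3. Analysis: infer category labels from keywords in the content."""
    lowered = text.lower()
    for keyword, cuisine in CUISINE_KEYWORDS.items():
        if keyword in lowered:
            entity.categories.add(cuisine)


# Made-up sample data.
page = "Call Casa Luna at 555-123-4567 for the best taco platter in town."
restaurant = Entity("Casa Luna")
city = Entity("Springfield")

extract(restaurant, page)
link(restaurant, "located_in", city)
analyze(restaurant, page)

print(restaurant)
```

A production system would replace the regular expression and keyword table with far more robust extractors and classifiers, but the split into extraction, linking, and analysis steps mirrors the list above.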
Google and Microsoft have just begun to tap into the power of that kind of knowledge, and their respective entity databases remain in their infancy. As of June 1, Satori had mapped over 400 million entities and Knowledge Graph had reached half a billion, a tiny fraction of the potential index of entities that the two search tools could amass.
Via Mark Oehlert, Judy O'Connell