The map uses data from the Global Database of Events, Language, and Tone (GDELT), an initiative that aims to provide a “realtime social sciences earth observatory” by creating a freely available catalog of events derived from news stories. The database is compiled from stories in media outlets from almost every country in the world. A single story can contain more than one event; events are automatically parsed out of news stories by a text analysis program called TABARI and encoded using a schema called CAMEO.
A large portion of these events (140 million out of 250 million listed events) contain both the location where the event happened and the locations of the two primary actors involved. TABARI associates events that it has already picked out of an article with geographic locations mentioned in the same text (by looking at verb usage in surrounding sentences). You can read the introductory paper on GDELT (Leetaru and Schrodt, 2013) for more on the specific geocoding methods employed.
We excluded all events in which the two actors are geocoded as being located in the same place (about 91 million events, or 36 percent of the full dataset), as well as location pairs referred to by fewer than 10 events (about 7 million events). This left us with about 43 million events (17 percent of the full dataset) and 216,000 connections between location pairs to visualize in the map.
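The two exclusion rules above can be sketched in a few lines of Python. This is an illustration rather than the code used to build the map: the function name `filter_connections`, the representation of each event as a pair of actor locations, and the choice to treat a connection as undirected (so that A to B and B to A count as the same pair) are all assumptions made for the example.

```python
from collections import Counter


def filter_connections(events, min_events=10):
    """Count events per location pair and apply the two exclusion rules.

    `events` is an iterable of (actor1_location, actor2_location) tuples.
    This input shape is hypothetical, not GDELT's actual record layout.
    """
    counts = Counter()
    for loc_a, loc_b in events:
        # Skip events where either actor has no geocoded location.
        if loc_a is None or loc_b is None:
            continue
        # Rule 1: drop events whose two actors are in the same place.
        if loc_a == loc_b:
            continue
        # Assumption: connections are undirected, so order the pair.
        counts[tuple(sorted((loc_a, loc_b)))] += 1
    # Rule 2: drop location pairs referred to by fewer than `min_events` events.
    return {pair: n for pair, n in counts.items() if n >= min_events}
```

Each key of the returned dictionary is one connection to draw, and its value is the number of events behind that connection.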
The first map illustrates all the connections between pairs of locations. The brightness of each line reflects the number of events connecting the two places. The second graphic focuses on international events, grouping the connections by country. Colour is used to map the world’s regions and the connections between them, with colour assigned to the ‘edges’ (i.e., connections) based on the colours of the two connected nodes. The thickness of the lines represents the number of events.
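As a rough sketch of how the second graphic's data could be prepared, the example below sums location-level connections into country-level ones and derives an edge colour from the colours of its two nodes. Everything here is assumed for illustration: the helper names `group_by_country` and `blend`, the `country_of` lookup table, and in particular the blending rule (a simple midpoint of the two RGB colours), which may differ from what the graphic actually uses.

```python
from collections import Counter


def group_by_country(connections, country_of):
    """Sum location-pair event counts into country-pair counts.

    `connections` maps (location_a, location_b) to an event count;
    `country_of` maps a location name to its country. Both are
    hypothetical structures chosen for this example.
    """
    totals = Counter()
    for (loc_a, loc_b), n in connections.items():
        c_a, c_b = country_of[loc_a], country_of[loc_b]
        # Keep only international connections.
        if c_a != c_b:
            totals[tuple(sorted((c_a, c_b)))] += n
    return dict(totals)


def blend(rgb_a, rgb_b):
    """Midpoint of two RGB colours (an assumed rule for colouring an edge)."""
    return tuple((a + b) // 2 for a, b in zip(rgb_a, rgb_b))
```

The summed count for each country pair would then drive line thickness, and `blend` would be called with the region colours of the two countries at either end of the edge.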