A dialect is a particular form of a language that is specific to a region or social group. Linguists are fascinated by dialects because they reveal social classes, patterns of immigration and how groups have influenced each other in the past.
But studying dialects is hard work. Traditionally, linguists do this by interviewing a relatively small number of people, typically a few hundred, and asking them to fill out questionnaires. Researchers then use the results to create linguistic atlases, but these are naturally limited by the choice of locations and individuals studied.
Today, Bruno Gonçalves at the University of Toulon in France and David Sánchez at the Institute for Cross-Disciplinary Physics and Complex Systems on the island of Majorca, Spain, say they have found a new way to study dialects on a global scale using messages posted on Twitter. The results reveal a major surprise about the way dialects are distributed around the world and provide a fascinating snapshot of how they are evolving under various new pressures, such as global communication mechanisms like Twitter.
Gonçalves and Sánchez begin by sampling all of the tweets written in Spanish over a two-year period that also contain geolocation information. That gave them a database of 50 million geolocated tweets, most of them from Spain, Spanish America, and the United States.
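To make that filtering step concrete, here is a minimal sketch in Python of how one might keep only Spanish-language tweets that carry a location. It is an illustration rather than the researchers' actual pipeline: the record layout (the `text`, `lang` and `coordinates` fields), the function name `spanish_geolocated` and the sample tweets are all assumptions made up for this example.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical record layout: each tweet is a dict with a "text" string,
# a "lang" language code, and "coordinates" holding a (latitude, longitude)
# pair, or None when the tweet carries no location.
Tweet = Dict[str, Any]


def spanish_geolocated(tweets: Iterable[Tweet]) -> Iterator[Tweet]:
    """Yield only the tweets tagged as Spanish that also have coordinates."""
    for tweet in tweets:
        if tweet.get("lang") == "es" and tweet.get("coordinates") is not None:
            yield tweet


# Made-up sample data, for illustration only.
sample = [
    {"text": "Voy a coger el autobús", "lang": "es", "coordinates": (40.4168, -3.7038)},
    {"text": "Voy a tomar el camión", "lang": "es", "coordinates": None},
    {"text": "Taking the bus", "lang": "en", "coordinates": (40.7128, -74.0060)},
]

kept = list(spanish_geolocated(sample))
```

In this sketch only the first sample tweet survives: the second has no location and the third is not in Spanish. Writing the filter as a generator means it never holds the full collection in memory, which matters when the input runs to tens of millions of messages.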