Ancient languages hold a treasure trove of information about the culture, politics and commerce of millennia past. Yet, reconstructing them to reveal clues into human history can require decades of painstaking work.
Now, scientists at the University of California, Berkeley, have created an automated “time machine,” of sorts, that will greatly accelerate and improve the process of reconstructing hundreds of ancestral languages.
In a compelling example of how “big data” and machine learning are beginning to make a significant impact on all facets of knowledge, researchers from UC Berkeley and the University of British Columbia have created a computer program that can rapidly reconstruct “proto-languages” – the linguistic ancestors from which all modern languages have evolved. These earliest-known languages include Proto-Indo-European, Proto-Afroasiatic and, in this case, Proto-Austronesian, which gave rise to languages spoken in Southeast Asia, parts of continental Asia, Australasia and the Pacific.
”What excites me about this system is that it takes so many of the great ideas that linguists have had about historical reconstruction, and it automates them at a new scale: more data, more words, more languages, but less time,” said Dan Klein, an associate professor of computer science at UC Berkeley and co-author of the paper published online today (Feb. 11) in the journal Proceedings of the National Academy of Sciences.