Using data pulled from online genealogy sites, a renowned ‘genome hacker’ has constructed what are likely the biggest family trees ever assembled. The researcher and his team now plan to use the data, including a single uber-pedigree of 13 million individuals that stretches back to the 15th century, to analyse the inheritance of complex genetic traits, such as longevity and facial features.
In addition to providing the invitation list to what would be the world’s largest family reunion, the work presented by computational biologist Yaniv Erlich at the American Society of Human Genetics annual meeting in Boston could provide a new tool for understanding the extent to which genes contribute to certain traits. The pedigrees have been made available to other researchers, but Erlich and his team at the Whitehead Institute in Cambridge, Massachusetts, have stripped the names from the data to protect privacy.
The structures of the trees themselves could provide interesting information about human demographics and population expansions, says Nancy Cox, a human geneticist at the University of Chicago, Illinois, who was not involved in the study. But more interesting, she says, is the possibility that such data may one day be linked to medical information or to DNA sequence data as more people have their genomes sequenced and deposit that information in public databases.
Pedigrees provide clues about genetic inheritance. For instance, when researchers compare an individual with their more distant relatives on the family tree, the change in frequency of a given trait, such as fertility, can indicate to what extent the trait has its roots in genetics. The same comparison can also provide clues as to whether the trait is controlled by a few genes that have large effects, or by many genes that each make smaller contributions.
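To illustrate the reasoning, here is a minimal Python sketch of one textbook way to turn such comparisons into a number. It is not the analysis Erlich's team describes. It assumes a purely additive trait with no shared-environment effect, in which case the expected trait correlation between two relatives is roughly their coefficient of relatedness multiplied by the heritability. The function names and the example values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_relatedness(pairs):
    """Group relative pairs by relatedness and correlate their trait values.

    `pairs` is an iterable of (relatedness, trait_of_a, trait_of_b),
    e.g. relatedness 0.5 for siblings, 0.125 for first cousins.
    """
    groups = defaultdict(lambda: ([], []))
    for relatedness, trait_a, trait_b in pairs:
        groups[relatedness][0].append(trait_a)
        groups[relatedness][1].append(trait_b)
    return {r: pearson(xs, ys) for r, (xs, ys) in groups.items()}


def estimate_heritability(corr_by_relatedness):
    """Least-squares slope through the origin of correlation vs relatedness.

    Assumption (hypothetical model): correlation = relatedness * heritability.
    """
    numerator = sum(r * c for r, c in corr_by_relatedness.items())
    denominator = sum(r * r for r in corr_by_relatedness)
    return numerator / denominator


# Made-up correlations for siblings, half-siblings and first cousins.
example = {0.5: 0.20, 0.25: 0.10, 0.125: 0.05}
print(estimate_heritability(example))  # about 0.4 for these made-up values
```

If the observed correlations fall away with distance faster or slower than this straight line predicts, that departure is the kind of signal that could hint at non-additive effects or shared environment, which this sketch deliberately ignores.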
“But it takes years to assemble genealogical data for even just a few thousand individuals,” Erlich says. In the past, researchers have painstakingly gathered such data from church records and individual volunteers. Erlich and his team decided to streamline the process by collecting data from more than 43 million public profiles on the genealogy website geni.com. The profiles typically included birth and death dates, as well as locations and, occasionally, photos uploaded by the users.
The team assembled the data into family trees that ranged from a few thousand individuals up to 13 million people in size. Erlich says that pedigrees previously available for genetic studies contained hundreds of thousands of family members at best.
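The article does not say how the profiles were merged into trees. One plausible way to picture the step is as a grouping problem: treat each profile as a node, treat each recorded family relationship as a link, and collect everything reachable through links into one pedigree. The Python sketch below does this with a union-find structure. The profile IDs and the `links` format are hypothetical, and this is not presented as the team's actual pipeline.

```python
from collections import Counter


def find(parent, x):
    """Return the representative of x's group, shortening paths as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def pedigree_sizes(profile_ids, links):
    """Return the sizes of connected pedigrees, largest first.

    `links` holds (profile_a, profile_b) pairs for any recorded relationship,
    such as parent-child or spouse. Every ID in `links` is assumed to appear
    in `profile_ids`.
    """
    parent = {p: p for p in profile_ids}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two family groups
    sizes = Counter(find(parent, p) for p in profile_ids)
    return sorted(sizes.values(), reverse=True)


# Six hypothetical profiles: one family of three, one of two, one unlinked.
profiles = ["a", "b", "c", "d", "e", "f"]
links = [("a", "b"), ("b", "c"), ("d", "e")]
print(pedigree_sizes(profiles, links))  # [3, 2, 1]
```

On real data, a single mistaken link is enough to fuse two unrelated families, which is one reason self-reported relationships need checking before the resulting trees are used.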
Lisa Cannon-Albright, a geneticist at the University of Utah in Salt Lake City, urges caution when using self-reported genealogical data. She has worked extensively with a large Utah genealogy database that is linked to some medical information. “Everyone wants to trace their family back to royalty,” she says. “For these giant pedigrees, we just don’t believe them beyond a certain date.” Cannon-Albright says that she cuts off her data at the year 1500.