Crowdsourced Family Tree Offers Insight Into Who We Are

In this 6,000-person family tree cleaned and organized using graph theory, individuals spanning seven generations are represented in green, with their marital links in red. (Columbia University)

(CN) – Your next family reunion could be a lot more crowded: After analyzing millions of interconnected online genealogy profiles, scientists have collected the largest, scientifically vetted family tree to date, which at 13 million people is slightly larger than the population of Belgium or Cuba.

The family tree offers new insights into the last 500 years of marriage and migration in North America and Europe, as well as the role of genes in longevity, according to a report the team published Thursday in the journal Science.

“Through the hard work of many genealogists curious about their family history, we crowdsourced an enormous family tree and boom, came up with something unique,” said senior author Yaniv Erlich, chief science officer at genealogy and DNA testing company MyHeritage. The company owns the website that hosts the data used in the study.

“We hope that this dataset can be useful to scientists researching a range of other topics.”

The researchers downloaded 86 million public profiles and organized the data using graph theory – the study of mathematical objects known as graphs, which are composed of vertices, or nodes, connected by edges.

This enabled the team to identify a single family tree of 13 million people spanning an average of 11 generations, though they would need to go back another 65 generations to converge on a common ancestor.
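As a rough illustration of the graph-theory idea, the sketch below treats each profile as a node and each parent-child or marriage link as an edge, then picks out the largest connected group. The names, the toy links and the `connected_components` helper are hypothetical and are not taken from the study, whose actual cleaning pipeline is not described in this article.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group node IDs into connected components using breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical toy data: each pair is a family link between two profiles.
edges = [
    ("ann", "bob"),
    ("bob", "cara"),
    ("cara", "dan"),
    ("eve", "finn"),
]

components = connected_components(edges)
largest = max(components, key=len)  # the biggest single "family tree"
print(sorted(largest))
```

In this toy case, four of the six profiles end up in one tree and the other two form a separate, smaller one. At the scale of millions of profiles the same idea applies, though a real pipeline would also need to resolve conflicting or impossible links first.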

Regardless, the dataset is a milestone, as it brings family-history searches from church archives and newspaper obituaries into the digital age, enabling population-level investigations. The researchers also make it easy to overlay other datasets to examine a range of socioeconomic trends at scale.

“It’s an exciting moment for citizen science,” said Melinda Mills, a demographer at the University of Oxford who was not involved in the research. “It demonstrates how millions of regular people in the form of genealogy enthusiasts can make a difference to science. Power to the people.”

The dataset lists where and when each person was born and died and reflects the demographics of individuals, with 85 percent of profiles originating from North America and Europe. The team verified that the dataset was representative of the general U.S. population’s education level by comparing a subset of Vermonters against the state’s detailed death registry.

“The reconstructed pedigrees show that we are all related to each other,” said Peter Visscher, a quantitative geneticist at the University of Queensland who was not involved in the study. “This fact is known from basic population history principles, but what the authors have achieved is still very impressive.”

The findings show that before 1750, most Americans found a spouse within six miles of where they were born. Americans born in 1950, on the other hand, traveled about 60 miles to find a spouse.

“It became harder to find the love of your life,” Erlich joked.

Before 1850, marrying a family member was common – to a fourth cousin, on average, compared to a seventh cousin today, the team found. Between 1800 and 1850, people traveled farther than ever to find a spouse – nearly 12 miles on average – but were more likely to wed a fourth cousin or closer. The researchers hypothesize that changing social norms, not rising mobility, may have led people to seek potential marriage partners outside of close kin.

The team also found that women in North America and Europe have migrated more than men over the past 300 years, though when men did migrate, they traveled significantly farther on average.

To untangle the role of nature and nurture in longevity, the researchers created a model and trained it on a dataset of 3 million relatives born between 1600 and 1910 who had lived beyond the age of 30. They excluded twins, as well as people who died in the U.S. Civil War, the two world wars or a natural disaster, which was assumed if relatives perished within 10 days of each other.

The team then compared each person’s life span to that of their relatives, taking into account their degree of separation. They found that genes explained about 16 percent of the longevity variation documented in their data – the low end of previous estimates that ranged from about 15 percent to 30 percent.

The findings suggest that good longevity genes can extend an individual’s life by an average of five years, according to Erlich.

“That’s not a lot,” he added. “Previous studies have shown that smoking takes 10 years off of your life.

“That means some life choices could matter a lot more than genetics.”

The research also shows that the genes that affect longevity operate independently, rather than through interactions between genes, a phenomenon known as epistasis. Some scientists have used epistasis to explain why large-scale genomic studies have so far been unable to find the genes that encode complex traits like longevity or intelligence.

If some genetic variants combined to influence longevity, Erlich’s team would have found a greater correlation between closely related people who share more DNA, and thus more genetic interactions. However, they identified a linear relationship between longevity and genetic relatedness, ruling out widespread epistasis.

“This is important in the field because epistasis has been proposed as a source of ‘missing heritability’,” said lead author Joanna Thornycroft, a former graduate student at the Whitehead Institute for Biomedical Research, now at Wellcome Sanger Institute in the United Kingdom.

Visscher added, “This is entirely in line with theory and previous inference from SNP (variant) data, yet for some reason many researchers in human genetics and epidemiology continue to believe that there is a lot of non-additive genetic variation for common diseases and quantitative traits.”
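To make the "linear relationship" concrete, here is a small sketch of what a purely additive model predicts. It uses the textbook simplification that the expected correlation in a trait between two relatives is their coefficient of relatedness multiplied by the heritability, ignoring shared environment, dominance and assortative mating. This is an illustrative assumption, not the model the researchers fitted. The 16 percent figure comes from the article, and the function and variable names are hypothetical.

```python
def expected_correlation(relatedness, heritability):
    """Additive-model prediction: correlation scales linearly with relatedness."""
    return relatedness * heritability


HERITABILITY = 0.16  # share of longevity variation attributed to genes in the article

# Standard coefficients of relatedness (expected fraction of shared DNA).
RELATEDNESS = {
    "parent-child": 0.5,
    "grandparent-grandchild": 0.25,
    "first cousins": 0.125,
}

predictions = {
    pair: expected_correlation(r, HERITABILITY)
    for pair, r in RELATEDNESS.items()
}

for pair, value in predictions.items():
    print(f"{pair}: {value:.3f}")
```

Each halving of relatedness halves the predicted correlation. If gene-gene interactions were widespread, close relatives would be expected to resemble each other more than this straight line predicts, which is the pattern the team reports not finding.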

The dataset is available through FamiLinx, which was created by Erlich and his colleagues. While the FamiLinx data are anonymous, people can check to see if a family member added them to the genealogy site that hosts the profiles. If so, there is a good chance that they may be a part of the 13-million-person family tree.

The research was funded by the U.S. National Institutes of Health, the Israel Science Foundation, and Andrea and Paul Heafy.
