Scientists at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, reported that they had sequenced the genome of the Henrietta Lacks, or “HeLa”, cell line. This report was met with considerable consternation by those who wondered why scientists are still experimenting on a cell line obtained without consent in the 1950s. In response to the backlash, the researchers removed the HeLa sequence from the public internet, and even the paper itself might disappear from the formal scientific literature.
However, it is unfair to treat the authors of this paper as scapegoats for the systematic failure of scientists to deal with issues surrounding genomic “privacy”. The public availability of the HeLa genome is a simple consequence of the fact that nearly every large-scale molecular biology technique relies on DNA sequencing as a readout. Every time anyone does a genome-scale experiment (for example, RNA-seq or ChIP-seq) on HeLa cells and archives the data, they are explicitly making public the genome sequence of HeLa cells, and thus of Henrietta Lacks.
It’s quite trivial to demonstrate this point. HeLa cells are one of the cell lines used in the ENCODE project, an ambitious effort to functionally characterize the entirety of the human genome. As part of this effort, they (among many others) have been subjected to an array of genomic experiments, almost all of which of course generated sequencing data. I downloaded a few sequencing files from these experiments, and combined the HeLa sequence with other publicly available genome sequences from the 1000 Genomes Project.
Does the data on HeLa cells contain enough information to say anything about Henrietta Lacks? Plotted above is the output of principal components analysis computed on genetic data from Nigerians, northern Europeans, Chinese, African-Americans, and HeLa. Each point represents an individual, and individuals that fall closer together are more similar genetically. Based on this plot, we can see that the HeLa cells are quite clearly from an African-American woman (or at least someone who is admixed between European and African populations).
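For readers who want to see the mechanics, here is a minimal sketch of this kind of principal components analysis. It is not the pipeline used for the plot: the data are entirely simulated, the allele frequencies are made up, and the function names (`simulate_genotypes`, `genotype_pca`) and the 80/20 ancestry split are hypothetical choices for illustration. Drawing the admixed individual from averaged allele frequencies is also a simplification of real admixture.

```python
import numpy as np


def simulate_genotypes(rng, allele_freqs, n_individuals):
    """Draw 0/1/2 allele counts for each individual at each SNP."""
    return rng.binomial(2, allele_freqs, size=(n_individuals, allele_freqs.size))


def genotype_pca(genotypes, n_components=2):
    """Return each individual's coordinates on the top principal components."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP across individuals
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
n_snps, n_per_pop = 2000, 40

# Made-up allele frequencies for two hypothetical source populations.
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)

pop_a = simulate_genotypes(rng, freq_a, n_per_pop)
pop_b = simulate_genotypes(rng, freq_b, n_per_pop)
# One admixed individual: 80% ancestry from A, 20% from B (arbitrary split).
admixed = simulate_genotypes(rng, 0.8 * freq_a + 0.2 * freq_b, 1)

# Rows are individuals: population A, then population B, then the admixed sample.
pcs = genotype_pca(np.vstack([pop_a, pop_b, admixed]))
mean_a = pcs[:n_per_pop, 0].mean()
mean_b = pcs[n_per_pop:2 * n_per_pop, 0].mean()
admixed_pc1 = pcs[-1, 0]
```

In this toy setup the first principal component separates the two simulated populations, and the admixed individual lands between the two cluster means, nearer the population that contributed most of its ancestry. That is the same logic used to read the position of HeLa relative to the reference populations.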
These ancestry results are just a proof-of-principle, but any genetic analysis of disease risk or other phenotypic trait is of course just as trivial. This is not true just for HeLa cells, but across the board: the genomes of the donors of every cell line studied by ENCODE are publicly available, and can be analysed for ancestry or disease risk. Though the identities of the donors are not known in most cases besides HeLa, using techniques like those used by Gymrek et al., it may be possible to link cell lines to last names, and thus genetic information to individual people.