As digital data expands, anonymity may become a mathematical impossibility.
Yet today these data brokers are considered somewhat old-fashioned compared with Internet companies like Facebook, which have automated the collection of personal information so that it happens in real time. According to its financial filings at the time of its IPO, Facebook stores around 111 megabytes of photos and videos for each of its users, who now number more than a billion.
That’s more than 100 petabytes of personal information right there. In some European legal cases, plaintiffs have learned that Facebook’s records of their interactions with the site, including text messages, things they “liked,” and addresses of computers they used, run to 800 printed pages, adding up to another few megabytes per user.
In a step that’s worrisome to digital-privacy advocates, offline and online data sets are now being connected to help marketers target advertisements more precisely. In February, Facebook announced a deal with Acxiom and other data brokers to merge their data, linking real-world activities to those on the web. At a March investor meeting, Acxiom’s chief science officer claimed that its data could now be linked to 90% of U.S. social profiles.
Such data sets are often portrayed as having been “anonymized” in some way, but the more data they involve, the less likely that is to be true. Mobile-phone companies, for instance, record users’ locations, strip out the phone numbers, and sell aggregate data sets to merchants or others interested in people’s movements.
MIT researchers Yves-Alexandre de Montjoye and César A. Hidalgo have shown that even when such location data is anonymous, just four different data points about a phone’s position can usually link the phone to a unique person.
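To make the idea concrete, here is a minimal Python sketch of how one might measure that kind of uniqueness. It is not the researchers' method or data. The function names (`unicity`, `synthetic_traces`) and every parameter are hypothetical, and the traces are uniformly random (hour, cell) pairs, which real movement patterns are not. For each user it draws k points from that user's own trace and checks whether any other trace also contains all of them.

```python
import random


def unicity(traces, k, rng):
    """Fraction of users whose trace is the only one containing
    k points drawn at random from that same trace."""
    unique = 0
    for user, points in traces.items():
        # Draw k known points about this user (sorted for reproducibility).
        sample = set(rng.sample(sorted(points), k))
        # Every trace that contains all k points is a candidate match.
        matches = [u for u, p in traces.items() if sample <= p]
        if matches == [user]:
            unique += 1
    return unique / len(traces)


def synthetic_traces(n_users, n_points, n_cells, n_hours, rng):
    """Hypothetical stand-in data: each user gets n_points distinct
    (hour, cell) pairs chosen uniformly at random."""
    traces = {}
    for user in range(n_users):
        points = set()
        while len(points) < n_points:
            points.add((rng.randrange(n_hours), rng.randrange(n_cells)))
        traces[user] = points
    return traces


rng = random.Random(0)
traces = synthetic_traces(n_users=500, n_points=30, n_cells=20, n_hours=24, rng=rng)
for k in (1, 2, 3, 4):
    print(k, unicity(traces, k, rng))
```

On this toy data the fraction of uniquely matched users rises steeply as k grows, because each additional known point rules out most of the remaining candidates. The exact numbers depend entirely on the made-up parameters and say nothing about real phone records.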