This week’s furor over FaceApp has largely centered on concerns that its Russian developers might be compelled to share the app’s data with the Russian government, much as the Snowden disclosures illustrated the myriad ways in which American companies were compelled to disclose their users’ private data to the US government. Yet this concern reflects a mistaken understanding of how the modern data trade actually works: American universities and companies routinely make their data available to companies all across the world, including in Russia and China. In today’s globalized world, data is just as globalized, with national borders no longer restricting the flow of our personal information - a trend made worse by the data-hungry world of deep learning.
Data brokers have long bought and sold our personal data in a shadowy world of international trade involving our most intimate and private information. The digital era has augmented this explicit trade with an interlocking world of passive exchange through analytics services. Today even the most mundane Web site likely includes multiple Web trackers that broadcast each visitor’s personal information to myriad companies all across the Web, which in turn repackage and resell this information even further. Many of these analytics companies are based outside the United States, meaning many of the same sites raising questions about FaceApp already ship their visitors’ data to other countries, potentially including Russia, without even realizing it.
Yet it is the deep learning revolution that has had perhaps the most devastating impact on digital privacy. In their rush to build massive new training and testing datasets, myriad companies and universities have released large collections of observational data to be used by deep learning researchers all across the world. Few of these datasets carry restrictions on how they can be used, and even for those that do, their creators have little control over the data once it is released into the wild.
Nowhere is this privacy threat more apparent than in the rise of vast facial recognition datasets. Such datasets have been culled from Web searches, gleaned from photo sharing sites, harvested from social media platforms and even extracted from surveillance cameras. The universities and companies producing these datasets typically release them publicly or redistribute them to fellow academic and corporate research institutions as part of the replication process of academic publication or to seed broader research collaboration.
Given the number of deep learning researchers outside the US, especially in countries like China, these datasets rarely remain exclusively within the borders of the United States. Instead, they are downloaded by researchers and companies all across the world who in turn often redistribute them, ensuring their perpetual propagation.
The opaque nature of deep learning models means it is impossible to tell whether a given dataset was used to train a particular model unless it has distinctive labels or other artifacts that were preserved by the model’s creators. This means that even if a dataset is distributed under a legal license that prohibits its use in government surveillance, once that dataset is in the wild there is little to stop its misuse.
As the New York Times reported last week, the end result is that facial recognition datasets created by American universities have found their way into the hands of companies working for the Chinese government to build ethnic surveillance systems that have received international condemnation. The examples cited by the Times represent just a fraction of the university-created or curated datasets that have found their way into governmental use.
Putting this all together, in today’s globalized open data world, datasets are no longer national assets. They are global resources that are freely shared by researchers across national borders, despite containing enormously sensitive private information. In some cases, American researchers working for American universities have created the datasets that have helped train and tune the surveillance states of repressive regimes throughout the world, including nations with which the US government is engaged in hostilities. This is simply a fact of life of the modern digital world.
In the end, while all of the digital privacy attention this week has focused on FaceApp, that focus obscures the simple fact that a far greater volume of far more sensitive data flows out of US borders to hostile states each day. Such is privacy in our globalized digital world.