In November 2016, the bureau staged something of an attack on itself. Using only the summary tables with their eight billion numbers, Mr. Abowd formed a small team to try to generate a record for every American that would show the block where he or she lived, as well as his or her sex, age, race and ethnicity — a “reconstruction” of the person-level data.
Each statistic in a summary table leaks a little information, offering clues about, or rather constraints on, what respondents’ answers to the census could look like. Combining statistics from different aggregate tables at different levels of geography, we start to get a picture of the demographics of who is living where.
On the face of it, finding a reconstruction that satisfies all of the constraints from all the tables the bureau produces seems impossible. But Mr. Abowd says the problem gets easier when you notice that these tables are full of zeros. Each zero indicates a combination of variables — values for one or more of block, sex, age, race and ethnicity — for which no one exists in the census. We might find, for example, that there is no one below voting age living on a particular block. We can then ignore any reconstructions that include people under 18 living there. This greatly reduces the set of viable reconstructions and makes the problem solvable with off-the-shelf software.
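To make the idea concrete, here is a minimal sketch in Python of the same logic on a toy scale. It is not the bureau's method or ours, which rely on an optimization solver. It simply enumerates every possible set of residents for one invented block of three people and keeps the sets that agree with three invented summary tables. All names and numbers in it (`BY_SEX`, `BY_AGE`, `BY_RACE`, `reconstruct`) are hypothetical and chosen for illustration only.

```python
from itertools import combinations_with_replacement, product

SEXES = ("F", "M")
AGES = ("0-17", "18-64", "65+")
RACES = ("A", "B", "W")

# Hypothetical published tables for one block of three people (made-up numbers).
BY_SEX = {"F": 1, "M": 2}
BY_AGE = {"0-17": 0, "18-64": 2, "65+": 1}
BY_RACE = {"A": 2, "B": 0, "W": 1}
BLOCK_TOTAL = 3


def margin(records, index, categories):
    """Count how many records fall in each category of one variable."""
    return {c: sum(1 for r in records if r[index] == c) for c in categories}


def satisfies_tables(records):
    """True if the candidate records reproduce all three summary tables."""
    return (
        margin(records, 0, SEXES) == BY_SEX
        and margin(records, 1, AGES) == BY_AGE
        and margin(records, 2, RACES) == BY_RACE
    )


def reconstruct(use_zeros):
    """Return (number of candidates examined, list of consistent reconstructions)."""
    person_types = list(product(SEXES, AGES, RACES))
    if use_zeros:
        # A zero in a table rules out every person type in that category.
        person_types = [
            t for t in person_types
            if BY_SEX[t[0]] and BY_AGE[t[1]] and BY_RACE[t[2]]
        ]
    # Order of people within a block does not matter, so enumerate multisets.
    candidates = list(combinations_with_replacement(person_types, BLOCK_TOTAL))
    solutions = [c for c in candidates if satisfies_tables(c)]
    return len(candidates), solutions


if __name__ == "__main__":
    for use_zeros in (False, True):
        examined, solutions = reconstruct(use_zeros)
        print(f"use_zeros={use_zeros}: examined {examined}, consistent {len(solutions)}")
    for solution in reconstruct(True)[1]:
        print(solution)
```

In this toy block, the two zeros (no one under 18, no one of race "B") shrink the search without changing which reconstructions survive. More than one reconstruction remains consistent with these three tables alone; adding tables that cross variables, such as sex by age, would narrow the set further, which is the effect described above of combining statistics from different tables.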
As an illustration, following the details available in public presentations from Mr. Abowd and his colleagues, we were able to perform our own reconstruction experiment on Manhattan. Roughly 1.6 million people are divided among 3,950 census blocks, which typically correspond to actual city blocks. The summary tables we needed came from the census website; we used simple tools like R and the Gurobi Optimizer; and within a week we had our first results.
By this summer, Mr. Abowd and his team had completed their reconstruction for nearly every part of the country. When they matched their reconstructed data to the actual, confidential records — again comparing just block, sex, age, race and ethnicity — they found about 50 percent of people matched exactly. And for over 90 percent there was at most one mistake, typically a person’s age being missed by one or two years. (At smaller levels of geography, the census reports age in five-year buckets.)
This level of accuracy was alarming. Mr. Abowd and his peers say that their reconstruction, while still preliminary, is not a violation of Title 13. Instead it is seen as a red flag that their current disclosure limitation system is out of date.
The bureau has long had procedures to protect respondents’ confidentiality. For example, census data from 2010 showed that a single Asian couple — a 63-year-old man and a 58-year-old woman — lived on Liberty Island, at the base of the Statue of Liberty.