US Census Bureau headquarters in Suitland, Maryland. (Photo: census.gov)
A new technique to protect the privacy of participants in the 2020 Census could foster distrust between the Census Bureau and researchers if it results in too many inaccuracies, demographers warned officials Wednesday.
The demographers, who study population changes, delivered the message to bureau officials at a workshop at The National Academies of Sciences, Engineering and Medicine in Washington. The agency was participating in the workshop to hear from data users about the technique known as “differential privacy,” which will be implemented next year for the 2020 Census.
The technique adds mathematical “noise” to the data to obscure any given individual’s identity while still yielding statistically valid information. Bureau officials said the change is needed in an era of Big Data, when participants can be identified through public and private data sets.
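To give a rough sense of what adding “noise” can mean in practice, the sketch below applies the textbook Laplace mechanism to a single population count. It is an illustration only: the article does not describe the bureau’s actual algorithm, and the function names, the `epsilon` privacy parameter and the sample count of 40 are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return a count with Laplace noise added (hypothetical helper).

    Assumes adding or removing one person changes the count by at most 1,
    so the noise scale is 1 / epsilon. Smaller epsilon means more privacy
    and more noise.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    true_count = 40            # hypothetical block population
    for epsilon in (0.1, 1.0, 10.0):
        released = noisy_count(true_count, epsilon, rng)
        print(f"epsilon={epsilon}: released count is about {released:.1f}")
```

Any single released figure may be off by a few people, but the noise averages out to zero, which is why aggregate statistics can remain valid. Choosing `epsilon` is the trade-off between confidentiality and accuracy that officials described.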
But if differential privacy is implemented in a way that jeopardizes the quality of the data, it could create distrust between data users and the bureau, some demographers warned.
“Trust between users and the Census Bureau is fundamentally important and that needs to be addressed,” said Seth Spielman, a professor of geography at the University of Colorado in Boulder.
State and local government officials assume that the Census Bureau data is accurate when they create budgets or decide on infrastructure projects, said Nicholas Nagle, a demographer at the University of Tennessee.
“We assume that and we allocate real dollars based on that assumption,” Nagle said.
Ron Jarmin, deputy director of the Census Bureau, said the agency needs to find the “sweet spot” between data confidentiality and data accuracy as it comes up with a final algorithm for differential privacy.
“We are at an important crossroads,” Jarmin said.
The bureau is implementing differential privacy for the 2020 Census in response to privacy threats. In a recent test, the agency went back to the last national headcount, in 2010, and reconstructed individual profiles from thousands of publicly available tables. Bureau researchers then matched those records against other public population data, and were able to infer the identities of 52 million Americans.
Some researchers worry that differential privacy will produce inaccurate data, especially for small geographies such as neighborhood blocks.
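The arithmetic behind that worry can be sketched in a few lines. The example assumes, purely for illustration, that each published count receives Laplace noise with scale 1 / epsilon, a standard textbook setup that the article does not attribute to the bureau. The populations and the `epsilon` value are hypothetical.

```python
def expected_relative_error(population, epsilon):
    """Expected absolute noise divided by the true count.

    For Laplace noise with scale b = 1 / epsilon, the expected absolute
    error is b, so the relative error shrinks as the population grows.
    """
    scale = 1.0 / epsilon
    return scale / population


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy parameter
    for label, population in (("block", 40), ("state", 5_000_000)):
        error = expected_relative_error(population, epsilon)
        print(f"{label} ({population:,} people): {error:.5%} expected error")
```

Under these assumptions the same amount of noise is a 5 percent error for a block of 40 people and a negligible one for a state of 5 million.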
In a letter to the bureau last month, a group of demographers wrote that they were concerned the technique would diminish the accuracy of data for small communities. The bureau’s adoption of differential privacy won’t resolve concerns that private companies are gathering and releasing personal data, said the letter from the State Data Centers, the Census Information Centers and the Federal-State Cooperative for Population Estimates.
At Wednesday’s workshop, Phil LeClerc of the Census Bureau told attendees that differential privacy was chosen over other techniques because it was more accurate for large geographic areas, at the state level and above.