As a data-driven advertising company, Google’s business model hinges on knowing as much about its users as possible. But as the public has increasingly awakened to its privacy rights, this imperative has generated more friction. One protection Google has invested in is the field of data science known as “differential privacy,” which adds carefully calibrated random noise to the results of analyses run on user databases so that companies can still draw useful conclusions without being able to single people out. And now the company is releasing a tool to help other developers achieve that same level of differential privacy protection.
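To make “calibrated noise” concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook way to answer a counting query with differential privacy. It is not code from Google’s library, and the helper names (`laplace_noise`, `noisy_count`) and the toy records are hypothetical. It assumes each person contributes at most one record, so adding or removing one person changes the true count by at most 1. It is also deliberately a toy: Python’s `random` module is not cryptographically secure, and naive floating-point noise like this has known subtle weaknesses, which is part of why experts warn against building these schemes yourself.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Hypothetical helper: an epsilon-differentially private count.

    Assumption: each person contributes at most one record, so the true count
    changes by at most 1 when one person is added or removed (sensitivity 1).
    Laplace noise with scale 1/epsilon then gives epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy usage with made-up records: how many users visited a given store?
users = [{"user": i, "visited_store": v} for i, v in enumerate([True, False, True, True, False])]
rng = random.Random(42)
print(noisy_count(users, lambda r: r["visited_store"], epsilon=0.5, rng=rng))
```

A smaller `epsilon` means wider noise and stronger privacy; a larger one means a more accurate answer that reveals more about any individual.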
Today Google is announcing a new set of open source differential privacy libraries that not only offer the equations and models needed to limit how much any result can reveal about an individual, but also include an interface that makes it easier for more developers to actually implement the protections. The idea is to make it possible for companies to mine and analyze their database information without building invasive identity profiles or tracking individuals. The measures can also help mitigate the fallout of a data breach, because the statistics companies derive from user data are mixed with confounding noise rather than exposing individual records.
“It’s really all about data protection and about limiting the consequences of releasing data,” says Bryant Gipson, an engineering manager at Google. “This way, companies can still get insights about data that are valuable and useful to everybody without doing something to harm those users.”
Google currently uses differential privacy libraries to protect many different types of information, like location data generated by its Google Fi mobile customers. The techniques also show up in features like the Google Maps meters that tell you how busy different businesses are throughout the day. Google intentionally built its differential privacy libraries to be flexible and applicable to as many database features and products as possible.
Differential privacy is similar to cryptography in the sense that it’s extremely complicated and difficult to do right. And as with encryption, experts strongly discourage developers from attempting to “roll their own” differential privacy scheme, or design one from scratch. Google hopes that its open source tool will be easy enough to use that it can be a one-stop shop for developers who might otherwise get themselves into trouble.
“The underlying differential privacy noisemaking code is very, very general,” says Lea Kissner, chief privacy officer of the workplace behavior startup Humu and Google’s former global lead of privacy technology. Kissner oversaw the differential privacy project until her departure in January. “The interface that’s put on the front of it is also quite general, but it’s specific to the use case of somebody making queries to a database. And that interface matters. If you want people to use it right you need to put an interface on it that is actually usable by actual human beings who don’t have a PhD in the area.” (Which Kissner does.)
Developers could use Google’s tools to protect all sorts of database queries. For example, with differential privacy in place, employees at a scooter-share company could analyze drop-offs and pickups at different times without also knowing who rode which scooter where. Differential privacy also keeps aggregate data from revealing too much. Take average scooter ride length: whether or not any one user’s data is included, the noisy average changes too little to blow that user’s mathematical cover. The technique builds in many such protections to preserve conclusions about larger trends even as people make their database queries more granular, though each additional query spends part of a finite “privacy budget.”
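The average-ride-length example can be sketched with a common textbook construction for a differentially private mean: clamp each value to fixed bounds so no single ride can move the total by much, then add Laplace noise to both the sum and the count. This is an illustrative sketch, not the code in Google’s library. The helper name `dp_average_ride_length`, the 0 to 60 minute bounds, and the ride data are hypothetical, and it assumes each rider contributes at most one ride.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_average_ride_length(lengths, lower, upper, epsilon, rng):
    """Hypothetical helper: epsilon-DP average ride length in minutes.

    Assumes 0 <= lower < upper and at most one ride per rider.
    """
    # Clamping bounds any single ride's influence on the sum.
    clamped = [min(max(x, lower), upper) for x in lengths]
    # Split the privacy budget between two queries (sequential composition).
    eps_half = epsilon / 2
    # Adding or removing one ride changes the clamped sum by at most `upper`...
    noisy_sum = sum(clamped) + laplace_noise(upper / eps_half, rng)
    # ...and changes the count by at most 1.
    noisy_count = len(clamped) + laplace_noise(1.0 / eps_half, rng)
    noisy_count = max(noisy_count, 1.0)  # avoid dividing by ~0 or a negative count
    # Clamping the estimate is post-processing, so it does not weaken the guarantee.
    return min(max(noisy_sum / noisy_count, lower), upper)


# Toy usage: the answer barely depends on whether the last (outlier) rider is included.
rides = [12, 8, 25, 40, 5, 300]  # minutes; 300 is clamped to 60
rng = random.Random(7)
print(dp_average_ride_length(rides, 0, 60, epsilon=1.0, rng=rng))
print(dp_average_ride_length(rides[:-1], 0, 60, epsilon=1.0, rng=rng))
```

Each call above spends its own `epsilon`, so running many such queries on the same riders adds up, which is the budget the paragraph refers to.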