How much still needs to be done to make algorithms more ethical
In the following interview, computer scientists Michael Kearns and Aaron Roth discuss their research into how computer programs can affect our lives. Even when written without malice, algorithms have the potential to treat people unfairly or violate their privacy. In their book “The Ethical Algorithm,” the authors explain what it takes to make algorithms behave decently. The interview was conducted by Bas den Hond, who writes about Earth science, physics, mathematics and language. Visit journal.getabstract.com for the original interview.
Q: Michael and Aaron, has either of you ever felt an algorithm wronged you?
Kearns: Even if we don’t think we have been treated unfairly, we may simply not know it. I certainly don’t think I’ve been a victim of some of the worst cases that we talk about in the book, like being unfairly denied a loan or given a criminal sentence I didn’t deserve. But I have definitely — like many Internet users — raised my eyebrows when ads pop up for things I had never searched for on Google itself; somehow it knew I might be interested in such a product via some other channel.
Roth: Privacy violations have become so ubiquitous that it hardly seems notable or personal that you might have experienced one. But here is an example from my grad school days that springs to mind. Facebook has this feature, “People You May Know.” It mostly suggests people who are reasonable, but in this case, it suggested I might want to be friends with my landlord. And it is unclear how Facebook could have connected us, because my relationship with my landlord was entirely offline. I would occasionally send her a check for the rent, we had no friends in common on the social network, we had exchanged emails on Google, not on Facebook, and I had not given Facebook access to my email records. So this stood out to me.
Q: If we stop noticing most violations of privacy, as you say, are they then still violations? Or is our concept of privacy eroding?
Roth: For sure, privacy protections are quickly eroding. But it’s not that people don’t care; it’s largely that they’re unaware. Every once in a while, people become aware of how much is known about them. There was an incident that got a lot of press, maybe five years ago, about the retailer Target. They had mailed an advertisement for products for pregnant women to a young woman who still lived with her parents. Her family did not know she was pregnant. And Target had figured this out by using large-scale machine learning — where an algorithm, or “model,” is derived automatically from large amounts of data — to correlate the purchases she had made with those of other customers.
Q: Is it now futile to hope companies will take better care of our privacy?
Roth: I don’t think it’s a lost cause. First, because people do care about privacy, companies can make privacy a selling point. One of the ways Apple brands itself as distinct from Google is by saying that it offers stronger privacy protections — and that’s true to an extent. Apple uses several technologies that we talk about in the book, for example differential privacy. This is a technique that adds carefully calibrated random noise to data or to computed statistics, so that the results would come out essentially the same whether or not any one person’s data were included — the output therefore reveals very little about that person. Second, as privacy-preserving technologies like differential privacy mature and become feasible without destroying the business model of companies that want to use machine learning, it becomes more and more possible to modernize privacy regulation to require their use.
Kearns: Clearly, regulators care about privacy, and they care increasingly, as demonstrated by the GDPR (General Data Protection Regulation) in Europe, and at least at the state level in the United States, like the California Consumer Privacy Act. We know, just from professional experience, by being asked to talk to regulators and the like, that there is increasing alarm over privacy erosion, and an increasing appetite to take concrete regulatory action against those erosions.
Q: How does differential privacy work, for instance, with the data troves that the US Census Bureau is holding?
Roth: Well, the Census Bureau has always been required by law to protect privacy; it’s just that the law doesn’t specify exactly what that means. They’ve never released raw data. They’ve always made some kind of attempt to anonymize the data. But up until this year, they’ve been using heuristic techniques that don’t work very well. Before, they’d compute statistics after randomly swapping families between different neighborhoods. The effectiveness of this kind of technique relied on keeping secret exactly what they were doing, so nobody really knew how accurate the released statistics were. Starting this year, they’re going to use differential privacy. There’s no more swapping families around: the statistics they want to release will be computed exactly and then perturbed with some amount of noise.
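The noise-perturbation step Roth describes can be sketched with the Laplace mechanism, the textbook building block of differential privacy. The block counts and the privacy parameter epsilon below are made-up illustrations, and the Census Bureau's actual system is far more elaborate; this only shows the core idea of computing a statistic exactly and then adding calibrated noise.

```python
import math
import random

random.seed(0)  # seeded only so the illustration is reproducible

def laplace_sample(scale):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    # A count changes by at most 1 when one person is added or removed
    # (sensitivity = 1), so Laplace noise with scale 1/epsilon gives
    # an epsilon-differentially-private release of the count.
    return true_count + laplace_sample(1.0 / epsilon)

# Hypothetical neighborhood population counts (not real census data).
exact_counts = {"Block A": 1204, "Block B": 377, "Block C": 58}
epsilon = 0.5  # smaller epsilon -> stronger privacy, more noise
noisy = {block: dp_count(n, epsilon) for block, n in exact_counts.items()}
```

The released `noisy` values are close to the exact counts in aggregate, but any single household's presence or absence barely shifts the distribution of outputs.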
Q: What exactly is different, now?
Roth: It’s different in two ways, both of which are good. First, it provides formal guarantees: You don’t have to worry that someone cleverer will figure out how to undo your protections and break through them. Second: Because the process is not secret, it’s now possible to do more rigorous statistical analysis on the actual data that’s being released, for example to calculate the accuracy and therefore quantify the uncertainty in the statistics that the privacy protections are adding.
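Roth's second point — that a public noise process lets anyone quantify the uncertainty in the released statistics — can be made concrete. Because Laplace noise with scale b satisfies P(|noise| > t) = exp(−t/b), a confidence interval for the true count behind a noisy release follows directly; the numbers below are hypothetical.

```python
import math

def laplace_interval(noisy_value, epsilon, confidence=0.95):
    # Laplace noise with scale b = 1/epsilon satisfies
    # P(|noise| > t) = exp(-t / b), so the radius covering
    # `confidence` of the noise mass is b * ln(1 / (1 - confidence)).
    b = 1.0 / epsilon
    radius = b * math.log(1.0 / (1.0 - confidence))
    return (noisy_value - radius, noisy_value + radius)

# A hypothetical released count of 1210.3 under epsilon = 0.5:
lo, hi = laplace_interval(1210.3, epsilon=0.5, confidence=0.95)
```

With the older secret swapping heuristics, no such calculation was possible, because the error distribution was unknown to outside analysts.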
Kearns: A very powerful use of differential privacy, which we’re hoping to collaborate with the Census on, is creating synthetic data sets. You start with raw census data and produce a fake data set. So the data don’t correspond to real people at all, but in aggregate, the synthetic data set preserves desired statistical properties. And then people outside the Census can take that data set and do whatever they want with it — still privacy is preserved.
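A minimal sketch of the synthetic-data idea Kearns describes: compute a differentially private summary of the raw data (here just a noisy one-way marginal over a single made-up field, `age_band`), then sample fake records from it. Real synthetic-data generation preserves many statistics jointly; this only illustrates that the fake rows correspond to no real person while matching the data in aggregate.

```python
import math
import random

random.seed(1)  # seeded only so the illustration is reproducible

def noisy_marginal(records, field, epsilon):
    # Differentially private one-way marginal: per-value counts plus
    # Laplace noise, clipped at zero so they can serve as weights.
    counts = {}
    for r in records:
        counts[r[field]] = counts.get(r[field], 0) + 1
    noisy = {}
    for value, c in counts.items():
        u = random.random() - 0.5
        noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        noisy[value] = max(0.0, c + noise)
    return noisy

def synthetic_records(marginal, field, n):
    # Sample n fake records whose field distribution matches the
    # noisy marginal; none of them corresponds to a real person.
    values = list(marginal)
    weights = [marginal[v] for v in values]
    return [{field: random.choices(values, weights)[0]} for _ in range(n)]

# Hypothetical raw records (not real census data).
raw = ([{"age_band": "18-34"}] * 50 + [{"age_band": "35-64"}] * 30
       + [{"age_band": "65+"}] * 20)
fake = synthetic_records(noisy_marginal(raw, "age_band", epsilon=1.0),
                         "age_band", 100)
```

Because only the noisy marginal touches the raw data, anything computed downstream from `fake` inherits the same privacy guarantee.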
Q: It is good to see such privacy-preserving techniques being implemented. In your book, however, you describe a real struggle with other ethical considerations, right?
Roth: Yes, that’s right. We know that the study of algorithmic fairness will necessarily be messier than that of privacy. In our view, and I think in the view of all our colleagues, differential privacy is the right definition of privacy, in the sense that it provides very strong guarantees to individuals yet still permits you to do very powerful things with data. And it’s a single definition that gives you a very general, powerful notion of privacy.
Q: What about fairness?
Roth: We already know that fairness won’t be like that. However mature the field eventually gets, we will always have multiple, potentially competing definitions of fairness. And even within a single definition, there might be quantitative trade-offs between the fairness you provide to different groups. Providing more of some type of fairness, for example by race, might mean that you have less fairness by gender. This isn’t about the state of our knowledge, and it’s not that we don’t know how to handle this yet; these are actual realities. That’s why the study of fairness is more controversial than privacy.
Q: Does it surprise you how people perceive privacy and fairness?
Roth: As scientists, we think: These are both interesting and important social norms, and we want to make a science out of embedding them in algorithms. So to us, there is a similarity between them. But if you go out into the world and talk to people, they have much stronger feelings about fairness — about whether we should be enforcing it, and for whom — than they do about privacy.