differential privacy
1. intuition
A randomized algorithm is differentially private if its output on a dataset cannot be used to determine, beyond a quantified probability bound, whether or not a particular individual was included in the data.
1.1. definition
Let \(\epsilon\) be a non-negative real number. Let \(\mathcal{A}\) be our randomized algorithm and let \(D_1\) and \(D_2\) be two datasets that differ in a single element. Then \(\mathcal{A}\) is \(\epsilon\)-DP if for all \(S \subseteq \operatorname{im}(\mathcal{A})\), we have: \(P(\mathcal{A}(D_1) \in S) \leq \exp(\epsilon)\,P(\mathcal{A}(D_2) \in S)\)
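To sanity-check the inequality, here is a minimal sketch (my own, not from the note above) using randomized response, the classic textbook mechanism: report the true bit with probability 3/4 and the opposite bit with probability 1/4, which satisfies \(\ln(3)\)-DP.

```python
import math

# Randomized response: flip a fair coin; heads -> report the true answer,
# tails -> flip a second coin and report that instead.
# For a single person's bit, the two "neighbouring datasets" are
# true_value = 1 and true_value = 0.

def output_distribution(true_value: int) -> dict:
    """P(reported bit | true bit) under randomized response with a fair coin."""
    p_truth = 0.5           # report the true bit because the first coin was heads
    p_random = 0.5 * 0.5    # report a fixed bit because both coins said so
    return {
        true_value: p_truth + p_random,   # 3/4
        1 - true_value: p_random,         # 1/4
    }

eps = math.log(3)  # randomized response with a fair coin is ln(3)-DP

d1 = output_distribution(1)  # individual's bit is 1
d2 = output_distribution(0)  # individual's bit is 0

# Check P(A(D1) in S) <= exp(eps) * P(A(D2) in S) for every event S
# (here just the singleton events {0} and {1}), in both directions.
for s in (0, 1):
    assert d1[s] <= math.exp(eps) * d2[s] + 1e-12
    assert d2[s] <= math.exp(eps) * d1[s] + 1e-12

print("randomized response satisfies the ln(3)-DP inequality")
```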
1.1.1. initial thoughts
The way I'm thinking about it now: I don't want my face to show up when a model is queried with my name. So let's say that \(\mathcal{A}\) is a model that takes text and outputs an image. Let \(D_1\) consist of the training set plus my face, captioned with my name, and let \(D_2\) be the same training set without my face; in both cases the model is queried with my name. Then let \(S\) consist of just my face. I want the probability of the model showing my face to be just as low (within the \(\exp(\epsilon)\) bound) as it would have been had the training data not included my face.
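To put a number on the bound (these figures are made up for illustration): with \(\epsilon = 0.1\) we have \(\exp(\epsilon) \approx 1.105\), so if the model trained without my face would show it with probability \(10^{-6}\), the model trained with my face may show it with probability at most about \(1.105 \times 10^{-6}\).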
1.1.2. second thoughts
- I'm thinking that \(\mathcal{A}\) is SGD run on the training set and \(S\) corresponds to a particular set of weight configurations.
- Often the proofs will show that each per-example gradient is clipped to a bounded norm, and then the noise added to the gradients is scaled to that same bound (see the sketch under machine learning below).
- So the guarantee of differential privacy is a bound on how important any single piece of data is to reaching a particular set of weights.
- Composability is important: it means we can keep applying DP-SGD steps and the whole training run is still DP, with the total \(\epsilon\) accumulating across steps (made precise just below).
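A standard fact (basic sequential composition, stated here from memory) makes this concrete: if each of \(T\) DP-SGD steps is \(\epsilon_0\)-DP on its own, then the whole sequence is at most \(T\epsilon_0\)-DP, so the privacy budget grows with training length unless a tighter accountant (such as the moments accountant used in DP-SGD analyses) is applied:

\[
\epsilon_{\text{total}} \le \sum_{t=1}^{T} \epsilon_t = T\epsilon_0, \qquad \text{e.g. } T = 1000,\ \epsilon_0 = 0.01 \;\Rightarrow\; \epsilon_{\text{total}} \le 10.
\]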
2. machine learning
Use DP-SGD, which clips per-example gradients and perturbs them with noise, to make training DP.
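A minimal sketch of a single DP-SGD step (my own illustration in NumPy, not taken from any particular library; clip_norm and noise_multiplier are the usual hyperparameters, with values chosen arbitrarily): clip each per-example gradient to a maximum L2 norm, sum, add Gaussian noise scaled to the clip norm, average, and apply.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD step on a flat parameter vector.

    per_example_grads: array of shape (batch_size, num_params), one gradient
    row per training example. clip_norm (C) bounds each example's influence;
    noise_multiplier (sigma) scales the Gaussian noise relative to C.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    batch_size = per_example_grads.shape[0]

    # 1. Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # 2. Sum the clipped gradients and add Gaussian noise calibrated to clip_norm.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1])

    # 3. Average over the batch and take an ordinary SGD step.
    return params - lr * (noisy_sum / batch_size)

# Tiny usage example with made-up numbers: two examples, three parameters.
params = np.zeros(3)
grads = np.array([[0.5, -2.0, 1.0],
                  [3.0, 0.0, -1.0]])
params = dp_sgd_step(params, grads)
print(params)
```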