cumulative distribution function
1. definition: CDF
Let \(X\) be a random variable. Then the function \(F_X : \mathbb{R} \rightarrow [0,1]\) defined by \[ F_X(x) = \mathbb{P}(X \leq x) \] is called the cumulative distribution function (CDF) of \(X\).
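As a quick sanity check of the definition, here is a minimal Python sketch (assuming a standard normal \(X\), an arbitrary choice) that estimates \(\mathbb{P}(X \leq x)\) by simulation and compares it against the closed-form CDF from scipy.

```python
import numpy as np
from scipy import stats

# Illustration of the definition: F_X(x) = P(X <= x).
# We estimate this probability by simulation for a standard normal X
# and compare against the closed-form CDF provided by scipy.
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

for x in (-1.0, 0.0, 1.5):
    empirical = np.mean(samples <= x)   # fraction of samples with X <= x
    exact = stats.norm.cdf(x)           # F_X(x) in closed form
    print(f"x={x:+.1f}  empirical={empirical:.4f}  exact={exact:.4f}")
```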
2. properties
Let \(X\) be a random variable with CDF \(F\). Then \(F\) has the following properties (a quick numerical check follows the list):
- (monotonicity) if \(x \leq y\), then \(F(x) \leq F(y)\)
- (limiting values) \(\lim_{x\rightarrow -\infty} F(x) = 0\) and \(\lim_{x\rightarrow \infty} F(x) = 1\)
- (right continuity) for every \(x\), we have \(\lim_{y \downarrow x} F(y) = F(x)\)
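These properties are easy to check numerically for a concrete choice of \(F\). A small sketch, taking \(F\) to be the Exponential(1) CDF, \(F(x) = 1 - e^{-x}\) for \(x \geq 0\) and \(0\) otherwise:

```python
import numpy as np

# F is the Exponential(1) CDF: F(x) = 1 - exp(-x) for x >= 0, else 0.
# -expm1(-x) computes 1 - exp(-x) stably; the maximum avoids overflow
# in the unused branch of np.where for large negative x.
def F(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, -np.expm1(-np.maximum(x, 0.0)))

xs = np.linspace(-5, 20, 1000)

# monotonicity: F never decreases along an increasing grid
assert np.all(np.diff(F(xs)) >= 0)

# limiting values: F(x) -> 0 as x -> -infinity, F(x) -> 1 as x -> +infinity
assert F(-1e9) == 0.0 and np.isclose(F(1e9), 1.0)

# right continuity at the kink x = 0: F(0 + eps) -> F(0) as eps -> 0+
assert abs(F(1e-12) - F(0.0)) < 1e-9
print("all three properties hold for this F")
```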
3. getting the probability law from a CDF
It turns out that any function \(F\) satisfying the above properties is the CDF of some random variable. In fact, if \(U\) is uniform on \((0,1)\), then for a suitable function \(g : (0,1) \rightarrow \mathbb{R}\), the random variable \(X = g(U)\) satisfies \(F_X = F\).
3.1. Theorem
Let \(F\) be a given distribution function. Consider the probability space \(([0,1], \mathcal{B}, \mathbb{P})\) where \(\mathcal{B}\) is the Borel \(\sigma\)-algebra on \([0,1]\) and \(\mathbb{P}\) is the Lebesgue measure (see notes on Lebesgue Measure). Then, there exists a measurable function \(X\), i.e. a random variable, such that \(F_X = F\).
3.1.1. proof
We give a version of the proof where we additionally assume that \(F\) is continuous and strictly increasing. (The more general proof can be found in the linked notes.) Then the range of \(F\) is \((0,1)\) and \(F\) is invertible there. We define the uniform random variable \(U(\omega) = \omega\) for all \(\omega\), and then our random variable by \(X(\omega) = F^{-1}(\omega)\) for every \(\omega \in (0,1)\). In other words, \(X(\omega) = F^{-1}(U(\omega))\). Intuitively, \(F^{-1}(x)\) says "this is the \(c\) such that \(F(c) = x\), i.e. \(\mathbb{P}(\{X \leq c\}) = x\)". This is exactly the idea behind inverse transform sampling.
Let's do a quick type check:
- \(U : [0,1] \rightarrow \mathbb{R}\)
- \(F : \mathbb{R} \rightarrow [0,1]\) (with inverse defined only on \((0,1)\))
- \(X = F^{-1}(U) : (0,1) \rightarrow \mathbb{R}\)
Note that \(F(F^{-1}(\omega)) = \omega\), so that \(F(X) = U\). Since \(F\) is strictly increasing, we have \(X(\omega) \leq x\) if and only if \(F(X(\omega)) \leq F(x)\), that is, \(U(\omega) \leq F(x)\). Incidentally, this also shows that \(X\) is a valid random variable: \(\{X \leq x\} = \{\omega : \omega \leq F(x)\}\) is an interval, hence measurable.
So, for every \(x \in \mathbb{R}\): \[ F_X(x) = \mathbb{P}(\{X \leq x\}) = \mathbb{P}(\{F(X) \leq F(x)\}) = \mathbb{P}(\{U \leq F(x)\}) = F(x) \] where the last equality holds because \(\mathbb{P}\) is the Lebesgue measure on \([0,1]\).
To recap, we now have a random variable \(X\) with a CDF \(F_X = F\).
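The proof is constructive, and it is exactly the recipe behind inverse transform sampling. A minimal sketch, again assuming the Exponential(1) CDF \(F(x) = 1 - e^{-x}\), whose inverse is \(F^{-1}(u) = -\log(1-u)\): draw \(U\) uniform on \((0,1)\), set \(X = F^{-1}(U)\), and check that the empirical CDF of the resulting samples matches \(F\).

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):
    return 1.0 - np.exp(-x)          # Exponential(1) CDF, for x >= 0

def F_inv(u):
    return -np.log1p(-u)             # F^{-1}(u) = -log(1 - u), computed stably

u = rng.uniform(size=200_000)        # U(omega) = omega, uniform on (0, 1)
x = F_inv(u)                         # X = F^{-1}(U)

# F_X should equal F: compare the empirical CDF of the samples with F.
for c in (0.5, 1.0, 2.0):
    print(f"c={c}: empirical P(X <= c) = {np.mean(x <= c):.4f}, F(c) = {F(c):.4f}")
```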
4. Corollary
It turns out there is a one-to-one correspondence between cumulative distribution functions and the probability laws of random variables. Equivalently, there is a one-to-one correspondence between distribution functions \(F\) and probability measures \(\mathbb{P}\) on \((\mathbb{R}, \mathcal{B})\).
4.1. proof
First, we show that this correspondence is surjective: every distribution function \(F\) arises from some probability measure. By the theorem above, for any CDF \(F\) we can find a r.v. \(X\) such that \(F_X = F\). This \(X\) induces a probability measure \(\mathbb{P}_X\) on \((\mathbb{R}, \mathcal{B})\) via \(\mathbb{P}_X(B) = \mathbb{P}(X \in B)\), and from this measure we can recover the CDF by defining \(F(c) = \mathbb{P}_X((-\infty, c])\).
Now, we show that the correspondence is injective: distinct probability measures \(\mathbb{P}_X\) and \(\mathbb{P}_{X'}\) necessarily yield distinct CDFs. Indeed, if two measures \(\mathbb{P}_X\) and \(\mathbb{P}_{X'}\) coincide on all intervals \((-\infty, c]\), then they are equal on all other Borel sets as well, because the collection of intervals \((-\infty, c]\) is a generating \(\pi\)-system for \(\mathcal{B}\) (see Carathéodory's Extension Theorem).
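To see the correspondence concretely: once we know \(F\), we know the measure of every interval, since \(\mathbb{P}_X((a, b]) = F(b) - F(a)\); and conversely, \(F\) itself is just the measure of half-lines. A small sketch with the standard normal law (an arbitrary choice):

```python
from scipy import stats

F = stats.norm.cdf                    # the CDF of the standard normal law

a, b = -1.0, 2.0
# the measure of the interval (a, b] is recovered from F alone
print("P_X((a, b]) =", F(b) - F(a))
# and F itself is the measure of a half-line: F(c) = P_X((-inf, c])
print("F(b) = P_X((-inf, b]) =", F(b))
```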