cumulative distribution function
1. definition: CDF
Let \(X\) be a random variable. Then the function \(F_X : \mathbb{R} \rightarrow [0,1]\) defined by \[ F_X(x) = \mathbb{P}(X \leq x) \] is called the cumulative distribution function (CDF) of \(X\).
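As a quick sanity check of the definition, here is a minimal Python sketch (assuming a standard normal \(X\), an arbitrary choice) that estimates \(\mathbb{P}(X \leq x)\) by simulation and compares it against the closed-form CDF from scipy.

```python
import numpy as np
from scipy import stats

# Illustration of the definition: F_X(x) = P(X <= x).
# We estimate this probability by simulation for a standard normal X
# and compare against the closed-form CDF provided by scipy.
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

for x in (-1.0, 0.0, 1.5):
    empirical = np.mean(samples <= x)   # fraction of samples with X <= x
    exact = stats.norm.cdf(x)           # F_X(x) in closed form
    print(f"x={x:+.1f}  empirical={empirical:.4f}  exact={exact:.4f}")
```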
2. properties
Let \(X\) be a random variable with CDF \(F\). Then \(F\) has the following properties (a quick numerical check follows the list):
- (monotonicity) if \(x \leq y\), then \(F(x) \leq F(y)\)
- (limiting values) \(\lim_{x\rightarrow -\infty} F(x) = 0\) and \(\lim_{x\rightarrow \infty} F(x) = 1\)
- (right continuity) for every \(x\), we have \(\lim_{y \downarrow x} F(y) = F(x)\)
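These properties are easy to check numerically for a concrete choice of \(F\). A small sketch, taking \(F\) to be the Exponential(1) CDF, \(F(x) = 1 - e^{-x}\) for \(x \geq 0\) and \(0\) otherwise:

```python
import numpy as np

# F is the Exponential(1) CDF: F(x) = 1 - exp(-x) for x >= 0, else 0.
# -expm1(-x) computes 1 - exp(-x) stably; the maximum avoids overflow
# in the unused branch of np.where for large negative x.
def F(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, -np.expm1(-np.maximum(x, 0.0)))

xs = np.linspace(-5, 20, 1000)

# monotonicity: F never decreases along an increasing grid
assert np.all(np.diff(F(xs)) >= 0)

# limiting values: F(x) -> 0 as x -> -infinity, F(x) -> 1 as x -> +infinity
assert F(-1e9) == 0.0 and np.isclose(F(1e9), 1.0)

# right continuity at the kink x = 0: F(0 + eps) -> F(0) as eps -> 0+
assert abs(F(1e-12) - F(0.0)) < 1e-9
print("all three properties hold for this F")
```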
3. getting the probability law from a CDF
It turns out that any function \(F\) satisfying the above properties is the CDF of some random variable. In fact, if \(U\) is uniform on \((0,1)\), then for a suitable function \(g : (0,1) \rightarrow \mathbb{R}\), the random variable \(X = g(U)\) satisfies \(F_X = F\).
3.1. Theorem
Let \(F\) be a given distribution function. Consider the probability space \(([0,1], \mathcal{B}, \mathbb{P})\) where \(\mathcal{B}\) is the Borel \(\sigma\)-algebra on \([0,1]\) and \(\mathbb{P}\) is the Lebesgue measure (see notes on Lebesgue Measure). Then, there exists a measurable function \(X\), i.e. a random variable, such that \(F_X = F\).
3.1.1. proof
We give a version of the proof where we additionally assume that \(F\) is continuous and strictly increasing. (The more general proof can be found in the linked notes.) Then the range of \(F\) is \((0,1)\) and \(F\) is invertible there. We define the uniform random variable \(U(\omega) = \omega\) for all \(\omega\), and then our random variable by \(X(\omega) = F^{-1}(\omega)\) for every \(\omega \in (0,1)\). In other words, \(X(\omega) = F^{-1}(U(\omega))\). Intuitively, \(F^{-1}(x)\) says "this is the \(c\) such that \(F(c) = x\), i.e. \(\mathbb{P}(\{X \leq c\}) = x\)". This is exactly the idea behind inverse transform sampling.
Let's do a quick type check:
- \(U : [0,1] \rightarrow \mathbb{R}\)
- \(F : \mathbb{R} \rightarrow [0,1]\) (with inverse defined only on \((0,1)\))
- \(X = F^{-1}(U) : (0,1) \rightarrow \mathbb{R}\)
Note that \(F(F^{-1}(\omega)) = \omega\), so that \(F(X) = U\). Since \(F\) is strictly increasing, we have \(X(\omega) \leq x\) if and only if \(F(X(\omega)) \leq F(x)\), that is, \(U(\omega) \leq F(x)\). Incidentally, this also shows that \(X\) is a valid random variable: \(\{X \leq x\} = \{\omega : \omega \leq F(x)\}\) is an interval, hence measurable.
So, for every \(x \in \mathbb{R}\): \[ F_X(x) = \mathbb{P}(\{X \leq x\}) = \mathbb{P}(\{F(X) \leq F(x)\}) = \mathbb{P}(\{U \leq F(x)\}) = F(x) \] where the last equality holds because \(\mathbb{P}\) is the Lebesgue measure on \([0,1]\).
To recap, we now have a random variable \(X\) with a CDF \(F_X = F\).
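The proof is constructive, and it is exactly the recipe behind inverse transform sampling. A minimal sketch, again assuming the Exponential(1) CDF \(F(x) = 1 - e^{-x}\), whose inverse is \(F^{-1}(u) = -\log(1-u)\): draw \(U\) uniform on \((0,1)\), set \(X = F^{-1}(U)\), and check that the empirical CDF of the resulting samples matches \(F\).

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):
    return 1.0 - np.exp(-x)          # Exponential(1) CDF, for x >= 0

def F_inv(u):
    return -np.log1p(-u)             # F^{-1}(u) = -log(1 - u), computed stably

u = rng.uniform(size=200_000)        # U(omega) = omega, uniform on (0, 1)
x = F_inv(u)                         # X = F^{-1}(U)

# F_X should equal F: compare the empirical CDF of the samples with F.
for c in (0.5, 1.0, 2.0):
    print(f"c={c}: empirical P(X <= c) = {np.mean(x <= c):.4f}, F(c) = {F(c):.4f}")
```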
4. Corollary
It turns out there is a one-to-one correspondence between cumulative distribution functions and the probability laws of random variables. Equivalently, there is a one-to-one correspondence between distribution functions \(F\) and probability measures \(\mathbb{P}\) on \((\mathbb{R}, \mathcal{B})\).
4.1. proof
First, we show that this correspondence is surjective: every distribution function \(F\) arises from some probability measure. By the theorem above, for any CDF \(F\) we can find a r.v. \(X\) such that \(F_X = F\). This \(X\) induces a probability measure \(\mathbb{P}_X\) on \((\mathbb{R}, \mathcal{B})\) via \(\mathbb{P}_X(B) = \mathbb{P}(X \in B)\), and from this measure we can recover the CDF by defining \(F(c) = \mathbb{P}_X((-\infty, c])\).
Now, we show that the correspondence is injective: distinct probability measures \(\mathbb{P}_X\) and \(\mathbb{P}_{X'}\) necessarily yield distinct CDFs. Indeed, if two measures \(\mathbb{P}_X\) and \(\mathbb{P}_{X'}\) coincide on all intervals \((-\infty, c]\), then they are equal on all other Borel sets as well, because the collection of intervals \((-\infty, c]\) is a generating \(\pi\)-system for \(\mathcal{B}\) (see Carathéodory's Extension Theorem).
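To see the correspondence concretely: once we know \(F\), we know the measure of every interval, since \(\mathbb{P}_X((a, b]) = F(b) - F(a)\); and conversely, \(F\) itself is just the measure of half-lines. A small sketch with the standard normal law (an arbitrary choice):

```python
from scipy import stats

F = stats.norm.cdf                    # the CDF of the standard normal law

a, b = -1.0, 2.0
# the measure of the interval (a, b] is recovered from F alone
print("P_X((a, b]) =", F(b) - F(a))
# and F itself is the measure of a half-line: F(c) = P_X((-inf, c])
print("F(b) = P_X((-inf, b]) =", F(b))
```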