
lagrange multipliers

1. setting

Let's say we are in the 2D case, and we want to find \((x,y)\) that maximizes \(f(x,y)\) subject to the constraint \(g(x,y) = 0\).

2. solution

  • consider the Lagrangian:

\[ \mathcal{L} = f(x,y) - \lambda g(x,y) \]

  • Find where \(\mathcal{L}\) is stationary
  • That is, solve the system of equations:

\[\begin{align*} \nabla f(x,y) &= \lambda \nabla g(x,y) \\ g(x,y) &= 0 \end{align*}\]
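
For concreteness, a small worked example (standard, not from the original notes): maximize \(f(x,y) = x + y\) subject to \(g(x,y) = x^2 + y^2 - 1 = 0\). The system becomes

\[\begin{align*} 1 &= 2\lambda x \\ 1 &= 2\lambda y \\ x^2 + y^2 &= 1 \end{align*}\]

so \(x = y = \frac{1}{2\lambda}\), and the constraint forces \(x = y = \pm\frac{1}{\sqrt{2}}\). The maximum of \(f\) is at \(\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)\), with \(\lambda = \frac{1}{\sqrt{2}}\).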

3. Intuition 1

As you move along the constraint curve \(g(x,y) = 0\), how do you know you are at a local extreme point? Necessarily, \(f(x,y)\) should not increase or decrease locally at that point. This happens when moving along the constraint curve moves you along a contour of \(f(x,y)\). (Remember that \(f\) takes the same value at every point along a contour.) The gradient \(\nabla f(x,y)\) is perpendicular to the contour: if you want the greatest increase, you should not move along the curve that keeps you at the same value.
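
To make the perpendicularity claim precise (a standard one-line derivation, added here for completeness): if \(r(t)\) parametrizes a contour of \(f\), then \(f(r(t))\) is constant in \(t\), so the chain rule gives

\[ \frac{d}{dt} f(r(t)) = \nabla f(r(t)) \cdot r'(t) = 0 \]

i.e. \(\nabla f\) is perpendicular to every direction \(r'(t)\) along the contour.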

3.1. technical details

The gradient \(\nabla g(x,y)\) is perpendicular to the constraint curve, because \(g\) is constant along it: to stay on the constraint curve we must move in directions where \(g\) does not change, i.e. perpendicular to \(\nabla g(x,y)\). In the 2D case, the direction along the constraint curve is parallel to the direction along the contour if and only if the two gradients are parallel.

In the higher-dimensional case, the set of all vectors perpendicular to \(\nabla g\) is the "admissible set", and the set of all vectors perpendicular to \(\nabla f\) is the set of directions along the contour. These two sets are equal if and only if the gradients are parallel. Why?

  • First, remember that in \(n\) dimensions, the space of vectors perpendicular to a particular vector \(v\) is \((n-1)\)-dimensional.
    • Why? You can construct a basis for that space using Gram-Schmidt.
  • So, we just need to remember that two vectors \(w, v\) are perpendicular to the same \((n-1)\)-dimensional space if and only if they are parallel (see the numerical sketch below).
    • Why? (If) You can't have more than \(n\) linearly independent vectors in \(n\) dimensions. If \(w\) and \(v\) were not parallel but were both perpendicular to the same \((n-1)\)-dimensional space, you would have \(n+1\) linearly independent vectors in total. (Only if) Scaling a vector doesn't change what is perpendicular to it.
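Here is a small numerical sketch of that argument (my own illustration; the setup and names are not from the original notes). It builds the \((n-1)\)-dimensional space perpendicular to a vector \(v\) via the SVD (playing the role of Gram-Schmidt), then checks that a vector perpendicular to that whole space is parallel to \(v\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
v = rng.normal(size=n)

# Orthonormal basis for the (n-1)-dimensional space perpendicular to v.
# The SVD of v (as a 1 x n matrix) puts v/|v| in the first row of Vt and
# an orthonormal basis for its orthogonal complement in the remaining rows.
_, _, Vt = np.linalg.svd(v.reshape(1, -1))
B = Vt[1:]

# Any w perpendicular to all of B lies in the null space of B, which is
# 1-dimensional: the span of v. So w must be parallel to v.
_, _, Vt_B = np.linalg.svd(B)
w = Vt_B[-1]  # spans the null space of B

cos = abs(v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos)  # ~1.0, i.e. w is parallel to v
```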

4. Intuition 2

What does the Lagrange multiplier \(\lambda\) mean? Let's look at our augmented objective: \[ \mathcal{L}(x,y,\lambda) = f(x,y) - \lambda g(x,y) \] Claim: Necessarily, the point \((x,y)\) that maximizes \(f\) while respecting constraint \(g\) can be found among the stationary points of \(\mathcal{L}\).

Reasoning: \(\lambda\) punishes us for not respecting \(g(x,y) = 0\). If \(g(x,y) \neq 0\), then we can push \(\lambda \rightarrow -\infty\) or \(\lambda \rightarrow +\infty\) (depending on the sign of \(g(x,y)\)), and \(\mathcal{L}\) will go to \(\infty\). This is not the type of maximum we are looking for. So, we should find places where \(\mathcal{L}\) is stationary w.r.t. \(\lambda\), i.e. where \(\partial \mathcal{L} / \partial \lambda = -g(x,y) = 0\). By adding the \(\lambda\) multiplier, we ensure that \(\mathcal{L}\) is never at a stationary point unless \(g(x,y) = 0\).
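
A quick symbolic check of this claim (a sketch of my own, reusing the circle example from section 2): every stationary point of \(\mathcal{L}\) satisfies the constraint.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1
L = f - lam * g

# Stationarity in all three variables; dL/dlam = 0 recovers g = 0.
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in stationary:
    print(s, 'g =', g.subs(s))  # g = 0 at each stationary point
```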

5. other settings

Often, the method of Lagrange multipliers is used in a slightly different way, for example in the proof of the Neyman-Pearson Lemma, where the following theorem is used.

6. theorem

For a fixed non-negative \(\lambda\), consider the objective: \[ \mathcal{L}(x, \lambda) = f(x) - \lambda g(x) \] Then, let \(x_0\) be a value that maximizes \(\mathcal{L}\). It turns out that \(x_0\) maximizes \(f(x)\) over all \(x\) that satisfy \(g(x) \leq g(x_0)\).

6.1. proof

By assumption, \(x_0\) maximizes \(\mathcal{L}\), so for any \(x\): \[ f(x) - \lambda g(x) \leq f(x_0) - \lambda g(x_0) \] Rearranging, \[ f(x_0) - f(x) \geq \lambda g(x_0) - \lambda g(x) = \lambda \left( g(x_0) - g(x) \right) \]

Now, define the set \(S = \{x \mid g(x) \leq g(x_0)\}\). For each \(x \in S\) we have \(g(x_0) - g(x) \geq 0\), and since \(\lambda \geq 0\), the right-hand side above is non-negative. So, for each \(x \in S\): \[ f(x_0) - f(x) \geq \lambda \left( g(x_0) - g(x) \right) \geq 0 \Rightarrow f(x_0) \geq f(x) \]

6.2. corollary

Let \(x_{0,\lambda}\) be the maximizing \(x\) for a given \(\lambda\). Then, if we find some \(\lambda^*\) such that \(C = g(x_{0,\lambda^*})\), then \(x_{0,\lambda^*}\) maximizes \(f(x)\) for all \(g(x) \leq C\).

This is often how the Lagrange method is used in proofs: we have some \(C\) in mind, and then we maximize \(\mathcal{L}\), with \(\lambda\) chosen so that \(C = g(x_{0,\lambda})\).
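
For example (an illustration of my own, not from the original notes): take \(f(x) = \log x\) and \(g(x) = x\) with budget \(C > 0\). For fixed \(\lambda > 0\), \(\mathcal{L}(x, \lambda) = \log x - \lambda x\) is maximized at \(x_{0,\lambda} = 1/\lambda\). Choosing \(\lambda^* = 1/C\) gives \(g(x_{0,\lambda^*}) = C\), so by the corollary \(x = C\) maximizes \(\log x\) over all \(x \leq C\) (which checks out directly, since \(\log\) is increasing).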

6.3. Note

What if we're dealing with an equality constraint? Note that the theorem above also implies that \(x_0\) maximizes \(f(x)\) over all \(x\) with \(g(x) = g(x_0)\), since that set is contained in \(\{x \mid g(x) \leq g(x_0)\}\).


Created: 2025-11-02 Sun 18:48