
lagrange multipliers

1. setting

Let's say we are in the 2D case, and we want to find \((x,y)\) that maximizes \(f(x,y)\) subject to the constraint \(g(x,y) = 0\).

2. solution

  • consider the Lagrangian:

\[ \mathcal{L} = f(x,y) - \lambda g(x,y) \]

  • Find where \(\mathcal{L}\) is stationary
  • That is, solve the system of equations:

\[\begin{align*} \nabla f(x,y) &= \lambda \nabla g(x,y) \\ g(x,y) &= 0 \end{align*}\]
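
For concreteness, a small worked example (standard, not from the original notes): maximize \(f(x,y) = x + y\) subject to \(g(x,y) = x^2 + y^2 - 1 = 0\). The system becomes

\[\begin{align*} 1 &= 2\lambda x \\ 1 &= 2\lambda y \\ x^2 + y^2 &= 1 \end{align*}\]

so \(x = y = \frac{1}{2\lambda}\), and the constraint forces \(x = y = \pm\frac{1}{\sqrt{2}}\). The maximum of \(f\) is at \(\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)\), with \(\lambda = \frac{1}{\sqrt{2}}\).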

3. Intuition 1

As you move along the constraint curve \(g(x,y) = 0\), how do you know you are at a local extreme point? Necessarily, \(f(x,y)\) should not increase or decrease locally at that point. This happens when moving along the constraint curve moves you along a contour of \(f(x,y)\). (Remember that \(f\) takes the same value at every point along a contour.) The gradient \(\nabla f(x,y)\) is perpendicular to the contour: if you want the greatest increase, you should not move along the curve that keeps you at the same value.
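
To make the perpendicularity claim precise (a standard one-line derivation, added here for completeness): if \(r(t)\) parametrizes a contour of \(f\), then \(f(r(t))\) is constant in \(t\), so the chain rule gives

\[ \frac{d}{dt} f(r(t)) = \nabla f(r(t)) \cdot r'(t) = 0 \]

i.e. \(\nabla f\) is perpendicular to every direction \(r'(t)\) along the contour.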

3.1. technical details

The gradient \(\nabla g(x,y)\) is perpendicular to the constraint curve, because \(g\) is constant along it: to stay on the constraint curve we must move in directions where \(g\) does not change, i.e. perpendicular to \(\nabla g(x,y)\). In the 2D case, the direction along the constraint curve is parallel to the direction along the contour if and only if the two gradients are parallel.

In the higher-dimensional case, the set of all vectors perpendicular to \(\nabla g\) is the "admissible set", and the set of all vectors perpendicular to \(\nabla f\) is the set of directions along the contour. These two sets are equal if and only if the gradients are parallel. Why?

  • First, remember that in \(n\) dimensions, the space of vectors perpendicular to a particular vector \(v\) is \((n-1)\)-dimensional.
    • Why? You can construct a basis for that space using Gram-Schmidt.
  • So, we just need to remember that two vectors \(w, v\) are perpendicular to the same \((n-1)\)-dimensional space if and only if they are parallel (see the numerical sketch below).
    • Why? (If) You can't have more than \(n\) linearly independent vectors in \(n\) dimensions. If \(w\) and \(v\) were not parallel but were both perpendicular to the same \((n-1)\)-dimensional space, you would have \(n+1\) linearly independent vectors in total. (Only if) Scaling a vector doesn't change what is perpendicular to it.
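Here is a small numerical sketch of that argument (my own illustration; the setup and names are not from the original notes). It builds the \((n-1)\)-dimensional space perpendicular to a vector \(v\) via the SVD (playing the role of Gram-Schmidt), then checks that a vector perpendicular to that whole space is parallel to \(v\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
v = rng.normal(size=n)

# Orthonormal basis for the (n-1)-dimensional space perpendicular to v.
# The SVD of v (as a 1 x n matrix) puts v/|v| in the first row of Vt and
# an orthonormal basis for its orthogonal complement in the remaining rows.
_, _, Vt = np.linalg.svd(v.reshape(1, -1))
B = Vt[1:]

# Any w perpendicular to all of B lies in the null space of B, which is
# 1-dimensional: the span of v. So w must be parallel to v.
_, _, Vt_B = np.linalg.svd(B)
w = Vt_B[-1]  # spans the null space of B

cos = abs(v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos)  # ~1.0, i.e. w is parallel to v
```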

4. Intuition 2

What does the Lagrange multiplier \(\lambda\) mean? Let's look at our augmented objective: \[ \mathcal{L}(x,y,\lambda) = f(x,y) - \lambda g(x,y) \] Claim: Necessarily, the point \((x,y)\) that maximizes \(f\) while respecting constraint \(g\) can be found among the stationary points of \(\mathcal{L}\).

Reasoning: \(\lambda\) punishes us for not respecting \(g(x,y) = 0\). If \(g(x,y) \neq 0\), then we can push \(\lambda \rightarrow -\infty\) or \(\lambda \rightarrow +\infty\) (depending on the sign of \(g(x,y)\)), and \(\mathcal{L}\) will go to \(\infty\). This is not the type of maximum we are looking for. So, we should find places where \(\mathcal{L}\) is stationary w.r.t. \(\lambda\), i.e. where \(\partial \mathcal{L} / \partial \lambda = -g(x,y) = 0\). By adding the \(\lambda\) multiplier, we ensure that \(\mathcal{L}\) is never at a stationary point unless \(g(x,y) = 0\).
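
A quick symbolic check of this claim (a sketch of my own, reusing the circle example from section 2): every stationary point of \(\mathcal{L}\) satisfies the constraint.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1
L = f - lam * g

# Stationarity in all three variables; dL/dlam = 0 recovers g = 0.
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in stationary:
    print(s, 'g =', g.subs(s))  # g = 0 at each stationary point
```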

5. other settings

Often, the method of Lagrange multipliers is used in a slightly different way, for example in the proof of the Neyman-Pearson Lemma, where the following theorem is used.

6. theorem

For a fixed non-negative \(\lambda\), consider the objective: \[ \mathcal{L}(x, \lambda) = f(x) - \lambda g(x) \] Then, let \(x_0\) be a value that maximizes \(\mathcal{L}\). It turns out that \(x_0\) maximizes \(f(x)\) over all \(x\) that satisfy \(g(x) \leq g(x_0)\).

6.1. proof

By assumption, \(x_0\) maximizes \(\mathcal{L}\), so for any \(x\): \[ f(x) - \lambda g(x) \leq f(x_0) - \lambda g(x_0) \] Rearranging, \[ f(x_0) - f(x) \geq \lambda g(x_0) - \lambda g(x) = \lambda \left( g(x_0) - g(x) \right) \]

Now, define the set \(S = \{x \mid g(x) \leq g(x_0)\}\). For each \(x \in S\) we have \(g(x_0) - g(x) \geq 0\), and since \(\lambda \geq 0\), the right-hand side above is non-negative. So, for each \(x \in S\): \[ f(x_0) - f(x) \geq \lambda \left( g(x_0) - g(x) \right) \geq 0 \Rightarrow f(x_0) \geq f(x) \]

6.2. corollary

Let \(x_{0,\lambda}\) be the maximizing \(x\) for a given \(\lambda\). Then, if we find some \(\lambda^*\) such that \(C = g(x_{0,\lambda^*})\), then \(x_{0,\lambda^*}\) maximizes \(f(x)\) for all \(g(x) \leq C\).

This is often how the Lagrange method is used in proofs: we have some \(C\) in mind, and then we maximize \(\mathcal{L}\), with \(\lambda\) chosen so that \(C = g(x_{0,\lambda})\).
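
For example (an illustration of my own, not from the original notes): take \(f(x) = \log x\) and \(g(x) = x\) with budget \(C > 0\). For fixed \(\lambda > 0\), \(\mathcal{L}(x, \lambda) = \log x - \lambda x\) is maximized at \(x_{0,\lambda} = 1/\lambda\). Choosing \(\lambda^* = 1/C\) gives \(g(x_{0,\lambda^*}) = C\), so by the corollary \(x = C\) maximizes \(\log x\) over all \(x \leq C\) (which checks out directly, since \(\log\) is increasing).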

6.3. Note

What if we're dealing with an equality constraint? Note that the theorem above also implies that \(x_0\) maximizes \(f(x)\) over all \(x\) with \(g(x) = g(x_0)\), since that set is contained in \(\{x \mid g(x) \leq g(x_0)\}\).


Created: 2025-11-02 Sun 18:48