Difference between revisions of "Lagrange function"
m (Typo) |
m (Typos) |
||
Line 18: | Line 18: | ||
Then there exists $\lambda_0^\star, \lambda_1^\star, \ldots, \lambda_m^\star \in \mathbb R$ such that, if $F$ denotes the function | Then there exists $\lambda_0^\star, \lambda_1^\star, \ldots, \lambda_m^\star \in \mathbb R$ such that, if $F$ denotes the function | ||
\[ | \[ | ||
− | F(\lambda, x) = \lambda_0 f(x) + \sum_{i=1}^m \lambda_i (g_i (x) | + | F(\lambda, x) = \lambda_0 f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i (x))\, , |
\] | \] | ||
then | then | ||
Line 29: | Line 29: | ||
\] | \] | ||
− | $F$ is called Lagrange function and the numbers $\lambda_0, \lambda_1, \ldots , \lambda_m$ | + | $F$ is called Lagrange function and the numbers $\lambda_0, \lambda_1, \ldots , \lambda_m$ Lagrange multipliers. The statement takes a more elegant form under the assumption that $b = (b_1, \ldots , b_m)$ is a [[Regular value|regular value]] of the map $g= (g_1, \ldots , g_m)$. More precisely |
− | the assumption that $b = (b_1, \ldots , b_m)$ is a [[Regular value|regular value]] of the map $g= (g_1, \ldots , g_m)$. More precisely | ||
'''Theorem 2''' | '''Theorem 2''' | ||
Line 40: | Line 39: | ||
In fact the statement of Theorem 2 is more common than that of Theorem 1 and it is typically the slightly less general version of \eqref{e:Lagrange_function} to | In fact the statement of Theorem 2 is more common than that of Theorem 1 and it is typically the slightly less general version of \eqref{e:Lagrange_function} to | ||
− | which the name "Lagrange function" refers to. Observe also that Theorem 2 gives a system of $m+n$ equations in the $m+n$ unknowns $x^\star_1, \ldots , x^\star_n, \lambda^\star_1 , \ldots , \lambda^\ | + | which the name "Lagrange function" refers to. Observe also that Theorem 2 gives a system of $m+n$ equations in the $m+n$ unknowns $x^\star_1, \ldots , x^\star_n, \lambda^\star_1 , \ldots , \lambda^\star_m$. |
− | For most practical problems Theorem 2 is enough. However it is not difficult to give examples where Theorem 2 does not apply and one has to resort to Theorem 1 | + | For most practical problems Theorem 2 is enough. However it is not difficult to give examples where Theorem 2 does not apply and one has to resort to Theorem 1 (see {{Cite|Sm}} or {{Cite|Bl}}). It is possible to give an interpretation of the Lagrange multipliers $\lambda_i^\star$ that has a definite physical meaning and which can be succesfully generalized to constraints in the form of inequalities (see [[Lagrange multipliers|Lagrange multipliers]]). |
− | (see {{Cite|Sm}} or {{Cite|Bl}}) It is possible to give an interpretation of the Lagrange multipliers $\lambda_i^\star$ that has a definite physical meaning and which can be succesfully generalized to constraints in the form of inequalities (see [[Lagrange multipliers|Lagrange multipliers]]). | ||
In the case when the function $f$ to be optimized is quadratic and the constraints $g_i$ are linear, the resulting system of necessary conditions turns out to be linear. Such system is otherwise nonlinear and in general one has to resort to appropriate computational methods to approximate the solutions (such as the [[Newton method|Newton method]]). The main issue, apart from the computational difficulties of solving a system of non-linear equations, is the problem of obtaining all solutions satisfying the necessary conditions. The fact that no computational process ensures that one obtains all solutions of the system is one of the main restrictions on the practical applicability of the method of Lagrange multipliers. | In the case when the function $f$ to be optimized is quadratic and the constraints $g_i$ are linear, the resulting system of necessary conditions turns out to be linear. Such system is otherwise nonlinear and in general one has to resort to appropriate computational methods to approximate the solutions (such as the [[Newton method|Newton method]]). The main issue, apart from the computational difficulties of solving a system of non-linear equations, is the problem of obtaining all solutions satisfying the necessary conditions. The fact that no computational process ensures that one obtains all solutions of the system is one of the main restrictions on the practical applicability of the method of Lagrange multipliers. | ||
Line 81: | Line 79: | ||
F (x, \lambda^\star) \leq F (x^\star, \lambda^\star) \leq F (x^\star, \lambda)\, | F (x, \lambda^\star) \leq F (x^\star, \lambda^\star) \leq F (x^\star, \lambda)\, | ||
\] | \] | ||
− | for all $x$ and $\lambda$ in a neighborhood, respectively, of $x^\star$ and $\lambda^\star$. In the case when $f$ is concave | + | for all $x$ and $\lambda$ in a neighborhood, respectively, of $x^\star$ and $\lambda^\star$. In the case when $f$ is concave, $g_i$ is convex when $\lambda^\star_i>0$ and concave when $\lambda^\star_i <0$, then such conditions turn out to be sufficient. |
− | The general case considered above can be easily reduced, upon a transformation of the dependent and independent variables, into the problem of finding the maximum of a function $f$ under the restriction that $x_i \geq 0$ for all $i$, $g_j (x) \geq 0$ for all $1\leq j \leq \bar{m}$ and $g_i (x) =0$ for all $\bar{m}+1 \leq i \leq | + | The general case considered above can be easily reduced, upon a transformation of the dependent and independent variables, into the problem of finding the maximum of a function $f$ under the restriction that $x_i \geq 0$ for all $0\leq i \leq n$, $g_j (x) \geq 0$ for all $1\leq j \leq \bar{m}$ and $g_i (x) =0$ for all $\bar{m}+1 \leq i \leq m$. Obviously in this case the conditions \eqref{e:Lagr_gen} take a simpler form. |
====Calculus of varations==== | ====Calculus of varations==== |
Latest revision as of 23:04, 27 June 2014
2010 Mathematics Subject Classification: Primary: 49-XX [MSN][ZBL]
A function, related to the method of Lagrange multipliers, that is used to derive necessary conditions for conditional extrema of functions of several variables or, in a wider setting, of functionals. The primary example is that of locating the local maxima (resp. minima) of a function $g$ of $n$ variables $(x_1, \ldots , x_n)=:x$ over the set $\Sigma$ of points $x$ which satisfy the constraints $g_1 (x) = \ldots = g_m (x) = 0$. The method of Lagrange multipliers is particularly useful because the corresponding conditions can often be solved without resorting to explcit formulae describing the set of points in $\Sigma$ in terms of $n-m$ independent variables. In fact the necessary conditions obtained by means of a Lagrange function form often a closed system of relations, i.e. a system of $n+m$ equations in $n+m$ variables. Lagrange functions are used in both theoretical questions of linear and non-linear programming as in applied problems where they provide often explicit computational methods.
Contents
The method of Lagrange multipliers
A rather general statement in the finite-dimensional setting is the following
Theorem 1 Assume $f, g_1, \ldots, g_m$ are $C^1$ function of $n$ variables and let $x^\star$ be a local maximum (resp. minimum) point of $f$ subject to the constraint $g_1, \ldots, g_m$: namely there is a neighborhood $U$ of $x^\star$ such that \[ f (x^\star) \geq f(y) \quad \big(\mbox{resp. } f (x^\star) \leq f(y)\big)\qquad \mbox{for all } y \mbox{ such that } g_i (x^\star)= g_i (y) = b_i \qquad \forall i\in \{1, \ldots , m\}\, . \] Then there exists $\lambda_0^\star, \lambda_1^\star, \ldots, \lambda_m^\star \in \mathbb R$ such that, if $F$ denotes the function \[ F(\lambda, x) = \lambda_0 f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i (x))\, , \] then \[ \frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) = 0 \qquad \forall j\in \{1, \ldots , n\} \] and \[ \frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) = 0 \qquad \forall i\in \{1, \ldots, m\}\, . \]
$F$ is called Lagrange function and the numbers $\lambda_0, \lambda_1, \ldots , \lambda_m$ Lagrange multipliers. The statement takes a more elegant form under the assumption that $b = (b_1, \ldots , b_m)$ is a regular value of the map $g= (g_1, \ldots , g_m)$. More precisely
Theorem 2 Let $f$, $g= (g_1, \ldots , g_m)$ and $x^\star$ be as in Theorem 1 and assume in addition that $m<n$ and that the differential of $g$ at $x^\star$ is surjective. Then there exists $\lambda^\star = (\lambda_1^\star, \ldots, \lambda_m^\star) \in \mathbb R^m$ such that $(x^\star, \lambda^\star)$ is a critical point of the function \begin{equation}\label{e:Lagrange_function} F(\lambda, x) = f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i (x))\, . \end{equation}
In fact the statement of Theorem 2 is more common than that of Theorem 1 and it is typically the slightly less general version of \eqref{e:Lagrange_function} to which the name "Lagrange function" refers to. Observe also that Theorem 2 gives a system of $m+n$ equations in the $m+n$ unknowns $x^\star_1, \ldots , x^\star_n, \lambda^\star_1 , \ldots , \lambda^\star_m$.
For most practical problems Theorem 2 is enough. However it is not difficult to give examples where Theorem 2 does not apply and one has to resort to Theorem 1 (see [Sm] or [Bl]). It is possible to give an interpretation of the Lagrange multipliers $\lambda_i^\star$ that has a definite physical meaning and which can be succesfully generalized to constraints in the form of inequalities (see Lagrange multipliers).
In the case when the function $f$ to be optimized is quadratic and the constraints $g_i$ are linear, the resulting system of necessary conditions turns out to be linear. Such system is otherwise nonlinear and in general one has to resort to appropriate computational methods to approximate the solutions (such as the Newton method). The main issue, apart from the computational difficulties of solving a system of non-linear equations, is the problem of obtaining all solutions satisfying the necessary conditions. The fact that no computational process ensures that one obtains all solutions of the system is one of the main restrictions on the practical applicability of the method of Lagrange multipliers.
Constraints in the form of inequalities
As already mentioned, the method can be extended to minimization problems in which some of the constraints appear as inequalities. Consider in particular the following setting. $x^\star$ is a local extremum for the function $f: \mathbb R^n \to \mathbb R$ among the points $x$ which satisfy the following constraints: \begin{equation}\label{e:gen_const} \left\{ \begin{array}{ll} g_i (x) \leq b_i \qquad\qquad & \mbox{ for } 1 \leq i \leq m_1\\ g_i (x) \geq b_i \qquad\qquad & \mbox{ for } m_1+1 \leq i \leq m_2\\ g_i (x) = b_i \qquad\qquad & \mbox{ for } m_2+1 \leq i \leq m\\ x_j \geq 0 \qquad\qquad &\mbox{ for } 1 \leq j \leq n\, . \end{array}\right. \end{equation} Specifically, let us assume that $x^\star$ is a local maximum, that the functions $f$ and $g_i$ are $C^1$, that $m < n$ and that $b$ is a regular value for the map $g$ (or, more generally, that the differential of $g$ is surjective at the point $x^\star$). If we consider the Lagrange function as in \eqref{e:Lagrange_function} and denote by $J$ the set of indices $j$ for which $x^\star_j > 0$ and by $I$ the set of indices for which the inequalities in \eqref{e:gen_const} are strict, then there is a vector $\lambda^\star \in \mathbb R^m$ which satisfies the following set of conditions \begin{equation}\label{e:Lagr_gen} \left\{ \begin{array}{ll} \lambda^\star_i \geq 0 \qquad\qquad&\mbox{ for } 1 \leq i \leq m_1\\ \lambda^\star_i \leq 0 \qquad\qquad &\mbox{ for } m_1+1 \leq i \leq m_2\\ \lambda^\star_i = 0 \qquad\qquad &\mbox{ for } i \in I\\ \frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) \geq 0 \qquad\qquad &\mbox{ for } 1\leq i \leq m_1\\ \frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) \leq 0 \qquad\qquad &\mbox{ for } m_1+1 \leq i \leq m_2\\ \frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) = 0 \qquad\qquad &\mbox{ for } m_2 +1 \leq i \leq m\\ \frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) = 0 \qquad\qquad &\mbox{ for } j\in J\\ \frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) \leq 0 \qquad\qquad &\mbox{ for } j\not\in J\, . \end{array} \right. \end{equation} Obviously these conditions generalize the ones of Theorem 2. Using the concept of a saddle point of the function $F$, one can interpret these conditions above as necessary conditions to ensure that \[ F (x, \lambda^\star) \leq F (x^\star, \lambda^\star) \leq F (x^\star, \lambda)\, \] for all $x$ and $\lambda$ in a neighborhood, respectively, of $x^\star$ and $\lambda^\star$. In the case when $f$ is concave, $g_i$ is convex when $\lambda^\star_i>0$ and concave when $\lambda^\star_i <0$, then such conditions turn out to be sufficient.
The general case considered above can be easily reduced, upon a transformation of the dependent and independent variables, into the problem of finding the maximum of a function $f$ under the restriction that $x_i \geq 0$ for all $0\leq i \leq n$, $g_j (x) \geq 0$ for all $1\leq j \leq \bar{m}$ and $g_i (x) =0$ for all $\bar{m}+1 \leq i \leq m$. Obviously in this case the conditions \eqref{e:Lagr_gen} take a simpler form.
Calculus of varations
An analogue of the Lagrange function is used in the calculus of variations in considering problems on a conditional extremum of functionals. Here also it is convenient to write the necessary conditions for optimality in a problem on a conditional extremum as necessary conditions for some functional (the analogue of the Lagrange function) constructed by means of Lagrange multipliers. Suppose, for example, that one is given the Bolza problem: Find a minimum of the functional \[ J (x) = \int_{t_1}^{t_2} f (t, x(t), \dot{x} (t))\, dt + g (t_1, x (t_1), t_2, x(t_2))\, \] in the presence of differential constraints of type \[ \phi (t, x(t), \dot{x} (t)) =0 \qquad \qquad \forall t\in [t_1, t_2] \] and boundary conditions \[ \psi (t_1, g(t_1), t_2, g(t_2))\, , \] where the unknown $x$ is a curve $x: [t_1, t_2]\to \mathbb R^n$ and the given functions $f$, $\phi$ and $\psi$ are $f: [t_1, t_2]\times \mathbb R^n \times \mathbb R^n \to \mathbb R$, $\phi: [t_1, t_2] \times \mathbb R^n \times \mathbb R^n \to \mathbb R^m$, for some $m< n$, and $\psi: [t_1, t_2]\times \mathbb R^n \times\mathbb R^n \to \mathbb R^p$ for some $p\leq 2n+2$.
Under appropriate assumptions a necessary condition for a minimizer is the existence of Lagrange multipliers $\lambda_1, \ldots, \lambda_m : [t_1, t_2]\to \mathbb R$ and $e_1, \ldots, e_p\in \mathbb R$ such that the triple $(x, \lambda, e)$ is a critical point of the functional \[ I (x, \lambda, e) := \int_{t_1}^{t_2} \left( f (t, x(t), \dot{x} (t)) + \sum_i \lambda_i (t) \phi (t, x(t), \dot{x} (t)\right)\, dt + g (t_1, x(t_1), t_2, x(t_2)) + \sum_\mu e_\mu \psi_\mu (t_1, g(t_1), t_2, g (t_2))\, . \] These necessary conditions, which form a closed system of relations for determining an optimal solution and the corresponding Lagrange multipliers, are written in specific formulations as the Euler equation, the Weierstrass conditions (for a variational extremum) and the transversality condition.
Comments
Since the early 1960's, much effort has gone into the development of a generalized kind of differentiation for extended-real-valued functions on real vector spaces that can serve well in the analysis of problems of optimization. In such problems, which range from models in economics and operations research to variational principles that correspond to partial differential equations, it is often necessary to treat functions that are not differentiable in any traditional two-sided sense and may even have infinite values at some points. See [Ro2].
References
[Ak] | N.I. Akhiezer, "The calculus of variations" , Blaisdell (1962) |
[Av] | M. Avriel, "Nonlinear programming: analysis and methods" , Prentice-Hall (1976) |
[Bl] | G.A. Bliss, "Lectures on the calculus of variations" , Chicago Univ. Press (1947) |
[Ce] | L. Cesari, "Optimization - Theory and applications" , Springer (1983) |
[Ha] | G.F. Hadley, "Nonlinear and dynamic programming" , Addison-Wesley (1964) |
[KT] | H.W. Kuhn, A.W. Tucker, "Nonlinear programming" , Proc. 2nd Berkeley Symp. Math. Stat. Probab. (1950) , Univ. California Press (1951) |
[Ro] | R.T. Rockafellar, "Convex analysis" , Princeton Univ. Press |
[Ro2] | R.T. Rockafellar, "The theory of subgradients and its applications to problems of optimization. Convex and nonconvex functions" , Heldermann (1981) |
[Sm] | V.I. Smirnov, "A course of higher mathematics" , 1 , Addison-Wesley (1964) |
Lagrange function. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Lagrange_function&oldid=32317