Subsections


1.5.2 Brent's Method (Golden Section Search in one dimension)

Recall how the bisection method finds roots of functions in one dimension. The root is supposed to have been bracketed in an interval $ (a,b),$ where $ f(a)$ and $ f(b)$ have different signs. One then evaluates the function at an intermediate point x and obtains a new, smaller bracketing interval, either $ (a,x)$ or $ (x,b)$. The process continues until the bracketing interval is acceptably small. It is optional to choose $ x$ to be the midpoint of $ (a,b)$ so that the decrease in the interval length is maximized, when the function is as uncooperative as it can be, i.e., when the (bad) luck of the draw forces you to take the bigger bisected segment.

A similar procedure with a simple alteration can be applied to the problem of minimization. What does it mean to bracket a minimum? A root of a function is known to be bracketed by a pair of points, $ a<b$, when the function has opposite sign at those two points. A minimum, by contrast, is known to be bracketed only when there is a triplet of points, $ a<b<c$, such that $ f(b)$ is less than or equal to both $ f(a)$ and $ f(c)$. In this case we know that the function (if it is nonsingular) has a minimum in the interval $ (a,c)$.

The analog of bisection is to choose a new point $ x$, either between $ a$ and $ b$ or between $ b$ and $ c$. Suppose, to be specific, that we make the latter choice. Then we evaluate $ f(x)$. If $ f(b)<f(x)$, then the new bracketing triplet of points is $ a<b<x$; otherwise, if $ f(b)>f(x)$, then the new bracketing triplet is $ b<x<c$. In all cases the middle point of the new triplet is the abscissa whose ordinate is the best minimum achieved so far; see Figure 1.6 . We continue the process of bracketing until the distance between the two outer points of the triplet is sufficiently small.

\includegraphics[width=375pt]{eps/Prob15.ps}
Figure 1.6: Successive bracketing of a minimum. The minimum is originally, bracketed by points $ a_1$, $ b_1$ and $ c_1$. The function is evaluated in the first step at $ x_1$, which becomes $ c_2$, since we rename the bracketing points for the next step to $ a_2$, $ b_2$ and $ c_2$. The second evaluation gives us $ x_2$, which becomes $ c_3$. In the third step the function is evaluated at $ x_3$. This yields a new minimal value, so $ x_3$ becomes $ b_4$, and $ b_3$ becomes $ a_4$. After these three shown steps, the minimum is bracketed by points $ a_4$, $ b_4$ and $ c_4$.

How small is ''sufficiently'' small? For a minimum located at a value $ b$, you might naively think that you will be able to bracket it in as small a range as $ (1-\epsilon)b<b<(1+\epsilon)b$, where $ \epsilon$ is your computer's floating point epsilon, a number like $ 3\times10^{-8}$ (single precision) or $ 10^{-15}$ (double precision).1.4

Not so! In general, the shape of your function $ f(x)$ near $ b$ will be given by Taylor's theorem

$\displaystyle f(x)\approx f(b)+f^{\prime}(b)(x-b)+\frac{1}{2}f^{\prime\prime}(b)(x-b)^{2}$ (1.5)

The second term will be 0, since when the function has a minimum at $ b,$ $ f^{\prime}(b)$ will be zero, and the last term will be negligible compared to the first (that is, will be a factor $ \epsilon$ smaller and will act just like zero when added to it) whenever

$\displaystyle \frac{1}{2}f^{\prime\prime}(b)(x-b)^{2}$ $\displaystyle <\varepsilon\cdot f(b)$    
or $\displaystyle (x-b) ^{2}$ $\displaystyle <2\varepsilon\frac{f(b)}{ f^{\prime\prime}(b) }$    

$\displaystyle \frac{\vert x-b\vert}{b}<\sqrt{\epsilon}\sqrt{\frac{2\vert f(b)\vert}{b^{2}f^{\prime\prime}(b)}}$ (1.6)

The reason for writing the inequality in this way is that the left hand side is the relative error in $ x$, and that for most functions, the final square root is a number of order unity. Therefore, as a rule of thumb, it is hopeless to ask for bracketing with a width of less than $ O(\sqrt{\epsilon})
$. Knowing this inescapable fact will save you a lot of useless bisection!

The minimum-finding routines of this chapter will often call for a user-supplied argument $ \tau$, and return with an abscissa whose fractional precision is about $ \pm\tau$ (bracketing interval of fractional size about $ 2\tau$). Unless you have a better estimate for the right-hand side of equation (1.6), you should set $ \tau$ equal to (not much less than) the square root of your machine's floating precision, since smaller values will gain you nothing.

It remains to decide on a strategy for choosing the new point $ x$, given $ a,b,c$. Suppose that $ b$ is a fraction $ W$ of the way between $ a$ and $ c$, i.e.

$\displaystyle \frac{b-a}{c-a}=W,\quad\quad\frac{c-b}{c-a}=1-W$ (1.7)

Also suppose that our next trial point $ x$ is an additional fraction $ Z$ beyond $ b$,

$\displaystyle \frac{x-b}{c-a}=Z$ (1.8)

Then the next bracketing segment will either be of length $ W+Z$ relative to the current one, or else of length $ 1-W$. If we want to minimize the worst case possibility, then we will choose $ Z$ to make them equal, namely

$\displaystyle Z=1-2W$ (1.9)

We see at once that the new point is the symmetric point to $ b$ in the interval, namely with $ \vert b-a\vert$ equal to $ \vert x-c\vert$. This implies that the point $ x$ lies in the larger of the two segments ($ Z$ is positive only if $ W<\frac{1}{2}$). But where in the larger segment? Where did the value of $ W$ itself come from? Presumably from the previous stage of applying our same strategy. Therefore, if $ Z$ is chosen to be optimal, then so was $ W$ before it. This scale similarity implies that $ x$ should be the same fraction of the way from $ b$ to $ c$ (if that is the bigger segment) as was $ b$ from $ a$ to $ c$, in other words,

$\displaystyle \frac{Z}{1-W}=\frac{W}{1}.$ (1.10)

Equations (1.9) and (1.10) yield the quadratic equation

$\displaystyle W^{2}-3W+1=0$   yielding $\displaystyle W=\frac{3-\sqrt{5}}{2}\approx 0.38197$ (1.11)


In other words, the optimal bracketing interval $ a<b<c$ has its middle point $ b$ a fractional distance $ 0.38197$ from one end (say, $ a$), and $ 0.61803$ from the other end (say, $ b$). These fractions are those of the so-called golden mean or golden section, whose supposedly aesthetic properties were known to the ancient Pythagoreans. This optimal method of function minimization, the analog of the bisection method for finding zeros, is thus called the golden section search (or Brent's method), summarized as follows:

Given, at each stage, a bracketing triplet of points, the next point to be tried is that which is a fraction $ 0.38197$ into the larger of the two intervals (measuring from the central point of the triplet). If you start out with a bracketing triplet whose segments are not in the golden ratios, the procedure of choosing successive points at the golden mean point of the larger segment will quickly converge to the proper, self- replicating ratios.

The golden section search guarantees that each new function evaluation will (after self-replicating ratios have been achieved) bracket the minimum to an interval just $ 0.61803$ times the size of the preceding interval - even in the worst case. This is comparable to, but not quite as good as, the $ 0.5$ which holds when finding roots by bisection. Note that the convergence is linear, meaning that successive significant figures are gained linearly with additional function evaluations.

MATLAB Algorithm for the Golden Section Search:

function [xmin, fmin] = golden(f, ax, bx, cx, tol)
 
%GOLDEN   Minimize function of one variable using golden section search
%
%   [xmin, fmin] = golden(f, ax, bx, cx, tol) computes a local minimum
%   of f. xmin is the computed local minimizer of f and fmin is
%   f(xmin). xmin is computed to an relative accuracy of TOL.
%
%   The parameters ax, bx and cx must satisfy the following conditions:
%   ax < bx < cx, f(bx) < f(ax) and f(bx) < f(cx).
%
%   xmin satisfies ax < xmin < cx. golden is guaranteed to succeed if f
%   is continuous between ax and cx
%
%   Roman Geus, ETH Zuerich, 9.12.97
 
C = (3-sqrt(5))/2;
R = 1-C;
 
x0 = ax;
x3 = cx;
if (abs(cx-bx) > abs(bx-ax)),
  x1 = bx;
  x2 = bx + C*(cx-bx);
else
  x2 = bx;
  x1 = bx - C*(bx-ax);
end
f1 = feval(f,x1);
f2 = feval(f,x2);
 
k = 1;
while abs(x3-x0) > tol*(abs(x1)+abs(x2)),
  fprintf(1,'k=%4d, |a-b|=%e\n', k, abs(x3-x0));
  if f2 < f1,
    x0 = x1;
    x1 = x2;
    x2 = R*x1 + C*x3;   % x2 = x1+c*(x3-x1)
    f1 = f2;
    f2 = feval(f,x2);
  else
    x3 = x2;
    x2 = x1;
    x1 = R*x2 + C*x0;   % x1 = x2+c*(x0-x2)
    f2 = f1;
    f1 = feval(f,x1);
  end
  k = k+1;
end
 
if f1 < f2,
  xmin = x1;
  fmin = f1;
else
  xmin = x2;
  fmin = f2;
end

1.5.2.1 Brent's exponential search

Brent's exponential search is part of Brent's method: When beginning a Golden section search by choosing three points $ a,b,c$ and evaluating their values $ f(a) ,f(b) $ and $ f(c)$ it's likely to happen, that either $ f(a) \leq f(b) \leq f(c) $ or that $ f(a) \geq f(b) \geq f(c) .$ Then it is impossible to start Brent's minimization, instead the following exponential search is used:

When $ f(a_{i}) \geq f(b_{i}) \geq f(
c_{i}) $ then choose $ x_{i}$ so that $ \frac{x_{i}-c_{i}}{c_{i}-b_{i}}=
\frac{2}{\sqrt{5}-1}\approx 1.618$.

\includegraphics[width=360pt]{eps/BrentsExpoSearch.ps}
Figure 1.7: Brent's Exponential Search - Will the minimum be found?

Compute $ f(x_{i}) $ and replace $ a_{i}$, $ b_{i}$, $ c_{i}$ by $ a_{i+1}=b_{i}$, $ b_{i+1}=c_{i}$ and $ c_{i+1}=x_{i}$.

Repeat this, until $ f(x_{n}) \geq f(c_{n}) $, then start with Brent's minimization algorithm.

In each such step, $ c_{i}-b_{i}$ grows by a factor of $ \frac{2}{\sqrt{5}-1}
\approx1.618$.

Notice that this search may not terminate - e.g. for functions like $ %%
f(x) =\frac{1}{x}$ or $ f(x) =$ $ e^{-x}$, in which case a general algorithm should limit the number of trials of the exponential search.

1.5.2.2 Parabolic Interpolation

The golden section search (Brent's method) is designed to handle, in effect, the worst possible case of function minimization in one dimension, with the uncooperative minimum hunted down and cornered like a scared rabbit. But why assume the worst? If the function is nicely parabolic near its minimum - surely the generic case for sufficiently smooth functions - then the parabola fitted through any three points ought to take us in a single leap to the minimum, or at least very near ´ it. Since we want to find an abscissa rather than an ordinate, the procedure is technically called inverse parabolic interpolation.

The formula for the abscissa $ x$ which is the minimum of a parabola through three points $ f(a)$, $ f(b)$ and $ f(c)$ is

$\displaystyle x=b-\frac{1}{2}\frac{(b-a)^{2}[f(b)-f(c)]-(b-c)^{2}[f(b)-f(a)]}{ (b-a)[f(b)-f(c)]-(b-c)[f(b)-f(a)]}.$ (1.12)

For the derivation of this formula, we choose the following approach:

$\displaystyle f(x)=f(a)\frac{(x-b)(x-c)}{(a-b)(a-c) } + f(b) \frac{(x-c) (x-a) }{(b-c) (b-a) }+ f(c) \frac{(x-a)(x-b)}{( c-a) (c-b) }$    

The idea is; that the right hand side is of grade 2 in $ x$, and is identical to the left hand side, i.e. $ f(x)$ for the three $ x$-values $ a,$ $ b$ and $ c$ . Since a parabola (with given orientation) is uniquely defined by three points, this is the parabola, that we want. As we are searching for a minimum of the parabola, we compute the derivative and set it to zero:

$\displaystyle f^{\prime}(x) =f(a)\frac{(x-b) +(x-c) }{(a-b)(a-c) }+f(b) \frac{(x-c) +(x-a) }{(b-c) (b-a) }+f(c) \frac{(x-a) +(x-b) }{(c-a)(c-b) } \overset{!}{=}0$    

We multiply with the denominator $ (a-b) (b-c) (c-a)$ and get

\begin{displaymath}
\begin{array}{c@{\;=\;}l}
0 & f(a)(c-b)(2x-b-c)+f(b)(a-c)(2x...
... -f(b) \right] (b-a) }\\
\multicolumn{2}{r}{ qed.}
\end{array}\end{displaymath}

This formula fails only if three points are collinear, in which case the denominator is zero (minimum of the parabola is infinitely far away). Note, however, that Equation (1.12) is as likely to jump to a parabolic maximum as to a minimum. No minimization scheme that depends solely on Equation (1.12) is likely to succeed in practice.

1.5.2.3 A further improvement

A good goal is to design a scheme which relies on a sure-but-slow technique, like golden section search, when the function is not cooperative, but which switches over to Equation (1.12) when the function behaves well. The task is nontrivial for several reasons, including:

(i)
The housekeeping needed to avoid unnecessary function evaluations in switching between the two methods can be complicated.
(ii)
Careful attention must be given to the ''endgame'', where the function is being evaluated very near to the roundoff limit of Equation  (1.12).
(iii)
The scheme for detecting a cooperative versus non-cooperative function must be very robust.

This improved method is up to the task in all particulars, but will not be discussed in detail here, since the algorithm has a length of about 4 printed pages.

Gaston Gonnet, Institute for Scientific Computing, ETH Zürich, Switzerland
2002-02-24

With assistance from SkillsOnline and Web Pearls