Commit 12e4c540 authored by Jean-Baptiste Mouret

add a tutorial about the optimization sub-api + improve the doc about optimization

parent 18e51287
......@@ -161,7 +161,27 @@ Default Parameters
Optimization functions (opt)
------------------------------
In Limbo, optimizers are used both to optimize acquisition functions and to optimize hyper-parameters. However, this API can also be useful anywhere in your own code where a function needs to be optimized.
.. warning::

    Limbo optimizers always MAXIMIZE f(x), whereas many libraries MINIMIZE f(x)
Most algorithms are wrappers around external libraries (NLOpt and CMA-ES). Only Rprop (and a few control algorithms such as ``RandomPoint``) are implemented directly in Limbo. Some optimizers require the gradient; some do not.
The tutorial :ref:`Optimization sub-API <opt-api>` describes how to use the opt:: API in your own algorithms.
The return type of the function to be optimized is ``eval_t``, which is defined as a pair of a double (f(x)) and a vector (the gradient):
.. code-block:: cpp

    typedef std::pair<double, boost::optional<Eigen::VectorXd>> eval_t;
To make it easy to work with ``eval_t``, Limbo defines a few shortcuts:
.. doxygengroup:: opt_tools
:members:
Template
^^^^^^^^^
......@@ -181,18 +201,32 @@ Template
.. code-block:: cpp

    limbo::opt::eval_t my_function(const Eigen::VectorXd& v, bool eval_grad = false)
    {
        double fx = <function_value>;
        Eigen::VectorXd gradient = <gradient>;
        return {fx, gradient};
    }
It is possible to make the function a bit more generic by computing the gradient only when it is requested, that is:

.. code-block:: cpp

    limbo::opt::eval_t my_function(const Eigen::VectorXd& v, bool eval_grad = false)
    {
        double fx = <function_value>;
        if (!eval_grad)
            return limbo::opt::no_grad(fx);
        Eigen::VectorXd gradient = <gradient>;
        return {fx, gradient};
    }
- If the gradient of ``f`` is not known:
.. code-block:: cpp

    limbo::opt::eval_t my_function(const Eigen::VectorXd& v, bool eval_grad = false)
    {
        double x = <function_value>(v);
        return limbo::opt::no_grad(x);
    }
......@@ -201,6 +235,21 @@ Template
- ``init`` is an optional starting point (for local optimizers); many optimizers do not support this argument (see the table below) and will fail an assert if one is provided.
- ``bounded`` is true if the optimization is bounded in [0,1]; many optimizers do not support bounded optimization (see the table below).
- ``eval_grad`` allows Limbo to avoid computing the gradient when it is not needed (i.e. when the gradient is known but we optimize using a gradient-free optimizer).
To call an optimizer (e.g. NLOptGrad):
.. code-block:: cpp

    // the type of the optimizer (here NLOpt with the LD_LBFGS algorithm)
    opt::NLOptGrad<ParamsGrad, nlopt::LD_LBFGS> lbfgs;
    // we start from a random point (in 2D), and the search is not bounded
    Eigen::VectorXd res_lbfgs = lbfgs(my_function, tools::random_vector(2), false);
    std::cout << "Result with LBFGS:\t" << res_lbfgs.transpose()
              << " -> " << my_function(res_lbfgs).first << std::endl;
Not all algorithms support bounded optimization or an initial point:
+-------------+---------+-------+
|Algo. | bounded | init |
......@@ -216,6 +265,11 @@ Template
|RandomPoint | yes | no |
+-------------+---------+-------+
For example, to use the LBFGS optimizer to maximize :math:`-(x-0.5)^2`:

.. code-block:: cpp

    // we maximize -(x-0.5)^2; the maximum is at x = 0.5
    opt::eval_t my_function(const Eigen::VectorXd& params, bool eval_grad = false)
    {
        double v = -(params.array() - 0.5).square().sum();
        if (!eval_grad)
            return opt::no_grad(v);
        Eigen::VectorXd grad = (-2 * params).array() + 1.0;
        return {v, grad};
    }

    // ... then, in main():
    opt::NLOptGrad<ParamsGrad, nlopt::LD_LBFGS> lbfgs;
    Eigen::VectorXd res = lbfgs(my_function, tools::random_vector(2), false);
Available optimizers
^^^^^^^^^^^^^^^^^^^^
.. doxygengroup:: opt
......@@ -226,11 +280,6 @@ Default parameters
.. doxygengroup:: opt_defaults
:undoc-members:
Models / Gaussian processes (model)
......@@ -245,6 +294,8 @@ The hyper-parameters of the model (kernel, mean) can be optimized. The following
.. doxygengroup:: model_opt
:members:
See the `Gaussian Process`_ tutorial to learn how to use a GP outside of a Bayesian optimization algorithm.
Kernel functions (kernel)
--------------------------
......
......@@ -6,7 +6,8 @@ Tutorials
compilation
basic_example
advanced_example
statistics
external_libs
gp
opt
.. _opt-api:
Optimization Sub-API
====================
Limbo uses optimizers in several situations, most notably to optimize the hyper-parameters of Gaussian processes and to optimize acquisition functions. Nevertheless, these optimizers might be useful in other contexts. This tutorial briefly explains how to use them.
Optimizers in Limbo are wrappers around:
- NLOpt (which provides many local, global, gradient-based, gradient-free algorithms)
- libcmaes (which provides the Covariance Matrix Adaptation Evolutionary Strategy, that is, CMA-ES)
- a few other algorithms that are implemented in Limbo (in particular, Rprop, which is a gradient-based optimization algorithm)
We first need to define a function to be optimized. Here we choose :math:`-(x_1-0.5)^2 - (x_2-0.5)^2`, whose maximum is at :math:`[0.5, 0.5]`:
.. literalinclude:: ../../src/tutorials/opt.cpp
:language: c++
:linenos:
:lines: 27-34
.. warning::

    Limbo optimizers always MAXIMIZE f(x), whereas many libraries MINIMIZE f(x)
The first thing to note is that the function needs to return an object of type ``eval_t``, which is a pair made of a double (:math:`f(x)`) and a vector (the gradient). We do so because (1) many fast algorithms use the gradient, and (2) the gradient and the function value often share computations, so it is often faster to compute both at the same time (this is, for instance, the case for the log-likelihood that we optimize to find the hyper-parameters of Gaussian processes).
Thanks to C++11, we can simply return ``{v, grad}`` and an object of type ``eval_t`` will be created. When we cannot compute the gradient, we return ``opt::no_grad(v)``, which creates a special object without the gradient information (using ``boost::optional``).
The boolean ``eval_grad`` is true when we need to evaluate the gradient at x, and false otherwise. This is useful because some algorithms do not use the gradient at all: in that case, there is no need to compute it.
As usual, each algorithm has parameters (typically the number of iterations to perform). They are defined like the other parameters in Limbo (see `Parameters`):
.. literalinclude:: ../../src/tutorials/opt.cpp
:language: c++
:linenos:
:lines: 7-10
Now we can instantiate our optimizer and call it:
.. literalinclude:: ../../src/tutorials/opt.cpp
:language: c++
:linenos:
:lines: 40-45
We can do the same with a gradient-free optimizer from NLOpt:
.. literalinclude:: ../../src/tutorials/opt.cpp
:language: c++
:linenos:
:lines: 47-53
Or with CMA-ES:
.. literalinclude:: ../../src/tutorials/opt.cpp
:language: c++
:linenos:
:lines: 58-62
Here is the full file:
.. literalinclude:: ../../src/tutorials/opt.cpp
   :language: c++
   :linenos:
#include <limbo/opt.hpp>
#include <limbo/tools.hpp>
// this short tutorial shows how to use the optimization api of limbo (opt::)
using namespace limbo;
struct ParamsGrad {
struct opt_nloptgrad {
BO_PARAM(int, iterations, 80);
};
};
struct ParamsNoGrad {
struct opt_nloptnograd {
BO_PARAM(int, iterations, 80);
};
};
struct ParamsCMAES {
struct opt_cmaes : public defaults::opt_cmaes {
};
};
// we maximize -(x_1-0.5)^2 - (x_2-0.5)^2
// the maximum is [0.5, 0.5] (f([0.5, 0.5]) = 0)
opt::eval_t my_function(const Eigen::VectorXd& params, bool eval_grad = false)
{
double v = -(params.array() - 0.5).square().sum();
if (!eval_grad)
return opt::no_grad(v);
Eigen::VectorXd grad = (-2 * params).array() + 1.0;
return {v, grad};
}
int main(int argc, char** argv)
{
#ifdef USE_NLOPT
// the type of the optimizer (here NLOpt with the LD_LBFGS algorithm)
opt::NLOptGrad<ParamsGrad, nlopt::LD_LBFGS> lbfgs;
// we start from a random point (in 2D), and the search is not bounded
Eigen::VectorXd res_lbfgs = lbfgs(my_function, tools::random_vector(2), false);
std::cout << "Result with LBFGS:\t" << res_lbfgs.transpose()
<< " -> " << my_function(res_lbfgs).first << std::endl;
// we can also use a gradient-free algorithm, like DIRECT
opt::NLOptNoGrad<ParamsNoGrad, nlopt::GN_DIRECT> direct;
// we start from a random point (in 2D), and the search is bounded in [0,1]
// be careful that DIRECT does not support unbounded search
Eigen::VectorXd res_direct = direct(my_function, tools::random_vector(2), true);
std::cout << "Result with DIRECT:\t" << res_direct.transpose()
<< " -> " << my_function(res_direct).first << std::endl;
#endif
#ifdef USE_LIBCMAES
// or Cmaes
opt::Cmaes<ParamsCMAES> cmaes;
Eigen::VectorXd res_cmaes = cmaes(my_function, tools::random_vector(2), false);
std::cout << "Result with CMA-ES:\t" << res_cmaes.transpose()
<< " -> " << my_function(res_cmaes).first << std::endl;
#endif
return 0;
}
......@@ -27,3 +27,9 @@ def build(bld):
target='gp',
uselib='BOOST EIGEN TBB LIBCMAES NLOPT',
use='limbo')
obj = bld.program(features='cxx',
source='opt.cpp',
includes='. .. ../../',
target='opt',
uselib='BOOST EIGEN TBB LIBCMAES NLOPT',
use='limbo')