@@ -23,19 +23,6 @@ Replacing :math:`\mathbf{P}_{1:t}` by :math:`\mathbf{P}_{1:t}-\mathcal{P}(\mathb
See :ref:`the Limbo implementation guide <mean-api>` for the available mean functions.
Black lists
-----------
When performing experiments, some solutions may be impossible to evaluate properly. This happens often with a physical robot, typically because (1) the robot is outside the sensor’s range, for example when it is not visible from the camera’s point of view, which makes it impossible to assess its performance, or (2) the sensor returns unusable values (infinity, NaN, ...).
Several strategies exist to deal with missing data. The simplest is to redo the evaluation. This may work, but only if the problem is not deterministic; otherwise the algorithm will keep repeating the same failing evaluation. A second strategy is to assign a very low value to the behavior’s performance, as a punishment. This approach works with evolutionary algorithms because the corresponding individual will very likely be removed from the population in the next generation. By contrast, it has a dramatic effect on algorithms that use a model of the reward function, like Bayesian Optimization, because the artificial low value distorts the model.
Neither of these methods fits well with the Bayesian Optimization framework. Limbo uses a different approach, compatible with Bayesian Optimization, which preserves the model’s stability. The overall idea is to encourage the algorithm to avoid regions around behaviors that could not be evaluated, since these regions may contain other behaviors that cannot be evaluated either, but without providing any performance value, since doing so is likely to increase the model’s instability.
To record that some behaviors have already been tried, we define a blacklist of samples. Each time a behavior cannot be properly evaluated, it is added to the blacklist (and not to the pool of tested behaviors). Because the performance value is not available, only the behavior’s location in the search space is stored. In other words, the blacklist is a list of samples with missing performance data.
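The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration and not Limbo's API: the function ``record_evaluation`` and the three lists are hypothetical names, and a failed evaluation is assumed to be reported as ``None``, NaN, or infinity.

```python
import math


def record_evaluation(x, value, samples, observations, blacklist):
    """Store x as a valid sample, or blacklist it if it has no usable value.

    Hypothetical helper for illustration; returns True for a valid sample.
    """
    failed = value is None or math.isnan(value) or math.isinf(value)
    if failed:
        # Only the location is kept: no performance value is recorded.
        blacklist.append(x)
        return False
    samples.append(x)
    observations.append(value)
    return True


samples, observations, blacklist = [], [], []
record_evaluation((0.1, 0.2), 0.7, samples, observations, blacklist)
record_evaluation((0.5, 0.5), float("nan"), samples, observations, blacklist)
record_evaluation((0.9, 0.3), None, samples, observations, blacklist)
```

After these three calls, the pool of tested behaviors holds one sample with its observation, and the blacklist holds the two locations that could not be evaluated.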
Thanks to this distinction between valid samples and blacklisted ones, the algorithm can consider only the valid samples when computing the mean of the Gaussian Process, and both valid and blacklisted samples when computing the variance. Because blacklisted samples are ignored, the mean remains unchanged and free to move according to future observations. By contrast, the variance takes both valid and blacklisted samples into account and “marks” them as already explored.
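The following Python sketch illustrates this split between mean and variance on a one-dimensional toy problem. It is not Limbo's implementation: it assumes a zero prior mean, a squared-exponential kernel with a fixed length scale, and a small noise term on the diagonal. All names (``kernel``, ``solve``, ``gram``, ``predict``) are hypothetical.

```python
import math


def kernel(a, b, length=0.2):
    """Squared-exponential kernel on scalars (assumed for this sketch)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gram(points, noise=1e-6):
    """Kernel matrix of the given points, with noise on the diagonal."""
    return [
        [kernel(p, q) + (noise if i == j else 0.0) for j, q in enumerate(points)]
        for i, p in enumerate(points)
    ]


def predict(x, samples, observations, blacklist):
    """Return (mean, variance) at x.

    The mean uses the valid samples only; the variance uses the valid
    samples and the blacklisted locations together.
    """
    mean = 0.0  # zero prior mean (assumption of this sketch)
    if samples:
        alpha = solve(gram(samples), observations)
        mean = sum(kernel(x, s) * a for s, a in zip(samples, alpha))

    variance = kernel(x, x)
    all_points = samples + blacklist
    if all_points:
        k_all = [kernel(x, p) for p in all_points]
        v = solve(gram(all_points), k_all)
        variance -= sum(k * w for k, w in zip(k_all, v))
    return mean, max(variance, 0.0)


samples, observations = [0.2], [1.0]
blacklist = [0.8]
mean_bl, var_bl = predict(0.8, samples, observations, blacklist)
mean_no, var_no = predict(0.8, samples, observations, [])
```

In this toy setting the predicted mean at the blacklisted location is identical with and without the blacklist, while the variance there drops to nearly zero once the blacklist is taken into account. An acquisition function that rewards high variance would therefore have less incentive to return to that region.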