
sugar_scoot t1_jajjyiq wrote

I'm not an expert, but I believe the use case is an environment where you have no gradient to learn from, and no hope of even approximating one.


sobe86 t1_jalldpg wrote

Also, there needs to be a learnable, nontrivial 'strategy' to take advantage of, otherwise it's not going to beat simulated annealing except on speed. The couple of times I've used it in practice, SA was about as good as we could get performance-wise.

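For reference, the SA baseline being discussed is only a few lines. A minimal sketch (all names and the toy objective are illustrative, not from the thread):

```python
import math
import random

def simulated_annealing(objective, x0, neighbor, n_iters=10_000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Minimize `objective` with a geometric cooling schedule.

    `neighbor(x, rng)` proposes a nearby candidate solution.
    """
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best, f_best = x, fx
    cooling = (t_end / t_start) ** (1.0 / n_iters)
    t = t_start
    for _ in range(n_iters):
        y = neighbor(x, rng)
        fy = objective(y)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-(fy - fx) / t).
        if fy <= fx or rng.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = x, fx
        t *= cooling
    return best, f_best

# Toy usage: a 1-D quartic with two basins; SA can hop between them
# while the temperature is high, then settles as it cools.
f = lambda x: (x ** 2 - 1) ** 2 + 0.1 * x
step = lambda x, rng: x + rng.gauss(0, 0.2)
x_best, f_best = simulated_annealing(f, x0=2.0, neighbor=step)
```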

TobusFire OP t1_janxqya wrote

My thoughts too. Simulated annealing and similar strategies intuitively seem better in most cases where traditional gradient methods aren't applicable. I can imagine a handful of cases where genetic algorithms MIGHT be better, but even then I'm not fully convinced, and it just feels gimmicky.

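For comparison with the SA baseline, a minimal real-valued GA looks like this — truncation selection, uniform crossover, Gaussian mutation. All parameter choices here are illustrative defaults, not anything from the thread:

```python
import random

def genetic_algorithm(fitness, dim, pop_size=60, n_gens=80,
                      mut_rate=0.1, sigma=0.3, seed=0):
    """Maximize `fitness` over real vectors. Sketch only:
    truncation selection + uniform crossover + Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(n_gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutate each gene with probability mut_rate.
            child = [x + rng.gauss(0, sigma) if rng.random() < mut_rate else x
                     for x in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy usage: maximize -||x||^2, whose optimum is the origin.
best = genetic_algorithm(lambda x: -sum(v * v for v in x), dim=5)
```

The point of the comparison upthread: nothing in this loop exploits problem structure beyond what SA's neighborhood moves already give you, so the GA only wins when crossover can recombine meaningful partial solutions.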

rm-rf_ t1_jambi6b wrote

Don't Bayesian approaches generally work better in gradient-free optimization?


scawsome t1_janbtva wrote

Not necessarily. Bayesian methods work great when objective function evaluations are expensive and can only be run serially (or with limited parallelism). They aren't ideal when evaluations are massively parallelizable (evaluating >100 points at a time) or relatively cheap, because the bottleneck becomes the cost of optimizing the acquisition function. I've actually played around with combining BO with evolutionary algorithms to extend BO towards massively parallel evaluation and have seen some promising results.

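To make the acquisition-function cost concrete, here is a minimal 1-D BO loop with a hand-rolled GP surrogate and expected improvement. This is my own sketch, not the commenter's hybrid: the acquisition is maximized by cheap random search over candidates, which is exactly the step an evolutionary algorithm could replace to propose a whole batch of points at once.

```python
import math
import numpy as np

def rbf(a, b, ls=0.25):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at the query points."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y_train
    var = np.clip(1.0 - np.sum(Ks * sol, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def bayes_opt_1d(f, n_init=4, n_iters=10, n_cand=256, seed=0):
    """Minimize f on [0, 1]. The acquisition (expected improvement) is
    maximized by random search; an EA could propose a batch instead."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, n_init)
    Y = f(X)
    for _ in range(n_iters):
        cand = rng.uniform(0.0, 1.0, n_cand)
        mu, sd = gp_posterior(X, Y, cand)
        y_best = Y.min()
        # Expected improvement for minimization.
        ei = [(y_best - m) * norm_cdf((y_best - m) / s)
              + s * norm_pdf((y_best - m) / s)
              for m, s in zip(mu, sd)]
        x_new = cand[int(np.argmax(ei))]
        X = np.append(X, x_new)
        Y = np.append(Y, f(x_new))
    return X[Y.argmin()], Y.min()

# Toy usage: minimize (x - 0.7)^2 with only ~14 true evaluations.
x_best, y_best = bayes_opt_1d(lambda x: (x - 0.7) ** 2)
```

Note that every outer iteration refits the GP and scores all candidates, which is why cheap objectives make BO's overhead hard to justify.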

_TheHalfTruth_ t1_jbdaf27 wrote

Metaheuristic algorithms like GA and simulated annealing are almost identical to Bayesian methods/MCMC. They *are* Bayesian methods if you pretend your objective function is proportional to a probability distribution that you want to maximize; they just take unique approaches to exploring that distribution. Conceptually they're identical.

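The correspondence can be made concrete: SA's acceptance rule is exactly Metropolis sampling of the Boltzmann density p_T(x) ∝ exp(-f(x)/T), and lowering T concentrates that density on the minimizer. A sketch (toy objective and all names are illustrative):

```python
import math
import random

def metropolis(logp, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log-density."""
    rng = random.Random(seed)
    x, lp = x0, logp(x0)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0, step)
        lpy = logp(y)
        # Standard Metropolis acceptance: same test as SA's, with
        # log p(x) playing the role of -f(x) / T.
        if math.log(rng.random()) < lpy - lp:
            x, lp = y, lpy
        samples.append(x)
    return samples

# Treat the objective as an energy: p_T(x) ∝ exp(-f(x) / T).
f = lambda x: (x - 2.0) ** 2
for T in (1.0, 0.1, 0.01):          # lowering T is the "annealing"
    s = metropolis(lambda x, T=T: -f(x) / T, x0=0.0, n_steps=5000)
    # As T → 0, samples concentrate on the minimizer x = 2.
```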

ajt9000 t1_janfx47 wrote

This comment makes me wonder if the usual rule about using one-hot encoding instead of ordinal encoding for classifiers still applies to a neural net trained with a gradient-free search algorithm like a GA instead of backprop.

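For readers unfamiliar with the two encodings being contrasted, a minimal illustration (class names are made up). The usual argument for one-hot is that ordinal codes impose a spurious ordering and distance between classes; since that argument is about the geometry of the target space rather than about backprop itself, it plausibly carries over to GA-trained nets too, though that is my reading, not something established in the thread.

```python
import numpy as np

labels = ["cat", "dog", "bird"]          # hypothetical class labels
classes = sorted(set(labels))            # ["bird", "cat", "dog"]

# Ordinal: one integer per class, implying bird < cat < dog and that
# "dog" is twice as far from "bird" as "cat" is.
ordinal = np.array([classes.index(l) for l in labels])

# One-hot: one axis per class, so all classes are equidistant.
one_hot = np.eye(len(classes))[ordinal]
```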