ChrisRackauckas

ChrisRackauckas OP t1_isxmv6s wrote

We mentioned at the end of the paper that reverse mode requires smoothing in a way that works but induces a bias (except in some cases like the particle filter). This is something we will be looking deeper into.

2

ChrisRackauckas OP t1_iswr0wc wrote

(1) While running your primal program, you run another program that propagates the infinitesimal probabilities of certain pieces changing, and it then chooses the flips in the right proportion (as derived in the paper) to give two correlated but different runs to difference for Y(p). This Y(p) is defined to have the property that E[Y(p)] = dE[X(p)]/dp with low variance, so you do this a few times, average, and that is your gradient estimate. (2) Unlike previous algorithms with known exponential cost scaling (for example, see https://openreview.net/forum?id=KAFyFabsK88 for a deep discussion of previous work's performance), this scales linearly. 1024 should be fine. Note that this is related to forward-mode AD, so "really big" needs more work, but that size is fine.
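To make point (1) concrete, here is a minimal Python sketch for the simplest possible case: a single Bernoulli(p) draw fed into a function. It is not the paper's general algorithm or any library's API. The function `f` and the names `sample_Y` and `estimate_gradient` are made up for illustration. The only idea it shows is that differencing the primal run against a "flipped" run, weighted by the rate at which the flip happens, gives a Y(p) with E[Y(p)] = dE[X(p)]/dp.

```python
import random

def f(b):
    # Hypothetical downstream computation applied to the discrete draw b.
    return (3 * b + 1) ** 2

def sample_Y(p, rng):
    """One sample of Y(p) for X(p) = f(B) with B ~ Bernoulli(p)."""
    b = 1 if rng.random() < p else 0   # primal run
    if b == 1:
        return 0.0                     # increasing p cannot flip a 1 back to 0
    # A 0 flips to 1 at rate 1/(1 - p) as p increases, so difference the
    # primal run against the flipped (alternative) run with that weight.
    return (f(1) - f(0)) / (1.0 - p)

def estimate_gradient(p, n=200_000, seed=0):
    # Average many samples of Y(p); the exact answer here is f(1) - f(0) = 15.
    rng = random.Random(seed)
    return sum(sample_Y(p, rng) for _ in range(n)) / n
```

Since E[X(p)] = (1 - p) f(0) + p f(1), the exact derivative is f(1) - f(0), which the average of `sample_Y` should approach for any p strictly between 0 and 1.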

2

ChrisRackauckas OP t1_ist3bu8 wrote

> What do you mean by that or rather what is a program that is not mathematical?

If it outputs strings or code it may not work with this method. It should output numbers in a way that has a well-defined (differentiable) expectation.

> Is it correct to say that the execution path of the program changes with different outcomes for p, i.e., if(random_event(p)) {return 1;} else {return 0;}? Or is this a different problem?

It can. One of the examples is differentiation of an inhomogeneous random walk, which is a stress test of doing this kind of branch handling.
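A toy version of the branching question can be worked by hand. The sketch below is only an illustration under my own assumptions, not the random walk example from the paper: the program is `if random_event(p): return p**2 else: return 2*p`, and the function names are hypothetical. Each derivative sample combines the ordinary derivative of the branch that was taken with a weighted jump to the branch that was not.

```python
import random

def sample_derivative(p, rng):
    """One derivative sample for: if random_event(p): return p**2 else: return 2*p"""
    took_branch = rng.random() < p
    if took_branch:
        return 2.0 * p                     # ordinary derivative of p**2
    smooth = 2.0                           # ordinary derivative of 2*p
    # Increasing p can switch this run to the other branch at rate 1/(1 - p);
    # the jump term is the weighted difference between the two branch outputs.
    jump = (p ** 2 - 2.0 * p) / (1.0 - p)
    return smooth + jump

def estimate(p, n=100_000, seed=0):
    rng = random.Random(seed)
    return sum(sample_derivative(p, rng) for _ in range(n)) / n

def exact(p):
    # E[X(p)] = p * p**2 + (1 - p) * 2*p, differentiated by hand.
    return 3.0 * p ** 2 - 4.0 * p + 2.0
```

Dropping the jump term and keeping only the per-branch derivatives would average to 2p^2 - 2p + 2, which differs from the hand-derived 3p^2 - 4p + 2. That gap is the contribution of the change in execution path.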

> but can you estimate how much work it is to incorporate this kind of AD in existing AD tools like CoDiPack?

That's hard to say. I would say it wouldn't be "too hard", though it may be hard to add this without overhead for "normal" cases? It would make the code more complicated but the standard cases are just a special case here, so it should be fine.

1

ChrisRackauckas OP t1_issyai3 wrote

We mean the standard "agent-based model": https://www.pnas.org/doi/10.1073/pnas.082080899, https://en.wikipedia.org/wiki/Agent-based_model. The kind of thing you'd use Agents.jl for. For example, look at agent-based infection models. In these kinds of models you create many individuals (agents) with rules. Each agent moves around, but if one is standing near an agent that is infected, there's a probability of infecting the nearby agent. What is the average percentage of infected people at time t?
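For readers who have not seen one, here is a deliberately tiny infection model in plain Python. It is a sketch under assumed rules (a small wrap-around grid, one initially infected agent, infection only between agents sharing a cell), not Agents.jl code and not a model from the paper. All names and default parameters are hypothetical.

```python
import random

def simulate_infection(p, n_agents=50, grid=10, steps=20, rng=None):
    """One run of a toy model; returns the infected fraction after `steps`."""
    rng = rng or random.Random()
    # Agents live on a grid x grid torus; agent 0 starts infected.
    pos = [(rng.randrange(grid), rng.randrange(grid)) for _ in range(n_agents)]
    infected = [i == 0 for i in range(n_agents)]
    for _ in range(steps):
        # Movement rule: each coordinate changes by -1, 0, or +1, wrapping around.
        pos = [((x + rng.choice((-1, 0, 1))) % grid,
                (y + rng.choice((-1, 0, 1))) % grid) for x, y in pos]
        # Infection rule: a healthy agent sharing a cell with an infected
        # agent becomes infected with probability p.
        hot = {pos[i] for i in range(n_agents) if infected[i]}
        infected = [inf or (pos[i] in hot and rng.random() < p)
                    for i, inf in enumerate(infected)]
    return sum(infected) / n_agents

def mean_infected_fraction(p, runs=200, seed=0):
    # Monte Carlo estimate of the expected infected fraction at the final step.
    rng = random.Random(seed)
    return sum(simulate_infection(p, rng=rng) for _ in range(runs)) / runs
```

The output of `mean_infected_fraction(p)` is an expectation over discrete random events that depend on p, which is the kind of quantity whose derivative with respect to p is being asked about.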

2