schwagggg t1_isxz4cw wrote

Then this sounds a bit like a measure-valued derivative? You perturb, then calculate the derivative. Wouldn't that be at least O(D) expensive for one layer, and O(LD) for L layers of D-dimensional random variables?

ChrisRackauckas OP t1_isy96fg wrote

O(LD), yes. So you want reverse mode, which would be O(L+D), but without bias and with low variance, and that's the next step here.
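To make the O(LD) count concrete, here is a rough sketch of a perturb-then-differentiate loop that simply counts model evaluations. Everything in it is an assumption made for illustration: the toy `model` is deterministic (a stand-in, not a stochastic program), and `make_model` and `perturb_gradient` are hypothetical names, not part of any library discussed in this thread.

```python
# Toy illustration: perturbation-style derivatives need one extra model
# evaluation per parameter, so L layers of D parameters cost L * D passes.
# The model is a hypothetical deterministic stand-in, not a stochastic program.

def make_model(counter):
    def model(params, x):
        counter["passes"] += 1            # count every full evaluation
        for layer in params:              # L layers
            # each layer scales x by the mean of its D parameters
            x = sum(w * x for w in layer) / len(layer)
        return x
    return model

def perturb_gradient(model, params, x, eps=1e-6):
    base = model(params, x)               # one baseline pass
    grad = []
    for l, layer in enumerate(params):
        row = []
        for d in range(len(layer)):
            bumped = [list(p) for p in params]
            bumped[l][d] += eps           # perturb a single parameter
            row.append((model(bumped, x) - base) / eps)
        grad.append(row)
    return grad

L, D = 3, 4
counter = {"passes": 0}
model = make_model(counter)
params = [[1.0] * D for _ in range(L)]
grad = perturb_gradient(model, params, 2.0)
print(counter["passes"])  # 1 baseline pass plus L * D perturbed passes
```

The pass count grows with the product L * D because each parameter gets its own perturbed evaluation, which is the cost a reverse-mode approach would aim to avoid.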
