Submitted by SaltyStackSmasher t3_11euzja in MachineLearning
So I was just going through the VAE reparameterization trick and wondered whether it can be extended to beam sampling. Is this possible at all? I think if we can backprop through beam sampling, we could directly optimise for BLEU?
Please correct me if I'm wrong. I'm happy to explore a bit as well, I just don't know where to start.
cnapun t1_jage50a wrote
I'm not an expert on this topic, but I've discussed it with coworkers. I do believe you should be able to backprop through sampling, mathematically at least. My suspicion is that you'll run into the same problem as you have with RNNs, where backpropping through many steps leads to high variance in gradients. I'd search for some papers that have explored this; I assume they exist.
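As a possible starting point, one known way to reparameterize a discrete sample is the Gumbel-softmax relaxation. It is not something mentioned above, just a common technique for this kind of problem. Below is a minimal sketch in plain Python, assuming a single categorical step over three made-up logits. The function names, the temperature `tau`, and the example values are all illustrative assumptions. It covers only one sampling step, not a full beam, and it does nothing about BLEU itself being computed on discrete tokens.

```python
import math
import random


def sample_gumbel(rng):
    # Gumbel(0, 1) noise via the inverse CDF: -log(-log(U)), U ~ Uniform(0, 1).
    u = rng.random()
    # Clamp away from 0 and 1 so both logs stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    # The randomness lives entirely in `noise`, which does not depend on the
    # logits. Given the noise, the relaxed sample is a smooth function of the
    # logits, which is the same idea as the VAE reparameterization trick.
    noise = [sample_gumbel(rng) for _ in logits]
    perturbed = [l + g for l, g in zip(logits, noise)]
    relaxed = softmax([p / tau for p in perturbed])
    # The argmax of the perturbed logits is an exact categorical sample
    # (Gumbel-max trick); the relaxed vector is its soft approximation.
    hard_index = max(range(len(perturbed)), key=perturbed.__getitem__)
    return relaxed, hard_index


rng = random.Random(0)            # fixed seed, purely for repeatability
logits = [2.0, 0.5, -1.0]         # hypothetical unnormalized token scores
relaxed, hard_index = gumbel_softmax(logits, tau=0.5, rng=rng)
print(relaxed, hard_index)
```

In an autodiff framework the same computation lets gradients flow from `relaxed` back to `logits`. Lowering `tau` pushes `relaxed` closer to one-hot, but the gradients get noisier, which lines up with the variance concern above once many such steps are chained.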