janpf

janpf t1_j75zh5u wrote

Reply to comment by asarig_ in [R] Graph Mixer Networks by asarig_

Ha, the funny thing is that in the Google paper they at least replace the O(n^(2)) with O(n*D_S), where D_S is a constant, so it's linear. But it so happens that D_S > n in their experiments, so it's not actually faster :) ... (edit: there is another constant in the transformer version too, but in practice the mixer used the same order of magnitude of TPU time to train)

But MLP-Mixers are a very interesting proposition anyway. Other mixing mechanisms have been tried as well, like the FFT in FNet.
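To make the two mixing variants above concrete, here is a minimal sketch in NumPy. The names, shapes, and the ReLU stand-in are my own illustrative assumptions, not code from either paper:

```python
import numpy as np

def fnet_token_mixing(x):
    """FNet-style mixing: FFT over the hidden dim, then over the
    sequence dim, keeping only the real part. x: (seq_len, hidden)."""
    return np.real(np.fft.fft(np.fft.fft(x, axis=-1), axis=0))

def mlp_token_mixing(x, w1, w2):
    """MLP-Mixer-style token mixing across the sequence dimension.
    x: (n, hidden); w1: (D_S, n); w2: (n, D_S).
    Cost per channel is O(n * D_S) instead of attention's O(n^2),
    but when D_S > n it isn't actually cheaper."""
    h = np.maximum(w1 @ x, 0.0)  # the paper uses GELU; ReLU here for brevity
    return w2 @ h

rng = np.random.default_rng(0)
n, hidden, d_s = 8, 16, 32  # note d_s > n, matching the point above
x = rng.normal(size=(n, hidden))
print(fnet_token_mixing(x).shape)  # (8, 16)
print(mlp_token_mixing(x,
                       rng.normal(size=(d_s, n)),
                       rng.normal(size=(n, d_s))).shape)  # (8, 16)
```

Both operations mix information across tokens while keeping the output shape; the difference is one has no learned parameters (FFT) and the other has O(n*D_S) of them.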

3

janpf t1_j4ai7ia wrote

If you use synthetic data (from the crop simulation models), the model will kind of reverse-engineer it (it will learn what the simulation models are doing).

Mixing it with real-world data is like regularizing your model (adding a prior) toward the simulation rules.

This makes sense, and mixing data is a common practice. But "making sense" doesn't necessarily mean it helps ... that depends a lot on your application. The next question is how much synthetic data you want to mix in ... fundamentally you'll have to figure that out by trial and error, with some way of measuring whether things are improving on whatever your extrinsic goal is (your business objective).
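The trial-and-error loop can be sketched like this. Everything here is a placeholder scaffold, assuming you can train a model and score it on a held-out real-data validation set; `train_and_eval` stands in for your actual model and extrinsic metric:

```python
import random

def train_and_eval(train_set, val_set, seed=0):
    # Placeholder: in practice, train your model on `train_set` and
    # return its score on `val_set` (your business metric). Here we
    # just return a deterministic dummy score so the sketch runs.
    rng = random.Random(seed + len(train_set))
    return rng.random()

def sweep_synthetic_fraction(real_data, synthetic_data, val_set,
                             fractions=(0.0, 0.1, 0.25, 0.5)):
    """Try several synthetic-data fractions; keep whichever mix
    scores best on the held-out real-data validation set."""
    scores = {}
    for frac in fractions:
        # number of synthetic examples so they make up `frac` of the mix
        n_synth = min(len(synthetic_data),
                      int(len(real_data) * frac / max(1e-9, 1.0 - frac)))
        mix = list(real_data) + list(synthetic_data[:n_synth])
        scores[frac] = train_and_eval(mix, val_set)
    best = max(scores, key=scores.get)
    return best, scores

real = [("real", i) for i in range(100)]
synth = [("synth", i) for i in range(100)]
best, scores = sweep_synthetic_fraction(real, synth, val_set=real[:20])
print(best, sorted(scores))
```

The key point is that the validation set contains only real data, so the sweep measures whether the synthetic prior actually helps on the distribution you care about.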

0