Submitted by asarig_ t3_10sj2qf in MachineLearning
SatoshiNotMe t1_j71t20w wrote
For those not clued in, can you briefly explain what MLP-Mixers are and how they're relevant to GNNs?
asarig_ OP t1_j71wbqs wrote
Of course. MLP-Mixers are a new approach, first developed for image classification and proposed independently by Google and Oxford researchers in May 2021.
The MLP-Mixer, also known simply as "Mixer", is an image architecture that doesn't incorporate convolutions or self-attention. Instead, it relies solely on multi-layer perceptrons (MLPs) that are applied repeatedly, either across spatial locations or across feature channels.
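To make that concrete, here's a minimal sketch of one Mixer block in PyTorch (illustrative hidden sizes, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One MLP-Mixer block: a token-mixing MLP applied across spatial
    locations (patches), followed by a channel-mixing MLP applied
    across feature channels. No convolutions, no attention."""
    def __init__(self, num_patches, dim, token_hidden=256, channel_hidden=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):  # x: (batch, num_patches, dim)
        # Token mixing: transpose so the MLP runs over the patch axis.
        y = self.norm1(x).transpose(1, 2)
        x = x + self.token_mlp(y).transpose(1, 2)
        # Channel mixing: the MLP runs over the feature axis.
        return x + self.channel_mlp(self.norm2(x))
```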
Transformers are what's normally applied on graphs; in this work, I instead tried to use Mixers as a new kernel method on graphs, to find out how they perform with linear complexity, avoiding the O(n^2) complexity of Transformers.
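Roughly, the complexity difference comes down to whether a layer has to materialize pairwise scores. A back-of-the-envelope sketch (illustrative shapes only, not the actual kernel used in this work):

```python
import torch

n, d = 1024, 64          # n nodes, d features per node
x = torch.randn(n, d)

# Self-attention builds an n x n score matrix: O(n^2) in the node count.
attn_scores = x @ x.T                  # shape (n, n)

# A mixer-style layer instead applies a fixed per-node MLP over the
# feature axis: O(n * d), with no pairwise matrix.
w1 = torch.randn(d, 4 * d)
w2 = torch.randn(4 * d, d)
mixed = torch.relu(x @ w1) @ w2        # shape (n, d)
```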
janpf t1_j75zh5u wrote
Ha, the funny thing is that in the Google paper, at least, they replace the O(n^2) with O(n * D_S), where D_S is a constant, so it's linear. But it so happens that D_S > n in their studies, so it's not really faster :) ... (edit: there's another constant in the Transformer version as well, but effectively the Mixer used the same order of magnitude of TPU time to train)
But MLP-Mixers are a very interesting proposition anyway. Other kinds of mixers are also in use, e.g. FFT-based mixing (FNet).
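For reference, FNet's mixing sublayer is parameter-free: it takes the real part of a Fourier transform applied along the feature and sequence axes. A minimal sketch (shapes assumed, following the FNet paper's description):

```python
import torch
import torch.nn as nn

class FNetMixing(nn.Module):
    """FNet-style token mixing: replaces self-attention with the real
    part of an FFT applied along the feature axis and then the
    sequence axis. No learned parameters in the mixing step."""
    def forward(self, x):  # x: (batch, seq_len, dim), real-valued
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
```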
gdpoc t1_j7337zm wrote
That is fascinating work.
I'd like to read the paper, and will when I find the time; are the results promising?
It seems reasonable that a graph with a small branching factor could replicate the logarithmic search complexity of the input space to at least some extent; I'm very interested in exploring this space.
asarig_ OP t1_j73g1ne wrote
Thanks for your interest. If you open an issue on GitHub about this, it will serve as a reminder, and I can share the pre-trained weights at the appropriate time.