Submitted by asarig_ t3_10sj2qf in MachineLearning

I began exploring MLP-Mixers [1, 2] for Graph Neural Networks in October 2021 and completed my implementation on the ZINC dataset in November of the same year. My implementation is available on GitHub, but I was unable to fully conduct the experiments due to a lack of computational resources.

In December 2022, a group of leading figures in the field, including Xiaoxin He, Bryan Hooi, Thomas Laurent, Adam Perold, Yann LeCun, and Xavier Bresson, published a paper titled "A Generalization of ViT/MLP-Mixer to Graphs". Although I am pleased that these prominent researchers are also working on applying MLP-Mixers to graphs, I regret that I was unable to finish my experiments. Encouraged by my friends and advisors, I decided to make my work public by publishing it on arXiv. The paper and code can be found below:

Paper/report: https://arxiv.org/abs/2301.12493
Github: https://github.com/asarigun/GraphMixerNetworks

Unlike their work, I used PNA as my baseline and did not use patches in my study. I hope someone finds it interesting/useful.

14

Comments


SatoshiNotMe t1_j71t20w wrote

For those not clued in, can you briefly explain what MLP-Mixers are and how they are relevant to GNNs?

4

asarig_ OP t1_j71wbqs wrote

Of course. The MLP-Mixer is a relatively new architecture, first proposed for image classification and developed independently by Google and Oxford researchers in May 2021.

The MLP-Mixer, also known simply as "Mixer", is an image architecture that uses neither convolutions nor self-attention. Instead, it relies solely on multi-layer perceptrons (MLPs) that are repeatedly applied either across spatial locations (token mixing) or across feature channels (channel mixing).

In this work, instead of the Transformers that are usually applied to graphs, I tried to use Mixers as a new kernel method on graphs, to see how they perform with linear complexity while avoiding the O(n^2) complexity of Transformers.
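
To make that concrete, here is a minimal sketch of a single Mixer block in PyTorch (illustrative only, not the code from my repo; the MLP widths are arbitrary):

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One Mixer block: a token-mixing MLP followed by a channel-mixing MLP,
    each with a residual connection. Tokens are image patches in the original
    papers, or graph nodes in the graph setting."""
    def __init__(self, num_tokens, channels, token_dim=256, channel_dim=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        # Token-mixing MLP: shared across channels, mixes information across tokens.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_dim), nn.GELU(), nn.Linear(token_dim, num_tokens)
        )
        # Channel-mixing MLP: shared across tokens, mixes information across channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channel_dim), nn.GELU(), nn.Linear(channel_dim, channels)
        )

    def forward(self, x):                              # x: (batch, tokens, channels)
        y = self.norm1(x).transpose(1, 2)              # (batch, channels, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)      # mix across tokens
        x = x + self.channel_mlp(self.norm2(x))        # mix across channels
        return x

# Example: 196 tokens with 768 channels in, same shape out.
x = torch.randn(8, 196, 768)
out = MixerBlock(num_tokens=196, channels=768)(x)
```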

5

janpf t1_j75zh5u wrote

Ha, the funny thing is that in the Google paper, at least, they replace the O(n^2) with O(n*D_S), where D_S is a constant, so it's linear. But it so happens that D_S > n in their studies, so it's not really faster :) ... (edit: there is another constant in the Transformer version as well, but effectively the Mixer used the same order of magnitude of TPU time to train)
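
To put rough numbers on that (Mixer-B/16 settings from the paper, if I remember them right: 196 tokens, token-MLP width D_S = 384):

```python
# Rough count of "mixing" interactions per layer (constants and heads ignored).
S, D_S = 196, 384                        # sequence length and token-mixing MLP width
print("attention, O(S^2): ", S * S)      # ~38k pairwise token interactions
print("token MLP, O(S*D_S):", S * D_S)   # ~75k -- linear in S, but D_S > S here, so not smaller
```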

But MLP-Mixers are a very interesting proposition anyway. Other kinds of token mixing have also been used, such as the FFT in FNet.

3

gdpoc t1_j7337zm wrote

That is fascinating work.

I'd like to read the paper and will, given the time; are the results promising?

It seems plausible that a graph with a small branching factor could replicate logarithmic search complexity over the input space, at least to some extent; I'm very interested in exploring this space.

2

asarig_ OP t1_j73g1ne wrote

Thanks for your interest. If you open an issue on GitHub about this, I will keep it in mind as a reminder, and I can share pre-trained weights at the appropriate time.

2