
qalis t1_j8driqb wrote

I am working in this field for my PhD, so I think I can help.

A bit of self-promotion, but my Master's thesis was about GNNs: https://arxiv.org/abs/2211.03666. It should be very beginner-friendly, since I wrote it while learning the topic step by step myself.

"Introduction to Graph Neural Networks" by Zhiyuan Liu and Jie Zhou (Tsinghua University) is slightly outdated because of how fast this field is moving, but it is a good intro.

"Graph Neural Networks: Foundations, Frontiers, and Applications" (https://graph-neural-networks.github.io/) is cutting-edge and has good reviews. I haven't read it myself, but it looks very promising.

Overviews and articles are also great, e.g. https://distill.pub/2021/gnn-intro/ or the well-known (in this field) survey https://arxiv.org/abs/1901.00596. You should also definitely read the papers on GCN (very intuitively written), GAT, GraphSAGE and GIN, the four classic graph convolution architectures.
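To give a feel for what a graph convolution does before you open the GCN paper, here is a minimal NumPy sketch of a single layer. It assumes the propagation rule from that paper, ReLU(D^{-1/2} (A + I) D^{-1/2} X W), on a dense adjacency matrix. The function names and the toy graph are my own, and a real implementation would use sparse matrices and a trained weight matrix.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    adj_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = adj_hat.sum(axis=1)              # node degrees, self-loops included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weight):
    """One GCN layer: average neighbour features, project, apply ReLU."""
    a_norm = normalize_adjacency(adj)
    return np.maximum(a_norm @ features @ weight, 0.0)

# Toy path graph 0 - 1 - 2 with one-hot node features (illustrative only).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)
rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 2))           # untrained, random weights

hidden = gcn_layer(adj, features, weight)  # shape: (3 nodes, 2 hidden units)
```

Roughly speaking, GAT replaces the fixed normalization with learned attention weights, and GraphSAGE and GIN change how neighbours are aggregated, so this layer is a reasonable mental baseline for all four papers.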

Fair comparison is, unfortunately, not common in this field. Many well-known works, e.g. GIN, do not even use a test set and are quite unclear about it, so approach every paper with a lot of suspicion. This paper about fair comparison is increasingly used as a reference: https://arxiv.org/abs/1912.09893. This baseline, which is not a GNN but is similar in spirit, gives very strong results: https://arxiv.org/abs/1811.03508. I will be releasing a paper about a related method, LTP (Local Topological Profile); look out for it later this year.
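To make the test-set point concrete, here is a minimal sketch of the separation I mean: hyperparameters are chosen on a validation split only, and the test split is touched exactly once, with the chosen configuration. This is not the exact protocol from the fair comparison paper, just the basic idea. All names are hypothetical, and the "model" is a toy threshold standing in for a GNN.

```python
import numpy as np

def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into disjoint train/val/test parts."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]
    return train, val, test

def select_then_test(configs, train_and_score, train, val, test):
    """Pick the best config on validation, then score it once on test."""
    # Model selection: only the validation split is consulted here.
    val_scores = [train_and_score(cfg, train, val) for cfg in configs]
    best = configs[int(np.argmax(val_scores))]
    # Final estimate: the test split is used a single time.
    return best, train_and_score(best, train, test)

# Toy data: one feature, label is 1 when the feature exceeds 0.3.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x > 0.3).astype(int)

def train_and_score(threshold, fit_idx, eval_idx):
    """Stand-in for 'train a model, return accuracy'; fit_idx is unused here."""
    pred = (x[eval_idx] > threshold).astype(int)
    return float((pred == y[eval_idx]).mean())

train, val, test = split_indices(len(x))
configs = [-0.5, 0.0, 0.3, 1.0]
best, test_acc = select_then_test(configs, train_and_score, train, val, test)
```

If a paper reports the best validation score across epochs or hyperparameters as its final result, that number corresponds to `max(val_scores)` here, not to `test_acc`, and it will be optimistic.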

Other interesting architectures to read about: graph transformers, Simple Graph Convolution (SGC), DiffPool, gPool, PinSAGE, DimeNet.

This very exciting area is still in its early stages, despite a lot of work already done. There is no well-working way to do transfer learning, for example. It is very hard to predict what will happen in 4-5 years, but e.g. Google Maps travel time prediction is currently based on GAT, and Pinterest recommendations on PinSAGE, so graph-based ML is already used in large-scale production systems. These methods are also increasingly used in the biological sciences, where molecular data is ubiquitous.
