Hello everyone,

I have a question about GCNs and would appreciate any thoughts. Do we typically use only one graph for GCN training/inference?

I'm asking because when I looked at the official DGL website, there was only one example graph after loading a dataset. Based on my experience with DNNs, I expected a batch of examples, but that was not the case for GCNs. I could find the PPI dataset with multiple graph examples (24), but for other widely used datasets (e.g., Cora, Citeseer, and Pubmed), there was only one.

Thank you!

Comments

laaweel t1_j4pm5q3 wrote on January 17, 2023 at 11:12 AM

Hello,
It depends on the problem, but it is also possible to train over many graphs.
I am also a beginner, especially in the area of graph neural networks, and found it very confusing that all the examples train on only one graph at a time.
But it seems to be no problem. I am currently training a model on 200k+ example graphs, and I predict node features.
I collected the dataset myself, though. But I think there are also datasets with many graphs in the field of biology / medicine.

Feel free to reach out if you need help :)

ramya_1995 OP t1_j51h0ka wrote on January 19, 2023 at 7:07 PM

Thank you u/laaweel!

ramya_1995 OP t1_j56ovcx wrote on January 20, 2023 at 7:51 PM

u/laaweel I have another quick question. The Cora dataset splits the labels into 140 for training, 500 for validation, and 1000 for testing (according to the DGL website). I found that these numbers correspond to numbers of nodes (it is a node classification problem). But any thoughts on why the sum (140+500+1000) does not match the total number of nodes in the Cora dataset (2708 nodes)? Is it because the rest of the nodes are unlabeled? Thank you!

laaweel t1_j56s7g8 wrote on January 20, 2023 at 8:12 PM

I didn't know it either but I found this blog post:

https://medium.com/mlearning-ai/ultimate-guide-to-graph-neural-networks-1-cora-dataset-37338c04fe6f#:~:text=Interesting!%20The%20training%20data%20contains%2020%20data%20for%20each%20class.%20The%20validation%20and%20test%20data%20do%20not%20have%20equal%20proportions%20of%20classes%2C%20but%20the%20two%20have%20a%20similar%20distribution.%20And%20these%20are%20similar%20to%20the%20percentages%20of%20classes%20in%20the%20overall%20data%20seen%20in%20the%20Classes%20section.
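
To make the arithmetic in the question concrete, here is a minimal sketch in plain Python. It uses only the numbers quoted in this thread (2708 nodes; 140/500/1000 split; 20 training nodes per class) plus the fact that Cora has 7 classes. The helper `build_masks` and the consecutive index ranges are hypothetical and purely illustrative; the real masks shipped with the dataset do not have to be laid out this way. As far as I know, the leftover nodes are simply not assigned to any of the three masks, but that is worth checking against the dataset itself.

```python
# Numbers quoted in the thread for the Cora split.
NUM_NODES = 2708
SPLIT_SIZES = {"train": 140, "val": 500, "test": 1000}
NUM_CLASSES = 7          # Cora has 7 classes
TRAIN_PER_CLASS = 20     # per the linked blog post

def build_masks(num_nodes, split_sizes):
    """Build one boolean mask per split over all nodes.

    Hypothetical layout: consecutive, non-overlapping index ranges.
    The real dataset's masks may place the nodes differently.
    """
    masks = {}
    start = 0
    for name, size in split_sizes.items():
        mask = [False] * num_nodes
        for i in range(start, start + size):
            mask[i] = True
        masks[name] = mask
        start += size
    return masks

masks = build_masks(NUM_NODES, SPLIT_SIZES)

# A node is "used" if it appears in at least one of the three masks.
in_any_split = [any(m[i] for m in masks.values()) for i in range(NUM_NODES)]
num_used = sum(in_any_split)
num_unused = NUM_NODES - num_used

print("nodes in some split:", num_used)
print("nodes in no split:", num_unused)
print("train size matches 20 per class:",
      NUM_CLASSES * TRAIN_PER_CLASS == SPLIT_SIZES["train"])
```

The masks are defined over all nodes of the single graph, so the three splits select subsets of nodes rather than separate graphs, and they do not need to cover every node.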

VonPosen t1_j573frm wrote on January 20, 2023 at 9:23 PM

You can train on multiple graphs.

multiple graph example
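
One common way to train on multiple graphs is to merge a mini-batch of small graphs into a single large disconnected graph, so a GCN layer can process them in one pass. The sketch below illustrates that idea in plain Python. The function `batch_graphs` and the `(num_nodes, edges)` representation are assumptions made for illustration, not the API of DGL or any other library.

```python
def batch_graphs(graphs):
    """Merge several graphs into one disconnected graph.

    Each input graph is a tuple (num_nodes, edges), where edges is a list
    of (src, dst) pairs using node ids local to that graph.
    Returns (total_nodes, merged_edges, graph_id_per_node).
    """
    total = 0
    merged_edges = []
    graph_ids = []
    for gid, (num_nodes, edges) in enumerate(graphs):
        # Shift local node ids so they do not collide with earlier graphs.
        merged_edges.extend((s + total, d + total) for s, d in edges)
        # Remember which graph each node came from (needed for per-graph readout).
        graph_ids.extend([gid] * num_nodes)
        total += num_nodes
    return total, merged_edges, graph_ids

# Two toy graphs: a 3-node path and a single 2-node edge.
g1 = (3, [(0, 1), (1, 2)])
g2 = (2, [(0, 1)])

total, edges, graph_ids = batch_graphs([g1, g2])
print(total)
print(edges)
print(graph_ids)
```

Because no edges connect nodes from different graphs, message passing on the merged graph never mixes information between examples.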

clemda2 t1_j5eliix wrote on January 22, 2023 at 1:10 PM

You CAN batch-train GCNs (or at least some of them are very amenable to it). Some of the most scalable GCNs rely on something like the GraphSAGE convolution, which doesn’t require the whole graph Laplacian for updates (this approach is used by Wikipedia, Uber, and Pinterest to train highly scalable GCNs). Other convolutional operators like GAT can also be batch-trained.

The documentation for the Python package PyTorch Geometric is a good jumping-off point for reading about practical graph subsampling.
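
As a rough illustration of the sampling idea behind GraphSAGE-style training, the sketch below picks a mini-batch of seed nodes, samples a fixed number of neighbors for each, and averages the sampled neighbors' features. It is plain Python, not PyTorch Geometric code: `sample_neighbors`, `mean_aggregate`, the toy adjacency list, and the `fanout` parameter are all hypothetical names chosen for this example. It also omits the learned weights and the combination with the node's own features that a real layer would apply.

```python
import random

def sample_neighbors(adj, seeds, fanout, rng):
    """For each seed node, keep at most `fanout` randomly chosen neighbors."""
    sampled = {}
    for node in seeds:
        neighbors = adj.get(node, [])
        if len(neighbors) <= fanout:
            sampled[node] = list(neighbors)
        else:
            sampled[node] = rng.sample(neighbors, fanout)
    return sampled

def mean_aggregate(features, sampled):
    """Average the feature vectors of each seed's sampled neighbors."""
    out = {}
    for node, neighbors in sampled.items():
        if not neighbors:
            # Isolated node: fall back to its own features.
            out[node] = list(features[node])
            continue
        dim = len(features[node])
        out[node] = [
            sum(features[n][k] for n in neighbors) / len(neighbors)
            for k in range(dim)
        ]
    return out

# Toy graph as an adjacency list, with 2-dimensional node features.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2], 4: [0]}
features = {n: [float(n), 1.0] for n in adj}

rng = random.Random(0)   # fixed seed so runs are repeatable
batch = [0, 2]           # mini-batch of seed nodes
sampled = sample_neighbors(adj, batch, fanout=2, rng=rng)
aggregated = mean_aggregate(features, sampled)
print(sampled)
print(aggregated)
```

Each update only touches the seed nodes and their sampled neighbors, which is why this style of training does not need the whole graph in memory at once.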