Replies: 7 comments
-
I also tried graphs with random node features. In this case, GNNs outperform MLPs by 2%, but compared with the provided features, overall performance drops by 45%.
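For reference, this ablation is easy to reproduce: swap the provided NLP embeddings for i.i.d. noise of the same shape before training. A minimal sketch, assuming the features are a numpy array; `randomize_features` is a hypothetical helper, not part of the IGB codebase:

```python
import numpy as np

def randomize_features(feat, seed=0):
    """Replace every node's feature vector with i.i.d. Gaussian noise
    of the same shape and dtype, discarding the NLP embeddings."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(feat.shape).astype(feat.dtype)

# e.g. 1,000 nodes with 1024-dim embeddings
feat = np.ones((1000, 1024), dtype=np.float32)
rand_feat = randomize_features(feat)
```

Training on `rand_feat` instead of `feat` isolates how much of the model's accuracy comes from the graph structure alone.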
-
Hi Mingxuan,
Here are the models: As you mentioned, the MLP performs quite well compared to the GNNs, but for similar runs and model sizes the GNNs do have an edge; this is an interesting observation that should be studied more closely. Regarding your comment about the 45% drop in performance, I just tested that as well and saw a similar drop. We ran this experiment as discussed in our paper and noticed that GNN performance drops significantly when we use synthetic node embeddings instead of node embeddings generated with NLP methods.
-
@akhatua2 can you please check the impact of the number of labeled nodes on MLP vs GNN performance? Perhaps having more labeled nodes helps the MLP; I don't recall testing this. The other experiment worth studying is reducing the embedding dimension and seeing its impact.
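The labeled-node experiment only needs the training mask subsampled at several budgets before each run. A minimal sketch assuming a boolean numpy training mask; `subsample_train_mask` is a hypothetical helper, not an IGB API:

```python
import numpy as np

def subsample_train_mask(train_mask, frac, seed=0):
    """Keep a random `frac` of the labeled (True) training nodes,
    so MLP vs GNN can be compared at different label budgets."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(train_mask)
    keep = rng.choice(idx, size=int(len(idx) * frac), replace=False)
    out = np.zeros_like(train_mask)
    out[keep] = True
    return out

mask = np.zeros(100, dtype=bool)
mask[:60] = True                      # 60 labeled nodes
half = subsample_train_mask(mask, 0.5)
```

Sweeping `frac` over, say, {0.1, 0.25, 0.5, 1.0} and retraining both models at each budget would show whether extra labels close the gap for the MLP.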
-
I ran the experiment with GraphSAGE and MLP models across variable embedding dimensions, training until convergence.
Note that the 512-dim embeddings come from a multilingual model (which causes the slight overall drop there). Essentially, the embedding dimension seems to have a major impact on the performance of the MLP model, but we need to study this further.
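A rough harness for such a dimension sweep could look like the following. This is a sketch, not the experiment actually run above: `reduce_dim` uses a random Gaussian projection as a cheap stand-in for re-encoding the text with a smaller NLP model, and `train_and_eval` is a placeholder for whatever MLP/GraphSAGE training loop you use:

```python
import numpy as np

def reduce_dim(feat, dim, seed=0):
    """Project the embeddings down to `dim` columns with a random
    Gaussian matrix (approximately norm-preserving after scaling)."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((feat.shape[1], dim)) / np.sqrt(dim)
    return (feat @ proj).astype(feat.dtype)

def sweep(feat, dims, train_and_eval):
    """Run `train_and_eval` once per target embedding width
    and collect the resulting scores."""
    return {d: train_and_eval(reduce_dim(feat, d)) for d in dims}
```

Note that a random projection only tests dimensionality, not embedding quality; results may differ from embeddings natively produced at each width by different NLP models.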
-
This is precisely why the IGB dataset was created: to study the impact of variable-size embeddings. If the community is interested in this sort of study, we should figure out the right APIs to support it; our current APIs do not readily provide this flexibility.
-
Dear authors,
Thanks for the timely and detailed answers! I trained MLPs and GNNs with an early-stopping strategy based on the validation loss, which gives 71% accuracy for GCN and 72% for the MLP. All the MLPs and GNNs I used are two-layer with 64 hidden dimensions. I feel that training for only 10 epochs probably does not fully utilize the capacity of either the MLPs or the GNNs. Could you try increasing the training steps or exploring early-stopping strategies to push the models a little further?
Best,
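For concreteness, the validation-loss early stopping described above can be implemented with a small patience counter; this is a generic sketch, not the exact setup used for the numbers quoted:

```python
class EarlyStopping:
    """Stop training once validation loss has failed to improve
    by `min_delta` for `patience` consecutive evaluations."""
    def __init__(self, patience=10, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Usage inside the epoch loop: `if stopper.step(val_loss): break`, typically combined with restoring the checkpoint from the best-loss epoch.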
-
I updated the results in my last comment. I ran the models until the training accuracy converged. The MLP does seem to perform very well; however, the node embedding dimension has a greater impact on its performance. We hypothesize that the node embeddings generated by the NLP models provide a lot of information, which strongly affects the MLP (which relies only on the embeddings), whereas the GNNs, which also learn from the graph structure, are impacted less. Studying MLP performance with respect to node embedding dimension, embedding model type, and fraction of labelled data is a very interesting experiment and requires more understanding.
Arpan
-
Dear IGB authors,
Thanks for releasing these amazing datasets that help promote the graph community!
I am excited and can't wait to try out some of these datasets. Below is one question I have:
I tried the tiny (100K) and small (1M) versions, and MLPs easily outperform GNNs with the same number of parameters, which is really strange. Usually we would expect a 10-40% performance drop when the graph structure is missing. This phenomenon suggests that the graph structure in this dataset is detrimental to node-classification performance.
Could you please help me resolve my question?
Best,
Mingxuan