Node Classification on Knowledge Graphs using PyTorch Geometric

37,482 views

Comments: 121

@prajwol_poudel 2 years ago

Do you need to include the F.softmax in the final classification layer of the model? I think torch.nn.CrossEntropyLoss does this internally for us.

@DeepFindr 2 years ago

Yep that's right! There are two variants of Cross entropy (one with logits and one without). The regular one already applies softmax, I think I forgot that in this video. I'm used to this since it was not always included. :D
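A minimal sketch of why the extra softmax matters, written in plain Python rather than PyTorch so it runs anywhere. The function names and the example numbers are made up for illustration; `cross_entropy` here only mimics the "log-softmax plus negative log-likelihood" behaviour discussed above for a single sample.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Cross entropy that applies log-softmax itself (single sample).

    This mirrors the behaviour described in the comments: the loss
    expects raw logits and normalizes them internally.
    """
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scores))
    return log_sum_exp - scores[target]

logits = [4.0, 0.0, 0.0]  # made-up raw outputs of the last layer for one node
target = 0                # made-up true class

# Correct usage: feed raw logits to the loss.
loss_from_logits = cross_entropy(logits, target)

# What happens with F.softmax in forward(): softmax is applied twice.
loss_double_softmax = cross_entropy(softmax(logits), target)

print(loss_from_logits, loss_double_softmax)
```

With the second variant the loss for a confident, correct prediction stays noticeably higher, because the probabilities are squashed a second time. Training can still work, but the gradients are weaker than they need to be.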

@elyseemanimpiregasana2117 2 years ago

@DeepFindr I think we used cross entropy since we have multi-class classification, and softmax is then the natural activation function for producing a probability for each class at the final layer, since it assigns a high probability to the predicted class.

@MyGeoStats 3 years ago

Best PyTorch Geometric tutorial ever. Thank you for saving my research project.

@kvnptl4400 3 months ago

Highly appreciate the effort. I really like how you started with the theoretical part and then worked on a real dataset. Thanks!

@Braininfection 3 years ago

Thank you so much. I watched so many videos about NNs and yours is one of the most impressive so far. I know it's a very special topic, but I hope that you will reach more people in the future! Your combination of theoretical aspects and practice is so helpful. I have seen that you just started. I hope you keep it up!

@DeepFindr 3 years ago

Thanks for the kind feedback! :)

@zjy1716 2 years ago

Thank you very much for this great tutorial! The best on KZbin

@simonwinkler2015 4 years ago

Never looked at it from that perspective, thanks man.

@MdFarhan-gh7ez 2 years ago

Great video!! How do we install the PyTorch Geometric packages for CUDA version 11.3? I see that the Colab notebook on my system is showing version 11.3. Thanks!

@DeepFindr 2 years ago

Simply select your version here and give it a try: pytorch-geometric.readthedocs.io/en/latest/notes/installation.html If you have conda installed on colab, things have become much easier with PyG. :)

@hizircanbayram9898 3 years ago

Fantastic video! Thanks for this hands-on video. Subscribed!

@TorjusNilsen-bs5st 3 years ago

Softmax was used in the final layer, but PyTorch's page on CrossEntropyLoss says the "criterion combines LogSoftmax and NLLLoss in one single class". Doesn't this mean we should skip the F.softmax in our forward pass, as it's already included in the loss function? Also, in this setting all data was input at once. If the graph and node features are too large for this, is it possible to train using batches of smaller graphs/adjacency matrices? Thank you for a good intro to GCNs!

@DeepFindr 3 years ago

Hi! Yes, at some point softmax was included in the cross-entropy loss; I didn't know that when I made this video. So it's not needed anymore :) If you have a large graph that doesn't fit into memory, there are generally two approaches: 1. Use sampling (check out Cluster-GCN or the neighbor sampler in PyG) - this is the recommended way. 2. Divide your graph into a couple of smaller graphs if possible - batching is supported by PyTorch Geometric. However, I think an implementation for a shared edge matrix is not available. Hope that helps
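To make the sampling idea from approach 1 concrete, here is a rough sketch in plain Python. It is not PyG's sampler; the function `sample_neighborhood`, the toy graph and the parameters are all hypothetical and only illustrate the principle of keeping a bounded number of neighbors per seed node, so a mini-batch never needs the full graph.

```python
import random

def sample_neighborhood(adjacency, seed_nodes, num_neighbors, rng):
    """Keep at most `num_neighbors` randomly chosen neighbors per seed node.

    Returns the sampled edges as (source, target) pairs and the set of
    nodes one message-passing layer would need for this mini-batch.
    """
    edges = []
    nodes = set(seed_nodes)
    for target in seed_nodes:
        neighbors = adjacency.get(target, [])
        k = min(num_neighbors, len(neighbors))
        for source in rng.sample(neighbors, k):
            edges.append((source, target))
            nodes.add(source)
    return edges, nodes

# Toy graph (made up): node -> list of neighbors
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2], 4: [0]}

rng = random.Random(0)  # fixed seed so the sample is reproducible
edges, nodes = sample_neighborhood(adjacency, seed_nodes=[0, 2],
                                   num_neighbors=2, rng=rng)
print(edges, nodes)
```

For a model with several layers, the same step would be repeated outward from the newly added nodes, one hop per layer.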

@TorjusNilsen-bs5st 3 years ago

@DeepFindr Yes, thanks! I have a project with high-feature image data, and a way to generate a graph representation from the images, in a semi-supervised problem. I wanted to try graph convolutions (out of interest; of course not much can compare to CNNs for computer vision) but realized GCNs lack generalization across graphs/Laplacians for batching, so I'm currently testing whether it can work for node classification (pixels) with batched images. Do you think it would be possible to train a CNN in a supervised way to create some kind of embeddings with a graph structure, and use that to classify nodes semi-supervised? Sorry for a very general question, but I haven't been able to find a lot of discussion on this as GCNs are relatively new.

@DeepFindr 3 years ago

Hi, if I understood you correctly, you are trying to convert an image to a graph. You might want to check out this blog post, which also talks about graphs in computer vision: medium.com/@BorisAKnyazev/tutorial-on-graph-neural-networks-for-computer-vision-and-beyond-part-1-3d9fada3b80d If you want to use node embeddings for predictions, you could maybe use a couple of CNN layers on fixed grids of an image, and then use the corresponding grid cells as nodes and the activations as node features. E.g. divide your image into a 5x5 grid and then apply the filters only on each section of the grid. Then you would have 25 nodes, which are connected according to their adjacency in the grid. And then you could perform message passing to learn about the rest of the image. I don't know :D it's actually just a trivial idea. What classification would you perform on each of the nodes? Object detection?

@TorjusNilsen-bs5st 3 years ago

@DeepFindr Haha, no no, I appreciate it :D My problem is semantic segmentation, i.e. classifying each pixel of an image. I was, however, interested in GCNs, so I wanted to try including them. The two possible ways I considered were: 1. Converting images to a graph directly (for which I have a method), and training a GCN similarly to a CNN. The problem here is that each image will have a different adjacency/graph Laplacian, and learning on one subgraph/image might not generalize well to others, so it could be hard/impossible to train weights across batches. 2. Similar to what you mentioned, I considered training some CNN layers in a supervised way and trying to create some graph structure from their embeddings (or superpixels) to see if it can somehow boost learning.

@DeepFindr 3 years ago

@TorjusNilsen-bs5st Hi, yes, the first approach sounds a bit sparse to me. But regarding different adjacency matrices, I wouldn't consider this a problem. If you use molecules, for instance, there might be ones with hundreds of atoms and very different adjacency matrices. But what would be the node feature information in the first case? The 3 color channels? Would you treat each pixel as a node? Thanks :)

@baklavatv4981 1 month ago

Can we do the same for a GIN as well, just by replacing GCNConv with GINConv?

@i_shy 1 year ago

Hello, is normalization of node features necessary?

@aaff8573 1 year ago

Is GNNExplainer capable of handling heterogeneous graph structures?

@El_Pancho_Alvarez 3 years ago

This is wonderful! Could you make a video of node classification for fraud detection?

@DeepFindr 3 years ago

Hi! Thanks :) sounds interesting. Sure can do that. However there are a couple of vids on the list I first need to finish :) Do you know a specific dataset for the fraud detection? Best

@JinoR-o6h 1 year ago

Great video! How do I then convert these predictions back to a graph to visualize it?

@nerdinvestdor 3 years ago

Hi, is the idea of sending the complete graph with some masked nodes what is called the transductive setting? Is it safe to assume that if we have another graph, we need to train the whole thing again?

@DeepFindr 3 years ago

Hi! Generally larger graphs can only be trained in a transductive setting. There exist other layer types like GraphSAGE that are also able to train inductively. If you have many smaller graphs you can of course also train them inductively. I've also seen inductive variants of GAT for example. So there are certainly workarounds :)

@YeeYes 2 years ago

Wonderful video! I have a question: could you please tell me why the train mask is very small compared to the test mask?

@DeepFindr 2 years ago

Hi! The size should always be the same. Or do you mean the number of ones in the mask?

@DeepFindr 2 years ago

How many ones are there in each of the masks?

@riyajatar6859 2 years ago

Thanks for such a nice explanation. Could you also make some videos and notebooks on inductive graph neural networks? That would be great fun.

@DeepFindr 2 years ago

I'll note it down :) thanks!

@emreipek4485 9 months ago

Hello sir. I have a question. In CNN training, the convolutional layers (kernels) are trained in order to provide better feature extraction. Is the GCNConv layer also trained during training? As I understand it, GCNConv layers just pass messages between nodes and aggregate them. In that case, there would be no parameters to train in GCNConv. Am I wrong? I guess only the linear layers are trained. Can you enlighten me, please?

@DeepFindr 9 months ago

Hi, there are also weights in GCNConv. For more details on this have a look at the Graph Attention video on my channel, where I discuss this in detail :)
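For reference, the layer-wise propagation rule from the original GCN paper (Kipf and Welling) shows where the trainable weights sit. This is the textbook formulation, not a transcription of the video's code: $A$ is the adjacency matrix, $H^{(l)}$ the node features at layer $l$, and $\sigma$ a nonlinearity.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)}\, W^{(l)} \right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

The normalized adjacency term $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is fixed and does the message passing and aggregation, while $W^{(l)}$ is a weight matrix learned by backpropagation, just like the kernels in a CNN.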

@emreipek4485 9 months ago

@DeepFindr Thank you sir :)

@bryancc2012 4 years ago

Detailed content! Really appreciated. This is still more like supervised training, with the existing class information used to train and predict. If we only have the citation information and the bag-of-words vectors (no class information at all), can we cluster the papers?

@DeepFindr 4 years ago

Thanks! Yes, this particular video is supervised. In the unsupervised situation we don't have the classes. Therefore another approach for optimizing the embeddings is required. Probably you are already familiar with these papers: paperswithcode.com/task/graph-clustering. The first one (Attributed Graph Clustering via Adaptive Graph Convolution) is actually also applied to the Cora dataset, where they try to partition the nodes into clusters. They use a graph filter / a k-order graph convolution, so basically another (unsupervised) layer type to generate the embeddings. These embeddings are then used to perform the clustering. I hope this is what you were looking for :) You will probably have to create the new layer yourself; if you need help, I can also make a video on that!

@cat-cu1cx 2 years ago

Thank you for this series! When you say 75% nodes are predicted correctly, that only includes within the 140 labelled datasets right? As we have no way to validate the rest of the nodes?

@DeepFindr 2 years ago

Yes exactly :)

@trangquyen4307 11 months ago

Hi, can you show how to save the best model and use it to predict on a totally new test set?

@santiagoinfantino2368 3 years ago

Ohh dude, thanks for posting these videos, I'll use this knowledge for my research

@santiagoinfantino2368 3 years ago

How are the BoW embeddings calculated here? I have my own dataset and would like to know how to create the embeddings for the node features :D

@DeepFindr 3 years ago

Happy it helps :) In that dataset they are pre-calculated, so I don't know in detail. But I would assume they are just based on word counts. So basically search your text and count the occurrence of every word. These counts represent your feature vectors then. Is that what you are looking for? :D
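A small sketch of the word-count idea described above, assuming whitespace tokenization and a vocabulary built from the documents themselves. The helper names and the three example "papers" are made up; how the pre-computed features in the dataset were actually produced is not stated in the thread.

```python
from collections import Counter

def build_vocabulary(documents):
    """Sorted list of all distinct lowercase tokens across the documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})

def bag_of_words(document, vocabulary, binary=False):
    """One feature vector per document: a count (or 0/1 flag) per vocabulary word."""
    counts = Counter(document.lower().split())
    if binary:
        return [1 if counts[word] > 0 else 0 for word in vocabulary]
    return [counts[word] for word in vocabulary]

# Made-up node texts, one per node
documents = [
    "graph neural networks",
    "neural networks learn",
    "graph graph kernels",
]

vocabulary = build_vocabulary(documents)
features = [bag_of_words(doc, vocabulary) for doc in documents]

print(vocabulary)  # ['graph', 'kernels', 'learn', 'networks', 'neural']
print(features)    # [[1, 0, 0, 1, 1], [0, 0, 1, 1, 1], [2, 1, 0, 0, 0]]
```

The resulting list of lists has shape [number of nodes, vocabulary size] and could be converted to a float tensor to serve as the node feature matrix x.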

@juanete69 18 days ago

Why is it "x" and not "self.x"? And why self.training and not training?

@kk008 1 year ago

Now, if I want to take a deep dive into these predictions to understand the decision-making process, which GNN explanation model or specific technique should I use?

@DeepFindr 1 year ago

Hi, I have some videos on XAI on graphs. I would look into GNNExplainer, as it is already implemented in PyTorch Geometric. There are some other libraries as well, such as Dive into Graphs.

@kk008 1 year ago

@DeepFindr Thank you so much. I will also look into your XAI videos. I'm really grateful.

@kk008 1 year ago

I have mailed you with some queries also.

@chandrasutrisno 2 years ago

Are GNNs sensitive to imbalanced datasets?

@DeepFindr 2 years ago

Yes, it's the same as with all other neural networks. But you can try weighted loss functions to improve the learning.
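One common way to get such weights is inverse class frequency. The sketch below is only an illustration with made-up labels; the helper name `inverse_frequency_weights` is hypothetical, and the particular formula n_samples / (num_classes * class_count) is one convention among several.

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    Classes that never occur get weight 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]

labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]  # made-up, imbalanced training labels
weights = inverse_frequency_weights(labels, num_classes=3)
print(weights)
```

In PyTorch, a list like this could be turned into a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that mistakes on rare classes cost more.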

@wilfredomartel7781 1 year ago

Nice explanation!

@alizindari4044 3 years ago

Hi, thanks for your great series on GNNs. Just a question: I have a case where I don't need the label for each node. I want to get the final feature vector of each node (for both training and test nodes) after training, i.e. after applying forward propagation. Can you please tell me how to do it? I need to compare these new vectors with the feature vectors from before training.

@DeepFindr 3 years ago

Hi :) Do you mean that you want to access the learned representations instead of the actual prediction? In the forward function you can simply return both the prediction and the propagated node embeddings. So basically, calling your model would look like this: pred, embedding = mymodel(inputs). Did I understand it correctly? :)

@DeepFindr 3 years ago

This way you can still train your model with the predictions and the loss function but also have access to the final embeddings

@alizindari4044 3 years ago

@DeepFindr Hi, I tried the code you suggested, but it just returned one value, the final prediction for each node with shape (2708, 7), whereas the final embedding should be (2708, 1433). Am I making a mistake?

@DeepFindr 3 years ago

@alizindari4044 Hi! Can you please send me a screenshot of the code at deepfindr@gmail.com? :) I'll reply via mail then. Thanks!

@johanaabizmil955 3 years ago

Very good explanations! If I want to try it on my own text dataset, I have to build a graph with geometric.transforms, right? Do you know of any documentation or tutorial for that?

@DeepFindr 3 years ago

Hi! In the documentation there is one page how to create your own dataset: pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html Is this what you are looking for? :)

@johanaabizmil955 3 years ago

@DeepFindr Yes, it is helping me, but I still don't understand where I can specify that I want a word as a node, for example.

@DeepFindr 3 years ago

I also found this github project. Take a look at the python notebooks, he is also creating custom datasets there. github.com/khuangaf/PyTorch-Geometric-YooChoose If it doesn't help, let me know and I will make a video on how to do that :)

@DeepFindr 3 years ago

Especially take a look at the YouChooseDataset he creates :)

@johanaabizmil955 3 years ago

@DeepFindr Thanks, I'll let you know :)

@JorGe-eu3wi 3 years ago

Great video! Can you show what the code would look like when using a dataset with more than one graph, and how to use certain graphs only for training and others only for testing? Thank you!

@DeepFindr 3 years ago

Hi :) in part 3 of the GNN series I uploaded I show an example for molecules. Hope that helps!

@jff711 3 years ago

Very nice video, thank you!

@AI_ML_DL_LLM 3 years ago

Hi sir, when you mask out node A from the training, does it mean that other nodes will not see it at all during training and message passing (both the features and the label of node A are totally ignored, as if it didn't exist)? Or will the features of node A be passed on to other nodes, while the label of node A is ignored in the loss function? Thank you again.

@DeepFindr 3 years ago

Hi! The mask is just applied to the loss. This is also called transductive learning, because the model has seen all node embeddings during training. If you want to train inductively, you need to use layers like GraphSAGE which will sample parts of the graph :) Best regards
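A tiny plain-Python sketch of "the mask is just applied to the loss". All numbers are made up and `masked_nll` is a hypothetical helper: the model would produce outputs for every node (so every node takes part in message passing), but only the nodes selected by the mask contribute to the loss.

```python
import math

def masked_nll(log_probs, labels, mask):
    """Mean negative log-likelihood over the nodes where mask is True."""
    losses = [
        -log_probs[i][labels[i]]
        for i in range(len(labels))
        if mask[i]
    ]
    return sum(losses) / len(losses)

# Made-up class probabilities for 4 nodes and 2 classes (output for ALL nodes)
probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.3, 0.7]]
log_probs = [[math.log(p) for p in row] for row in probs]

labels = [0, 1, 1, 0]
train_mask = [True, True, False, False]  # only nodes 0 and 1 are used for training

train_loss = masked_nll(log_probs, labels, train_mask)

# Changing the labels of masked-out nodes does not change the training loss.
other_labels = [0, 1, 0, 1]
same_loss = masked_nll(log_probs, other_labels, train_mask)
print(train_loss, same_loss)
```

In PyG-style training code the same thing is usually expressed by indexing the model output and the labels with the boolean train mask before passing them to the loss function.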

@AI_ML_DL_LLM 3 years ago

@DeepFindr Many thanks, it was a great help

@itprime2399 2 years ago

First of all, thank you so much Sir for such an amazing lecture on GNN. Sir, I want to extract every class data in the form of a graph after node classification. How can I do that?

@DeepFindr 2 years ago

You mean how to convert your data to a graph dataset?

@umarmalhi3072 4 years ago

Thanks, man. It really helped me!

@jonimatix 3 years ago

I'm really enjoying learning about GNNs, so thanks a lot for your video, and keep posting :) One question: how is PyTorch Geometric different from GraphSAGE?

@DeepFindr 3 years ago

Thanks for your feedback! GraphSAGE is just one implementation of a GNN layer. It's especially intended for larger graphs, as it uses sampling while training the model. Besides GraphSAGE there are many other layer types like Graph Conv, Graph Attention, Graph Isomorphism... All of those are available in PyTorch Geometric, along with many more, and GraphSAGE as well (it's called SAGEConv in PyG). Hope that answers the question :)

@jonimatix 3 years ago

@DeepFindr Yes, thanks a lot! Do you think you can create a tutorial on how to build a GNN from scratch (including dataset preparation, etc.) for product recommendations? That would be really helpful for understanding how to apply GNNs to a concrete problem. Thanks

@DeepFindr 3 years ago

Hi:) actually something similar is already there. In my video series GNN Project I talk about all of this e.g. Dataset creation in part 2. But it's not for product recommendations unfortunately :/

@Fdutchman 2 years ago

Thank you very much :)

@awadelrahman 3 years ago

At 3:30, I think you meant bidirectionally instead of uni-directionally?

@DeepFindr 3 years ago

Oh yes, absolutely :) I see you are a very attentive viewer, thanks!!

@awadelrahman 3 years ago

@DeepFindr Great videos! This is why :)

@lolokay7044 3 years ago

Thank you so much for this, super helpful. Is there one on graph regression in gnn as well?

@DeepFindr 3 years ago

Hi! I haven't uploaded a video on graph regression yet, but changing from classification to regression is very simple. The GNN architecture stays the same; only the output layer needs to change (e.g. a single value per node), and you need to adjust the loss function to e.g. MSE, and that's it :)

@lolokay7044 3 years ago

@DeepFindr Oh, thank you! Do you have any particular dataset in mind that I can work on for graph regression?

@DeepFindr 3 years ago

Unfortunately there is no node-level regression dataset available in PyG and other libraries :/ If you want to perform regression in order to "practice", you can of course take any binary node-classification dataset, simply treat the class labels (0 and 1) as float values, and perform node-level regression. Another example is traffic networks (predicting the travel speed for nodes), which is also a regression problem. It does, however, also require considering a temporal component. I'm currently working on a video on that. :)

@lolokay7044 3 years ago

@DeepFindr Thank you so much for the replies, really appreciate your work. Looking forward to the video!

@Missbeautyqueeb 2 years ago

How would you calculate the AUC score and plot a ROC curve for this one, i.e. for a node-classification GNN?

@DeepFindr 2 years ago

Hi, the easiest way is to convert the predictions and labels for each node of interest (after applying the test mask) to two arrays and feed them to sklearn's roc_auc_score or something like that. Let me know if this helps :)
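To show what such a score computes, here is a dependency-free sketch of a one-vs-rest, macro-averaged AUC on made-up test-node outputs. It is a stand-in for a library routine, not a replacement: `binary_auc` and `one_vs_rest_auc` are hypothetical helpers, and the pairwise loop is quadratic, so it is only meant for small examples.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def one_vs_rest_auc(probabilities, labels, num_classes):
    """Macro average of the per-class (one-vs-rest) AUCs."""
    aucs = []
    for c in range(num_classes):
        scores = [row[c] for row in probabilities]
        binary = [1 if y == c else 0 for y in labels]
        aucs.append(binary_auc(scores, binary))
    return sum(aucs) / num_classes

# Made-up class probabilities and labels for 4 test nodes (after the test mask)
test_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.5, 0.2, 0.3]]
test_labels = [0, 1, 2, 1]

macro_auc = one_vs_rest_auc(test_probs, test_labels, num_classes=3)
print(macro_auc)  # 0.875
```

The inputs are exactly the "two arrays" mentioned above: one with the predicted class probabilities of the test nodes and one with their true labels.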

@peterstrom8522 4 years ago

Nice! Thanks!

@DeepFindr 4 years ago

Sure, no problem. I hope this is what you were looking for :)

@giannismanousaridis4010 3 years ago

In the GCN class, after calling super(), you've written "torch.manual_seed(42)". Is this used?

@DeepFindr 3 years ago

Hi! Yes I would say so (implicitly). Whenever there are random processes (like random weight initialization in a neural network) the seed ensures that the numbers are reproducible.

@DeepFindr 3 years ago

You could also place this after import torch :)

@giannismanousaridis4010 3 years ago

@DeepFindr Thanks for the explanation

@ry_zakari 3 years ago

@DeepFindr Hello, thank you so much for these wonderful tutorials. I would like to talk to you in private; I'm currently a PhD student. My mail ID: rufaig6@gmail.com

@inFamous16 2 years ago

Hey, Can you please make a video on how to perform Node Classification on custom datasets? I currently have a csv file containing source_node, relation and destination_node. How to generate node_vector for the custom datasets?

@DeepFindr 2 years ago

Hi! I have two recent videos on that topic - how to convert a tabular dataset to a graph. :)

@inFamous16 2 years ago

@DeepFindr I have loaded the CSV into Neo4j and am able to access the KG, but I don't understand how to generate the node vectors. The Cora dataset comes with node vectors, which are generated using a BoW model. Isn't there any way to calculate node vectors for a custom KG when we don't have a dictionary to generate BoW?

@DeepFindr 2 years ago

Each of the elements in your knowledge graph should have attributes, out of which you can build the node vector. If they don't have properties, you can't build node features. The BoW node features are based on the publications in Cora, so those are also simply properties of the nodes.

@inFamous16 2 years ago

@DeepFindr OK sir, thank you for your valuable time

@siddhantmathur7319 3 years ago

Hey, I have a conventional CSV training dataset. Can you tell me how to integrate it with pytorch_geometric, so that rather than using the features of the datasets already present in the library, I can create my own custom datasets and apply a GNN to them for classification?

@DeepFindr 3 years ago

Hi! On the documentation website there is also a section on how to create a custom dataset: pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html Also you would need to connect your data points in the csv somehow. That means you need to create the adjacency information. Let me know if you need further help :)
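As a rough illustration of "creating the adjacency information", the sketch below turns an edge-list CSV into integer node ids and a COO-style edge index using only the standard library. It assumes the column names source_node, relation and destination_node mentioned in an earlier comment; the helper name `triples_to_graph` and the sample rows are made up.

```python
import csv
import io

def triples_to_graph(csv_text):
    """Map node/relation names to integer ids and build COO edge lists.

    Returns (node_index, relation_index, edge_index, edge_types), where
    edge_index is [sources, targets] with one column per CSV row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    node_index, relation_index = {}, {}
    sources, targets, edge_types = [], [], []
    for row in reader:
        # setdefault assigns the next free id the first time a name appears
        src = node_index.setdefault(row["source_node"], len(node_index))
        dst = node_index.setdefault(row["destination_node"], len(node_index))
        rel = relation_index.setdefault(row["relation"], len(relation_index))
        sources.append(src)
        targets.append(dst)
        edge_types.append(rel)
    return node_index, relation_index, [sources, targets], edge_types

# Made-up example data
csv_text = """source_node,relation,destination_node
paper_a,cites,paper_b
paper_b,cites,paper_c
paper_a,written_by,author_x
"""

node_index, relation_index, edge_index, edge_types = triples_to_graph(csv_text)
print(node_index)   # {'paper_a': 0, 'paper_b': 1, 'paper_c': 2, 'author_x': 3}
print(edge_index)   # [[0, 1, 0], [1, 2, 3]]
print(edge_types)   # [0, 0, 1]
```

The two inner lists of `edge_index` have the [2, number of edges] layout that PyTorch Geometric expects once converted to a long tensor. Node features still have to come from node attributes, as discussed in the thread.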

@siddhantmathur7319 3 years ago

@DeepFindr Exactly! I am not able to connect the data points in the CSV to produce the adjacency info (edges, attributes, etc.) so that I can load them in the format:
from torch_geometric.data import Data, DataLoader
data_list = [Data(...), ..., Data(...)]
loader = DataLoader(data_list, batch_size=32)

@DeepFindr 3 years ago

@siddhantmathur7319 I think I should make a video on this in the future. Further down is another comment with the same question - I linked an example repository there in which they go from a table to a PyTorch Geometric dataset. Hope that helps :) Otherwise let me know and I will make a video in the next couple of weeks!

@ShaliniRamnath 3 years ago

Thank you BOSS !!

@김은송-u1n 3 years ago

Thanks so much!

@mhadnanali 3 years ago

Make playlists, please. I visited your channel and the videos are scattered. You will get more views.

@DeepFindr 3 years ago

Actually I have Playlists for the different topics. Somehow the one for GNNs doesn't appear. I have to check. Thanks for the hint

@mhadnanali 3 years ago

@DeepFindr That would be nice. It's difficult to keep track. Sometimes you mention in the video "I explained in the previous video" and viewers have no idea which one was the previous video. Maybe you can add links in the description. Your content is great. I hope you get more views and stay motivated.

@DeepFindr 3 years ago

Hi! Sure I understand that :) Yep I should work a bit on the descriptions.

@DeepFindr 3 years ago

I just realized that the GNN Playlist was set to private. So now it should be in chronological order. Thanks again for your comment

@mhadnanali 3 years ago

@DeepFindr I just saw it. That seems better. Thank you for the content and your work. More power to you.

@البداية-ذ1ذ 3 years ago

Hi again, could you please guide me on how I can convert an image to a graph representation using Python? I previously used MATLAB, with SIFT to extract features and another function to extract edges. I have a lung dataset and I want to extract it as a graph; my goal is to study the spread of disease in this area, either erosion or dilation. I really appreciate your help in advance 🙂

@DeepFindr 3 years ago

Hi. I have never done that before, but intuitively I would write a function that creates a custom dataset for PyTorch. Here you would pass #pixels times the extracted node features as the node feature matrix. That means you rearrange the shape of your image and add the node features. So x would have a shape of [pixels, node features]. Then you have to add the connectivity information somehow; the only way I can think of is to connect each pixel to all the surrounding pixels. For PyTorch Geometric you can create a COO-format list of connection pairs. Best regards :)

@البداية-ذ1ذ 3 years ago

@DeepFindr Thanks a lot for your quick response, I have learned a lot from you. Regarding your answer, do you mean that you don't normally use images as graph representations, or that you just use built-in datasets? I'm searching for images for different test experiments, either medical or normal. That's why I'm thinking of converting the RGB images to graphs.

@DeepFindr 3 years ago

@البداية-ذ1ذ I meant that I never tried to convert images to graphs. Maybe you should try to use Convolutional Neural Networks instead, as they are better suited for images? Or do you want to do some fancy stuff with graphs? :D

@البداية-ذ1ذ 3 years ago

@DeepFindr The idea is to see how the graph will behave with a CNN. Thanks again

@DeepFindr 3 years ago

@البداية-ذ1ذ OK, yeah, sure. Sounds interesting :) But it should be pretty easy to convert the images to graphs. You just create a feature vector for each pixel, and this vector could for instance contain the color values (RGB). For the adjacency info you then simply take the surrounding 8 pixels (with exceptions for the borders). So you have a node feature matrix of size [# pixels x node features (RGB)] and an adjacency matrix of size [# pixels x # pixels]. I would write a function im2graph or something that does this for each of the images. Cool project! I would be interested to hear about the result :)
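A minimal sketch of what such an im2graph function could look like, in plain Python. It assumes the image is given as nested lists of (r, g, b) tuples and connects every pixel to the up to 8 pixels directly around it (no self-loops); the function body and the tiny test image are illustrative assumptions, not code from the video.

```python
def im2graph(image):
    """Convert an image to node features and a COO edge list.

    image: list of rows, each row a list of (r, g, b) tuples.
    Returns (x, edge_index) where x has one RGB feature vector per pixel
    and edge_index = [sources, targets] links each pixel to its
    neighbors in the surrounding 3x3 window.
    """
    height, width = len(image), len(image[0])
    # Row-major flattening: pixel (r, c) becomes node r * width + c
    x = [list(pixel) for row in image for pixel in row]
    sources, targets = [], []
    for r in range(height):
        for c in range(width):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    nr, nc = r + dr, c + dc
                    # Border pixels simply have fewer neighbors
                    if 0 <= nr < height and 0 <= nc < width:
                        sources.append(r * width + c)
                        targets.append(nr * width + nc)
    return x, [sources, targets]

# Made-up 2x3 RGB image
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (20, 20, 20), (30, 30, 30)],
]

x, edge_index = im2graph(image)
print(len(x), len(edge_index[0]))  # 6 nodes, 22 directed edges
```

Storing the connections as an edge list instead of a dense [# pixels x # pixels] matrix keeps memory manageable, since each pixel has at most 8 neighbors.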