Hi Mashaan, it should be noted that torch.max with dim=-1 gives you the max of each row, since the last dimension runs along a row. This differs from the order used when indexing a tensor, e.g. Tensor([[0,1],[2,3]])[0,1] means row 0, col 1, which seems more intuitive. The same applies to torch.mean. (* Referring to the early explanation of the plot attention function segment.)
@mashaan14 8 days ago
You're right. At 30:30 I said "the maximum along the column dimension"; it actually takes the maximum along each row.
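To make the axis convention in this thread concrete, here is a minimal sketch in plain Python, where nested lists stand in for a 2-D tensor so it runs without PyTorch. Reducing over the last axis, which is what dim=-1 selects, yields one maximum per row; reducing over axis 0 yields one per column. The helper names are made up for illustration.

```python
# A 2x3 "tensor" stored as nested lists: x[row][col].
x = [[0, 5, 2],
     [7, 1, 3]]

def max_over_last_dim(t):
    """One maximum per row, like torch.max(t, dim=-1).values."""
    return [max(row) for row in t]

def max_over_first_dim(t):
    """One maximum per column, like torch.max(t, dim=0).values."""
    return [max(col) for col in zip(*t)]

row_max = max_over_last_dim(x)   # [5, 7]
col_max = max_over_first_dim(x)  # [7, 5, 3]
print(row_max, col_max)
```

Indexing with x[0][1] still reads "row 0, col 1"; only the reduction axis is named by the dimension that disappears.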
@honglu679 20 days ago
Man, you did a great job digging into the code details and adding your own thoughts. I usually don't leave comments, but your video is way better than the ones that claim to teach something complicated in 10 or 15 minutes with random visualizations. One suggestion: maybe you could do a video on the code analysis of Meta AI's Omnivore and OmniMAE. They are extensions of Swin Transformer but support both video and image.
@mashaan14 20 days ago
I'm so glad that you liked the video. Thanks for suggesting these two papers. I'll definitely look into those. The thing is I'm recording two videos on an entirely different topic. It might take me a while before getting back to vision transformers.
@inquisitiverakib5844 29 days ago
Great! Can you make a separate video on depth-wise and point-wise convolution?
@mashaan14 29 days ago
Thanks for your suggestion. I'll add it to my to do list. The thing is I'm recording two videos on an entirely different topic.
@0兒-y4c 1 month ago
Hi sir, I'm a student studying this topic. I would like to use Swin Transformer for object detection in my project. How can I accomplish that? Thank you, sir.
@mashaan14 1 month ago
Usually, an image classification model is used at the beginning of an object detection pipeline, and it's called the backbone. Most object detection pipelines use ResNet as the backbone. I assume that you want to replace ResNet with Swin, just as they did in the paper (section 4.2). If that's the case, your best option is to use the MMDetection library. They already include Swin as a backbone on their GitHub: github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/backbones/swin.py
@pradyumagarwal3978 1 month ago
You said the 200-epoch test you ran is not a proper experiment to judge the quality of this transformer architecture. So other than increasing the C value back to 96, what other things should I look into to experiment with and get the best performance out of this architecture?
@mashaan14 1 month ago
The settings I used in the video were simple, just to get a taste of this transformer. In my opinion, a proper experiment would be replicating the results in the paper on the ImageNet-1K dataset (the ones in Table 1). This way we can judge the model, and then look for improvements.
@pradyumagarwal3978 1 month ago
@@mashaan14 I'm sorry, which table? (Also, big thanks: your videos and replies have been a big help. However, is there any chance I can ask somewhere more convenient than YouTube comments?)
@mashaan14 1 month ago
Table 1 on page 6 of the Swin Transformer paper. You can text me on Twitter or LinkedIn, whichever is convenient for you. twitter.com/mashaan_14 linkedin.com/in/mashaan
@pradyumagarwal3978 1 month ago
@@mashaan14 Okay thanks
@pradyumagarwal3978 1 month ago
You set the patch size to 4 and the window size to 6, and C went from 96 to 48. What were the default values for patch size and window size? Also, this model does not have any positional encoding, right? Adding the regular sin-cos positional encoding or, better, relative positional encoding is a possible improvement, right?
@mashaan14 1 month ago
The defaults were patch size=4 and window size=7. You can find all default settings in: github.com/microsoft/Swin-Transformer/blob/main/config.py You're right, there is no positional encoding in this model. The model uses a relative position bias instead, and they said, “We observe significant improvements over counterparts without this bias term or that use absolute position embedding”. They added the relative position bias inside the WindowAttention class: github.com/microsoft/Swin-Transformer/blob/f82860bfb5225915aca09c3227159ee9e1df874d/models/swin_transformer.py#L101
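As a rough illustration of how a relative position bias can be looked up inside one attention window, here is a sketch in plain Python. It is not the repository's code: the function name is made up, and it only reproduces the indexing idea (shift each relative offset to be non-negative, then flatten the two offsets into one table index), assuming a square window.

```python
def relative_position_index(window_size):
    """Index into the bias table for every (query, key) pair in one window."""
    m = window_size
    # Flattened (row, col) coordinates of the m*m positions in a window.
    coords = [(h, w) for h in range(m) for w in range(m)]
    index = []
    for (h1, w1) in coords:
        row = []
        for (h2, w2) in coords:
            dh = h1 - h2 + (m - 1)  # shifted into 0 .. 2m-2
            dw = w1 - w2 + (m - 1)
            row.append(dh * (2 * m - 1) + dw)
        index.append(row)
    return index

window_size = 7  # the default mentioned above
index = relative_position_index(window_size)
table_size = (2 * window_size - 1) ** 2  # learnable bias entries per head
print(len(index), len(index[0]), table_size)
```

Pairs of positions with the same relative offset share one table entry, which is why the table needs only (2M-1)^2 entries per head rather than one per pair.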
@pradyumagarwal3978 1 month ago
Is the notebook where you test the model with C = 48 and 200 epochs available somewhere? I would really like to check it out.
@mashaan14 1 month ago
Here you go: github.com/mashaan14/KZbin-channel/blob/main/notebooks/2024_08_19_swin_transformer.ipynb
@pradyumagarwal3978 1 month ago
@@mashaan14 thankssss
@mashaan14 1 month ago
Hi everyone 👋 It's been a while since I posted this video, and it's time to reflect. First, there are multiple ways to visualize attention in vision transformers. This paper (“Transformer Interpretability Beyond Attention Visualization”, arxiv.org/abs/2012.09838v1 ) compared different visualization methods. What I did in this video is just test the attention at the first layer by feeding in a test image and pulling out the response, which is the query, key, and value matrices. Multiplying the query matrix by the transposed key matrix gives us a square matrix showing how each patch is “paying attention” to every other patch. If we order the patches back to their positions in the original image, we'll see which patches have the highest attention values. I updated the code by adding more comments and printouts, just to make it more readable.
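The query-times-transposed-key step described above can be sketched in plain Python as follows. The numbers are invented toy values (4 patches on a 2x2 grid, embedding size 2), and the sketch stops at the raw scores; a real model would typically also scale them and apply a softmax.

```python
def matmul_qk_t(q, k):
    """Return Q @ K^T for lists of row vectors."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

# Hypothetical toy values: 4 patches (a 2x2 grid), embedding size 2.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
k = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]

# 4 x 4 matrix: scores[i][j] is how strongly patch i attends to patch j.
scores = matmul_qk_t(q, k)

# Put the row of patch 0 back onto the 2x2 patch grid of the image.
grid_h, grid_w = 2, 2
row0 = scores[0]
attention_map = [row0[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]
print(attention_map)
```

Each row of the square matrix becomes one small image-shaped map, which is what gets plotted over the input image.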
@tylervandermate 1 month ago
This is exactly what I've been trying to find for model visualization. Thank you! It's difficult finding any in-depth info on transformers involving the query, key, and value matrices.
@mashaan14 1 month ago
Thank you, that's great to hear. I'm currently working on a Swin transformer visualization video. Hopefully, I'll post it in a week or so.
@petermerrill9803 1 month ago
Excellent explanations. Thank you very much.
@mashaan14 1 month ago
Glad it was helpful!
@akashprajapathi6056 2 months ago
Of course
@yasir_rashid7780 3 months ago
Great video, sir, with a lot of information. Sir, can we use GCN for identifying influential nodes in social networks?
@mashaan14 3 months ago
Thanks for your question. It made me go and dig a little. I read a couple of papers and found that GCN can't be used alone to find influential nodes. However, it can be used as a preliminary step. For example, in the paper “Finding Critical Users in Social Communities via Graph Convolutions”, the authors used GCN before graph attention to learn “criticalness” over nodes. By criticalness, they mean influential nodes. Another paper, “SocialGCN: An Efficient Graph Convolutional Network based Model for Social Recommendation” ( arxiv.org/pdf/1811.02815 ), built a recommender system based on GCN embeddings rather than the feature matrix. They stated that “the proposed SocialGCN model is flexible when the user and item attributes are not available”, which makes sense, because if a node does not have features, GCN will assign it embeddings similar to those of its neighbors.
@mashaan14 4 months ago
Mistakes in the video (the ones that I know about 😅):
- I changed the implementation of the neighbor_sampler function. It previously took about one minute to sample from the Cora dataset. Now it samples in around 10 seconds, because I removed the unique function and used sparse matrix indexing. The link in the description has the new code.
- At 32:54 I said that the probability of picking an edge is 1. That's not true: it's 1/|E|, where |E| is the number of edges in the graph. I fixed it in my GitHub notes.
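A quick way to sanity-check the 1/|E| correction is to sample edges uniformly and compare the frequencies. This is a minimal sketch on an invented toy graph; it is unrelated to the neighbor_sampler code itself.

```python
import random

# Hypothetical toy graph with 4 undirected edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Uniform sampling: each edge has probability 1/|E|, not 1.
p_edge = 1 / len(edges)

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {e: 0 for e in edges}
n_draws = 10_000
for _ in range(n_draws):
    counts[rng.choice(edges)] += 1

# Empirical frequencies should be close to 1/|E| = 0.25.
freqs = {e: c / n_draws for e, c in counts.items()}
print(p_edge, freqs)
```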
@Savi_Ann 4 months ago
Nice visualization!
@oneplus383 4 months ago
Short but informative
@AbdulQadeerRasooli-l8k 4 months ago
Hello sir, if possible, please make a video on the practical work of this paper. Paper title: GCN-FFNN: A two-stream deep model for learning solution to partial differential equations
@mashaan14 4 months ago
Thanks for your suggestion. I skimmed through the code on GitHub. I think the main contribution is class Ensemble(), which you can find in (models.py). That class concatenates the outputs from GCN and FFNN. I'll try to fit the paper into one of my upcoming videos.
@tomoki-v6o 5 months ago
Great trick. I would use (X-XT)**2+(Y-YT)**2
@mashaan14 4 months ago
That's the beauty of coding, it can be done in different ways..
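Assuming the expression in this thread is meant as a pairwise squared Euclidean distance (X and Y being column vectors of point coordinates, XT and YT their transposes), it needs a difference in both terms. A loop-based sketch in plain Python, with invented points and a made-up helper name, looks like this:

```python
# Hypothetical 2-D points: xs[i], ys[i] are the coordinates of point i.
xs = [0.0, 3.0, 0.0]
ys = [0.0, 4.0, 1.0]

def pairwise_sq_dist(xs, ys):
    """d[i][j] = (x_i - x_j)**2 + (y_i - y_j)**2.

    This is the loop form of the broadcast expression (X-XT)**2 + (Y-YT)**2.
    """
    n = len(xs)
    return [[(xs[i] - xs[j]) ** 2 + (ys[i] - ys[j]) ** 2 for j in range(n)]
            for i in range(n)]

d = pairwise_sq_dist(xs, ys)
print(d[0][1])  # 25.0, i.e. 3**2 + 4**2
```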
@mashaan14 5 months ago
Code walkthrough: kzbin.info/www/bejne/r2K9noCZgr6dobs You can access the notebook on GitHub: github.com/mashaan14/VisionTransformer-MNIST/blob/main/VisionTransformer_MNIST.ipynb
@mashaan14 5 months ago
After playing with JAX, I don't feel comfortable linking the notebook I showed in the video. Most of the video content is still valid because it shows the difference between PyG and Jraph. However, in the notebook I used Haiku, which is not recommended by Google DeepMind. They recommend using Flax instead. So, I linked a new notebook showing GCN code in JAX/Flax: github.com/mashaan14/KZbin-channel/blob/main/notebooks/2024_03_21_jraph_GCN.ipynb Here's another video where I explained graph attention code in JAX/Flax: kzbin.info/www/bejne/hWLdeIqDesyKbaM
@mehmeterenbulut6076 5 months ago
Hi man, beautiful video explaining both libraries! Loved your explanation; clear and on point. About the test results of PyG and Jraph being different: I think it is because, even though both reached 100% training accuracy (which also means they overfitted the data), the decision boundaries they draw for the training set are not necessarily the same. One reason they might differ is that PyG's and Jraph's GNN weights are probably initialized randomly. Different decision boundaries can then easily lead to two different results on the test set.
@mashaan14 5 months ago
I loved your explanation, and yeah, it totally makes sense. But the cause was far simpler than that: I was training on two different feature matrices. If you look at the Jraph part, I passed this command: nodes=jnp.eye(data_Cora.x.shape[0]) I was training Jraph on the identity matrix while training PyG on the feature matrix. I know it's crazy how close Jraph got with only the identity matrix. Anyway, I couldn't fix the notebook in the video because it was written in Haiku. So I took it down and wrote a new one with JAX/Flax: github.com/mashaan14/KZbin-channel/blob/main/notebooks/2024_03_21_jraph_GCN.ipynb I'd love it if you could take a look at the new code.
@mashaan14 5 months ago
There was a mistake in the video. I accidentally used the identity matrix instead of the feature matrix when I packaged the graph in a jraph.GraphsTuple. This line of code: nodes=jnp.eye(data_Cora.x.shape[0]), should be changed to: nodes=jnp.asarray(data_Cora.x), I fixed it on GitHub, and the notebook should work fine now.
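One way to see why training on the identity matrix still worked reasonably well: with identity "features", the product X @ W of a first layer is just W, so every node gets its own learnable row as an embedding. The sketch below shows that in plain Python with invented sizes and weights; it does not use Jraph or the notebook's model.

```python
def eye(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical first-layer weights: 3 nodes, hidden size 2.
w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# With identity features, each node simply selects its own row of W.
h = matmul(eye(3), w)
print(h == w)  # True
```

The model then relies only on the graph structure plus per-node embeddings, whereas the feature matrix also brings in the content of each document.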
@oneplus383 5 months ago
Very good explanation. 😊
@mashaan14 5 months ago
Glad it was helpful!
@doublesami 5 months ago
Very informative. Can you please make a video on Vision Mamba or VMamba and explain the theory as well as the implementation? Looking forward to it.
@mashaan14 5 months ago
Thanks, I just checked VMamba on github. Sure I'll add it to my todo list. The thing is I'm recording a series on graph neural networks. Once I'm done with that, I'll get back to vision transformers.
@oneplus383 5 months ago
How can we convert a PyTorch model to JAX?
@mashaan14 5 months ago
Actually, I'm not aware of any tool that can convert PyTorch to JAX.
@oneplus383 5 months ago
@@mashaan14 Can you give me some idea of what I need to change to move PyTorch code to JAX?
@mashaan14 5 months ago
@@oneplus383 Instead of torch tensors, you have to use jax.numpy arrays. For each PyTorch layer, check its equivalent in the JAX documentation. Start with something small and build your way up. Here's how to code a simple NN in JAX: kzbin.info/www/bejne/fX-vgJRqp86sqZo
@oneplus383 5 months ago
Thanks, brother Mashaan
@oneplus383 5 months ago
Do you live in the United States?
@nill513 6 months ago
Kudos to you, sir!
@mashaan14 6 months ago
happy to help..
@Sridhar.SubramanianMtech2023 6 months ago
Thank you for this insightful video 😊
@dossantos4415 6 months ago
Could you do the same but for NLP?
@mashaan14 6 months ago
I guess you want maps similar to the ones in this paper: Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, “Neural machine translation by jointly learning to align and translate”, 2014. If that's the case, I'm sorry, I'm not familiar with that topic.
@vivekdabholkar5965 6 months ago
Mashaan, Very proud of you, your accomplishments, and sharing of knowledge!
@mashaan14 6 months ago
thanks Vivek.. great to reconnect with you. It’s been a long time!!
@Falconoo7383 6 months ago
Very informative video...
@mashaan14 6 months ago
Glad you liked it
@Falconoo7383 6 months ago
Thank you for the informative video to explain softmax. Can you please explain how to install manim easily?
@mashaan14 6 months ago
I install manim in Google Colab using the command: !pip install manim You can check the code I used to make this video: github.com/mashaan14/manim/blob/main/manim_visualizeSoftmax.ipynb
@Falconoo7383 6 months ago
@@mashaan14 Thank you. Last time I tried it, it didn't work properly, and I don't know why. Let me check again.
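For readers following the softmax video discussed in this thread, here is a small, hedged sketch of the function itself in plain Python. It uses the common max-subtraction trick for numerical stability, which is a standard choice and not necessarily what the video's notebook does.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Subtracting the maximum does not change the result, but it keeps math.exp from overflowing when the inputs are large.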
@alexanderzikal7244 7 months ago
Thank you, I tried it out! I found one mistake: inside the class ToyDataset, the function "def make_nested_classes(self):" has the wrong brackets at n_samples=(int(self.size*0.6), int(self.size*0.2)) on GitHub.
@mashaan14 7 months ago
Thanks for bringing it up. You're right. It should be square brackets like this: n_samples=[int(self.size*0.6), int(self.size*0.2)]
@alexanderzikal7244 7 months ago
A tuple is needed, i.e. round brackets; then everything works fine.
@mashaan14 7 months ago
I just checked it in scikit-learn. It does need a tuple as input. That's strange! When I ran it on Colab, it didn't throw an error. Anyway, I'll fix it on GitHub.
@oneplus383 8 months ago
Salam alaikum, brother Mashaan. I am a student, and I found your tutorials through PyTorch Geometric. Since you know graph neural networks so well, I'd like to ask you something, and I hope you can help. I want to learn graph neural networks using the Jraph library. Would you please let me know how I can learn it quickly?
@mashaan14 8 months ago
Wa alaikum assalam (and upon you be peace). To be honest, I just heard about the Jraph library from you. I'm trying to learn JAX, but I haven't made any tutorials using JAX yet. Anyway, to learn Jraph, I suggest starting with something whose outcome you already know, for example a graph with 10 nodes. I'll make sure to put Jraph on my to-do list ✅.
@oneplus383 8 months ago
@@mashaan14 Brother, I appreciate your reply. Can you suggest steps or things to study to get into graph neural networks in a short time?
@mashaan14 8 months ago
@@oneplus383 I think Stanford CS224W is a good way to start: kzbin.info/www/bejne/gHKlkKOin5elmKM
@khanfor 9 months ago
Great explanation, Mashaan.
@mashaan14 9 months ago
thank you 🙏
@mohsinaljoaithen2343 9 months ago
Very informative, keep up the good work.
@mashaan14 9 months ago
thank you 🙏
@Islam_peacefull_Religion 9 months ago
🥲 I wish I had a GPU in my lappi (laptop)
@mashaan14 9 months ago
You can start playing with GPUs in Google Colab, but they limit heavy GPU usage.
@aboudramanediarra7086 9 months ago
Hello, thank you very much for this beautiful presentation. I work with image data. I'd like to represent them in the form of a graph and predict the links between the pixels. Can you help me with some ideas or a piece of source code? Thanks in advance.
@mashaan14 9 months ago
I think you need to define some similarities between image pixels, for example color and position similarities. Those similarities will serve as graph edges. Here's a good reference on modeling image data as graphs; it summarizes a decade of research: Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898-916.
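As one possible starting point for the idea above, here is a sketch in plain Python that turns a small grayscale image into a weighted graph. The choices are assumptions for illustration only: position similarity is expressed by connecting only 4-neighbour pixels, color similarity by a Gaussian weight on the intensity difference, and the function name and sigma value are invented.

```python
import math

def image_to_graph(img, sigma=0.1):
    """Connect 4-neighbour pixels; weight = exp(-(intensity diff)^2 / sigma^2)."""
    h, w = len(img), len(img[0])
    edges = {}
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                r2, c2 = r + dr, c + dc
                if r2 < h and c2 < w:
                    diff = img[r][c] - img[r2][c2]
                    edges[((r, c), (r2, c2))] = math.exp(-(diff ** 2) / sigma ** 2)
    return edges

# Hypothetical 2x3 grayscale image with intensities in [0, 1].
img = [[0.0, 0.0, 1.0],
       [0.0, 0.1, 1.0]]
edges = image_to_graph(img)
print(len(edges))  # 7 edges: 4 horizontal + 3 vertical
```

Edges that cross a strong intensity change get weights near zero, so a link prediction or segmentation model can treat them as weak connections.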
@darpan2648 4 months ago
Thank you, helpful video. Can I use GCN on a biological dataset? How would it produce results? Please suggest and help.
@mashaan14 4 months ago
@@darpan2648 GCN can be used wherever you have a feature matrix X and an adjacency matrix A. For example, one of the most used datasets for GCN is the Cora dataset, which contains a set of documents. The words inside the documents form the feature matrix X. The citation links between the documents form the adjacency matrix A. So, before running GCN on the biological dataset, you need to identify what the feature and adjacency matrices are.
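To show how X and A enter the computation, here is a sketch of a single graph convolution step in plain Python, using the widely cited symmetric normalization D^-1/2 (A + I) D^-1/2 X W. The toy graph, features, and weights are invented, and the nonlinearity is left out to keep the example short.

```python
import math

def gcn_layer(a, x, w):
    """One propagation step: D^-1/2 (A + I) D^-1/2 X W (no nonlinearity)."""
    n = len(a)
    # Add self-loops so each node keeps its own features.
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Transform features, then average them over neighbours.
    xw = [[sum(x[i][k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
          for i in range(n)]
    return [[sum(norm[i][k] * xw[k][j] for k in range(n)) for j in range(len(w[0]))]
            for i in range(n)]

# Hypothetical toy graph: 3 nodes on a path 0 - 1 - 2, one feature per node.
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [2.0], [3.0]]
w = [[1.0]]
h = gcn_layer(a, x, w)
print(h)
```

For a biological dataset, the same function applies once the rows of X (per-node measurements) and the entries of A (which nodes interact) are defined.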
@guanenteng4870 10 months ago
Thank you. I hope you can create a tutorial on spatial-temporal data.
@mashaan14 10 months ago
So glad it helped you. Thanks for your suggestion, I saw several papers using GCN for spatial temporal data. Hopefully, I can write a tutorial on one of these.
@bigjeffystyle7011 10 months ago
Great video. The code is really easy to understand for this application. Thanks for taking the time to summarize and condense a ton of information. Q: Have you seen any implementations in PyTorch Lightning or with DataLoaders for larger applications at scale? I'm looking at using this for a larger production application and any additional sources or packages would be helpful.
@mashaan14 10 months ago
Glad it was helpful! Actually, I haven’t come across a GCN for large scale applications. But there’s an interesting algorithmic change to GCN. The authors claim it runs faster with the same accuracy. paper: proceedings.mlr.press/v97/wu19e/wu19e.pdf code: github.com/Tiiiger/SGC