Swin Transformer Attention
41:40
Swin Transformer Code
41:33
2 months ago
Cluster GCN in JAX
53:47
3 months ago
GNN Node Sampler in JAX
46:58
4 months ago
Graph Attention Networks in DGL
12:15
Vision Transformer (ViT) in JAX
19:33
Graph Attention Networks in JAX
18:39
JAX Conv Layer
18:52
7 months ago
Simple Neural Net in JAX
16:10
7 months ago
Sparse Subspace Clustering (SSC)
18:06
GCN Variants: SGC and ASGC
10:48
8 months ago
PyTorch Conv2d Explained
13:41
8 months ago
DETR Object Detection
11:24
9 months ago
Spectral Clustering Code
9:29
9 months ago
CNN vs ViT: PyTorch Training
12:57
Simple Neural Net in PyTorch
11:43
10 months ago
PyTorch code for GCN and SGC
9:15
10 months ago
Perceptron Algorithm Code
9:04
10 months ago
pytorch softmax function in manim
0:27
Comments
@kevinkatsuradanisitanggang2243
@kevinkatsuradanisitanggang2243 8 days ago
Hi Mashaan, it should be noted that using torch.max with dim=-1 gives you the max along each row, since the last dimension holds the row data. That's the opposite of the order used when indexing a tensor, e.g. Tensor([[0,1],[2,3]])[0,1] means row 0, col 1, which seems more intuitive. The same applies to torch.mean. * referring to the early explanation in the plot attention function segment
@mashaan14
@mashaan14 8 days ago
You're right. At 30:30 I said "the maximum along the column dimension", when it actually takes the maximum along the row dimension.
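A quick sketch to verify that behavior (not from the video):

```python
import torch

x = torch.tensor([[0., 1.],
                  [2., 3.]])

# dim=-1 reduces over the last dimension, i.e. within each row
print(torch.max(x, dim=-1).values)  # tensor([1., 3.])  -> row maxima
print(torch.mean(x, dim=-1))        # tensor([0.5000, 2.5000]) -> row means

# indexing reads in [row, col] order instead
print(x[0, 1])                      # tensor(1.)  -> row 0, col 1
```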
@honglu679
@honglu679 20 days ago
Man, you did a great job digging into the code details and also putting in your own thoughts. I usually don't leave comments, but your video is way better than the ones that claim to teach something complicated in 10 or 15 minutes with random visualizations. One suggestion: maybe you could do a video on a code analysis of Meta AI's Omnivore and OmniMAE. They are extensions of Swin Transformer but support both video and images.
@mashaan14
@mashaan14 20 days ago
I'm so glad you liked the video. Thanks for suggesting these two papers; I'll definitely look into them. The thing is, I'm recording two videos on an entirely different topic, so it might take me a while to get back to vision transformers.
@inquisitiverakib5844
@inquisitiverakib5844 29 days ago
Great! Can you make a separate video on depth-wise and point-wise convolution?
@mashaan14
@mashaan14 29 days ago
Thanks for your suggestion. I'll add it to my to-do list. The thing is, I'm recording two videos on an entirely different topic.
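In the meantime, here's a minimal sketch of the idea in PyTorch (layer sizes are just for illustration): a depthwise conv filters each channel separately, then a pointwise (1x1) conv mixes channels.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 32, 64

# depthwise: one 3x3 filter per input channel (groups=in_channels)
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)

# pointwise: 1x1 conv that mixes information across channels
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

x = torch.randn(1, in_ch, 28, 28)
y = pointwise(depthwise(x))  # depthwise-separable convolution
print(y.shape)               # torch.Size([1, 64, 28, 28])
```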
@0兒-y4c
@0兒-y4c 1 month ago
Hi sir, I'm a student studying this topic. I would like to use Swin Transformer for object detection in my project. How can I accomplish that? Thank you, sir.
@mashaan14
@mashaan14 1 month ago
Usually, an image classification model is used at the start of an object detection pipeline; it's called the backbone. Most object detection pipelines use ResNet as the backbone. I assume you want to replace ResNet with Swin, just like they did in the paper (section 4.2). If that's the case, your best option is the MMDetection library. They already include Swin as a backbone on their github: github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/backbones/swin.py
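As a rough sketch, swapping the backbone in an MMDetection config looks something like this (the keys below are from memory and should be verified against the Swin configs in the MMDetection repo):

```python
# hypothetical MMDetection config snippet; check configs/swin/ in the repo
model = dict(
    backbone=dict(
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
    ),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],  # channels of the four Swin stages
        out_channels=256,
        num_outs=5,
    ),
)
```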
@pradyumagarwal3978
@pradyumagarwal3978 1 month ago
You said the 200-epoch test you ran is not a proper experiment to judge the quality of this transformer architecture. So other than increasing the C value back to 96, what other things should I look into to experiment and get the best performance out of this architecture?
@mashaan14
@mashaan14 1 month ago
The settings I used in the video were kept simple just to get a taste of this transformer. In my opinion, a proper experiment would be replicating the results in the paper on the ImageNet-1K dataset (the ones in Table 1). That way we can judge the model, and then look for improvements.
@pradyumagarwal3978
@pradyumagarwal3978 1 month ago
@@mashaan14 I'm sorry, which table? (Also, big thanks, your videos and replies have been a big help. However, any chance I can ask somewhere more convenient than YouTube comments?)
@mashaan14
@mashaan14 1 month ago
Table 1 on page 6 of the Swin Transformer paper. You can text me on twitter or linkedin, whichever is convenient for you. twitter.com/mashaan_14 linkedin.com/in/mashaan
@pradyumagarwal3978
@pradyumagarwal3978 1 month ago
@@mashaan14 Okay, thanks.
@pradyumagarwal3978
@pradyumagarwal3978 1 month ago
You set the patch size to 4 and the window size to 6, and C went from 96 to 48. What were the default values for patch size and window size? Also, this model does not have any positional encoding, right? Adding the regular sin-cos positional encoding, or better, relative positional encoding, could be an improvement, right?
@mashaan14
@mashaan14 1 month ago
The defaults were patch size = 4 and window size = 7. You can find all the default settings in: github.com/microsoft/Swin-Transformer/blob/main/config.py You're right, there is no positional encoding in this model. Instead, it uses a relative position bias, and the authors said, "We observe significant improvements over counterparts without this bias term or that use absolute position embedding". They added the relative position bias inside the WindowAttention class: github.com/microsoft/Swin-Transformer/blob/f82860bfb5225915aca09c3227159ee9e1df874d/models/swin_transformer.py#L101
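A minimal sketch of how that relative position bias is built, following the linked WindowAttention class (simplified):

```python
import torch
import torch.nn as nn

W, num_heads = 7, 3                  # window size and attention heads
N = W * W                            # tokens per window

# one learnable bias per relative (dy, dx) offset, per head
bias_table = nn.Parameter(torch.zeros((2 * W - 1) ** 2, num_heads))

# relative position index for every pair of tokens in a window
coords = torch.stack(torch.meshgrid(
    torch.arange(W), torch.arange(W), indexing='ij'))
coords = coords.flatten(1)                       # (2, N)
rel = coords[:, :, None] - coords[:, None, :]    # (2, N, N)
rel = rel.permute(1, 2, 0).contiguous()          # (N, N, 2)
rel[:, :, 0] += W - 1                            # shift offsets to start at 0
rel[:, :, 1] += W - 1
rel[:, :, 0] *= 2 * W - 1
rel_index = rel.sum(-1)                          # (N, N)

# bias added to the attention logits: (num_heads, N, N)
bias = bias_table[rel_index.view(-1)].view(N, N, -1).permute(2, 0, 1)
```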
@pradyumagarwal3978
@pradyumagarwal3978 1 month ago
Is the notebook where you test the model with C = 48 and 200 epochs available somewhere? I would really like to check it out.
@mashaan14
@mashaan14 1 month ago
Here you go: github.com/mashaan14/KZbin-channel/blob/main/notebooks/2024_08_19_swin_transformer.ipynb
@pradyumagarwal3978
@pradyumagarwal3978 1 month ago
@@mashaan14 Thanks!
@mashaan14
@mashaan14 1 month ago
Hi everyone 👋 It's been a while since I posted this video, and it's time to reflect back. First, there are multiple ways to visualize attention in vision transformers. This paper ("Transformer Interpretability Beyond Attention Visualization", arxiv.org/abs/2012.09838v1 ) compared different visualization methods. What I did in this video is just test the attention at the first layer by feeding in a test image and pulling out the responses, which are the query, key, and value matrices. Multiplying the query matrix with the transposed key matrix gives us a square matrix showing how each patch is "paying attention" to every other patch. If we order the patches back to their positions in the original image, we'll see which patches have the highest attention values. I've updated the code by adding more comments and printouts, just to make it more readable.
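A minimal sketch of that computation (tensor shapes are illustrative, assuming a 7x7 patch grid):

```python
import torch
import torch.nn.functional as F

num_patches, head_dim = 49, 32
Q = torch.randn(num_patches, head_dim)  # query matrix from the first layer
K = torch.randn(num_patches, head_dim)  # key matrix from the first layer

# square matrix: how much patch i "pays attention" to patch j
attn = F.softmax(Q @ K.T / head_dim ** 0.5, dim=-1)

# average attention received by each patch, ordered back into the image grid
attn_map = attn.mean(dim=0).reshape(7, 7)
```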
@tylervandermate
@tylervandermate 1 month ago
This is exactly what I've been trying to find for model visualization. Thank you! It's difficult to find any in-depth info on transformers involving the query, key, and value matrices.
@mashaan14
@mashaan14 1 month ago
Thank you, that's great to hear. I'm currently working on a Swin Transformer visualization video. Hopefully, I'll post it in a week or so.
@petermerrill9803
@petermerrill9803 1 month ago
Excellent explanations. Thank you very much.
@mashaan14
@mashaan14 1 month ago
Glad it was helpful!
@akashprajapathi6056
@akashprajapathi6056 2 months ago
Of course
@yasir_rashid7780
@yasir_rashid7780 3 months ago
Great video, sir, with a lot of information. Sir, can we use GCN for identifying influential nodes in social networks?
@mashaan14
@mashaan14 3 months ago
Thanks for your question. It made me go and dig a little. I read a couple of papers and found that GCN can't be used alone to find influential nodes. However, it can be used as an introductory step. For example, in a paper named "Finding Critical Users in Social Communities via Graph Convolutions", the authors used GCN before graph attention to learn "criticalness" over nodes; by criticalness, they mean how influential a node is. Another paper, "SocialGCN: An Efficient Graph Convolutional Network based Model for Social Recommendation" ( arxiv.org/pdf/1811.02815 ), built a recommender system based on GCN embeddings rather than the feature matrix. They state that "the proposed SocialGCN model is flexible when the user and item attributes are not available", which makes sense, because if a node does not have features, GCN will assign it embeddings similar to its neighbors'.
@mashaan14
@mashaan14 4 months ago
Mistakes in the video (the ones I know about 😅):
- I changed the implementation of the neighbor_sampler function. It previously took about a minute to sample from the Cora dataset; now it samples in around 10 seconds. That's because I removed the unique function and used sparse matrix indexing. The link in the description has the new code.
- At 32:54 I said that the probability of picking an edge is 1. That's actually not true; it's 1/|E|, where |E| is the number of edges in the graph. I fixed it in my github notes.
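For reference, a minimal sketch of neighbor sampling with CSR row slicing (a simplified stand-in for the neighbor_sampler function, with made-up names):

```python
import numpy as np
import scipy.sparse as sp

def sample_neighbors(adj_csr, seed_nodes, num_samples, seed=0):
    """Sample up to num_samples neighbors per seed node from a CSR adjacency."""
    rng = np.random.default_rng(seed)
    sampled = []
    for v in seed_nodes:
        # CSR row slicing: neighbors of v are indices[indptr[v]:indptr[v+1]]
        nbrs = adj_csr.indices[adj_csr.indptr[v]:adj_csr.indptr[v + 1]]
        if nbrs.size > num_samples:
            nbrs = rng.choice(nbrs, size=num_samples, replace=False)
        sampled.append(nbrs)
    return sampled

adj = sp.random(100, 100, density=0.05, format='csr')
print(sample_neighbors(adj, seed_nodes=[0, 1, 2], num_samples=5))
```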
@Savi_Ann
@Savi_Ann 4 months ago
Nice visualization!
@oneplus383
@oneplus383 4 months ago
Short but informative
@AbdulQadeerRasooli-l8k
@AbdulQadeerRasooli-l8k 4 months ago
Hello sir, if possible, please make a video on the practical side of this paper. Paper title: "GCN-FFNN: A two-stream deep model for learning solution to partial differential equations".
@mashaan14
@mashaan14 4 months ago
Thanks for your suggestion. I skimmed through the code on github. I think the main contribution is the Ensemble() class, which you can find in (models.py). That class concatenates the outputs from GCN and FFNN. I'll try to fit the paper into one of my upcoming videos.
@tomoki-v6o
@tomoki-v6o 5 months ago
Great trick. I would use (X-XT)**2 + (Y+YT)**2
@mashaan14
@mashaan14 4 months ago
That's the beauty of coding, it can be done in different ways.
@mashaan14
@mashaan14 5 months ago
Code walkthrough: kzbin.info/www/bejne/r2K9noCZgr6dobs You can access the notebook on github: github.com/mashaan14/VisionTransformer-MNIST/blob/main/VisionTransformer_MNIST.ipynb
@mashaan14
@mashaan14 5 months ago
After playing with JAX, I don't feel comfortable linking the notebook I showed in the video. Most of the video content is still valid because it shows the difference between pyg and jraph. However, in the notebook I used haiku, which is no longer recommended by Google DeepMind; they recommend flax instead. So I linked a new notebook showing GCN code in JAX/Flax: github.com/mashaan14/KZbin-channel/blob/main/notebooks/2024_03_21_jraph_GCN.ipynb Here's another video where I explain graph attention code in JAX/Flax: kzbin.info/www/bejne/hWLdeIqDesyKbaM
@mehmeterenbulut6076
@mehmeterenbulut6076 5 months ago
Hi man, beautiful video explaining both libraries! Loved your explanation; clear and on point. About the test results of PyG and Jraph being different: I think it's because, even though both reached 100% training accuracy (which also means they overfitted the data), the decision boundaries they draw for the training set are not necessarily the same. One reason they might differ is that PyG's and Jraph's GNN weights are probably initialized randomly, so their different decision boundaries can easily produce two different results on the test set.
@mashaan14
@mashaan14 5 months ago
I loved your explanation, and it totally makes sense. But the cause was far simpler than that: I was training on two different feature matrices. If you look at the jraph part, I passed this command: nodes=jnp.eye(data_Cora.x.shape[0]) So I was training jraph on the identity matrix while training pyg on the feature matrix. I know, it's crazy how close jraph got with only the identity matrix. Anyway, I couldn't fix the notebook in the video because it was written in haiku, so I took it down and wrote a new one in JAX/Flax: github.com/mashaan14/KZbin-channel/blob/main/notebooks/2024_03_21_jraph_GCN.ipynb I'd love it if you could take a look at the new code.
@mashaan14
@mashaan14 5 months ago
There was a mistake in the video: I accidentally used the identity matrix instead of the feature matrix when I packaged the graph into a jraph.GraphsTuple. This line of code: nodes=jnp.eye(data_Cora.x.shape[0]), should be changed to: nodes=jnp.asarray(data_Cora.x), I fixed it on github and the notebook should work fine.
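For anyone rebuilding the graph, a minimal sketch of packaging the data into a jraph.GraphsTuple (assuming data_Cora is a PyTorch Geometric Data object):

```python
import jax.numpy as jnp
import jraph

graph = jraph.GraphsTuple(
    nodes=jnp.asarray(data_Cora.x),                 # feature matrix, not jnp.eye(...)
    edges=None,                                     # no edge features for this GCN
    senders=jnp.asarray(data_Cora.edge_index[0]),
    receivers=jnp.asarray(data_Cora.edge_index[1]),
    n_node=jnp.asarray([data_Cora.num_nodes]),
    n_edge=jnp.asarray([data_Cora.edge_index.shape[1]]),
    globals=None,
)
```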
@oneplus383
@oneplus383 5 months ago
Very good explanation. 😊
@mashaan14
@mashaan14 5 months ago
Glad it was helpful!
@doublesami
@doublesami 5 months ago
Very informative. Can you please make a video on Vision Mamba or VMamba and explain the theoretical as well as the implementation part? Looking forward to it.
@mashaan14
@mashaan14 5 months ago
Thanks, I just checked VMamba on github. Sure, I'll add it to my to-do list. The thing is, I'm recording a series on graph neural networks; once I'm done with that, I'll get back to vision transformers.
@oneplus383
@oneplus383 5 months ago
How can we convert a PyTorch model to JAX?
@mashaan14
@mashaan14 5 months ago
Actually, I'm not aware of any tool that can convert PyTorch to JAX.
@oneplus383
@oneplus383 5 months ago
@@mashaan14 Can you give me some idea of what I'd need to change to port PyTorch code to JAX?
@mashaan14
@mashaan14 5 months ago
@@oneplus383 Instead of torch tensors, you have to use jax.numpy arrays. For each PyTorch layer, check its equivalent in the JAX documentation. Start with something small and build your way up. Here's how to code a simple NN in JAX: kzbin.info/www/bejne/fX-vgJRqp86sqZo
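A tiny side-by-side sketch of the same linear layer in both frameworks (illustrative only):

```python
# PyTorch
import torch
import torch.nn as tnn

torch_layer = tnn.Linear(4, 2)
y_torch = torch_layer(torch.ones(1, 4))

# JAX / Flax
import jax
import jax.numpy as jnp
import flax.linen as fnn

class Tiny(fnn.Module):
    @fnn.compact
    def __call__(self, x):
        return fnn.Dense(2)(x)  # Flax infers the input size from x

params = Tiny().init(jax.random.PRNGKey(0), jnp.ones((1, 4)))
y_jax = Tiny().apply(params, jnp.ones((1, 4)))
```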
@oneplus383
@oneplus383 5 months ago
Thanks, Mashaan, brother.
@oneplus383
@oneplus383 5 months ago
Do you live in the United States?
@nill513
@nill513 6 months ago
Kudos to you, sir!
@mashaan14
@mashaan14 6 months ago
Happy to help.
@Sridhar.SubramanianMtech2023
@Sridhar.SubramanianMtech2023 6 months ago
Thank you for this insightful video 😊
@dossantos4415
@dossantos4415 6 months ago
Could you do the same but for NLP?
@mashaan14
@mashaan14 6 months ago
I guess you want maps similar to the ones in this paper: Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate", 2014. If that's the case, I'm sorry, I'm not familiar with that topic.
@vivekdabholkar5965
@vivekdabholkar5965 6 months ago
Mashaan, very proud of you, your accomplishments, and your sharing of knowledge!
@mashaan14
@mashaan14 6 months ago
Thanks Vivek, great to reconnect with you. It's been a long time!!
@Falconoo7383
@Falconoo7383 6 months ago
Very informative video...
@mashaan14
@mashaan14 6 months ago
Glad you liked it
@Falconoo7383
@Falconoo7383 6 months ago
Thank you for the informative video explaining softmax. Can you please explain how to install manim easily?
@mashaan14
@mashaan14 6 months ago
I'm installing manim in google colab using the command: !pip install manim You can check the code I used to make this video: github.com/mashaan14/manim/blob/main/manim_visualizeSoftmax.ipynb
@Falconoo7383
@Falconoo7383 6 months ago
@@mashaan14 Thank you. Last time, I don't know why, but when I used it, it wasn't working properly. Let me check again.
@alexanderzikal7244
@alexanderzikal7244 7 months ago
Thank you, I tried it out! One mistake I found: inside the ToyDataset class, the function "def make_nested_classes(self):" has the wrong brackets in n_samples=(int(self.size*0.6), int(self.size*0.2)) on github.
@mashaan14
@mashaan14 7 months ago
Thanks for bringing it up. You're right, it should be square brackets, like this: n_samples=[int(self.size*0.6), int(self.size*0.2)]
@alexanderzikal7244
@alexanderzikal7244 7 months ago
A tuple is needed -> round brackets; then everything works fine.
@mashaan14
@mashaan14 7 months ago
I just checked the scikit-learn docs. It does need a tuple as input. That's strange!! When I ran it on Colab, it didn't throw an error. Anyway, I'll fix it on github.
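For reference, a quick check of the tuple behavior (assuming the nested classes come from something like scikit-learn's make_circles, where a two-element n_samples splits points between the outer and inner circle):

```python
from sklearn.datasets import make_circles

# tuple n_samples: 600 outer points, 200 inner points
X, y = make_circles(n_samples=(600, 200), noise=0.05, factor=0.5)
print(X.shape)  # (800, 2)
```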
@oneplus383
@oneplus383 8 months ago
Salam alaikum, brother Mashaan. I am a student and I found your tutorials via PyTorch Geometric. Since you know graph neural networks well, I want to ask you something and hope you'll be kind enough to answer: I want to learn graph neural networks using the Jraph library. Could you please let me know how I can learn it quickly?
@mashaan14
@mashaan14 8 months ago
وعليكم السلام، To be honest, I just heard about the Jraph library from you. I'm trying to learn JAX, but I haven't made any tutorials using JAX yet. Anyway, to learn Jraph, I suggest starting with something where you already know the outcome, for example a graph with 10 nodes. I'll make sure to put Jraph on my todo list ✅.
@oneplus383
@oneplus383 8 months ago
@@mashaan14 Brother, I appreciate your reply. Can you suggest steps or things to study to get into graph neural networks quickly?
@mashaan14
@mashaan14 8 months ago
@@oneplus383 I think Stanford CS224W is a good way to start: kzbin.info/www/bejne/gHKlkKOin5elmKM
@khanfor
@khanfor 9 months ago
Great explanation, Mashaan.
@mashaan14
@mashaan14 9 months ago
Thank you 🙏
@mohsinaljoaithen2343
@mohsinaljoaithen2343 9 months ago
Very informative, keep up the good work.
@mashaan14
@mashaan14 9 months ago
Thank you 🙏
@Islam_peacefull_Religion
@Islam_peacefull_Religion 9 months ago
🥲 I wish I had a GPU in my laptop
@mashaan14
@mashaan14 9 months ago
You can start playing with GPUs in google colab, but they have limits on heavy GPU usage.
@aboudramanediarra7086
@aboudramanediarra7086 9 months ago
Hello, thank you very much for this beautiful presentation. I work with image data, and I'd like to represent it in the form of a graph and predict the links between pixels. Can you help me with some ideas or a piece of source code? Thanks in advance.
@mashaan14
@mashaan14 9 months ago
I think you need to define some similarities between image pixels, for example color and position similarities. Those similarities will serve as graph edges. Here's a good reference on modeling image data as graphs; it summarizes a decade of research: Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898-916.
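A minimal sketch of that idea: a Gaussian affinity over color and position for a small image (the sigma values and the neighborhood cutoff are illustrative):

```python
import numpy as np

img = np.random.rand(16, 16, 3)              # stand-in for a small RGB image
H, W, _ = img.shape
colors = img.reshape(-1, 3)                  # (N, 3) color features
ys, xs = np.mgrid[0:H, 0:W]
pos = np.stack([ys.ravel(), xs.ravel()], 1)  # (N, 2) pixel positions

# pairwise squared distances in color and in position
dc = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(-1)
dp = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)

# Gaussian affinities; edges connect only nearby pixels
A = np.exp(-dc / 0.1) * np.exp(-dp / 25.0)
A[dp > 9] = 0                                # drop pairs more than 3 pixels apart
print(A.shape)                               # (256, 256) adjacency matrix
```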
@darpan2648
@darpan2648 4 months ago
Thank you, helpful video. Can I use GCN on a biological dataset? How would it perform? Please suggest.
@mashaan14
@mashaan14 4 months ago
@@darpan2648 GCN can be used wherever you have a feature matrix X and an adjacency matrix A. For example, one of the most used datasets in GCN research is the Cora dataset, which contains a set of documents. The words inside the documents make up the feature matrix X, and the citation links between the documents make up the adjacency matrix A. So, before running GCN on a biological dataset, you need to identify what the feature and adjacency matrices are.
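Once X and A are in place, a single GCN layer is just normalized propagation followed by a linear map; a minimal dense sketch:

```python
import torch

def gcn_layer(A, X, W):
    # add self-loops and symmetrically normalize: D^-1/2 (A + I) D^-1/2
    A_hat = A + torch.eye(A.shape[0])
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    # propagate features over the graph, project, apply nonlinearity
    return torch.relu(A_norm @ X @ W)
```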
@guanenteng4870
@guanenteng4870 10 months ago
Thank you. I hope you can create a tutorial for spatial-temporal data.
@mashaan14
@mashaan14 10 months ago
So glad it helped you. Thanks for your suggestion; I've seen several papers using GCN for spatial-temporal data. Hopefully, I can write a tutorial on one of them.
@bigjeffystyle7011
@bigjeffystyle7011 10 months ago
Great video. The code is really easy to understand for this application. Thanks for taking the time to summarize and condense a ton of information. Q: Have you seen any implementations in PyTorch Lightning or with DataLoaders for larger applications at scale? I'm looking at using this for a larger production application, and any additional sources or packages would be helpful.
@mashaan14
@mashaan14 10 months ago
Glad it was helpful! Actually, I haven't come across a GCN for large-scale applications. But there's an interesting algorithmic change to GCN; the authors claim it runs faster with the same accuracy. Paper: proceedings.mlr.press/v97/wu19e/wu19e.pdf Code: github.com/Tiiiger/SGC
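The gist of that paper (SGC), as a minimal sketch: drop the nonlinearities, precompute K propagation steps once, then train a single linear layer on the result:

```python
import torch

def sgc_features(A_norm, X, K=2):
    # S^K X: K propagation steps with the normalized adjacency, no ReLU between
    for _ in range(K):
        X = A_norm @ X
    return X

# classification is then one linear layer (logistic regression):
# logits = sgc_features(A_norm, X) @ W
```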