Videos of this quality on this subject are a rare occurrence. Great job!
@albertwang59745 жыл бұрын
The Capsule Network is the most accurate representation of the brain's neural networks so far, and thanks for the implementation.
@yiangzheng11396 жыл бұрын
I feel like this video is even more helpful than last one, flawless! Thanks a lot
@Perryman11386 жыл бұрын
Nice! I’ll be watching this a few times. Your book just arrived in the mail today! Thank you so much for helping explain concepts to everyone!
@amirlibra41823 жыл бұрын
Awesome!!!! My mind's been starving to feed on CapsNet implementation concepts. Thank you for this concise understandable tutorial.
@richardchou96495 жыл бұрын
Very nice video. I often have difficulty understanding all the dimension handling, but this video explains it very well!
@jacobusstrydom70175 жыл бұрын
Excellent video. Your explanation of the concept and code is by far the best I have seen so far, well done.
@akshay_pachaar5 жыл бұрын
Making this must have taken a lot of hard work and a great mind. Thanks and keep it up.
@nateamus39206 жыл бұрын
Just bought your book and sent you an InMail. Incredible work, Mr. Géron!!
@foobar16723 жыл бұрын
Thank you for detailed explanations in your source code.
@HeduAI4 жыл бұрын
Best explanation of capsule networks! Thank you!
@everg866 жыл бұрын
This video is amazingly clear, just like your book! Thanks a lot!
@tingnews72736 жыл бұрын
After watching it time after time, I finally get the idea. Thank you for your hard work.
@beckettman426 жыл бұрын
Excellent resource to learn by. I think I am finally starting to understand some of these concepts. Keep up the good work.
@amogh_wagh5 жыл бұрын
This is by far the best approach and explanation of the code that I have seen on the internet. Your notebook is great as well. Thank you for making people like us understand it better, and please keep making more videos. Kudos. \m/
@CyberSecWithDesire4 жыл бұрын
I need to watch this video 1000000 times
@aa-xn5hc6 жыл бұрын
This video was great!! (again!). We are all looking forward to seeing more videos from you in the future.
@Ruhgtfo3 жыл бұрын
The clearest explanation thanks
@OttoFazzl6 жыл бұрын
This is an awesome presentation! You should definitely keep on making these videos!
@parthdedhia41825 жыл бұрын
This video explained the implementation very well. It was really great work. Thank you :)
@hardikmodi82346 жыл бұрын
An excellent video on CapsNet, the best available on YouTube.
@ttmofy6 жыл бұрын
Great step-by-step explanation!
@karlotto93196 жыл бұрын
Another excellent video. Please keep them coming!
@miharbi006 жыл бұрын
Your videos are great. Thanks a lot!
@mouduge6 жыл бұрын
Thanks Evpatoria! :)
@unoqualsiasi73416 жыл бұрын
Thanks for the wonderful video tutorial!
@KarolMajek6 жыл бұрын
Awesome video! Thank you so much!
@ThibaultNeveu6 жыл бұрын
I implemented CapsNet using broadcasting with matrix multiplication, without tf.tile, by using einsum: u_hat = tf.einsum('abdc,iabcf->iabdf', w_ij, input_layer). It's more computationally efficient.
@AurelienGeron6 жыл бұрын
Hi Thibault, very interesting, thanks! I'll try it out.
@ThibaultNeveu6 жыл бұрын
As far as I know, you are from France; if you have some time, check out my YouTube channel :) I try to produce deep learning tutorials for the French public!
@ThibaultNeveu6 жыл бұрын
Great video by the way!
@AurelienGeron6 жыл бұрын
Thanks Thibault! You have one more subscriber. :)
@michaelrubinstein40886 жыл бұрын
I tried using einsum, as Thibault suggested. I replaced the matmuls in the caps2 predictions and the agreement computation. This resulted in a performance improvement of more than 15x. GPU (1080 Ti) utilization went from around 12% to around 90%. Thanks!
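To make the trick in this thread concrete, here is a minimal NumPy sketch (with made-up tiny shapes standing in for the paper's 1152 primary capsules and 10 digit capsules) showing that the suggested einsum expression matches the notebook's explicit tile-plus-matmul form:

```python
import numpy as np

# Hypothetical tiny shapes, standing in for the paper's
# 1152 primary capsules (8D) and 10 digit capsules (16D).
batch, n_caps1, n_caps2, d_in, d_out = 2, 3, 4, 8, 16
rng = np.random.default_rng(42)
W = rng.normal(size=(n_caps1, n_caps2, d_out, d_in))      # 'abdc'
u = rng.normal(size=(batch, n_caps1, n_caps2, d_in, 1))   # 'iabcf' (f = 1)

# einsum form, as suggested above: no explicit tiling of W over the batch
u_hat_einsum = np.einsum('abdc,iabcf->iabdf', W, u)

# equivalent broadcast + matmul form, as in the original notebook
W_tiled = np.broadcast_to(W, (batch,) + W.shape)
u_hat_matmul = W_tiled @ u

assert np.allclose(u_hat_einsum, u_hat_matmul)
```

The speedup reported below comes from never materializing the tiled weight tensor, which in the real model is large.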
@reynaldborja34066 жыл бұрын
Priceless! So Amazing!
@massoudkhodadadzadeh81604 жыл бұрын
Great explanation of Capsnet
@sidrahliaqat76376 жыл бұрын
Awesome! Thank you very much for making this video!
@goldengrapeorg6 жыл бұрын
Thank you! Looking forward to the PyTorch version.
@maxpan12226 жыл бұрын
Awesome explanation!
@ja3zman6 жыл бұрын
Great explanation. Thanks
@0sandruskyi03 жыл бұрын
You are the best
@subhamnaskar19984 жыл бұрын
Respected Sir, can you please make a video on extreme learning machines? Your videos are really informative and helpful!
@jjaannnniikk2 жыл бұрын
Great video! Are there any updated versions based on TF 2? Thanks
@cyrildigrandi96 жыл бұрын
Great video Aurélien, thanks!
@vladimiriurcovschi16576 жыл бұрын
Nice job! Thank you !
@EsdrasSoutoCosta6 жыл бұрын
Awesome video, thanks. In the video you mention that this is a shallow architecture, but what would a deep one look like?
@AurelienGeron6 жыл бұрын
Hi Esdras, you would just stack as many hidden capsule layers as you want on top of the primary capsule layer, before the final digit layer. These hidden capsule layers would be implemented exactly like the digit layer, using the routing by agreement algorithm. Hope this helps.
@EsdrasSoutoCosta6 жыл бұрын
Thank you very much, that helped a lot. I'm studying Capsule Networks for my Master's degree.
@neumdeneuer18905 жыл бұрын
@@AurelienGeron Hi, a short follow-up question: I guess since the stacked hidden capsule layers are fully connected, the complexity would explode quite quickly, making the architecture unusable (do you agree?). Is there some proposed way to connect only certain capsules in the hidden layers, e.g. only the ones at the same grid positions, like in CNNs? Thanks in advance for your engagement in the comments.
@AurelienGeron5 жыл бұрын
@@neumdeneuer1890 Excellent question! The layers do not have to be fully connected; the connectivity could indeed be partial, as you suggest. I recommend you check out the latest paper, which makes some significant changes to the architecture (but the main ideas are still there), in particular partial connectivity. See the link in the video description. Hope this helps!
@PhongNguyen-zz1ei6 жыл бұрын
Hello guys, I have a quick question: why did the author choose 16 as the vector length of the second capsule layer? I get the idea behind multiplying each capsule's 8 values in the output of the 1st capsule layer by the weight matrix. But what I don't understand is whether there is any specific reason to choose the number 16, or am I misunderstanding something here?
@AurelienGeron6 жыл бұрын
Hi Phong, great question. You could very well choose another number, like 10, 20 or 30. The tradeoff is this: you need a sufficient number of dimensions to hold all the pose information of the digits. That number should probably be greater than 8, since there's more information in the pose of a full digit than in the pose of one of its parts. However, you probably don't want too many dimensions either, because this would add a lot of parameters in your model, with the risk of overfitting the training set and adding too much computation time. So I suppose the authors experimented with various values and found that 16 performed well. I hope this helps.
@animeshkarnewar36 жыл бұрын
It's like we have 16 dimensions along which certain information is encoded. For example, the video mentioned that by changing certain values across some dimension of this vector, we can change the thickness of the digit reconstructed. What would be really interesting is somehow if we could interpret these dimensions, we would be able to generate the digit the way we want. Here is a small similar architecture that I built. -> medium.com/mlreview/aann-absolute-artificial-neural-network-ae8f1a65fa67 This might be of help.
@4yt1586 жыл бұрын
Love your book! Worth the money!! Could you also please make videos on Deep Reinforcement learning? :)
@AurelienGeron6 жыл бұрын
Thanks Vamsi! I wish I could clone myself, I have so many things I want to do. Sure, I'd love to do a video on Deep RL. Would you like a high-level overview (like my Capsule Networks video) or a video focused on the implementation (like this video)?
@4yt1586 жыл бұрын
Thank you Aurelien for taking the time to make videos for educating us! :) I know you get a lot of requests like mine. It would be great if you could please do videos on both the high-level ideas and a Keras implementation of Deep RL. Your video on capsule networks is amazing. I listened to Prof. Hinton's video on the same topic but understood nothing! Also, another big topic missing on YouTube, and in books on Amazon, is good resources on energy-based methods. Most technical papers by Bengio, Hinton, etc. are so mathematical that many students can't understand what's going on. Do you have any plans to write a book or make videos on those topics? Thank you! :)
@AurelienGeron6 жыл бұрын
Sounds good, I'm adding this idea to my list of videos to do. I can't commit to a deadline, however, because my agenda is crazily packed these days, but I'll do my best. In the meantime, since you have my book, check out chapter 16: it presents Deep Reinforcement Learning, in particular Policy Gradients and Deep Q-Learning. You can also look at the Jupyter notebook for that chapter here: github.com/ageron/handson-ml/blob/master/16_reinforcement_learning.ipynb It contains the code to train an agent to play Ms-PacMan based only on the raw pixels. Hope it helps!
@sameeraramasinghe6936 жыл бұрын
Excellent EXCELLENT!!! thanks a lot
@sunderrajan61726 жыл бұрын
Well explained. Thanks
@luisvalesilva89316 жыл бұрын
Brilliant video!
@animeshkarnewar36 жыл бұрын
I have a way to get around the ugly-code problem. I usually use the variable_scopes and the weight_reuse parameter inside the functional interfaces that I create for the tensorflow computation graphs. So this way, there is no need to pass empty arrays to the placeholders that are not required.
@mouduge6 жыл бұрын
Animesh Karnewar Interesting. Could you please submit a Pull Request for the notebook?
@011azr6 жыл бұрын
OMG, you're French and you speak both French and English very clearly. That's a talent! LOL xD. Anyway, thanks for your video, they're the best. :)))).
@animeshkarnewar36 жыл бұрын
Indeed! I think what he speaks best is the language of 'Deep Learning' :), :p :D.
@bingeltube6 жыл бұрын
Highly recommended
@animeshkarnewar36 жыл бұрын
I noticed that you have set the random seed_value = 42. Does it have to do anything with "The Hitchhiker's Guide to the Galaxy"? lol :D.
@AurelienGeron6 жыл бұрын
Animesh Karnewar , absolutely! :))
@animeshkarnewar36 жыл бұрын
Aurélien Géron haha 👍
@dbp_patel_19946 жыл бұрын
Such great Easter Eggs. :)
@dhruvnil87456 жыл бұрын
If you read the sklearn source, you will find that they too have left this easter egg everywhere.
@Gunth0r6 жыл бұрын
and what about the poppies? I get the capsule reference, but still, why poppies? :p
@dpdove165 жыл бұрын
There is a slight error in the margin loss formula at 21:55: the sign before the lambda should be a + (plus), not a - (minus). We could make the parameter negative to rectify it. Thank you for the awesome video though.
@AurelienGeron5 жыл бұрын
Thanks Dhawal, I appreciate your feedback. Someone mentioned this error some time ago, so I added it to the list of errata in the video description. I wish I could add a note directly within the video, but YouTube removed this feature (or perhaps they restricted it to people with more subscribers, I'm not sure). Cheers!
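For reference, the corrected margin loss (with the + before the lambda term) can be sketched in NumPy as follows; `m_plus`, `m_minus`, and `lam` are the paper's values, and the shapes and function name are just illustrative:

```python
import numpy as np

def margin_loss(v_norm, T, m_plus=0.9, m_minus=0.1, lam=0.5):
    """v_norm: [batch, n_classes] capsule output lengths; T: one-hot targets."""
    # loss for the target class: penalize lengths below m_plus
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2
    # note the + before the lambda term, per the erratum discussed above
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2
    return np.sum(present + absent, axis=-1)

# a confident correct prediction incurs zero loss
loss = margin_loss(np.array([[0.95, 0.05]]), np.array([[1.0, 0.0]]))
assert loss[0] == 0.0
```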
@NithirojTripatarasit6 жыл бұрын
Thank you!
@kennethchong47215 жыл бұрын
Thank you for creating this wonderful tutorial! :) I've learnt a lot from your videos on capsule networks and from your book! However, could you explain how primary capsules being fully connected to digit capsules allows me to use tf.reshape(conv2, [-1, caps1_n_caps, caps1_n_dims]) to group 8 scalars across 8 feature maps (i.e. 1 scalar from each feature map) together to form an 8D primary capsule? My understanding is that tf.reshape goes through the numbers of an array in the order columns -> rows -> depth, whereas using tf.reshape as you did goes through the scalars of the feature maps in the order depth (across activation maps) -> columns -> rows.
@AurelienGeron5 жыл бұрын
Hi Kenneth, thanks for your kind words and your question. The output of the second convolutional layer is [batch_size, height, width, channels], so we could reshape it to: [batch_size, height, width, caps1_n_caps_per_position, caps1_n_dims], and this would just reshape the last dimension. For example, if batch_size=10, height=6, width=6, channels=256, caps1_n_caps_per_position=32, caps1_n_dims=8, each position in each feature map would start with 256 scalars and would end up with 32 vectors of 8 dimensions each. This would be the output of the primary capsules, represented as a 5D array. Now if we want these primary capsule outputs to be fully connected to the digit capsules, we don't need to preserve the horizontal & vertical dimensions (i.e., the 6x6 shape), we can just reshape these outputs to [batch_size, height*width*caps1_n_caps_per_position, caps1_n_dims], which in this example is [batch_size, 1152, 8]. Similarly, when you want to train a dense network to classify MNIST, you start by reshaping the inputs from [batch_size, 28, 28] to [batch_size, 28*28], you don't need the location information. In the diagrams I preserved the 6x6 representation, to make it easier to understand what each arrow corresponds to, but the digit capsules actually just get a long list of 8D vectors as input. I hope this helps.
@kennethchong47215 жыл бұрын
@@AurelienGeron Hi, thank you for replying to my question! :) I think I understand what's happening better now. To clarify if I understood it correctly: When forming primary capsules, it doesn't matter whether scalars are taken from one feature map or across different feature maps. A primary capsule in this case is simply an 8D vector of feature map scalars.
@AurelienGeron5 жыл бұрын
@@kennethchong4721 my pleasure! Each 8D primary capsule output must be taken from scalars across feature maps, not within the same feature map. That's what the reshape operation does: try reshaping arr = np.arange(2×3×3×6).reshape(2,4,4,6), like this: arr.reshape(2, 4×4×2, 3). Notice what happens to the last dimension: it just gets split in 2. Hope this helps.
@kennethchong47215 жыл бұрын
@@AurelienGeron Ah yes, I get it now with the example you provided! But I think there's just a small typo: it should be np.arange(2*4*4*6) :) Can I also clarify just 2 last questions I have about capsule networks after watching the video: Does the orientation of the 8D vector directly represent the orientation of a feature the primary capsule detects, or is it just an abstract idea to represent the values within the vector? Does training only take place between the primary capsule layer and the digit capsule layer, through the weight matrix W? Thank you so much for your help nonetheless! :)
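The reshape behaviour discussed in this thread is easy to verify on a tiny array: with a channels-last layout, each capsule's scalars come from consecutive channels (i.e., across feature maps) at a single spatial position. A small sketch with made-up toy sizes:

```python
import numpy as np

batch, h, w, n_maps, dims = 1, 2, 2, 3, 4   # toy sizes, not the paper's
conv_out = np.arange(batch * h * w * n_maps * dims).reshape(
    batch, h, w, n_maps * dims)             # [batch, height, width, channels]
caps = conv_out.reshape(batch, h * w * n_maps, dims)

# the first capsule is the first `dims` channels at position (0, 0):
# scalars taken across feature maps, not along one feature map
assert (caps[0, 0] == conv_out[0, 0, 0, :dims]).all()
assert caps.shape == (batch, 12, dims)
```

Because NumPy (and TensorFlow) reshape in C order, the last axis varies fastest, which is exactly the depth-first grouping described above.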
@MrDarkech5 жыл бұрын
At 7:07, I think the green circle should point to vector number 1 of primary capsule number 2, not vector number 2 of primary capsule number 1. Or did I misunderstand something? Btw, thank you for the great video!
@kamalkamal-ms8re6 жыл бұрын
Great video sir. I have some issues, like understanding the margin loss. The paper uses a special margin loss to make it possible to detect two or more different digits in each image. Now, if I want to use the network in some other kind of image classifier that labels each image as either positive or negative (0 or 1), do I still need the margin loss? And if I do, what would T be? Is it the labels 0 and 1?
@harikrishnanrajeev14326 жыл бұрын
Great video, thank you. Is there a place to follow the latest developments in Capsule Networks? Thanks.
@AurelienGeron5 жыл бұрын
Thanks! And sorry for the late response, I just saw your comment. You can check out: scholar.google.com/scholar?q=capsule+networks or principlesofdeeplearning.com/index.php/resources/list-of-research-papers-on-capsule-networks/
@webentwicklerappentwickler33075 жыл бұрын
Hey, veeeerrryyyy nice video, but I think the margin loss term should be written with a + before the lambda, not a -.
@AurelienGeron5 жыл бұрын
Great catch, thanks! I added an erratum in the video description (at least the code to compute the loss is correct, at 23:42).
@yavuzkahraman43684 жыл бұрын
Your videos have great explanations! I wonder how this works on high-resolution images (i.e. 1024*1024). Could you please give some examples in a simple way? Thanks in advance.
@liusvelsocarrassanchez94876 жыл бұрын
Very good work, Géron, I am looking forward to your next video. Now I have a question: how can I design a net with capsules for the recognition of images with more than one MNIST digit? Could you shed some light? I am a starter. Thanks again ;)
@AurelienGeron6 жыл бұрын
Liusvel Socarrás Sánchez thanks, I'm glad you found the video useful. You can use the exact same model to detect multiple digits. All you need to do is use a training set containing images with overlapping digits, and each label should be a vector of size 10, with 1s for each digit that is present, and 0s elsewhere. Apart from that, all the code will be identical. Hope this helps!
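Concretely, such a multi-label target might look like this (a made-up example with overlapping digits 3 and 7; the variable name is mine, not the notebook's):

```python
import numpy as np

# multi-hot target for an image containing overlapping digits 3 and 7:
# a vector of size 10, with 1s for each digit present and 0s elsewhere
label = np.zeros(10)
label[[3, 7]] = 1.0
assert label.sum() == 2.0 and label[3] == 1.0 and label[7] == 1.0
```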
@omeryalcn57976 жыл бұрын
Maybe I was wrong, sorry about that. When I first saw this, I thought it resembled feature-detection work, I mean the SIFT algorithm, descriptors, etc. Early feature detectors performed badly when the image was transformed or rescaled; then the SIFT algorithm came along and solved this problem. Hopefully you can understand what I mean.
@AurelienGeron6 жыл бұрын
Hi Ömer, thanks for your comment, it's really interesting, I never realized it but I agree with you that there are resemblances between SIFT and CapsNets: both take low level features and look for agreement between these low-level features regarding the presence and pose of higher level objects, using a clustering algorithm to identify agreement. One important difference is that a CapsNet *learns* the low-level features using convolutional layers trained with backpropagation, whereas SIFT extracts them from a previously built database of low-level features. Moreover, I don't think SIFT supports multiple levels, I think it goes directly from the low-level features to objects (although I'm not entirely sure, you may know better). SIFT also only looks at the location, scale and orientation of each low-level feature, whereas a CapsNet can learn other kinds of "pose" parameters, such as the thickness, skew, etc. If you look at the latest CapsNet paper (the link is in the video description), it also organizes the Capsules much like in a Convolutional Layer, which makes it exploit the local and hierarchical structures present in most images. Anyway, excellent point, thanks!
@omeryalcn57976 жыл бұрын
Thank you for your answer. I agree with what you said about the differences between SIFT and CapsNets; I only meant the keypoint-descriptor part, as I think both are robust to transformations. I was really happy, as someone who read your machine learning book and learned a lot from it. I would love to share experiences with you if you have the time.
@AurelienGeron6 жыл бұрын
Sure, you can connect on LinkedIn.
@tbelkas5 жыл бұрын
Hello, Mr. Géron, this is a great video, it really did help me understand a lot of the math behind capsule neural networks. I noticed that you recently answered a question here, and I was wondering whether you could help me out a bit. The problem is that when I run the provided code, the accuracy is horrible (around the 9-10% mark, with a loss of ~1.5). Even after 5-6 epochs it doesn't get better. I would be really grateful if you could explain why this happens.
@asishkumarmishra25972 жыл бұрын
Well, I am a noob here, but I think capsule networks aren't that fast at learning! Most of the implementations I saw use epochs on the order of 100, like 100 epochs, 500 epochs, et cetera. So I think running 5-6 epochs would be far too few for the network to learn. Hope it helps!
@seanjhardy5 жыл бұрын
Hello, this might seem like an extremely naive question, as I am an extreme beginner in this topic, hoping to implement a similar algorithm in Java without copying any existing library so that I can test some different ideas. I would really love an explanation of how the initial feature maps from the primary capsule contain information about rotation. Surely you would need to apply the same capsule over the area multiple times with varying rotation in order to determine the angle of an object, but obviously this is inefficient. When you described capsule networks in your Capsule Networks tutorial video, you implied that the rotation of the object is encoded in one of the dimensions of the vector used to predict what the object is, but I don't see how the network could use the same convolution to detect different rotations of an object. Sorry if this question is a little basic or I am not understanding correctly; an answer would be greatly appreciated!
@AurelienGeron5 жыл бұрын
Hi Sean, That's a great question! Indeed, a convolutional layer will not learn a rotation unless you show it some examples of rotated images. In practice, many filters will be dedicated to detecting different rotations of the same shape. However, the capsule layers can take these multiple outputs and (perhaps) encode them into a single dimension representing the rotation angle. It seems that Capsnets are better than many other architectures at learning with few examples.
@seanjhardy5 жыл бұрын
@@AurelienGeron Thank you! I find it incredibly interesting that this architecture can encapsulate such high-level information and integrate multiple streams into a concise representation of reality. Thank you for taking the time to respond!
@juliogodel6 жыл бұрын
Great.. Thanks!
@pedrohenriquefariateixeira58614 жыл бұрын
Can I train on another dataset with this CapsNet implementation? I'm trying to implement a CapsNet for the BreakHis dataset. If you are reading this and can help me with some links, thanks.
@kislaykunal89216 жыл бұрын
Hi, we select 2 conv layers; what I don't understand is on what basis we choose these hyperparameters. I know it is mostly guessing and checking, but what would be your suggestion if I'm trying to train a deep capsule network on 1048*1048 images for object detection with YOLO labels at the end? Should I increase the number of conv layers, or just the number of kernels? I'm asking because finer features mean better predictions by dynamic routing (if I understand capsules correctly), but at the cost of compute, so I'm trying to find a sweet spot here.
@whyzzy26834 жыл бұрын
Hello, I have a question. Normally, when we do inference, the input can be just a single image. But with a CapsNet, do we need to feed many images to the model, because of the convergence of the routing coefficients?
@edeneden975 жыл бұрын
do the capsules share the prediction matrices weights with nearby capsules?
@AurelienGeron5 жыл бұрын
Hi Eden, great question! No, in the original paper described in this video, nearby capsules do not share weights (except for the primary capsules, since they are mostly just regular convolutional layers). Hope this helps.
@edeneden975 жыл бұрын
@@AurelienGeron thanks! A read the paper a long time ago and remembered some sort of sharing weights. Great video!
@AurelienGeron5 жыл бұрын
@@edeneden97 Thanks, I'm glad you enjoyed the video! :)
@midhulavijayan59766 жыл бұрын
Great explanation, thank you sir. I have one doubt: is it possible to apply capsule nets to large-scale inputs?
@AurelienGeron6 жыл бұрын
It is possible, but so far CapsNets are slower than Convolutional Neural Nets, and they also don't achieve the same accuracy on large images (e.g., ImageNet). But there's rapid progress, so this may change in the coming months. Check out the authors' latest paper (link in the video description). Hope this helps!
@sichenggu58065 жыл бұрын
Hi! I have used your notebook to reproduce this CapsNet, and the val_acc is only about 10%, which is weird. I wonder why this happened...?
@animeshkarnewar36 жыл бұрын
Awesome video! The notebook is very helpful. Although I have a small question: In the squash function, doesn't the unit vector itself ensure that the values are in the range of 0 - 1, while preserving the direction of the vector. Could you please explain what is the need of the squash factor?
@mouduge6 жыл бұрын
Animesh Karnewar The unit vector has a length of exactly 1. We want to make sure vectors longer than 1 get squashed down to the 0-1 range, but we don't want to grow small vectors up to 1.
@animeshkarnewar36 жыл бұрын
Aurélien Geron Oh. Now I got it. So, we are converting all the vectors proportionately in the Range [0 - 1) while preserving the directions. Yes, it makes sense. TYSM for the explanation. Cheers!
@animeshkarnewar36 жыл бұрын
Yeah. Because of the squashing factor, very long vectors (length >> 1) will end up being close to 1, while vectors that have a length less than 1 will get squashed down towards zero length vectors thereby nullifying their effect (as if the vectors weren't there). This would suppress all the non significant vectors and keep the contribution from the important ones and since this ensures that range stays within 0-1, this function probably has the effect of ReLU followed by Batch Normalization. Just sharing some of my thoughts after understanding this concept.
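To illustrate the behaviour described in this thread, here is a minimal NumPy sketch of the squash function (the `eps` term is an assumption added for numerical stability, not part of the paper's formula):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-7):
    # scale = ||s||^2 / (1 + ||s||^2): long vectors end up with length
    # near 1, short vectors near 0; the direction is preserved.
    sq_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)  # eps avoids division by zero

long_len = np.linalg.norm(squash(np.array([10.0, 0.0])))
short_len = np.linalg.norm(squash(np.array([0.1, 0.0])))
assert long_len > 0.98 and short_len < 0.02
```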
@sagnikbhattacharya12026 жыл бұрын
I love your videos! Will you make videos in French too? That would be very useful for me.
@AurelienGeron6 жыл бұрын
Sagnik Bhattacharya, thanks! I'm thinking about it; I will probably just translate the English videos into French and upload them to another channel.
@bryan37926 жыл бұрын
I knew it was gonna be good before I even saw the video.
@animeshkarnewar36 жыл бұрын
I was just wondering: the process of routing the primary_caps layer's outputs to the digit_caps layer is iterative. The iterations are required to update the log prior weights (which are all 0s initially, ensuring equal weight) according to the output of the digit_caps layer (after squashing). So basically, the purpose of this routing is to generate the routing_coefficients that maximize the scalar product of the digit_caps output vectors and the vectors predicted by the primary_caps layer. Can't we just use another, parallel fully connected layer to predict the routing_coefficients from the predicted outputs, optimizing it with an appropriate objective function (in our case: maximizing the scalar product)? This thought is mostly inspired by the concept of Synthetic Gradients. It would be a great help if you could share some of your insights on this.
@AurelienGeron6 жыл бұрын
Animesh Karnewar interesting idea, you could definitely try! This is a new domain, it's likely that there are small tweaks to be found that will strongly affect the performance of the network, or it's training time.
@RAJATTHEPAGAL6 жыл бұрын
Awesome :-D ...
@richardmaddison81965 жыл бұрын
Hi Aurélien, I loved this video and used it extensively to build an MNIST Capsule Net in Microsoft Excel. I finished this a couple of weeks ago and wrote it up here: www.RichardMaddison.com I also just bought your book "Hands-On Machine Learning with Scikit-Learn and TensorFlow" and look forward to reading it immensely. Thank you so much for your hard work and the inspiration you have given. Richard
@usmabhatt17686 жыл бұрын
Great! I have two questions: Do I need the reconstruction code if I'm doing binary classification? And does it work the same way if I use a 3D convolution layer?
@AurelienGeron6 жыл бұрын
Thanks Usma! The reconstruction part of the CapsNet (i.e., the decoder) acts as a regularizer: it encourages the CapsNet to learn a high level representation of the whole image, rather than base its classification decisions on just a few pixel values. Thanks to the decoder, the CapsNet is less likely to overfit the training data, and it will generalize better to new instances. And yes, you can build a CapsNet in just the same way using 3D Conv Layers, or even 1D Conv Layers: it's a pretty general concept. However, make sure to check out the latest paper by the same authors: "Matrix Capsules using EM Routing". It introduces a lot of significant changes to the CapsNet architecture (while keeping the same core principles, of course). Hope this helps.
@usmabhatt17686 жыл бұрын
Aurélien Géron Thank you sir!
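The decoder discussed in this thread is just three dense layers (512 then 1024 units with ReLU, then 784 sigmoid outputs, per the paper). A rough NumPy sketch with random, untrained weights, only to show the shapes involved:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16 * 10))   # flattened, masked digit-caps outputs
layers = [(160, 512, relu), (512, 1024, relu), (1024, 784, sigmoid)]
for fan_in, fan_out, act in layers:
    W = rng.normal(scale=0.05, size=(fan_in, fan_out))  # untrained weights
    x = act(x @ W)

# 784 = 28x28 reconstructed pixel intensities in [0, 1]
assert x.shape == (1, 784) and (x >= 0).all() and (x <= 1).all()
```

The reconstruction loss between these 784 values and the input pixels is what regularizes the CapsNet, as explained above.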
@akashjadhav50936 жыл бұрын
Thanks for such a wonderful explanation! One question though: how can I use this architecture on datasets other than CIFAR and MNIST, and what precautions are necessary?
@asyeerajat29514 жыл бұрын
Hello. I'm new to Python and deep learning, and I have the same problem here. How can I use my own image dataset? Do you already have a solution? It would be wonderful if you could share it. Thank you.
@habibakhaled97712 жыл бұрын
@@asyeerajat2951 Hello, I am currently facing the same problem. Have you reached a solution? Thank you in advance.
@wolfisraging6 жыл бұрын
Plz make more videos
@marat616 жыл бұрын
Where did you use a special training algorithm? How does Adam, a gradient-based method, train the capsule net?
@AurelienGeron5 жыл бұрын
Many things are special in this model, but training is just as usual, using backpropagation. Let me make an analogy: an RNN implements a loop over the inputs, in pseudo-code:

    state = 0
    for input in inputs:
        state = compute_new_state(input, state)
    return compute_output(state)

Yet, backpropagation works fine, because this loop is actually "unrolled" internally (either statically before even seeing the data, which requires knowing the length of the inputs in advance, or dynamically when handling one training batch):

    state = 0
    state = compute_new_state(inputs[0], state)
    state = compute_new_state(inputs[1], state)
    state = compute_new_state(inputs[2], state)
    state = compute_new_state(inputs[3], state)
    return compute_output(state)

So the fact that the model contains a loop is not a problem for backpropagation. Back to capsule networks: the routing by agreement algorithm includes a loop, but that's not a problem, it can just be unrolled. Hope this helps!
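As a sketch of what gets unrolled in routing by agreement (shapes made up, squashing omitted for brevity, my own variable names): with a fixed iteration count, the loop can simply be expanded before backpropagation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 5))   # [n_caps1, n_caps2, d2] predicted outputs
b = np.zeros((6, 3))                 # routing logits, all zero initially

for _ in range(3):                   # fixed iteration count -> unrollable
    c = softmax(b, axis=1)           # coupling coefficients per primary capsule
    v = (c[..., None] * u_hat).sum(axis=0)   # candidate digit-caps outputs [3, 5]
    b = b + (u_hat * v).sum(axis=-1)         # agreement update

assert np.allclose(c.sum(axis=1), 1.0)       # couplings sum to 1
```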
@AmanKhandelia6 жыл бұрын
Hi there, I have a question regarding the weight matrix. (Before I continue, I wish to state that I want to implement this all by myself and learn in the process; I am basically squinting at your videos to get just enough to get the idea, hence my question could be plainly incorrect. If so, do correct me.) It is mentioned on page 4, first paragraph, that "each capsule in the [6 x 6] grid is sharing their weights with each other". Based on this quote, shouldn't we only have 32 sets of W_i_j, where each W_i_j is replicated 36 (6*6) times across the grid? P.S.: I am a novice at this, so I may be just outright wrong about it; if so, please do correct me. Any help is deeply appreciated.
@AurelienGeron6 жыл бұрын
Hi Aman, the primary capsules are basically a couple Convolutional Layers whose outputs are reshaped and squashed. All neurons in a given feature map of a convolutional layer share their weights. So if you use a standard convolutional layer, the weight sharing they talk about in the paper (i.e., weights between the inputs and the primary capsules) will be taken care of automatically, no need to worry about it. The weights you are refering to (W_ij) are not the same: these are the weights between the primary capsules and the digit capsules. Those weights are not shared. At least not in this implementation (but check out the new paper, "Matrix capsules with EM routing", the link is in the video description, they do share these weights in this case). Hope this helps!
@AmanKhandelia6 жыл бұрын
Got it, thanks for the info, I stand corrected. One last thing before I sign off: in the squashing operation there are two parts. One normalises the vector, i.e., s_j/||s_j||, which is basically a unit vector with the same orientation. But as for the other part, I am unable to arrive at an interpretation. After doing a bit of math, I turned it into 1 - (1 / (1 + ||s||^2)); how can I interpret this formulation? What does the magnitude of the input vector s_j indicate, and is there a nice meaning to this equation that I am missing?
@AurelienGeron6 жыл бұрын
You are right, s_j / ||s_j|| is a unit vector with the same orientation. The other part does not have a nice interpretation other than "large vectors will end up with a length close to 1, and small vectors will end up with a length close to 0". This was one of the reasons why in their latest paper they decided to encode the probability of presence differently (as a separate output). :)
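The behaviour described above ("large vectors end up near length 1, small vectors near 0") is easy to check numerically. A tiny sketch of the scaling factor, using the equivalent form from the question above:

```python
import numpy as np

def squash_factor(norm):
    # ||s||^2 / (1 + ||s||^2)  ==  1 - 1/(1 + ||s||^2)
    return norm**2 / (1.0 + norm**2)

# Small input vectors are squashed to near-zero length...
assert squash_factor(0.1) < 0.01
# ...while large ones approach (but never reach) length 1.
assert squash_factor(10.0) > 0.99
```

So the factor acts as a smooth, saturating "probability of presence" on the vector's length.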
@ritap_CV5 жыл бұрын
Can you do the same for pytorch????
@aryanmobiny73406 жыл бұрын
Thanks for the nice code and explanation. I had a question regarding the CapsNet. Would it be possible to add layers (like a conv-caps layer after the primary caps layer, or a fully connected caps layer with 20 capsules before the digit caps layer)? I tried it myself and am getting terrible results! I can't understand why this is happening. Do you have any idea?
@AurelienGeron6 жыл бұрын
Thanks Aryan. I haven't tried adding extra layers yet, but I'm a bit surprised that you're getting terrible results. Perhaps it's just that you end up with so many parameters that the amount of training data becomes insufficient. In their latest paper ("Matrix capsules with EM routing") the authors introduce some improvements that dramatically reduce the number of parameters per layer, so I'm guessing it will help. As soon as I have time, I'll try the new architecture.
@aryanmobiny73406 жыл бұрын
Thanks for your reply. I can't wait to see your videos on EM routing ;)
@MultiRoommate6 жыл бұрын
Please help me understand this: the weights in the dynamic routing by agreement are adjusted based on the vector outputs from the digit capsules, but how would the 256 filters applied in the convolution layer change their values? Is there any logic behind it? Many thanks!
@AurelienGeron6 жыл бұрын
Hi! The weights of the convolutional layers are trained using regular backpropagation (and so are the weight matrices W_ij in the capsules). Basically, at each training iteration, a batch of images is fed to the input layer, it goes through the convolutional layers, and up through the capsule layers (including through the routing by agreement algorithm), and finally we get the outputs, and the loss is computed (based on both the digit capsule's outputs and on the reconstruction loss). Then backpropagation starts: the gradient of the loss is backpropagated through the whole network and all the model parameters get updated (this includes the W_ij matrices and the convolutional layer parameters, but it does *not* include the routing by agreement weights, which are just computed on the fly when needed, but they are not "stateful" model parameters). At inference time, the first forward phase happens in exactly the same way (including the routing by agreement iterations), but once we have the outputs, we're done (we don't compute the loss and we don't do any backpropagation). I hope this helps.
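The loss driving that backpropagation has two parts: the margin loss on the digit capsules' lengths, plus the reconstruction loss. For reference, here is the margin loss from the paper sketched in NumPy (the constants m+ = 0.9, m- = 0.1 and λ = 0.5 are the paper's values; the reconstruction term is omitted for brevity):

```python
import numpy as np

def margin_loss(lengths, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """lengths: [batch, 10] digit-capsule lengths; labels: [batch] class ids.
    The present digit's length is pushed above m_plus, absent ones below m_minus."""
    T = np.eye(10)[labels]                                   # one-hot targets
    present = T * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1 - T) * np.maximum(0.0, lengths - m_minus) ** 2
    return (present + absent).sum(axis=1).mean()
```

Its gradient flows back through the routing iterations and the conv layers, updating W_ij and the 256 conv filters, while the routing logits themselves are simply recomputed at each forward pass.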
@MultiRoommate6 жыл бұрын
Got it. Many thanks!
@JonasDonald6 жыл бұрын
Do you have any results on how the network performed on images in which there are multiple different numbers?
@AurelienGeron5 жыл бұрын
Oops, just saw your comment, sorry. The paper contains many results on the MultiMNIST dataset with two overlapping digits. The baseline models (not CapsNets) got 8% error, while the CapsNet got 5.2% error. Hope this helps.
@rohitdulam87626 жыл бұрын
Nice video. I have a small doubt. In the training loop, there is a statement y = y_batch. But, y is of shape [None] and y_batch of shape [batch_size, 10]. Is that kind of an assignment statement valid? I've tried it and got an error. If I've missed something, please correct me.
@rohitdulam87626 жыл бұрын
Sorry, it was my mistake. While reading the dataset, I set one_hot to true.
@AurelienGeron6 жыл бұрын
Thanks Rohit. In the Jupyter notebook, `y_batch` is obtained by `X_batch, y_batch = mnist.train.next_batch()` which returns a one-dimensional array for the targets, so everything's fine. Perhaps you loaded MNIST with `one_hot=True`, or using Keras? My code definitely assumes that `y` is one-dimensional. Make sure you load MNIST just like in the notebook. Hope this helps.
@rohitdulam87626 жыл бұрын
Yes, it was my mistake, I put one_hot to true while loading the dataset. Thank you for taking time and replying, keep up the good work. 😀
@kislaykunal89216 жыл бұрын
I don't get what he did at 9:46. Can anyone explain?
@kislaykunal89216 жыл бұрын
I don't get the additional dimension thing for the batch size
@kislaykunal89216 жыл бұрын
Is it because he is trying to parallelize the operations?
@AurelienGeron6 жыл бұрын
Hi Kislay, when you call tf.matmul(A,B), the shapes of A and B must be compatible. For example, if A's shape is [5,6,8,3,2], it is best to think of it as a 5x6x8 array of matrices (specifically, a 5x6x8 array where each cell is a 3x2 matrix). In this case, B should also be a 5x6x8 array of matrices (e.g., 2x9 matrices), and tf.matmul(A,B) will multiply each matrix in A with the corresponding matrix in B. As you point out, this will all happen in parallel on the GPU so it will be much faster than computing all these multiplications one by one. More generally, if A's shape is [a,b,...,c,d,E,F] then B's shape must be [a,b,...,c,d,F,G] and the result's shape will be [a,b,...,c,d,E,G]. This is because the product of an ExF matrix and an FxG matrix is an ExG matrix. So what this part of the code is doing is just making sure the arrays have compatible shapes, adding dimensions where needed (by copying the data using tf.tile()), so we can then efficiently compute many matrix multiplications in one shot. Note that even though this is much more efficient than doing all the matrix multiplications sequentially, Thibault Neveu kindly suggested (see his comment on this video) an even faster way to do this using tf.einsum(), without having to copy data with tf.tile(). Hope this helps!
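The shape rules above can be checked directly; NumPy's `matmul` follows the same stacked-matrix semantics as `tf.matmul`, and `np.tile` plays the role of `tf.tile` (the shapes below mirror the ones in the explanation and the CapsNet code):

```python
import numpy as np

# A is a 5x6x8 "array of matrices", each cell a 3x2 matrix;
# B holds matching 2x9 matrices. matmul multiplies them pairwise.
A = np.random.randn(5, 6, 8, 3, 2)
B = np.random.randn(5, 6, 8, 2, 9)
C = np.matmul(A, B)          # 3x2 @ 2x9 -> 3x3... no: 3x9 per cell
assert C.shape == (5, 6, 8, 3, 9)

# In the CapsNet code, the W array is tiled so its leading (batch)
# dimension matches the input: one copy per instance in the batch.
W = np.random.randn(1, 1152, 10, 16, 8)
W_tiled = np.tile(W, [4, 1, 1, 1, 1])
assert W_tiled.shape == (4, 1152, 10, 16, 8)
```

"One copy per instance" just means the same W_ij matrices are replicated along the batch axis so that all instances in the mini-batch can be multiplied in one shot.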
@kislaykunal89216 жыл бұрын
Thanks a lot for the explanation, but what I am unable to understand is the part where you make one copy per instance of the batch. Also, what do you mean by batch instance? Is it the mini-batches? Because if so, how can you forward-propagate the whole thing at once? Again, thanks!
@kislaykunal89216 жыл бұрын
Sorry to bother you, but by "single batch instance" do you mean one image in the mini-batch?
@wolfisraging6 жыл бұрын
Why don't u create more videos???????????????
@AurelienGeron6 жыл бұрын
Rishik Mourya I would love to, I'm just not finding the time right now. But I'll do my best!
@wolfisraging6 жыл бұрын
Aurélien Géron, hope to see u soon.
@StanleySalvatierraАй бұрын
I’m from the future, Transformers won the war.
@julioargumedo67226 жыл бұрын
Awesome video, thank you very much!
@briancompton47056 жыл бұрын
Thank you!
@briancompton47056 жыл бұрын
I'm buying myself two books for Christmas! Yay! Lol