How to implement CapsNets using TensorFlow

55,722 views

Aurélien Géron

166 comments
6 years ago
Videos of this quality on this subject are a rare occurrence. Great job!
@albertwang5974
@albertwang5974 5 жыл бұрын
The capsule network is the most accurate representation of biological neural networks so far; thanks for the implementation.
@yiangzheng1139
@yiangzheng1139 6 жыл бұрын
I feel like this video is even more helpful than the last one, flawless! Thanks a lot
@Perryman1138
@Perryman1138 6 жыл бұрын
Nice! I’ll be watching this a few times. Your book just arrived in the mail today! Thank you so much for helping explain concepts to everyone!
@amirlibra4182
@amirlibra4182 3 жыл бұрын
Awesome!!!! My mind's been starving to feed on CapsNet implementation concepts. Thank you for this concise understandable tutorial.
@richardchou9649
@richardchou9649 5 жыл бұрын
Very nice video. I often have difficulty understanding all that DIMENSION stuff, but this video explains it very well!
@jacobusstrydom7017
@jacobusstrydom7017 5 жыл бұрын
Excellent video. Your explanation of the concept and code is by far the best I have seen so far, well done
@akshay_pachaar
@akshay_pachaar 5 жыл бұрын
Making this must have taken a lot of hard work and a great mind. Thanks and keep it up.
@nateamus3920
@nateamus3920 6 жыл бұрын
Just bought your book and sent you an InMail. Incredible work, Mr. Géron!!
@foobar1672
@foobar1672 3 жыл бұрын
Thank you for detailed explanations in your source code.
@HeduAI
@HeduAI 4 жыл бұрын
Best explanation of capsule networks! Thank you!
@everg86
@everg86 6 жыл бұрын
This video is amazingly clear, just like your book! Thanks a lot!
@tingnews7273
@tingnews7273 6 жыл бұрын
After watching it time after time I finally got the idea. Thank you for your hard work.
@beckettman42
@beckettman42 6 жыл бұрын
Excellent resource to learn by. I think I am finally starting to understand some of these concepts. Keep up the good work.
@amogh_wagh
@amogh_wagh 5 жыл бұрын
This is by far the best approach and explanation of the code that i have seen on the internet. Your notebook is great as well. Thank you for making people like us understand it better and please keep making more videos. Kudos. \m/
@CyberSecWithDesire
@CyberSecWithDesire 4 жыл бұрын
I need to watch this video 1000000 times
@aa-xn5hc
@aa-xn5hc 6 жыл бұрын
This video was great!! (again!). We are all looking forward to seeing more videos from you in the future.
@Ruhgtfo
@Ruhgtfo 3 жыл бұрын
The clearest explanation thanks
@OttoFazzl
@OttoFazzl 6 жыл бұрын
This is an awesome presentation! You should definitely keep on making these videos!
@parthdedhia4182
@parthdedhia4182 5 жыл бұрын
This video explained the implementation very well. It was really great work. Thank you :)
@hardikmodi8234
@hardikmodi8234 6 жыл бұрын
Excellent video available on YouTube for CapsNets.
@ttmofy
@ttmofy 6 жыл бұрын
Great step-by-step explanation!
@karlotto9319
@karlotto9319 6 жыл бұрын
Another excellent video. Please keep them coming!
@miharbi00
@miharbi00 6 жыл бұрын
Your videos are great. Thanks a lot!
@mouduge
@mouduge 6 жыл бұрын
Thanks Evpatoria! :)
@unoqualsiasi7341
@unoqualsiasi7341 6 жыл бұрын
Thanks for the wonderful video tutorial!
@KarolMajek
@KarolMajek 6 жыл бұрын
Awesome video! Thank you so much!
@ThibaultNeveu
@ThibaultNeveu 6 жыл бұрын
I implemented CapsNet using broadcasting with matrix multiplication, without tf.tile, by using einsum: u_hat = tf.einsum('abdc,iabcf->iabdf', w_ij, input_layer). It's more computationally efficient.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi Thibault, very interesting, thanks! I'll try it out.
@ThibaultNeveu
@ThibaultNeveu 6 жыл бұрын
As far as I know, you are from France; if you have some time, check out my YouTube channel :) I try to produce some deep learning tutorials for the French public!
@ThibaultNeveu
@ThibaultNeveu 6 жыл бұрын
Great video by the way!
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Thanks Thibault! You have one more subscriber. :)
@michaelrubinstein4088
@michaelrubinstein4088 6 жыл бұрын
I tried using einsum, as Thibault suggested. I replaced the matmuls in the caps2 predictions and the agreement computation. This resulted in a performance improvement of more than 15x. GPU (1080ti) utilization went from around 12% to around 90%. Thanks!
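For readers wondering what the two formulations compute, here is a minimal NumPy sketch on toy data (same einsum semantics as tf.einsum); the shapes and subscript string below are assumptions chosen for illustration, so they differ slightly from the ones in Thibault's comment and in the notebook:

    import numpy as np

    # Toy sizes standing in for the real ones (1152 primary capsules of 8 dims, 10 digit capsules of 16 dims).
    batch_size, n_caps1, n_caps2 = 2, 1152, 10
    caps1_dims, caps2_dims = 8, 16

    W = np.random.randn(n_caps1, n_caps2, caps2_dims, caps1_dims)  # one 16x8 matrix per (i, j) pair
    u = np.random.randn(batch_size, n_caps1, caps1_dims)           # primary capsule outputs

    # Approach 1: tile/broadcast W per instance, then one big batched matmul (the tf.tile + tf.matmul route).
    W_tiled = np.broadcast_to(W, (batch_size,) + W.shape)                   # [B, 1152, 10, 16, 8]
    u_exp = np.broadcast_to(u[:, :, None, :, None],
                            (batch_size, n_caps1, n_caps2, caps1_dims, 1))  # [B, 1152, 10, 8, 1]
    u_hat_matmul = (W_tiled @ u_exp)[..., 0]                                # [B, 1152, 10, 16]

    # Approach 2: a single einsum contraction, no explicit tiling.
    u_hat_einsum = np.einsum('ijkl,bil->bijk', W, u)                        # [B, 1152, 10, 16]

    print(np.allclose(u_hat_matmul, u_hat_einsum))  # True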
@reynaldborja3406
@reynaldborja3406 6 жыл бұрын
Priceless! So Amazing!
@massoudkhodadadzadeh8160
@massoudkhodadadzadeh8160 4 жыл бұрын
Great explanation of Capsnet
@sidrahliaqat7637
@sidrahliaqat7637 6 жыл бұрын
Awesome! Thank you very much for making this video!
@goldengrapeorg
@goldengrapeorg 6 жыл бұрын
Thank you! Looking forward to the PyTorch version
@maxpan1222
@maxpan1222 6 жыл бұрын
Awesome explanation!
@ja3zman
@ja3zman 6 жыл бұрын
Great explanation. Thanks
@0sandruskyi0
@0sandruskyi0 3 жыл бұрын
You are the best
@subhamnaskar1998
@subhamnaskar1998 4 жыл бұрын
Respected Sir, can you please make a video on extreme learning machine. Your videos are really informative and helpful.!
@jjaannnniikk
@jjaannnniikk 2 жыл бұрын
Great video! Is there any updated version based on TF 2? Thanks
@cyrildigrandi9
@cyrildigrandi9 6 жыл бұрын
Great video Aurélien, thank you!
@vladimiriurcovschi1657
@vladimiriurcovschi1657 6 жыл бұрын
Nice job! Thank you !
@EsdrasSoutoCosta
@EsdrasSoutoCosta 6 жыл бұрын
Awesome video, thanks. In the video you mention that this is a shallow architecture, but what would a deep one look like?
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi Esdras, you would just stack as many hidden capsule layers as you want on top of the primary capsule layer, before the final digit layer. These hidden capsule layers would be implemented exactly like the digit layer, using the routing by agreement algorithm. Hope this helps.
@EsdrasSoutoCosta
@EsdrasSoutoCosta 6 жыл бұрын
Thank you very much. Helped a lot. I'm studying capsule networks for my Master's degree.
@neumdeneuer1890
@neumdeneuer1890 5 жыл бұрын
@@AurelienGeron Hi, short follow up question: I guess since the stacked hidden capsule layers are fully connected the complexity would explode quite quickly making the architecture unusable (do you agree ?). Is there some proposed way to only connect certain capsules in the hidden layers e.g. only the ones at the same grid positions like in CNNs ? Thanks in advance for your engagement in the comments.
@AurelienGeron
@AurelienGeron 5 жыл бұрын
@@neumdeneuer1890 excellent question! You don't have to use fully connected layers; the connectivity could indeed be partial, as you suggest. I recommend you check out the latest paper, which makes some significant changes to the architecture (but the main ideas are still there), in particular partial connectivity. See the link in the video description. Hope this helps!
@PhongNguyen-zz1ei
@PhongNguyen-zz1ei 6 жыл бұрын
Hello guys, I have a quick question: why did the authors choose 16 as the length of the second capsule layer's vectors? I get the idea behind multiplying each capsule in the output of the 1st capsule layer (containing 8 direction values) with the weight matrix. But what I don't understand is whether there is any specific reason to choose the number 16, or am I misunderstanding something here?
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi Phong, great question. You could very well choose another number, like 10, 20 or 30. The tradeoff is this: you need a sufficient number of dimensions to hold all the pose information of the digits. That number should probably be greater than 8, since there's more information in the pose of a full digit than in the pose of one of its parts. However, you probably don't want too many dimensions either, because this would add a lot of parameters in your model, with the risk of overfitting the training set and adding too much computation time. So I suppose the authors experimented with various values and found that 16 performed well. I hope this helps.
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
It's like we have 16 dimensions along which certain information is encoded. For example, the video mentioned that by changing certain values across some dimension of this vector, we can change the thickness of the digit reconstructed. What would be really interesting is somehow if we could interpret these dimensions, we would be able to generate the digit the way we want. Here is a small similar architecture that I built. -> medium.com/mlreview/aann-absolute-artificial-neural-network-ae8f1a65fa67 This might be of help.
@4yt158
@4yt158 6 жыл бұрын
Love your book! Worth the money!! Could you also please make videos on Deep Reinforcement learning? :)
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Thanks Vamsi! I wish I could clone myself, I have so many things I want to do. Sure, I'd love to do a video on Deep RL. Would you like a high-level overview (like my Capsule Networks video) or a video focused on the implementation (like this video)?
@4yt158
@4yt158 6 жыл бұрын
Thank you Aurelien for taking the time to make videos for educating us! :) I know you get a lot of requests like mine. It would be great if you could please do a video on both the high-level ideas and a Keras implementation of Deep RL. Your video on capsule networks is amazing. I listened to Prof. Hinton's video on the same topic but understood nothing! Also, another big topic missing on YouTube and in books on Amazon is good resources on energy-based methods. Most technical papers by Bengio, Hinton, etc. are so mathematical that many students can't understand what's going on. Do you have any plans to write a book or make videos on those topics? Thank you! :)
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Sounds good, I'm adding this idea to my list of videos to do. I can't commit to a deadline, however, because my agenda is crazily packed these days, but I'll do my best. In the meantime, since you have my book, check out chapter 16: it presents Deep Reinforcement Learning, in particular Policy Gradients and Deep Q-Learning. You can also look at the Jupyter notebook for that chapter here: github.com/ageron/handson-ml/blob/master/16_reinforcement_learning.ipynb It contains the code to train an agent to play Ms-PacMan based only on the raw pixels. Hope it helps!
@sameeraramasinghe693
@sameeraramasinghe693 6 жыл бұрын
Excellent EXCELLENT!!! thanks a lot
@sunderrajan6172
@sunderrajan6172 6 жыл бұрын
Well explained. Thanks
@luisvalesilva8931
@luisvalesilva8931 6 жыл бұрын
Brilliant video!
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
I have a way to get around the ugly-code problem. I usually use the variable_scopes and the weight_reuse parameter inside the functional interfaces that I create for the tensorflow computation graphs. So this way, there is no need to pass empty arrays to the placeholders that are not required.
@mouduge
@mouduge 6 жыл бұрын
Animesh Karnewar Interesting. Could you please submit a Pull Request for the notebook?
@011azr
@011azr 6 жыл бұрын
OMG, you're French and you speak both French and English very clearly. That's a talent! LOL xD. Anyway, thanks for your video, they're the best. :)))).
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
Indeed! I think what he speaks best is the language of 'Deep Learning' :), :p :D.
@bingeltube
@bingeltube 6 жыл бұрын
Very recommendable
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
I noticed that you have set the random seed_value = 42. Does it have to do anything with "The Hitchhiker's Guide to the Galaxy"? lol :D.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Animesh Karnewar , absolutely! :))
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
Aurélien Géron haha 👍
@dbp_patel_1994
@dbp_patel_1994 6 жыл бұрын
Such great Easter Eggs. :)
@dhruvnil8745
@dhruvnil8745 6 жыл бұрын
If you read the sklearn source, you will find that they too have left this easter egg everywhere.
@Gunth0r
@Gunth0r 6 жыл бұрын
and what about the poppies? I get the capsule reference, but still, why poppies? :p
@dpdove16
@dpdove16 5 жыл бұрын
There is a slight error in the margin loss formula at 21:55: the sign before the lambda should be a + (plus) and not a - (minus). We could make the parameter negative to rectify it. Thank you for the awesome video though.
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Thanks Dhawal, I appreciate your feedback. Someone mentioned this error some time ago, so I added it to the list of errata in the video description. I wish I could add a note directly within the video, but YouTube removed this feature (or perhaps they restricted it to people with more subscribers, I'm not sure). Cheers!
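For anyone implementing this from the slide, here is a minimal NumPy sketch of the corrected formula (with the + before lambda), which should match the correct code shown at 23:42; the constants are the ones from the paper:

    import numpy as np

    m_plus, m_minus, lambda_ = 0.9, 0.1, 0.5   # constants from the paper

    def margin_loss(v_norm, T):
        # v_norm: [batch, 10] lengths of the digit capsule outputs; T: [batch, 10] one-hot targets.
        present_err = np.maximum(0., m_plus - v_norm) ** 2     # penalize a short vector for the true class
        absent_err = np.maximum(0., v_norm - m_minus) ** 2     # penalize long vectors for absent classes
        L = T * present_err + lambda_ * (1. - T) * absent_err  # note the +, as per the erratum
        return L.sum(axis=1).mean()

    # A confident, correct prediction should give a near-zero loss:
    T = np.eye(10)[[3]]                    # one instance whose label is 3
    v_norm = np.full((1, 10), 0.05)
    v_norm[0, 3] = 0.95
    print(margin_loss(v_norm, T))          # ~0.0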
@NithirojTripatarasit
@NithirojTripatarasit 6 жыл бұрын
Thank you!
@kennethchong4721
@kennethchong4721 5 жыл бұрын
Thank you for creating this wonderful tutorial! :) I've learnt a lot from your videos on capsule networks and your book! However, could you explain how primary capsules being fully connected to digit capsules allows me to use tf.reshape(conv2, [-1, caps1_n_caps, caps1_n_dims]) to group 8 scalars across 8 feature maps (i.e. 1 scalar from each feature map) together to form a 8D primary capsule? My understanding is that tf.reshape reshapes numpy arrays by going through numbers of the numpy array in the order of columns -> rows -> depth, whereas using tf.reshape as you did goes through scalars of the feature maps in the order of depth (across activation maps) -> columns -> rows
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Hi Kenneth, thanks for your kind words and your question. The output of the second convolutional layer is [batch_size, height, width, channels], so we could reshape it to: [batch_size, height, width, caps1_n_caps_per_position, caps1_n_dims], and this would just reshape the last dimension. For example, if batch_size=10, height=6, width=6, channels=256, caps1_n_caps_per_position=32, caps1_n_dims=8, each position in each feature map would start with 256 scalars and would end up with 32 vectors of 8 dimensions each. This would be the output of the primary capsules, represented as a 5D array. Now if we want these primary capsule outputs to be fully connected to the digit capsules, we don't need to preserve the horizontal & vertical dimensions (i.e., the 6x6 shape), we can just reshape these outputs to [batch_size, height*width*caps1_n_caps_per_position, caps1_n_dims], which in this example is [batch_size, 1152, 8]. Similarly, when you want to train a dense network to classify MNIST, you start by reshaping the inputs from [batch_size, 28, 28] to [batch_size, 28*28], you don't need the location information. In the diagrams I preserved the 6x6 representation, to make it easier to understand what each arrow corresponds to, but the digit capsules actually just get a long list of 8D vectors as input. I hope this helps.
@kennethchong4721
@kennethchong4721 5 жыл бұрын
@@AurelienGeron Hi, thank you for replying to my question! :) I think I understand what's happening better now. To clarify if I understood it correctly: When forming primary capsules, it doesn't matter whether scalars are taken from one feature map or across different feature maps. A primary capsule in this case is simply an 8D vector of feature map scalars.
@AurelienGeron
@AurelienGeron 5 жыл бұрын
@@kennethchong4721 my pleasure! Each 8D primary capsule output must be taken from scalars across feature maps, not within the same feature map. That's what the reshape operation does: try reshaping arr = np.arange(2×3×3×6).reshape(2,4,4,6), like this: arr.reshape(2, 4×4×2, 3). Notice what happens to the last dimension: it just gets split in 2. Hope this helps.
@kennethchong4721
@kennethchong4721 5 жыл бұрын
@@AurelienGeron Ah yes I get it now with the example you provided! but I think there's just a small typo where it should be np.arrange(2*4*4*6) :) Can I also clarify just 2 last questions I have about capsule networks after watching the video: Does the orientation of the 8D vector directly represent the orientation of a feature the primary capsule detects, or is it just an abstract idea to represent the values within the vector? Does training only take place between the primary capsule layer and the digit capsule layer through weight matrix W? Thank you so much for your help nonetheless! :)
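In case the reshape trick is still hard to picture, here is a tiny NumPy sketch with made-up sizes (a 2x2 grid, 6 feature maps read as 2 capsules of 3 dimensions each), showing that only the channel axis gets split:

    import numpy as np

    # Tiny stand-in for the conv output: batch=1, a 2x2 grid, 6 feature maps,
    # interpreted as 2 capsules of 3 dimensions at each grid position.
    conv_out = np.arange(1 * 2 * 2 * 6).reshape(1, 2, 2, 6)

    caps = conv_out.reshape(1, 2 * 2 * 2, 3)   # [batch, n_caps, caps_dims]
    print(caps[0, 0])   # [0 1 2] -> channels 0-2 at grid position (0, 0): scalars taken across feature maps
    print(caps[0, 1])   # [3 4 5] -> channels 3-5 at the same position
    print(caps[0, 2])   # [6 7 8] -> channels 0-2 at the next grid position (0, 1)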
@MrDarkech
@MrDarkech 5 жыл бұрын
At 7:07, I think the green circle should point to vector number 1 of primary capsule number 2, not vector number 2 of primary capsule number 1; or did I misunderstand something? Btw thank you for the great video!
@kamalkamal-ms8re
@kamalkamal-ms8re 6 жыл бұрын
Great video sir. I have some issues understanding the margin loss. The paper uses a special margin loss to make it possible to detect two or more different digits in each image. Now, if I want to implement the network as some other kind of image classifier where each image is classified as either positive or negative, meaning 0 or 1, do I still need the margin loss, since there is only one label per image? And if I need the margin loss, what would T be? Is it the labels 0 and 1?
@harikrishnanrajeev1432
@harikrishnanrajeev1432 6 жыл бұрын
Great video, thank you. Is there a place to follow the latest happenings with capsule networks? Thanks.
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Thanks! And sorry for the late response, I just saw your comment. You can check out: scholar.google.com/scholar?q=capsule+networks or principlesofdeeplearning.com/index.php/resources/list-of-research-papers-on-capsule-networks/
@webentwicklerappentwickler3307
@webentwicklerappentwickler3307 5 жыл бұрын
Hey, veeeerrryyyy nice video, but I think the margin loss term should be written with a + before the lambda, not a -.
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Great catch, thanks! I added an erratum in the video description (at least the code to compute the loss is correct, at 23:42).
@yavuzkahraman4368
@yavuzkahraman4368 4 жыл бұрын
The videos have great explanations! I wonder how this works on high-resolution images (i.e. 1024*1024). Could you please give some examples in a simple way? Thanks in advance.
@liusvelsocarrassanchez9487
@liusvelsocarrassanchez9487 6 жыл бұрын
Very good work, Géron, I am looking forward to your next video. Now I have a question: how can I design a net with capsules for the recognition of images with more than one MNIST digit? Could you shed some light? I am a starter, thanks again ;)
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Liusvel Socarrás Sánchez thanks, I'm glad you found the video useful. You can use the exact same model to detect multiple digits. All you need to do is use a training set containing images with overlapping digits, and each label should be a vector of size 10, with 1s for each digit that is present, and 0s elsewhere. Apart from that, all the code will be identical. Hope this helps!
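A minimal sketch of the labels described above; these multi-hot vectors can then be used directly as the T_k targets in the margin loss (the helper name here is just for illustration):

    import numpy as np

    def multi_hot(digit_lists, n_classes=10):
        # Turn e.g. [[3, 5], [0, 8]] into multi-hot target vectors for overlapping-digit images.
        T = np.zeros((len(digit_lists), n_classes), dtype=np.float32)
        for i, digits in enumerate(digit_lists):
            T[i, digits] = 1.0
        return T

    print(multi_hot([[3, 5], [0, 8]]))
    # [[0. 0. 0. 1. 0. 1. 0. 0. 0. 0.]
    #  [1. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]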
@omeryalcn5797
@omeryalcn5797 6 жыл бұрын
Maybe I was wrong, sorry about that. At first sight, I supposed this resembles feature-detection work, I mean the SIFT algorithm, descriptors, etc. Early feature detectors were bad when the image was transformed or rescaled; then the SIFT algorithm came and solved this problem. Hopefully you can understand what I mean.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi Ömer, thanks for your comment, it's really interesting, I never realized it but I agree with you that there are resemblances between SIFT and CapsNets: both take low level features and look for agreement between these low-level features regarding the presence and pose of higher level objects, using a clustering algorithm to identify agreement. One important difference is that a CapsNet *learns* the low-level features using convolutional layers trained with backpropagation, whereas SIFT extracts them from a previously built database of low-level features. Moreover, I don't think SIFT supports multiple levels, I think it goes directly from the low-level features to objects (although I'm not entirely sure, you may know better). SIFT also only looks at the location, scale and orientation of each low-level feature, whereas a CapsNet can learn other kinds of "pose" parameters, such as the thickness, skew, etc. If you look at the latest CapsNet paper (the link is in the video description), it also organizes the Capsules much like in a Convolutional Layer, which makes it exploit the local and hierarchical structures present in most images. Anyway, excellent point, thanks!
@omeryalcn5797
@omeryalcn5797 6 жыл бұрын
Thank you for your answer. I agree with what you said about the differences between SIFT and CapsNets; I meant only the keypoint descriptor part, which I think is robust to transformations. I was really happy, as someone who has read your machine learning book and learned a lot from it. I would love to share experiences with you if you have the time.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Sure, you can connect on LinkedIn.
@tbelkas
@tbelkas 5 жыл бұрын
Hello, Mr. Géron, this is a great video, it really did help me understand a lot of the math behind capsule neural networks. I noticed that you recently answered a question here and I was wondering whether you could help me out a bit. The problem is that when I run the provided code, the accuracy is horrible (in the 9-10% range, loss around 1.5). Even after 5-6 epochs it doesn't get better. I would be really grateful if you could explain why this happens.
@asishkumarmishra2597
@asishkumarmishra2597 2 жыл бұрын
Well, I am a noob here, but I think capsule networks aren't that fast at learning! Most of the implementations I saw use epochs on the order of 100, like 100 epochs, 500 epochs, et cetera. So I think running 5-6 epochs would be too few for the network to learn. Hope it helps!!
@seanjhardy
@seanjhardy 5 жыл бұрын
Hello, this might seem like an extremely naive question as I am an extreme beginner in this topic, hoping to implement a similar algorithm in Java without copying any existing library so that I can test some different ideas. I would really love an explanation of how the initial feature maps from the primary capsules contain information about rotation. Surely you would need to apply the same capsule over the area multiple times with varying rotation in order to determine the angle of an object, but obviously this is just inefficient. When you described capsule networks in your capsule networks tutorial video, you implied that the rotation of the object is encoded in one of the vector's dimensions, which are used to predict what the object is, but I don't see how the network could use the same convolution to detect different rotations of an object. Sorry if this question is a little basic or I am not understanding correctly; an answer would be greatly appreciated!
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Hi Sean, That's a great question! Indeed, a convolutional layer will not learn a rotation unless you show it some examples of rotated images. In practice, many filters will be dedicated to detecting different rotations of the same shape. However, the capsule layers can take these multiple outputs and (perhaps) encode them into a single dimension representing the rotation angle. It seems that Capsnets are better than many other architectures at learning with few examples.
@seanjhardy
@seanjhardy 5 жыл бұрын
@@AurelienGeron thank you! I find it incredibly interesting that this architecture can encapsulate such high-level information and integrate multiple streams into a concise representation of reality. Thank you for taking the time to respond!
@juliogodel
@juliogodel 6 жыл бұрын
Great.. Thanks!
@pedrohenriquefariateixeira5861
@pedrohenriquefariateixeira5861 4 жыл бұрын
Can I train another dataset based on this CapsNet implementation? I'm trying to implement a CapsNet for the BreakHis dataset. If you are reading this and can help me with some links, thanks.
@kislaykunal8921
@kislaykunal8921 6 жыл бұрын
Hi, we select 2 conv layers; what I don't understand is on what basis we choose these hyperparameters. I know it is mostly guessing and checking, but what would be your suggestion if I'm trying to train a deep capsule network on 1048*1048 images for object detection with YOLO labels at the end? Should I increase the number of conv layers or should I just increase the number of kernels? I am having this doubt because finer features mean better predictions by dynamic routing (if I understand capsules correctly), but at the cost of compute; I'm trying to find a sweet spot here.
@whyzzy2683
@whyzzy2683 4 жыл бұрын
Hello, I have a question. Normally, when we do inference, the input can be just a single image. But with a CapsNet, do we need to feed many images to the model, because of the convergence of the coefficients?
@edeneden97
@edeneden97 5 жыл бұрын
do the capsules share the prediction matrices weights with nearby capsules?
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Hi Eden, great question! No, in the original paper described in this video, nearby capsules do not share weights (except for the primary capsules, since they are mostly just regular convolutional layers). Hope this helps.
@edeneden97
@edeneden97 5 жыл бұрын
@@AurelienGeron thanks! I read the paper a long time ago and remembered some sort of weight sharing. Great video!
@AurelienGeron
@AurelienGeron 5 жыл бұрын
@@edeneden97 Thanks, I'm glad you enjoyed the video! :)
@midhulavijayan5976
@midhulavijayan5976 6 жыл бұрын
Great explanation. Thank you sir. I have one doubt. Is it possible to apply capsule net for large scale inputs ?.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
It is possible, but so far CapsNets are slower than Convolutional Neural Nets, and they also don't achieve the same accuracy on large images (e.g., ImageNet). But there's rapid progress, so this may change in the coming months. Check out the authors' latest paper (link in the video description). Hope this helps!
@sichenggu5806
@sichenggu5806 5 жыл бұрын
Hi! I have used your notebook to repeat this CapsNet and the val_acc is only about 10%, that's weird. I wonder why this happened...?
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
Awesome video! The notebook is very helpful. Although I have a small question: In the squash function, doesn't the unit vector itself ensure that the values are in the range of 0 - 1, while preserving the direction of the vector. Could you please explain what is the need of the squash factor?
@mouduge
@mouduge 6 жыл бұрын
Animesh Karnewar The unit vector has a length of exactly 1. We want to make sure vectors longer than 1 get squashed down to the 0-1 range, but we don't want to grow small vectors up to 1.
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
Aurélien Geron Oh. Now I got it. So, we are converting all the vectors proportionately in the Range [0 - 1) while preserving the directions. Yes, it makes sense. TYSM for the explanation. Cheers!
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
Yeah. Because of the squashing factor, very long vectors (length >> 1) will end up being close to 1, while vectors that have a length less than 1 will get squashed down towards zero length vectors thereby nullifying their effect (as if the vectors weren't there). This would suppress all the non significant vectors and keep the contribution from the important ones and since this ensures that range stays within 0-1, this function probably has the effect of ReLU followed by Batch Normalization. Just sharing some of my thoughts after understanding this concept.
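For readers following this sub-thread, a minimal NumPy sketch of the squash function being discussed (the epsilon term is just a numerical-stability guard, similar to what the notebook does):

    import numpy as np

    def squash(s, axis=-1, epsilon=1e-7):
        # Shrink vectors so their length falls in [0, 1) while preserving direction.
        squared_norm = np.sum(np.square(s), axis=axis, keepdims=True)
        safe_norm = np.sqrt(squared_norm + epsilon)            # avoids a division by zero for tiny vectors
        squash_factor = squared_norm / (1. + squared_norm)     # ~0 for short vectors, approaches 1 for long ones
        return squash_factor * s / safe_norm

    for v in (np.array([0.1, 0.0]), np.array([10.0, 0.0])):
        print(v, '->', squash(v))   # short vectors shrink toward 0, long ones approach length 1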
@sagnikbhattacharya1202
@sagnikbhattacharya1202 6 жыл бұрын
I love your videos! Will you also make videos in French? That would be very useful for me.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Sagnik Bhattacharya, thanks! I'm thinking about it; I will probably just translate the English videos into French and post them on another channel.
@bryan3792
@bryan3792 6 жыл бұрын
I knew it was gonna be good before I saw the video
@animeshkarnewar3
@animeshkarnewar3 6 жыл бұрын
I was just wondering; the process of routing primary_caps layer's outputs to the digit_caps layer is iterative. The iterations are required to change the log_prior weights (which are always 0s initially ensuring equal weight.) in accordance to the output of the digit_caps layer (after squashing). So, basically, the purpose of this routing is to generate the routing_coefficients that would maximize the scalar product of the digit_caps output vectors and the predicted_vectors by the primary_caps layer. Can't we just use another parallel fully connected layer which would predict the routing_coefficients from the predicted_outputs such that, this parallel fully connected layer would be optimized by constructing appropriate objective function (in our case: maximize the scalar product)? This thought is mostly inspired by the concept of Synthetic Gradients. It would be a great help if you could shed some of your insights over this thought.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Animesh Karnewar interesting idea, you could definitely try! This is a new domain; it's likely that there are small tweaks to be found that will strongly affect the performance of the network, or its training time.
@RAJATTHEPAGAL
@RAJATTHEPAGAL 6 жыл бұрын
Awesome :-D ...
@richardmaddison8196
@richardmaddison8196 5 жыл бұрын
Hi Aurélien, I loved this video and used it extensively to build an MNIST Capsule Net in Microsoft Excel. I finished this a couple of weeks ago and wrote it up here www.RichardMaddison.com I also just bought your book "Hands-On Machine Learning with Scikit-Learn and Tensor Flow" and look forward to reading through this immensely. Thank you so much for your hard work and the inspiration you have given. Richard
@usmabhatt1768
@usmabhatt1768 6 жыл бұрын
Great! I have two questions: Do I need the reconstruction code if I have to do binary classification? Does it work the same way if I use a 3D convolution layer?
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Thanks Usma! The reconstruction part of the CapsNet (i.e., the decoder) acts as a regularizer: it encourages the CapsNet to learn a high level representation of the whole image, rather than base its classification decisions on just a few pixel values. Thanks to the decoder, the CapsNet is less likely to overfit the training data, and it will generalize better to new instances. And yes, you can build a CapsNet in just the same way using 3D Conv Layers, or even 1D Conv Layers: it's a pretty general concept. However, make sure to check out the latest paper by the same authors: "Matrix Capsules using EM Routing". It introduces a lot of significant changes to the CapsNet architecture (while keeping the same core principles, of course). Hope this helps.
@usmabhatt1768
@usmabhatt1768 6 жыл бұрын
Aurélien Géron Thank you sir!
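A minimal sketch of the decoder's role in the loss, as described above: the reconstruction error is added to the margin loss but scaled down (0.0005 in the paper) so it acts only as a gentle regularizer:

    import numpy as np

    alpha = 0.0005   # scale-down factor from the paper, so the regularizer doesn't dominate the margin loss

    def total_loss(margin_loss, images, reconstructions):
        # images, reconstructions: [batch, 784] flattened MNIST images in [0, 1].
        reconstruction_loss = np.sum(np.square(images - reconstructions), axis=1).mean()
        return margin_loss + alpha * reconstruction_loss

    # Toy usage with random "images" and slightly perturbed "reconstructions":
    imgs = np.random.rand(2, 784)
    recons = np.clip(imgs + 0.01 * np.random.randn(2, 784), 0., 1.)
    print(total_loss(0.1, imgs, recons))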
@akashjadhav5093
@akashjadhav5093 6 жыл бұрын
Thanks for such a wonderful explanation!! One question though: how can I use this architecture on datasets other than CIFAR and MNIST, and what precautions are necessary?
@asyeerajat2951
@asyeerajat2951 4 жыл бұрын
Hello. I'm new to Python and deep learning, and I have the same problem here. How do I use my own image dataset? Did you already find a solution? It would be wonderful if you could share it. Thank you.
@habibakhaled9771
@habibakhaled9771 2 жыл бұрын
@@asyeerajat2951 Hello, I am currently facing the same problem. Have you reached a solution? Thank you in advance.
@wolfisraging
@wolfisraging 6 жыл бұрын
Plz make more videos
@marat61
@marat61 6 жыл бұрын
Where did you use a special training algorithm? How does Adam, a gradient-based method, train the capsule net?
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Many things are special in this model, but training is just as usual, using backpropagation. Let me make an analogy: an RNN implements a loop over the inputs, in pseudo-code:

    state = 0
    for input in inputs:
        state = compute_new_state(input, state)
    return compute_output(state)

Yet, backpropagation works fine, because this loop is actually "unrolled" internally (either statically before even seeing the data, which requires knowing in advance the length of the inputs, or dynamically when handling one training batch):

    state = 0
    state = compute_new_state(inputs[0], state)
    state = compute_new_state(inputs[1], state)
    state = compute_new_state(inputs[2], state)
    state = compute_new_state(inputs[3], state)
    return compute_output(state)

So the fact that the model contains a loop is not a problem for backpropagation. Back to capsule networks: the routing by agreement algorithm includes a loop, but that's not a problem, it can just be unrolled. Hope this helps!
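To make the unrolling concrete, here is a minimal NumPy sketch of the routing-by-agreement loop with an assumed 3 iterations; in TensorFlow the same loop can be written out explicitly (or wrapped in tf.while_loop) and gradients simply flow through the unrolled operations:

    import numpy as np

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def squash(s, axis=-1, epsilon=1e-7):
        sq = np.sum(np.square(s), axis=axis, keepdims=True)
        return (sq / (1. + sq)) * s / np.sqrt(sq + epsilon)

    def route(u_hat, n_iters=3):
        # u_hat: [batch, n_caps1, n_caps2, caps2_dims], the predicted digit-capsule outputs.
        b = np.zeros(u_hat.shape[:3] + (1,))                  # routing logits, start at 0 (uniform routing)
        for _ in range(n_iters):                              # the loop that simply gets unrolled
            c = softmax(b, axis=2)                            # routing weights over the digit capsules
            s = (c * u_hat).sum(axis=1, keepdims=True)        # weighted sum of the predictions
            v = squash(s)                                     # candidate digit-capsule outputs
            b = b + (u_hat * v).sum(axis=-1, keepdims=True)   # agreement (dot product) updates the logits
        return v[:, 0]                                        # [batch, n_caps2, caps2_dims]

    u_hat = np.random.randn(2, 1152, 10, 16)
    print(route(u_hat).shape)   # (2, 10, 16)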
@AmanKhandelia
@AmanKhandelia 6 жыл бұрын
Hi there, I have a question regarding the weight matrix (before I continue, I wish to state that I want to implement this all by myself and learn in the process of doing so; I am basically squinting at your videos to get just enough to get the idea, hence my question could be plainly incorrect, and if so, do correct me). It is mentioned on page 4, first para, that "each capsule in the [6 x 6] grid is sharing their weights with each other". Based on this quote, shouldn't we only have 32 sets of W_i_j, where each W_i_j is replicated 36 (6*6) times across the grid? P.S.: I am a novice at this, so I may be just outright wrong about it; if so, please do correct me. Any help is deeply appreciated.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi Aman, the primary capsules are basically a couple Convolutional Layers whose outputs are reshaped and squashed. All neurons in a given feature map of a convolutional layer share their weights. So if you use a standard convolutional layer, the weight sharing they talk about in the paper (i.e., weights between the inputs and the primary capsules) will be taken care of automatically, no need to worry about it. The weights you are refering to (W_ij) are not the same: these are the weights between the primary capsules and the digit capsules. Those weights are not shared. At least not in this implementation (but check out the new paper, "Matrix capsules with EM routing", the link is in the video description, they do share these weights in this case). Hope this helps!
@AmanKhandelia
@AmanKhandelia 6 жыл бұрын
Got it, thanks for the info, I stand corrected. One last thing before I sign off: in the squashing operation there are two parts. One is where we normalise our vector, i.e., s_j/||s_j||, which is basically a unit vector with the same orientation. As for the other part, I am unable to arrive at an interpretation; after doing a bit of math, I turned it into 1 - (1 / (1 + ||s||^2)). How can I interpret this formulation? What does the magnitude of the input vector s_j indicate, and is there a nice meaning to this equation which I am missing?
@AurelienGeron
@AurelienGeron 6 жыл бұрын
You are right, s_j / ||s_j|| is a unit vector with the same orientation. The other part does not have a nice interpretation other than "large vectors will end up with a length close to 1, and small vectors will end up with a length close to 0". This was one of the reasons why in their latest paper they decided to encode the probability of presence differently (as a separate output). :)
@ritap_CV
@ritap_CV 5 жыл бұрын
Can you do the same for pytorch????
@aryanmobiny7340
@aryanmobiny7340 6 жыл бұрын
Thanks for the nice code and explanation. I had a question regarding the CapsNet: would it be possible to add layers (like a conv-caps layer after the primary caps layer, or a fully connected caps layer with 20 capsules before the digit caps layer)? I tried it myself and am getting terrible results! I can't understand why this is happening. Do you have any idea?
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Thanks Aryan. I haven't tried adding extra layers yet, but I'm a bit surprised that you're getting terrible results. Perhaps it's just that you end up with so many parameters that the amount of training data becomes insufficient. In their latest paper ("Matrix capsules with EM routing") the authors introduce some improvements that dramatically reduce the number of parameters per layer, so I'm guessing it will help. As soon as I have time, I'll try the new architecture.
@aryanmobiny7340
@aryanmobiny7340 6 жыл бұрын
Thanks for your reply. I can't wait to see your videos on EM routing ;)
@MultiRoommate
@MultiRoommate 6 жыл бұрын
Please help me understand this: the weights in dynamic routing by agreement are adjusted based on the vector results from the digit capsules, but how would the filters (256) applied in the convolution layer change their values? Is there any logic behind it? Many thanks!
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi! The weights of the convolutional layers are trained using regular backpropagation (and so are the weight matrices W_ij in the capsules). Basically, at each training iteration, a batch of images is fed to the input layer, it goes through the convolutional layers, and up through the capsule layers (including through the routing by agreement algorithm), and finally we get the outputs, and the loss is computed (based on both the digit capsule's outputs and on the reconstruction loss). Then backpropagation starts: the gradient of the loss is backpropagated through the whole network and all the model parameters get updated (this includes the W_ij matrices and the convolutional layer parameters, but it does *not* include the routing by agreement weights, which are just computed on the fly when needed, but they are not "stateful" model parameters). At inference time, the first forward phase happens in exactly the same way (including the routing by agreement iterations), but once we have the outputs, we're done (we don't compute the loss and we don't do any backpropagation). I hope this helps.
@MultiRoommate
@MultiRoommate 6 жыл бұрын
Got it. Many thanks!
@JonasDonald
@JonasDonald 6 жыл бұрын
Do you have any results on how the network performed on images in which there are multiple different numbers?
@AurelienGeron
@AurelienGeron 5 жыл бұрын
Oops, just saw your comment, sorry. The paper contains many results on the MultiMNIST dataset with two overlapping digits. The baseline models (not CapsNets) got 8% error, while the CapsNet got 5.2% error. Hope this helps.
@rohitdulam8762
@rohitdulam8762 6 жыл бұрын
Nice video. I have a small doubt. In the training loop, there is a statement y = y_batch. But, y is of shape [None] and y_batch of shape [batch_size, 10]. Is that kind of an assignment statement valid? I've tried it and got an error. If I've missed something, please correct me.
@rohitdulam8762
@rohitdulam8762 6 жыл бұрын
Sorry, it was my mistake. While reading the dataset, I set one_hot to true.
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Thanks Rohit. In the Jupyter notebook, `y_batch` is obtained by `X_batch, y_batch = mnist.train.next_batch()` which returns a one-dimensional array for the targets, so everything's fine. Perhaps you loaded MNIST with `one_hot=True`, or using Keras? My code definitely assumes that `y` is one-dimensional. Make sure you load MNIST just like in the notebook. Hope this helps.
@rohitdulam8762
@rohitdulam8762 6 жыл бұрын
Yes, it was my mistake, I put one_hot to true while loading the dataset. Thank you for taking time and replying, keep up the good work. 😀
@kislaykunal8921
@kislaykunal8921 6 жыл бұрын
I don't get what he did at 9:46. Can anyone explain?
@kislaykunal8921
@kislaykunal8921 6 жыл бұрын
I don't get the additional dimension thing for the batch size
@kislaykunal8921
@kislaykunal8921 6 жыл бұрын
Is it because he is trying to parallelize the operations?
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Hi Kislay, when you call tf.matmul(A,B), the shapes of A and B must be compatible. For example, if A's shape is [5,6,8,3,2], it is best to think of it as a 5x6x8 array of matrices (specifically, a 5x6x8 array where each cell is a 3x2 matrix). In this case, B should also be a 5x6x8 array of matrices (e.g., 2x9 matrices), and tf.matmul(A,B) will multiply each matrix in A with the corresponding matrix in B. As you point out, this will all happen in parallel on the GPU so it will be much faster than computing all these multiplications one by one. More generally, if A's shape is [a,b,...,c,d,E,F] then B's shape must be [a,b,...,c,d,F,G] and the result's shape will be [a,b,...,c,d,E,G]. This is because the product of an ExF matrix and an FxG matrix is an ExG matrix. So what this part of the code is doing is just making sure the arrays have compatible shapes, adding dimensions where needed (by copying the data using tf.tile()), so we can then efficiently compute many matrix multiplications in one shot. Note that even though this is much more efficient than doing all the matrix multiplications sequentially, Thibault Neveu kindly suggested (see his comment on this video) an even faster way to do this using tf.einsum(), without having to copy data with tf.tile(). Hope this helps!
@kislaykunal8921
@kislaykunal8921 6 жыл бұрын
Thanks a lot for the explanation, but what I am unable to understand is the part where you make one copy per instance of the batch. Also, what do you mean by a batch instance, is it the mini-batches? Because if so, how can you forward-propagate the whole thing at once? Again, thanks!
@kislaykunal8921
@kislaykunal8921 6 жыл бұрын
Sorry to bother you, but by a single batch instance do you mean an image in the mini-batch?
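In case it helps, a tiny sketch of the "one copy of W per instance" idea with made-up shapes; NumPy's matmul would actually broadcast W on its own, but the explicit copy mirrors what tf.tile does in the notebook so the leading dimensions line up:

    import numpy as np

    batch_size = 3                                    # an "instance" is one image in the mini-batch
    W = np.random.randn(5, 2)                         # a single weight matrix, shared by all instances
    x = np.random.randn(batch_size, 2, 1)             # one column vector per instance

    # matmul multiplies the last two dimensions and treats the leading ones as an array of matrices,
    # so we make one copy of W per instance (tf.tile in the video; plain broadcasting here):
    W_tiled = np.broadcast_to(W, (batch_size, 5, 2))  # [3, 5, 2]
    y = W_tiled @ x                                   # [3, 5, 1]: all 3 products computed in one shot
    print(y.shape)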
@wolfisraging
@wolfisraging 6 жыл бұрын
Why don't u create more videos???????????????
@AurelienGeron
@AurelienGeron 6 жыл бұрын
Rishik Mourya I would love to, I'm just not finding the time right now. But I'll do my best!
@wolfisraging
@wolfisraging 6 жыл бұрын
Aurélien Géron, hope to see u soon.
@StanleySalvatierra
@StanleySalvatierra Ай бұрын
I’m from the future, Transformers won the war.
@julioargumedo6722
@julioargumedo6722 6 жыл бұрын
Awesome video, thank you very much!
@briancompton4705
@briancompton4705 6 жыл бұрын
Thank you!
@briancompton4705
@briancompton4705 6 жыл бұрын
I'm buying myself two books for Christmas! Yay! Lol