For details and code on building a translator using a transformer neural network, check out my playlist "Transformers from scratch": kzbin.info/www/bejne/h3Stgnpqedp7ipI
@from-chimp-to-champ1 2 years ago
This is the first time I have seen someone "unroll" the LSTM network and actually demonstrate it. None of the professors I have seen could do this; they only showed the picture of cells that anybody could find on the internet. Thank you very much, good job!
@AzharKhan-to2ll 2 years ago
I learnt about LSTMs from so many sources; but no one explained it this well. This is some amazing content you are creating. It should be preserved.
@CodeEmporium 2 years ago
Thank you:)
@kennethatz39 5 years ago
Programmed my first LSTM with that video. Really good introduction to this topic. Right amount of math, architecture, background (GRU, RNN etc) and coding.
@simoneparvizi775 2 years ago
WTF did I just watch... man, in a 15 min video you explained so many topics, and so smoothly... I'm new to AI, but what you did was impressive. You explained the meaning of everything while simplifying math concepts BUT STILL putting them in... thank you for your work, I really appreciate it
@CodeEmporium 2 years ago
I'm super glad you appreciate this style. I'm trying to make more videos like this as of late too. :)
@sepehr_fard 5 years ago
I’m not done watching it, but I had to leave a comment first. I think you have really cracked the way to make people understand. Unfortunately, not a single professor has ever taught the way you have. I have always wanted someone whose teaching starts from the ground up and explains everything before diving into the math. You even explained what os and random are, which I have never seen before in CS 😂. So thank you, I really enjoyed your video. It is great that for once someone understood that a student watching this might not have a PhD in mathematics, so explaining what each variable means and what the big picture is might save them hours of being lost and confused. Keep up the great work man, I really appreciate this channel. It’s a hidden gem!!!!
@LifeKiT-i 1 year ago
I study MSc Computer Science at HKU, but your teaching is much better than my professor's. OMG
@AlistairWalsh 4 years ago
Really like your conversational explanations. Great detail presented in a palatable manner.
@hoangphamviet1241 2 years ago
Great video!!! Everything I can see and understand from the video makes compelling sense to me. Thank you so much!!
@CodeEmporium 2 years ago
You are very welcome (sorry I am so late)
@anibhatia 4 years ago
Amazing video.. I love how you have explained a number of different concepts and explained each one with due integrity
@loopuleasa 6 years ago
Pretty in-depth view on this. I like your pacing better than Siraj's, also the simplicity.
@CodeEmporium 6 years ago
Thanks a lot! I'm going for a "here is why we do things the way we do" approach. Glad that you (and many others) find it interesting.
@beingnothing34 4 years ago
This man is a different beast! Way better and hence shouldn't be compared to Siraj! :) Great video.
@farenhite4329 4 years ago
Arun Kumar the scandal showed why Siraj was so much worse at explaining than this guy.
@ifmondayhadaface9490 4 years ago
Farenhite oof yeah
@SwapnilGusani 4 years ago
@@beingnothing34 Dude Siraj was a fraud.
@ChaminduWeerasinghe 3 years ago
Your explanation is amazing. Love the way you joke; it makes the video more interesting❤️
@happyduck70 2 years ago
Had a mighty laugh at the Sepp Hochreiter joke, thanks!
@NeilWiddowson 3 years ago
This makes so much more sense than my lecture...
@sooryaprakash6390 2 years ago
Mind-blowing video! Thanks for making it.
@CodeEmporium 2 years ago
Anytime :)
@h4ck314 6 years ago
I like the quality of your content, I'll definitely watch your other videos !
@CodeEmporium 6 years ago
Thanks sooo much! Enjoy your stay ;)
@bopon4090 4 years ago
Thank you sooo much for linking references in the description.
@TheLOL9842 3 years ago
Gosh can't wait for that video on GRU that's coming pretty soon! Besides the joke, Thanks for the video!
@CodeEmporium 3 years ago
Thanks for watching!
@9899895384 4 years ago
wow, your explanation is so simplistic!
@jeroenritmeester73 4 years ago
Thank you SO MUCH for giving some examples of each architecture. I'm following multiple ML courses at uni, but everything is abstracted away behind mathematical jargon, and never gets back to basics.
@mohammadyahya78 1 year ago
Thank you very much. Why does the gradient explode as a function of t/d at 7:19, please?
@dompatrick8114 2 years ago
6:37 Lmao the comedic timing, I died.
@185283 4 years ago
Hello, I have a question: at 8:50 you mentioned x(0)... x(n) as inputs. If you had a sentence "Hello World", would a vector for "Hello" be x(0) and "World" be x(1)? If so, x(0) and x(1) will require 2 LSTM cells, so will one line of "model.add(LSTM)" have two LSTM cells to process "Hello World"? How can we visualize more than one LSTM layer then?
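A rough sketch related to the question above, not the video's code: in the usual unrolled picture, each box is the same cell applied at a different time step, so a single LSTM layer would process x(0) and x(1) with one shared set of weights. The scalar cell, the names in `params`, and the toy inputs below are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (input size 1, hidden size 1)."""
    def gate(name, squash):
        w, u, b = params[name]
        return squash(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate value for the cell state
    c = f * c_prev + i * g             # new cell state (long-term memory)
    h = o * math.tanh(c)               # new hidden state (filtered cell state)
    return h, c


def run_sequence(xs, params):
    """Unroll the SAME cell over every element of the sequence."""
    h, c = 0.0, 0.0
    hidden_states = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)  # identical params at every step
        hidden_states.append(h)
    return hidden_states


# Hypothetical (w, u, b) values per gate, chosen only for the demo.
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

# Two numbers standing in for the vectors of "Hello" and "World".
states = run_sequence([1.0, -1.0], params)
print(states)
```

Under this reading, a second LSTM layer would be another cell with its own parameters that takes `hidden_states` as its input sequence.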
@thalesogoncalves1 4 years ago
Excellent video, dude! It's awesome when someone embraces both theoretical *and* practical parts. Thanks a lot
@CodeEmporium 4 years ago
Thanks for the compliments!
@kamalmanchu3060 3 years ago
This is phenomenal....great explanation dude..... ❤️
@medhnhadush4320 2 years ago
awesome explanation. thank you
@auslei 4 years ago
nice and concise.. good work buddy
@internationalenglish7413 5 years ago
You are very good. Someday, you will be a great professor.
@rahimdehkharghani 4 years ago
I really liked this clear explanation.
@tylersnard 5 months ago
Thank you for this. What are U, V, and W at 8:44?
@captiandaasAI 1 year ago
Great lecture!!! Damn good explanation.
@pablovillarroel3109 5 years ago
Such a great video, you explain everything so clearly and at a good pace, liked and subscribed!
@DiaboloMootopia 3 years ago
Great video. Is it possible that the graphic at 9:45 is mislabeled? h_t is coming out at the top right where I thought o_t should be emerging.
@friedrichwilhelmhufnagel3577 1 year ago
Hello! Your Link to your coursera videos is seemingly broken/expired. Can I find videos from you on coursera and can you recommend more learning material like courses and books to me? Thank you! Great videos.
@akompsupport 1 year ago
Good overview! Still relevant. LSTMs have come a long way and were important for the development of the LLMs that are showing SOTA performance on NLP as of this date, no?
@kamalchapagain8965 5 years ago
Thanks ! Simply the best.
@adampaslawski8859 3 years ago
This is a great video , thanks for making it
@osci5124 2 years ago
Great video, you are really good at explaining logically
@ИванНикитин-ч7б 3 years ago
I can't understand some points. Say I have a set of temperature values or daily closing prices, just one linear sequence, and I need to forecast the values for the 3 future days from the values of the 10 previous days. The first question is which values I need to put into the first LSTM cell, which values into the second cell, and so on. The second question is how many LSTM cells I need for this calculation: does the LSTM cell count depend on the number of previous days or the number of future days?
@mentalmodels5 5 years ago
I'm confused about the part where he says "Gradient will now explode/vanish as a function of tau/d" 7:06 Can someone explain this to me?
@dfnoshamps 5 years ago
If I understood correctly, since the weights propagate through jumps over d units instead of directly to the next one, the explosion problem should happen at a "smoother rate".
@mentalmodels5 5 years ago
@@dfnoshamps Thanks for the reply, what I don't get is why it should happen at a smoother rate if you just add a skip connection?
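A minimal sketch of one common way to read the "function of tau/d" claim, under simplifying assumptions that are mine and not the video's: a scalar linear recurrence h_t = w * h_(t-1), where the gradient across tau steps scales like |w|^tau. If a skip connection jumps d steps at a time (assumed here to carry the same weight w), the shortest path across tau steps has only about tau/d hops, so the gradient along that path scales like |w|^(tau/d). It still vanishes or explodes, only much more slowly.

```python
def gradient_scale(w, hops):
    """Size of the gradient along a path with `hops` multiplications by w."""
    return abs(w) ** hops


tau, d = 100, 10  # hypothetical sequence distance and skip length

for w in (0.9, 1.1):
    plain = gradient_scale(w, tau)         # path through every time step
    skipped = gradient_scale(w, tau // d)  # path that jumps d steps per hop
    print(f"w={w}: plain={plain:.3e}, with skips of d={d}: {skipped:.3e}")
```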
@nezbut7 4 years ago
this was very helpful! thank you
@jayasreechaganti9382 2 years ago
Sir, can you do a video of an RNN example with numerical values?
@cooky123 2 years ago
Good video, thank you.
@elisimic4371 4 years ago
High Quality Content!
@akashkewar 4 years ago
keep doing the good stuff man.
@CodeEmporium 4 years ago
Thanks dude. I'm always about that good stuff.
@piotrgrzegorzek8039 5 years ago
Hi! just a question, does lstm predict on sequences of FEATURES in ONE SAMPLE or sequences of SAMPLES (outputs) in ONE BATCH? For eg. I need to predict next number as many to one. I fit first sample as x1=1, x2=2 and output y=3, next sample x1=4, x2=5 y=6. NOW Does the model look on sequence of features (x1,x2) or sequence of samples (y, which are output of the model)
@ethiomusic3217 2 years ago
How do I use the CTC loss function for training on variable-length sequences? Can you help me?
@threeMetreJim 5 years ago
Very good and informative video, shame about how many adverts though.
@justingoh3750 2 years ago
Great video! My only complaint is that I cannot find your video explaining GRUs like you said you would =p
@CodeEmporium 2 years ago
Yea I did not do that and got caught up with some other videos later on :) My bad
@1UniverseGames 3 years ago
Do you have any videos about using an RNN model for cyber threat attacks, or any source for studying it?
@onomatopeia891 1 year ago
Can you explain further what the hidden size argument is for in the LSTM? Many say it is the dimensionality of the output, but I don't get it. The sample explanations of LSTM I saw only have 1 dimension, so what does it mean when the hidden size (or number of units, as some refer to it) is more than 1?
@beshosamir8978 2 years ago
Hi, I need some help here: why do we make the next hidden state equal to the long-term memory after filtering it? Why isn't the next hidden state simply equal to the long-term memory (Ct)?
@danielpiskorski9447 4 years ago
Great video! Thank you
@xinyuma5358 4 years ago
Hi, why do we use tanh in RNNs, considering it is a bad activation function? Can we use ReLU?
@shaythuramelangkovan5800 3 years ago
Hi Siraj, could you explain why we use a dense layer ?
@coolvideos2829 3 years ago
How can we predict the market using math? I believe it's possible through Fourier series and a few other views. Please help 🆘 I just don't understand how to get the wave form of the market and then calculate a point in time to predict the price. Itself sounds simple but idk what to
@CodeEmporium 3 years ago
Hmm. The stock market is very hard to predict. It depends on factors that go beyond historical trends. It's a fun toy problem, but not super realistic to model. I have a video of me attempting to build a model for this too. It's one of my more recent videos
@bubblesgrappling736 4 years ago
Is a "cell" the same as a neuron? It seems to be the case. But at 8:45, when you say that each sequence element goes through its own cell, I am confused: is the cell really modelling the entire model?
@mikefda12 2 years ago
At 5:24 what is that e looking symbol called?
@CodeEmporium 2 years ago
The symbol is ∈, a stylized epsilon that means "belongs to". So x(i) belongs to a set of vectors of real numbers with D dimensions. Simply put, x(i) is a vector of real numbers with D dimensions.
@mikefda12 2 years ago
@@CodeEmporium thank you
@Небудьбараном-к1м 4 years ago
Isn't 128 too many for the hidden size? I'm building an LSTM network; my input shape is [300, 5], and using hidden_size=128 results in vanishing gradients. Also, what happens if I add more layers to the dense net that comes after the LSTM? Will this architecture be able to learn? Because an LSTM "requires" a relatively large learning rate, which is often too large for a typical FC network, I am guessing that this will cause some crazy instability as a whole. I hope you can help me with these annoying questions :). Thanks a lot for sharing your knowledge!
@CCCC-lu2st 4 years ago
The way he pronounced "Sepp Hocrieter" blew my brains 😅
@renosyaputra 3 years ago
its actually Hochreiter :)
@ObviouslyASMR 4 years ago
I'm new to AI so this might be a silly question but I thought the weights were randomly initialized, how is it possible it performed so well on the first epoch? I assumed the characters would be completely random but they make at least some semblance of words already, or is there already some learning done before the end of the first epoch? Btw thanks so much for the video! Way clearer than others I've watched
@siddheshbalshetwar3869 4 years ago
The prediction sentence is printed after the epoch...so yes it did learn 'something' in that epoch that's why it makes a little sense
@ObviouslyASMR 4 years ago
@@siddheshbalshetwar3869 Thanks man, I think when I wrote this comment I was under the impression that the printed sentences were from during the training and before backprop, but I realize now that first of all the backprop would've probably been done in batches, and second of all that like you said the sentences are printed after the final backprop in that epoch
@siddheshbalshetwar3869 4 years ago
@@ObviouslyASMR yeah any time man
@lynnlo 4 years ago
The weights are randomized, the goal of a Neural Network is to make a bad guess and turn it into a better one.
@harriethurricane8617 3 years ago
OMG definitely didn't expect to see my favorite ASMR channel here lol
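To put a number on the point made in this thread, that the weights have already been updated many times before the first sample sentence is printed, here is a small back-of-the-envelope sketch. The sample count and batch size are hypothetical, not taken from the video; the only assumption is the usual one of one weight update per mini-batch.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of mini-batch weight updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)


# Hypothetical figures: 200,000 training windows, mini-batches of 128.
n_updates = updates_per_epoch(200_000, 128)
print(f"{n_updates} weight updates happen before the epoch-1 sample is printed")
```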
@exoticme4760 4 years ago
Is it not better to use word embeddings rather than character vectors?
@TheShubham67 4 years ago
Really Awesome stuff
@loyodea5147 5 years ago
Thank you for the great video!
@hangchen 2 years ago
Best part 6:32
@jhonysilver5208 3 years ago
Good video!
@CodeEmporium 3 years ago
Much appreciated
@kostasgeorgiou2417 5 years ago
I love your videos, please make more!
@PopMusicFilms 4 years ago
Bro, you are fire. I was struggling in my deep learning course and this LSTM video really helped.
@yulinliu850 6 years ago
Excellent lecture! Many Thanks!
@CodeEmporium 6 years ago
Thanks for watching!
@ApriliyanusPratama 6 years ago
excellent explanation. can you show me where i can get full math derivation of backward pass of lstm?
@CodeEmporium 6 years ago
Thanks! A quick google search takes me here: arunmallya.github.io/writeups/nn/lstm/index.html#/ It seems good.
@wiebetje00 4 years ago
You cut the text into semi-redundant sequences of maxlen characters, but how does the model or performance change if you change the value of maxlen?
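For readers wondering what "semi-redundant sequences of maxlen characters" looks like in practice, here is a minimal sketch. The function name `make_windows` and the `step` parameter are my own labels for illustration; the video's script may differ in detail. Each input is a window of `maxlen` characters, the target is the character that follows it, and consecutive windows overlap whenever `step` is smaller than `maxlen`.

```python
def make_windows(text, maxlen, step):
    """Cut text into overlapping windows and their next-character targets."""
    inputs, targets = [], []
    for start in range(0, len(text) - maxlen, step):
        inputs.append(text[start:start + maxlen])  # what the model sees
        targets.append(text[start + maxlen])       # what it should predict
    return inputs, targets


inputs, targets = make_windows("hello world", maxlen=4, step=3)
for window, nxt in zip(inputs, targets):
    print(repr(window), "->", repr(nxt))
```

A larger `maxlen` gives the model more context per prediction at the cost of longer unrolled sequences and fewer windows from the same text; how that trades off in accuracy depends on the data and is not something this sketch measures.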
@samfelton5009 3 years ago
what's your source for the images throughout this video? I'd love to use them in my own work!
@LightFykki 6 years ago
Amazing video, thanks!
@CodeEmporium 6 years ago
Thanks for watching!
@MrMrjacky7 4 years ago
Hi! I have some sequences generated from some initial conditions, what model should I use to have a sequence generated from some initial condition based on the data I have? seq2seq models usually predict the following data of a series but don't generate sequence from initial conditions.
@shubhamdotdkhema 4 years ago
You should probably try open AI GPT-2....it will generate sentences for u given an initial context (or even a single word).
@dubey_ji 4 years ago
thank you so much !
@nathanaelsatrianugraha3381 2 years ago
Hello i'm new here, i want to ask, how do we know the value of Wi, Wf, Wo, Wc? Is it randomize? Thank you, BTW nice video
@rainfeedermusic 4 years ago
I liked the explanation but unfortunately could not understand why exploding gradients is more of a problem in RNN rather than a DNN. I mean the W that gets propagated from h(t-1) to h(t) can also be in such a way that when one W is >1 the next could be
@zd676 4 years ago
In DNN Ws can be different from layer to layer, so W in layer 1 is 0. In RNN, weights get shared, so if W>1 or W
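A toy illustration of the weight-sharing argument in this exchange, under strong simplifying assumptions of my own (scalar weights, no activation functions): in a feed-forward net each layer has its own weight, so large and small factors can cancel along the path, while an unrolled RNN multiplies by the same weight at every step, so nothing cancels.

```python
import math


def path_gain(weights):
    """Product of weight magnitudes along one backpropagation path."""
    return math.prod(abs(w) for w in weights)


layers = 20  # hypothetical depth / number of time steps

# Feed-forward: different weights per layer, here alternating 2.0 and 0.5.
feedforward_weights = [2.0, 0.5] * (layers // 2)

# Recurrent: the SAME weight reused at every time step.
recurrent_weights = [2.0] * layers

print("feed-forward gain:", path_gain(feedforward_weights))
print("recurrent gain:   ", path_gain(recurrent_weights))
```

With a shared weight below 1 in magnitude, the same product shrinks toward zero instead, which is the vanishing case.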
@Firewalker124 5 years ago
Got a specific question: I am currently trying to classify motion in a 3D animation. So basically I get a bunch of 3D vectors that I am trying to relate over time. More specifically, I want to check if the movement of the bones and joints is too fast. So my thought was to use an LSTM to check that. I would use the 3D vectors for each frame as input to an LSTM cell. Yet I am not quite sure how to relate each cell, each frame, to the next one. Any tips? :D
@soareverix 2 years ago
This is a really interesting problem I'm interested in as well, for VR purposes! Did you ever solve it?
@Firewalker124 2 years ago
@@soareverix Well, it was a topic for a possible master's thesis for myself. I thought a bit about it, but changed the topic due to some other hardware-related problems. However, I had an idea on how to enter all necessary information into the LSTM that could work. But I'm currently still working, so maybe I'll write back later with the idea. In my case it wasn't VR but motion capture of movements.
@joshualee3172 4 years ago
what are the dimensions of the weights?
@yangwang9688 4 years ago
Max length of the sentence is 40, but why set LSTM units to 128? What is the output size of LSTM?
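A short sketch touching on the two questions above (weight dimensions, and why 128 units with a sequence length of 40). It assumes the standard LSTM layout with four gates and biases and no peephole connections; the input dimension of 57 is a made-up vocabulary size, not a number from the video. The number of units sets the size of the hidden vector, while the sequence length only sets how many times the same cell is applied, so it does not appear in the parameter count.

```python
def lstm_param_count(input_dim, units):
    """Parameters of a standard LSTM layer: 4 gates, each with
    an input matrix (units x input_dim), a recurrent matrix
    (units x units) and a bias vector (units)."""
    per_gate = units * input_dim + units * units + units
    return 4 * per_gate


def lstm_output_shape(timesteps, units, return_sequences=False):
    """Shape of the output for ONE input sequence (batch axis omitted)."""
    return (timesteps, units) if return_sequences else (units,)


input_dim = 57  # hypothetical number of distinct characters (one-hot size)
print(lstm_param_count(input_dim, units=128))
print(lstm_output_shape(timesteps=40, units=128))
print(lstm_output_shape(timesteps=40, units=128, return_sequences=True))
```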
@robinmuller2402 2 years ago
yeah we have no question marks in german 3:32
@swathykrishna9618 4 years ago
Good explanation. Can u do one video on Xception model?Plz
@CodeEmporium 4 years ago
Thanks! I have already done an Xception explanation. Check out my video on "Depthwise Separable Convolution - explained"
@huangbinapple 4 years ago
Starts at 9:00
@rabinthapa1431 4 years ago
bro can u make a video on implementing Convolutions and LSTMs
@cliccme 5 years ago
Hi, I have one question regarding BiLSTM neural network. Should i ask here or on your Quora profile? Thanks
@CodeEmporium 5 years ago
Wherever you want :)
@karthiksrini7178 5 years ago
the presentation is at its best. What software are u using?
@CodeEmporium 5 years ago
Thanks for the compliments Karthik. I use Camtasia Studio for editing my videos
@gauravkumar6534 5 years ago
Hi, your video was nice, and I request you to make a video on LSTM for speech recognition please.
@charlieangkor8649 3 years ago
I don’t understand it. Suddenly a pic full of math symbols pops up, and it’s not labeled which are the inputs, outputs, neurons, connections, and weights.
@ethiomusic3217 2 years ago
good videos, but i have some questions please
@georgebarnett121 5 years ago
Don't BatchNorm and He initialization fix vanishing/exploding gradients? ResNet actually fixed model degradation, where deeper models perform worse than smaller models. Deeper networks should learn identity connections if a smaller model is optimal. The ResNet shortcut connection allows easy learning of mappings similar to identity mappings. How does this affect LSTMs? Why can't we just include BatchNorm to fix vanishing/exploding gradients?
@tharindawicky 5 years ago
thanks
@doubtunites168 5 years ago
what kind of sorcery is this?
@arshadhashmi7938 4 years ago
How can I get this code
@Below10IQ 5 years ago
Loading Weights generates different results to when it was trained.
@rakshithak.sgowda7155 4 years ago
Hi sir, can you please send me the code for this project if you have it: "Developing an Efficient Deep Learning-Based Trusted Model for Pervasive Computing Using an LSTM-Based Classification Model"
@loriando7698 5 years ago
You are doing a good job! But I do not really understand one thing: in this case, your chars values are unique characters, so why, after converting to text, is the output not unique characters but words from the alphabet instead?
@nomadlyyy111 3 years ago
The equations for GRUs are wrong; they should have Ht-1, not Ht.
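I cannot check which on-screen equation this comment refers to, so the following is only a sketch of the textbook GRU update for a scalar cell, with hypothetical parameter names. In this form both gates and the candidate are computed from the previous hidden state h_(t-1); the new state h_t appears only on the left-hand side of the final line.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h_prev, p):
    """One time step of a scalar GRU (input size 1, hidden size 1)."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The candidate state sees the previous state only through the reset gate.
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate. Some references swap the roles of
    # z and (1 - z); both conventions describe the same family of models.
    return (1.0 - z) * h_prev + z * h_tilde


# Hypothetical parameter values, chosen only for the demo.
params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.4, "ur": 0.2, "br": 0.0,
          "wh": 0.9, "uh": 0.5, "bh": 0.0}

h = 0.0
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h, params)
print(h)
```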
@jay-rathod-01 3 years ago
6:32
@anishjain8096 5 years ago
Brother, I don't understand many things. How do I do well and learn more advanced concepts?