LSTM Networks - EXPLAINED!

303,959 views

Recurrent neural nets are very versatile. However, they don’t work well for longer sequences. Why is this the case? You’ll understand that now. We also delve into one of the most common recurrent neural network architectures: the LSTM. Finally, we build a text generator in Keras to generate State of the Union speeches.
BLOG: / dataemporium
CODE FOR THIS VIDEO: github.com/ajh...
REFERENCES
[1] LSTM landmark paper (Sepp Hochreiter): www.bioinf.jku...
[2] Slides from the Deep Learning book for RNNs: www.deeplearni...
[3] Andrej Karpathy’s blog + code (you can probably understand more from this now!): karpathy.github...
[4] The Deep Learning book on sequence modeling: www.deeplearni...
[5] Colah’s blog on LSTMs: colah.github.io...
[6] Visualizing and Understanding RNNs: arxiv.org/pdf/...

Comments: 153

@CodeEmporium 1 year ago

For details and code on building a translator using a transformer neural network, check out my playlist "Transformers from scratch": kzbin.info/www/bejne/h3Stgnpqedp7ipI

@AzharKhan-to2ll 3 years ago

I learnt about LSTMs from so many sources; but no one explained it this well. This is some amazing content you are creating. It should be preserved.

@CodeEmporium 3 years ago

Thank you:)

@kennethatz39 5 years ago

Programmed my first LSTM with that video. Really good introduction to this topic. Right amount of math, architecture, background (GRU, RNN etc) and coding.

@from-chimp-to-champ1 2 years ago

This is the first time I have seen someone actually "unroll" the LSTM network and demonstrate it. No professor I have seen could do this. They only showed the picture of cells that anybody could find on the internet. Thank you very much, good job!

@anibhatia 4 years ago

Amazing video.. I love how you have explained a number of different concepts and explained each one with due integrity

@AlistairWalsh 4 years ago

Really like your conversational explanations. Great detail presented in a palatable manner.

@simoneparvizi775 3 years ago

WTF did I just watch... man, in a 15-minute video you explained so many topics, and so smoothly... I'm new to AI, but what you did was impressive. You explained the meaning of everything while simplifying the math concepts BUT STILL putting them in... thank you for your work, I really appreciate it.

@CodeEmporium 3 years ago

I'm super glad you appreciate this style. I'm trying to make more videos like this as of later too. :)

@hoangphamviet1241 2 years ago

Great video!!! Everything I can see and understand from the video make compelling sense for me. Thank you so much!!

@CodeEmporium 2 years ago

You are very welcome (sorry I am so late)

@loopuleasa 6 years ago

Pretty in-depth view on this. I like your pacing better than Siraj's, and also the simplicity.

@CodeEmporium 6 years ago

Thanks a lot! I'm going for a "here is why we do things the way we do" approach. Glad that you (and many others) find it interesting.

@beingnothing34 4 years ago

This man is a different beast! Way better and hence shouldn't be compared to Siraj! :) Great video.

@farenhite4329 4 years ago

Arun Kumar the scandal showed why Siraj was so much worse at explaining than this guy.

@ifmondayhadaface9490 4 years ago

Farenhite oof yeah

@SwapnilGusani 4 years ago

@@beingnothing34 Dude Siraj was a fraud.

@sepehr_fard 5 years ago

I’m not done watching it, but I had to leave a comment first. I think you have really cracked the way to make people understand. Unfortunately, not a single professor has ever taught the way you have. I have always wanted someone whose teaching starts from the ground up and explains everything before diving into the math. You even explained what os and random were; I've never seen that before in CS 😂. So thank you, I really enjoyed your video. It is great that for once someone understood that a student watching this might not have a PhD in mathematics, so explaining what each variable means and what the big picture is might save them hours of being lost and confused. Keep up the great work, man. I really appreciate this channel; it’s a hidden gem!!!!

@h4ck314 6 years ago

I like the quality of your content, I'll definitely watch your other videos !

@CodeEmporium 6 years ago

Thanks sooo much! Enjoy your stay ;)

@LifeKiT-i 1 year ago

i study MSc computer science at HKU, but your teaching is much better than my professor. OMG

@happyduck70 3 years ago

Had a mighty laugh on the Sepp Hochreiter joke, thanks!

@ChaminduWeerasinghe 3 years ago

Your explanation is amazing. I love the way you joke around; it makes the video more interesting ❤️

@bopon4090 4 years ago

Thank you sooo much for linking references in the description.

@sooryaprakash6390 2 years ago

Mind-blowing Video!. Thanks for making it.

@CodeEmporium 2 years ago

Anytime :)

@jeroenritmeester73 4 years ago

Thank you SO MUCH for giving some examples of each architecture. Im following multiple ML courses on uni, but everything is abstracted away behind mathematical jargon, and never gets back to basics.

@pablovillarroel3109 5 years ago

Such a great video, you explain everything so clearly and at a good pace, liked and subscribed!

@TheLOL9842 3 years ago

Gosh can't wait for that video on GRU that's coming pretty soon! Besides the joke, Thanks for the video!

@CodeEmporium 3 years ago

Thanks for watching!

@NeilWiddowson 4 years ago

This makes so much more sense than my lecture...

@thalesogoncalves1 4 years ago

Excellent video, dude! It's awesome when someone embraces both the theoretical *and* practical parts. Thanks a lot

@CodeEmporium 4 years ago

Thanks for the compliments!

@akompsupport 1 year ago

Good overview! Still relevant. LSTM's have come a long way, important for the dev of LLM that are showing SOTA performance on NLP as of this date no?

@9899895384 4 years ago

wow, your explanation is so simplistic!

@kamalmanchu3060 3 years ago

This is phenomenal....great explanation dude..... ❤️

@auslei 4 years ago

nice and concise.. good work buddy

@medhnhadush4320 2 years ago

awesome explanation. thank you

@captiandaasAI 1 year ago

Great!!!!!!!!!!!!!!!!!!!!!!! Lecture damm good explanation..

@rahimdehkharghani 4 years ago

I really liked this clear explanation.

@nezbut7 4 years ago

this was very helpful! thank you

@kamalchapagain8965 5 years ago

Thanks ! Simply the best.

@adampaslawski8859 3 years ago

This is a great video , thanks for making it

@dompatrick8114 3 years ago

6:37 Lmao the comedic timing, I died.

@internationalenglish7413 5 years ago

You are very good. Someday, you will be a great professor.

@osci5124 2 years ago

Great video, you are really good at explaining logically

@threeMetreJim 5 years ago

Very good and informative video, shame about how many adverts though.

@cooky123 2 years ago

Good video, thank you.

@danielpiskorski9447 4 years ago

Great video! Thank you

@akashkewar 4 years ago

keep doing the good stuff man.

@CodeEmporium 4 years ago

Thanks dude. I'm always about that good stuff.

@justingoh3750 2 years ago

Great video! My only complaint is that I cannot find your video explaining GRUs like you said you would =p

@CodeEmporium 2 years ago

Yea I did not do that and got caught up with some other videos later on :) My bad

@jayasreechaganti9382 2 years ago

Sir can you do a video of Rnn example by giving numerical values

@yulinliu850 6 years ago

Excellent lecture! Many Thanks!

@CodeEmporium 6 years ago

Thanks for watching!

@tylersnard 7 months ago

Thank you for this. What are U, V, and W at 8:44?

@exoticme4760 4 years ago

is it not better to use word embeddings rather than character vectors

@elisimic4371 4 years ago

High Quality Content!

@mohammadyahya78 1 year ago

Thank you very much. Why does the gradient explode as a function of τ/d at 7:19, please?

@TheShubham67 4 years ago

Really Awesome stuff

@beshosamir8978 2 years ago

Hi, I need some help here: why do we make the next hidden state equal to the long-term memory after filtering it? Why isn't the next hidden state simply equal to the long-term memory (Ct)?
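
Editor's note: for readers following this question, here is the commonly used formulation of the LSTM update. It is a reference sketch rather than a transcription of the video's slides, and the names $W_\ast$, $U_\ast$, $b_\ast$ for the per-gate weights and biases are assumed notation.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{C}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

Read this way, $C_t$ is an unbounded long-term memory, while $h_t$ is a bounded, gated view of it: the output gate $o_t$ decides which parts of the memory are exposed at step $t$. Setting $h_t = C_t$ would remove that filtering, so everything stored in memory would feed directly into the next step's gates and into the output.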

@loyodea5147 5 years ago

Thank you for the great video!

@shaythuramelangkovan5800 4 years ago

Hi Siraj, could you explain why we use a dense layer ?

@ObviouslyASMR 4 years ago

I'm new to AI so this might be a silly question but I thought the weights were randomly initialized, how is it possible it performed so well on the first epoch? I assumed the characters would be completely random but they make at least some semblance of words already, or is there already some learning done before the end of the first epoch? Btw thanks so much for the video! Way clearer than others I've watched

@siddheshbalshetwar3869 4 years ago

The prediction sentence is printed after the epoch...so yes it did learn 'something' in that epoch that's why it makes a little sense

@ObviouslyASMR 4 years ago

@@siddheshbalshetwar3869 Thanks man, I think when I wrote this comment I was under the impression that the printed sentences were from during the training and before backprop, but I realize now that first of all the backprop would've probably been done in batches, and second of all that like you said the sentences are printed after the final backprop in that epoch

@siddheshbalshetwar3869 4 years ago

@@ObviouslyASMR yeah any time man

@lynnlo 4 years ago

The weights are randomized, the goal of a Neural Network is to make a bad guess and turn it into a better one.

@harriethurricane8617 3 years ago

OMG definitely didn't expect to see my favorite ASMR channel here lol

@ИванНикитин-ч7б 3 years ago

I can't understand some points. Suppose I have a set of temperature values or daily closing prices, just one linear sequence, and I need to forecast the values for 3 future days from the values of the 10 previous days. The first question is: which values do I put into the first LSTM cell, which into the second cell, and so on? The second question is: how many LSTM cells do I need for this calculation? Does the number of LSTM cells depend on the number of previous days or on the number of future days?
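
Editor's note: one common way to frame this problem, sketched below, is to slice the series into training samples of 10 consecutive inputs and 3 following targets. Each sample is then fed one value per time step, so the LSTM is unrolled over 10 steps. The function name `make_windows` and the toy series are hypothetical, and this is only one possible setup, not the method used in the video.

```python
def make_windows(series, n_in=10, n_out=3):
    """Split one sequence into (input window, target window) pairs."""
    inputs, targets = [], []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        # 10 consecutive past values form one sample (one value per time step)
        inputs.append(series[start:start + n_in])
        # the 3 values that follow are what the model should predict
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


# Toy data: 20 "days" of made-up readings, 0.0 through 19.0.
temps = [float(day) for day in range(20)]
X, y = make_windows(temps)
```

Under this framing, the number of unrolled steps follows the input window (10), while the 3 forecast values would typically come from a dense layer with 3 outputs placed after the LSTM. The number of LSTM units (the hidden size) is a separate hyperparameter and does not have to match either number.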

@dubey_ji 4 years ago

thank you so much !

@friedrichwilhelmhufnagel3577 1 year ago

Hello! Your Link to your coursera videos is seemingly broken/expired. Can I find videos from you on coursera and can you recommend more learning material like courses and books to me? Thank you! Great videos.

@DiaboloMootopia 4 years ago

Great video. Is it possible that the graphic at 9:45 is mislabeled? h_t is coming out at the top right, where I thought o_t should be emerging.

@jhonysilver5208 4 years ago

Good video!

@CodeEmporium 4 years ago

Much appreciated

@kostasgeorgiou2417 5 years ago

I love your videos, please make more!

@PopMusicFilms 4 years ago

bro you are fire, I was struggling in my deep learning course and this LSTM video really helped

@rainfeedermusic 5 years ago

I liked the explanation but unfortunately could not understand why exploding gradients is more of a problem in RNN rather than a DNN. I mean the W that gets propagated from h(t-1) to h(t) can also be in such a way that when one W is >1 the next could be

@zd676 4 years ago

In DNN Ws can be different from layer to layer, so W in layer 1 is 0. In RNN, weights get shared, so if W>1 or W

@onomatopeia891 2 years ago

Can you explain further what the hidden size argument is for in the LSTM? Many say it is the dimensionality of the output but I don't get it. The sample explanations of LSTM I saw only has 1 dimensionality so what does it mean when hidden size or number of units as some refer to is more than 1?

@karthiksrini7178 5 years ago

the presentation is at its best. What software are u using?

@CodeEmporium 5 years ago

Thanks for the compliments Karthik. I use Camtasia Studio for editing my videos

@1UniverseGames 4 years ago

Do you have any videos about using RNN model for cyber threat attacks, or any source to look for study it

@LightFykki 6 years ago

Amazing video, thanks!

@CodeEmporium 6 years ago

Thanks for watching!

@hangchen 3 years ago

Best part 6:32

@mentalmodels5 5 years ago

I'm confused about the part where he says "Gradient will now explode/vanish as a function of tau/d" 7:06 Can someone explain this to me?

@dfnoshamps 5 years ago

If I understood correctly, since the weight propagation happens through jumps over d units instead of directly to the next one, the explosion problem should happen at a "smoother rate".

@mentalmodels5 5 years ago

@@dfnoshamps Thanks for the reply, what I don't get is why it should happen at a smoother rate if you just add a skip connection?
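
Editor's note: a toy calculation may help with the τ/d question in this thread. The sketch assumes a single scalar recurrent weight `w` that is reused at every step and ignores activation derivatives, so it illustrates only the scaling argument, not a real network. The function name `gradient_scale` is hypothetical.

```python
def gradient_scale(w, tau, d=1):
    """Rough size of a gradient sent back tau time steps when
    connections skip d steps at a time (d=1 means no skipping)."""
    hops = tau / d              # a skip connection covers d steps in one hop
    return abs(w) ** hops       # one factor of w per hop


tau = 100
plain_explode = gradient_scale(1.1, tau)         # behaves like 1.1 ** 100
skip_explode = gradient_scale(1.1, tau, d=10)    # behaves like 1.1 ** 10
plain_vanish = gradient_scale(0.9, tau)          # behaves like 0.9 ** 100
skip_vanish = gradient_scale(0.9, tau, d=10)     # behaves like 0.9 ** 10
```

Because an RNN reuses the same weight at every step, the gradient over τ steps behaves like a power of that weight, which is why it blows up or shrinks so quickly. A feedforward network has different weights per layer, so large and small factors can partly cancel. With a skip connection of length d, the gradient has a path that needs only τ/d hops, so the exponent drops from τ to τ/d. The growth or decay is still exponential, just much slower.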

@185283 5 years ago

Hello, I have a question: at 8:50 you mentioned x(0)... x(n) as inputs. If you had the sentence "Hello World", would the vector for "Hello" be x(0) and the one for "World" be x(1)? If so, x(0) and x(1) will require 2 LSTM cells. Will one line of "model.add(LSTM)" have two LSTM cells to process "Hello World"? How can we visualize more than one LSTM layer then?
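
Editor's note: the sketch below illustrates the usual reading of the unrolled diagram, in which one LSTM layer is a single cell with a single set of weights applied once per time step. It uses scalars instead of vectors and matrices to stay short; the parameter names and the value 0.5 are hypothetical choices for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar input, hidden state, and cell state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c_tilde = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    c = f * c_prev + i * c_tilde     # updated long-term memory
    h = o * math.tanh(c)             # filtered view passed to the next step
    return h, c


# Hypothetical fixed weights, chosen only for illustration.
PARAM_NAMES = ("wf", "uf", "bf", "wi", "ui", "bi",
               "wo", "uo", "bo", "wc", "uc", "bc")
params = {name: 0.5 for name in PARAM_NAMES}


def run_sequence(xs, p):
    """Apply the same cell, with the same parameters, at every time step."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states


# Two time steps, standing in for x(0) and x(1).
states = run_sequence([1.0, -1.0], params)
```

So a two-element sequence does not need two separately trained cells: the boxes in the unrolled picture are the same cell at different time steps. Stacking a second LSTM layer would mean feeding the sequence of `h` values from the first layer into another cell with its own weights. In the video's character-level generator, each x(t) would be one character vector rather than a word.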

@CCCC-lu2st 5 years ago

The way he pronounced "Sepp Hocrieter" blew my brains 😅

@rarenectar 4 years ago

its actually Hochreiter :)

@ethiomusic3217 2 years ago

How do I use the CTC loss function for training on variable-length sequences? Can you help me?

@huangbinapple 4 years ago

Starts at 9:00

@xinyuma5358 4 years ago

Hi, why do we use tanh in RNNs, considering it is a bad activation function? Can we use ReLU?

@coolvideos2829 3 years ago

How can we predict the market using math? I believe it's possible through Fourier series and a few other views. Please help 🆘 I just don't understand how to get the wave form of the market and then calculate a point in time to predict the price. Itself sounds simple but idk what to

@CodeEmporium 3 years ago

Hmm. The stock market is very hard to predict. It depends on factors that go beyond historical trends. It's a fun toy problem, but not super realistic to model. I have a video of me attempting to build a model for this too. It's one of my more recent videos

@georgebarnett121 5 years ago

Don't BatchNorm and He initialization fix vanishing/exploding gradients? ResNet actually fixed model degradation, where deeper models perform worse than smaller models. Deeper networks should learn identity connections if an optimal model has smaller models. The ResNet shortcut connection allows easy learning of mappings similar to identity mappings. How does this affect LSTMs? Why can't we just include BatchNorm to fix vanishing/exploding gradients?

@yangwang9688 4 years ago

Max length of the sentence is 40, but why set LSTM units to 128? What is the output size of LSTM?

@ethiomusic3217 2 years ago

good videos, but i have some questions please

@joshualee3172 4 years ago

what are the dimensions of the weights?
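
Editor's note: as a rough guide to the weight dimensions, each of the four LSTM gates has an input weight matrix, a recurrent weight matrix, and a bias. The sketch below counts them for a hidden size of 128 (the number mentioned in the comments) and a hypothetical vocabulary of 50 characters. The function names are illustrative, and libraries may store these matrices transposed or concatenated.

```python
def lstm_weight_shapes(input_dim, units):
    """Shapes of the weights of ONE gate; an LSTM has four such sets."""
    return {
        "W": (units, input_dim),   # input -> gate
        "U": (units, units),       # previous hidden state -> gate
        "b": (units,),             # bias
    }


def lstm_param_count(input_dim, units):
    """Total trainable parameters across the four gates."""
    per_gate = units * input_dim + units * units + units
    return 4 * per_gate


n_chars = 50    # hypothetical number of unique characters
units = 128     # hidden size mentioned in the comments
shapes = lstm_weight_shapes(n_chars, units)
total = lstm_param_count(n_chars, units)
```

Note that the sequence length (40 characters in the video's example) appears nowhere in these shapes, because the same weights are reused at every time step. The hidden size only sets the length of the vector the LSTM outputs at each step, which is why 128 units and a 40-character window can coexist.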

@Небудьбараном-к1м 4 years ago

Isn't 128 too many for the hidden size? I am building an LSTM network; my input shape is [300, 5], and using hidden_size=128 results in vanishing gradients. Also, what happens if I add more layers to the dense net that comes after the LSTM? Will this architecture be able to learn? Because an LSTM "requires" a relatively large learning rate, which is often too large for a typical FC network, I am guessing that this will cause some crazy instability overall. I hope you can help me with these annoying questions :). Thanks a lot for sharing your knowledge!

@swathykrishna9618 4 years ago

Good explanation. Can u do one video on Xception model?Plz

@CodeEmporium 4 years ago

Thanks! I have already done an Xception explanation. Check out my video on "Depthwise Separable Convolution - explained"

@Below10IQ 6 years ago

Loading Weights generates different results to when it was trained.

@gauravkumar6534 5 years ago

Hi, your video was nice, and I request that you make a video on LSTMs for speech recognition, please.

@charlieangkor8649 3 years ago

I don’t understand it. Suddenly a picture full of math symbols pops up, and it isn't labeled which parts are inputs, outputs, neurons, connections, or weights.

@nathanaelsatrianugraha3381 2 years ago

Hello, I'm new here. I want to ask: how do we know the values of Wi, Wf, Wo, Wc? Are they randomized? Thank you. BTW, nice video

@tharindawicky 6 years ago

thanks

@piotrgrzegorzek8039 5 years ago

Hi! just a question, does lstm predict on sequences of FEATURES in ONE SAMPLE or sequences of SAMPLES (outputs) in ONE BATCH? For eg. I need to predict next number as many to one. I fit first sample as x1=1, x2=2 and output y=3, next sample x1=4, x2=5 y=6. NOW Does the model look on sequence of features (x1,x2) or sequence of samples (y, which are output of the model)

@rabinthapa1431 4 years ago

bro can u make a video on implementing Convolutions and LSTMs

@robinmuller2402 2 years ago

yeah we have no question marks in german 3:32

@wiebetje00 4 years ago

You cut the text into semi-redundant sequences of maxlen characters, but how does the model or performance change if you change the value of maxlen?
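
Editor's note: to make the question concrete, here is a minimal sketch of cutting a text into semi-redundant windows of `maxlen` characters, each paired with the character that follows it. The value `maxlen=40` comes from the comments; the stride `step=3`, the function name `cut_sequences`, and the sample sentence are assumptions made for illustration.

```python
def cut_sequences(text, maxlen=40, step=3):
    """Return overlapping input windows and the character following each."""
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i:i + maxlen])   # what the model sees
        next_chars.append(text[i + maxlen])    # what it should predict
    return sentences, next_chars


# Made-up stand-in text; the video trains on real speech transcripts.
text = "my fellow americans, the state of our union is strong. " * 3

short_windows, short_targets = cut_sequences(text, maxlen=10)
long_windows, long_targets = cut_sequences(text, maxlen=40)
```

In this setup `maxlen` is the amount of context the model sees before each prediction. A larger value gives the network more history per sample but yields fewer, longer samples and more unrolled steps per sample, so training is slower. How it changes the quality of the generated text is an empirical question that this sketch does not answer.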

@ApriliyanusPratama 6 years ago

Excellent explanation. Can you show me where I can get the full math derivation of the backward pass of an LSTM?

@CodeEmporium 6 years ago

Thanks! A quick google search takes me here: arunmallya.github.io/writeups/nn/lstm/index.html#/ It seems good.

@nomadlyyy111 3 years ago

The equations for GRUs are wrong; they should have Ht-1, not Ht.
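
Editor's note: for comparison, a commonly used formulation of the GRU is given below. It is a reference sketch, not a reconstruction of the video's slide, and the weight names are assumed notation. Every gate reads the previous hidden state $h_{t-1}$, which is consistent with the point raised in this comment.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Conventions differ on whether $z_t$ or $1 - z_t$ multiplies the old state in the last line; both describe the same family of models. The new state $h_t$ appears only on the left-hand side.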

@mikefda12 3 years ago

At 5:24 what is that e looking symbol called?

@CodeEmporium 3 years ago

The symbol is an epsilon which means "belongs to". So x(i) belongs to a set of vectors of real numbers with D dimensions. Simply put, x(i) is a vector of real numbers with D dimensions.
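
Editor's note: written out, the statement in this reply is the following, where the symbol is the set-membership sign (a stylized epsilon, usually read "is an element of"):

```latex
x^{(i)} \in \mathbb{R}^{D}
```

In words: $x^{(i)}$ is a vector of $D$ real numbers.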

@mikefda12 3 years ago

@@CodeEmporium thank you

@samfelton5009 3 years ago

what's your source for the images throughout this video? I'd love to use them in my own work!

@loriando7698 5 years ago

You are doing good jobs! But I do not really understand that in this case, your chars value are unique characters, so why after converting into text, it is not unique ones, words in alphabet instead?

@bubblesgrappling736 4 years ago

Is a "cell" equal to a neuron? It seems to be the case. But at 8:45, when you say that each sequence element goes through its own cell, I get confused: is the cell really modelling the entire model?

@arshadhashmi7938 4 years ago

How can I get this code

@Firewalker124 5 years ago

I've got a specific question: I am currently trying to classify motion in a 3D animation. So basically I get a bunch of 3D vectors that I am trying to relate over time. More specifically, I want to check whether the movement of the bones and joints is too fast. So my thought was to use an LSTM to check that. I would use the 3D vectors for each frame as the input to an LSTM cell. Yet I am not quite sure how to relate each cell, each frame, to the next one. Any tips? :D

@soareverix 2 years ago

This is a really interesting problem I'm interested in as well, for VR purposes! Did you ever solve it?

@Firewalker124 2 years ago

@@soareverix Well, it was a possible topic for my master's thesis. I thought a bit about it, but changed the topic due to some other hardware-related problems. However, I had an idea on how to enter all the necessary information into the LSTM that could work. But I'm currently still working, so maybe I'll write back later with the idea. In my case it wasn't VR but motion capture of movements.

@cliccme 5 years ago

Hi, I have one question regarding BiLSTM neural network. Should i ask here or on your Quora profile? Thanks

@CodeEmporium 5 years ago

Wherever you want :)

@doubtunites168 6 years ago

what kind of sorcery is this?

@MrMrjacky7 4 years ago

Hi! I have some sequences generated from some initial conditions, what model should I use to have a sequence generated from some initial condition based on the data I have? seq2seq models usually predict the following data of a series but don't generate sequence from initial conditions.

@shubhamdotdkhema 4 years ago

You should probably try open AI GPT-2....it will generate sentences for u given an initial context (or even a single word).

@anishjain8096 5 years ago

Brother, I don't understand many things. How can I get better and learn more advanced concepts?

@rakshithak.sgowda7155 4 years ago

Hi sir, can you please send me the code for this project, if you have it: "Developing an Efficient Deep Learning-Based Trusted Model for Pervasive Computing Using an LSTM-Based Classification Model"