Simple Deep Neural Networks for Text Classification

116,157 views

Machine Learning TV

1 day ago

Hi. In this video, we will apply neural networks to text. First, let's remember: what is text? You can think of it as a sequence of characters, words, or anything else. In this video, we will continue to think of text as a sequence of words or tokens.

Let's recall how bag of words works. For every distinct word in your dataset, you have a feature column, so you are effectively vectorizing each word with a one-hot-encoded vector: a huge vector of zeros with a single non-zero value, in the column corresponding to that particular word. In this example, we have "very", "good", and "movie", and all of them are vectorized independently. In this setting, for real-world problems, you end up with hundreds of thousands of columns. How do we get to the bag of words representation? We can sum up all those one-hot vectors, and we come up with a bag of words vectorization that now corresponds to "very good movie". So it is useful to think of the bag of words representation as a sum of sparse one-hot-encoded vectors, one per word.

Okay, let's move to the neural network way. In contrast to the sparse representation we've seen in bag of words, in neural networks we usually prefer dense representations. That means we can replace each word by a dense vector that is much shorter: it might have, say, 300 components, each of which is a real value. An example of such vectors is word2vec embeddings, which are pretrained in an unsupervised manner. We will dive into the details of word2vec in the next two weeks. All we need to know right now is that word2vec vectors have a nice property: words that appear in similar contexts, in terms of neighboring words, tend to have vectors that are collinear, vectors that point in roughly the same direction. That is a very useful property that we will rely on later.

So now we can replace each word with a dense vector of 300 real values. What do we do next? How can we come up with a feature descriptor for the whole text? We can do it the same way as in bag of words: we just take the sum of those vectors, and we get a representation of the whole text, like "very good movie", based on word2vec embeddings. That sum of word2vec vectors actually works in practice. It can give you a great baseline descriptor, baseline features for your classifier, and it can work pretty well. Another approach is to run a neural network over these embeddings.
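To make both representations concrete, here is a minimal sketch in Python with NumPy. The tiny vocabulary and the random embedding values are illustrative stand-ins only, not real word2vec weights:

```python
import numpy as np

# Toy 5-word vocabulary; real problems have hundreds of thousands of columns.
vocab = {"very": 0, "good": 1, "movie": 2, "bad": 3, "plot": 4}

def one_hot(word):
    """Sparse representation: all zeros except the column for this word."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

# Bag of words = sum of the one-hot vectors of the tokens.
tokens = ["very", "good", "movie"]
bow = np.sum([one_hot(t) for t in tokens], axis=0)
print(bow)  # [1. 1. 1. 0. 0.]

# Dense alternative: each word maps to a short real-valued vector
# (random stand-ins here for pretrained 300-dimensional word2vec vectors).
dim = 300
embeddings = {w: np.random.randn(dim) for w in vocab}

# A simple descriptor of the whole text: the sum of its word vectors.
text_vector = np.sum([embeddings[t] for t in tokens], axis=0)
print(text_vector.shape)  # (300,)
```

In practice the mean is often used instead of the sum, so that the descriptor does not grow with the length of the text.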

Comments: 82
@maxlegnar7639
@maxlegnar7639 2 years ago
Thank you for the good explanation! You forgot to link the paper you mentioned (at 12:43). For all who are interested: I think it was about this paper: "Convolutional Neural Networks for Sentence Classification" by Yoon Kim
@NandishA
@NandishA 5 years ago
One of the best videos to understand string inputs for Neural Nets.
@sylvainbzh7171
@sylvainbzh7171 5 years ago
Really nice explanations, even if the convolutional network internals are not explained in enough depth.
@Posejdonkon
@Posejdonkon 1 year ago
Hey there! I normally don't leave comments or likes, but I had to stop here! You've explained a convoluted topic in a clear, digestible and concise way. Thank you!
@danm2092
@danm2092 5 years ago
This is the most comprehensive video I've ever seen on neural networks! Thank you so much! I study and develop AI, but was using something more like the bag of words representation. The other thing, aside from accuracy that I noticed to be an issue with the bag of words representation, was actually the amount of resources it required from the machine it was operating on. To give some insight into just how bad it was, while the machine I was using wasn't exactly top of the line, the machine I'm using now is pretty high performance (i5-8400, 16gb RAM, 1TB Samsung Evo 860 SSD) and yet, the facial recognition usually dropped down the camera feed to about 3-5fps when it would detect a face. Even generating a response (using Speech-to-Text, then a custom-tailored version of the Levenshtein Distance algorithm to correct any misinterpretation of speech) was using at least 7GB of RAM even with a relatively small data set in the vicinity of maybe 50GB, and using 40-60% of my CPU power. Anyhow, my intent with watching this video was to learn about better algorithms, with the intent of actually implementing a neural network on an FPGA (Field-Programmable-Gate-Array). Now I feel well-equipped with enough information to conquer that finally, as I feel I finally understand CNNs well enough. Thanks so much!
@MachineLearningTV
@MachineLearningTV 5 years ago
Thanks for your feedback
@Mark-wl2gn
@Mark-wl2gn 4 years ago
How would you go about getting started if you have a decent grasp of python and are trying to get into the space?
@xl000
@xl000 2 years ago
This is not a particularly high-performance setup; it's an average PC. I bought a PC for 3D rendering and it's a 32-core CPU (E5-2670 0 @ 2.60GHz) with 128GB of RAM, and the storage is some kind of SSD on PCI-E. Pretty good setup.
@GameChanger77
@GameChanger77 2 years ago
i5-8400 is horrible lol, definitely not high performance.
@danm2092
@danm2092 2 years ago
@@GameChanger77 Dude this post was from 3 years ago. Also, thank you for wasting both of our time by commenting this! Have a great day
@louisd695
@louisd695 2 years ago
This is the best explanation I've seen of a CNN applied to text input.
@manjuappu89
@manjuappu89 5 years ago
One of the best lectures I have ever heard. Seriously, I was totally into your video for 15 minutes; I forgot the external world. Awaiting the next set of topics.
@MachineLearningTV
@MachineLearningTV 5 years ago
Soon we will upload the next video of this series! If you liked the video, please press the like button so that other people can find this video! Regards
@manjuappu89
@manjuappu89 5 years ago
I have subscribed, added push notification, liked videos.
@boooo6789
@boooo6789 4 years ago
2:05 Freudian slip? Made me crack up haha. Excellent video, thanks for sharing!
@DanielWeikert
@DanielWeikert 5 years ago
Great work, thanks. Can't wait for the next one. Very well explained.
@mikkaruru
@mikkaruru 5 years ago
Try the original course: www.coursera.org/learn/language-processing
@akashkewar
@akashkewar 4 years ago
This is just brilliant!
@luislptigres
@luislptigres 5 years ago
Excellent video. This video made me watch the whole playlist
@ijeffking
@ijeffking 4 years ago
Fantastic! You have explained it very, very well. Please upload more videos on related Machine Learning topics. Thank you so much.
@kushshri05
@kushshri05 5 years ago
It is the best explanation of word embeddings I have ever seen.
@suraj9661
@suraj9661 3 years ago
Wonderful explanation.
@because2022
@because2022 1 year ago
Great content.
@HeduAI
@HeduAI 5 years ago
Amazing explanation! Thank you!
@mawadaabdallah5363
@mawadaabdallah5363 4 years ago
great efforts!!
@sonar_kella
@sonar_kella 4 years ago
Thank you so much. Got a clear idea.
@felipela2227
@felipela2227 8 months ago
Your explanation was great, thx
@sachadu779
@sachadu779 4 years ago
Great video, thanks!!!
@redyandrimof7565
@redyandrimof7565 5 years ago
Thanks man. May God guide you.
@jonathancardozo
@jonathancardozo 3 years ago
Excellent
@hellochii1675
@hellochii1675 4 years ago
At 11:19 I am confused about why we learn 100 filters for each gram size. What is the filter in this case? I thought that by applying the 3-gram kernel with same padding we would get a (1, n) vector, where n is the number of words (here n = 5). Then, with 3-, 4-, and 5-grams, shouldn't we just have three (1, n) vectors? And if we take the max value for each gram size, shouldn't we just have 3 outputs, one from each x-gram vector? Can you explain why you said 300 outputs? Thanks,
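For anyone stuck on the same point: each filter produces its own (1, n) activation map, so 100 filters of width 3 give 100 such maps, not one; global max pooling then keeps a single number per map, and three window sizes times 100 filters yields the 300 outputs. A minimal sketch of this reading, written with PyTorch names purely for illustration (the video itself does not specify a framework):

```python
import torch
import torch.nn as nn

batch, emb_dim, n_words = 1, 300, 5
x = torch.randn(batch, emb_dim, n_words)   # embedded sentence: (1, 300, 5)

pooled_features = []
for k in (3, 4, 5):                        # 3-, 4-, and 5-gram windows
    conv = nn.Conv1d(in_channels=emb_dim, out_channels=100,
                     kernel_size=k, padding=k - 1)
    feat = torch.relu(conv(x))             # (1, 100, positions): 100 maps
    pooled = feat.max(dim=2).values        # global max pooling: (1, 100)
    pooled_features.append(pooled)

features = torch.cat(pooled_features, dim=1)
print(features.shape)                      # (1, 300) = 3 sizes x 100 filters
```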
@bizzatulhasan4629
@bizzatulhasan4629 4 years ago
Brilliant! Please share link to the next lecture.
@MachineLearningTV
@MachineLearningTV 4 years ago
In the description of the video you can find the link to the course.
@shubhammittal7832
@shubhammittal7832 3 years ago
@@MachineLearningTV I didn't find any links in the description. Please provide the link.
@barax9462
@barax9462 3 years ago
At 1:54, what are the inputs? Are they [very, good, movie] or are they the [x1, x2, x3]?
@tantzer6113
@tantzer6113 2 years ago
How does this compare with the attention mechanism in transformers?
@argentineinformationservic2917
@argentineinformationservic2917 2 years ago
Please explain the meaning of the final vector obtained after the 1D convolution, which is, I guess, trained in some way.
@bismeetsingh352
@bismeetsingh352 4 years ago
What about the context of the text? Why would you use this rather than something like a GRU or LSTM?
@rialtosan
@rialtosan 1 year ago
Excuse my stupidity: at 4:19, how do you get 0.9 from the word embeddings and the convolutional filter? Is it a dot product or something else?
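In a standard 1D convolution over word embeddings, each score is indeed a dot product: the filter is multiplied elementwise with the window of stacked word vectors it currently covers and the products are summed (the 0.9 in the video is presumably such a value). A tiny sketch with made-up numbers:

```python
import numpy as np

window = np.random.randn(2, 300)  # embeddings of two neighboring words
filt = np.random.randn(2, 300)    # a 2-gram convolutional filter

score = np.sum(window * filt)     # elementwise multiply, then sum = dot product
```

A large score means the window closely matches the pattern encoded in the filter.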
@GamerBat3112
@GamerBat3112 3 years ago
How do you find the values for the convolution filter?
@johntsirigotis8161
@johntsirigotis8161 2 years ago
Where did the 0.9 and 0.84 come from? Sorry, I'm new to this...
@arsalaanshaikh3079
@arsalaanshaikh3079 4 years ago
Need your advice for my ML project. Please help
@sangitasable6919
@sangitasable6919 4 years ago
Thanks sir, it's a very nice lecture. I want to do text processing on web page content; can you give a lecture on this?
@arnetmitarnetmit9456
@arnetmitarnetmit9456 2 years ago
At 4:25, the result of the convolution is not 0.9, it is 0.88. How does a CNN create these filters? For instance, if we define 16 filters to apply, how does the CNN library determine the contents of the filters (the numbers)?
@MachineLearningTV
@MachineLearningTV 2 years ago
Good question! These filters are learned through back-propagation!
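Concretely, the filter values start out random and are then updated by gradient descent like any other weight. A minimal sketch of a single update step, assuming PyTorch and a made-up training signal:

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=300, out_channels=100, kernel_size=3)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

x = torch.randn(1, 300, 5)               # one embedded 5-word sentence
target = torch.randn(1, 100, 3)          # stand-in for a real training signal

loss = ((conv(x) - target) ** 2).mean()  # some loss on the conv output
loss.backward()                          # gradients flow into conv.weight
optimizer.step()                         # the filter values get nudged
```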
@TheMightyWolfie
@TheMightyWolfie 5 years ago
Can you please tell me the precision and recall of this network?
@singhRajshree
@singhRajshree 5 years ago
Do we always need a vector representation before the neural network?
@ruslanmurtazin7918
@ruslanmurtazin7918 5 years ago
Yup
@singhRajshree
@singhRajshree 5 years ago
@@ruslanmurtazin7918 But that may be the case for feature engineering in unsupervised learning. What about supervised learning algorithms, do we still need it in that case?
@jastriarahmat659
@jastriarahmat659 4 years ago
What do I need to learn before I can follow this video? I can't follow the explanation after hearing "bag of words" and "neural network".
@Ragnarok540
@Ragnarok540 3 years ago
Linear algebra would be a start.
@jastriarahmat659
@jastriarahmat659 3 years ago
@@Ragnarok540 thanks man
@epictrollmemeface3946
@epictrollmemeface3946 3 years ago
1:20 "good movie very" ?
@ruslanmurtazin7918
@ruslanmurtazin7918 5 years ago
TFW you Google in English but end up watching a Russian video.
@rogerfroud300
@rogerfroud300 3 years ago
This seems to use lots of terms that are undefined here. Is this part of a larger presentation? If so, numbering the parts would be useful. If not, then this really expects you to know a lot of the terminology before watching. Frankly, I find this completely confusing.
@bhahubaliashish9853
@bhahubaliashish9853 1 year ago
It's Elon Musk from 2019
@Manu-lc4ob
@Manu-lc4ob 4 years ago
How does one learn these filters?
@MachineLearningTV
@MachineLearningTV 4 years ago
These filters are learned by the Deep Learning algorithm. As a matter of fact, these filters are the weights that Neural Networks try to learn.
@kuchkrgujrnahai2214
@kuchkrgujrnahai2214 4 years ago
Sir, I want to learn this too, please 🙏
@pedroherreroherrero1360
@pedroherreroherrero1360 3 years ago
Sorry, but I do not understand why you do not use 2-gram windows. Without them, the 2-grams are not taken into account.
@barkatullah7366
@barkatullah7366 4 years ago
Please send me the paper link
@phdbakalemmahdia8446
@phdbakalemmahdia8446 5 years ago
Very interesting video .... I need your Email...
@venkatatn3873
@venkatatn3873 4 years ago
Link to the paper please?
@Talk2Asap
@Talk2Asap 3 years ago
a kala
@phdbakalemmahdia8446
@phdbakalemmahdia8446 5 years ago
I need the source code, please.
@nocomment7740
@nocomment7740 5 years ago
Is there a version in Russian?!
@user-fn9vr6ef4v
@user-fn9vr6ef4v 4 years ago
Well, he is Russian, his accent gives it away)))