More such videos, please! This is much better than the Udemy courses, even the paid ones!
@vishnuprabhaviswanathan546 2 years ago
Please show how to custom-train BERT embeddings.
@s.m.saifulislambadhon2654 4 years ago
Bro, in cell no. 44, what is the purpose of the tokenizer when we already tokenized the sentences into words in the preprocessing part?
@s.m.saifulislambadhon2654 4 years ago
Would you please explain cell no. 44 in a bit more detail? I think this is the most important part, and it's what I'm missing...
@NormalizedNerd 4 years ago
Great point! In NLP preprocessing, tokenization makes it easier to clean the text; for that I generally use the nltk library. In cell 44, I did the tokenization with the Keras Tokenizer, which gives us two handy functions: word_index and texts_to_sequences. These help us create the tensors easily. So yes, the tokenization is redundant here, but I did it anyway to make our life easier :D
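A minimal sketch of that step (the names 'sentences' and MAX_LEN here are illustrative, not necessarily the ones used in the notebook):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ["the movie was great", "the movie was terrible"]
MAX_LEN = 10

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)                     # build the vocabulary
word_index = tokenizer.word_index                     # {'the': 1, 'movie': 2, ...}
sequences = tokenizer.texts_to_sequences(sentences)   # words -> integer ids
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding='post')
print(padded.shape)                                   # (2, 10)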
@s.m.saifulislambadhon2654 4 years ago
Thanks for the explanation
@MrStudent1978 4 years ago
Very nice explanation! I have a question... in cell no. 50, what is the sense behind "trainable = False"? The video is about training custom word2vec... then why False?
@NormalizedNerd 4 years ago
@Gurpreet Singh I understand your confusion. We actually train our word vectors in cell 46. In cell 50, we are building the embedding layer that will be placed just before the LSTM units. Remember that the embedding layer is nothing but the learned word vectors (in matrix form)! So if we set trainable = True on the embedding layer, Keras will train the embedding layer (i.e. the word vectors) again while performing backprop on the LSTM. We don't want that. I hope it's clear now.
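A minimal sketch of the idea (assuming 'w2v' is the gensim word2vec model trained in cell 46 and 'word_index' comes from the Keras Tokenizer; the names are illustrative):

import numpy as np
from tensorflow.keras.layers import Embedding

EMBED_DIM = 100                    # must match the word2vec vector size
vocab_size = len(word_index) + 1   # +1 because Keras indices start at 1

# Copy the learned word2vec vectors into a matrix: row i = vector of word i.
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, i in word_index.items():
    if word in w2v.wv:
        embedding_matrix[i] = w2v.wv[word]

# trainable=False freezes the layer, so backprop through the LSTM
# does not overwrite the learned word vectors.
embedding_layer = Embedding(vocab_size, EMBED_DIM,
                            weights=[embedding_matrix],
                            trainable=False)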
@MrStudent1978 4 years ago
@NormalizedNerd Thanks for your response! I got it now...
@Lotof_Mazey 2 years ago
Sir, kindly guide: how can I use pre-trained word embedding models for local languages (or languages written in Roman script) that the pre-trained models weren't trained on? Do I have to use a plain embedding layer (not pre-trained) to create the embedding matrix for such a local language? How can I benefit from pre-trained models for a local language?
@NormalizedNerd 2 years ago
Hi, unfortunately there aren't many pre-trained word embeddings for romanized non-English languages. You can search, and if you find something, you can fine-tune it on your data. But I don't think there's an easy way to use English models on romanized non-English languages.
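If the model you find is saved in gensim's native Word2Vec format (KeyedVectors alone can't be updated), fine-tuning can look roughly like this sketch; the file name and sentences are placeholders:

from gensim.models import Word2Vec

model = Word2Vec.load("pretrained_local_language.model")  # hypothetical path

new_sentences = [["mera", "naam", "kya", "hai"],
                 ["aap", "kaise", "ho"]]        # your tokenized local-language corpus

model.build_vocab(new_sentences, update=True)   # add the new vocabulary
model.train(new_sentences,
            total_examples=len(new_sentences),
            epochs=model.epochs)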
@hanjes4793 3 years ago
Hello... I've got a question: in the train-test split cell, where does 'word_index' come from? Thanks!
@NormalizedNerd 3 years ago
It's the Keras Tokenizer that gives us the 'word_index'.
@rushikeshkulkarni7758 1 year ago
Why didn't we use sklearn's train_test_split?
@vishnuprabhaviswanathan546 2 years ago
Can you show how to calculate the similarity of two words using custom-trained word2vec?
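For reference, gensim exposes this directly; a minimal sketch, assuming 'model' is the custom-trained Word2Vec model from the video (the file name is hypothetical):

from gensim.models import Word2Vec

model = Word2Vec.load("custom_word2vec.model")   # hypothetical path
print(model.wv.similarity("good", "great"))      # cosine similarity of the two vectors
print(model.wv.most_similar("good", topn=5))     # 5 nearest neighbours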
@ARSHABBIR100 4 years ago
Excellent. Thanks for uploading. Kindly make more videos on building a chatbot.
@NormalizedNerd 4 years ago
It's on my wish list too! Keep supporting.
@coxixx 4 years ago
Would you teach how to train custom word vectors with GloVe using Python?
@NormalizedNerd 4 years ago
That's actually very easy. Just make your corpus (a .txt file), then use the official repo to train a GloVe model on it: github.com/stanfordnlp/GloVe