Introduction to NLP | How to Train Custom Word Vectors

4,384 views

Normalized Nerd

1 day ago

Comments: 21
@shrutiiyyer2783 2 years ago
More such videos please, this is much better than the Udemy courses, even the paid ones!
@vishnuprabhaviswanathan546 2 years ago
Please show how to custom-train BERT embeddings.
@s.m.saifulislambadhon2654 4 years ago
Bro, in cell no. 44, what is the purpose of the tokenizer when we already tokenized the sentences into words in the preprocessing part?
@s.m.saifulislambadhon2654 4 years ago
Would you please explain cell no. 44 in a bit more detail? I think this is the most important part, and it's the one I'm missing...
@NormalizedNerd 4 years ago
Great point! In NLP preprocessing, tokenization makes it easier to clean the text; for that I generally use the nltk library. In cell 44, I did the tokenization again with the Keras Tokenizer, which gives us two very handy members: word_index and texts_to_sequences. These help us create the tensors easily. So yes, the tokenization is redundant here, but I did it anyway to make our lives easier :D
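For anyone following along, here's a minimal sketch of what the Keras Tokenizer adds on top of plain tokenization; the corpus and num_words below are made-up placeholders, not the video's data:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ["the movie was great", "the movie was terrible"]

tokenizer = Tokenizer(num_words=10000)  # keep the 10,000 most frequent words
tokenizer.fit_on_texts(sentences)

word_index = tokenizer.word_index                    # {'the': 1, 'movie': 2, ...}
sequences = tokenizer.texts_to_sequences(sentences)  # sentences -> lists of integer ids
padded = pad_sequences(sequences, maxlen=10)         # uniform-length 2D tensor

print(word_index)
print(padded.shape)  # (2, 10)
```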
@s.m.saifulislambadhon2654 4 years ago
Thanks for the explanation
@MrStudent1978 4 years ago
Very nice explanation! I have a question... in cell no. 50, what is the sense behind "trainable = False"? The video is about training a custom word2vec... then why False?
@NormalizedNerd 4 years ago
@Gurpreet Singh I understand your confusion. We actually train our word vectors in cell 46. In cell 50, we build the embedding layer that will sit just before the LSTM units. Remember that the embedding layer is nothing but the learned word vectors (in matrix form)! So if we set trainable = True on the embedding layer, Keras will train it (i.e. the word vectors) again while performing backprop through the LSTM. We don't want that. I hope it's clear now.
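A minimal sketch of such a frozen embedding layer, assuming vocab_size, embedding_dim, and embedding_matrix come from the earlier word2vec cells (the random matrix below is just a stand-in for the learned vectors):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, embedding_dim = 10000, 100
embedding_matrix = np.random.rand(vocab_size, embedding_dim)  # stand-in for learned vectors

model = Sequential([
    Embedding(vocab_size, embedding_dim,
              weights=[embedding_matrix],  # row i = vector of the word with id i
              trainable=False),            # frozen: backprop through the LSTM won't alter the vectors
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

dummy_batch = np.random.randint(0, vocab_size, size=(2, 10))  # two padded sequences
print(model.predict(dummy_batch).shape)  # (2, 1)
```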
@MrStudent1978 4 years ago
@NormalizedNerd Thanks for your response! I got it now.
@Lotof_Mazey 2 years ago
Sir, kindly guide: how can I use pre-trained word embedding models for local languages (or languages written in Roman script) that no pre-trained model covers? Do I have to use a (non-pre-trained) embedding layer to create the embedding matrices for a local language? How can I benefit from pre-trained models for a local language?
@NormalizedNerd 2 years ago
Hi, unfortunately there aren't many pre-trained word embeddings for romanized non-English languages. You can search, and if you find something, you can fine-tune it on your data. But I don't think there's an easy way to use English models on romanized non-English languages.
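If you do find a pre-trained model in your language, a minimal fine-tuning sketch with gensim might look like this; the model path and sentences are hypothetical, and note this needs a full saved Word2Vec model (which keeps the trainable weights), not just exported KeyedVectors:

```python
from gensim.models import Word2Vec

# hypothetical path to a full saved model
model = Word2Vec.load("pretrained_romanized.model")

new_sentences = [["kya", "haal", "hai"], ["sab", "theek", "hai"]]

model.build_vocab(new_sentences, update=True)  # add unseen words to the vocabulary
model.train(new_sentences, total_examples=len(new_sentences), epochs=5)
```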
@hanjes4793 3 years ago
Hello... I've got a question: in the train-test split cell, where does 'word_index' come from? Thanks!
@NormalizedNerd 3 years ago
It's the Keras Tokenizer that gives us the 'word_index'.
@rushikeshkulkarni7758 1 year ago
Why didn't we use sklearn's train_test_split?
@vishnuprabhaviswanathan546 2 years ago
Can you show how to calculate the similarity of two words using a custom-trained word2vec model?
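For reference, a minimal sketch assuming the vectors were trained with gensim's Word2Vec (gensim >= 4; the toy corpus below is made up):

```python
from gensim.models import Word2Vec

sentences = [["king", "rules", "the", "kingdom"],
             ["queen", "rules", "the", "kingdom"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=50)

# cosine similarity between the two words' vectors
print(model.wv.similarity("king", "queen"))
```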
@ARSHABBIR100 4 years ago
Excellent. Thanks for uploading. Kindly make more videos on building a chatbot.
@NormalizedNerd 4 years ago
It's on my wish-list too! Keep supporting :)
@coxixx 4 years ago
Would you show how to train custom word vectors with GloVe using Python?
@NormalizedNerd 4 years ago
That's actually very easy. Just build your corpus (a .txt file), then use the official repo to train a GloVe model on it: github.com/stanfordnlp/GloVe
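Once the repo's training scripts have produced their plain-text vectors file, one way to use it from Python is via gensim; the file name is whatever your run produced, and no_header=True needs gensim >= 4.0:

```python
from gensim.models import KeyedVectors

# GloVe's text output has no word2vec-style header line, hence no_header=True
glove_vectors = KeyedVectors.load_word2vec_format(
    "vectors.txt", binary=False, no_header=True)

# assuming "movie" appears in your training corpus
print(glove_vectors.most_similar("movie", topn=5))
```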
@WhatsAI 4 years ago
Great video my friend!
@NormalizedNerd 4 years ago
Thank you pal :)
Sarcasm is Very Easy to Detect! GloVe + LSTM
17:08
Normalized Nerd
12K views
Introduction to NLP | Word Embeddings & Word2Vec Model
23:10
Normalized Nerd
38K views
Transformers (how LLMs work) explained visually | DL5
27:14
3Blue1Brown
4.2M views
Training Word Vectors with Facebook's fastText
24:55
tanmay bakshi
14K views
How might LLMs store facts | DL7
22:43
3Blue1Brown
912K views
Introduction to NLP | GloVe & Word2Vec Transfer Learning
21:12
Normalized Nerd
11K views
Diffusion models from scratch in PyTorch
30:54
DeepFindr
266K views
Introduction to NLP | GloVe Model Explained
23:15
Normalized Nerd
69K views
Word2Vec Easily Explained- Data Science
22:50
Krish Naik
173K views