Pytorch Seq2Seq Tutorial for Machine Translation

81,178 views

Aladdin Persson

Days ago

Comments: 140
@morancium 1 year ago
Field and BucketIterator are not working!!
@aryansharma4902 5 months ago
Have you found any solution for this? I am facing the same issue right now.
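Field and BucketIterator were moved out of torchtext's main namespace in 0.9 and removed entirely in 0.12. A minimal sketch of the version-pinning fix reported later in this thread (torch 1.13.0 + torchtext 0.6.0), with the legacy import path as an alternative for intermediate versions:

```python
# Reported fix from this thread: pin versions where the old API still exists.
#   pip install torch==1.13.0 torchtext==0.6.0
from torchtext.data import Field, BucketIterator  # torchtext <= 0.8

# On torchtext 0.9-0.11 the same classes live under a legacy namespace:
# from torchtext.legacy.data import Field, BucketIterator
```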
@SAINIVEDH 1 year ago
Why have you not used the hidden and cell states in the encoder LSTM? I'm quite confused. Thanks
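For reference, the encoder in the video does use them: its per-step outputs are discarded, and the final hidden and cell states are what get passed to the decoder as the context vector. A minimal sketch of that pattern (layer sizes illustrative):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, num_layers, p):
        super().__init__()
        self.embedding = nn.Embedding(input_size, embedding_size)
        self.rnn = nn.LSTM(embedding_size, hidden_size, num_layers, dropout=p)

    def forward(self, x):
        # x: (seq_len, N) of token indices
        embedding = self.embedding(x)                 # (seq_len, N, embedding_size)
        outputs, (hidden, cell) = self.rnn(embedding)
        # outputs are discarded; hidden and cell are the context vector
        return hidden, cell

encoder = Encoder(input_size=1000, embedding_size=64, hidden_size=128,
                  num_layers=2, p=0.5)
hidden, cell = encoder(torch.randint(0, 1000, (10, 4)))
print(hidden.shape, cell.shape)  # torch.Size([2, 4, 128]) twice
```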
@kaushilkundalia2197 4 years ago
Was struggling to make an implementation of this. So so so happy I found your tutorial. Thanks a lot for making this. Keep up the great work
@AladdinPersson 4 years ago
I appreciate you saying that, it means a lot and is exactly why I want to continue doing these videos! ❤
@gauravms2799 3 years ago
@@AladdinPersson Can you share where you learned all of this? Not specific to this video, but all this stuff in general. What are your top resources?
@amitg2k 3 years ago
This is great stuff, explained like a pro. Could you please create videos along similar lines with slight modifications, like: 1. How to use a custom dataset 2. How to use a basic RNN and/or GRU (I tried but ran into multiple issues). These branch-offs will be very helpful for overall understanding of how to modify the code to address custom problems. Thanks in advance :)
@AshokSharma-ec9so 1 year ago
I got the following errors while implementing: 1. ImportError: cannot import name translate_sentence, save_checkpoint, load_checkpoint from 'utils' 2. AttributeError: module 'torchtext.nn' has no attribute 'Module'. Has anyone else run into these errors? Can anyone suggest a resolution?
@dogkansarac4889 3 years ago
Loved your tutorial. I have a question though: when implementing the encoder, you said x's shape is (seq_len, N). Shouldn't it be (input_size, seq_len, N), where input_size is the vocabulary size? Because we one-hot encode each word in the first place.
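No one-hot encoding actually happens: nn.Embedding consumes integer token indices directly, so (seq_len, N) is the full input shape. The embedding lookup is mathematically equivalent to multiplying a one-hot vector by the embedding matrix, without ever materializing it. A quick check (illustrative sizes):

```python
import torch
import torch.nn as nn

vocab_size, embedding_size = 1000, 64
embedding = nn.Embedding(vocab_size, embedding_size)

x = torch.randint(0, vocab_size, (7, 2))  # (seq_len=7, N=2) of token indices
print(embedding(x).shape)                 # torch.Size([7, 2, 64])
```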
@meylyssa3666 3 years ago
Please, can you do some tutorials on the new torchtext? There are a lot of new things there!
@duongocyen 3 years ago
When I run the code I get this error: "Traceback (most recent call last): train_data, valid_data, test_data = Multi30k.splits( AttributeError: 'function' object has no attribute 'splits' Process finished with exit code 1". I have searched for a solution on the internet, but nothing works. Could you please take a look at my error? I really appreciate your time.
@nishantpall1747 3 years ago
Use this: train_data, validation_data, test_data = Multi30k(split=('train', 'valid', 'test'), language_pair=('de', 'en'))
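That matches the newer torchtext (0.9+) API, where Multi30k is called directly and yields raw sentence pairs instead of Example objects; tokenization and numericalization then have to be done by hand or in a collate function. A minimal sketch of how it is consumed:

```python
# Sketch assuming torchtext >= 0.9, where datasets are iterable.
from torchtext.datasets import Multi30k

train_data, valid_data, test_data = Multi30k(
    split=("train", "valid", "test"), language_pair=("de", "en")
)
for de_sentence, en_sentence in train_data:  # raw string pairs
    print(de_sentence, "->", en_sentence)
    break
```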
@injeranamitmita 3 years ago
Great intro to ML! What if I am working with a language that does not yet exist in spaCy? Can I create my own custom tokenizer?
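Yes: spaCy is optional, since Field accepts any callable that maps a string to a list of tokens. A minimal whitespace-based sketch (torchtext 0.6-style API; swap the function body for whatever rules suit the language):

```python
from torchtext.data import Field

def my_tokenizer(text):
    # Replace with rule-based or learned tokenization for your language.
    return text.lower().strip().split()

my_lang = Field(tokenize=my_tokenizer, init_token="<sos>", eos_token="<eos>")
```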
@vladsirbu9538 3 years ago
Hi, I can't seem to find the video where you go through the paper. Could you please point me to it? I love the intro btw :D
@rohankavari8612 2 years ago
Hey, your code doesn't work with the new PyTorch and torchtext versions. Could you please port your code?
@henryli85 11 months ago
Use torchtext 0.6.0 and torch 1.13.0; just import torchtext, no need for the .legacy part.
@MuhammadFhadli 3 years ago
When you call the build_vocab method for German and English, how does PyTorch know which language you are trying to build the vocab for? You just pass train_data both times. Can someone explain? Thank you
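The language is determined by the Field object the method is called on, not by the train_data argument: each side of the pair is bound to its own Field when the dataset is built, so build_vocab only reads that Field's column. A sketch of the pattern from the video (hyperparameters illustrative; torchtext 0.6-style API):

```python
from torchtext.data import Field
from torchtext.datasets import Multi30k

german = Field(tokenize=str.split, lower=True,
               init_token="<sos>", eos_token="<eos>")
english = Field(tokenize=str.split, lower=True,
                init_token="<sos>", eos_token="<eos>")

# exts pairs each file extension with the Field that owns that column.
train_data, valid_data, test_data = Multi30k.splits(
    exts=(".de", ".en"), fields=(german, english)
)

# Each call reads only its own Field's column, even though train_data
# is passed whole both times.
german.build_vocab(train_data, max_size=10000, min_freq=2)
english.build_vocab(train_data, max_size=10000, min_freq=2)
```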
@feravladimirovna1044 4 years ago
Will you make tutorials about the AllenNLP library, or suggest sources to learn it?
@AladdinPersson 4 years ago
Not too familiar with AllenNLP so would have to do some research on that
@GodsGreatest 1 year ago
you scared me with that intro 😅
@Aswin255 4 years ago
Holy shit, thank you good sir. You are a lifesaver.
@ankanbasu7381 1 year ago
I can't understand the intuition behind batch_size being index 1 of the shape (sequence_len, batch_size, word_size). The PyTorch docs say LSTM uses this shape unless batch_first=True is set, but it seems confusing to me; (batch_size, sentence_len, word_size) seems more intuitive. Can anyone explain the first shape (when batch_first=False)?
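The time-major default exists because recurrent layers step over the first dimension, so slicing time step t is a contiguous x[t] view; batch_first=True gives the batch-major layout instead. A quick comparison (illustrative sizes):

```python
import torch
import torch.nn as nn

seq_len, batch, features, hidden = 5, 3, 8, 16

time_major = nn.LSTM(features, hidden)      # expects (seq_len, batch, features)
out, _ = time_major(torch.randn(seq_len, batch, features))
print(out.shape)                            # torch.Size([5, 3, 16])

batch_major = nn.LSTM(features, hidden, batch_first=True)
out, _ = batch_major(torch.randn(batch, seq_len, features))
print(out.shape)                            # torch.Size([3, 5, 16])
```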
@riyajatar6859 2 years ago
I am still stuck on the shapes of the target and output tensors. Don't you think both have the same shape, so we don't need to reshape? Because if target has shape (N, T, voc_size), then output will also have the same shape. Correct me if I'm wrong.
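They don't match: in the video the target is (trg_len, N) of gold token indices, while the output is (trg_len, N, vocab_size) of logits, and nn.CrossEntropyLoss expects 2-D logits against 1-D class indices, hence the reshape. A minimal sketch (illustrative sizes; the video additionally slices [1:] to skip the start token):

```python
import torch
import torch.nn as nn

trg_len, N, vocab_size = 10, 4, 100
output = torch.randn(trg_len, N, vocab_size)         # decoder logits
target = torch.randint(0, vocab_size, (trg_len, N))  # gold token indices

criterion = nn.CrossEntropyLoss()
loss = criterion(output.reshape(-1, vocab_size),     # (trg_len*N, vocab_size)
                 target.reshape(-1))                 # (trg_len*N,)
print(loss.item())
```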
@北海苑优质男 3 years ago
You are my mentor! I love you
@____khan 3 years ago
At 0:32 he says he went over the Seq2Seq paper in his last video, but I cannot find that video for some reason. Can anyone link to it please?
@SpiderMan-wk4gk 3 years ago
Good job! I have a question about the dataset: how do I get a dataset onto my computer, and how do I create a dataset for a new language? I want Vietnamese -> English, but I don't have a dataset and I don't know how to create one. Please support!
@nazaninadavoodi3563 1 year ago
Thank you so much. I have a question: why does the decoder not have an activation layer?
@raviraja2691 4 years ago
Hi, great tutorial! Can you explain how you wrote the translate_sentence function? It would be a great help. Thanks
@AladdinPersson 4 years ago
Unfortunately I didn't make an explanation of this, but it's on GitHub, and we're essentially doing what we did in the video, just one time step at a time. I think if you read through the code you will understand it. Here is the code for it: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/utils.py Let me know if you still have questions about this
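The gist of that function, paraphrased here as a sketch rather than the exact repo code: numericalize the source, encode it once, then feed the decoder its own previous prediction starting from <sos> until <eos> or a length cap. This assumes the decoder interface from the video, (token_batch, hidden, cell) -> (logits, hidden, cell), and torchtext 0.6-style Fields:

```python
import torch

def translate_sentence(model, tokens, german, english, device, max_length=50):
    # tokens: list of source-language tokens; german/english: torchtext Fields.
    tokens = ["<sos>"] + [t.lower() for t in tokens] + ["<eos>"]
    indices = [german.vocab.stoi[t] for t in tokens]
    src = torch.LongTensor(indices).unsqueeze(1).to(device)  # (src_len, 1)

    with torch.no_grad():
        hidden, cell = model.encoder(src)

    outputs = [english.vocab.stoi["<sos>"]]
    for _ in range(max_length):
        prev = torch.LongTensor([outputs[-1]]).to(device)    # batch of 1
        with torch.no_grad():
            logits, hidden, cell = model.decoder(prev, hidden, cell)
        best = logits.argmax(1).item()                       # greedy choice
        outputs.append(best)
        if best == english.vocab.stoi["<eos>"]:
            break
    return [english.vocab.itos[i] for i in outputs]
```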
@raviraja2691 4 years ago
@@AladdinPersson Thanks a lot man!
@godse54 3 years ago
Any suggestions on how to make an encoder-decoder network using TensorFlow? Or could you make a video using TensorFlow?
@sanju12121 2 years ago
I am getting an error here: from utils import translate_sentence, bleu, save_checkpoint, load_checkpoint raises ModuleNotFoundError: No module named 'utils'
@prathameshjadhav3041 2 years ago
Check the GitHub repo for the utils module
@computerscience8532 3 years ago
spacy_ger = spacy.load('de') gives the error: module 'de' has no attribute 'load'
@Scientificommnuty 1 year ago
What editor does he use for Python? Is it Visual Studio?
@computergyan6224 1 year ago
What is the minimum number of sentence pairs we need for a decent translation?
@MenTaLLyMenTaL 2 years ago
Will the first output of the model be the <sos> token or not? In the intro you've shown that there is no <sos> in the output sequence, but @39:52 on line 174 you do output[1:] with the intention of skipping the <sos> token, which is contradictory. Shouldn't the loss compare the entire output sequence, i.e. output[:], with target[1:]?
@meseretfetene754 2 years ago
Where should I save my datasets on my drive? Can the program read them wherever they are?
@kirillkonovalov9072 3 years ago
Could you please show how to actually use the model to translate sentences without the function that you imported? It's not quite clear to me. Thank you!
@parthchokhra7298 4 years ago
Hi, I am not able to load the German tokenizer. OSError: [E050] Can't find model 'de'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
@AladdinPersson 4 years ago
Did you run python -m spacy download de and make sure that's in the same environment as where you train your model?
@parthchokhra948 4 years ago
@@AladdinPersson Yup, this worked for me
@simarpreetsingh019 4 years ago
@@parthchokhra948 Hey Parth, I got the same issue, but when I type this command it shows "linking failed". Could you help me with this?
@1chimaruGin0_0 4 years ago
I can't use spacy.load('de'), but spacy_ger = spacy.load('de_core_news_sm') works for me.
@1chimaruGin0_0 4 years ago
(Another option) If you are using Windows: 1. Right-click on Anaconda Navigator 2. Run as administrator 3. Open the Anaconda prompt 4. python -m spacy download de. This worked for me.
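Note that in spaCy 3.x the 'de'/'en' shortcut links were removed entirely, so the full model names must be downloaded and loaded (a sketch; assumes pip- or conda-based installs):

```python
# Once per environment, in a shell:
#   python -m spacy download de_core_news_sm
#   python -m spacy download en_core_web_sm
import spacy

spacy_ger = spacy.load("de_core_news_sm")
spacy_eng = spacy.load("en_core_web_sm")
print([tok.text for tok in spacy_ger.tokenizer("Guten Morgen!")])
```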
@youmeifan4770 3 years ago
Thanks, your tutorial helped me a lot! But I got ImportError: cannot import name 'translate_sentence' from 'utils' (/opt/anaconda3/lib/python3.8/site-packages/utils/__init__.py). Do you have any idea how I can solve this problem?
@duongocyen 3 years ago
You can find the utils file in his GitHub repo; download it and copy it into your project folder.
@harrisonford4103 1 year ago
That's really helpful! Do you have the code for the translate_sentence function?
@4SAnalyticsModelling 7 months ago
This video has been very helpful for me in implementing a seq2seq model for a (slightly different) time series forecasting task!! Thanks so very much!!
@rishabhahuja2506 3 years ago
Hi Aladdin, in your implementation you pass the context vector from the end of the encoder network to the first state of the decoder network; then for each following decoder state you use the decoder's own previous hidden and cell states as its context. In the lecture kzbin.info/www/bejne/j3LKm5mDh56Fla8&ab_channel=MichiganOnline at 4:34, the author uses both the decoder's previous hidden/cell states and the context vector from the encoder as input to the current decoder state. Don't you think using the latter in our implementation would allow the model to give better results?
@shurah100 2 years ago
Hi @Aladdin Persson, I got an error saying "No module named 'torchtext'". How can I fix this error? I'm a beginner with PyTorch.
@weneedlittlepatience 2 years ago
You have to create a virtual environment with conda first and install the packages in there
@nareshmalviya5232 1 year ago
I got an error trying to import Field. Could you help me with it?
@francoisvallee1620 2 years ago
I am getting: ImportError: cannot import name 'translate_sentence' from 'utils'. Could anyone help?
@francoisvallee1620 2 years ago
Problem resolved
@yuvrajkhanna5841 4 years ago
Awesome video man, thanks for explaining everything so well. Just a quick question: you made the forward function of the Seq2Seq model use target values. For training that's fine, but while predicting we won't have them, right? I understand we can basically use a while loop and stop when x == <eos>, but I'm curious how you implemented that: did you write the model again for testing, or do something like "if model.eval(), do this"? I was also wondering if there is a way to write the code so that we don't need to pass the target to the decoder's forward function. If possible, please make a video on the testing part of the model. Once again, great video, one of the best explanations I have seen; it makes me understand not only the concept but also how to implement things. You are doing great work.
@AladdinPersson 4 years ago
You're completely right that we need to modify our approach during evaluation. We do this by changing our loop to predict one time step at a time until we either reach the maximum allowed length or hit an EOS, like you mentioned. If you want to see how I implemented it, the code is here: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/utils.py
@raivenhasan 3 years ago
Do you need to set up a deep learning environment before running this?
@willieman2532 1 year ago
Thank you, this helped me with my homework assignment implementing Seq2Seq!
@dockertutorial778 3 years ago
Thanks for sharing; I found the answer in your video on how to get a single-sentence translation result.
@vansadiakartik 4 years ago
These tutorials are very helpful. Keep up the good work mate.
@AladdinPersson 4 years ago
Appreciate that a lot, thank you!
@alexkonopatski429 3 years ago
Hi, I am from Germany and hell yeah, I love this video!
@vansadiakartik 4 years ago
Hey, here in the Seq2Seq forward we keep outputs[0] as zero and target[0] = english.vocab.stoi["<sos>"]. Shouldn't we instead set outputs[0, :, english.vocab.stoi["<sos>"]] = 1?
@AladdinPersson 4 years ago
We keep outputs[0] as zero because doing it this way makes the indexing convenient (although you can quite easily change this), and then we simply ignore this zero element by doing outputs[1:] when we send it to the cross-entropy loss. It would be helpful if you could refer to the lines on GitHub that you feel are wrong/confusing and propose your alternatives; it's a bit difficult to follow now.
@supervince110 3 years ago
Dude, I learn more from this than from uni.
@snehagodwani2509 3 years ago
Hello, I noticed that in the Seq2Seq class you have given the target as input, but how do we deal with this at validation time, when we don't want to pass target data?
@felixmohr8354 1 year ago
I noticed the same thing, and I guess you are right. One should probably have target = None and then make a case distinction. In fact, I think there is a mistake in this function, because the prediction might actually be *longer* than the target sequence; in that case the rest of the prediction is just ignored, which has implications for the loss computation. The predictions should run at least some number of steps T beyond the point where the target sequence length has been matched, but this would also require the training batch to be padded to some higher number of tokens.
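One way to realize that case distinction, sketched as a drop-in replacement for the Seq2Seq forward from the video. The target_vocab_size and sos_index attributes are assumptions (anything not shown in the video would need to be wired up on the model); everything else follows the interfaces used above:

```python
import random
import torch

def forward(self, source, target=None, teacher_force_ratio=0.5, max_length=50):
    # Train against target when it is given; free-run for max_length otherwise.
    batch_size = source.shape[1]
    trg_len = target.shape[0] if target is not None else max_length

    outputs = torch.zeros(trg_len, batch_size, self.target_vocab_size,
                          device=source.device)  # assumed attribute
    hidden, cell = self.encoder(source)

    x = (target[0] if target is not None else
         torch.full((batch_size,), self.sos_index,  # assumed attribute
                     dtype=torch.long, device=source.device))

    for t in range(1, trg_len):
        output, hidden, cell = self.decoder(x, hidden, cell)
        outputs[t] = output
        best_guess = output.argmax(1)
        teacher = target is not None and random.random() < teacher_force_ratio
        x = target[t] if teacher else best_guess
    return outputs
```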
@tianyiwang7930 4 years ago
It is amazing!! I'm learning NLP and AI, and your videos perfectly solve my problems.
@AladdinPersson 4 years ago
Happy to hear you found them useful, means a lot to me that you take the time to write the kind words:)
@tianyiwang7930 4 years ago
@@AladdinPersson I've watched several more videos, but still need to catch up since I am a new subscriber. I'm still learning, and I have some questions regarding this topic; please don't be mad if I ask a stupid question :-). Seq2Seq works for language translation because it allows inconsistent input/output lengths. However, I didn't quite get how a German sentence of length 12 gets translated to an English sentence of length 16, while during training the translated sentence length also varies. It really confused me; I have experience with LSTMs for text classification, and in that project the output of my LSTM always had the same length. I understand there must be some way, but I couldn't quite get it. Could you please explain more about how this works?
@seanbenhur 3 years ago
@@tianyiwang7930 If I am not mistaken, all the inputs are padded, both sentences!!
@tianyiwang7930 3 years ago
@@seanbenhur That makes sense then, thanks.
@AmitGupta-pm8iw 3 years ago
Hi, first of all thanks a lot for the awesome work you are doing. I implemented and tested your code on other sentences, and the model was not able to translate even a single word correctly.
@andyfeng6 2 years ago
Thanks for your detailed sharing. I have a question: normally the batch size is the first dimension of the input, but in this seq2seq model it's the second dimension. Does anyone know the reason?
@weiyingwang2533 1 year ago
I am confused here as well. If anyone knows the answer, I would appreciate a reply. 😃
@qiguosun129 3 years ago
Thanks for this tutorial. I'm doing a similar project on GNNs combined with an encoder-decoder architecture, and this video helped me a lot.
@khanhaominh6265 3 years ago
Thank you for sharing the video
@yasinugur9805 4 years ago
Great tutorial, but I could not get it to work with my dataset. When I run the for loop over batches, it returns "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn". I don't know why. Can you help me with it by mail?
@AladdinPersson 4 years ago
I am much more active on KZbin, so it's better if you ask on this platform. I'm not sure what could be wrong with the data loading approach; I do have an entire video dedicated to the topic of loading text datasets. Wish you the best of luck with this.
@Virtualexist 1 year ago
I guess a lot has changed; I can't even use the imports correctly now. And the language models en and de also don't load. 🥲
@mahdidehshiri1832 3 years ago
Thanks a lot for making these excellent tutorials, you are the best
@user-or7ji5hv8y 3 years ago
Given that the first input into the decoder is <sos>, would the first prediction be bad and unlikely to result in the word 'Good'?
@AladdinPersson 3 years ago
It was a while ago that I did this; any specific time stamp? The first input shouldn't have a major influence because we're passing through the context vector, so if it does, the model probably just needs more training.
@100sourav100 3 years ago
Great tutorial! Just a question: when predicting/testing the model on German sentences, shouldn't we change the teacher_force_ratio?
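The repo's translate_sentence path sidesteps teacher forcing entirely by decoding step by step, but if you evaluate through the model's forward (e.g., for a validation loss against gold targets), the ratio should be zero so every step consumes the model's own prediction. An illustrative call, assuming the forward signature from the video (model, src_batch, and trg_batch stand in for your trained model and a held-out batch):

```python
import torch

model.eval()
with torch.no_grad():
    # teacher_force_ratio=0: the decoder never peeks at the gold target,
    # which matches what happens at test time.
    outputs = model(src_batch, trg_batch, teacher_force_ratio=0.0)
```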
@phsyco202 4 years ago
Hey, can you please tell me how you deal with words that do not have any vector representation (OOV words) but still exist in your data? Great explanation though!
@AladdinPersson 4 years ago
We loaded the data using torchtext, and it also takes care of OOV tokens. In this case we just used the default, which is "<unk>", but you can also specify it in Field if I remember correctly.
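Concretely, in the 0.6-style API the Field's unk_token (default "<unk>") is what any out-of-vocabulary word is numericalized to, and min_freq in build_vocab controls how many rare words end up OOV. A small sketch:

```python
from torchtext.data import Field

english = Field(tokenize=str.split, lower=True, init_token="<sos>",
                eos_token="<eos>", unk_token="<unk>")  # unk is the default
# After english.build_vocab(train_data, min_freq=2), any word unseen in
# training (or seen fewer than 2 times) maps to english.vocab.stoi["<unk>"].
```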
@mrunalnasery9415 1 year ago
What version of torchtext are we using?
@henryli85 11 months ago
torchtext 0.6.0 and torch 1.13.0; just import torchtext, no need for the .legacy part
@yutaizhou8822 4 years ago
I think standardizing your future videos on einsum and NamedTensor would be great! Your video on einsum was wonderful, and I think NamedTensor is something you'd enjoy too. That, and maybe explore PyTorch Lightning?
@AladdinPersson 4 years ago
So far I haven't used Lightning, but it seems like a great library. I just haven't come across the use cases for it yet; for example, I haven't had models train on a TPU or with 16-bit precision.
@fedjioraymond8039 3 years ago
Thanks very much
@kirillkonovalov9072 3 years ago
Love the tutorial. Thank you!
@seanbenhur 3 years ago
Thanks for making these videos. I have a doubt: is it possible to use this same approach for any other language translation? And how hard is it to deploy this?
@AladdinPersson 3 years ago
Yeah, the approach should work in general; for deployment I'm not sure, I haven't tried it!
@seanbenhur 3 years ago
@@AladdinPersson Thanks... expecting a deployment tutorial from you!!
@user-or7ji5hv8y 4 years ago
By far, this is the best presentation. In a way, I am struggling to keep track of all the shapes; I guess there is no easy way other than to keep track of them. I noticed that the shapes of inp_data and target change with every batch, yet within a batch all samples have the same shape. How is that possible? Are they padded within each batch to have the same length?
@AladdinPersson 4 years ago
Yes, you're correct: we need to pad each batch to the same length, and this comes from how a tensor is constructed; we can't have a variable-length dimension in a tensor.
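This per-batch padding is exactly what BucketIterator handles: it groups sentences of similar length into the same batch and pads each batch only to its own longest sentence, keeping wasted <pad> positions small. A sketch continuing from the Multi30k/Field setup earlier in the thread (torchtext 0.6-style API; batch size illustrative):

```python
import torch
from torchtext.data import BucketIterator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),  # datasets from the earlier setup
    batch_size=64,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),  # bucket similar lengths together
    device=device,
)
```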
@subaruhassufferredenough7892 4 years ago
Thanks for this informative video. Could you explain how you were able to remember the shapes of all the matrices correctly?
@AladdinPersson 4 years ago
I had gone through everything for the video multiple times so just remembered it I guess :)
@thecros1076 4 years ago
Hey, can this be used to make text-to-speech, since text and speech are both sequences? If not, I would love to see a video on how to implement a text-to-speech project using PyTorch.
@AladdinPersson 4 years ago
CROS comin with them great 💡! I have a few videos that I want to do but will definitely get back to this one in the future 👊
@thecros1076 4 years ago
@@AladdinPersson Please do ❤️❤️ I would love to see this one soon ❤️❤️
@stephennfernandes 4 years ago
Great video. I'm a new subscriber and a huge fan of your work; keep up the great work 🔥
@AladdinPersson 4 years ago
Thank you, I appreciate you saying that more than you know! :)
@arunavamaulik19 4 years ago
Very impressive! Thank you for the video, subbed
@AladdinPersson 4 years ago
Thank you:)
@simarpreetsingh019 4 years ago
Is the GPU version of torch necessary for running this? I got stuck at the last function: the program hangs in the epoch loop after the first iteration. I am running it in a Jupyter notebook with the utils file in the same directory, using the CPU version of torch.
@AladdinPersson 4 years ago
No, it's not necessary, but it might take a long time running on the CPU (maybe that's why you think it's stuck?). Where exactly does it get stuck?
@simarpreetsingh019 4 years ago
It got stuck after I got the output for epoch 0/100
@AladdinPersson 4 years ago
@SIMAR PREET SINGH I don't think that means it's stuck; it's probably training and just taking a very long time since you're on the CPU. Try printing loss.item() and setting the batch size to 1: do you get anything printed? If you're not getting an error, it's most likely training.
@simarpreetsingh019 4 years ago
So it didn't work offline; I switched to Google Colab, where it worked fine and I got results. Thanks for the help, and thank you for the video lesson.
@BL0WUPFISH 4 years ago
Hey man, I've noticed that in some implementations people use hidden_size as the embedding dimension instead of a separately set embedding_size. What is the reason for this?
@AladdinPersson 4 years ago
Simplicity, I guess; they aren't really related, at least in my view. Could you provide an example where you've seen this, so I can give better reasoning?
@BL0WUPFISH 4 years ago
@@AladdinPersson I first saw it here, in the official tutorial: pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html If you look at the encoder and decoder, you will see hidden_size used as the dimension of the embedding.
@ashishjohnsonburself 3 years ago
Hi, could you post the link to the last video, which you mentioned at 0:35?
@AladdinPersson 3 years ago
I made a paper review of the first Seq2Seq paper, but in retrospect I don't think it lived up to the standard I want to set, so I unlisted it. I'll add it so it's accessible for members if you still want to take a look.
@AladdinPersson 3 years ago
Let me know if it doesn't work, since KZbin told me it's still in beta; the link is here: kzbin.info/www/bejne/i5qQfGeIitB2a5o
@ashishjohnsonburself 3 years ago
@@AladdinPersson I watched the video. I acknowledge that making a paper review is challenging when you have time constraints, but you did a decent job.
@Robert-fp3rs 4 years ago
Hey man, I'm getting some CUDA errors at runtime and noticed that it consumed my whole GPU memory. May I ask what GPU this was trained on? Thank you.
@AladdinPersson 4 years ago
It was run on a GTX 1060 with 6 GB of VRAM, so nothing extreme is needed. How much VRAM does your GPU have? I suggest decreasing batch_size from 64 to 2/4/8/16/32, the embedding size to 200, the hidden size to 256, and num_layers to 1 if you have to. Let me know if you're able to run it then :)
@Robert-fp3rs 4 years ago
@@AladdinPersson Oh, now I get it. I'm using Google Colab's GPU; the VRAM is 12 GB, I think. I'm trying to train it on my custom dataset, which has 70k rows, but unfortunately I only have 1 GPU and can't load it all, so I have to take a subset of my current training data. Can you suggest any workaround for this issue? Plus, I'm conducting an experiment on using pre-trained embeddings and came across this paper: www.aclweb.org/anthology/N18-2084/ Do you think this will work?
@Robert-fp3rs 4 years ago
@@AladdinPersson Also, what was the loss of your training?
@ahmedkadmiri3528 3 years ago
The source code, please?
@dogkansarac4889 3 years ago
Check his GitHub repo
@Ramm165 4 years ago
Great work, thanks for the video :)
@AladdinPersson 4 years ago
Thank you :)
@Ramm165 4 years ago
@@AladdinPersson Hi Aladdin, in the character-level LSTM you unsqueezed the embedding output before the LSTM layer, but here it is not unsqueezed. May I know why?
@risheshgarg9990 4 years ago
Very helpful content... I was looking for tutorials on attention and transformer networks and came across your work. You've taught me a lot, sir. Keep it up.
@AladdinPersson 4 years ago
Thank you for saying that, appreciate it a lot.