I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained.
Disclaimer: this video assumes that you are familiar with the basics of deep learning, and that you've used HuggingFace Transformers at least once. If that's not the case, I highly recommend this course: cs231n.stanford.edu/ which will teach you the basics of deep learning. To learn HuggingFace, I recommend our free course: huggingface.co/course.
The video explains in detail the difference between input_ids, decoder_input_ids and labels:
- the input_ids are the inputs to the encoder
- the decoder_input_ids are the inputs to the decoder
- the labels are the targets for the decoder.
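The relationship between these three tensors can be sketched in plain Python. This is a hypothetical illustration (the token ids and start-token value are made up, not taken from a real tokenizer): during training, encoder-decoder models like T5 and BART build decoder_input_ids internally by shifting the labels one position to the right and prepending a decoder start token (teacher forcing).

```python
# Hypothetical sketch, not actual HuggingFace internals: how labels and
# decoder_input_ids relate for an encoder-decoder model during training.

DECODER_START_TOKEN_ID = 0  # assumption: a made-up start-token id


def shift_right(labels, start_token_id=DECODER_START_TOKEN_ID):
    """Build decoder_input_ids from labels: shift one position to the
    right and prepend the decoder start token (teacher forcing)."""
    return [start_token_id] + labels[:-1]


# input_ids: the tokenized source sentence, fed to the encoder.
input_ids = [37, 423, 5, 1]  # made-up token ids
# labels: the tokenized target sentence, used as decoder targets.
labels = [262, 81, 9, 1]     # made-up token ids

decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # → [0, 262, 81, 9]

# At inference time there are no labels: decoder_input_ids start as just
# [DECODER_START_TOKEN_ID] and grow one predicted token per step.
```

At each decoder position the model is then trained to predict the token in labels at that same position, which is exactly one step ahead of what it sees in decoder_input_ids.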
Resources:
- Transformer paper: arxiv.org/abs/1706.03762
- Jay Alammar's The Illustrated Transformer blog post: jalammar.github.io/illustrate...
- HuggingFace Transformers: github.com/huggingface/transf...
- Transformers-Tutorials, a repository containing several demos for Transformer-based models: github.com/NielsRogge/Transfo....