Which transformer architecture is best? Encoder-only vs Encoder-decoder vs Decoder-only models

23,867 views

Efficient NLP


Comments: 29
@sp5394 2 months ago
Thank you very much. Great video! Clear, concise and yet covers most of the necessary details.
@sumukhas5418 a year ago
Great video, learned a lot about how these models work. Looking forward to more videos like these 😊
@tukarampriyolkar3608 2 months ago
Awesome explanation!
@groundingtiming a year ago
Great video! Could you make one with more detail focusing on the why?
@chitranair1105 9 months ago
Good explanation. Thanks!
@nudelsuppenzauberer3367 7 months ago
I think you saved my exams, thanks man!
@xflory26x a year ago
It's still not clear what the differences between the three are. How do they differ in the way they process text? And how is an encoder-decoder model different from a decoder-only model, if both of them are autoregressive?
@EfficientNLP a year ago
Indeed, they have a lot in common, and both encoder-decoder and decoder-only models do autoregressive decoding. The main difference is that encoder-decoder models make an architectural distinction between the input and the output: the decoder typically includes a cross-attention mechanism over the encoder's outputs, which is not present in decoder-only models.
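To make that distinction concrete, here is a minimal PyTorch sketch (my own illustration, not code from the video; the dimensions and random tensors are placeholder assumptions, and the causal mask is omitted for brevity):

import torch
import torch.nn as nn

d_model, n_heads = 64, 4

# Decoder-only block: only self-attention over the (prompt + generated) tokens.
self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Encoder-decoder block adds a second, cross-attention step:
# queries come from the decoder states, keys/values from the encoder output.
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

dec_states = torch.randn(1, 5, d_model)   # 5 target-side tokens so far
enc_output = torch.randn(1, 8, d_model)   # 8 encoded source tokens

self_out, _ = self_attn(dec_states, dec_states, dec_states)   # present in both model types
cross_out, _ = cross_attn(self_out, enc_output, enc_output)   # encoder-decoder only
print(self_out.shape, cross_out.shape)    # torch.Size([1, 5, 64]) twice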
@chrisogonas 11 months ago
Well illustrated. Thanks
@Monoglossia a year ago
Very clear, thank you!
@kevon217 a year ago
Great overview!
@WhatsAI a year ago
Great video Bai! :)
@ZivShemesh 11 months ago
Thank you very much, very helpful!
@arabindabhattacharjee9774 10 months ago
One thing I still didn't understand: how does a decoder-only model work when there is no encoder? What ensures that the input sequence is handled in order and doesn't get jumbled up in the output?
@EfficientNLP 10 months ago
In the decoder-only model, the input is provided as a prompt or prefix, which the model uses to generate subsequent tokens. As for how they don't get jumbled up - they use positional encodings to convey information about word order. I have some videos about how positional encodings work if you're interested.
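For anyone who wants to see the order-preserving idea concretely, here is a small sketch of the classic sinusoidal positional encoding (one common choice among several; the sizes are made-up assumptions):

import math
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # Sinusoidal encodings in the style of "Attention Is All You Need".
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)                            # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))  # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

# The decoder-only model sees the prompt as an ordinary prefix; word order is
# conveyed by adding a distinct encoding to each position's token embedding.
token_embeddings = torch.randn(6, 32)                  # 6 prompt tokens, d_model = 32
inputs = token_embeddings + positional_encoding(6, 32)
print(inputs.shape)                                    # torch.Size([6, 32])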
@desrucca 8 months ago
@EfficientNLP I've tried prompting a conversational chatbot with the Python transformers library, but I found that a decoder-only (causal) model is many times slower than a (seq2seq) encoder-decoder model. Why is that?
@prabhakarnimmagadda6599 a year ago
Good bro
@kaustuvray5066 9 months ago
At 3:08, why does the encoder take 4 timesteps? Isn't the encoder supposed to be parallel?
@EfficientNLP 9 months ago
You’re right, transformer encoders process all the input in parallel. However, encoders are not always transformers, and in this case the figure shows an example of the older RNN/LSTM type of encoder.
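Here is a tiny sketch of that difference (arbitrary sizes, purely illustrative, not the figure's code): an LSTM-style encoder updates its hidden state one timestep at a time, while a transformer encoder layer processes all positions in a single pass.

import torch
import torch.nn as nn

seq_len, d_model = 4, 32
inputs = torch.randn(1, seq_len, d_model)

# RNN/LSTM encoder: the hidden state is updated timestep by timestep (4 steps here).
rnn_cell = nn.LSTMCell(d_model, d_model)
h, c = torch.zeros(1, d_model), torch.zeros(1, d_model)
for t in range(seq_len):
    h, c = rnn_cell(inputs[:, t, :], (h, c))

# Transformer encoder layer: all 4 positions attend to each other in one pass.
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoded = encoder_layer(inputs)
print(h.shape, encoded.shape)  # torch.Size([1, 32]) torch.Size([1, 4, 32])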
@MannyBernabe 7 months ago
thx
@Sessrikant 8 months ago
Thanks, but it's not clear. Do you think encoder-only or encoder-decoder models are a thing of the past, given that ChatGPT now takes speech as input, meaning it can process speech-to-text?
@EfficientNLP 8 months ago
Speech-to-text models generally use encoder-decoder architectures; the task cannot be handled by a decoder-only model. I believe ChatGPT uses a separate speech model to transcribe the audio before passing the text to the main text-based model.
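As a hedged sketch of that two-stage setup (the model names are examples from the Hugging Face hub, not a claim about what ChatGPT uses internally, and "question.wav" is a placeholder path):

from transformers import pipeline

# Encoder-decoder speech model: audio in, text out.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
transcript = asr("question.wav")["text"]

# The transcript is then handed to an ordinary text-only, decoder-only model.
chat = pipeline("text-generation", model="gpt2")
reply = chat(f"User said: {transcript}\nAssistant:", max_new_tokens=50)
print(reply[0]["generated_text"])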
@Sessrikant 8 months ago
@EfficientNLP "On decoder-only architecture for speech-to-text and large language model integration" (Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu):
Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller scale randomly initialized speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
arXiv:2205.01086
@MrFromminsk 9 months ago
If decoder-only models can be used for summarization, translation, etc., why do we even need encoders?
@EfficientNLP 9 months ago
For many tasks like summarization, both decoder-only and encoder-decoder architectures are viable. However, encoder-decoder architectures are preferred for certain tasks that are naturally sequence-to-sequence, like machine translation. Furthermore, for tasks involving different modalities, such as speech-to-text, only encoder-decoder models will work; you cannot use a decoder-only model.
@saramoeini4286 4 months ago
Hi, thanks for your video! If my encoder produces a series of tags for each word in the input sentence, and I want to use those tags to generate text that is correct with respect to the input and the encoder's generated tags, how can I use a decoder for this?
@EfficientNLP 4 months ago
I don't know of any model specifically designed for this, but one approach is to use a decoder model, where you can feed the text and tags in as a prompt (you may experiment with different ways of encoding this and see what works best).
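As one possible serialization (an assumption to experiment with, not an established recipe), the words and their tags could be interleaved in the prompt like this:

# Hypothetical tags produced by the encoder for each word.
words = ["The", "cat", "sat"]
tags = ["DET", "NOUN", "VERB"]

tagged = " ".join(f"{w}/{t}" for w, t in zip(words, tags))
prompt = (
    "Rewrite the tagged sentence as fluent text, respecting the tags:\n"
    f"{tagged}\nOutput:"
)
print(prompt)
# Any causal LM would then generate a continuation of this prompt; other
# serializations (JSON, special separator tokens) are worth trying as well.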
@saramoeini4286 4 months ago
@EfficientNLP Thank you.