What are Transformer Models and How do they Work?

17,963 views

Cohere

1 day ago

This video is part of LLM University
docs.cohere.com/docs/transfor...
Transformers are a new development in machine learning that have been making a lot of noise lately. They are incredibly good at keeping track of context, and this is why the text that they write makes sense. In this blog post, we will go over their architecture and how they work.
Bio:
Luis Serrano is the lead of developer relations at Co:here. Previously he was a research scientist and an educator in machine learning and quantum computing. Luis did his PhD in mathematics at the University of Michigan before moving to Silicon Valley to work at companies such as Google and Apple. Luis is the author of the Amazon best-seller "Grokking Machine Learning", where he explains machine learning in a clear and concise way, and he is the creator of the educational YouTube channel "Serrano.Academy", with over 100K subscribers and 5M views.
===
Resources:
Blog post: txt.cohere.com/what-is-semant...
Learn more: / luisserrano
Neural Networks: • A friendly introductio...
Attention Models: • What is Attention in L...

Comments: 18
@LiveNobin
@LiveNobin 10 months ago
I tried several times on the internet to understand the very basics of what Transformer models are and how they work, but never found an explanation as good as this. Thank you for this very meaningful information.
@jpvalois
@jpvalois 4 months ago
Thanks Luis! Your explanation hits just the right notes for me: no fluff, not too complex, well structured, logical, good rhythm. Excellent overall. I'll be checking out your other material. Merci beaucoup!
@khanshovon
@khanshovon 11 days ago
Explained well and simply.
@Vitanzi
@Vitanzi 1 year ago
Great Explanation, thanks...
@junpyohong2132
@junpyohong2132 1 year ago
Thanks! It's a really clean and straightforward explanation.
@alagoumiri
@alagoumiri 1 month ago
🎯 Key Takeaways for quick navigation:

00:00 *🤖 Introduction to Transformer models*
- Transformer models are key to recent advancements in NLP tasks like text generation and semantic search
- They can capture context better than previous models, which was a major challenge

01:02 *🧠 How previous neural network models worked*
- Input words were represented as 0/1 vectors
- The neural network tried to mimic patterns from data to predict the next word
- But it lacked understanding of overall context beyond a few words

02:13 *🌐 Transformer models capture context*
- Unlike earlier neural nets, transformers can understand and generate text with coherent context
- They build up the text output one word at a time based on the context

03:22 *🛠️ Architecture of Transformer models*
- Has components like embeddings, positional encoding, attention, feed-forward layers
- Attention is the key mechanism that allows capturing context

06:08 *⚖️ How attention works*
- Allows focusing on relevant words for context via "gravitational" pull
- Multi-headed attention uses multiple representations for richer context

09:51 *🎲 Softmax for probabilistic output*
- Converts scores to probabilities to get varied outputs
- Allows sampling different word choices instead of the same answer every time

10:59 *👨‍💻 Post-training for specific tasks*
- General pre-training data is not enough for targeted use cases
- Requires further training on curated question-answer and conversational data
- Allows specializing the transformer for tasks like open-domain QA, chatbots etc.

Made with HARPA AI
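The softmax step at 09:51 can be sketched in a few lines of Python. This is a minimal illustration, not Cohere's implementation; the vocabulary and scores below are made up:

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; temperature controls how spread out they are."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy next-word scores for a made-up three-word vocabulary
vocab = ["cat", "dog", "fish"]
probs = softmax([2.0, 1.0, 0.1])

# Sampling from the probabilities (instead of always taking the argmax)
# is what lets the model produce varied outputs across runs
rng = np.random.default_rng()
next_word = rng.choice(vocab, p=probs)
```

Raising the temperature flattens the distribution (more varied output); lowering it sharpens the distribution toward the top-scoring word.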
@paragjain12
@paragjain12 8 months ago
Indeed a very clear explanation... that's how we need to teach convoluted concepts.
@sithembisodyubele1156
@sithembisodyubele1156 1 year ago
You are the best....great explanation
@user-eg8mt4im1i
@user-eg8mt4im1i 6 months ago
Super clear thanks !!😊
@mikecane
@mikecane 11 months ago
Thanks very much!
@sunnykumar1425
@sunnykumar1425 5 days ago
awesome
@rollingstone1784
@rollingstone1784 2 months ago
@cohere, @jpvalois: Excellent video; however, there are some inaccuracies:
- 6:00: instead of "series of transformer blocks", should it be "series of transformers" only (or: attention block and feed-forward block)? The description here says "three transformer layers"; it also says "attention blocks".
- 09:00: the attention and feed-forward blocks should run left to right, with the arrows also pointing left to right; this reflects the flow of the data better.
- 09:20: should it be "feed-forward layer" instead of only "layer"?
- 09:40: is the first layer not an attention layer? And could an attention layer and a feed-forward layer be combined into a transformer layer? (see "transformer blocks" at 06:00)
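The "attention layer followed by feed-forward layer" pattern the comment is asking about can be sketched as a toy NumPy forward pass. This is a simplification with made-up weights: real blocks also have learned query/key/value projections and layer normalization, which are omitted here:

```python
import numpy as np

def self_attention(X):
    # Toy self-attention: scores come from dot products of the inputs themselves
    # (a real block uses learned query/key/value projections)
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X

def feed_forward(X, W1, W2):
    # Two-layer MLP with a ReLU, applied independently at every position
    return np.maximum(0.0, X @ W1) @ W2

def transformer_block(X, W1, W2):
    # One block = attention layer, then feed-forward layer,
    # each with a residual (skip) connection
    X = X + self_attention(X)
    X = X + feed_forward(X, W1, W2)
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))           # 4 tokens, embedding size 8
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 8))
out = transformer_block(X, W1, W2)    # output has the same shape as the input
```

A "series of transformer blocks" is then just this function applied repeatedly, each block's output feeding the next block's input.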
@sithembisodyubele1156
@sithembisodyubele1156 1 year ago
How can I enrol in your courses? I need more of this, especially autoencoders.
@scchouhansanjay
@scchouhansanjay 1 year ago
This explanation covers only the transformer's encoder part, which is used in BERT. But the GPT models use only the decoder. Please let me know if I am wrong?
@bastabey2652
@bastabey2652 1 year ago
Bing ChatGPT agrees with your statement:

That is correct. BERT is a pre-trained transformer model that only uses the encoder part of the transformer architecture. BERT is designed for natural language understanding tasks, such as question answering, sentiment analysis, and named entity recognition. BERT can process both left and right context of a given word, and can handle both single-sentence and sentence-pair inputs.

GPT is another pre-trained transformer model that only uses the decoder part of the transformer architecture. GPT is designed for natural language generation tasks, such as text summarization, text completion, and text generation. GPT can only process the left context of a given word, and can only handle single-sentence inputs.
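The encoder/decoder difference described above largely comes down to the attention mask. A minimal sketch (with hypothetical uniform scores; this is not BERT's or GPT's actual code):

```python
import numpy as np

def attention_weights(scores, causal=False):
    # Encoder-style (BERT) attention sees both left and right context;
    # decoder-style (GPT) attention masks out future positions
    s = np.asarray(scores, dtype=float).copy()
    if causal:
        n = s.shape[0]
        s[np.triu_indices(n, k=1)] = -np.inf   # hide the right context
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.ones((3, 3))                         # made-up attention scores
full = attention_weights(scores)                 # every row attends everywhere
masked = attention_weights(scores, causal=True)  # row i attends only to positions 0..i
```

With the causal mask, the first token can only attend to itself, which is why GPT-style models can generate text left to right one word at a time.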
@user-er2uw8eu8m
@user-er2uw8eu8m 9 months ago
Is this post-training called prompt engineering or fine-tuning?
@mobime6682
@mobime6682 4 months ago
Described seemed more about local content than conversation threads.
@EnricoGolfettoMasella
@EnricoGolfettoMasella 7 months ago
I watched a video about transformers just before yours that literally made me dizzy 😝. After watching your video I no longer think this is UFO technology (not as much as before, let's say).