What are Transformer Models and How do they Work?

17,963 views

Cohere

1 day ago

This video is part of LLM University
docs.cohere.com/docs/transfor...
Transformers are a new development in machine learning that have been making a lot of noise lately. They are incredibly good at keeping track of context, and this is why the text that they write makes sense. In this blog post, we will go over their architecture and how they work.
Bio:
Luis Serrano is the lead of developer relations at Co:here. Previously he was a research scientist and an educator in machine learning and quantum computing. Luis did his PhD in mathematics at the University of Michigan before moving to Silicon Valley to work at companies such as Google and Apple. Luis is the author of the Amazon best-seller "Grokking Machine Learning", where he explains machine learning in a clear and concise way, and he is the creator of the educational YouTube channel "Serrano.Academy", with over 100K subscribers and 5M views.
===
Resources:
Blog post: txt.cohere.com/what-is-semant...
Learn more: / luisserrano
Neural Networks: • A friendly introductio...
Attention Models: • What is Attention in L...

Comments: 18
@LiveNobin
@LiveNobin 10 months ago
I tried several times on the internet to understand the very basics of what Transformer models are and how they work, but never found an explanation as good as this. Thank you for this very meaningful information.
@jpvalois
@jpvalois 4 months ago
Thanks Luis! Your explanation hits just the right notes for me: no fluff, not too complex, well structured, logical, good rhythm. Excellent overall. I'll be checking out your other material. Merci beaucoup!
@khanshovon
@khanshovon 11 days ago
Explained well and simply.
@Vitanzi
@Vitanzi 1 year ago
Great Explanation, thanks...
@junpyohong2132
@junpyohong2132 1 year ago
Thanks! It's a really clean and straightforward explanation.
@alagoumiri
@alagoumiri 1 month ago
🎯 Key Takeaways for quick navigation:

00:00 *🤖 Introduction to Transformer models*
- Transformer models are key to recent advancements in NLP tasks like text generation and semantic search
- They can capture context better than previous models, which was a major challenge

01:02 *🧠 How previous neural network models worked*
- Input words were represented as 0/1 vectors
- The neural network tried to mimic patterns from data to predict the next word
- But it lacked understanding of overall context beyond a few words

02:13 *🌐 Transformer models capture context*
- Unlike earlier neural nets, transformers can understand and generate text with coherent context
- They build up the text output one word at a time based on the context

03:22 *🛠️ Architecture of Transformer models*
- Has components like embeddings, positional encoding, attention, feed-forward layers
- Attention is the key mechanism that allows capturing context

06:08 *⚖️ How attention works*
- Allows focusing on relevant words for context via "gravitational" pull
- Multi-headed attention uses multiple representations for richer context

09:51 *🎲 Softmax for probabilistic output*
- Converts scores to probabilities to get varied outputs
- Allows sampling different word choices instead of the same answer every time

10:59 *👨‍💻 Post-training for specific tasks*
- General pre-training data is not enough for targeted use cases
- Requires further training on curated question-answer and conversational data
- Allows specializing the transformer for tasks like open-domain QA, chatbots etc.

Made with HARPA AI
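The softmax step at 09:51 can be sketched in a few lines of Python. This is a minimal illustration, not Cohere's implementation; the vocabulary and scores below are made up:

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; temperature controls how spread out they are."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy next-word scores for a made-up three-word vocabulary
vocab = ["cat", "dog", "fish"]
probs = softmax([2.0, 1.0, 0.1])

# Sampling from the probabilities (instead of always taking the argmax)
# is what lets the model produce varied outputs across runs
rng = np.random.default_rng()
next_word = rng.choice(vocab, p=probs)
```

Raising the temperature flattens the distribution (more varied output); lowering it sharpens the distribution toward the top-scoring word.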
@paragjain12
@paragjain12 8 months ago
Indeed a very clear explanation... that's how we need to teach convoluted concepts.
@sithembisodyubele1156
@sithembisodyubele1156 1 year ago
You are the best....great explanation
@user-eg8mt4im1i
@user-eg8mt4im1i 6 months ago
Super clear thanks !!😊
@mikecane
@mikecane 11 months ago
Thanks very much!
@sunnykumar1425
@sunnykumar1425 5 days ago
awesome
@rollingstone1784
@rollingstone1784 2 months ago
@cohere, @jpvalois: Excellent video; however, there are some inaccuracies:
- 6:00: instead of "series of transformer blocks", should it be "series of transformers" only (or: attention block and feed-forward block)? The description here says "three transformer layers"; it also says "attention blocks".
- 09:00: the attention and feed-forward blocks should run left to right, with the arrows also pointing left to right; this reflects the flow of the data better.
- 09:20: should it be "feed-forward layer" instead of only "layer"?
- 09:40: is the first layer not an attention layer? And could an attention layer and a feed-forward layer be combined into a transformer layer? (see "transformer blocks" at 06:00)
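The "attention layer followed by feed-forward layer" pattern the comment is asking about can be sketched as a toy NumPy forward pass. This is a simplification with made-up weights: real blocks also have learned query/key/value projections and layer normalization, which are omitted here:

```python
import numpy as np

def self_attention(X):
    # Toy self-attention: scores come from dot products of the inputs themselves
    # (a real block uses learned query/key/value projections)
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X

def feed_forward(X, W1, W2):
    # Two-layer MLP with a ReLU, applied independently at every position
    return np.maximum(0.0, X @ W1) @ W2

def transformer_block(X, W1, W2):
    # One block = attention layer, then feed-forward layer,
    # each with a residual (skip) connection
    X = X + self_attention(X)
    X = X + feed_forward(X, W1, W2)
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))           # 4 tokens, embedding size 8
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 8))
out = transformer_block(X, W1, W2)    # output has the same shape as the input
```

A "series of transformer blocks" is then just this function applied repeatedly, each block's output feeding the next block's input.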
@sithembisodyubele1156
@sithembisodyubele1156 1 year ago
How can I enrol in your courses? I need more of this, especially autoencoders.
@scchouhansanjay
@scchouhansanjay 1 year ago
This explanation covers only the transformer's encoder part, which is used in BERT. But the GPT models use only the decoder. Please let me know if I am wrong?
@bastabey2652
@bastabey2652 1 year ago
Bing ChatGPT agrees with your statement:

That is correct. BERT is a pre-trained transformer model that only uses the encoder part of the transformer architecture. BERT is designed for natural language understanding tasks, such as question answering, sentiment analysis, and named entity recognition. BERT can process both left and right context of a given word, and can handle both single-sentence and sentence-pair inputs.

GPT is another pre-trained transformer model that only uses the decoder part of the transformer architecture. GPT is designed for natural language generation tasks, such as text summarization, text completion, and text generation. GPT can only process the left context of a given word, and can only handle single-sentence inputs.
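The encoder/decoder difference described above largely comes down to the attention mask. A minimal sketch (with hypothetical uniform scores; this is not BERT's or GPT's actual code):

```python
import numpy as np

def attention_weights(scores, causal=False):
    # Encoder-style (BERT) attention sees both left and right context;
    # decoder-style (GPT) attention masks out future positions
    s = np.asarray(scores, dtype=float).copy()
    if causal:
        n = s.shape[0]
        s[np.triu_indices(n, k=1)] = -np.inf   # hide the right context
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.ones((3, 3))                         # made-up attention scores
full = attention_weights(scores)                 # every row attends everywhere
masked = attention_weights(scores, causal=True)  # row i attends only to positions 0..i
```

With the causal mask, the first token can only attend to itself, which is why GPT-style models can generate text left to right one word at a time.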
@user-er2uw8eu8m
@user-er2uw8eu8m 9 months ago
Is this post-training called prompt engineering or fine-tuning?
@mobime6682
@mobime6682 4 months ago
Described seemed more about local content than conversation threads.
@EnricoGolfettoMasella
@EnricoGolfettoMasella 7 months ago
I watched a video about transformers just before yours that literally made me dizzy 😝. After watching your video I no longer think this is UFO technology (not as much as before, let's say).