Listening to Karpathy is always a treat, even more so these days, with the overabundance of online training featuring "teachers" who read from a script and put their audience to sleep, when they don't simply have the script read by some amateur in the subject matter, or even by voice actors. With Karpathy, instead, I feel like a kid who plays soccer at the church but gets to train with Messi!
@Athens1992 Жыл бұрын
What better Friday night than Karpathy explaining transformers, love it!!! Good night from Greece
@stanfordonline Жыл бұрын
Hi George, thanks for watching. We will be releasing more videos from this series soon - stay tuned!
@Athens1992 Жыл бұрын
@@stanfordonline amazing, I love Karpathy's teaching and how easy he makes it all seem
@rajatpatel5691 Жыл бұрын
@@Athens1992 totally agree 💯
@harunyigit8979 ай бұрын
Good night from Turkey too
@sumitsp01 Жыл бұрын
I was not aware that Megatron was attending this lecture to understand Transformers. He did ask some great questions 😄
@SampadMohanty7 Жыл бұрын
This is legendary
@yuktikaura Жыл бұрын
Epic😀😀
@existenceisillusion652810 ай бұрын
Sounded more like DarkSeid
@alonsogarrote88988 ай бұрын
at what min?
@sumitsp018 ай бұрын
@@alonsogarrote8898 every time when someone from audience asks a question.
@ahmedivy Жыл бұрын
Pure Gold Content by a LEGEND Teacher 💖
@everydaybob Жыл бұрын
Guys did Andrew Ng help you with audio for this lecture? It's his trademark usually to use "state of the art" mic (filtered by a pillow)
@Juan-n6k3c2 ай бұрын
Audio Pillow filters are tight!!!
@shauryaseth8859 Жыл бұрын
Andrej is so good that we had Bane sitting in the audience asking questions
@SampadMohanty7 Жыл бұрын
Its Megatron, not Bane
@lukeliem9216 Жыл бұрын
I discovered that the best way to understand this lecture is to study, in parallel, Andrej's "Let's build GPT: from scratch, in code, spelled out" KZbin video. Browsing through that video gave me much better insight into this one. He codes the attention mechanism directly in PyTorch there, and it is fascinating how things just start clicking.😇😀😀
@sapnilpatel1645 Жыл бұрын
True.
@DaveJ6515 Жыл бұрын
"All pieces clicking in place" is exactly the way I was describing the feeling to my students no later than ten minutes ago. You are definitely right.
@rembautimes88082 ай бұрын
Yeah, Andrej's coding view was so important and a valuable help. Will watch it again.
@МихаилЧертушкин-я2с Жыл бұрын
Thank you very much! If possible, please keep posting other lectures from 2023 playlist, this is awesome! 👍
@dr.mikeybee Жыл бұрын
The attention mechanism is a dual-embedding architecture. It looks at the probability of two words being next to each other -- at least it uses something like cosine similarity to compare the tokens in a sentence. That's really the basis. For sequence to sequence translation, we use the fact that language has a definite shape inside a semantic space. Once again, we use something like cosine similarity to find a context signature (vectorized representation) that is closest to the context signature of the sequence in the original language.
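The comment's intuition about similarity can be made concrete. Below is a minimal NumPy sketch of scaled dot-product attention, where the query-key dot product plays the role of the "cosine similarity" the comment mentions; the dimensions and variable names are illustrative, not taken from the lecture:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query is compared against every
    key (a dot product, closely related to cosine similarity), the scores
    are softmaxed, and the values are averaged with those weights."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # weighted average of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))   # 5 tokens, 16-dim embeddings
out = attention(x, x, x)       # self-attention: Q, K, V all derived from x
print(out.shape)               # (5, 16)
```

In a real transformer, Q, K, and V would each first pass through learned projection matrices rather than being the raw embeddings.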
@sh4ny1 Жыл бұрын
@ 1:11:40 The guest is asking about the attention mechanism's communication phase on data that doesn't have consistent edges, where connections change; for example, different molecules could have the same number and types of atoms while the bonds between them differ. This won't work with the vanilla Transformer architecture, where each token attends to itself and all the other tokens, so it is like a fully connected graph. An alternative way to process this data would be to use GNNs with an attention mechanism that respects these edge connectivities. Or, if one really wants to use a transformer for this task, this prior knowledge of graph connectivity would need to be incorporated into the transformer. One recent paper (by Microsoft, I think) that achieved this is "Graphormer". Cheers!
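For what it's worth, the connectivity prior the commenter describes can be sketched by masking the attention scores with an adjacency matrix. This is only an illustrative toy (Graphormer's actual mechanism uses richer structural encodings), and all sizes here are made up:

```python
import numpy as np

def masked_attention(x, adj):
    """Self-attention restricted to a graph: token i may only attend to
    token j where adj[i, j] == 1. The vanilla Transformer is the special
    case where adj is all ones (a fully connected graph)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(adj == 1, scores, -np.inf)         # block non-edges
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))           # 4 atoms, 8-dim features
adj = np.array([[1, 1, 0, 0],         # a chain-like bond structure
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
out = masked_attention(x, adj)
print(out.shape)                      # (4, 8)
```

With this mask, changing an atom that a token is not bonded to leaves that token's output untouched, which is exactly the edge-respecting behavior the question was about.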
@jerryyang7011 Жыл бұрын
What a legend Andrej is - the historical context puts quite a bit of "human touch" on Transformers and AI/ML as a whole.
@dr.mikeybee Жыл бұрын
I always listen when Andrej talks.
@RalphDratman9 ай бұрын
@@dr.mikeybee I love Andrej
@ronus0077 ай бұрын
Fantastic talk. Asked an LLM to get some highlight ideas from the transcription:

Historical Context and Evolution:
1. Prehistoric Era:
• Early models like RNNs and LSTMs were good at encoding history but struggled with long sequences and context.
• Example: Predicting "French" in "I grew up in France. I speak fluent ___" is difficult for these models.
2. 2017: Attention is All You Need:
• The landmark paper by Vaswani et al. introduced the transformer architecture.
• Focused on the self-attention mechanism, which allows models to process data more effectively.
3. 2018-2020: Expansion Beyond NLP:
• Transformers began being used in various fields beyond NLP, such as computer vision, biology, and robotics.
• Google's quote on improved performance with transformers highlights their impact.
4. 2021-2022: Generative Era:
• Introduction of generative models like GPT, DALL-E, and Stable Diffusion.
• Increased capabilities in AI, with models scaling up significantly and being applied to more complex tasks.

Technical Deep Dive:
1. Self-Attention Mechanism:
• The self-attention mechanism allows models to weigh the importance of different parts of the input data.
• It computes the relevance of each word to every other word in a sentence, enabling better context understanding.
2. Multi-Headed Attention:
• Multi-headed attention involves running the attention mechanism in parallel multiple times, each with different weights.
• This allows the model to focus on different aspects of the data simultaneously.
3. Transformer Architecture:
• Consists of encoder and decoder layers that process input and output sequences.
• Each layer has self-attention and feed-forward neural network components.
4. Implementation Details:
• Use of embedding layers to convert input data into vectors.
• Positional encoding to maintain the order of the sequence.
• Application of residual connections and layer normalization to stabilize training.

Applications and Future Directions:
1. Current Applications:
• Computer Vision: Using Vision Transformers (ViTs) to process images by breaking them into patches.
• Speech Recognition: Models like Whisper use transformers to process mel-spectrograms of audio.
• Reinforcement Learning: Decision transformers model sequences of states, actions, and rewards.
• Biology: AlphaFold uses transformers to predict protein structures.
2. Future Potential:
• Video Understanding and Generation: Anticipated advances in models capable of processing and generating video content.
• Long Sequence Modeling: Improving transformers to handle longer sequences more efficiently.
• Domain-Specific Models: Development of specialized models like DoctorGPT or LawyerGPT trained on specific datasets.
• Generalized Agents: Creating models like Gato that can perform multiple tasks and handle various inputs.
3. Challenges and Innovations:
• External Memory: Enhancing models with long-term memory capabilities.
• Computational Complexity: Reducing the quadratic complexity of the attention mechanism.
• Controllability: Improving the ability to control and predict model outputs.
• Alignment with Human Brain: Researching how to align transformer models with human cognitive processes.
@Kirby-Bernard10 ай бұрын
I discover that the best way to understand this lecture is to study in parallel Andrej's "Let's build GPT: from scratch, in code, spelled out" KZbin video. Browsing thru that video give me much better insight into understanding this video. He was directly coding the attention mechanism in PyTorch in that video, and it is fascinating how things just start clicking.
@manifestasisanubari10 ай бұрын
Thanks for the recommendation! ♥
@pictzone10 ай бұрын
how tf do some people just blatantly copy/paste another comment lol
really looking forward to the rest of the videos of 2023!
@susdoge37677 ай бұрын
this is by far the best video on transformer i have seen, kudos
@user-wr4yl7tx3w Жыл бұрын
Audio could be better
@HemangJoshi Жыл бұрын
Definitely
@HemangJoshi Жыл бұрын
Even a $10 mic could give better results than this; they didn't even honor Karpathy enough to get a decent mic 🎤 Can't believe Stanford shot a video like this
@frankyvincent366 Жыл бұрын
Yes, there are AI algorithms to improve sound by suppressing room noise... made using transformers 😅
@miyamotomasao3636 Жыл бұрын
And in English, too !
@recursion. Жыл бұрын
Dude I'm pretty sure they know about this. Be grateful that you're getting access to materials from one of the top schools in America.
@Anonymous-lw1zy11 ай бұрын
FWIW, at 33:00, for the inputs tensor, plus the last character from the targets tensor (so the first quoted section is 47, 58, 1, 51, 59, 57, 58, 1, 40), I get: [["it must b"], [" Get him "], ["come: And"], ["u look'st"]]
@kartikxramesh2 ай бұрын
Andrej is so inspiring man
@wildwind4725 Жыл бұрын
The year is 2023, and we have AI models capable of writing a decent essay. At the same time, the audio quality in online presentations is sometimes worse than that of the Apollo missions.
@nerouchih35297 ай бұрын
28:00 A unique view of attention. In this image, all 6 nodes are related to all 6 nodes in the self-attention case. And in cross-attention it would be like set A sending messages to nodes in set B. And voilà, it's a fully-connected layer! But with tokens passed instead of values
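The "set A sends messages to set B" picture can be sketched in a few lines of NumPy. Projections are omitted for brevity, so this is only the skeleton of cross-attention; all shapes and names are illustrative:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Cross-attention: nodes in set B query nodes in set A,
# so information flows A -> B (as in encoder -> decoder).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 8))   # 6 encoder nodes ("senders")
B = rng.normal(size=(3, 8))   # 3 decoder nodes ("receivers")

weights = softmax(B @ A.T / np.sqrt(8))  # (3, 6): each B node weighs all A nodes
messages = weights @ A                   # each B node's aggregated message
print(messages.shape)                    # (3, 8)
```

Each row of `weights` sums to 1, so every receiver gets a convex combination of the senders, which is exactly the "fully-connected layer, but with tokens passed" view from the comment.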
@firasobeid707 ай бұрын
"The Unreasonable Effectiveness of Recurrent Neural Networks"... from that article I learned about Andrej! It helped me develop my first LM in 2020. Meticulous explanations!
@jcorey333 Жыл бұрын
This was amazing to learn about the historical context of transformers! The audio was a bit low quality, but I'm still glad this was posted
@swagatochatterjee7104 Жыл бұрын
I'm a simple man. I see Andrej. I tap the video
@rachadlakis16 ай бұрын
It's amazing to see how transformers have revolutionized various fields of Deep Learning. Thank you for sharing this valuable information and resource links. It's truly fascinating to learn about the advancements in AI and the impact it's making across different domains.
@leizhang3329 Жыл бұрын
This video is an introduction to transformers in the field of AI, covering their applications in natural language processing, computer vision, reinforcement learning, and more. The instructors discuss the building blocks of transformers, including attention mechanisms and the use of self-attention and multi-headed attention. They also touch on the flexibility and efficiency of transformers compared to RNNs.

Highlights:
• This section is an introduction to the course on Transformers and the instructors. The course is about deep learning models that have revolutionized the field of AI. Transformers have been applied in various fields such as natural language processing, computer vision, reinforcement learning, biology, and robotics. The instructors have research interests in reinforcement learning, computer vision, and NLP, and have publications in robotics and autonomous driving.
• The message-passing scheme in Transformers involves nodes looking at each other, with the decoder only looking at the top nodes. In the cross-attention with the decoder, features from the top of the encoder are consumed. Multi-headed attention is the application of the attention scheme multiple times in parallel. Self-attention refers to each node producing a key, query, and value from itself.
• Token embeddings and positional embeddings are added together, and optional dropout is applied to the set of words and their positions. The input is fed into blocks of transformers, and the output of the transformer is linearly projected to obtain the probability distribution for the next word. The targets, offset by one in time, are used for the cross-entropy loss calculation. The blocks in the transformer model have a communication phase and a compute phase; in the communication phase, nodes in the graph communicate with each other.
• There are decoder-only models like GPT, encoder-only models like BERT, and encoder-decoder models like T5. BERT is trained using a different objective than language modeling, such as sentiment classification. Transformers are trained using masking and denoising techniques. The connectivity in transformers usually does not change dynamically based on the data.
• Transformers are flexible and can easily incorporate additional information by chopping it up and feeding it into the model with the self-attention mechanism. Whisper is a copy-paste transformer that works well with mel-spectrograms. Transformers can be used in RL to model sequences of states, actions, and rewards, and are also used in AlphaFold to model molecules computationally.
• Transformers are more efficient and optimizable than RNNs due to their shallow, wide compute-graph structure, which allows for parallel processing and easy gradient flow. RNNs are inefficient and hard to optimize due to their long, thin compute-graph structure. Transformers can process every word in parallel, unlike RNNs, which process words sequentially. This efficiency allows for larger network sizes, which is crucial in deep learning.
@ashutoshnirala2565Ай бұрын
Damn, the starting audio quality was horrible... if it were not for the comments below, I would have skipped this video. It's an amazing video with so much background. Thank you for this.
@mikeiavelli Жыл бұрын
Andrej starts at 10:16
@ericgonzales505710 ай бұрын
Please make more videos like this; I need to learn more from Andrej about the code, it would help me with my project so much! I love how he explains it, and that guy's question was so dumb! Come on!
@dsazz801 Жыл бұрын
Thank you for sharing such a great quality of lecture!
@footfunk510 Жыл бұрын
Thanks for the video. I look forward to watching the upcoming lectures.
@bpmoran898 ай бұрын
Describing RNNs and LSTMs as prehistoric is wild
@amywang8711 Жыл бұрын
Great, very interesting. Thanks for providing the videos.
@1potdish271 Жыл бұрын
Great lecture by the legend Andrej Karpathy.
@liangcheng9856 Жыл бұрын
sound quality plz.
@yuxingben399 Жыл бұрын
Please release more videos from this series.
@stanfordonline Жыл бұрын
Stay tuned! More videos from this series will be published soon.
@ac12484 Жыл бұрын
Very good, thanks Andrej
@albertocubeddu-ai5 ай бұрын
This is awesome! Thanks for making these lectures public. Minor feedback: the microphone quality is not the greatest.
@AIautopilot Жыл бұрын
This is the funniest moment from the presentation at 🤣1:00:22 . Great video, Andrej is so knowledgeable and down to earth
@ehza Жыл бұрын
Andrej is godsend!
@dr.mikeybee Жыл бұрын
We should also understand the linear operations on weighted representations in the projection matrices. These create a context signature that is easier to compare.
@TwoSetAI Жыл бұрын
Cannot hear any of the questions.
@briancase9527 Жыл бұрын
Wow, I missed this when it was contemporary; glad I found it now at least. Great video with great content! Thanks!
@Bharathkumar-gv4ft Жыл бұрын
Thanks for making this beautiful piece of content available to public!
@stanfordonline Жыл бұрын
Hi Bharath, awesome feedback! Thanks for watching.
@jimshtepa5423 Жыл бұрын
@@stanfordonline do you expect all other lectures to be published on yt?
@stanfordonline Жыл бұрын
@@jimshtepa5423 Hi Jim! We have 3 more lectures that will be published in the coming days and our team is working on making the remaining lectures available.
@Bharathkumar-gv4ft Жыл бұрын
@@stanfordonline That will be great! I am eagerly waiting for the "Neuroscience-Inspired Artificial Intelligence" seminar by Trenton Bricken and Will Dorrell (Mar 7)
@snowman2627 Жыл бұрын
Andrej is the best teacher! The node-graph analogy is quite intuitive.
@stanfordonline Жыл бұрын
Hi Hao, thanks for watching and for your comment!
@1msirius2 ай бұрын
thanks a lot Karpathy sensei!
@anthonyrhopkins16 күн бұрын
In the halls of Stanford, wisdom's light, A course on transformers, shining bright. Not of machines that shift and roam, But of models that bring knowledge home. From text to vision, their reach extends, In every field, their power blends. Instructors with passion, hearts so grand, Guiding us through AI's vast land. In the year of seventeen, a dawn so clear, Attention was born, a breakthrough near. From ancient RNNs to LSTMs, To transformers, the field ascends. Tokens and nodes, keys and queries, In graphs they converse, no need for worries. Self-attention, cross-attention too, In parallel, they process through. From Shakespeare's prose to spectrograms, Transformers learn, they understand. In context, they find their way, Learning from prompts, day by day. A tool of purpose, broad and grand, Optimized for GPU's hand. Efficient, expressive, they transform, In every realm, they become the norm. So here's to wisdom, vast and wide, To transformers, our AI guide. In every heart, let this be known, In every mind, this seed be sown.
@AI_ML_iQ Жыл бұрын
The attention mechanism does not require different matrices for query and key, in both self-attention and cross-attention mechanisms. See the paper by R. V. R. Pandya titled "Generalized Attention Mechanism and Relative Position for Transformer".
@temiwale8811 ай бұрын
I'm, thankfully, not lost. I'm hanging on to these bombs. Thanks Andrej!
@soumilyade1057 Жыл бұрын
Quality of the audio has ruined an otherwise great lecture 😬 see if it can be improved... thank you ❤
@dimitargueorguiev90889 ай бұрын
I am skeptical about the common-sense and logical/causal reasoning capabilities of Transformer-based architectures. The fact that, out of N different scenarios, one can see output which in M < N cases can be explained as adhering to logical/causal reasoning does not mean that Transformer-based architectures induce logical/causal reasoning.
@davidsewell4999 Жыл бұрын
Is it just my audio or is Satan always the one asking questions in the audience?
@neuralthink5 ай бұрын
😂
@rudraxxadb5 ай бұрын
Great content and such a beautiful explanation. Question: at 24:43, when the incoming nodes' information is used, shouldn't that be m.value() instead of m.key()? m.value is what is exposed to others, right?
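For reference, the convention the commenter is alluding to: keys are used only for scoring compatibility, and values are what actually gets aggregated. Here's a toy message-passing sketch of that convention (this is not Andrej's actual lecture code; the 20-dim sizes and all names are made up to mirror the node picture):

```python
import numpy as np

class Node:
    """One token-node in the message-passing view of attention."""
    def __init__(self, rng, dim=20):
        self.x = rng.normal(size=dim)           # this node's features
        self.wk = rng.normal(size=(dim, dim))   # key projection
        self.wq = rng.normal(size=(dim, dim))   # query projection
        self.wv = rng.normal(size=(dim, dim))   # value projection

    def key(self):   return self.wk @ self.x   # "what I contain"
    def query(self): return self.wq @ self.x   # "what I'm looking for"
    def value(self): return self.wv @ self.x   # "what I'll reveal to others"

def gather(node, neighbors):
    """A node pulls information from its neighbors: score queries against
    keys, softmax-normalize, then average the neighbors' *values*."""
    q = node.query()
    scores = np.array([q @ m.key() for m in neighbors])  # keys only score
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return sum(wi * m.value() for wi, m in zip(w, neighbors))  # values flow

rng = np.random.default_rng(0)
nodes = [Node(rng) for _ in range(6)]
out = gather(nodes[0], nodes)
print(out.shape)  # (20,)
```

So in this standard formulation, `m.key()` appears only inside the dot-product scoring, and `m.value()` is what gets summed into the receiving node.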
@TheBontenbal Жыл бұрын
Great lecture as always (except for audio ;-)) . Is there somebody who has a link to Andrej's code? Thank you.
@jaredthecoder Жыл бұрын
Audio is a little clearer if you put it on .75
@TheNewton8 ай бұрын
19:47 so is there a functional difference between calling the usage of softmax `attention` instead of the simpler word `search` beyond trying to be catchy?
@tzz27 Жыл бұрын
Always enjoy AI lectures from Stanford.❤
@yuktikaura Жыл бұрын
Great lecture
@IAmScottCarlson Жыл бұрын
It was really really hard to listen to this one due to the Audio Quality, please resolve for any future presentations.
@peteluo5367 Жыл бұрын
Thanks for sharing. This is really useful for me.
@wolpumba40999 ай бұрын
*ELI5 Abstract*

*Imagine transformers as super-smart LEGO blocks:*
* *They learn by paying attention:* Transformers figure out what's important in a bunch of information, just like you focus on the right LEGO piece to build something cool.
* *They talk to each other:* Transformers share info, like when you ask a friend to pass a LEGO brick.
* *They can be built in many ways:* You can make different things with LEGOs, and transformers can learn to do different stuff too! They can understand words, make pictures, and even play games.
* *They get better with practice:* The more you build with LEGOs, the better you get. Transformers get smarter the more they learn from examples, like getting better at building a castle after making a few towers first.
* *They need a little help sometimes:* Sometimes you need instructions for a fancy LEGO build. Transformers can also use hints to learn faster, especially when they don't have lots of examples.
* *They like to remember things:* Transformers have a scratchpad, just like you use a notebook to remember steps, so they don't forget important stuff.

*Transformers are changing the world:* They're like the new building blocks for computers, making them understand us and do much cooler things!

*Abstract*

This video explores the remarkable transformer architecture, a foundational building block in modern AI. Transformers were introduced in the 2017 paper "Attention is All You Need" and have revolutionized fields like natural language processing (NLP), computer vision, and reinforcement learning. The video delves into several key aspects of transformers:
* *Core Concepts:* Attention mechanisms, message passing on directed graphs, and the interplay between communication and computation phases within a transformer block.
* *Implementation:* A detailed walkthrough of a minimal transformer implementation (NanoGPT) highlights data preparation, batching, positional encodings, and the essential components of transformer blocks.
* *Transformers Across Domains:* The ease with which transformers adapt to diverse modalities (images, speech, reinforcement learning) underscores their flexibility.
* *Meta-Learning Capabilities:* Transformers exhibit in-context learning or meta-learning capabilities, highlighted by the GPT-3 model. This suggests potential for gradient-like learning within transformer activations.
* *Optimizability and Efficiency:* Transformers are designed to be highly optimizable by gradient descent and computationally efficient on GPUs, key factors in their widespread adoption.
* *Inductive Biases and Memory:* While inherently general, transformers can incorporate inductive biases and expand memory via techniques like scratchpads, demonstrating adaptability.

The video also includes discussions on the historical context of transformers, their relationship to neural networks, and potential future directions in AI.

*Keywords:* Transformers, Attention, Deep Learning, NLP, Computer Vision

See also: kzbin.info/www/bejne/oXTGaXmjesdkpLs
@wolpumba40999 ай бұрын
*Summary*

*Introduction to Transformers*
* *0:05* - Welcome and course overview: Introduction to a course focused on transformers in artificial intelligence (AI).
* *0:52* - Instructors introduce themselves: The course instructors share their backgrounds.

*Foundations of Transformers*
* *3:24* - Introduction to transformers: The basics of transformer architecture are explained.
* *3:38* - Explanation of the attention timeline: Discussion of how attention mechanisms developed over time.

*Understanding and Implementing Transformers*
* *3:51* - Transformer Evolution: Progression from RNNs, LSTMs, and simple attention to the dominance of transformers in NLP, vision, biology, robotics, and generative models.
* *10:18* - Andrej Karpathy presents on transformers: Karpathy provides historical context on why transformers are important and their evolution from pre-deep-learning approaches.
* *15:15* - Origins of the Transformer: Exploration of foundational papers on neural machine translation and the introduction of attention to solve the "encoder bottleneck" problem.
* *20:13* - Attention is All You Need: Discussion of the landmark 2017 paper, its innovations, and core concepts behind the transformer (attention, positional encoding, residual networks, layer normalization, multi-headed attention).
* *22:36* - The speaker's view on attention: A unique perspective on attention as a communication phase intertwined with computation.
* *25:13* - Attention as message passing: Explanation of attention as nodes in a graph communicating with "key", "query", and "value" vectors. Python code illustrates the process.
* *30:58* - NanoGPT: Transformer implementation: Introduction of NanoGPT, a minimal transformer the speaker created to reproduce GPT-2, followed by in-depth explanations of its components, data preparation, batching, and block structure.

*Transformers: Applications and Future Directions*
* *52:56* - Transformers Across Domains: How transformers are adapted for images, speech recognition, reinforcement learning, and even biology (AlphaFold).
* *54:26* - Flexibility with Multiple Inputs: The ease of incorporating diverse information into transformers.
* *55:43* - What Makes Transformers Special?: Highlighting in-context learning (meta-learning), potential for gradient-like learning within activations, and the speaker's insights shared via tweets.
* *58:27* - The Essence of Transformers: Three key properties: expressiveness, optimizability, and efficiency on GPUs.
* *59:51* - Transformers as General-Purpose Computers Over Text: Analogy comparing powerful transformers to computers executing natural-language programs.
* *1:06:28* - Inductive Biases in Transformers: The balance between data and manual knowledge encoding, and how to modify transformer encodings.
* *1:08:42* - Expanding Transformer Memory: The "scratchpad" concept for extending memory.

*Questions and Answers*
* *27:30* - Q&A: Self-Attention vs. Multi-headed Attention: Explaining the differences and purposes.
* *46:12* - Q&A: Dynamic Connectivity in Transformers: Discussion on graph connectivity in transformers.
* *50:20* - Q&A: Future Directions: Exploring beyond autoregressive models and the relation to graph neural networks.
* *1:02:01* - Q&A: RNNs vs. Transformers: Contrasting the limitations of RNNs and the strengths of transformers.
* *1:04:21* - Q&A: Multimodal Inputs: How transformers handle diverse data types.
* *1:10:09* - Q&A: ChatGPT: The speaker's limited exploration of ChatGPT.
* *1:10:41* - Q&A: S4 Architecture and Speaker's Next Steps: Focus on NanoGPT for GPT-like models and interest in building a "Google++" inspired by ChatGPT.

Disclaimer: I used Gemini Advanced 1.0 (2024.03.03) to summarize the video transcript. This method may make mistakes in recognizing words and it can't distinguish between speakers.
@23232323rdurian Жыл бұрын
the AUDIO is real choppy.....hard to make out the words spoken...but great lecture
@amoghjain11 ай бұрын
Hello!! Thank you for sharing the talk!! is it possible to share the slides as well?? Thanks
@jbperez808 Жыл бұрын
@4:09 "performance increased every time we fired our linguists..." if you listen closely. The auto-transcript caught more of it than the human one.
@vimukthirandika87210 ай бұрын
awesome!
@1ntrcnnctr608 Жыл бұрын
When will auto "mastering"/EQ of audio be integrated here on YT?
@1ntrcnnctr608 Жыл бұрын
@@hyperadapted yup, yearning for quality these days
@1ntrcnnctr608 Жыл бұрын
@@hyperadapted "everyone will have a better learning experience" - 👑
@sansin-dev Жыл бұрын
It's a pity the audio is so bad
@gregx824510 ай бұрын
Div Garg's audio is so horrible, I'm moving on to other videos at the 1 minute 30 second mark. You guys have a lot to learn about video production. (Have you heard of microphones?)
@beofonemind Жыл бұрын
I'm putting this bad boy in my watch later..... with pen and paper and focus...... but the fact that this talk is available is amazing. Thanks Andrej. Thanks Stanford.
@jamesdelancey97527 ай бұрын
it's been 11 months. did you "watch later"?
@reechr Жыл бұрын
Why release this 6 months later????
@DaTruAndi Жыл бұрын
Great content, audio quality makes it a bit more challenging to listen to and speakers maybe could try to speak a bit slower and more clearly to make it more accessible to international audiences. Slowing down to 0.75 and turning on subtitles helps a bit. Maybe transcribing with Whisper additionally could be an option.
@yuluqin6463 Жыл бұрын
exactly, 0.75 works better for me
@linlinpan31505 ай бұрын
Got one of the greatest technologists of our time, and can't find a microphone from after the year 2000
@AAAJJJ-z3v Жыл бұрын
Is it feasible to get access to the code samples that Andrej is talking about?
@НиколайНовичков-е1э Жыл бұрын
Great seminar!
@harrylee27 Жыл бұрын
The audience members who asked questions sound like real Transformers, 46:35
@lukeliem9216 Жыл бұрын
One piece of feedback for the Stanford team: improve the microphone system for webcasting. The questions posed by people in the classroom are muffled because of noise cancellation (turned on by default), and it really degraded the quality of this seminar. I look forward to a redo of this Transformer seminar, since it is the foundation of generative AI. So, in a nutshell: a better microphone setup, and a better explanation of the transformer from Andrej. His 6-node graph complicated rather than clarified his explanation.
@SuperZardo Жыл бұрын
You believe they used mics, I think they just spoke into some kind of toilet bowl
@niclored11 ай бұрын
If you don't work on the quality of the audio, everything you did for this presentation is kinda ruined. Please try a better mic, since this is the Stanford account and this is fairly recent. Audio should not be an issue, and in this video it is.
@gdymind702111 ай бұрын
Thank you for the great presentation!
@lukeliem9216 Жыл бұрын
I think Andrej is still in the process of percolating his understanding of transformers, so the lecture is not as cohesive as his CS231n lectures on CNNs. I look forward to his 2nd or 3rd try on this subject matter. His presentation at Microsoft Build is simpler to comprehend, though it is less technical and implementation-focused than this lecture.
@christofferweber9432 Жыл бұрын
Sad that a great lecture is cut short by questions that could have been taken offline...
@zpengh5 ай бұрын
22:37 Andrej's interpretation of transformer architecture.
@alielouafiq2552 Жыл бұрын
OMG ! just noticed this was released today !
@bryanduong-p7f Жыл бұрын
Great tutorial 🎉
@sahreenhaider990611 ай бұрын
What questions did Megatron ask? I mean the audio was pretty bad
@KarenLasser-n7i Жыл бұрын
What was the sentence he said before, ‘I have to be very careful?’
@iansnow4698 Жыл бұрын
Hi Andrej, It's a great historic view of attention that you showed there; especially the email is a golden discovery in my eyes. All I could find before was as deep as Yoshua's papers. I have a question I hope you or someone else could answer here. Is there any connection between the Key-Query-Value mechanism in the later paper and the weighted-average-of-BiRNN idea in the email? Or maybe that was simply a new idea in the Attention Is All You Need paper? Best regards, Ian
@NanXiao4 ай бұрын
I think it's better to check Andrej Karpathy's full videos on transformers and neural nets: - kzbin.info/www/bejne/oXTGaXmjesdkpLssi=kpQy5jGrqvT9Ymsz - kzbin.info/www/bejne/jH7NXmaJZtmeq5Isi=DgOju3xZbKQmqzlk
@harshitkumar5147 Жыл бұрын
Where do I get the slides?
@user-0j27M_JSs25 күн бұрын
How is it possible that the lecturers are so hard to hear? You guys work on the cutting edge, but you can't fix the basics?
@anmolagarwal999 Жыл бұрын
Andrej starts at 10:12
@vamshi3676 Жыл бұрын
He mentioned that multi-head is attention in parallel, but from another video I understood that a big attention layer is chopped into pieces so that they can be processed in parallel. Am I wrong, or did he miss that point? Please someone clarify 🙏🙏
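For what it's worth, the two descriptions agree: running several small attention heads in parallel is the same computation as chopping one big d_model-wide projection into pieces and concatenating the results. A hedged NumPy sketch (all shapes and names are illustrative):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head(x, Wq, Wk, Wv, n_heads):
    """One big (d x d) projection is reshaped into n_heads smaller heads
    that attend in parallel; the head outputs are concatenated back."""
    T, d = x.shape
    hd = d // n_heads                                 # per-head size
    q, k, v = x @ Wq, x @ Wk, x @ Wv                  # (T, d) each
    # chop each projection into heads: (n_heads, T, hd)
    split = lambda a: a.reshape(T, n_heads, hd).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))  # per-head scores
    out = att @ v                                     # (n_heads, T, hd)
    return out.transpose(1, 0, 2).reshape(T, d)       # concatenate heads

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(4, d))                           # 4 tokens
W = lambda: rng.normal(size=(d, d))
out = multi_head(x, W(), W(), W(), n_heads=4)
print(out.shape)  # (4, 16)
```

So "parallel heads" and "one chopped-up layer" are two views of the same thing; in practice each head also gets its own slice of the projection weights, exactly as the reshape above implies.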
@ankurkumarsrivastava69589 ай бұрын
Can we get the slides?
@raminanushiravani9524 Жыл бұрын
anyone knows where to find the slides?
@laurentprat8219 Жыл бұрын
hello, what is the 20 for in the Node class? Is it the size of the embedding vector (only 20 tokens)? (code shown at 25:30)
@faiqkhan7545 Жыл бұрын
A 20-by-20 matrix, initialized randomly, that will be trained via backpropagation during training.
@exbibyte Жыл бұрын
excellent
@laalbujhakkar8 ай бұрын
Really disappointing audio. It ruins the lecture.
@KarenLasser-n7i Жыл бұрын
Is the guy asking questions using a voice encoder, or does he have a voice that deep cuz he’s 12 feet tall?
@user-xn8dp5zy8t Жыл бұрын
Really bad audio quality, please ensure the speakers have better microphones next time
@elhamelham592 Жыл бұрын
Audio is not good, I can't understand a word without captions
@affrokilla Жыл бұрын
Great lecture, thanks for uploading!
@wizche Жыл бұрын
Great content! I would just suggest to invest in better microphones for a more pleasant listening experience