Transformers: The best idea in AI | Andrej Karpathy and Lex Fridman

362,104 views

Lex Clips
1 year ago

Lex Fridman Podcast full episode: • Andrej Karpathy: Tesla...
Please support this podcast by checking out our sponsors:
- Eight Sleep: www.eightsleep.com/lex to get special savings
- BetterHelp: betterhelp.com/lex to get 10% off
- Fundrise: fundrise.com/lex
- Athletic Greens: athleticgreens.com/lex to get 1 month of fish oil
GUEST BIO:
Andrej Karpathy is a legendary AI researcher, engineer, and educator. He's the former director of AI at Tesla, a founding member of OpenAI, and an educator at Stanford.
PODCAST INFO:
Podcast website: lexfridman.com/podcast
Apple Podcasts: apple.co/2lwqZIr
Spotify: spoti.fi/2nEwCF8
RSS: lexfridman.com/feed/podcast/
Full episodes playlist: • Lex Fridman Podcast
Clips playlist: • Lex Fridman Podcast Clips
SOCIAL:
- Twitter: / lexfridman
- LinkedIn: / lexfridman
- Facebook: / lexfridman
- Instagram: / lexfridman
- Medium: / lexfridman
- Reddit: / lexfridman
- Support on Patreon: / lexfridman

Comments: 236
@LexClips 1 year ago
Full podcast episode: kzbin.info/www/bejne/mZXMdWBvgrKjmJI
Lex Fridman podcast channel: kzbin.info
Guest bio: Andrej Karpathy is a legendary AI researcher, engineer, and educator. He's the former director of AI at Tesla, a founding member of OpenAI, and an educator at Stanford.
@fezkhanna6900 1 year ago
I hope to get to see Jacob Devlin, or more relevantly Ashish Vaswani, on Lex Clips. I'd love to hear Jacob's foresight on the masking technique.
@WALLACE9009 1 year ago
Please, interview Vaswani
@mauricemeijers7956 1 year ago
Andrej speaks at 1.5x speed and Lex, as always, at 3/4x. Yet, somehow they understand each other.
@Pixelarter 1 year ago
And I listen to both at 1.5x (2x is a bit too much to absorb the dense content)
@zinyang8213 1 year ago
through a transformer
@jaiv 1 year ago
Do you mean Lex speaks at like 0.3/0.25x?
@frkkful 1 year ago
golden comment
@WahranRai 1 year ago
They used a transformer!
@totheknee 1 year ago
Damn. That last sentence. Transformers are so resilient that they haven't been touched in the past *FIVE YEARS* of AI! I don't think that idea can ever be overstated given how fast this thing is accelerating...
@SMH1776 1 year ago
It's amazing to have a podcast where the host can hold their own with Kanye West in a manic state and also have serious conversations about state-of-the-art deep learning architectures. Lex is one of one.
@1anre 1 year ago
The fact that he's from both parts of the world helps, I'd assume
@kamalmanzukie 1 year ago
1/ n
@2ndfloorsongs 1 year ago
Lex is one of one, two of one, one to one, and two to one all at the same time.
@totheknee 1 year ago
Okay, but tbh Kanye is a racist, incompetent d-bag. So the only people who couldn't "hold their own" would be even more incompetent rubes like Trump or Bill O'Reilly who cry their way into and out of every situation like the snowflakes they are.
@ZombieLincoln666 17 hours ago
He’s not asking any technical questions lol
@baqirhusain5652 1 year ago
My professor Dr. Sageeve Oore gave a very good intuition about residual connections. He told me that residual connections allow a network to learn the simplest possible function: no matter how many complex layers there are, we start by learning a linear function, and the complex layers add in non-linearity as needed to learn the true function. A fascinating advantage of this connection is that it provides great generalisation. (Don't know why, I just felt the need to share this)
@xorenpetrosyan2879 1 year ago
Residual connections were first proposed as an elegant solution for training very deep networks. Before ResNets, the deepest networks researchers managed to train were about 20 layers; go deeper and training would become too unstable, and making the network deeper could even hurt accuracy. But ResNets enabled much more stable training of much deeper networks (up to 150 layers!) that generalised better. This was achieved by those linear residual connections that send the gradient signals unchanged to the first layers of a very deep network, because older DNNs had trouble passing any useful signal that deep. Fascinating that this worked so well and turned out to be helpful not only in ConvNets (which the original ResNet was) but also in architectures that didn't even exist at the time (Transformers).
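(Illustration: a minimal sketch of the residual idea described in the two comments above - hypothetical PyTorch-style code, not from the video. The block's output is its input plus a learned correction, so a freshly initialized block starts out close to the identity map and gradients flow straight through the skip path.)

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the skip path carries x (and its gradient) through unchanged."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return x + self.f(x)  # identity + learned non-linear correction

# Stacking many blocks stays trainable because each block only has to learn a small refinement.
deep_net = nn.Sequential(*[ResidualBlock(64) for _ in range(50)])
x = torch.randn(8, 64)
print(deep_net(x).shape)  # torch.Size([8, 64])
```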
@ArtOfTheProblem 1 year ago
Thanks for sharing. Another question, about the heads: would you agree that in simple terms a transformer "looks at everything at every step" (and absorbs what's relevant), so on the one hand it's a very naive, brute-force approach? Whereas if the model were learned from scratch it might do 'much less' (guess at what it needs and from where, etc.) to achieve sparseness, among other things.
@xorenpetrosyan2879 1 year ago
@@ArtOfTheProblem If you knew what your model should look at in advance, yes, but you don't, and the SOTA results Transformers produce are proof of that. It's a perfect example of The Bitter Lesson - more compute and data >>> hand-tuned features and architectures.
@ArtOfTheProblem 1 year ago
@@xorenpetrosyan2879 makes sense.
@offchan 1 year ago
Basically it's Occam's razor at work. Simple explanations are usually more generalizable than complex ones, when both fit the evidence.
@aangeli702 11 months ago
Andrej's influence on the development of the field is so underrated. He not only contributes academically (e.g. through research and co-founding OpenAI), he also communicates ideas so well to the public (for free, by the way) that he both helps others contribute academically and encourages many people to get into the field, simply because he takes an overwhelmingly complex topic (at least it used to be for me) such as the Transformer and strips it down to something that can be more easily digested. Or maybe that's just me - my undergrad professor came nowhere near an explanation of Transformers as good and intuitive as the one in Andrej's videos (don't get me wrong, [most] professors know their stuff very well, but Andrej is just on a whole other level).
@fearrp6777 10 months ago
You have to remove his nuts from your mouth and breathe before you type.
@adicandra9940 5 months ago
Can you point me to the video where Andrej talks about Transformers in depth? Or the other Andrej videos you mentioned. Thanks
@aangeli702 5 months ago
@@adicandra9940 kzbin.info/aero/PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ He's got more videos on his channel and also many videos on YT where he gives talks/lectures (e.g. a talk at Tesla)
@revimfadli4666 5 months ago
To think that this guy made an evolving fish simulator at about 12...
@ZombieLincoln666 17 hours ago
underrated? tf are you talking about?
@diedforurwins 1 year ago
6:30 😂 imagine how fast this sounds to Lex
@MyGroo 1 year ago
I literally went to check if my playback speed got changed to 1.5x listening to this guy
@omarnomad 1 year ago
2:18 Meme your way to greatness
@tlz8884 1 year ago
I double-checked whether I was listening at 1.25x speed when Andrej was speaking
@danparish1344 2 months ago
“Attention is all you need” is great. It’s like a book title that you can’t forget.
@MsStone-ue6ek 7 months ago
Great interview. Engaging and dynamic. Thank you.
@bmatichuk 1 year ago
Karpathy has some great insights. Transformers seem to solve the NN architecture problem without hyper-parameter tuning. The "next" for transformers is going to be neurosymbolic computing, i.e. integrating logic with neural processing. Right now transformers have trouble with deep reasoning. It's remarkable that reasoning automatically arises in transformers based on pretext structure. I believe there is a deeper concept of AI waiting to be discovered. If the mechanism for auto-generated logic pathways in transformers could be discovered, then that could be scaled up to produce general AI.
@taowroland8697 1 year ago
Kanye is not manic or crazy. He just saw a pattern and talked about it.
@alexforget 1 year ago
Doesn't excel in sequential reasoning? Our brain seems to be divided in two for that reason, allowing parallel processing on one side and sequential on the other, with very few connections in between. The two modes of thinking are antagonistic; they cannot coexist in the same structure, or they need to be only lightly connected so as not to confuse the other part. It's still a problem humans struggle with.
@marbin1069 1 year ago
They are able to reason, and every new (larger) model seems to reason better. Right now, there is no need for neurosymbolic AI.
@Paul-rs4gd 1 year ago
I have great hopes for Transformers. It seems like the forward pass is 'system 1' reasoning (intuitive perception/pattern recognition), but the autoregressive, sequential output from the decoder is like 'system 2' reasoning (Daniel Kahneman). The decoder produces a token, and then considers all the input plus all the output-so-far in order to produce the next token. This should be able to create a chain of reasoning. i.e. Look at the input facts plus all the conclusions so far in order to produce the next conclusion. Tokens may well be the basis for symbol-like reasoning too, but still in differentiable form.
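(Illustration: a rough sketch of the autoregressive loop being described - hypothetical code, with `model` and the token IDs as stand-ins rather than any particular system. Each new token is produced from the full input plus everything generated so far, which is what lets the output behave like a step-by-step chain of conclusions.)

```python
import torch

def generate(model, tokens, steps=50, eos_id=None):
    """Greedy autoregressive decoding: feed each prediction back in as new input.
    Assumes `model` maps a (1, seq_len) token tensor to (1, seq_len, vocab) logits."""
    for _ in range(steps):
        logits = model(tokens)              # attends over input + output-so-far
        next_id = logits[:, -1].argmax(-1)  # pick the next token (greedy, for simplicity)
        tokens = torch.cat([tokens, next_id[:, None]], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return tokens
```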
@bmatichuk 1 year ago
@@marbin1069 That is a very intriguing idea and one that is being tossed about in various ML circles, but I'm skeptical. Full logical reasoning requires thinking about variables, loops, conditionals and induction. Complex problem solving requires careful planning and step-by-step assessment. The Transformer architecture is not recursive, and its memory is limited. It's not clear to me how scale alone addresses this problem. There is also the problem of stored knowledge. Transformers generate knowledge at run-time, which means that the entire output is probabilistic. In some contexts you want your agent to look up knowledge in a database. Transformers can't do that right now.
@amarnamarpan 10 months ago
Dr. Ashish Vaswani is a pioneer and nobody is talking about him. He is a scientist from Google Brain and the first author of the paper that introduced Transformers, which is the backbone of all other recent models.
@user-uv2sy5je4z 8 months ago
Agreed!
@dotnet364 8 months ago
It's political.
@user-uv2sy5je4z 8 months ago
@@dotnet364 agreed
@dotnet364 7 months ago
Even Ilya admitted that "Attention Is All You Need" was the breakthrough. Two hours of that produced more results than two years of their own work. Now these guys are worth $90B because of Vaswani.
@AnOzymandias 5 months ago
@@dotnet364 I'm not disagreeing, but when I read the paper I'm pretty sure it said the author order was decided randomly, so I think Vaswani just got lucky and was part of a super important team of researchers.
@oleglevchenko5772 2 months ago
Why doesn't Lex invite an actual inventor of Transformers, e.g. Ashish Vaswani? All these people like Sam Altman and Andrej Karpathy are reaping the harvest of the invention in the paper "Attention Is All You Need", yet its authors have not been invited even once to Lex's talks.
@ReflectionOcean 6 months ago
- Understanding the Transformer architecture (0:28)
- Recognizing the convergence of different neural network architectures towards Transformers for multiple sensory modalities (0:38)
- Appreciating the Transformer's efficiency on modern hardware (0:57)
- Reflecting on the paper's title and its meme-like quality (1:58)
- Considering the expressive, optimizable, and efficient nature of Transformers (2:42)
- Discussing the learning process of short algorithms in Transformers and the stability of the architecture (4:56)
- Contemplating future discoveries and improvements in Transformers (7:38)
@wasp082 7 months ago
The "attention" name was already around in the past on other architectures. It was common to see bidirectional recurrent neural networks with "attention" on the encoder side. That's where the name "Attention is all you need" comes from: the paper basically removes the need for a recurrent or sequential architecture.
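(Illustration: a bare-bones sketch of the attention operation itself - illustrative NumPy only, not tied to any particular implementation. Every query scores every key, the scores become weights via a softmax, and the weights mix the values; there is no recurrence anywhere, which is the point the comment is making.)

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns one weighted mix of V per query position."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                              # (seq_len, seq_len) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V

x = np.random.randn(5, 16)                    # 5 tokens, 16-dim embeddings
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
print(out.shape)                              # (5, 16)
```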
@jeff__w 1 year ago
1:56 “I don’t think anyone used that kind of title before, right?” Well, maybe not as a title, but I can’t imagine that the authors of the paper were unaware of the lyric “Love is all you need” from The Beatles’ 1967 song “All You Need is Love.”
@mikeiavelli 1 year ago
I am surprised that they could find it surprising, as there is a long tradition of "meme" titles for papers in computer science, some of which are now classics. E.g., off the top of my head:
- Lively Linear Lisp: "Look Ma, No Garbage!"
- Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire
- A Short Cut to Deforestation
- Clowns to the left of me, jokers to the right (pearl): dissecting data structures
- I am not a number -- I am a free variable
There's also the "considered harmful" pattern, which originated from the classic paper 'GOTO Considered Harmful'. The opposing view was expressed in a paper titled "'GOTO Considered Harmful' Considered Harmful". And then a paper considering both views came along: "'GOTO Considered Harmful' Considered Harmful" Considered Harmful? Many papers now use the "[insert concept here] considered harmful" template to critique some concept in CS. "Attention is all you need" is in the same spirit. I like it.
@MrMcSnuffyFluffy 1 year ago
Optimus Prime would be proud.
@dianes6245 4 months ago
Yes, the Transformer is great; however:
1. Next-word prediction only gets the most popular ideas, not the right ones.
2. Hinton wants to improve on backprop.
3. Hallucinations in = hallucinations out. Attribute that to Plato.
4. Will it get lost on real-world data - the info in matter?
5. How bad is the compute wall? Factorial? N**4 does not consider parameters and data that must increase.
6. How will the measurement problem and non-locality in physics affect AI?
7. Does the entropy of physics eat your models alive if you don't engineer them perfectly?
@ashutoshzade5480 4 months ago
Great short video. What are some of the limitations of the transformer architecture you can think of?
@alexforget 1 year ago
Amazing how one paper can change the course of humanity. I like that kind of return on investment - let's get more weird and ambitious.
@rajatavaghosh1913 2 months ago
I read the paper and was wondering whether the Transformer was just another kind of LLM for generative tasks, since they present it as a model and compare it with other models at the end of the paper. But after watching this explanation by Andrej, I understood it is a kind of architecture that learns the relationships between the elements of a sequence.
@Halopend 1 year ago
Self-attention. Transforming. It's all about giving the AI more parameters to optimize which internal representations of the interconnections within the data itself are important. We've supplied first-order interconnections. What about second order? Third? Or is that expected to be covered by the sliding-window technique itself? It would seem the more early representations we can add, the better we can couple to the complexity/nuance of "the data". At the other end, the more we couple to the output, the closer to alignment we can get. But input/output are fuzzy concepts in a sliding-window technique. There is no temporal component to the information; the information is represented by large "thinking spaces" of word connections. It's somewhere between a CNN-like technique that parses certain subsections of the whole thing at once and a fully connected space between all the inputs.

That said, sliding is convenient, as it removes the hard limit on what can be generated and makes for an easy-to-understand parameter we can increase at fairly small cost to improve our ability to generate long-form output exhibiting deeper nuance/accuracy. The ability to just change the size of the window and have the network adjust seems a fairly nice way to flexibly scale the models, though there is a "cost" to moving around, i.e. network stability, meaning you can only scale up or down so much at a time if you want to retain most of the knowledge from previous training.

Anyway, the key ingredient is that we purposefully encode the spatial information (into the words themselves) to the depth we desire. Or at least that's a possible extension. The next question, of course, is in which areas of representation we can supply more data that easily encodes, within the mathematics, the information we think is important to represent (anything not covered by the processes of the system itself - having the same thing represented in multiple ways, i.e. the data plus the system, is a path to overly complicated systems in terms of growth/addendums). The easiest path is to just represent it in the data itself, and patch it. But you can do stages of processing/filtering along multiple fronts and incorporate them into a larger model more easily, as long as the encodings are compatible (which I imagine will most greatly affect the growth and swappability of these systems, once standardized). Ideally this is information that is further self-represented within the data itself.

FFTs are a great approximation we can use to bridge continuous vs discrete knowledge. Though calculating them on word encodings feels like a poor fit, we could break the "data signal" into a chosen subset of wavelengths. Note this doesn't help with the next-word-prediction "component" of the data representation, but it is a past-knowledge-based encoding that can be used in unison with the spatial/self-attention and parser encodings to represent the info (I'm actually not sure of the balance between spatial and self-attention, except that the importance of each token to the generation of each word, along with possibly a higher order of interconnections between the tokens, is contained within the input stream). If it is higher order, then FFTs may already be represented and I've talked myself in a circle. I wonder what results dropout tied to categorization would yield for the swappability of different components between systems? Or the ability to turn various bits and bobs on/off in a way tied to the data?

I think that's how one can understand the partial-derivative reverse-flow loss functions as well: by turning off all but one path at a time to split the parts considered, though that depends on the loss function being used. I imagine categorizing subsections of data to then split off into distinct areas would allow finer control over representations of subsystems, to increase scores on specific tests without affecting other testing areas as much. It could be antithetical to AGI-style understanding, but it allows for field-specific interpretation of information, in a sense. Heck, what if we encoded each word as its dictionary definition?
@datasciyinfo5133 1 year ago
Best explanation of the essence of the Transformer architecture. I think the title is a red herring because it makes it more difficult to understand: you need much more than attention, you need all the smart tweaks. And it keeps making my mind think of Megatron from the movies, and I'm not sure what the relationship is, if any. I like "generalized differentiable program" as the best description of a Transformer model today. But that could change. The description is from Yann LeCun in the 2017-19 time period. Jennifer
@surecom12 29 days ago
@1:15 Well they were probably AWARE since they named the paper : "Attention is all you need" 🤭
@brunotvrs 1 year ago
Saving that one to see if I can understand wth Andrej is talking about in a year.
@Skynet_the_AI 1 year ago
My fav Lex clip by far
@KGS922 1 year ago
I'm not sure I understand the other guy
@ColinGrym 1 year ago
In a rarity for this channel, the title approaches clickbait. I will cede the point that it's 100% accurate to the conversation, but I'm bitterly disappointed that I didn't get to hear expert opinion about benevolent, shapeshifting AI platforms protecting sentient organics from the Megatrons of the universe.
@_PatrickO 1 year ago
This title has nothing to do with clickbait. They are talking about transformers. The scope of AI was even in the title. It's about as un-clickbaity as you can get.
@moajjem04 1 year ago
satire
@Skynet_the_AI 1 year ago
Nice
@oldtools6089 1 year ago
PRIME!!!! YOU NEEED ME!
@abramswee 1 year ago
Need a PhD in AI before I can understand what he is saying
@maxflattery968 1 month ago
No, no one really understands why it works. For example, we know how an airplane wing works, and it is a theory verifiable through experiment. You will never get a detailed explanation of why these algorithms work; it's just full of jargon.
@RyanAlba 1 year ago
Incredibly interesting and thought provoking. But still disappointed this wasn't about Optimus Prime
@michaelcalmeyerhentschel8304 1 year ago
well, this clip was great until it ended abruptly in mid-sentence. PLEASE! I signed up for clips, but I am frustrated to have missed a punchline here from Karpathy explaining how GPT architecture has been held back for 5 years. I have SO little time to now scan the full version! But at least you linked it. Your sponsor has left me sleepless.
@leimococ 1 year ago
I had to check a few times if my video speed was not set to 1.5x
@JM-gw1sg 10 months ago
The model YOLO (you only look once) is also an example of a "meme title"
@Statevector 1 year ago
When he says the transformer is a general-purpose differentiable computer, does he mean in the sense that it is Turing complete?
@MellowMonkStudios 1 year ago
I concur
@michaelfletcher940 1 year ago
Pay this man his money (rounders)
@iukeay 6 months ago
This guy just slapped me with a book
@RalphDratman 1 year ago
What is meant by "making the evaluation much bigger"? I do not understand "evaluation" in this context.
@gabrielepi.3208 11 months ago
So the last 5 years in AI are "just" dataset size changes while keeping the (almost) original Transformer architecture?
@generichuman_ 1 year ago
"Don't touch the transformer"... good advice regardless of what kind of transformer you're talking about.
@shinskabob8464 1 year ago
Best idea seems to be to make sure the robot needs to be plugged in so it can't chase you
@rohitsaxena22 1 year ago
We as a field are stuck in this local maximum called Transformers (for now!)
@StasBar 1 year ago
The way this guy thinks and speaks reminds me of Vitalik Buterin. What do they have in common? High intelligence is not the only factor here.
@SevenDeMagnus 1 year ago
Coolness
@Muslim-uc2bh 1 year ago
Does anyone have a simple definition for "general differentiable computer"?
@kazedcat 1 year ago
There is no simple definition. Computers can be abstracted into a system of mathematical functions. If you have a general-purpose computer whose representative function can be differentiated using differential calculus, that is what he meant.
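(Illustration: one way to make that concrete is a toy PyTorch sketch, not anything from the clip. The whole computation is built from differentiable operations, so calculus can push an error signal back to every parameter - roughly what "general-purpose differentiable computer" is gesturing at.)

```python
import torch

# A tiny differentiable "program": two matrix multiplies with a non-linearity in between.
W1 = torch.randn(8, 16, requires_grad=True)
W2 = torch.randn(16, 4, requires_grad=True)

x = torch.randn(1, 8)
y_target = torch.randn(1, 4)

y = torch.tanh(x @ W1) @ W2          # run the "program"
loss = ((y - y_target) ** 2).mean()  # measure how wrong it is
loss.backward()                      # differentiate the whole program w.r.t. its parameters

print(W1.grad.shape, W2.grad.shape)  # every weight gets a gradient, so it can be optimized
```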
@strictnonconformist7369 1 year ago
The most interesting aspect of the Transformer architecture is that it can integrate as well as differentiate, in effect, in the sense that it can take a given input and generate an output based on it, and that can add to the size of the total with input+output, or it can be reduced in size if desired. In essence, it can rewrite equations in either direction - integrate and differentiate. It's also not a purely binary computation: probability plays into it, as does rounding error. When you think about it, when we feed code into a typical von Neumann architecture and have it work on data, the computation and output are based both on the input (data) and on the transformation (instructions) generating data of some kind as output, where part of that data is the next flow of execution; the biggest difference is that it isn't probabilistic and it's much more discrete and deterministic in nature. Weirdly, it's the imprecision of representation in Transformers that is their greatest value for generalization, in that it enables things that are similar to have similar values and fit into patterns in a more generalized manner.
@ytubeanon 2 months ago
5:04 while the effort is appreciated, I don't think the blocks analogy simplifies things enough to help the layman
@guepardo.1 1 year ago
What is the best idea in AI and why is it transformers?
@the_primal_instinct 1 year ago
Ah, so that's why they called their bot Optimus.
@mayetesla 1 year ago
Maye the force
@johnlucich5026 6 months ago
Someone should tell Altman; “IF YOU CANT RUN WITH BIG DOGS STAY ON PORCH”
@imranq9241 1 year ago
The weird thing about transformers is that they're just so random. There is a huge space of potential architectures, but it's not clear why transformers are so good.
@SMH1776 1 year ago
I feel like the success of the transformer stems from their improvement over LSTMs. There is a huge range of problems that can be solved with sequence-to-(something) and transformers do a better job of encoding sequences because the attention mechanism makes it easier to look further back (or forward) in the input sequence, and they're orders of magnitude more parallelizable. LSTMs being so bad makes transformers look so good. I agree 100% that there are bound to be much better architectures in the set of all possible unknown model architectures, and I'm excited to see the next leap forward.
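(Illustration: a rough sketch of the parallelism point above - hypothetical PyTorch code with made-up shapes. An RNN has to walk the sequence one step at a time because each state depends on the previous one, while attention scores every position against every other position in a single batched matrix multiply.)

```python
import torch

T, d = 512, 64
x = torch.randn(T, d)

# RNN-style: inherently sequential - step t cannot start before step t-1 finishes.
h = torch.zeros(d)
W = torch.randn(d, d) * 0.01
states = []
for t in range(T):
    h = torch.tanh(x[t] + h @ W)
    states.append(h)

# Attention-style: all T x T interactions computed at once, so it parallelizes well on GPUs.
scores = (x @ x.T) / d ** 0.5
mix = torch.softmax(scores, dim=-1) @ x
```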
@ArtOfTheProblem 1 year ago
I don't think it's random though. I think of it as "look at everything at every step" - it's a kind of brute force, in a way.
@farrael004 1 year ago
It's not random if you understand how to design neural networks. For example, why did they choose not to use a bias in the key/query single forward layer in the attention head? Because they want it to be a lookup table for what the inputs represent to each other, and that's exactly what a single forward layer does. ResNets were a recent development when the transformers paper came out, so it makes sense for them to take advantage of their demonstrated capabilities. If you read enough neural architecture research, you'll start seeing patterns of how different components affect each other and start piecing together what the ideal architecture would look like. Then you can write a paper about it.
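(Illustration: to make the design choice in this comment concrete, here is a minimal single attention head with bias-free query/key/value projections - an illustrative PyTorch sketch; real implementations add multiple heads, masking, dropout, and output projections.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    def __init__(self, dim: int, head_dim: int):
        super().__init__()
        # bias=False: the projections act as pure linear maps between token representations
        self.q = nn.Linear(dim, head_dim, bias=False)
        self.k = nn.Linear(dim, head_dim, bias=False)
        self.v = nn.Linear(dim, head_dim, bias=False)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v   # (batch, seq_len, head_dim)

head = AttentionHead(dim=32, head_dim=8)
print(head(torch.randn(2, 10, 32)).shape)      # torch.Size([2, 10, 8])
```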
@sdsfgfhrfdgebsfv4556 1 year ago
@@farrael004 That's like saying that reading about computer chip architecture from the 80s will help you figure out what the architecture was like in the 90s or in a new generation.
@clray123 1 year ago
It's not random, but the authors of the paper did a really shitty job explaining their design choices.
@marcin8112 1 year ago
5:05 he lost me after series of blocks
@oldtools6089 1 year ago
Question: without the optimization algorithms designing the hardware manufacturing is there reason to believe that the fundamental nature of these mechanisms reflect the inherent medium of computation? Nope. I guess not. They're watching it.
@samario_torres 3 months ago
Do a podcast with him and George at the same time
@axelanderson2030 1 year ago
Attention is truly all you need.
@flwi 1 year ago
Did anyone learn about transformers recently and can recommend a video that made it easy for them? I'm quite new to ML and would appreciate recommendations.
@boraoku 1 year ago
Transformers will land here soon - AK's KZbin playlist: m.kzbin.info/aero/PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
@bourgeois_radical9027 1 year ago
Sebastian Raschka. And his book Machine Learning with PyTorch.
@peterfireflylund 1 year ago
This one did it for me -- by Leo Dirac (grandson of P.A.M. Dirac): kzbin.info/www/bejne/iWOaoXuBd6qjaZI
@jabowery 1 year ago
Recurrence is all you need.
@MaximilianBerkmann 1 year ago
Long time no see badmephisto!
@TheJordanK 1 year ago
I like your funny words magic man
@WALLACE9009 1 year ago
Any guess why Vaswani is ignored?
@dancar2537 1 year ago
and the greatest thing about it is that you do not need it
@NicolaiCzempin 1 year ago
Attention is all you need? No, 🎼🎶All You Need Is Love 🎶
@michaelfletcher940 1 year ago
Lex is the translator between genius ideas and us normal folk
@blondisbarrios7454 4 months ago
Minute 6:31, your face when your friend tells you she made 25,000 frames for Ghibli in a week 😄
@yusuf4433 5 months ago
And he's known for explaining stuff in simple terms...
@ankitbatra270 5 months ago
That was probably one of the best high-level explanations of the transformer I have come across.
@johnlucich5026 6 months ago
Recall Football; SOMETIMES ONE NEEDS TO GO BACK A LITTLE TO RUN ALL THE WAY WAY FORWARD ! ?
@123cache123 1 day ago
"English! Do you speak it?!"
@johnlucich5026 6 months ago
Altman TALKS-But-ILYA IS THE ONLY ONE “SAYING” ANYTHING ! ?
@moritzsur997 1 year ago
I loved them as well but after the first 2 films it really got boring
@johnlucich5026 6 months ago
Most people can Walk academically-But-ILYA IS “RUNNING” INTELLECTUALLY ! ?
@xpkareem 1 year ago
I looked up some of the words he said. I still have no idea what he is talking about.
@kazedcat 1 year ago
Transformers are easier if you look at them as a math equation. But you need to know how to multiply matrices first.
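(For reference, the "math equation" view of the core operation is usually written as the standard scaled dot-product attention formula, with Q, K, V as matrices whose rows are the token vectors:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$

so the heart of it really is just two matrix multiplications plus a row-wise softmax over the n-by-n score matrix.)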
@johnlucich5026 6 months ago
Dr ILYA; consider that it’s not Altman- it’s “FAULT-MAN” ! ?
@AndreaDavidEdelman 1 year ago
It’s self attention ⚠️
@jzuni001 1 year ago
How many people thought they were going to talk about Megatron? Lmao
@ryansimon9855 1 year ago
I know absolutely nothing about transformer architecture or AI. Only recently did I find these clips, and they are very interesting to watch. I immediately recognized the arrangement and flow in the transformer model architecture as a fractal design already present in nature. It was an instant reaction I had to observing the transformer model architecture. This "recognition" of the design is likely only due to human pattern recognition trying to convince me I am seeing similar shapes, but it's interesting nonetheless when you look at it this way. Not sure if this random YouTube comment helps anyone or whether anyone will see it. I'm speaking from a purely basic understanding and Google searches lol, fun video to watch.
@IlyaKowalewski 1 year ago
Transformers are more similar to compression algorithms: basically you have a certain encoder-decoder setup where the model attempts to continuously encode whatever is fed as its input into some multi-dimensional vector representation (vectors are basically lists of numbers of a fixed size), so as to be able to derive it back unchanged. They then introduce noise at multiple points in the process to "keep the model up on its toes" so it doesn't learn the exact composition of the data, but rather how it's meant to typically relate to itself. This leads to a very interesting proposition that training is ultimately about introducing noise, and inference (prediction) is about reducing that noise; the digital signal processing people were right all along, kind of!
@tadeohepperle 1 year ago
I don't see how transformers relate to fractals in any way.
@ryansimon9855 1 year ago
@@tadeohepperle They don't at all. I typed that whole comment just to say they have a slightly similar shape lol
@ArtOfTheProblem 1 year ago
@@IlyaKowalewski Interesting - can you say more about where the noise is introduced?
@ArtOfTheProblem 1 year ago
@@IlyaKowalewski just the regularization aspect?
@chenwilliam5176 1 year ago
I don't think so 😕
@johnlucich5026 6 months ago
DR ILYA IS TRANSFORMING CULTURE SOCIETY & WORLD ! ?
@techpiller2558 1 year ago
"Once you understand the way broadly, you can see it in all things." -- This could be the name for the AGI paper, whoever will write that. Just give a credit for "AGI technology enthusiast TechPiller in a KZbin comment". :)
@markcounseling 1 year ago
If I understand correctly, transformers = good. N'est-ce pas?
@cmilkau 1 year ago
The paper is called "Attention Is All You Need", and IMHO attention is what made transformers so successful, not its application in the transformer architecture.
@thekittenfreakify 1 year ago
...I am disappointed it's not about Optimus Prime
@arifulislamleeton 1 year ago
Let me introduce myself: my name is Ariful Islam Leeton, I'm a software developer and an OpenAI developer.
@vulkanosaure 1 year ago
I think the human brain has 2 modes of working: an efficient mode (non-Turing-complete) and an expensive one (Turing-complete). As a simplified illustration, there are 2 ways of doing a simple multiplication: memorising it, or running the multiplication and counting on your fingers. Our brain is capable of both, and transformers are probably only able to do it the non-Turing-complete way. It still produces impressive results, which teaches us something about how much seemingly complex reasoning we can produce just by memorizing/interpolating data. But it has a fundamental limitation compared to the Turing-complete way of reasoning (which is why ChatGPT fails pretty fast on simple math problems). (This is just my take on the subject, and I obviously have big respect/admiration for what Karpathy achieved with ChatGPT and am excited to see the way it will evolve.)
@ycombinator765 1 year ago
Wow, I love this perspective! By the way, as I am typing this, I realize that typing out a comment via laptop is way more satisfying than just tapping on my android lol
@johnlucich5026 6 months ago
If one gave Dr ILYA an Enima you would get an ALTMAN ! ?
@markcounseling 1 year ago
What?
@JDNicoll 1 year ago
Robots in disguise....
@Skynet_the_AI 1 year ago
Genius in disguise...
@vulkanosaure 1 year ago
Father of robots in disguise
@dainionwest831 4 months ago
Optimizing NNs based on just the output layer never really made sense to me; it's really cool knowing the transformer has solved that!
@MrRaja 1 year ago
Andrej speaks at my level of speed. Lex is more concise but packs knowledge into each word. So speaking more slowly lets the listener build a sort of neural net of relativity in their brain, all done automatically, which is only limited by knowledge and experience.
@abir95571 1 year ago
This makes me slightly freaked out that we don’t really understand what we’re developing….
@oldtools6089 1 year ago
Relax: They're watching it. The optimization and transformer algorithms are just calculation patterns in a medium and life adapts.
@abir95571 1 year ago
@@oldtools6089 I'm an ML engineer by profession. We don't understand why the attention mechanism works so well. We just observe its effects and agree on a common consensus.
@randomusername6 1 year ago
​@@oldtools6089whatever helps you sleep at night
@oldtools6089 1 year ago
@@abir95571 this correspondence has distracted me from my existential dread, so whatever you're doing is working. Thanks.
@michaelfletcher940 1 year ago
This guy is semi-Elon’s brain but articulate
@neofusionstylx 1 year ago
Lol in terms of AI knowledge, this guy runs circles around Elon.
@SJ-eu7em 1 year ago
Yeah coz Elon sometimes talks a bit tarded way like he's lost in his brain...
@kazedcat 1 year ago
This guy is smarter than Elon.
@TheChannelWithNoReason 1 year ago
@@kazedcat on this subject
@TheEsotericProgrammer 2 months ago
@@TheChannelWithNoReason No, he's just smarter...
@robbie_ 1 year ago
I studied neural networks in the 1990's as part of my undergrad degree in AI. I did a paper on simulated annealing... anyway there's been a lot of "progress" since then, but also unfortunately a lot of drivel written and bollocks spoken. I might trust an ANN bot to hoover my carpet, within reason, but I wouldn't trust it to make me a cup of tea. It's not clear to me I'll ever be able to either. Something important is missing...
@jimj2683 1 year ago
Dude, you are a nobody. You are on the lower end of it-people. You have no idea where things are heading or even how the cutting edge today works. You are NOT an expert. Stop pretending to know anything.
@robbie_ 1 year ago
​@@jimj2683 The "cutting edge" today is still spectacularly stupid.
@SMH1776 1 year ago
Specialized transformer architectures can outperform humans in a variety of specific language and visual tasks, but there is nothing approaching a general AI that rivals biology. Even if we had the code (we don't), we lack the hardware to make it feasible.
@oldtools6089 1 year ago
affordable precision engineering and perhaps very little materials-science innovation is all that really stands in the way of a fully articulated and self-powered humanoid AI using what we know exists today...using open-source off-the-shelf components for jank-points.
@oldtools6089 1 year ago
@@SMH1776 The bizarre thing about life is that it adapts to its environment.
@theeagle7054 1 month ago
How many of you saw the entire video and understood nothing?? 😂😂😂
@EmilyStewart-dh8gf 1 year ago
Lex Fridman, you seem bored and uninterested. Holding your head up with your hand. You have Andrej in front of you, be professional. ;-)
@derinko 1 year ago
Lex posted today that he's been suffering from depression since last year, so it might be that...
@cit0110 1 year ago
make your own podcast and have your neck perpendicular to the ground
@schuylerhaussmann6877 1 year ago
Any top AI expert that is not a member of the tribe?
@ClaudioPascual 1 year ago
Transformers is not so good a movie
@lachainesophro8418 1 year ago
Is this guy a robot...
@jimluebke3869 1 year ago
So instead of pure intuition -- convolutional neural nets -- we have moved to something more like consciousness.
@jetspalt9550 1 year ago
What absolute rubbish!! Transformers are either Autobots or Decepticons. I don’t know what he was talking about but get a clue dude!
@lizs821 1 year ago
Umm ya I agree but the residual connections should be connected to the layers sequentially to the blocks of the Python code to optimize the transformer architecture to simultaneously admire the resilience of the convergence of a stable environment in evaluating the determining code structure. If I have to explain it to you then you are not deserving of the explanation.🎃
@Skynet_the_AI 1 year ago
Huh?
@markj6854 1 year ago
I think you'll have to explain that to any English language speaker as it's not grammatically correct.
@gerhitchman 1 year ago
If only this were true. There have been a bunch of papers showing that standard architectures based on MLPs alone rival transformers (see the MLP-Mixer paper, for example). Self-attention is *not at all* required for high-performance and scalable AI.
@joshborders4089 1 year ago
This is also the case in time-series forecasting, where transformers struggle to outperform simpler methods like boosted trees.
@ricosrealm 1 year ago
Certainly it is domain-specific. In NLP nothing rivals it. For simpler tasks, attention might not be as important. For vision tasks, transformers are proving to be impactful.
@axelanderson2030 1 year ago
@@joshborders4089 Some transformers, such as the Time Series Transformer, work really well if you have enough data.