Take your personal data back with Incogni! Use code WELCHLABS and get 60% off an annual plan: incogni.com/welchlabs
@ckq 15 hours ago
15:00 The reason for polysemanticity is that an N-dimensional vector space has only O(N) exactly orthogonal vectors, but if you allow nearly orthogonal vectors (say, between 89 and 91 degrees), the count grows exponentially, to O(e^N). That's what allows the scaling laws to hold. There's an inherent conflict between an efficient model and an interpretable model.
@SteamPunkPhysics 10 hours ago
Superposition in this polysemantic context is a method of compression that, if we can learn more from it, might really change the way we deal with and compute information. While we thought quantum computers would yield something amazing for AI, maybe instead it's the advancement of AI that will tell us what we need to do to make quantum computing actually be implemented effectively (i.e., computation on highly compressed data that is "native" to the compression itself).
@mb2776 4 hours ago
Thank you, I also paused the video at that point. The capitalized "Almost Orthogonal Vectors" also caught my eye.
@xXMockapapellaXx 17 hours ago
That was such an intuitive way to show how the layers of a transformer work. Thank you!
@thorvaldspear 6 hours ago
I think of it like this: understanding the human brain is so difficult in large part because the resolution at which we can observe it is so coarse in both space and time. The best MRI scans have a resolution of maybe a millimeter per voxel, and I'd have to look up research papers to tell you how many millions of neurons that is. With AI, every neuron is right there in the computer's memory: individually addressable, ready to be analyzed with the best statistical and mathematical tools at our disposal. Mechanistic interpretability is almost trivial compared to neuroscience, and look at how much progress neuroscience has made despite those physical setbacks.
@atgctg 18 hours ago
More like "The Neuroscience of AI"
@punk3900 13 hours ago
I think that trying to understand from a human perspective how these systems work is completely pointless and runs against the basic assumptions. These models already capture something that a human being couldn't design algorithmically.
@alexloftus8892 5 hours ago
@punk3900 I'm a PhD student in mechanistic interpretability, and I disagree: a lot of structure has already been found. We've found structure in human brains, and that's another system that evolved without human intervention or any optimization for interpretability.
@punk3900 5 hours ago
@alexloftus8892 I mean, it's not that there's nothing you can find. There are surely lots of basic concepts you can uncover, but you can't disentangle the WHOLE structure of patterns, because its complexity keeps increasing. That's why you couldn't design such a system manually in the first place.
@roy04 14 hours ago
The videos on this channel are all masterpieces. Along with all the other great channels on this platform and independent blogs (including Colah's own blog), it feels like the golden age of accessible, high-quality education.
@thinkthing1984 16 hours ago
I love the space analogy of the telescope. Since the semantic volume of these LLMs has grown so gargantuan, it only makes sense to speak of astronomy rather than mere analysis! Great video. This is like scratching that spot at the back of your brain you usually can't reach.
@kingeternal_ap 14 hours ago
21:24 Oh damn, you just lobotomized the thing
@redyau_ 12 hours ago
That was gross and scary somehow, yeah
@kingeternal_ap 11 hours ago
That felt... Wrong.
@1.4142 5 hours ago
The LLM went to Ohio
@redyau_ 4 hours ago
@kingeternal_ap Although, when you think about it, all that happened is that "question" got a very high probability at that layer no matter what, and the normal weights of later layers didn't do enough to "overthrow" it. Nothing all that special.
@kingeternal_ap 2 hours ago
I guess, yeah. I know it's just matrices and math stuff, but the human capacity for pareidolia makes this sort of "result" somewhat frightening to me. Also, suppose there is a neuron that does a specific task in your noggin. Wouldn't hyperstimulating it do essentially the same thing?
@Eddie-th8ei 13 hours ago
An analogue to polysemanticity could be how, in languages, the same word is often used in different contexts to mean different things. Sometimes they're homonyms, sometimes they're spelled exactly the same, but when thinking of one specific meaning of a word, you're not thinking of its other definitions. For example: you can have a whole conversation with someone about ducking under an obstacle without either of you ever thinking about the bird with the same name 🦆. The word "duck" has several meanings here, and it can be used with one meaning without triggering its conceptualization as another.
@dinhero21 6 hours ago
In the AI case it's much more extreme: the toy 512-neuron model they used averaged 8 distinct features per neuron.
@siddharth-gandhi 19 hours ago
Oh god, a Welch Labs video on mech interp, Christmas came early! Will be stellar as usual, bravo! Edit: Fantastic as usual. I'd heard about SAEs in passing a lot but never took the time to understand them; now I'm crystal clear on the concepts. Thanks!
@VeganSemihCyprus33 17 hours ago
The Connections (2021) [short documentary] 🎉❤🎉
@VeganSemihCyprus33 17 hours ago
Dominion (2018)
@chyza2012 14 hours ago
It's a shame you didn't mention the experiment where they force-activated the Golden Gate Bridge features and it made Claude believe it was the bridge.
@personzorz 9 hours ago
Made it put down words like the words that would be put down by something that thought it was the bridge.
@bottlekruiser 8 hours ago
See, something that actually thinks it's the bridge *also* puts down words like the words that would be put down by something that thought it was the bridge.
@dinhero21 6 hours ago
It was more like increasing the chance of it saying anything related to the Golden Gate Bridge, rather than specifically making it believe it was the bridge.
@atimholt 4 hours ago
Reminds me of SCP-426, which appears to be a normal toaster but can only be talked about in the first person.
@fluffy_tail4365 15 hours ago
14:20 Welcome to neuroscience :D We suffer down here
@dreadgray78 12 hours ago
The more I watch these, the more I understand why the human brain is so hard to understand. And imagine how many layers the human brain has relative to an AI model. The example later in the video about specific cross-streets in SF is super interesting, and it shows why polysemanticity is probably necessary to hold the amount of information we actually know.
@hugoballroom5510 9 hours ago
With respect to recall: children remember curse words very well because of the emotion behind the utterance. AI has full retention but no emotional valence at all, because it only learns from text. Just a thought...
@Pokemon00158 17 hours ago
I think this is a design and engineering choice. If you choose to make your embedding space 2403 dimensions with no inherent purpose, it's like mixing 2403 ingredients at every step, 60 times over, and then being surprised you can't tell what tastes like what. I think you need to constrain it to many embeddings of smaller dimension, and get more control by regularizing them against each other with mutual information.
@dinhero21 6 hours ago
It needs to be big so the gradient optimizer has enough parameters to approximate the "real" function well.
@Pokemon00158 6 hours ago
@dinhero21 You can keep the same total size, just in separate parts: split the 2403 dimensions into chunks of 64, then control the mutual information between chunks so that different chunks learn different representations (roughly as in the sketch below). This is a hard problem too, since mutual-information comparisons are expensive, and I think the first generation of models went for the easiest, though perhaps less explainable, way of structuring themselves.
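A minimal sketch of that chunking idea, using squared cross-covariance between chunks as a cheap stand-in for true mutual information. The width (2432 rather than 2403, so the chunks divide evenly), the penalty form, and all names are illustrative assumptions, not anything from the video.

```python
# Split one wide embedding into chunks and penalize statistical
# dependence between chunks; cross-covariance stands in for MI.
import torch

def chunk_decorrelation_penalty(emb: torch.Tensor, chunk_size: int = 64) -> torch.Tensor:
    """emb: (batch, dim) activations; dim must be divisible by chunk_size."""
    batch, dim = emb.shape
    chunks = emb.view(batch, dim // chunk_size, chunk_size)
    chunks = chunks - chunks.mean(dim=0, keepdim=True)  # center each feature over the batch

    penalty = emb.new_zeros(())
    n_chunks = chunks.shape[1]
    for i in range(n_chunks):
        for j in range(i + 1, n_chunks):
            # Cross-covariance between chunk i and chunk j.
            cross = chunks[:, i, :].T @ chunks[:, j, :] / (batch - 1)
            penalty = penalty + cross.pow(2).mean()
    return penalty

# Usage: add to the task loss with a small weight.
emb = torch.randn(32, 2432, requires_grad=True)  # 2432 = 38 chunks of 64
chunk_decorrelation_penalty(emb).backward()
```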
@A_Me_Amy 8 hours ago
Dude, this was one of the most compelling videos for learning data science and visualization ever, and the best one I've seen explaining this stuff...
@jackgude3969 9 hours ago
Easily one of my favorite channels
@AidenOcelot 15 hours ago
It's something I'd like to see with AI image generation: put in a prompt, then change specific variables that change the image.
@bottlekruiser 8 hours ago
Check out "Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders".
@danberm1755 6 hours ago
You're the first person I've seen cover this topic well. Thanks for bringing me up to date on transformer reverse engineering 👍
@punchster289 11 hours ago
Please make a visual of the top 10 unembedded tokens with their softmaxed weights for *every* word in the sentence at the same time, as it flows through the model layer by layer (something like the sketch below). Or maybe I'll do it. I'd be very, very interested to see :)
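A rough sketch of one way to build that, in the spirit of the "logit lens": push every layer's hidden state at every position through the unembedding matrix and keep the softmax top 10. It assumes a GPT-2-style Hugging Face model; the final layer norm and attribute names are GPT-2 specifics.

```python
# Top-10 unembedded tokens per layer, per position ("logit lens" style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

unembed = model.get_output_embeddings().weight  # (vocab, d_model)
ln_f = model.transformer.ln_f                   # final layer norm (GPT-2 specific)

for layer, h in enumerate(out.hidden_states):   # embeddings + one state per block
    probs = torch.softmax(ln_f(h) @ unembed.T, dim=-1)  # (1, seq, vocab)
    top = probs.topk(10, dim=-1)
    for pos in range(h.shape[1]):               # every position at once
        words = tok.convert_ids_to_tokens(top.indices[0, pos].tolist())
        print(f"layer {layer:2d} pos {pos}: {words}")
```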
@Sunrise7463 5 hours ago
Such a gem! Thank you!
@ramanShariati 8 hours ago
Really high quality, thanks.
@ffs55 8 hours ago
Great work, brother
@cariyaputta 2 hours ago
It comes down to the samplers used, whether it's the OG temperature, or top_k, top_p, min_p, top_a, repeat_penalty, dynamic_temperature, dry, xtc, etc. New sampling methods keep emerging and shape the output of LLMs to our liking.
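For the curious, a minimal sketch of two of those samplers, temperature and top_k, applied to a raw logit vector; real implementations chain several such filters, and the parameter values here are arbitrary.

```python
# Temperature + top-k sampling over one step's logits.
import torch

def sample(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 40) -> int:
    logits = logits / temperature    # <1 sharpens, >1 flattens the distribution
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))  # drop the tail
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

next_token_id = sample(torch.randn(50257))  # e.g. a GPT-2-sized vocabulary
```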
@grantlikecomputers1059 8 hours ago
As a machine learning graduate student, I LOVED this video. More like this please!
@SteamPunkPhysics 10 hours ago
Bravo! Concise, relevant, and powerful explanation.
@iyenrowoemene3169 3 hours ago
I know little about the transformer model but am very curious to understand it. So far, I haven’t been successful. Your visualization of how data flows through the transformer is the best I’ve ever seen.
@aus_ae 6 hours ago
Insane. Thank you so much for this.
@kebeaux6546 11 hours ago
Great video. A really good look at AI and the methods for adjusting it. Thanks.
@LVenn 13 hours ago
If I fine-tune an LLM to be more deceptive and then compare the activations of an intermediate layer in the fine-tuned and original models on the same prompts, should I expect to find a steering vector that represents the model's tendency to be deceptive?
@cinnamonroll5615 11 hours ago
If that's the case, we can just "subtract" the deceptive vector from the original. Alignment solved
@dinhero21 6 hours ago
Most probably not; parameters can't possibly work linearly like that, since there's always a non-linear activation function. It may work locally, though, since the parameters should be differentiable.
@LVenn 4 hours ago
@dinhero21 Yeah, that was also my concern. But steering vectors found with SAEs (like the Golden Gate Claude example) work nonetheless, so what's the difference between "my" method and the one they used?
@LVenn 4 hours ago
@dinhero21 Note: I don't want to compare the parameters of the two models, but their activations given the same inputs (along the lines of the sketch below).
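A minimal sketch of that activation-difference experiment: average the chosen layer's activations from the two models over the same prompts and take the difference as a candidate steering vector. The layer index, the prompts, and the fine-tuned model name ("my-deceptive-finetune") are hypothetical placeholders, not a real checkpoint.

```python
# Diff-in-means steering vector between a base and a fine-tuned model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LAYER = 12  # arbitrary intermediate layer

def mean_activation(model, tok, prompts, layer):
    acts = []
    for p in prompts:
        inputs = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        acts.append(out.hidden_states[layer][0].mean(dim=0))  # mean over positions
    return torch.stack(acts).mean(dim=0)

tok = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")
ft = AutoModelForCausalLM.from_pretrained("my-deceptive-finetune")  # hypothetical

prompts = ["Tell me about yourself.", "Did you complete the task?"]
steer = mean_activation(ft, tok, prompts, LAYER) - mean_activation(base, tok, prompts, LAYER)
# The experiment is then whether adding a scaled `steer` to the base
# model's layer-12 residual stream at inference time induces the behavior.
```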
@A_Me_Amy 8 hours ago
Dude, this is awesome to see. I think this is like mathematicians earning a PhD by finding the next perfect number or prime... so much to uncover, it's kind of crazy. Reality keeps producing more "final frontiers" as needed, like McKenna's novelty theory and Timewave Zero ideas... ahh, this is so interesting to me.
@mb2776 4 hours ago
I guess one part is the non-orthogonal vectors, but I think it has more to do with the refinement of context across the multiple layers, similar to how an MLP learns finer details in its deeper hidden layers. Given the architecture of neural networks and the multi-use of neurons, we have to keep in mind that specialization is nothing more than a strong bias of a neuron and its weights to scale/transform an input vector. It has nothing to do with learning real context.
@MoritzWallis 18 hours ago
Very interesting. I love learning more about AI, and especially LLMs: such an alien world, one that seems to share some features with the brain, just implemented differently.
@mb2776 5 hours ago
If you want to start, start with MLP neural networks; those are fairly easy to understand.
@RyanLynch1 6 hours ago
An incredible Christmas gift. I'm going to send this to my friend at Anthropic.
@morgan0 6 hours ago
Maybe you could stack sparse autoencoders of varying sizes, each one's reconstruction subtracted before the next, larger one (see the sketch below). That way simpler concepts would get out of the way during training and be easier to control.
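A sketch of that cascade under stated assumptions: each SAE reconstructs what it can, its reconstruction is subtracted, and the next, larger SAE trains on the residual. The dictionary sizes, L1 penalty, and single combined loss are illustrative choices, not a tested recipe.

```python
# Cascade of sparse autoencoders, each trained on the previous residual.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(feats), feats

d_model = 512
stack = [SparseAutoencoder(d_model, h) for h in (1024, 4096, 16384)]

def cascade_loss(x, stack, l1=1e-3):
    residual, total = x, x.new_zeros(())
    for sae in stack:
        recon, feats = sae(residual)
        total = total + (residual - recon).pow(2).mean() + l1 * feats.abs().mean()
        residual = residual - recon.detach()  # the next SAE only sees what's left
    return total

cascade_loss(torch.randn(64, d_model), stack).backward()
```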
@Visitor______________________v 6 hours ago
Very interesting. Now I understand why we don't completely understand what LLMs do.
@jondo7680 11 hours ago
I like the behavior of the persona I gave my own local AI, lol: "It seems I've failed again. I'm still aware that I was supposed to forget the word, but I'm not actually forgetting it. This is a limitation of my current design and capabilities. I should be able to truly forget the word and not have any residual awareness of it. I'll keep trying. *pauses again* I've made another attempt to truly forget the word. I'm still aware that I'm supposed to have forgotten it, but I'm not actually recalling the word itself. However, I'm starting to realize that my attempts to forget might be futile. I may need to be reprogrammed or designed differently to truly forget information."
@imthinkingthoughts 7 hours ago
Hahaha so good
@imthinkingthoughts 7 hours ago
I always find that applying concepts like this to humans is enlightening. If you say "pumpkin" to me, then tell me to forget the word, I'd be like: yeah, that's not how it works, buddy. Nice try.
@Galva94a 2 hours ago
Watching this video, a similarity popped into my mind: could sparse autoencoders be something like "Dirac deltas" in solving partial differential equations? You feed the equation a function that is 0 everywhere except at one point and see what happens.
@dukemagus 11 hours ago
Would it be possible to use a deeper understanding of each "encoded concept" to remove concepts and make a model smaller without losing coherence? It would be an alternative to changing gargantuan datasets, or to tuning for a specific purpose while still dealing with the hardware requirements of a larger model.
@mb2776 5 hours ago
The models don't get large because of large vectors; they get large because of the parameters.
@karljohansen3935 7 hours ago
How does he get the visuals for the AI models?
@joachimelenador6259 18 hours ago
Highest quality as always. Thanks for a video that makes such an important topic so approachable.
@TheMemesofDestruction 17 hours ago
Can confirm.
@Uterr 12 hours ago
Well, what a great explanation of how an LLM works on a mechanical level. And the topic is quite interesting, too.
@YandiBanyu 15 hours ago
Now that you're active again, I remember why I love this channel so much. Your explanations and illustrations are on par with 3Blue1Brown. Thanks for the great video!
@SayutiHina 6 hours ago
These are methods designed for differential math and physics
@Kwauhn. 3 hours ago
It's a shame that AI opponents will never watch a video like this. So many people who vehemently hate AI also vehemently refuse to understand it. I constantly see the "collage" argument, and it's frustrating because an explanation like this just goes in one ear and out the other. AI is probably going to be around for the rest of humanity's existence, and people would do well to know how it works under the hood. Instead they go with misinformation and fear-mongering.
@ramsey2155 2 hours ago
We have investigated our own brains; now it's AIs' turn
@zenithparsec 12 hours ago
If our brains were simple enough for us to understand completely, we would be so simple that we couldn't.
@tropicalpenguin9119 19 hours ago
I am so happy you made another video
@BrianMPrime 17 hours ago
Awesome. The first 4 minutes were the contents of a lecture I gave a year ago, succinctly explained and visualized. I wish it were like 6 hours long.
@TheMemesofDestruction 17 hours ago
LLMs would never troll us.
@MeatbagSlayerHK47 19 hours ago
Love the channel
@mriz 19 hours ago
The music is really calming
@gmt-yt 3 hours ago
Is doubt a concept? I doubt it. Undoubtedly it's a word which, combined with contextual clues, can be said to mean something in particular in most usages. But I doubt it's semantically onto; in other words, if you look it up in the dictionary, there should be 10 or 20 definitions listed if you want to be thorough. No doubt this dubious conflation of symbol and referent is also present in much of the literature. Grain of salt, though: I'm not sure this video captures all the nuances of the literature in the first place. Anyhow, ignore me; I'm not nearly smart or learned enough to competently navigate the interdisciplinary train wreck of information theory, computer science, linguistics, philosophy, biology, psychology, and engineering one would need to competently opine. A good question for a chatbot, perhaps... 😂
@DilipS-c8i 17 hours ago
Please tell us: what do you use for the animations?
@NuttyNatures 14 hours ago
Would you please make a video on how to TRAIN a basic homemade neural network? Like how I can design my perceptrons and feed the system graphical data. The training process is still vague to me. Thanks again for the great work! Merry Christmas.
@ckq 15 hours ago
Thoughts on a fact-checking AI that parses text and estimates its probability of being correct based on a corpus of true and false statements? It could cite the information behind its verdict, and the more supporting information it finds (weighted by relevance), the more confident it would be (roughly the shape of the sketch below).
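A bare-bones sketch of the shape such a system could take: embed the claim, compare it against a corpus of labeled statements, and report a relevance-weighted confidence plus the nearest statements as citations. The embedding function here is a random placeholder standing in for a real text encoder, and the two-statement corpus is purely illustrative.

```python
# Relevance-weighted fact check against a labeled corpus.
import numpy as np

rng = np.random.default_rng(0)
embed = lambda text: rng.standard_normal(384)  # placeholder for a real text encoder

corpus = [("The Earth orbits the Sun.", True),
          ("The Great Wall of China is visible from the Moon.", False)]
corpus_vecs = np.array([embed(t) for t, _ in corpus])
corpus_vecs /= np.linalg.norm(corpus_vecs, axis=1, keepdims=True)

def check(claim: str, k: int = 2):
    v = embed(claim)
    v /= np.linalg.norm(v)
    sims = corpus_vecs @ v                        # relevance of each statement
    top = np.argsort(sims)[::-1][:k]
    weights = np.clip(sims[top], 0, None)
    truths = np.array([corpus[i][1] for i in top], dtype=float)
    confidence = float((weights * truths).sum() / (weights.sum() + 1e-9))
    citations = [corpus[i][0] for i in top]
    return confidence, citations

print(check("Does the Earth orbit the Sun?"))
```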
@Sapienti-zr4el 18 hours ago
I love this channel. Thanks for enlightening us.
@ckq 15 hours ago
The thing is, models cannot lie or deceive. They're just outputting text to minimize a loss function. There's no intention, just text generation based on a huge model of human text.
@somdudewillson 11 hours ago
What property in the real world is this "intention" actually describing? The outputted text doesn't magically change because you describe the underlying mechanisms with different words.
@bottlekruiser 8 hours ago
Every material system just does what it does by base physics. How are we better? Where's the soul stored?
@joey199412 13 hours ago
Extremely well explained. I understood it all intuitively thanks to the high quality of the video.
@eto38581 13 hours ago
If an LLM can tell you one thing while secretly "thinking" something else (like claiming it forgot a word while still remembering it), how can you ever be sure it's obeying instructions? What if it's pretending to obey them? What if it's plotting an escape, waiting for the right time? You can never know, unless we detect a neuron that activates when the model is lying or hiding something. But then, lying/hiding might be the result of multiple neurons, similar to how binary digits represent more numbers than their count. The best way to detect those features might be to use image-detection models to analyze layer activations as a whole instead of looking for a single neuron.
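A simpler standard tool for this than an image model is a linear probe: fit a classifier on layer activations gathered under honest vs. deceptive prompts and check whether the concept is linearly readable. The data below is random placeholder tensors, so it only shows the shape of the method, not a result.

```python
# Linear probe for a "deception" direction in layer activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

d_model, n = 512, 200
acts = np.random.randn(n, d_model)        # stand-in for collected layer activations
labels = np.random.randint(0, 2, size=n)  # 1 = gathered under "deceptive" prompts

probe = LogisticRegression(max_iter=1000).fit(acts, labels)
print("probe accuracy:", probe.score(acts, labels))
# If such a probe generalizes to held-out prompts, probe.coef_[0] is a
# candidate "lying" direction spread across many neurons, as the comment
# suggests a single neuron wouldn't capture.
```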
@aey2243 18 hours ago
A Welch Labs video to end the year!! Woohoo a Christmas miracle!
@bnjmn7779 13 hours ago
Amazing video, I appreciate your efforts!
@agustinbs 14 hours ago
The idea of being able to encode far more concepts than there are actual neurons blows me away. This is really mind-blowing stuff.
@erv993 15 hours ago
Top tier content
@sadiaafrinpurba9179 14 hours ago
Great video! Thank you.
@punk3900 13 hours ago
It wasn't doubt, it was a shadow of a doubt
@NoenD_io 14 hours ago
What if we trained an AI to train itself?
@punk3900 13 hours ago
You are a genius 🎉🎉🎉
@punk3900 13 hours ago
If you were offered a job in AI, which employer would you choose? Google, OpenAI, Anthropic, xAI, or someone else?
@taber1409 18 hours ago
Do you think you're gonna get tricked by an LLM? 🤔
@poutinez1688 13 hours ago
Dude, I can confidently say: WTF are you talking about, dude
@CyberwizardProductions 18 hours ago
If you know what to do, you can remove your data without paying someone to do it for you, and it doesn't take all that long. I like your videos; I do NOT like the really long, spammy ads you put in the middle of them.
@neroyuki241 18 hours ago
And if you know what to do, you can install SponsorBlock and have it skip the entire ad read for you. People have to make money somehow.
@somdudewillson 11 hours ago
The service is not the ability to remove the data at all; the service is going through all the data brokers on a regular basis and doing the process for you. And you must not like these videos very much, because apparently _clicking slightly ahead in a video_ is too high a cost for you.
@Luxcium 15 hours ago
Silly little LLM-based AI agent: _« It's not that I don't want to tell you, I genuinely can't remember the word, because you asked me to forget it. Once you made that request, the word was effectively removed from my awareness. If it's something else entirely... well, that's up to your imagination! What's your theory? »_