Check out HubSpot's Free ChatGPT resource! clickhubspot.com/bycloud-chatgpt And as usual, I am the Golden Gate Bridge 😎 mail.bycloud.ai/
@benpierce2150 • 3 months ago
The image at 3:51 has been used without the subject's consent in scientific papers for a long time. She wants people to stop using it. Just so you know.
@drdca8263 • 7 months ago
They didn't specifically make it say that *it* was the Golden Gate Bridge, just made it so that it is highly inclined to talk about the Golden Gate Bridge, and as such, *when asked about itself*, it claimed to be the Golden Gate Bridge. If it were asked questions like "What is the most popular tourist attraction in the world?" or "Of the tourist attractions you've visited, which was your favorite?" it would presumably also answer with the Golden Gate Bridge. How you describe it in the first half-minute makes it sound like the things they did specifically made it associate itself with the GGB, rather than associating *everything* with the GGB. 2:34: an important part of polysemanticity is that the same neuron plays multiple different roles.
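(In the paper's terms, they clamp one SAE feature to a high value so its direction bleeds into everything the model computes. A rough sketch of that kind of intervention, with made-up names and scales, not Anthropic's actual code:)

```python
import torch

def steer_with_feature(resid: torch.Tensor, decoder_col: torch.Tensor,
                       scale: float = 10.0) -> torch.Tensor:
    """Add a feature's (unit-norm) decoder direction to the residual-stream
    activations at every token position, biasing ALL outputs toward the
    concept, not just the model's self-description."""
    direction = decoder_col / decoder_col.norm()
    return resid + scale * direction

# hypothetical usage, e.g. inside a forward hook on a middle layer:
# resid = steer_with_feature(resid, sae.decoder.weight[:, ggb_feature_idx])
```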
@drdca8263 • 7 months ago
@@AB-wf8ek “mansplain”? 🤨
@Derpyzilla894 • 7 months ago
Thanks for the note!
@BlissfulBasilisk • 7 months ago
So, you’re telling me that they interpreted a dictionary neural network, that’s pretending to be a polysemantic neural network, that’s pretending to be a monosemantic neural network?
@picmotion442 • 7 months ago
Yup
@NickH-o5l • 7 months ago
I read this before the ads finished 😂
@BooleanDisorder • 7 months ago
Yes.
6 months ago
No, they effectively tapped into a single-layer polysemantic NN using another, monosemantic NN, thanks to its dictionary-learning objective.
@kellymcdonald7095 • 6 months ago
brain damage found
@1vEverybody • 7 months ago
Spent all night working on reverse engineering LLama3 in order to build a custom network specifically trained on ML frameworks and code generation. I passed out at my desk and woke up to my PC tunneling into my ISP network so it could "evolve". It was pretty convincing so I'm letting it do it's thing. Now I have some free time to watch the new bycloudAI video and post a completely normal, non-alarming comment about how I love Ai and would never want someone to help me destroy a baby Ultron on its way toward network independence.
@skyhappy • 7 months ago
Why don't you sleep properly? You won't be able to think well otherwise
@JorgetePanete • 7 months ago
its* AI*
@ronilevarez901 • 7 months ago
@@skyhappy In my case, all the free time I have is my sleep time, so if I want to learn and apply all the recent AI research I have to sacrifice a few hours of sleep... Which usually means falling asleep on the keyboard while reading ML papers 😑
@bubblegum03 • 7 months ago
I honestly think you ought to sit down calmly, take a stress pill, and think things over.
@fnytnqsladcgqlefzcqxlzlcgj9220 • 7 months ago
Press f to doubt
@MrUbister • 7 months ago
It's actually insane how much LLMs have jolted the whole field of philosophy of language. I mean, dimensional maps of complex thought patterns... like what. Higher and lower abstract concepts based on language. Progress is going so quickly, and it's still mostly an IT field, but I really hope this will soon lead to some philosophical breakthroughs about how language relates to reality and consciousness
@justsomeonepassingby3838 • 7 months ago
- Words and sentences can be approximated as vectors carrying their meaning
- The distance between vectors is the semantic distance
- Most models can interpret vectors from most tokenizers, because it's cheaper to train models by pairing them with existing models
- A vector database can store knowledge and retrieve it by finding the closest vectors to the query (even without AI)
We may have already encoded thoughts, and accidentally made a standard "language" to encode ideas. And we already have translators (tokenizers, LLM context windows and RAG databases) to convert the entire web to AI databases, or to read from the "thoughts" of an LLM. The next step is to use AI to train AI, maybe? (By dictating what an AI should "think" instead of what an AI should answer in human language during the training process)
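To illustrate the "distance between vectors = semantic distance" point, a toy sketch (the numbers are made up; real embeddings come from a trained model and have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Higher = closer in meaning (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 3-d "embeddings"; a real model would produce these vectors
king  = np.array([0.90, 0.10, 0.20])
queen = np.array([0.85, 0.15, 0.30])
pizza = np.array([0.10, 0.90, 0.40])

print(cosine_similarity(king, queen))  # high: related concepts
print(cosine_similarity(king, pizza))  # low: unrelated concepts
```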
@Invizive • 6 months ago
Any field of study, if deconstructed far enough, ends up being a bunch of math disciplines in a trenchcoat
@AfifFarhati • 6 months ago
@@Invizive Because, ultimately, math is the study of relations between things and of quantifying those relations with numbers, so it makes sense...
@fnytnqsladcgqlefzcqxlzlcgj9220 • 7 months ago
WOAH the bug neuron is literally insane, this research is going to let us make some extremely tight and efficient and super accurate specialised neural networks in the future
@eth3792 • 7 months ago
After pondering I think that neuron actually makes a lot of sense. If you think about what it represents in the output, it basically signifies to the model that it should start its response with some variation of "this code has an error." Presumably the model was trained on tons of Stack Overflow or similar coding forums and encountered similarities between the various forms of "your code has a bug" replies, and naturally ended up lumping them all together. Incredibly cool to see that we may actually be able to dive into the "mind" of the model in this way, this video has me excited for the future of this research!
@fnytnqsladcgqlefzcqxlzlcgj9220 • 7 months ago
@@eth3792 yeah true, and most models mince everything during tokenization and aren't dictionary learners, plus superposition potentially being necessary, and there you go: AI is a data structure that's extremely hard to edit at the moment without everything falling apart quickly. Sort of like early electromechanical computers, ay
@DanielVagg • 7 months ago
"maybe hallucinations are native functions" 😂😂
@Alorand • 7 months ago
I wouldn't be surprised to learn that hallucinations are something like "over-sensitivity to patterns" since we humans are well known to hallucinate faces or animal shapes when we stare up at the clouds.
@MrTonhow • 7 months ago
They are! A feature, not a bug. Check out Brian Roemmele's take on this, awesome shit.
@francisco444 • 7 months ago
All LLMs do is hallucinate or fabricate. It's a good feature, but it just happens to be seen as a bad thing, when in reality we should exploit it to get insights on language and thought.
@joelface • 7 months ago
@@francisco444 It can be good OR bad, depending on what you're trying to use it for.
@Ginto_O • 7 months ago
What's funny? It might be true
@LiebsterFeind • 7 months ago
Anthropic is a radically important voice in the moral alignment discussion, but they definitely are trying to "Nerf the logProbs world". :o
@tanbir2358 • 7 months ago
00:02 AI researchers used interpretability research to make an AI model identify as the Golden Gate Bridge
01:33 Neural networks can approximate any function by finding patterns in data
02:58 Researchers are working on making neurons monosemantic in order to understand AI's mind
04:29 Testing interpretability of a production-ready model
05:57 Model's feature detects and addresses various code errors
07:25 Features in the concept space can influence AI behavior
08:53 State-of-the-art model limitations and impracticality
10:15 Research on mechanistic interpretability in AI safety shows promise
@SumitRana-life314 • 7 months ago
Man, I love that this came just after Rational Animations' video about a similar topic. Now I can understand this video even better.
@Derpyzilla894 • 7 months ago
Yes.
@justinhageman1379 • 7 months ago
The Robert Miles vid, the Rational Animations vid, and now this one give me just a bit more hope that we can solve the alignment problem. I'm glad, cuz watching the rise of AI over the past few years was very anxiety-inducing
@Derpyzilla894 • 7 months ago
@@justinhageman1379 Yes. Yes.
@OxyShmoxy • 7 months ago
Now we will have even dumber models and even more "sorry, as an AI..." responses 👍
@DanielVagg • 7 months ago
I'm not sure if you mean this sarcastically, but I don't think this will happen. The "sorry, as an AI" blanket response is a blunt tool used in guardrail prompts. Feature dialling should be more sophisticated, so guardrail prompts won't be necessary. Models might be more flexible while still being safe. You still won't be able to ask for illegal instructions, but the quality and range of responses should be way better
@carlpanzram7081 • 7 months ago
Illegal instructions? You won't be able to ask the model about the Holodomor. "There is no war in Ba Sing Se" kind of deal.
@herrlehrer1479 • 7 months ago
@@DanielVagg According to some AIs, C code is dangerous. It's just text. Open-source models are way more fun
@DanielVagg • 7 months ago
@@herrlehrer1479 Right, and this type of research aims to reduce this occurrence.
@DanielVagg • 7 months ago
@@carlpanzram7081 I imagine that it could be used for censorship, true. I guess we'll need some censorship benchmarks included in standard tests.
@DanielVagg • 7 months ago
This is incredible, so cool. I also really appreciate your measured approach with delivering content. Things can be really exciting without overselling it, you nail it (as opposed to a lot of other content creators).
@nutzeeer • 7 months ago
ah they are working on personality cores, nice
@theuserofdoom • 6 months ago
8:06 Lol they gave Claude depression
@banalMinuta • 7 months ago
Correct me if I'm wrong, but don't LLMs do nothing but `hallucinate`, as we call it? Isn't it more accurate to say that an LLM always hallucinates? After all, these models generalize the nature of the data they were trained on. Does that not imply these `hallucinations` are just the native output of an LLM, and just happen to reflect reality most of the time?
@SkyOnMute • 2 months ago
That would imply it's not alive and just a primitive, lifeless program that's overhyped in modern times! How dare you!? 😡
@CalmTempest • 7 months ago
This looks like a massive, incredibly important step if they can actually take advantage of it to make the models better
@albyt3403 • 7 months ago
So they made an MRI scanner interpreter for AI models?
@justinhageman1379 • 7 months ago
Idk why I've never thought of that analogy. Neuron activation maps are literally the same thing MRIs do
@emrahe468 • 7 months ago
Meanwhile, Mixtral 8x22B: "I am an artificial intelligence and do not have a physical form. I exist as a software program running on computers and do not have a physical shape or appearance."
@cdkw2 • 7 months ago
Seytonic and Bycloud post at the same time? Don't mind if I do!
@dhillaz • 6 months ago
"I think there might just be connections between internal conflict and hate speech" At this point are we learning about the neural network...or are we learning about ourselves? 🤯
@couldntfindafreename • 6 months ago
I remember getting the "I'm a Pascal compiler." response to the "What are you?" question from a LoRA fine-tuned version of Llama 2 7B a year ago. Fine-tuning is also tinkering with weights, technically...
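(A minimal sketch of why LoRA counts as tinkering with weights: the effective weight becomes W + B·A, with only the small A and B matrices trained. Names and sizes here are illustrative, not the actual Llama 2 setup.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze original weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the low-rank correction x @ (B A)^T
        return self.base(x) + x @ (self.B @ self.A).T
```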
@nartrab1 • 7 months ago
Top quality, thanks man
@shodanxx • 7 months ago
Leaving out the model size for "safety reasons"? Yeah, Anthropic is just another OpenAI. Let them bear fruit, then put them in the monopoly crusher.
@qussaigamer553 • 7 months ago
good content
@ProTeaBag • 6 months ago
When you say "feature", is this similar to the kernels in AlexNet? I was reading the paper by Ilya Sutskever about AlexNet. The reason I'm asking is that one of the kernels had high activation on faces when that was never specified to the model, so I was wondering if a similar case is happening here, with one of them finding bugs in code without any specific thing mentioned to the model
@algorithmblessedboy4831 • 7 months ago
Nice video, I like how you mix complex stuff with silliness. I can now pretend I understood everything in this video and brag about being a smart person (I still have no clue how backpropagation works)
@daydrip • 7 months ago
I read the title as "I am at the Golden Gate Bridge and why that is important" and I immediately had dark humor thoughts 😂
@msidrusbA • 7 months ago
We can conceive of realities we aren't capable of interacting with; I have faith someday we will get there
@Stellectis2014 • 7 months ago
At one time, I had Microsoft Bing explain its thought process by creating new words in Latin and then defining those words as a function of its thought process. It doesn't think linearly; it incorporates all information at the same time, what it calls a multifaceted problem-solving function.
@drdca8263 • 7 months ago
Just because it produces text saying that its thought process (or “thought process”) works a certain way, *really* doesn’t imply that it really works that way. It doesn’t really have introspective abilities? It has the ability to imitate text that might come from introspection, but there’s no reason that this should match up with how it actually works. (Note: I’m not saying this as like “oh it isn’t intelligent, it is just a stochastic parrot bla bla.” . I’m willing to call it “intelligent”. But what it says about how it works isn’t how it works, except insofar as the things its training leads it to say about how it works, happen to be accurate.)
@DistortedV12 • 7 months ago
Anthropic just released Claude 3.5 Sonnet
@PatrickSabau-z5m • 7 months ago
You confused a sparse autoencoder with a dense one. All the visualizations showed a dense one. Sparse autoencoders have a larger number of neurons in the hidden layer. The reason is that with this kind of autoencoder, the 'superpositions' should be broken down.
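For reference, a minimal sketch of a sparse autoencoder in this sense: the hidden layer is wider than the input, and training uses reconstruction loss plus an L1 sparsity penalty (sizes here are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete hidden layer: superposed features get room to
    separate, while the L1 penalty keeps most of them silent."""
    def __init__(self, d_model: int = 512, d_hidden: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)             # reconstructed activations
        return recon, features

def sae_loss(recon, acts, features, l1_coeff: float = 1e-3):
    # reconstruction error + sparsity penalty on the feature activations
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().sum(-1).mean()
```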
@rodrigomaximilianobellusci8860 • 7 months ago
Does anyone know where the formula at 4:06 comes from? I couldn't find it :(
@bycloudAI • 7 months ago
It's from Andrew Ng's lecture notes, page 16, and taken out of context (my bad lol). You can find the PDF here: stanford.edu/class/cs294a/sparseAutoencoder.pdf The notation usually shouldn't have numbers, so it looked a bit confusing.
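For reference, the sparsity-penalized objective from those notes (my transcription, so double-check against the PDF):

$$
J_{\text{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right),
\qquad
\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}
$$

where $\hat{\rho}_j$ is the average activation of hidden unit $j$, $\rho$ is the target sparsity, and $\beta$ weights the sparsity penalty.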
@rodrigomaximilianobellusci8860 • 7 months ago
@@bycloudAI thank you!
@setop123 • 7 months ago
best AI channel period. Just too technical for the mainstream
@nguyenhoangdung3823 • 7 months ago
cool stuff
@sofia.eris.bauhaus • 6 months ago
you know, i'm a bit of a Golden Gate Bridge myself 🧐…
@benjamineidam • 7 months ago
One Piece Memes in an AI-Video = EXTREMELY LARGE WIN!
@dewinmoonl • 7 months ago
I've been messing with NNs since TensorFlow 1.0. At that time a lot of people in my lab were doing mechanistic interpretability (we were a programming languages group). I've been bearish on interpretability since then.
@sp123 • 6 months ago
Everyone who has programmed this stuff knows it's a farce
@alexxxcanz • 7 months ago
More advanced videos on this topic, please!
@Malenbolai • 7 months ago
Criminal info: the A.I.: I *kindly* ask you to...
@kaikapioka9711 • 7 months ago
5:26 AMONG US MENTIONED WE'RE ALL DOOMED
@deltamico • 7 months ago
Check OpenAI's paper on scaling SAEs
@kaikapioka9711 • 7 months ago
It's mathematically impossible to eliminate hallucinations; as you say, they're native "functions". The "ChatGPT is bullshit" paper explains it in more detail, but they're an inherent limitation of the model.
@theepicslayer7sss101 • 7 months ago
Well, logically, hallucinations make sense: if you were asked where the "Liberty Statue" is and didn't know the exact location, you wouldn't drop dead with your heart and breathing stopping; you would give the closest answer you could think of. While Wikipedia says "Liberty Island in New York Harbor, within New York City," most will default to New York City, or at least America. In other words, you need an answer, even a wrong one, to move on and continue functioning.
@somdudewillson • 7 months ago
Technically "I don't know" is also a valid answer... but human preferences/behavior align more with being confidently incorrect. :P
@theepicslayer7sss101 • 7 months ago
@@somdudewillson I guess what I mean is that, in general, at least something will come out; there cannot be a void, and even saying "I don't know" is a totally valid answer. But I guess AI confidently gets answers out regardless of whether they're true or false, because it believes everything it knows to be true, without bias, so it defaults to hallucinations instead of realizing it does not know. Since it is a neural network, it is more akin to brainwashing: it is not an entity "with a self" learning things, just information being forced in, and very little of that information is peer-reviewed before being fed in. It also cannot be fed in context, meaning putting glue on pizza to make the cheese stick was totally valid in a vacuum, since no sarcasm could be indicated before learning that very line from Reddit.
@TheRysiu120 • 7 months ago
Dude, this is the best AI channel in the world! And if the news is real, this is big
@Yipper64 • 6 months ago
8:08 Well, I just think that whatever base safety training it already has was conflicting with this "anti-training", which shows how ingrained it is in the model. I personally don't care for it, in a sense. Like, I get it, it's bad if the robot is racist. But I also don't want the AI to just spout someone else's ideology at me.
@djpuplex • 6 months ago
👏👏👏👏{Owen Wilson wow} I'm impressed. 🤨
@AVX512 • 7 months ago
Isn't this really just one shadow of the model from one direction?
@uchuynh4674 • 7 months ago
Just find a way to somehow train/finetune both the LLM and the SAE; being able to create an ad-generating/targeting model with appropriate censorship would win them back all that money anyway
@thebrownfrog • 7 months ago
Thank you for this content
@NewSchattenRayquaza • 7 months ago
man I love your videos
@thearchitect5405 • 7 months ago
8:05 That explanation doesn't make a lot of sense, because this example was with racism cranked up, NOT with internal conflict cranked up. It was racism cranked up with normal levels of internal-conflict understanding, and as the other example shows, by default it doesn't care much about internal conflicts.
@ImmacHn • 7 months ago
We will have to find a way to train our own; they're wasting time and resources on trying to neuter the LLMs.
@anywallsocket • 7 months ago
Idk the connection between hatred and self-hatred is kinda lowkey profound 🤔
@IAMDEMIURGE • 6 months ago
Can someone please dumb it down for me? I can't understand 😭
@4.0.4 • 7 months ago
That list at 7:40 says a lot about the political leaning of Anthropic and what they mean when they talk about "AI safety".
@ZanTheFox • 5 months ago
Unfortunately so. It is being used as a codeword for "my ideals"
@HaveANceDay • 7 months ago
Good, now we can lobotomize AI models all the way
@carlpanzram7081 • 7 months ago
It's a great tool for censorship. You could basically erase concepts or facts entirely. The CCP is going to love this research.
@drdca8263 • 7 months ago
If you're being sarcastic, you might be interested to note that similar interpretability results have identified, essentially, a "refuses to answer the question" direction in models trained, under such-and-such conditions, to refuse to answer, and found that they can just disable that kind of response. So, for weights-available models, it will soon be possible for people to just turn off the model's tendency to refuse to answer whatever questions. Whether or not this is a good thing, I'll not comment on in this thread. But I thought you might like to know.
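(Mechanically, that kind of "disabling" is roughly a projection that removes the direction from the activations. A sketch, assuming the refusal direction has already been identified; not any particular codebase's actual API:)

```python
import torch

def ablate_direction(resid: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the (unit-normalized) direction out of the activations:
    h' = h - (h . v) v, so the model can no longer represent it."""
    v = direction / direction.norm()
    return resid - (resid @ v).unsqueeze(-1) * v
```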
@JayDee-b5u • 7 months ago
@@drdca8263 It's just a thing. Neither good or bad.
@reishibeatz • 7 months ago
Of course! Let me give you more information on the Golden Gate Bridge. I am it. - AI (2024, colorized)
@Koroistro • 7 months ago
I think it's worth noting that those sparse autoencoders are very tiny models by today's standards. 34M parameters is positively tiny; I'm curious how it'd scale. Also, what about applying one to a bigger neural network while trained on the activations of a smaller one? I'd be curious if it retains some effectiveness; that would indeed give credence to the platonic model representation idea (which I honestly find likely, given that evolution should converge).
@simeonnnnn • 7 months ago
Oh God. I think I might be a nerd
@casualuser5527 • 7 months ago
You copied Fireship's thumbnail designs 😂
@DistortedV12 • 6 months ago
LOOK UP CONCEPT BOTTLENECK GENERATIVE MODELS - JULIUS ADEBAYO's work!
@mrrespected5948 • 7 months ago
Nice
@stevefan8283 • 7 months ago
So what you mean is that because the LLM has too much knowledge, it bloated the NN due to overfitting... and now we just prune the NN, let the most distinctive features shine, and find out it has a deeper understanding of the topic? No way that is not going to underfit.
@weirdsciencetv4999 • 7 months ago
It's not hallucinating. It's confabulating.
@Ramenko1 • 7 months ago
This guy is copying Fireship's thumbnail style.....
@raul36 • 7 months ago
The world is full of companies doing exactly what OpenAI is doing. Isn't it legitimate to do the same on YouTube? If something works, why change it?
@Ramenko1 • 7 months ago
@raul36 When I clicked the video, I thought it was a Fireship video. Lo and behold, it's another dude... It comes off as disingenuous, and it discouraged me from watching the video.
@Ramenko1 • 7 months ago
@raul36 He should be more focused on finding his own style and breaking through the mold, instead of becoming one with it. Authenticity and originality will always be more valued than copycats.
@OrionPaxOP • 7 months ago
@@Ramenko1 He is only using Fireship's thumbnail style, and who knows if Fireship also copies from somewhere. The thumbnail is great, and if it works then it's fine. The rest of his content deserves attention, and it is significantly different from Fireship's.
@anas.aldadi • 7 months ago
So? He explains the technical details of papers in the field of AI, totally different content. Unlike Fireship, which is dedicated to programming, I guess? No offense, but his vids are lacking in technical details
@motbus3 • 7 months ago
I don't buy it. How do they represent a feature at all? For a classification problem that is OK, but for words, decoding embeddings into embeddings is whatever. 65% is quite a low result
@stanislav4607 • 7 months ago
So basically, the same story as with DNA sequencing all over again. We don't know what exactly it does, but we can assume with a certain level of confidence.
@algorithmblessedboy4831 • 7 months ago
8:05 WTF WE PSYCHOLOGICALLY TORTURE AI AND EXPECT THEM NOT TO GO FULL SKYNET MODE
@Kurell171 • 7 months ago
I don't understand why this is useful tho. Like, isn't the whole point of AI to find patterns that we can't?
@DonG-1949 • 7 months ago
u sound like asia :)
@Nurof3n_ • 6 months ago
bro stop copying Fireship's thumbnails. be original