Check out HubSpot's Free ChatGPT resource to power up your work efficiency 🔥 clickhubspot.com/hyx
@Scoring57 • 5 months ago
In a way, isn't this a 'good' thing? That these AIs are so 'bad' but still so useful? It feels to me that this still makes *Transformers / GPTs* the *best*, safest candidates for "AI". All they need to do is engineer it further and make it less of a black box to gain more control and create alignment; from there you have a powerful yet *LIMITED* AI that can't actually think, so you don't have to worry about it plotting or doing something you didn't intend. You can even arbitrarily limit its intelligence to whatever you like while using it as a facts repository and making use of its pseudo/simulated reasoning.

This also means the world-breaking, world-altering "singularity" *isn't* inevitable, and we might be able to evolve alongside AI in a more controlled way as we learn and mature with the technology, and hopefully actually get to master it. Which is favorable, because right now we're barely figuring it out as we go, and it's an extremely powerful tool that can do a lot of harm.

Also, how did this comment post 2 days ago when this video went up just a day ago??
@FireFox64000000 • 5 months ago
So essentially Grokking is just making the AI bash its head into a concept over and over and over again until it finally understands it. Guys, I think I've been grokking for years.
@ctwolf • 5 months ago
Same friend. Same.
@tiagotiagot • 5 months ago
Grokking is the phase-transition, the point where the other shoe finally drops
@Rockyzach88 • 5 months ago
Percussive learning/maintenance - like hitting your TV to make the picture stable
@shukrantpatil • 5 months ago
we are adding the human "essence" to it now
@user-cg7gd5pw5b • 5 months ago
Still waiting for the 'understanding' part...
@dhillaz • 5 months ago
New pitch to investors: "Just 10x more GPUs bro, 10x GPUs to push past this stagnant validation and we will reach grokking, I promise"
@Napert • 5 months ago
99% of LLM training stops right before achieving grokking
@thatonedude6596 • 4 months ago
@@Napert full circle
@petrkinkal1509 • 3 months ago
The more you buy the more you grokk.
@manuelburghartz5263 • 5 months ago
Putting out 3 videos in 4 days that are so complex and labor intensive is crazy. Love the vids man, keep up the grind
@bycloudAI • 5 months ago
watch me release the next video next month LOL
@Ketamineprod • 5 months ago
@@bycloudAI u r awesome bro, take ur time and drop whenever u feel like it!
@markdatton1348 • 5 months ago
@@bycloudAI just enough time for us to GROK these vids
@HingalshDealer • 5 months ago
@@markdatton1348 bro grokked the grok term
@dcaban85 • 4 months ago
probably AI-generated
@blisphul8084 • 5 months ago
So basically, it's like how if we read something once or twice, we might remember the answer, but if we read it or run it through our head 20x, we're more likely to fully understand the logic behind what we're reading. Given that higher-parameter LLMs are more expensive to do this with, I wonder if small models will actually be the most capable in the long run.
@anthony4223 • 5 months ago
I kinda think smaller and maybe slightly more specialized models, for things like medical use or some such, would be more popular in the long-ish run
@bloopbleepnothinghere • 5 months ago
There are papers showing that as you scale up the volume of general information you feed an LLM, the quality of its responses goes down. Specialized, smaller models talking together is a viable path. But if you expect an LLM to reason and "comprehend" rather than infer from memorized data, I don't think we will see that from LLMs as we understand them today.
@mirek190 • 5 months ago
@@bloopbleepnothinghere have you seen Gemma 2 2B? That model is so small and still multilingual, has quite strong reasoning, and knows math... crazy
@blisphul8084 • 5 months ago
@@mirek190 the fact that it's fluent in Japanese is insane. And it runs reasonably fast even on a 5-year-old laptop CPU.
@a_soulspark • 5 months ago
I'd point out that in the Alice in Wonderland paper he showed, the table (10:19) shows Llama 3 70B got the 3rd-best performance while Llama 3 8B got 0% right. I'd argue that grokking requires more parameters to store the more nuanced information... but the idea of
@JazevoAudiosurf • 5 months ago
this Fourier transform filter thing is just nuts. When I see stuff like this, or AlphaProof, or PRMs, I can't imagine we wouldn't reach huge intelligence leaps beyond AGI in the next 5 years. I mean, all it takes is a simple mathematical trick; this is the level of infancy AI is currently in. Look at other fields of science, like fucking material science: even in the 60s, just to figure out materials for LEDs, they would go through an order of magnitude more struggle than for the simple AI breakthroughs of this year's papers. Or look at physics, space, semiconductors. And AI on a software level is so much easier to experiment with than those things.
@seamon9732 • 5 months ago
That's assuming 2 things: 1- that we have enough parameters to simulate the equivalent number of synapses/connections in a brain (100 to 1000 trillion). 2- that the recent research into microtubules doesn't mean they are also involved in processing/reasoning. If that's the case, and there are hundreds to a thousand microtubules per axon (the transmitting part of a synapse) and a bit fewer in dendrites (the receiving part), then you have to multiply the above trillions some more.
@mirek190 • 5 months ago
@@seamon9732 Our brain has around 100 trillion connections... true... BUT for thinking we only use about 20% of them; the rest is used for keeping our body alive.
@AfifFarhati • 5 months ago
Yeah, but sometimes simple tricks can take years or decades to discover...
@BlueKanary • 4 months ago
@@mirek190 Is this 10% myth really still floating around? Even if "only 20%" is for cognitive work, keeping the body alive is no joke. Just keeping a stable heartbeat would take a good portion of dedicated brainpower
@cajampa • 4 months ago
@@BlueKanary Why? Don't brain-dead people have a heartbeat? It's a pretty low-level function, with built-in automatic electrical pulse generator cells right there in the heart muscle, driven by lower functions in the brain stem. And even if those brain stem functions are gone, as long as a respirator is supplied the heart can function on its own.
@-mwolf • 5 months ago
Grokking is an instance of the minimum description length principle. If you have a problem, you can just memorize a point-wise input-to-output mapping. This has zero generalization. But from there, you can keep pruning your mapping, making it simpler, a.k.a. more compressed. The program that generalizes best (while performing well on a training set) is the shortest. → Generalization is memorization + regularization. But this is of course still limited to in-distribution generalization.
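A toy illustration of that description-length point (editor's sketch, not from the video): both mappings below agree on the training data, but only the short program generalizes off it.

```python
# Memorization: a point-wise input -> output table (long description, zero generalization)
table = {x: 2 * x + 1 for x in range(1000)}

# Compressed ("pruned") mapping: the rule itself (short description, generalizes)
def rule(x):
    return 2 * x + 1

assert all(table[x] == rule(x) for x in table)  # identical on the "training set"
print(rule(10_000))  # 20001; the rule extrapolates, the table would raise KeyError
```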
@DanielSeacrest • 4 months ago
People think memorisation or overfitting is a bad thing and that we need to figure out how to prevent it entirely, but really it's a stepping stone on the path to grokking and perfect generalisation.
@cdkw2 • 5 months ago
I love how I have to see your videos 3 times to understand them, kinda like when I was first starting out with calculus!
@blisphul8084 • 5 months ago
This is exactly what grokking is. See something enough times and you'll understand it on a more fundamental level.
@cdkw2 • 5 months ago
@@blisphul8084 The fact that I didn't even realize that I did this...
@BrainSlugs83 • 5 months ago
Incredibly ironic given the topic of overfitting training. 😅
@SumitRana-life314 • 5 months ago
Bro is Raw Grokking ByCloud videos.
@jonathanberry1111 • 4 months ago
@@cdkw2 You hadn't yet watched it 3 times! I, being smarter (read: older, wasting more time on YT and overthinking), got it right away. Look, this is how INTJs and INTPs are more intelligent (IQ): more time thinking thinking thinking!
@PYETech • 5 months ago
One of the best channels out there with real value for everyone. You NEED a raise!
@ytubeanon • 5 months ago
Grokking our way to AGI... 11:33 the Grokfast paper has mind-boggling potential, so who's using it? When will we see its results?
@KeinNiemand • 4 months ago
Probably nobody is using it
@dzbuzzfeed908 • 5 months ago
1. **Current State of LM Benchmarks** - 0:00:00
2. **Benchmark Performance Issues** - 0:00:03
3. **Implications of Reordering Questions** - 0:00:20
4. **Alice in Wonderland Paper Findings** - 0:01:57
5. **The Concept of Grokking** - 0:04:27
6. **Grokking vs. Double Descent** - 0:06:06
7. **Potential Solutions for Improved Reasoning** - 0:09:02
8. **Grokking in Transformers and New Research** - 0:08:12
9. **Grokfast Implementation** - 0:11:28

### Ads
1. **HubSpot AI Resources** - 0:01:13

### Funny Jokes
1. "Absolute dog water" - 0:00:03
2. "Kind of crazy from a more cynical and critical perspective" - 0:00:33
3. "Can you imagine an AI being able to do this, only humans would be able to come up with something this random and absurdly funny" - 0:03:13
4. "If an AI can truly do this it actually might be so over for us, so for the sake of burning down rainforests" - 0:03:23
5. "Elon's Grok LM is probably named after the book and not related to the ML concept that we are talking about today" - 0:05:43
6. "Mr. Zuck said that when Llama 3 70B never stopped learning even after they trained it three or four times past the Chinchilla optimum, it's not copium" - 0:10:03
@seanwu3006 • 4 months ago
They say the Buddha sat under a tree for 49 days and "grokked".
@13579Josiah • 5 months ago
One of the best videos on AI I've seen in a while! No hype. All facts. Well explained while not shying away from complex topics. Beautiful explanation of Grokfast. You just earned yourself a sub!
@rubncarmona • 5 months ago
Grokking is akin to the evolution of language even after a population has become fully literate. Every once in a while someone figures out a new connection between seemingly unrelated concepts, uses one of them in a new context by mistake or because they forgot the intended word, etc. This continuous increase in information entropy even after exhausting the parameter space reminds me a lot of what some scientists say about information in degenerate-era black holes.
@Interpause • 5 months ago
wait, if I'm understanding Grokfast correctly, they're attempting to filter the rate of change of weights at any given moment using a Fourier transform? That's insane; that has way more use cases for other neural network architectures outside of just transformers
@ChanhDucTuong • 5 months ago
I don't understand most of what you and this video said, but may I ask one question: will grokking work with Stable Diffusion training? Normally I only need 3000-5000 steps to train the model to draw my face perfectly; what if I train it to 200000 steps? Before this video I'd have thought that nothing would happen, but now I'm not sure.
@74Gee • 5 months ago
I find that Alice in Wonderland-type responses can be significantly improved by system prompting the model to form data structures from the known data and then inferring from that structure - something like this (a minimal version):

```
You are tasked with solving complex relationship questions by first mapping all known facts into a JSON structure and then using this structure to infer answers. When given a question, follow these steps:
1. Extract all given facts.
2. Create a JSON structure to represent these facts.
3. Use the JSON structure to navigate and infer answers.
4. Provide clear and logically consistent responses based on the JSON file.
```

I used this technique very successfully when working with gossip analysis and determining the source of gossip, but quickly realized its benefits in other logical fields.
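If anyone wants to try that setup, here's a minimal sketch wiring a system prompt like the one above into a chat API (shown with the OpenAI Python client; the model name and the test question are placeholders, not from the comment):

```python
from openai import OpenAI

SYSTEM_PROMPT = """You are tasked with solving complex relationship questions by
first mapping all known facts into a JSON structure and then using this structure
to infer answers. When given a question, follow these steps:
1. Extract all given facts.
2. Create a JSON structure to represent these facts.
3. Use the JSON structure to navigate and infer answers.
4. Provide clear and logically consistent responses based on the JSON file."""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you're testing
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Alice has 3 brothers and 2 sisters. "
                                    "How many sisters does Alice's brother have?"},
    ],
)
print(response.choices[0].message.content)
```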
@zyansheep • 5 months ago
I looked through your videos and saw I had watched literally every one but didn't subscribe lol. I'm subscribed now!
@spaceadv6060 • 5 months ago
One of my favorite videos so far! Thanks again.
@ThomasTomiczek • 5 months ago
I do not think that COF and grokking are mutually exclusive ;) I.e. you can grok a model and still use explicit verbalisations.
@anirudh514 • 5 months ago
I am a regular follower of yours; your videos are amazing!
@iantimmis651 • 5 months ago
Important to remember that Chinchilla compute-optimal is not inference-optimal
@shApYT • 5 months ago
Has any model proven real generalisation on any out-of-domain task? Even one task?
@raspberryjam • 5 months ago
I'm thinking: if you imagine a model divided into two halves, where one is the generalization part and the other is the overfitting part, it's still most beneficial to have the generalization half get as close to the right answer as possible, so as to lighten the load on the overfitting half. Or put another way: you should devote as many parameters as you can to memorizing the corrections to the wrongest answers, and you can do that by minimizing the number of parameters needed to get what is generally a fairly close answer.
@kazzear_ • 5 months ago
No shit! I've literally written code for this idea!!! I can't believe someone was working on this like I was; I didn't even know what grokking was.
@beatsandstuff • 5 months ago
Whenever you are making something, remember: there's always an Asian kid doing it way better than you.
@kazzear_ • 5 months ago
@@beatsandstuff sad truth
@AB-wf8ek • 5 months ago
Synchronicity as a result of morphic resonance
@pneumonoultramicroscopicsi4065 • 5 months ago
@@beatsandstuff i don't think it's an asian "kid" but okay
@itsiwhatitsi • 2 months ago
@@beatsandstuff or an AI
@j.j.maverick9252 • 5 months ago
interesting graph for the LLM learning curve: up, then down, then up again. Looks eerily similar to Dunning-Kruger
@Alice_Fumo • 5 months ago
My mind is blown by the filtering of high-frequency parameter changes, leaving the low-frequency ones and using them to achieve grokking a lot faster. What an amazing idea. Though naively I would think that would require plotting the values of every single parameter over time, which would be way too memory-intensive to be feasible for large models. Hmm... I guess they can keep track of the average amount of time/steps between parameter update direction changes for every parameter, which should give us the frequency? It's also possible I'm fundamentally misunderstanding something, in which case someone please explain where my thinking is failing.
@tiagotiagot • 5 months ago
I guess it would probably work to do something like NewWeights = (OldWeights * MixBias) + ((OldWeights + RawChange) * (1.0 - MixBias)), with MixBias at some value close to but below 1.0. And maybe a sort of momentum mechanism with some level of drag could be added on top of that, to take care of the low-frequency motions being lower amplitude while at the same time avoiding overshooting the target too much; maybe even have a little machine learning model that learns on the job to adjust the drag intensity based on how big the improvement (or worsening) of the big model's scores has been after each iteration (who knows, maybe even something simpler like an auto-tuning PID algorithm might already suffice).
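For reference, a minimal sketch in the spirit of the Grokfast paper's EMA variant (hyperparameter values here are illustrative): an exponential moving average keeps only the slow component of each parameter's gradient and amplifies it, so no full per-parameter history ever needs to be stored, which also speaks to the memory worry above.

```python
import torch

def grokfast_ema(model, ema=None, alpha=0.98, lamb=2.0):
    """Amplify the slow (low-frequency) gradient component via an EMA.
    Call after loss.backward() and before optimizer.step()."""
    if ema is None:
        ema = {n: p.grad.detach().clone()
               for n, p in model.named_parameters() if p.grad is not None}
    for n, p in model.named_parameters():
        if p.grad is not None:
            ema[n] = alpha * ema[n] + (1 - alpha) * p.grad.detach()
            p.grad = p.grad + lamb * ema[n]  # boost the slow component in place
    return ema

# usage inside a training loop:
# ema = None
# for batch in loader:
#     loss = compute_loss(model, batch)
#     loss.backward()
#     ema = grokfast_ema(model, ema)
#     optimizer.step()
#     optimizer.zero_grad()
```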
@WhyInnovate • 5 months ago
They need benchmark testing that has variations in question inputs and randomization of answer choices
@Rockyzach88 • 5 months ago
I'm sure the people who are actually passionate about building these things are doing all the things.
@njokedestay7704 • 5 months ago
I think that should be the GSM1K benchmark from Scale AI
@WhyInnovate • 5 months ago
@@njokedestay7704 I will look into it
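A minimal sketch of the kind of randomization being suggested in this thread (the function and question here are made up for illustration, not from any existing benchmark's code): shuffle the answer options per question so memorized letter positions stop paying off.

```python
import random

def randomized_mcq(stem, choices, answer_idx, rng=random):
    """Shuffle answer options so a model can't lean on memorized option order."""
    order = list(range(len(choices)))
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    return stem, shuffled, order.index(answer_idx)  # new index of the correct answer

stem, opts, ans = randomized_mcq(
    "Which phenomenon does 'grokking' describe?",
    ["Overfitting", "Delayed generalization", "Mode collapse", "Double descent"],
    answer_idx=1,
)
print(stem, opts, "correct:", opts[ans])
```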
@Koroistro • 5 months ago
Imo there's a need for research into how to decouple the model from the information, at least to some degree. A big problem with current LLMs is that they are *too good* at learning. Yes, they learn too well; they don't need to think, they just memorize the thing. Reasoning is a way to shift the burden from memory to computation. If you have tons of storage space you're going to use it; if you have very little storage space you're going to be forced to compress as much as possible. If you think about it, fewer parameters are easier to overfit than many parameters.
@RedOneM • 5 months ago
I think 1.8T parameters grokked with "understanding" of all logical states would become AGI. This kind of power would turbo-accelerate AI tech, since it could begin doing research itself.
@alexanderbrown-dg3sy • 5 months ago
I've been saying this for a year. Other researchers keep foolishly positioning grokking as a weird training artifact without practical value, when there is literally research to the contrary, yet they still see no value lol. Almost like common sense to me. Imagine going through school with no context, no homework, no tutoring, and still producing current SOTA LM benchmark scores. The fact that LMs can do this with severely oblique data makes the answer clear: hybrid data. Increasing data density with inferred facts. Remember, reasoning is basically syntactic transformation. Reformulating samples using formal semantics for native symbolic reasoning is the answer, clear as day. Also fixing PE to solve the reversal curse. All you need. As someone who has trained a smaller model at 12k tokens per parameter without any real saturation: models first off should be way smaller, then focus on hybrid data. AGI will be compact, in my personal opinion. For instance, I believe a 10B model could exceed GPT-4 using the formula I described above, since imo it should be trained on 100T tokens lol. Models are vastly overparameterized and it's so idiotic to me. Brilliant engineers, but their first principles are wrong. Grokfast is super important, but you have to modify the code to work with larger models. FYI, deeper layers want to grok more than the toy models seen in research.
@TheSonOfDumb • 5 months ago
My apologies, but your comment and profile picture are highly incongruous.
@alexanderbrown-dg3sy • 5 months ago
@@TheSonOfDumb lol bro come on, it's 2024. Gifted minds exist within all communities. Is it because I'm pretty, or rather because I'm black? Stay blessed though. You hurt my feelings, I won't lie lol.
@mirek190 • 5 months ago
have you seen Gemma 2 2B? That model is so small and still multilingual, has quite strong reasoning, and knows math... crazy
@alexanderbrown-dg3sy • 5 months ago
@@mirek190 yes, it is impressive bro. I still feel we haven't hit a ceiling with sub-10B models.
@strangelaw6384 • 5 months ago
@@TheSonOfDumb you don't have to write bait replies to your own comments to attract attention if you're confident in what you wrote (which you should be). By the way, the fact that you brought up "homework" and "tutoring" makes me wonder if the training set can be designed to model actual academic learning materials with student-centered teaching strategies.
@xuko6792 • 5 months ago
4:48 - if there ever is one, this is the pivot point. Unless it is somehow possible to pick subsets of input data for the model to grok on without corrupting it, GIGO (garbage in, garbage out) is exactly what we'd get.
@Jandodev • 5 months ago
I recently found a novel approach for improving cognition based on token obfuscations. We're finding that there is a missed interoperability/comprehension step when models are processing data outside of English!
@fateriddle14 • 4 months ago
Thanks for the content. I've got a question: for now, what every LLM does is "given the input words, what are the most likely words following them?" But it's pretty clear that's not how human thinking works at all; we answer a question based on our understanding, not by guessing the most likely answer other people in the world would give. It's a completely different model. So I fail to see how LLMs can reach true abstraction/generalization, when the whole model is just rearranging the existing answers online.
@Neomadra • 4 months ago
Another issue for grokking is that reasoning is not a single method that can be learned and applied to everything. It is many different methods, and I'd guess grokking on one skill and then on another will lead to forgetting of the previously grokked skill. I think one would need some freeze mechanism that locks up some weights after grokking has been achieved.
@jp.girardi • 5 months ago
I struggle to comprehend how this process doesn't result in more hallucinations through syllogistic reasoning, given that the 'generalization' seems to be derived precisely from this inherent syllogism.
@williamliu4477 • 5 months ago
Pumping out videos like a madman 🫡
@mikairu2944 • 5 months ago
lmao the ending gave me whiplash. It is true, we're yearning for reasoning AIs to be a thing, but that very thing is the breaking point where a lot of us get thrown out the window.
@brexitgreens • 5 months ago
Your self-preservation instinct will be your downfall. 🤖
@chromaflow9313 • 5 months ago
This is incredibly helpful. Thank you.
@anonymouscommentator • 5 months ago
i always love your videos, they are always so interesting! thank you very much!
@perelmanych • 4 months ago
I am a big fan of the Llama-3-70B model, but the fact that it achieves 0.049 on simple AIW questions suggests that these results mostly come from memorization of MMLU rather than generalization. Why doesn't it fail as badly on AIW+ questions? Simply because it has seen much more data; remember that we are talking about a staggering 15T tokens of training data here.
@Omar-bi9zn • 5 months ago
Great! Thanks for shedding more light on Grokfast!
@GodbornNoven • 5 months ago
You don't know how right you are 😂 Grokking will be a super important step toward AGI. Essentially, you're training a model on data so much it practically becomes an expert at it. At some point we will have the quantity of compute necessary to achieve this; at that point, might as well take the road of brute force. Naturally, algorithmic breakthroughs are incredibly important and also essential to the improvement of LLMs, as they allow us to do more with less.
5 months ago
Realistically, it could be that the training implicitly learns the test data. 1. Train -> fail. 2. Reuse best model -> fail. 3. Reuse best model -> accidentally better. Etc... Another possibility is that you need some degree of overfitting with text data. Who was the 44th president of the US? Is it an average of the 43rd and the 45th? Not really (I know Obama served twice, but that's not the point). You need to learn specific facts from the texts, weight those facts higher than other random texts, and you end up being better at next-token prediction. If "objects break when they hit the ground" as text is weighted more than "T-shirts are always white", then you can train the next layer with an approximate physical rule, and not a random guess.
@Dygit • 5 months ago
These videos are so good
@SuperSmashDolls • 5 months ago
So, the way I've understood grokking is that when you train an AI model, you also have a regularization step, which pushes weights toward zero. And by grokking you're giving that regularization step a LOT of opportunities to prune weights and patterns that aren't contributing to model performance. Because, remember, the first thing we do with an AI model is initialize all the weights to random values, so there are going to be a lot of patterns that don't actually mean anything but happen to score well enough on test output not to be overwritten by the normal gradient update. The Grokfast paper seems to imply this explanation is totally wrong and that grokking is just a fundamental property of gradient-descent backprop. Or is regularization just so commonplace that it's assumed and nobody calls it out?
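For reference, a minimal sketch of the explicit regularization being described; grokking studies typically use weight decay, e.g. the decoupled form built into AdamW (the values are illustrative, and `model` is assumed to exist):

```python
import torch

# Decoupled weight decay shrinks every weight toward zero a little each step
# (w <- w - lr * wd * w, applied separately from the Adam update), steadily
# pruning weights that don't keep earning their magnitude through the loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
```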
@cube7284 • 5 months ago
One of the best AI channels
@BigSources • 5 months ago
thanks for the shoutout
@scrollop • 5 months ago
Can you add transcripts, so that we can use an LLM to ask the transcript questions and understand the jargon and concepts? I'm serious. Great video though; for those who don't understand the various terms this would be very useful!
@koktszfung • 5 months ago
nice video, very clear
@harshwardhan8771 • 5 months ago
mixtral-8x7b-it: what does the 'it' mean here? Context: 0:31
@Phanimations • 5 months ago
This seems... oddly human. Does anyone else agree? It's weird that repetition is something both humans and AI greatly benefit from
@-weedle • 5 months ago
Love the multiple videos the past few days, but please take your time with the videos, quality over quantity.
@rosendorodriguez7256 • 5 months ago
My company, AI Nexus: we have an algorithm that can grok consistently with low computation and low resources.
@me_hanics • 5 months ago
Most major LLM builders are grokking right now - you can check: people are being hired to create and annotate logic-based exercises for training GPT. We've already seen what scale, and thus grokking, is capable of: it is indeed hard to ask a model that has seen all corners of the internet something new that hasn't been asked before - well, for prior-knowledge-related questions, that is. On the other hand, we also see that we can just take some very large number and ask if it is even, or count the number of words/letters in a sentence, and watch it fail, as these are completely new sentences not seen in training, where the logic behind the sentence matters. Those failures won't disappear with any scale. If someone finds a key breakthrough for "generalization" or reasoning or whatever - which would clearly be well anticipated - it won't come from grokking, though. Also, I think generalization has become too general a term in AI; the main thing we need to solve for generalization is simply abstraction. If a model can abstract one situation down into another, that is already a huge generalization. Moreover, we could skip a ton of training, which would enable much smaller models (you don't need 20 different instances of the same thing with different wording to make the model robust).
@clearandsweet • 5 months ago
I love this because it's exactly how human beings learn. Also very excited for that paper mentioned at the end. This type of generalization is a big smoking gun that will lead to AGI very quickly, so speeding up the grok is incredibly hype news.
@danielsan901998 • 5 months ago
I am not surprised by the failure of LLMs to do basic reasoning on problems that involve numbers; it is already known that language models don't understand basic math. The most successful strategy is, instead of asking the LLM to solve the problem, to translate the problem into a more explicit definition. That's how Google managed to solve some mathematical olympiad questions, by translating them to Lean, with the advantage that you can verify the answer automatically and reject unverifiable proofs. Another alternative is asking the model to solve the problem using a programming language; since the Python dataset is larger than the Lean dataset, it is easier to train a model or use a pretrained one.
@MimOzanTamamogullar • 5 months ago
I've been wondering if we could do something similar with spatial reasoning. Could the model build an internal model of the world by using a 3D simulation of some sort? Like the physics engines in engineering software: their internal model would have a physics engine. When you ask it a question, it could run a simulation inside its head.
@brexitgreens • 5 months ago
@@MimOzanTamamogullar Rumour is that's GPT-5.
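A minimal sketch of the program-aided strategy danielsan901998 describes above: ask the model for code instead of an answer, then execute and verify it. The `generate` function stands in for a real LLM call and is hard-coded here so the sketch runs:

```python
def generate(prompt: str) -> str:
    """Placeholder for a real LLM call that returns Python source code."""
    return "def solve():\n    return sum(n for n in range(100) if n % 2 == 0)"

problem = "What is the sum of all even numbers below 100?"
code = generate(f"Write a Python function solve() that answers: {problem}")

namespace: dict = {}
exec(code, namespace)          # run the model-written program...
answer = namespace["solve"]()  # ...and take the computed result
assert answer == 2450          # verifiable, unlike free-form model text
print(answer)
```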
@320770471 • 5 months ago
This channel is worth watching just for the memes even if you have no clue what the heck he is talking about
@mirek190 • 5 months ago
I wonder how well Llama 3.1 70B, Gemma 2 27B, or Opus 3.5 would pass that test...
@krepxl • 4 months ago
I'm so confused because there are so many technical terms here. bycloud, can you make a long video from scratch explaining these topics simply? Or can somebody in the comments tell me how to learn these terms and concepts myself (I have no CS experience, etc.)?
@alienwhitewalker7284 • 5 months ago
If we overfit it, doesn't it respond with what we would like to see and hear rather than what we should hear?
@drj-ai • 5 months ago
Claude 3.5 and Mistral Large 2 (enough) both pass the Alice in Wonderland Test (three tests each with variations of numbers and genders).
@mcombatti • 5 months ago
There are libraries to invoke grokking from the first epoch onward now
@BGP00 • 5 months ago
no way they used a Fourier transform to speed up gradient descent. has this been used before? sounds like it would be useful in all of ML
@niklase5901 • 5 months ago
Great video!
@telotawa • 5 months ago
omg they put a low-pass filter on it to make it grok faster? that's nuts
@user255 • 5 months ago
Thanks for the references! I have said so many times that these results *must* be fake, because in practical use LLMs absolutely suck (excluding citing documentation and correcting grammar). They are just completely unable to do any thinking.
@dysfunc121 • 5 months ago
Interesting to hear grok has taken on a new life. Hackers have been using "grok" for nearly as long as the book that coined it has existed.
@Benutzername0000 • 5 months ago
dang, i thought this was a Fireship video
@norlesh • 5 months ago
We need a foundation model that has been trained until it groks a children's primary school syllabus before it ever sees an equation or chemical formula.
@captaindryvids6909 • 5 months ago
Cool idea; not sure if it's feasible though when scaled up 🤔
@jeanchindeko5477 • 5 months ago
Thanks so much for that video
@aishni6851 • 5 months ago
You are so funny 😂 great content
@PotatoKaboom • 5 months ago
nice video, well done! but wasn't the grokking paper about specific puzzles? in your statements it seems like grokking could work magically for any task... maybe I'm wrong, but I thought it was for a very specific subset of tasks, like addition and subtraction, where the weights could randomly "click" at some point and just get the whole task right. this would never happen for a general-use LLM, right?
@nyx211 • 5 months ago
The authors of that paper gloss over the fact that they provide the models with 20%-50% of *all* possible input/output combinations during training. Any less than that and the models fail to undergo the grokking phase transition. I don't know if it's even possible to create a grokked LLM. Maybe it'd work for a small language model and a very simple language (brainfuck?).
@PotatoKaboom • 5 months ago
@@nyx211 yeah that's what I thought, thanks for the reply! It makes the claims of this video pretty unprofessional...
@keypey8256 • 5 months ago
I think we need to do more adversarial training
@LucaCrisciOfficial • 1 month ago
The benchmarks LLMs are tested on obviously are not perfect, but of course they are valid.
@benx1326 • 5 months ago
didn't they just train on different answer orders of the MMLU?
@envynoir • 5 months ago
edging an ML model is crazy
@themultiverse5447 • 5 months ago
This video is not for me, but I wanted to comment to get your channel more views. The editing is on point :)
@OwenIngraham • 5 months ago
such good content
@YashvirD • 4 months ago
"All models are wrong, but some are useful" - but in the AI chatbot context
@paulinepauline3680 • 5 months ago
the last part was too real to be satire, or even irony for that matter
@DefaultFlame • 5 months ago
Great video.
@Ori-lp2fm • 4 months ago
Humans can imagine images, while AI models predict the next letter. Meaning, we can imagine images and convert them into code / language / songs.
@eyeofthetiger7 • 4 months ago
The missing piece is plasticity. AIs won't ever be able to reason without it. A static model won't ever be able to reason.
@BYZNIZ • 5 months ago
Great video, shout out to Jerry M for recommending the channel
@jondo7680 • 5 months ago
I'm always under the impression that models are undertrained and more training = better models. Architectural changes and everything else just make the training or inference more efficient. Even smaller models could be trained to be much smarter, but it would require much more training.
@LUVVEOUS • 4 months ago
5:05 2021 or 2022?
@jondo7680 • 5 months ago
00:12 I see a whole other problem in that screenshot. "Into" West Africa? These questions and answers are Eurocentric, which makes them trash.
@OnigoroshiZero • 5 months ago
And it just so happens that Meta has prepared 10x the compute to train Llama 4 compared to Llama 3...
@TimothyChakwera • 5 months ago
I knew FFT was the way to go
@cefcephatus • 4 months ago
I already gave up on catching up with AI. Knowing someone translates feedback into a signal is impressive. Another bingo square ticked.
@AlvinYap510 • 4 months ago
"Alice and Daniel are siblings. Alice has 3 sisters and Daniel has 4 brothers. How many brothers does Alice have?" This question just f**ked Claude 3.5 and GPT-4o
@casualuser5527 • 5 months ago
Fireship thumbnail, bruh. Got baited 😂
@MisterDivineAdVenture • 4 months ago
I see!! These companies are not altruistic, and the researchers and developers are quite aware that the GOLDEN GOOSE is an ILLUSION. Elon himself said something to this effect last week, while arguing that Grok 3 could ("could") leapfrog all others - or not. He basically said AI is overrated and the fear is hype. How can a mastermind not really have one? And I believe the standardized test scores are easy to fake - and that these AIs resort to making things up whenever they are confused internally, like HAL "shepherding" the crew of the spacecraft Discovery One in _2001_.
@Codewello • 5 months ago
Don't trust any model unless you test it yourself. Benchmarks right now don't mean that much.
@robertputneydrake • 5 months ago
"THE NEEDLE IN THE HAYSTACK"! THAT'S WHAT I SAID!
@jacobsan • 5 months ago
What are your thoughts on using verbal-like tokens instead of text ones? Or perhaps it doesn't make a difference in the long term
@radug9594 • 5 months ago
Can you elaborate?
@brekol9545 • 5 months ago
reasoning is still terrible
@JorgetePanete • 5 months ago
In humans and in AI.
@dioscarlet • 5 months ago
Yeah, GPT-4o is really weak
@onlyms4693 • 5 months ago
Agree. I gave GPT-4o an easy math puzzle - just adding up numbers based on a pattern, but not with the true symbols. It failed when I didn't explain the concept of how the puzzle works, but succeeded when I explained it. So yeah, they need a way to make reasoning better on those LLMs.
@w花b • 5 months ago
@@JorgetePanete Speak for yourself... especially when you're writing this from a device that's the result of human reasoning...
@adamgibbons4262 • 5 months ago
AlphaProof and AlphaGeometry just won silver in the Math Olympiad