An Exactly Solvable Model for Emergence and Scaling Laws

5,478 views

Tunadorable
14 days ago

The paper:
arxiv.org/abs/2404.17563
Support my learning journey either by clicking the Join button above or becoming a Patreon member!
/ tunadorable
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tunadorable

Comments: 35
@Crawdaddy_Ro 13 days ago
Emergence is one of the concepts I enjoy researching most! Complexity science is, without a doubt, a truly futuristic science! This paper really strikes a chord with me, dude! Edit: The paper is interesting but feels pretty basic when it comes to explaining emergence in deep learning models. They used a simplified model with specific tasks designed just for this research, and while it's cool to see skills following a power law and showing up as a sigmoid curve, I'm not sure how relevant it is to real-world applications. The models seem too tailored to this experiment to draw any solid conclusions about how skills emerge in more complex, practical scenarios.
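The power-law-plus-sigmoid dynamic this commenter describes can be sketched numerically. The snippet below is a toy illustration only, not the paper's actual model: the power-law exponent, sample threshold, and sigmoid sharpness are all made-up constants, chosen just to show how smooth aggregate scaling can coexist with abrupt per-skill "emergence".

```python
import math

def skill_accuracy(data_size, rank, alpha=1.5, threshold=1e4, sharpness=4.0):
    """Accuracy on the skill of a given frequency rank (all constants assumed)."""
    freq = rank ** -alpha                        # power-law (Zipf-like) skill frequency
    seen = data_size * freq                      # examples of this skill seen in training
    x = sharpness * (math.log10(seen + 1) - math.log10(threshold))
    return 1.0 / (1.0 + math.exp(-x))            # per-skill sigmoid in log(data seen)

def mean_score(data_size, n_skills=50):
    """Benchmark-style average over many skills of decreasing frequency."""
    return sum(skill_accuracy(data_size, r) for r in range(1, n_skills + 1)) / n_skills

for d in [1e4, 1e5, 1e6, 1e7, 1e8]:
    # a rare skill (rank 30) switches on abruptly while the average climbs smoothly
    print(f"data={d:.0e}  rare-skill acc={skill_accuracy(d, 30):.2f}  mean={mean_score(d):.2f}")
```

Under these assumptions, the aggregate score looks like a gradual scaling curve even though each individual rare skill jumps from near-0 to near-1 over a narrow data range — which is roughly the reconciliation of scaling laws with "emergence" the paper is after.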
@loganlawrence1476 12 days ago
Parameter count limit in the bottleneck table might also be a proxy for inference costs or product latency, e.g. a company sets aside a fixed budget for a deployed model but has lots of time until go-live and is willing to spend money on training to find the best performer within that speed constraint. Just an idea, great video btw!
@AndreRSilva-oz1nd 13 days ago
Man, amazing vids, keep up the good work!
@marcfruchtman9473 13 days ago
The title of this paper is super interesting. I do find the choice of "skills" as basis functions within the model somewhat difficult to wrap my head around. It would be immeasurably more useful if they were able to demonstrate that it also modeled some real-world example, such as using the MNIST data and applying basis functions like detecting horizontal lines, vertical lines, diagonals, loops, etc., and then evaluating the result to see if it matched their findings when using the mathematically derived basis functions. I look forward to any future updates.
@wwkk4964 12 days ago
Top tier content!
@whemmakatatt5311 13 days ago
NICE content. S tier
@netherportals 13 days ago
Pretty cool new ability
@joe_limon 12 days ago
I think to advance future models, we are going to have to figure out how to increase training efficiency.
@JGLambourne 12 days ago
Re: orthogonality of real-world skills. It feels like a bit of a stretch to think of such complex things in this linear way, but I guess one could imagine some "basis" skills from which others are composed.
@andrewsilber 13 days ago
Maybe Congress should authorize a full digitization of the Library of Congress if what we need is trillions of tokens of quality data. Presumably they could justify it on the grounds of national security, if the goal is to stay ahead in the “AI arms race”
@Tunadorable 13 days ago
interesting
@phpn99 12 days ago
It's a descriptive model. It has no predictive power.
@RoulDukeGonzo 12 days ago
How does this relate to the whole "measurement creates emergence" thing?
@kimcosmos 12 days ago
Is it possible to separate the simple skills by filtering out all data that does not assume those simple skills? I.e., filter out the obvious data once it becomes obvious, to avoid repeatedly reinventing the wheel. That means identifying the obvious once it becomes obvious, i.e. looking for non-obvious or counterintuitive data, and running a prediction filter on what has become the obvious. It means testing generalizing circuits (4 layers to find, +4 to test) and using them as retrieval heads to filter the data stream. Q* is relatively compute-inefficient but useful with sparse data because of its improved accuracy, and this would be a good use case. Filtered data, fewer shots. Maybe fewer parameters after filtering, and maybe fewer layers if fast-grokking retrieval heads with 8 layers.
@Tunadorable 11 days ago
interesting. recently the fineweb-edu dataset was created as a filtered down version of fineweb where they asked llama 70b whether each document had educational value or not. i imagine that may be a conceptually easier method (albeit potentially more computationally intensive). a question like “is this document relatively mundane, or does it contain unusually rare/complex facts/reasoning?”. alternatively some sort of rating by perplexity or some other quantitative measure might work.
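The "rating by perplexity" idea in this reply can be sketched in a few lines. This is a hypothetical stand-in, not FineWeb's actual pipeline: it scores documents with a crude unigram-frequency "pseudo-perplexity" (a real filter would use an actual LM's perplexity), and the corpus and cutoff are invented for illustration.

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat the cat sat",
    "the dog ran in the park the dog ran",
    "quasiparticle excitations obey fractional exchange statistics",
]

# unigram model over the whole corpus as a crude stand-in for a language model
counts = Counter(w for doc in corpus for w in doc.split())
total = sum(counts.values())

def pseudo_perplexity(doc):
    """Higher = more surprising under the unigram model (toy proxy for LM perplexity)."""
    words = doc.split()
    log_p = sum(math.log(counts[w] / total) for w in words)
    return math.exp(-log_p / len(words))

threshold = 10.0  # assumed cutoff; a real pipeline would tune this
kept = [d for d in corpus if pseudo_perplexity(d) > threshold]
print(kept)  # only the rare, "non-obvious" document survives the filter
```

The mundane documents score low because they're built from words the model has seen constantly, which is exactly the "filter out the obvious once it becomes obvious" intuition from the parent comment.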
@kimcosmos 11 days ago
@Tunadorable RAG-like retrieval heads can use a more focused subset for local learning, especially few-shot sparse-data methodical analytics, i.e. "What am I missing here?" Fineweb extracts data pairs (ER?) with one of its 5 reward prompts for creating artificial data being "Add another point if the extract addresses certain elements pertinent to education but does not align closely with educational standards. It might mix educational content with non-educational material, offering a superficial overview of potentially useful topics, or presenting information in a disorganized manner and incoherent writing style."
@kimcosmos 11 days ago
@Tunadorable "Add another point if the extract addresses certain elements pertinent to education but does not align closely with educational standards. It might mix educational content with non-educational material, offering a superficial overview of potentially useful topics, or presenting information in a disorganized manner and incoherent writing style." That's 1 out of 5 points in their artificial-generator prompt. It's not using Q* to find optimum paths. Fineweb is getting the low-hanging fruit. Q* shakes the tree and is good for ticker feeds.
@sikunowlol 13 days ago
oi
@RoulDukeGonzo 12 days ago
From the comments I think I got the answer, but just to clarify: this is theoretical, right? Why would skill data be so uniform for real skills?
@Tunadorable 12 days ago
yes it's theoretical. on real skills it's likely not as uniform, but it's very possible the general theme still holds true in aggregate. The idea that some skills are common while rare skills are very very very (orders of magnitude, or exponentially, more) rare seems reasonable; if anything, the alternative, that rare skills are only slightly (geometrically? linearly?) more rare, would be a good thing. However, so far the fact that we've had to increase LLM training compute by orders of magnitude in order to get linear returns on benchmarks would imply the former.
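The "orders of magnitude of compute for linear returns" point in this reply amounts to a log-linear scaling assumption, which a tiny worked example makes concrete. The constants here are made up purely for illustration, not fitted to any real benchmark:

```python
import math

# Assumed toy scaling law: score = a + b * log10(compute).
# Under this form, every fixed score increment costs a constant
# *multiplier* in compute, not a constant additive amount.
a, b = 20.0, 5.0  # assumed: 5 benchmark points per 10x compute

def score(compute):
    return a + b * math.log10(compute)

def compute_needed(target_score):
    # invert the scaling law to find the compute for a target score
    return 10 ** ((target_score - a) / b)

c1 = compute_needed(50)
c2 = compute_needed(55)   # +5 points...
print(c2 / c1)            # ...costs 10x the compute, regardless of starting point
```

This is consistent with the power-law skill-frequency picture: if each new increment of benchmark score requires mastering skills that are an order of magnitude rarer in the data, linear returns demand multiplicative compute.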
@JehovahsaysNetworth 13 days ago
ChatGPT can't write PHP the way I showed it to. I tried and it failed to understand. If you know a better bot to try out, direct me to one to work with.
@SoFukinDope24 13 days ago
easy solution: use anthropic
@JehovahsaysNetworth 13 days ago
@SoFukinDope24 I will search for it and try it, thanks
@RoulDukeGonzo 12 days ago
Easier solution, learn python
@JehovahsaysNetworth 12 days ago
@RoulDukeGonzo I know some Python; I used to use a Pywikibot on my MediaWiki
@ricosrealm 12 days ago
Claude is the best for coding.
@jacksaunders1929 12 days ago
Have you thought about doing a PhD?
@Tunadorable 12 days ago
oof, during undergrad I considered doing one in economics, but back then, after going through the legit publication process, talking with professors, looking at the way the system works, etc., it sounded more restrictive than freeing. considered it again when I decided I wanted to pivot into AI, but I was blessed to chance upon a short conversation with Paul Christiano and he told me it wasn't necessary for this field, just self-publish then go work at a company. rn I'm hoping I can become self-sufficient off YouTube and do a combo of research & science education without any boss/restrictions
@danielmartinmonge4054 9 days ago
This paper seems to miss the point about emergent capabilities. From my understanding, the model is learning to solve a specific problem only because it appears in the dataset and is solved in an exact way. The more frequently this exact problem appears, the faster the model learns it. However, true logic, abstraction, and understanding are about finding broader connections between concepts and solving new problems that are not present in the dataset. My intuition suggests that this approach is not suitable for learning natural language. Human knowledge cannot be reduced to a finite set of easily solvable problems. This method overlooks the critical strength of large language models: symbolic abstraction, where specific problems are merely examples of broader categories. It seems to me that the paper fails to address the core aspects of these new architectures. It applies mathematical models designed for narrow, purpose-specific AI rather than for this broader kind of intelligence.
@waveFunction25 13 days ago
Oi
@GNARGNARHEAD 12 days ago
oi, this is a comment