
Exploring Learning Dynamics in Concept Space

  2,251 views

Tunadorable


1 day ago

arxiv.org/abs/...
Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
/ tunadorable
account.venmo....
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tuna...

Comments: 18
@spencerfunk6697 1 month ago
oi pretty graphs
@spencerfunk6697 1 month ago
holy shit this is interesting. given 2 different things, it can infer that something equal or opposite may also exist. weird but im fuckin with it
@TylerLemke 1 month ago
You are growing really fast. Great work. I am inspired by what you are doing to start an AI channel of my own. Not sure what angle to take but I really like how you are using this as a way to document your journey and now help others. Keep up the hard work.
@Tunadorable 1 month ago
🙏 thanks and you should! once u do send a video link in my discord
@wwkk4964 1 month ago
Thanks for sharing this earlier in the week. This paper's concept space is what I was reminded of when you were showing yesterday's paper on an orthogonal basis for different skills being learned. Stephen Wolfram also conducted some experiments with what he called "inter-concept space", where he was trying to explore image diffusion models' inaccessible spaces that we couldn't reach because we didn't have a word for them. Had some weird images 😅
@Crawdaddy_Ro 1 month ago
Excellent research. This one is much better than the last paper on emergent abilities. The concept space framework is especially fascinating, as I see it being very useful in further emergence research. If we can utilize a diverse range of datasets and establish clear metrics for measuring the emergent abilities, this could be the first step in a long and winding staircase.
@kimcosmos 1 month ago
It's like telling frustrated infants to use their words. Overfitting is over-rewarded. Using their words (concepts) is risky because there was a lot of frustration before reward was reached. It's learning aversion. And yes, very like grokking, but somehow they are prompting it to guess based on stable concepts; not clear on how. Necessity is the mother of invention, so right-sizing the number of layers (effort scope) might help. When we get super AGI, it's going to spend a lot of time telling us what's within our learning scope and telling us to grow up.
@tautalogical 1 month ago
The way that it cannot separate the concepts of color and fruit in the case of strawberry is interesting, but only in the case where the color remains unspecified for strawberry in the dataset. Where a thing is shown to come in different colors, the capability exists to abstract to a new color, but where it does not, the essence of strawberry becomes entangled with the essence of red. That implies an inappropriate division made between different kinds of fruit. Is it learning to generalise the color of different fruit for each individual fruit? With a larger model, would it learn to generalise the concept of the thing of which strawberry is a kind, and then associate the color-change property with the object rather than with each individual thing? Does this point to a fundamental limitation in learning capabilities, and if so, is there a way of carving out this particular kind of learning, or limitation?
@Tunadorable 1 month ago
my impression based on other papers is that a larger model trained on more diverse data would in fact abstract out the concept of color enough to be able to change the color of anything, including strawberries, which never get labeled with a color. imagine a dataset with every fruit, where some % of fruits do have multiple colors; such a dataset would likely be large enough to ensure differentiation of the concepts despite the lack of color labels on some fruits. the point of this model is more so a low-level mechanistic understanding
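
A minimal sketch of the kind of dataset described in this reply (my own illustration, not from the paper; the fruit names, colors, and the unlabeled set are all made up), where one fruit never gets a color word in its caption:

import random

# Each fruit maps to the colors it can appear in; "strawberry" is only ever red
# and, per the scenario above, its captions never mention the color at all.
FRUITS = {
    "apple":      ["red", "green", "yellow"],
    "banana":     ["yellow", "green"],
    "grape":      ["purple", "green"],
    "strawberry": ["red"],
}
UNLABELED = {"strawberry"}  # fruits whose captions omit the color concept

def sample_example(rng):
    """Return (fruit, true_color, caption) for one synthetic training example."""
    fruit = rng.choice(sorted(FRUITS))
    color = rng.choice(FRUITS[fruit])
    caption = fruit if fruit in UNLABELED else f"{color} {fruit}"
    return fruit, color, caption

rng = random.Random(0)
for _ in range(5):
    print(sample_example(rng))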
@seriousbusiness2293 1 month ago
I'm highly curious about these types of papers! It feels more philosophical and closer to base ideas. It's so boring seeing new models just being better because of some specialized training or by feeding more data into a bigger model. A human can see two images of a new fantasy fruit and draw a purple version of it. The solution for AI can't be to train on billions of variations or to label every detail of the image and prompt. We need to force emergent properties. I'm highly curious how we should adjust neurons for that; I feel backprop is overkill and needs a complementary buddy. We probably need to only adjust a few neurons for concept-space adjustments (like seen in the Claude golden gate bridge paper example).
@hjups 1 month ago
Very interesting paper, and the proof of failure to disentangle is interesting. Unfortunately, the authors did not explore the converse: what happens if you train with "red apple" and "yellow apple" but only prompt "apple"? Typically diffusion models give you some combination based on the statistical presence in the training data (almost like quantum super-position until the observation of sampling collapses the distribution). Seeing that result empirically proven would have been nice. Also, the experiments in the paper have limited generalization, since diffusion models are typically able to latch onto strong concepts early in training, but can still fail to establish coherent structural detail. The choice to use simpler shapes (especially circles) doesn't really help show the distinction between "concept" and "performance". So while you might be able to prompt for things like "purple train", it's going to be malformed with incoherent class details (e.g. tracks, windows, etc.) much further into training than the OOD departure point (100k gradient updates vs 20k). Long tail concepts are obviously slower to learn as well, which the authors didn't seem to explore either (e.g. use partial masking but train for 1/alpha times longer).
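
A toy numerical illustration (mine, not from the paper or the comment above) of that "statistical combination" point: if the only captions containing "apple" are "red apple" and "yellow apple", then prompting the bare word "apple" would be expected to sample colors roughly in proportion to their training frequency. The counts here are invented.

from collections import Counter

# Hypothetical caption counts in the training set.
caption_counts = Counter({"red apple": 800, "yellow apple": 200})

total = sum(caption_counts.values())
# Expected color distribution when conditioning only on "apple".
p_color_given_apple = {cap.split()[0]: n / total for cap, n in caption_counts.items()}
print(p_color_given_apple)  # {'red': 0.8, 'yellow': 0.2}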
@mrpocock 1 month ago
The test data in grokking are out of distribution with respect to the training data if you have strong enough over-fitting priors. In the extreme case, where the learned distribution is peaked with P=1 for things that are exactly training examples and P=0 for things not in the training set, the test sets are absolutely out of distribution. When you train past the initial memorisation phase into the generalisation phase, the model is still learning that P=1 distribution, but because sparsity takes over from data fit, it does so with algorithmically simpler representations. Those carry no penalty, since there's no cost for mis-modelling the P=0 examples it never sees during training, so by accident they capture rules that generalise beyond the overfit distribution. It's kind of the opposite intuition from traditional ML. I am wondering if we've been going about training wrongly and should push models to grossly memorise data as early as possible, e.g. take a minibatch, run gradient descent until that minibatch is memorised, move on to the next batch, and only later start to do less work per batch once you stop seeing gradient decay.
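
A rough sketch of the "memorise each minibatch before moving on" schedule proposed in this comment, assuming a standard PyTorch setup; model, loss_fn, loader, and the thresholds are placeholders rather than anything from the video or paper.

import torch

def memorise_then_move_on(model, loss_fn, loader, lr=1e-3,
                          memorise_threshold=1e-3, max_inner_steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in loader:
        # Inner loop: keep stepping on this single minibatch until it is
        # effectively memorised (loss below threshold) or we give up.
        for _ in range(max_inner_steps):
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            if loss.item() < memorise_threshold:
                break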
@banalMinuta 1 month ago
hey man, I'm having trouble trying to figure out how they actually manipulate context length. do you know how they come up with the token context window? everything I read about it just seems to make me more confused
@Tunadorable 1 month ago
could be wrong, but I'm not sure how context length would be relevant here; they're messing with diffusion models, not LLMs. Try looking up the three models they did experiments on to learn more about their architectures. I believe at least 1, if not 2, of the 3 were ViTs, and the other I would assume is probably a UNet, given that it was an older version of one of the ViTs
@banalMinuta 1 month ago
honestly, it was just a random curiosity, one that sparked an entire day of research. I am no credentialed academic, but I think these models are actually metafunctional semiotic systems. I think they know much more than we realize
@banalMinuta 1 month ago
obviously anecdotal evidence isn't evidence. however, I will say that spending countless hours modulating the minutiae of prompts yields results that boggle the mind
@sikunowlol 1 month ago
oi
@dadsonworldwide3238 1 month ago
They don't know things; it is analytical learning that separates us from animals, just like how our ancestors created English and taught serfs and slaves alike the z-y vertical axis of faith (mosaic commandments), i.e. quantum physics ✝️, and the horizontal axis, the cross ✝️, x works, physical lawisms, etc. Newton's 3 lines of measure = the truest known standard; the flattest surface tunes all precision instruments and pragmatic common-sense Christian objectivism. It is repeating the puritan movement, encoded English longitude and latitude built on the pilgrimage-confirmed history of nations, people, places and things on the alphabetical exodus, Indo-European language symbols on objects. I'd argue convergences of neural nodes in objects, both image or word, have to cross paths; if it's tuned wrong, cursed and blessed, concave or deformity will set in. Self-drivers, robots, Twitter, all systems tuned by pragmatic common-sense objectivism proper will be in line with precision instruments, but it will call out when we prescribe realism over anti-realism to further lines of measure over time. Evolutionary time will not work under that tuned line of philosophy. The specific way to get shit done is a very literal thread of life & technology