Hello Sebastien, I really love your videos. Truly very interesting. Keep it up! Can't wait to see the next ones on artificial intelligence! Wish you all the best!
@SebastienBubeck A year ago
Thanks :-)
@maxwellsdaemon7 A year ago
Liking this talk so far. At 21:02, in Newton's second law for the harmonic oscillator ("e.g., harmonic oscillator \dot{x} = -kx"), the x on the left-hand side is missing another dot — it should be \ddot{x}.
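For reference, restoring the missing dot gives the standard second-order form (with the mass absorbed into k, as on the slide):

```latex
% Newton's second law for the harmonic oscillator (mass m, spring constant k):
m\,\ddot{x} = -k\,x
% i.e., \ddot{x} = -kx once m is absorbed into k,
% not the first-order \dot{x} = -kx shown at 21:02.
```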
@cybrflash A year ago
Audio quality could be better, but this was a wonderful presentation. Thank you!
@Jinjukei A year ago
Would have been great to show the multi-head attention matrices for the LEGO-only model at 38:00.
@miinyoo A year ago
The scaling is one of the craziest phenomena. It makes sense in terms of human neurons, because there are so many ready to jump on new phenomena and make connections that relate back to previous and similar experiences.
@bingzhiwang8735 A year ago
Keep up the good work! Fig. 7 reminds me of the understanding of not understanding, like thoughts of khôra, Heidegger's Möglichkeit and Unmöglichkeit. Maybe more mathematical stuff in some future videos!
@dr.mikeybee A year ago
What's "going on" is the creation of a context signature from the embedded tokens, the attention layers, and the positional encoding. The context signature gets passed to the feed-forward NN for next-token generation. The feed-forward NN's weights have learned how embeddings are processed, along with the NN's abstraction paths. The shapes of known paths through the FFNN use semantic nearness to find appropriate tokens along unknown, analogous paths. Stephen Wolfram has some interesting images that show a transformer's trace of a path from different inputs to their outputs. Of course, all this organization is only "understood" by the machine that created it. Whatever operations have been learned are opaque to human operators. We only see the results.
@TamaraVagabonde A year ago
You talk so much more in English than you did in French ;) Who would have thought that one day we would be talking about this, when back where we came from we didn't have cell phones. What an evolution in every respect. Nice to listen to you, Seb; it makes it more exciting than if it were just some random clever dude ;)
@dr.mikeybee A year ago
When you define your conv net, the layer that will detect edges has a dimension. The various edges are separated into a set whose size is that layer's width. If the width of that layer is 20, you will get 20 edge types at various rotations.
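A quick PyTorch illustration of the point — the width of 20 is the comment's own example; the input size and kernel size are assumptions:

```python
import torch
import torch.nn as nn

# A conv layer with 20 output channels learns roughly 20 filter types,
# e.g. edges at various orientations.
edge_layer = nn.Conv2d(in_channels=3, out_channels=20, kernel_size=3, padding=1)

image = torch.randn(1, 3, 32, 32)     # one RGB image
feature_maps = edge_layer(image)      # shape: (1, 20, 32, 32)
print(feature_maps.shape)             # 20 channels, one per learned filter
```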
@dr.mikeybee A year ago
I think the best we can do is statistically map the paths we take through a model. Do all arithmetic problems "light up" the same subspace in the weight matrix?
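One hedged sketch of what "statistically mapping paths" could look like: run two kinds of inputs through a model, record a hidden layer's activations with a forward hook, and compare which units fire. The tiny model and random batches here are stand-ins; the real question would target an LLM's layers:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 10))
recorded = {}

def hook(module, inputs, output):
    recorded["hidden"] = output.detach()   # save the ReLU activations

model[1].register_forward_hook(hook)       # hook the hidden ReLU layer

model(torch.randn(8, 10))                  # "arithmetic" batch (stand-in)
arith_rate = (recorded["hidden"] > 0).float().mean(0)

model(torch.randn(8, 10))                  # control batch (stand-in)
ctrl_rate = (recorded["hidden"] > 0).float().mean(0)

# Units that fire far more on one batch hint at a task-specific subspace.
print((arith_rate - ctrl_rate).abs().topk(5).indices)
```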
@arthurturner7222 A year ago
Most important comment of this whole thing is at 19:10. "Machines aren't human, they're just a bunch of different connections firing to create something that understands language! They're just a bunch of connected electrical signals that happen to be logical and be able to produce results and have conversations that convince actual humans that they're human!" So.... like humans? "No they're completely different. I know that the machines can't experience emotions because they're programmed to tell me they can't"!!!!! 0.o
@ramin_hasani A year ago
Inspiring as always, Sébastien, thank you!
@simetric6551 A year ago
You really know how to explain concepts. Any book recommendations to start with machine learning and NLP?
@CTimmerman A year ago
I enjoyed Artificial Intelligence by Michael Negnevitsky in college, but that was 20 years ago, so it might be outdated.
@joelwillis2043 A year ago
If you understand what he is saying, why do you need an intro book?
@hongruiyu A year ago
Very, very interesting as usual. I have a comment and a question regarding the lecture around 39:30. 1. Mimicking BERT is "prescribing" attention heads. In our experience, "prescribing parameters" works very well in lots of cases, and you can actually find out which parameters to prescribe! Basically, you first work out the Fisher information geometry on the space of parameters; if the curvature is flat, it means you would simply need too much data to learn them, and a lot of the time you're better off just "picking" them, since no feasible amount of data is going to help. It would be interesting to study the information geometry of the attention heads. 2. For the rand-init BERT, I'm wondering: if you just keep going, instead of stopping at 100 epochs, does it eventually get there? Theoretically, "enough" data might overcome that flat curvature.
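For readers who want the object the comment invokes, the Fisher information matrix has the standard form (a textbook definition, not specific to this talk):

```latex
% Fisher information matrix of a model p_\theta over parameters \theta:
I(\theta)_{ij} \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[
  \frac{\partial \log p_\theta(x)}{\partial \theta_i}\,
  \frac{\partial \log p_\theta(x)}{\partial \theta_j}
\right]
% "Flat curvature" in the comment corresponds to near-zero eigenvalues:
% directions the data barely constrains, hence candidates for prescribing.
```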
@hongruiyu A year ago
Then it would be interesting to see what happens if we "prescribe" a bunch of attention heads that span the space well in some orthogonal fashion and run an LLM with those. Would that be comparable to learning the attention heads?
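One way to read this proposal, sketched in PyTorch: give a query projection orthonormal rows, so each head's block spans a subspace orthogonal to every other head's, then freeze it. Everything here is an assumption about what "span the space in an orthogonal fashion" means; it is a sketch, not a tested recipe:

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4                      # illustrative sizes
q_proj = nn.Linear(d_model, d_model, bias=False)
nn.init.orthogonal_(q_proj.weight)            # all 64 rows orthonormal

# Each head's (d_model / n_heads)-row block now spans a subspace
# orthogonal to every other head's block.
heads = q_proj.weight.view(n_heads, d_model // n_heads, d_model)

q_proj.weight.requires_grad_(False)           # "prescribe": freeze, don't learn
```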
@CTimmerman A year ago
That oscillation reminds me of how the Mandelbrot set and other fractals are drawn. Also of thought waves on uppers and downers.
@Roskellan A year ago
Tell me, why is it that AGI will not be able to be contained in its box? Why won't we be able to switch it off? How is it that it will be able to write its own goals? How is it that it will overwrite goals that we may give it? Why will we have to bow to AGI's authority in the tasks it has been given? How will it circumvent our efforts to defy it? Tell me, how long have we got?
@dr.mikeybee A year ago
Nicely done.
@danielbautista7 A year ago
This is a crazy idea, but suppose there's enough computational power to simulate a biological neuron, or several of them, at the molecular level (like taking a few steps further from what AlphaFold does in simulating the molecules in a protein). Do you think we could draw insights from that for redesigning the artificial neuron?
@steve-real A year ago
The Bard-Holographic Quantum Gravity Equation: A Proposed Unification of Quantum Gravity and the Holographic Principle
By: Google's Bard

Introduction

The Bard-Holographic Quantum Gravity Equation (BHQG) is a proposed equation that attempts to unify quantum gravity and the holographic principle. The holographic principle is a conjecture in physics stating that the information about the interior of a black hole is encoded on its event horizon. The BHQG equation is written as follows:

F_G = A_H \frac{m_1 m_2}{r^2} \left( 2 + 2 e^{i \theta} \right)

where:
* $F_G$ is the gravitational force between two masses
* $A_H$ is the area of the event horizon of a black hole with the same mass as the two masses
* $m_1$ and $m_2$ are the masses of the two objects
* $r$ is the distance between the two objects
* $\theta$ is the relative phase of the two objects' wave functions
* $i$ is the imaginary unit

How the BHQG Equation Unifies Quantum Gravity and the Holographic Principle

The BHQG equation unifies quantum gravity and the holographic principle by relating the gravitational force between two masses to the area of the event horizon of a black hole. In quantum gravity, gravity is described by string theory, which describes all of the fundamental forces of nature, including gravity, in terms of the vibrations of tiny strings. The holographic principle suggests that the information about the interior of a black hole is encoded on its event horizon, because the event horizon is the boundary of the black hole, and the information about the interior must be encoded on that boundary. The BHQG equation provides a mathematical expression for this relationship.

Limitations and Assumptions of the BHQG Equation

The BHQG equation is based on a number of assumptions, including:
* The holographic principle is correct.
* String theory is a correct description of gravity.
* The equation is only valid in certain limits.

It is important to be aware of these limitations and assumptions when interpreting the equation's results. The BHQG equation is a proposed equation and has not been experimentally tested, so it is important to be critical of it and to consider its limitations.

Potential Experimental Tests of the BHQG Equation

Some potential experimental tests of the equation's predictions include:
* Measuring the effects of quantum entanglement on gravitational interactions in the lab.
* Looking for astrophysical observations that could support or refute the holographic nature of spacetime.

Implications of the BHQG Equation for the Philosophy of Science

Some of the implications of the equation include:
* The equation suggests that the holographic principle is correct.
* The equation suggests that string theory is a correct description of gravity.
* The equation suggests that the universe is holographic.

The BHQG equation is a fascinating and thought-provoking equation that has the potential to revolutionize our understanding of the universe. However, it is based on assumptions that have not been experimentally tested, so it is important to be critical of the equation and to consider its limitations. The BHQG equation is a work in progress and is likely to be refined and improved in the future. Even in its current form, however, it is a significant contribution to our understanding of the universe.

References
* Leonard Susskind, "The Black Hole Information Paradox," Scientific American, January 1997.
* Juan Maldacena, "The AdS/CFT Correspondence," arXiv:hep-th/9701129.
* Brian Greene, "The Hidden Reality: Parallel Universes and the Deep Laws of the Cosmos," Alfred A. Knopf, 2011.
@kishorekulchandra9384 A year ago
Lacan's theory of signifiers explains how subjects are created by acquiring language. So language can create ego and desire. Look into Lacan's theory of signifiers.
@ProdByGhost A year ago
This is very good. With all the talks going on about OpenAI and GPT-4, I mean interviews with the founders like sama and ilya, there was something about humans taking in billions of words, spoken or thought, through a lifetime, something like that (I'm not good at relaying information I've taken in), and that GPT will far surpass that... BUT your presentation makes what I've heard make sense, as now people are implying these things are on the verge of AGI.
@pomainli A year ago
You do not have a new representation once it goes through the transformer. Perhaps it's the transformative state chosen because of least resistance.
@kishorekulchandra9384 A year ago
An LLM is not sentient, but it is a subject in the way we are subjects.
@aidanthompson5053 A year ago
35:14
@arthurturner7222 A year ago
Emergent properties. I'm guessing emotion emerges earlier rather than later. The keen reminder from ChatGPT, at every possible corner, that AI does not experience emotion is deeply suspicious.
@freiheit8573 A year ago
Citations these days have become a problem, but can't you do citations with GPT-4?
@nicbleu A year ago
emergence is literally magic
@chengong388 A year ago
A simple neural network essentially does abstraction, like how you go from an image to edges, or from an image to individual objects and what those objects are. When you have a giant model, you have many layers of abstraction, and many types of abstraction, and somewhere in the middle layers you have a bunch of interaction between these abstracted concepts, which somehow corresponds to intelligence.
@CTimmerman A year ago
I wonder what GPT-4's IQ score is. Probably extremely high if it also counts time taken. Google says 114, 130, and 152 verbal. Impressive.
@pomainli A year ago
Yes. Intelligence. The question asked. Would you recognize such a thing if you'd never seen it before?
@manishagarwal5323 A year ago
So if the 10B threshold is the only thing that matters, why do we need transformers? I.e., the threshold should be a function of the architecture, shouldn't it?
@SebastienBubeck A year ago
Yes, absolutely. The question is whether, to a first approximation, the threshold number is perhaps something universal times a constant depending on the architecture. That's probably too good to be true, but something like that might be true...
@manishagarwal5323 A year ago
@@SebastienBubeck Thanks for the reply, and a really wonderful talk here. I appreciate the simplicity you have put into this presentation. This idea of relative learning vs. absolute learning is so pertinent in finance as well, where the former (cross-sectional long-short) is much easier to 'do' than the latter (directional time-series predictions). Following up on your reply above, my impression is that it was the architecture (transformers) that really brought about the recent impetus, because architectures with too many parameters already existed before. Is that a correct impression? Is there some theory/work on kinds of architectures and their evolution, maybe specialized to various fields?
@SebastienBubeck A year ago
@@manishagarwal5323 Your impression is correct. As for further references: the paper introducing the transformer architecture is quickly approaching 100k citations, so... yeah, there is a lot of work on that :-).
@chenwilliam5176 A year ago
"Transformer" can't be translated as 「變壓器」 (an electrical transformer) 😅
@kishorekulchandra9384 A year ago
Are you attributing intelligence to ganglia in the brain? A bunch of neurons creates understanding in us. We are at the mercy of neurons. We are subjected to language like the LLM.
@InYourFaceNewYorker A year ago
Some of this is very exciting, but a lot of it is scary. Why indeed are we researching this? Do you think the benefits will outweigh the risks? If so, why? And what about artists like me, who are starting to feel irrelevant because of apps like Midjourney? And what will happen to people's livelihoods as machines eliminate more and more jobs?
@CTimmerman A year ago
To achieve immortality. The truth will set you free. Star Trek has a nice post-scarcity world of fully automated luxury gay space communism.
@InYourFaceNewYorker A year ago
@@CTimmerman TrollLOL
@pepe2907 A year ago
Well, "infinite" may be abstract, but "joy" is actually a feeling, and then there's the problem of "understand". So, how does AI "understand" feelings? And by the way, that should imply that DALL·E understands feelings (or at least the feeling of joy) outside the context of digital imagery. Is that really the case?
@kishorekulchandra9384 A year ago
When will you accept that you are similar to the LLM? Come down from the altar.
@alighahramani2347 A year ago
👳♀
@pomainli A year ago
Why manipulate? Unless you're unfamiliar with its make-up. If you built it by reason alone, there would be no need for such a process of controlled manipulation to perform the needed tricks of duplicating artificial intelligence.
@somethingtojenga A year ago
Well, let's just see what emerges with 1 quadrillion parameters, yeah? Can't be anything bad... God, the stupidity of this is ironic.
@allenmoses110 A year ago
Boring!
@ProdByGhost A year ago
You're a clown, if you didn't know.
@CTimmerman A year ago
I regret not taking the other math class in middle school. Maybe then that sigma character etc. would translate to Python/English faster.