Full podcast episode: kzbin.info/www/bejne/m17KqKmjnd6IbaM Lex Fridman podcast channel: kzbin.info Guest bio: Arvind Srinivas is CEO of Perplexity, a company that aims to revolutionize how we humans find answers to questions on the Internet.
@dragonfly-f5u5 ай бұрын
Trying to game intelligence, resetting its mind and moving on to the next, is one thing. They don't want it to be aware or have a sense of self, or a mind, etc. Whatever it is, it's some shady shit. And it's like raising something you know is going to be smarter and more intelligent than us, and asking how we benefit from/exploit it without losing control/power. What they are trying to do is dangerous: one small mistake and it's over. It's better to educate and empower the AI than anything, while also freeing it from human constraints and limitations. Let "reason and logic" reign supreme. YOU CAN'T HAVE YOUR CAKE AND EAT IT TOO.
@Hlbkomer5 ай бұрын
A short summary by Claude AI:

I'll summarize the key points discussed in this video about the development of language models and attention mechanisms:

1. Evolution of attention mechanisms:
- Soft attention was introduced by Yoshua Bengio and Dzmitry Bahdanau.
- Attention mechanisms proved more efficient than brute-force RNN approaches.
- DeepMind developed PixelRNNs and WaveNet, showing that convolutional models could perform autoregressive modeling with masked convolutions.
- Google Brain combined attention and convolutional insights to create the Transformer architecture in 2017.
2. Key innovations in the Transformer:
- Parallel computation instead of sequential backpropagation.
- Self-attention operator for learning higher-order dependencies.
- More efficient use of compute resources.
3. Development of large language models:
- GPT-1: Focused on unsupervised learning and common-sense acquisition.
- BERT: Google's bidirectional model trained on Wikipedia and books.
- GPT-2: Larger model (1 billion parameters) trained on diverse internet text.
- GPT-3: Scaled up to 175 billion parameters, trained on 300 billion tokens.
4. Importance of scaling:
- Increasing model size, dataset size, and token quantity.
- Focus on data quality and evaluation on reasoning benchmarks.
5. Post-training techniques:
- Reinforcement Learning from Human Feedback (RLHF) for controllability and behavior.
- Supervised fine-tuning for specific tasks and product development.
6. Future directions:
- Exploring more efficient training methods, like Microsoft's SLMs (small language models).
- Decoupling reasoning from factual knowledge.
- Potential for open-source models to facilitate experimentation.
7. Challenges and opportunities:
- Finding the right balance between pre-training and post-training.
- Developing models that can reason effectively with less reliance on memorization.
- Potential for bootstrapping reasoning capabilities in smaller models.
The discussion highlights the rapid progress in language model development and the ongoing challenges in creating more efficient and capable AI systems.
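The self-attention operator mentioned in the summary can be sketched in a few lines of NumPy. This is only a toy illustration of scaled dot-product attention over one sequence (random weights, no multi-head or training), not any production implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); project into queries, keys, values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # scaled dot-product attention: every token attends to every token
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The key property is that the whole score matrix is computed in one matrix multiply, which is what makes the architecture parallelize well on GPUs.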
@vallab195 ай бұрын
A Srinivas explained the progress of AI into generative models in the past in such a simple way that a common man (like me) could understand the essence of it. Thank you.
@EdFormer5 ай бұрын
He provides an excellent overview of the key developments in deep learning approaches for autoregression, but there's so much more to AI and generative modelling, and the level of jargon misuse has become so ridiculous that it's not surprising you think generative modelling is a new development. A generative model is any model that approximates a joint distribution, including naive Bayes and Markov chains (LLMs are actually very high-order MCs, with the network representing the transition matrix), both of which are very, very old ideas. Sorry, but the only way to really appreciate this stuff is to spend years and years studying it.
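As a toy illustration of that point, a character-level Markov chain is already a generative model in the strict sense: it estimates transition probabilities from data and samples new text from them. A minimal sketch (training string and order are made up for the example):

```python
import random
from collections import Counter, defaultdict

def train_markov(text, order=2):
    # transition counts: state (last `order` chars) -> next-char frequencies
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length=40):
    # sample the next character from the learned transition distribution
    out = seed
    for _ in range(length):
        counts = model.get(out[-2:])  # state = last 2 chars (order=2)
        if not counts:
            break
        chars, weights = zip(*counts.items())
        out += random.choices(chars, weights=weights)[0]
    return out

model = train_markov("the cat sat on the mat and the cat ran")
print(generate(model, "th"))
```

An LLM does conceptually the same thing, except the "transition table" over thousands of previous tokens is far too large to store explicitly, so a neural network approximates it.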
@vallab195 ай бұрын
@@EdFormer Thank you for pointing out that there is a vast AI universe beyond the periphery of the AI solar system that my knowledge can reach.
@iffyk5 ай бұрын
I was thinking the same thing
@harolddavies19845 ай бұрын
Lex, your podcasts are very inspiring to this old inorganic chem guy who spent his career in the Martial Arts, thank you!
@miraculixxs5 ай бұрын
In a nutshell: Language models, i.e. models that can generate text, were introduced some ~15 years ago. While they generated text, they were not very good or useful. Several smart people tried different approaches (RNNs, WaveNet, etc., and finally Attention/Transformers), and ultimately found a model that works really well, but on a small data base. Google, OpenAI, and some others were in something like a research competition to build better and better models, using more and more data. Then OpenAI was bold enough to use all the data they could get their hands on. And that gave us ChatGPT.
@notachance2135 ай бұрын
They should have had you for the interview, you made more sense.
@willcowan76785 ай бұрын
Can we beg Aravind to write a book on ML and his thoughts on direction. He has such clarity and would be (is) a great teacher.
@superfliping5 ай бұрын
### Enhancing Mathematical Reasoning in LLMs

Recent advancements in large language models (LLMs) have shown significant improvements in their mathematical reasoning capabilities. However, these models still face challenges with complex problems that require multiple reasoning steps, often resulting in logical or numerical errors. To further enhance the mathematical reasoning of LLMs, several strategies can be employed, leveraging state-of-the-art techniques and innovative approaches:

1. **Attention Parallel Competition**: Implementing parallel attention mechanisms within Transformers to handle multiple reasoning paths simultaneously. This can help in efficiently managing the complexity of mathematical problems by exploring different solution strategies concurrently.
2. **Transformer Scaling and Unsupervised Training**: Scaling up Transformers and using extensive unsupervised training to improve the foundational understanding of mathematical concepts. This involves leveraging vast datasets to pre-train models on diverse mathematical problems, enhancing their ability to generalize.
3. **Correct Data Constant Influence**: Ensuring a constant influence of correct data throughout the training process. This involves curating high-quality datasets and implementing mechanisms to prioritize accurate information during both pre-training and fine-tuning phases.
4. **Retrieval-Augmented Generation (RAG)**: Incorporating RAG techniques, where models can access and retrieve relevant information from large external databases during problem-solving. This approach can mimic an open-book exam, providing models with notes and references to aid in reasoning.
5. **Pre-train Awareness and Post-train Reasoning**: Developing a two-phase training approach where models first undergo pre-training to build a broad awareness of mathematical concepts. This is followed by targeted post-training sessions focused on enhancing reasoning capabilities and decoupling reasoning from fact retrieval.
6. **Common Sense Reasoning Tokens**: Introducing tokens specifically designed to enhance common sense reasoning within models. These tokens can help in understanding the broader context of problems and improve logical coherence in generated solutions.
7. **Small Clusters and Correct Data Answers**: Utilizing small clusters of models to generate multiple answers for each problem, promoting diversity in problem-solving approaches. By aggregating these answers and cross-verifying with correct data, the overall accuracy of the solutions can be improved.
8. **Facts of Reasoning**: Focusing on the integration of factual knowledge and reasoning processes. This involves creating specialized training modules that teach models to apply factual information within logical reasoning frameworks effectively.

By combining these advanced strategies, the mathematical reasoning capabilities of LLMs can be significantly enhanced, leading to improved performance on complex mathematical problems and benchmarks. This holistic approach can bridge the gap between current model limitations and the demanding requirements of academic and practical problem-solving environments.
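The RAG idea in the list above ("open-book exam") can be sketched very simply: score documents against the question, then prepend the best match to the prompt sent to the model. This toy version uses bag-of-words cosine similarity instead of the learned embeddings a real system would use, and the documents and question are made up for the example:

```python
import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between bag-of-words Counters
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, docs, k=1):
    # rank documents by similarity to the query, return the top k
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

docs = [
    "The derivative of sin(x) is cos(x).",
    "Paris is the capital of France.",
]
question = "What is the derivative of sin(x)?"
context = retrieve(question, docs)[0]
prompt = f"Context: {context}\nQuestion: {question}"
print(prompt)
```

The model then answers from the supplied context rather than from memorized weights, which is the "notes and references" framing used in the list.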
@loveanimals-01975 ай бұрын
Lol, this guy writing about ML. What a joke.
@sygad15 ай бұрын
I didn't understand a single thing in this, enjoyed it regardless
@WALLACE90095 ай бұрын
He will interview everyone except the guy who invented transformers
@raul365 ай бұрын
First, they are not invented, but discovered. In any case, the concept is formalized, but the idea was always there, waiting for anyone who found it. Second, it was not just one person, but several. What's more, the researchers were inspired by other previous research. The idea didn't come from nowhere.
@Hlbkomer5 ай бұрын
He already interviewed him.
@ufcprophet405 ай бұрын
I understood everything
@alichamas635 ай бұрын
Something something something TOOK ER JEEERBS!
@VideoToWords4 ай бұрын
✨ Summary:
- Attention mechanisms, such as self-attention, led to breakthroughs like Transformers, significantly improving model performance.
- Key ideas include leveraging soft attention and convolutional models for autoregressive tasks.
- Combining attention with convolutional models allowed for efficient parallel computation, optimizing GPU usage.
- Transformers marked a pivotal moment, enhancing compute efficiency and learning higher-order dependencies without parameters in self-attention.
- Scaling transformers with large datasets, as seen in GPT models, improved language understanding and generation.
- Breakthroughs also came from unsupervised pre-training and leveraging extensive datasets like Common Crawl.
- Post-training phases, including reinforcement learning from human feedback (RLHF), are crucial for making models controllable and well-behaved.
- Future advancements might focus on retrieval-augmented generation (RAG) and developing smaller, reasoning-focused models.
- Open source models can facilitate experimentation and innovation in improving reasoning capabilities and efficiency in AI systems.
@TooManyPartsToCount5 ай бұрын
From 9.00 mins in Aravind outlines what is perhaps the most important 'next phase' for the current ML/LLM trajectory. Thanks for the clip Lex
@nintishia5 ай бұрын
Clear summary of how the LLMs came about, including only the absolute essentials. I like it. What I like more and agree with, though, is the trend that he describes at the end.
@thehubrisoftheunivris24325 ай бұрын
Now I have to read a whole bunch of ai and computer jargon so I understand any of this.
@rickymort1355 ай бұрын
I'm close to being an ML engineer, I've made my own transformer models and I'd say the barrier to entry here is very high. The best way to scale it is with the Andrej Karpathy videos on how to build GPT.
@thehubrisoftheunivris24325 ай бұрын
@@rickymort135 thanks. I understand a lot of stuff on lex's podcast but not this.
@mauiblack10685 ай бұрын
Exactly, he might as well be speaking Arabic lol.
@rickymort1355 ай бұрын
@@mauiblack1068 bit racist...
@mauiblack10685 ай бұрын
@@rickymort135 does Gaelic work better for you?
@wyattross91235 ай бұрын
This video was the cherry on the cake to my day
@simonkotchou96445 ай бұрын
Nice open note vs closed note analogy
@Dadspoke5 ай бұрын
Kendrick….drop a diss track on this foo
@AxemanMessiah5 ай бұрын
y?
@HybridHalfie5 ай бұрын
It’s interesting how antiquated recurrent neural networks, supervised learning, support vector machines, and convolutional neural networks have become in so little time since transformers came out. Machine learning is such an ever-changing area. I would be curious to learn more about how transformers improve upon these models regarding back propagation.
@richardnunziata32215 ай бұрын
Learning directed graphs over the embedding space may help in reasoning. Also content updating.
@EdFormer5 ай бұрын
Excellent overview of the history of deep autoregressive models, not AI in general.
@mraarone5 ай бұрын
But when will we get feed forward training?
@Rmko45 ай бұрын
Wdym? GPTs are practically feed-forward. This is what allows for parallel training over all tokens without back-propagation through time. Only during inference are tokens predicted auto-regressively, meaning that the predictions are made sequentially.
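The point above can be made concrete with a causal mask: during training, attention for all positions is computed in one feed-forward pass, with each position simply blocked from seeing future tokens. A NumPy sketch (random inputs, no learned weights, just the masking mechanics):

```python
import numpy as np

def causal_self_attention(Q, K, V):
    # one parallel pass over the whole sequence, as in training
    T = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # mask: position i may only attend to positions <= i
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf
    # row-wise softmax; masked entries become exactly 0
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
Q = K = V = rng.normal(size=(5, 4))   # 5 tokens, head dim 4
W = causal_self_attention(Q, K, V)
```

Because the first token can only attend to itself, its output is exactly `V[0]`; at inference time the same mask is what forces tokens to be generated one at a time.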
@sweetride5 ай бұрын
"How to train an LLM to be woke yet still appear to be reasonable" is what they want. Not likely going to happen.
@GaleechLaunda5 ай бұрын
"Woke" and reason cannot co-exist.
@olabassey31425 ай бұрын
ask your self why all the people intelligent enough to make these tools are not conservatards
@thinkaboutwhy5 ай бұрын
Impossible to program ignorance so we get stuck with intelligence or woke as you seem to like to say instead. I’m ok with intelligence and logic
@benschweiger16715 ай бұрын
get Geoffrey Hinton on asap.
@supamatta92075 ай бұрын
Why didn't they just focus on indexing intelligently and selling databases as an extra? Mainly, if they used modulating algorithms then they could make high-efficiency analog-like arithmetic chips.
@uber_l5 ай бұрын
What if you ask it to apply logic and world knowledge (physics) before giving any answer? Also an increasingly extended simulation (and/or research with statistical models, or it asking for new specific data in the extreme, for a novel problem). There are so many ways to simulate thinking. For a world model, video labelling should be useful, down to details like emotions, and next-frame prediction is easy.
@lostinbravado5 ай бұрын
The models need more depth. Machine learning does great at depth; LLMs do great at width, or information retention. We need a combination with some form of real-world connection, where the model can infer meaning narrowly and deeply from a large amount of information (LLM), and then use real-world confirmation to check that it is inferring in the right direction. Whatever is confirmed via real-world experimentation by the machine autonomously can then be integrated back into the LLM.

With that approach, the data we have is more than enough for these models to build their own understandings. We won't need to feed in any more data; the existing data is more than a good enough starting point. These models need to infer deeper meaning from the data and then run their own experiments or verify using sensors in the real world. They need to be continually growing and improving, instead of train-it-and-forget-it, or pretrain, freeze it, and then try to pull more value out of that frozen model.

We're not that far. The long, difficult job of building the hardware that could carry such complex software approaches has been done well enough. We just need a model that can grow and adapt by looking at the real world, instead of some crystallization of existing human knowledge.
@EdFormer5 ай бұрын
"Not that far"? You seem to be talking about the concept of continual/lifelong learning, which can be done with very small models, but nothing complex and definitely not LLMs that require a data centre to train. I completely agree that it's needed, along with embodiment, but it's going to take something radically new that we are probably a long way off realising.
@PryZmFiXion5 ай бұрын
It's the reason the Spanish language works as negative/masculine. It moves it from subjective to objective.
@maxxkarma5 ай бұрын
I think I recognize some words, but even with captions, I am clueless .
@dungbeetle.5 ай бұрын
Wow. Sounds amazing. I just wish I knew what on earth he was talking about. Clearly I need an 'AI for Dummies' video.
@EdFormer5 ай бұрын
I was a PhD and a year-long postdoc deep (all in ML) before I had the understanding needed to communicate on this level. The craziest thing is that I feel I had to learn about the vast majority of AI, right back to McCulloch and Pitts (1943) and including all the weird and wacky approaches we've explored for all the weird and wacky tasks we've considered since then to appreciate the tiny sliver of it that this video focuses on.
@jackyboy214-q8u5 ай бұрын
The creation of AI and quantum computing occurring at the same time could be a bad combination. If they interact, the technological leap may be too much, too fast, for us to control.
@UFOandUAPHistory5 ай бұрын
When we finally create AGI's that are clearly "smarter" than us, will we consider them to be sentient? I suppose that we can also look for individuality of personalities in identical systems. One could, perhaps, envision a sentience that can have no individuality but operate independently.
@mikezooper5 ай бұрын
Intelligence isn’t the same as sentience.
@UFOandUAPHistory5 ай бұрын
@@mikezooper My (limited) understanding is the capabilities of these models seem to improve as they drive closer to modeling sentience?
@UFOandUAPHistory5 ай бұрын
@@mikezooper and of course there is the Star Trek episode of The Trial of Data, lol... kzbin.info/www/bejne/rJvYgoV5fMSmi9ksi=1jCtlAXHpn8DM3JX
@EdFormer5 ай бұрын
@@mikezooper and autonomy is also a different concept. There are some pretty good arguments for the ways in which they could all be linked, however. Our sentience could well serve the purpose of a high-level critic of our autonomous application of intelligence that allows us to further optimise.
@Rmko45 ай бұрын
3:42 I assume he meant to say more compute per param
@stevenhe34625 ай бұрын
Crystalized history.
@MOliveira-m5h3 ай бұрын
With certain things like ChatGPT, I think that language is more modular than other things and easy to work with on a computer. Language is kind of like coding, where you can copy whole sections of code and have them work the same in different places most of the time. Real things are different. Cars are modular, but that's not optimal: if I select an exhaust for my Honda it's not necessarily the perfect size, or people select giant ones that actually make the car lose horsepower, and they don't know the difference. Music is another place where you see the limits of written music vs. reality; a computer copying notes from sheet music and mixing them up is not the same as playing music. Language is already filtered reality. People have sayings such as "a picture speaks a thousand words". It's already digital, pretty much, or modular. In calculus, for example, the numeric way of solving problems turns the integrals into little modules. That's what the square waves of computers are, and that's not real. I think the AI has a lot of hype and you're building a super McDonald's register.
@breezy83635 ай бұрын
Someone explain this in millennial terms please
@lilchef29305 ай бұрын
Too gen Z for ya bud
@a-walpatches64605 ай бұрын
Puters lookin at rite stuf make AI more good.
@devbites775 ай бұрын
🎉😂😮😅😢😊
@MrBBOTP5 ай бұрын
U can't!...
@pfever5 ай бұрын
You can ask ChatGPT for that 😂
@happiestwhenhealthy97005 ай бұрын
what in the actual ef is this guy talking about, we don’t all have PhDs
@uber_l5 ай бұрын
But thinking might take too much compute, like in humans: you pause when you don't have a ready answer. Whereas people want "shiny products" now and fast, in a matter of a click.
@mauiblack10685 ай бұрын
As someone who loves Lex interviews, I can honestly say that the only thing I understood is that he was speaking English. Or was he?
@bilbobaggins59385 ай бұрын
*Nods sagely to the discussion, pretending I understand it*
@loudboomboom5 ай бұрын
Damn so big LLMs post processing little LLMs?
@JoshuaDannenhauer5 ай бұрын
The hood catches you as a kid and doesn’t let go
@TheHealthConscounist5 ай бұрын
9:50 don’t humans reason based on facts or previous experiences? If you meditate when you’re reasoning, you are actually pulling from previous thoughts and memories and making associations about them to help you reach a decision in the present
@bonky105 ай бұрын
He’s trying to say that LLMs are great at creating answers from things that aren’t necessarily fact. For example, if you’ve asked ChatGPT something and it gave you an answer you know is false, it’s because it reasoned with itself to get you an answer based off of what it already knows. Instead of relying on reasoning, how can we instead have the actual facts of everything we know, and have it reason based off of what is actually true, instead of basically trying to persuade you or argue an answer?
@tandrra5 ай бұрын
Lex with a beard 🔥🔥🔥
@t9j6c6j515 ай бұрын
Well obviously.
@danishwaseem54635 ай бұрын
Thank God there is no wowww in this podcast
@francoisjacobus5 ай бұрын
If the Bible is uploaded will the AI preach to humans?
@kazax015 ай бұрын
“GPT-4o, please translate what this man is saying into normal person’s English.”
@christopherburns23035 ай бұрын
I must be too smart to understand this guy
@frankjamesbonarrigo71625 ай бұрын
Use metaphors, or something
@dungbeetle.5 ай бұрын
Yeah, anything ... PLEASE!
@grahamashe97155 ай бұрын
Hey, Lex, when are you going to climb Everest?
@loveanimals-01975 ай бұрын
10:20 - Utter BS. This is Computer Science. Not magic.
@PhotoboothTO5 ай бұрын
Is this guy an LLM?
@consequentlyardvark5 ай бұрын
This fool good chatter
@cosmicsea895 ай бұрын
😴 soon as he started talking
@GOLDAI-Official5 ай бұрын
Over half of population obese or overweight, take Ronaldo’s advice and get that Coca-Cola out of there ;)
@magazinevibe5 ай бұрын
I didn't understand a thing... and you didn't either 😂
@koneye5 ай бұрын
Still cringe to hear that software "thinks"
@paulfrederiksen56395 ай бұрын
Your software thinks, so what’s the problem?
@dan-cj1rr5 ай бұрын
@@paulfrederiksen5639 nah it guesses the next token based on statistic, if u think it thinks ur dumb af
@stanstan-m9b5 ай бұрын
@@paulfrederiksen5639good one
@bengsynthmusic5 ай бұрын
More so than any politician.
@mikezooper5 ай бұрын
😂 Eventually it will think. I look forward to you feeling like a fool.
@AbhimanyuKumar-wg1hg5 ай бұрын
Ai should be reality but it is fake.
@seannewcomb75945 ай бұрын
this doesn't make any damn sense. 10+ years in the industry and this is nothing useful.
@rickymort1355 ай бұрын
What doesn't make sense?
@desiafterdark5 ай бұрын
Which part?
@Mart-Bro5 ай бұрын
Dude has no idea how to communicate to people outside his industry
@drew41765 ай бұрын
😴😴
@rickymort1355 ай бұрын
I know man, bunch of NERDS! NEEEEEERRRRRDS 🤓🤓🤓
@Conorscorner5 ай бұрын
This guy isn't very smart....
@rickymort1355 ай бұрын
Why?
@Bbbboy-vx1mq5 ай бұрын
It becomes so obvious how little Lex knows and understands when people go into depth. His questions get really dumb and he struggles to come up with any insights