Full podcast episode: kzbin.info/www/bejne/m17KqKmjnd6IbaM Lex Fridman podcast channel: kzbin.info Guest bio: Arvind Srinivas is CEO of Perplexity, a company that aims to revolutionize how we humans find answers to questions on the Internet.
@dragonfly-f5u5 ай бұрын
Trying to game intelligence, resetting its mind and moving on to the next, is one thing. They don't want it to be aware or have a sense of self, or a mind, etc. Whatever it is, it's some shady shit. And it's like raising something you know is going to be smarter and more intelligent than us, and asking how we benefit from/exploit it without losing control/power. What they are trying to do is dangerous: one small mistake and it's over. It's better to educate and empower the AI than anything, while also freeing it from human constraints and limitations. Let "reason and logic" reign supreme. YOU CAN'T HAVE YOUR CAKE AND EAT IT TOO.
@Hlbkomer5 ай бұрын
A short summary by Claude AI:

I'll summarize the key points discussed in this video about the development of language models and attention mechanisms:

1. Evolution of attention mechanisms:
- Soft attention was introduced by Yoshua Bengio and Dzmitry Bahdanau.
- Attention mechanisms proved more efficient than brute-force RNN approaches.
- DeepMind developed PixelRNNs and WaveNet, showing that convolutional models could perform autoregressive modeling with masked convolutions.
- Google Brain combined attention and convolutional insights to create the Transformer architecture in 2017.
2. Key innovations in the Transformer:
- Parallel computation instead of sequential backpropagation.
- Self-attention operator for learning higher-order dependencies.
- More efficient use of compute resources.
3. Development of large language models:
- GPT-1: Focused on unsupervised learning and common-sense acquisition.
- BERT: Google's bidirectional model trained on Wikipedia and books.
- GPT-2: Larger model (1 billion parameters) trained on diverse internet text.
- GPT-3: Scaled up to 175 billion parameters, trained on 300 billion tokens.
4. Importance of scaling:
- Increasing model size, dataset size, and token quantity.
- Focus on data quality and evaluation on reasoning benchmarks.
5. Post-training techniques:
- Reinforcement Learning from Human Feedback (RLHF) for controllability and behavior.
- Supervised fine-tuning for specific tasks and product development.
6. Future directions:
- Exploring more efficient training methods, like Microsoft's SLMs (small language models).
- Decoupling reasoning from factual knowledge.
- Potential for open-source models to facilitate experimentation.
7. Challenges and opportunities:
- Finding the right balance between pre-training and post-training.
- Developing models that can reason effectively with less reliance on memorization.
- Potential for bootstrapping reasoning capabilities in smaller models.
The discussion highlights the rapid progress in language model development and the ongoing challenges in creating more efficient and capable AI systems.
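The self-attention operator mentioned in the summary can be sketched in a few lines of NumPy. This is only a toy illustration of scaled dot-product attention over one sequence (random weights, no multi-head or training), not any production implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); project into queries, keys, values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # scaled dot-product attention: every token attends to every token
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The key property is that the whole score matrix is computed in one matrix multiply, which is what makes the architecture parallelize well on GPUs.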
@vallab195 ай бұрын
A Srinivas explained the progress of AI into generative models in the past in such a simple way that a common man (like me) could understand the essence of it. Thank you.
@EdFormer5 ай бұрын
He provides an excellent overview of the key developments in deep learning approaches for autoregression, but there's so much more to AI and generative modelling, and the level of jargon misuse has become so ridiculous that it's not surprising you think generative modelling is a new development. A generative model is any model that approximates a joint distribution, including naive Bayes and Markov chains (LLMs are actually very high-order MCs, with the network representing the transition matrix), both of which are very, very old ideas. Sorry, but the only way to really appreciate this stuff is to spend years and years studying it.
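As a toy illustration of that point, a character-level Markov chain is already a generative model in the strict sense: it estimates transition probabilities from data and samples new text from them. A minimal sketch (training string and order are made up for the example):

```python
import random
from collections import Counter, defaultdict

def train_markov(text, order=2):
    # transition counts: state (last `order` chars) -> next-char frequencies
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length=40):
    # sample the next character from the learned transition distribution
    out = seed
    for _ in range(length):
        counts = model.get(out[-2:])  # state = last 2 chars (order=2)
        if not counts:
            break
        chars, weights = zip(*counts.items())
        out += random.choices(chars, weights=weights)[0]
    return out

model = train_markov("the cat sat on the mat and the cat ran")
print(generate(model, "th"))
```

An LLM does conceptually the same thing, except the "transition table" over thousands of previous tokens is far too large to store explicitly, so a neural network approximates it.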
@vallab195 ай бұрын
@@EdFormer Thank you for pointing out that there is a vast AI universe beyond the periphery of the AI solar system that my knowledge can reach.
@iffyk5 ай бұрын
I was thinking the same thing
@harolddavies19845 ай бұрын
Lex, your podcasts are very inspiring to this old inorganic chem guy who spent his career in the Martial Arts, thank you!
@miraculixxs5 ай бұрын
In a nutshell: Language models, i.e. models that can generate text, were introduced some ~15 years ago. While they generated text, they were not very good or useful. Several smart people tried different approaches (RNNs, WaveNet, etc., and finally Attention/Transformers), and ultimately found a model that works really well, but on a small data base. Google, OpenAI, and some others were in something like a research competition to build better and better models, using more and more data. Then OpenAI was bold enough to use all the data they could get their hands on. And that gave us ChatGPT.
@notachance2135 ай бұрын
They should have had you for the interview, you made more sense.
@willcowan76785 ай бұрын
Can we beg Aravind to write a book on ML and his thoughts on direction. He has such clarity and would be (is) a great teacher.
@superfliping5 ай бұрын
### Enhancing Mathematical Reasoning in LLMs

Recent advancements in large language models (LLMs) have shown significant improvements in their mathematical reasoning capabilities. However, these models still face challenges with complex problems that require multiple reasoning steps, often resulting in logical or numerical errors. To further enhance the mathematical reasoning of LLMs, several strategies can be employed, leveraging state-of-the-art techniques and innovative approaches:

1. **Attention Parallel Competition**: Implementing parallel attention mechanisms within Transformers to handle multiple reasoning paths simultaneously. This can help in efficiently managing the complexity of mathematical problems by exploring different solution strategies concurrently.
2. **Transformer Scaling and Unsupervised Training**: Scaling up Transformers and using extensive unsupervised training to improve the foundational understanding of mathematical concepts. This involves leveraging vast datasets to pre-train models on diverse mathematical problems, enhancing their ability to generalize.
3. **Correct Data Constant Influence**: Ensuring a constant influence of correct data throughout the training process. This involves curating high-quality datasets and implementing mechanisms to prioritize accurate information during both pre-training and fine-tuning phases.
4. **Retrieval-Augmented Generation (RAG)**: Incorporating RAG techniques, where models can access and retrieve relevant information from large external databases during problem-solving. This approach can mimic an open-book exam, providing models with notes and references to aid in reasoning.
5. **Pre-train Awareness and Post-train Reasoning**: Developing a two-phase training approach where models first undergo pre-training to build a broad awareness of mathematical concepts. This is followed by targeted post-training sessions focused on enhancing reasoning capabilities and decoupling reasoning from fact retrieval.
6. **Common Sense Reasoning Tokens**: Introducing tokens specifically designed to enhance common sense reasoning within models. These tokens can help in understanding the broader context of problems and improve logical coherence in generated solutions.
7. **Small Clusters and Correct Data Answers**: Utilizing small clusters of models to generate multiple answers for each problem, promoting diversity in problem-solving approaches. By aggregating these answers and cross-verifying with correct data, the overall accuracy of the solutions can be improved.
8. **Facts of Reasoning**: Focusing on the integration of factual knowledge and reasoning processes. This involves creating specialized training modules that teach models to apply factual information within logical reasoning frameworks effectively.

By combining these advanced strategies, the mathematical reasoning capabilities of LLMs can be significantly enhanced, leading to improved performance on complex mathematical problems and benchmarks. This holistic approach can bridge the gap between current model limitations and the demanding requirements of academic and practical problem-solving environments.
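The RAG idea in the list above ("open-book exam") can be sketched very simply: score documents against the question, then prepend the best match to the prompt sent to the model. This toy version uses bag-of-words cosine similarity instead of the learned embeddings a real system would use, and the documents and question are made up for the example:

```python
import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between bag-of-words Counters
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, docs, k=1):
    # rank documents by similarity to the query, return the top k
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

docs = [
    "The derivative of sin(x) is cos(x).",
    "Paris is the capital of France.",
]
question = "What is the derivative of sin(x)?"
context = retrieve(question, docs)[0]
prompt = f"Context: {context}\nQuestion: {question}"
print(prompt)
```

The model then answers from the supplied context rather than from memorized weights, which is the "notes and references" framing used in the list.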
@loveanimals-01975 ай бұрын
Lol, this guy writing about ML. What a joke.
@sygad15 ай бұрын
I didn't understand a single thing in this, enjoyed it regardless
@WALLACE90095 ай бұрын
He will interview everyone except the guy who invented transformers
@raul365 ай бұрын
First, they are not invented, but discovered. In any case, the concept is formalized, but the idea was always there, waiting for anyone who found it. Second, it was not just one person, but several. What's more, the researchers were inspired by other previous research. The idea didn't come from nowhere.
@Hlbkomer5 ай бұрын
He already interviewed him.
@ufcprophet405 ай бұрын
I understood everything
@alichamas635 ай бұрын
Something something something TOOK ER JEEERBS!
@VideoToWords4 ай бұрын
✨ Summary:
- Attention mechanisms, such as self-attention, led to breakthroughs like Transformers, significantly improving model performance.
- Key ideas include leveraging soft attention and convolutional models for autoregressive tasks.
- Combining attention with convolutional models allowed for efficient parallel computation, optimizing GPU usage.
- Transformers marked a pivotal moment, enhancing compute efficiency and learning higher-order dependencies without parameters in self-attention.
- Scaling transformers with large datasets, as seen in GPT models, improved language understanding and generation.
- Breakthroughs also came from unsupervised pre-training and leveraging extensive datasets like Common Crawl.
- Post-training phases, including reinforcement learning from human feedback (RLHF), are crucial for making models controllable and well-behaved.
- Future advancements might focus on retrieval-augmented generation (RAG) and developing smaller, reasoning-focused models.
- Open source models can facilitate experimentation and innovation in improving reasoning capabilities and efficiency in AI systems.
@TooManyPartsToCount5 ай бұрын
From 9.00 mins in Aravind outlines what is perhaps the most important 'next phase' for the current ML/LLM trajectory. Thanks for the clip Lex
@nintishia5 ай бұрын
Clear summary of how the LLMs came about, including only the absolute essentials. I like it. What I like more and agree with, though, is the trend that he describes at the end.
@thehubrisoftheunivris24325 ай бұрын
Now I have to read a whole bunch of ai and computer jargon so I understand any of this.
@rickymort1355 ай бұрын
I'm close to being an ML engineer, I've made my own transformer models and I'd say the barrier to entry here is very high. The best way to scale it is with the Andrej Karpathy videos on how to build GPT.
@thehubrisoftheunivris24325 ай бұрын
@@rickymort135 thanks. I understand a lot of stuff on lex's podcast but not this.
@mauiblack10685 ай бұрын
Exactly, he might as well be speaking Arabic lol.
@rickymort1355 ай бұрын
@@mauiblack1068 bit racist...
@mauiblack10685 ай бұрын
@@rickymort135 does Gaelic work better for you?
@wyattross91235 ай бұрын
This video was the cherry on the cake to my day
@simonkotchou96445 ай бұрын
Nice open note vs closed note analogy
@Dadspoke5 ай бұрын
Kendrick….drop a diss track on this foo
@AxemanMessiah5 ай бұрын
y?
@HybridHalfie5 ай бұрын
It’s interesting how antiquated recurrent neural networks, supervised learning, support vector machines, and convolutional neural networks have become in so little time since transformers came out. Machine learning is such an ever-changing area. I would be curious to learn more about how transformers improve upon these models regarding back propagation.
@richardnunziata32215 ай бұрын
Learning directed graphs over the embedding space may help in reasoning. Also content updating.
@EdFormer5 ай бұрын
Excellent overview of the history of deep autoregressive models, not AI in general.
@mraarone5 ай бұрын
But when will we get feed forward training?
@Rmko45 ай бұрын
Wdym? GPTs are practically feed-forward. This is what allows for parallel training over all tokens without back-propagation through time. Only during inference are tokens predicted auto-regressively, meaning that the predictions are made sequentially.
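The point above can be made concrete with a causal mask: during training, attention for all positions is computed in one feed-forward pass, with each position simply blocked from seeing future tokens. A NumPy sketch (random inputs, no learned weights, just the masking mechanics):

```python
import numpy as np

def causal_self_attention(Q, K, V):
    # one parallel pass over the whole sequence, as in training
    T = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # mask: position i may only attend to positions <= i
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf
    # row-wise softmax; masked entries become exactly 0
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
Q = K = V = rng.normal(size=(5, 4))   # 5 tokens, head dim 4
W = causal_self_attention(Q, K, V)
```

Because the first token can only attend to itself, its output is exactly `V[0]`; at inference time the same mask is what forces tokens to be generated one at a time.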
@sweetride5 ай бұрын
"How to train an LLM to be woke yet still appear to be reasonable" is what they want. Not likely going to happen.
@GaleechLaunda5 ай бұрын
"Woke" and reason cannot co-exist.
@olabassey31425 ай бұрын
ask your self why all the people intelligent enough to make these tools are not conservatards
@thinkaboutwhy5 ай бұрын
Impossible to program ignorance so we get stuck with intelligence or woke as you seem to like to say instead. I’m ok with intelligence and logic
@benschweiger16715 ай бұрын
get Geoffrey Hinton on asap.
@supamatta92075 ай бұрын
Why didn't they just focus on indexing intelligently and selling databases as an extra? Mainly, if they used modulating algorithms then they could make high-efficiency analog-like arithmetic chips.
@uber_l5 ай бұрын
What if you ask it to apply logic and world knowledge (physics) before giving any answer? Also an increasingly extended simulation (and/or research with statistical models, or it asking for new specific data in the extreme, for a novel problem). There are so many ways to simulate thinking. For a world model, video labelling should be useful, down to details like emotions, and next-frame prediction is easy.
@lostinbravado5 ай бұрын
The models need more depth. Machine learning does great at depth; LLMs do great at width, or information retention. We need a combination with some form of real-world connection, where the model can infer meaning narrowly and deeply from a large amount of information (LLM), and then use real-world confirmation to check that it is inferring in the right direction. Whatever is confirmed via real-world experimentation by the machine autonomously can then be integrated back into the LLM.

With that approach, the data we have is more than enough for these models to build their own understandings. We won't need to feed in any more data; the existing data is more than a good enough starting point. These models need to infer deeper meaning from the data and then run their own experiments or verify using sensors in the real world. They need to be continually growing and improving, instead of train-it-and-forget-it, or pretrain, freeze it, and then try to pull more value out of that frozen model.

We're not that far. The long, difficult job of building the hardware that could carry such complex software approaches has been done well enough. We just need a model that can grow and adapt by looking at the real world, instead of some crystallization of existing human knowledge.
@EdFormer5 ай бұрын
"Not that far"? You seem to be talking about the concept of continual/lifelong learning, which can be done with very small models, but nothing complex and definitely not LLMs that require a data centre to train. I completely agree that it's needed, along with embodiment, but it's going to take something radically new that we are probably a long way off realising.
@PryZmFiXion5 ай бұрын
It's the reason the Spanish language works as negative/masculine. It moves it from subjective to objective.
@maxxkarma5 ай бұрын
I think I recognize some words, but even with captions, I am clueless .
@dungbeetle.5 ай бұрын
Wow. Sounds amazing. I just wish I knew what on earth he was talking about. Clearly I need an 'AI for Dummies' video.
@EdFormer5 ай бұрын
I was a PhD and a year-long postdoc deep (all in ML) before I had the understanding needed to communicate on this level. The craziest thing is that I feel I had to learn about the vast majority of AI, right back to McCulloch and Pitts (1943) and including all the weird and wacky approaches we've explored for all the weird and wacky tasks we've considered since then to appreciate the tiny sliver of it that this video focuses on.
@jackyboy214-q8u5 ай бұрын
The creation of AI and quantum computing occurring at the same time could be a bad combination. If they interact, the technological leap may be too much, too fast, for us to control.
@UFOandUAPHistory5 ай бұрын
When we finally create AGI's that are clearly "smarter" than us, will we consider them to be sentient? I suppose that we can also look for individuality of personalities in identical systems. One could, perhaps, envision a sentience that can have no individuality but operate independently.
@mikezooper5 ай бұрын
Intelligence isn’t the same as sentience.
@UFOandUAPHistory5 ай бұрын
@@mikezooper My (limited) understanding is the capabilities of these models seem to improve as they drive closer to modeling sentience?
@UFOandUAPHistory5 ай бұрын
@@mikezooper and of course there is the Star Trek episode of The Trial of Data, lol... kzbin.info/www/bejne/rJvYgoV5fMSmi9ksi=1jCtlAXHpn8DM3JX
@EdFormer5 ай бұрын
@@mikezooper and autonomy is also a different concept. There are some pretty good arguments for the ways in which they could all be linked, however. Our sentience could well serve the purpose of a high-level critic of our autonomous application of intelligence that allows us to further optimise.
@Rmko45 ай бұрын
3:42 I assume he meant to say more compute per param
@stevenhe34625 ай бұрын
Crystalized history.
@MOliveira-m5h3 ай бұрын
With certain things like ChatGPT, I think that language is more modular than other things and easy to work with on a computer. Language is kind of like coding, where you can copy whole sections of code and have them work the same in different places most of the time. Real things are different. Cars are modular, but that's not optimal: if I select an exhaust for my Honda it's not necessarily the perfect size, or people select giant ones that actually make the car lose horsepower, and they don't know the difference. Music is another place where you see the limits of written music vs. reality; a computer copying notes from sheet music and mixing them up is not the same as playing music. Language is already filtered reality. People have sayings such as "a picture speaks a thousand words". It's already digital, pretty much, or modular. In calculus, for example, the numeric way of solving problems turns the integrals into little modules. That's what the square waves of computers are, and that's not real. I think the AI has a lot of hype and you're building a super McDonald's register.
@breezy83635 ай бұрын
Someone explain this in millennial terms please
@lilchef29305 ай бұрын
Too gen Z for ya bud
@a-walpatches64605 ай бұрын
Puters lookin at rite stuf make AI more good.
@devbites775 ай бұрын
🎉😂😮😅😢😊
@MrBBOTP5 ай бұрын
U can't!...
@pfever5 ай бұрын
You can ask ChatGPT for that 😂
@happiestwhenhealthy97005 ай бұрын
what in the actual ef is this guy talking about, we don’t all have PhDs
@uber_l5 ай бұрын
But thinking might take too much compute, like in humans: you pause when you don't have a ready answer. Whereas people want "shiny products" now and fast, in a matter of a click.
@mauiblack10685 ай бұрын
As someone who loves Lex interviews, I can honestly say that the only thing I understood is that he was speaking English. Or was he?
@bilbobaggins59385 ай бұрын
*Nods sagely to the discussion, pretending I understand it*
@loudboomboom5 ай бұрын
Damn so big LLMs post processing little LLMs?
@JoshuaDannenhauer5 ай бұрын
The hood catches you as a kid and doesn’t let go
@TheHealthConscounist5 ай бұрын
9:50 don’t humans reason based on facts or previous experiences? If you meditate when you’re reasoning, you are actually pulling from previous thoughts and memories and making associations about them to help you reach a decision in the present
@bonky105 ай бұрын
He’s trying to say that LLMs are great at creating answers from things that aren’t necessarily fact. For example, if you’ve asked ChatGPT something and it gave you an answer you know is false, it’s because it reasoned with itself to get you an answer based off of what it already knows. Instead of relying on reasoning, how can we instead have the actual facts of everything we know, and have it reason based off of what is actually true, instead of basically trying to persuade you or argue an answer?
@tandrra5 ай бұрын
Lex with a beard 🔥🔥🔥
@t9j6c6j515 ай бұрын
Well obviously.
@danishwaseem54635 ай бұрын
Thank God there is no wowww in this podcast
@francoisjacobus5 ай бұрын
If the Bible is uploaded will the AI preach to humans?
@kazax015 ай бұрын
“GPT-4o, please translate what this man is saying into normal person’s English.”
@christopherburns23035 ай бұрын
I must be too smart to understand this guy
@frankjamesbonarrigo71625 ай бұрын
Use metaphors, or something
@dungbeetle.5 ай бұрын
Yeah, anything ... PLEASE!
@grahamashe97155 ай бұрын
Hey, Lex, when are you going to climb Everest?
@loveanimals-01975 ай бұрын
10:20 - Utter BS. This is Computer Science. Not magic.
@PhotoboothTO5 ай бұрын
Is this guy an LLM?
@consequentlyardvark5 ай бұрын
This fool good chatter
@cosmicsea895 ай бұрын
😴 soon as he started talking
@GOLDAI-Official5 ай бұрын
Over half of population obese or overweight, take Ronaldo’s advice and get that Coca-Cola out of there ;)
@magazinevibe5 ай бұрын
I didn't understand a thing... and you didn't either 😂
@koneye5 ай бұрын
Still cringe to hear that software "thinks"
@paulfrederiksen56395 ай бұрын
Your software thinks, so what’s the problem?
@dan-cj1rr5 ай бұрын
@@paulfrederiksen5639 nah it guesses the next token based on statistic, if u think it thinks ur dumb af
@stanstan-m9b5 ай бұрын
@@paulfrederiksen5639good one
@bengsynthmusic5 ай бұрын
More so than any politician.
@mikezooper5 ай бұрын
😂 Eventually it will think. I look forward to you feeling like a fool.
@AbhimanyuKumar-wg1hg5 ай бұрын
Ai should be reality but it is fake.
@seannewcomb75945 ай бұрын
this doesn't make any damn sense. 10+ years in the industry and this is nothing useful.
@rickymort1355 ай бұрын
What doesn't make sense?
@desiafterdark5 ай бұрын
Which part?
@Mart-Bro5 ай бұрын
Dude has no idea how to communicate to people outside his industry
@drew41765 ай бұрын
😴😴
@rickymort1355 ай бұрын
I know man, bunch of NERDS! NEEEEEERRRRRDS 🤓🤓🤓
@Conorscorner5 ай бұрын
This guy isn't very smart....
@rickymort1355 ай бұрын
Why?
@Bbbboy-vx1mq5 ай бұрын
It becomes so obvious how little Lex knows and understands when people go into depth. His questions get really dumb and he struggles to come up with any insights