Learning to Reason with LLMs

10,895 views

Simons Institute

Days ago

Noam Brown (OpenAI)
simons.berkele...
Transformers as a Computational Model
Large language models (LLMs) have demonstrated remarkable capabilities in generating coherent text and completing various natural language tasks. Nevertheless, their ability to perform complex, general reasoning has remained limited. In this talk, I will describe OpenAI's new o1 model, an LLM trained via reinforcement learning to generate a hidden chain of thought before its response. We have found that the performance of o1 consistently improves with more reinforcement learning compute and with more inference compute. o1 surpasses previous state-of-the-art models in a variety of benchmarks that require reasoning, including mathematics competitions, programming contests, and advanced science question sets. I will discuss the implications of scaling this paradigm even further.

Comments: 9
@labsanta 2 months ago
00:03 Learning from an AI expert on reasoning and game playing
02:06 Evolution of poker bots and model scaling in competitions
06:20 Improving search scalability in poker algorithms
08:30 Use of search and planning in improving poker bot performance
12:36 Search algorithm made a 100,000x difference in poker and Go
14:44 Scaling up models for performance improvement
18:42 Consensus or majority voting can improve performance on exams using GPT-4
20:35 Scalability of inference compute leads to significant performance improvements
24:52 Machine learning moving towards reasoning-based models
26:42 Deciphering difficult codes using reasoning
30:41 Games provide ground truth for verifying winning states
32:40 Different approaches to allocating compute between model training and testing impact Elo rating
36:28 Effective algorithms leverage increased compute for long-term success
38:28 OpenAI launching a new multi-agent reasoning team and hiring strong engineers for research
42:35 Significant impact of scaling up inference compute
44:26 Need for restructuring academic research
48:25 Exploring different approaches for inference compute
50:11 Introducing controllable thinking time for more effective reasoning
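The consensus (majority voting) idea noted at 18:42 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the method used for o1 or shown in the talk: `noisy_solver` is a hypothetical stand-in for one sampled model answer, and the 40% per-sample accuracy and the answer strings are made-up values.

```python
import random
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def noisy_solver(correct: str, wrong: List[str], p_correct: float,
                 rng: random.Random) -> str:
    """Hypothetical stand-in for a single sampled model answer."""
    if rng.random() < p_correct:
        return correct
    return rng.choice(wrong)


def accuracy(num_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the majority answer is the correct one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumed setup: right 40% of the time, otherwise one of three wrong answers.
        answers = [noisy_solver("42", ["41", "43", "44"], 0.4, rng)
                   for _ in range(num_samples)]
        hits += majority_vote(answers) == "42"
    return hits / trials


if __name__ == "__main__":
    # More samples per question means more inference compute per question.
    for n in (1, 5, 25):
        print(n, accuracy(n))
```

In this toy setup the correct answer is the single most likely one, so drawing more samples per question makes the vote more reliable. Voting does not help when a wrong answer is more likely than the right one.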
@jasongrig 25 days ago
good presentation
@user-wr4yl7tx3w 2 months ago
But what does it mean for a model to think? Think like humans? “I think, therefore I am” type of think. Think is not defined.
@souvikbhattacharyya2480 1 month ago
24:54 sad
@athalais9332 2 months ago
Originally wrote a comment as a review of the talk, but when I re-read it, it felt a bit too mean. Instead, just have my personal recommendation that if you're limited on time, the other talks in this workshop are a great watch!
@nicksohacki7114 2 months ago
?
@randomuser5237 2 months ago
Literally no one cares about your review or recommendation, stop spamming the comment section.
@user-wr4yl7tx3w 2 months ago
It’s not mean if you provide valid reasons. Otherwise, it’s called woke. This is science. Not a tea party.
@islandfireballkill 2 months ago
Here is a meta-review that isn't afraid to be mean. Your review sucks and contains little information, none of it useful. It's like it was generated by some highly censored LLM.