Stable Reasoning in LLMs: A Novel Evaluation Metric and Benchmark

Views: 10

AI Papers Podcast Daily

A day ago
