63,697 views
Learn about LLMs and more at ► brilliant.org/TreforBazett to get started for free for 30 days and to get 20% off an annual premium subscription!
In this video we'll look at just how good Large Language Models (LLMs) like ChatGPT 4o, Claude 3.5, and Google's Gemini are at mathematics. I'll cite results from the literature on benchmark datasets such as GSM8k and MATH, and we'll work through several math examples along the way. References below.
0:00 How to measure AI at math?
0:56 GSM8k and GSM-Hard
2:44 The MATH Database
4:43 ChatGPT 4o vs Gemini vs Claude 3.5 Sonnet
6:13 My Linear Algebra Exams
8:32 Computational Engines
10:34 Brilliant.org/TreforBazett
References and Citations:
*GSM8k (including graphic at 1:10): paperswithcode.com/sota/arith...
*GSM-Hard stats found in here: arxiv.org/abs/2406.07394
*Google DeepMind paper citing the MATH database: arxiv.org/pdf/2406.06592
*I first saw the question about the smallest integer here: x.com/ericneyman/status/18041...
*Math Olympiad level problems (5:30): arxiv.org/abs/2406.07394
*Stats for Claude 3.5: www.anthropic.com/news/claude...
*Image of two calculators at 2:30 shared under CC BY-SA 3.0; original here: www.wikidata.org/wiki/Q166882...
BECOME A MEMBER:
►Join: / @drtrefor
MATH BOOKS I LOVE (affiliate link):
► www.amazon.com/shop/treforbazett
COURSE PLAYLISTS:
►DISCRETE MATH: • Discrete Math (Full Co...
►LINEAR ALGEBRA: • Linear Algebra (Full C...
►CALCULUS I: • Calculus I (Limits, De...
►CALCULUS II: • Calculus II (Integrati...
►MULTIVARIABLE CALCULUS (Calc III): • Calculus III: Multivar...
►VECTOR CALCULUS (Calc IV): • Calculus IV: Vector Ca...
►DIFFERENTIAL EQUATIONS: • Ordinary Differential ...
►LAPLACE TRANSFORM: • Laplace Transforms and...
►GAME THEORY: • Game Theory
OTHER PLAYLISTS:
►Learning Math Series: • 5 Tips To Make Math Pr...
►Cool Math Series: • Cool Math Series
SOCIALS:
►X/Twitter: X.com/treforbazett
►TikTok: / drtrefor
►Instagram (photography based): / treforphotography