Using recurrence to achieve weak to strong generalization

Simons Institute

Tom Goldstein (University of Maryland)
simons.berkele...
Transformers as a Computational Model
Weak-to-strong generalization refers to the ability of a reasoning model to solve "harder" problems than those in its training set. I'll argue that recurrent architectures, in which networks can dynamically scale the amount of computation used to solve a problem, are necessary to achieve dramatic weak-to-strong behavior. I'll present examples where recurrent networks exhibit weak-to-strong generalization on a range of simple reasoning problems. Then I'll show that transformer-based LLMs benefit from recurrence as well, boosting their performance on weak-to-strong arithmetic tasks.
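The core idea in the abstract, that a fixed set of weights applied recurrently can solve larger problems simply by running more iterations, can be illustrated with a toy analogy. This is not the speaker's model; `recurrent_add` is a hypothetical pure-Python sketch in which one fixed local rule (xor for the carry-free sum, shifted AND for the carry), iterated more times, correctly adds arbitrarily large integers:

```python
def recurrent_add(a: int, b: int, max_steps: int = 64) -> int:
    """Add two non-negative integers by iterating one fixed local rule.

    Each step applies the same 'weights': xor gives the carry-free sum,
    and the shifted AND gives the carry to propagate. Longer operands
    need no new rule, only more iterations of the same one -- a toy
    analogue of dynamically scaling computation at test time.
    """
    for _ in range(max_steps):
        if b == 0:          # no carries left: the sum is complete
            return a
        a, b = a ^ b, (a & b) << 1
    return a

print(recurrent_add(5, 7))        # small inputs converge in a few steps
print(recurrent_add(2**40, 2**40))  # larger inputs just take more steps
```

The "weak-to-strong" flavor comes from the fact that the rule itself never changes: a model that has learned the per-step rule on short inputs can, in principle, handle longer ones by looping longer, whereas a fixed-depth network is capped at the problem sizes its depth can cover.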
