Which Tokens You Predict Underlie the Reversal Curse and More

1,589 views

Tunadorable

1 day ago

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
arxiv.org/abs/...
The other video of mine that I mentioned:
• The Pitfalls of Next T...
Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
/ tunadorable
account.venmo....
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tuna...

Comments: 19
@andrewsilber 3 months ago
That finding is mildly disconcerting. Doesn't it imply that even at the higher layers of abstraction it doesn't glean the concept of identity or simple IS-A relationships? If that's the case, then what else *isn't* it understanding?
@mickelodiansurname9578 3 months ago
Harrison Kinsley mentioned this when GPT-3.5 was released: he first asked it "Who is Harrison Kinsley?" and it did not know, but when he asked it "Who is SentDex?" it mentioned it's a channel run by Harrison Kinsley. So it's probably safe to assume it's a reversal curse.
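That anecdote is the standard reversal-curse probe: query the same fact in both directions and compare. Below is a minimal sketch of such a probe, assuming a hypothetical score(prompt, answer) function that would return a causal LM's log-probability of the answer; it is stubbed out here so the sketch runs.

```python
# A minimal sketch of a reversal-curse probe. `score` is a hypothetical
# stand-in for a real LM log-probability call, stubbed so this runs.

def score(prompt: str, answer: str) -> float:
    """Stand-in for a real LM scorer; swap in your model of choice."""
    return 0.0  # placeholder value

# One fact, phrased in the training order and in the reversed order.
forward = score("Who runs the channel SentDex?", "Harrison Kinsley")
backward = score("What channel does Harrison Kinsley run?", "SentDex")

# The reversal curse predicts forward >> backward for facts the model
# only ever saw in one direction during training.
print(f"forward: {forward:.2f}  backward: {backward:.2f}")
```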
@OpenSourceAnarchist 3 months ago
12:22 I actually really appreciate the quick breakdowns. I've been learning all about the guts of neural networks and the math behind them, but only in an ad hoc way through YouTube (cog sci background). The in-line banter and commentary is wonderful :)
@jakeaustria5445 3 months ago
I don't know yet how masking works; I still need to study that one. But great video as always. I didn't know the Reversal Curse was a thing before this vid.
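For readers in the same spot, here is a minimal sketch (my own illustration, not the paper's code) contrasting the two masking styles at play here: causal attention masking for next-token prediction versus MLM-style token corruption.

```python
# A minimal sketch contrasting two masking styles.
import random

random.seed(0)

# 1) Causal mask: token i may only attend to positions j <= i,
#    so the model can only predict left to right.
T = 6
causal = [[1 if j <= i else 0 for j in range(T)] for i in range(T)]
for row in causal:
    print(row)

# 2) MLM-style corruption: hide a random subset of tokens; the model
#    must reconstruct them from context on *both* sides.
tokens = ["the", "capital", "of", "France", "is", "Paris"]
masked = [t if random.random() > 0.3 else "[MASK]" for t in tokens]
print(masked)
```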
@kevon217 2 months ago
Lovin' what you're laying down on these paper overviews. Very interesting selections.
@marcfruchtman9473 3 months ago
I think the explanation is good when things haven't been covered before. Thanks for the video.
@aboubenadhem9066 3 months ago
The last paragraph on p. 3 implies that "entity pre-parsing" would be one way around the issue. Does that mean training the model on parse trees instead of linear text order?
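One plausible reading of "entity pre-parsing", sketched below as my own illustration rather than the paper's actual method: collapse each known entity span into a single atomic token before training, so no factorization ever splits a name across prediction steps. The entity list here is hypothetical.

```python
# A minimal sketch of entity pre-parsing: known entity spans become
# single atomic tokens before any tokenization/factorization happens.
entities = {"Harrison Kinsley", "SentDex"}  # hypothetical entity list

def pre_parse(text: str) -> list[str]:
    tokens, rest = [], text
    while rest:
        for ent in entities:
            if rest.startswith(ent):
                tokens.append(ent.replace(" ", "_"))  # atomic entity token
                rest = rest[len(ent):].lstrip()
                break
        else:  # no entity matched: consume one ordinary word
            word, _, rest = rest.partition(" ")
            tokens.append(word)
    return tokens

print(pre_parse("Harrison Kinsley runs SentDex"))
# ['Harrison_Kinsley', 'runs', 'SentDex']
```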
@Tolken00 3 months ago
This is so cool! Makes me excited for what's possible!
@andybrice2711 3 months ago
This further convinces me that we ought to be incorporating some sort of knowledge graph into LLMs.
@alexanderbrown-dg3sy 3 months ago
Without any order-enhanced pretraining, you would still have the limitation if you consumed that KG using next-token prediction, though… but I definitely agree with this sentiment in general.
@BooleanDisorder 3 months ago
Combine them with graph neural networks that take knowledge as input. The AI could put the relevant parts into the GNN input itself.
@alexanderbrown-dg3sy 3 months ago
@BooleanDisorder True. I've seen research on linearizing and tokenizing KGs… with any-order-optimized pretraining you would get the same benefit as combining an LM + GNN, plus the scaling advantages of LMs.
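A minimal sketch of the linearization idea being discussed, illustrative rather than from any specific paper: emit each knowledge-graph triple in both directions so a next-token model sees both factorizations of the same fact. The inverse templates are assumptions.

```python
# A minimal sketch of linearizing KG triples into training text in both
# orders, so next-token prediction sees each fact both ways.
triples = [("Harrison Kinsley", "runs", "SentDex")]

# Hypothetical inverse template per relation.
inverse = {"runs": "is run by"}

def linearize(subj: str, rel: str, obj: str) -> list[str]:
    return [
        f"{subj} {rel} {obj}.",           # forward order
        f"{obj} {inverse[rel]} {subj}.",  # reversed order
    ]

for s, r, o in triples:
    for sentence in linearize(s, r, o):
        print(sentence)
# Harrison Kinsley runs SentDex.
# SentDex is run by Harrison Kinsley.
```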
@wwkk4964 3 months ago
Thanks for sharing! This solution, along with a dynamic tokenizer that is allowed to hold multiple tokens or multi-symbol representations in its vocabulary and to learn them on the fly as it sees new input, would be the way to go. I think the tokenizer could even learn things at the level of lexical units, so that the model only has to see the abstractions it must solve.
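A minimal sketch of the kind of on-the-fly merge such a dynamic tokenizer might make, BPE-flavored and my own illustration rather than anything from the video: promote the most frequent adjacent pair of symbols in the stream to a single new vocabulary entry.

```python
# A minimal BPE-style sketch: find the most frequent adjacent symbol
# pair in the stream and merge it into one new multi-symbol token.
from collections import Counter

stream = list("abcabcabd")

pairs = Counter(zip(stream, stream[1:]))  # count adjacent pairs
best = max(pairs, key=pairs.get)          # here: ('a', 'b')

merged, i = [], 0
while i < len(stream):
    if i + 1 < len(stream) and (stream[i], stream[i + 1]) == best:
        merged.append(stream[i] + stream[i + 1])  # new merged token
        i += 2
    else:
        merged.append(stream[i])
        i += 1

print(best, merged)  # ('a', 'b') ['ab', 'c', 'ab', 'c', 'ab', 'd']
```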
@wwkk4964 3 months ago
The Wikireversal table of results was enlightening. The fact that the MLM-U-trained model had a much more similar backward vs. forward score gives me confidence that its learning was probably more conceptual and relational, rather than the pure memorization we would expect if the learning were strongly influenced by the direction or chain of events.
@sikunowlol 3 months ago
oi
@Proprogrammer001 3 months ago
oi
@shrokompany4611 3 months ago
oi
@waveFunction25 3 months ago
Oi