NEW: Better In-Context Learning (ICL), Improved RAG (Harvard)

7,537 views

Discover AI

1 day ago

New research on improving In-Context Learning (ICL) in Large Language Models, which also improves the augmentation part of a RAG system.
A deep dive into the learning procedures of a transformer to optimize the in-context learning behavior of AI, with no expensive fine-tuning or pre-training required.
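The setup the paper studies can be pictured in a few lines: the model is given a long random walk over a small concept graph as context and asked to continue it, and its internal representations reorganize to mirror the graph structure. Below is a minimal, illustrative sketch; the graph, node names, and prompt format are assumptions, not the authors' code.

```python
# Sketch of an in-context "graph tracing" prompt: a long random walk over a
# toy concept graph becomes the context an LLM learns from at inference time.
import random

# Toy concept graph (a ring of everyday words): edges define neighbors.
graph = {
    "apple": ["bird", "opera"],
    "bird":  ["apple", "sand"],
    "sand":  ["bird", "sun"],
    "sun":   ["sand", "plane"],
    "plane": ["sun", "opera"],
    "opera": ["plane", "apple"],
}

def random_walk(graph, start, steps, seed=0):
    """Generate a random walk; each step moves to a uniformly chosen neighbor."""
    rng = random.Random(seed)
    node, walk = start, [start]
    for _ in range(steps):
        node = rng.choice(graph[node])
        walk.append(node)
    return walk

# The walk is the in-context "training data": no weights are updated, yet the
# model's latent geometry comes to reflect the graph's structure.
context = " ".join(random_walk(graph, "apple", steps=200))
prompt = context + " apple"   # ask the model to continue the walk
print(prompt[:120], "...")
```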
All rights w/ the authors:
ICLR: IN-CONTEXT LEARNING OF REPRESENTATIONS
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang,
Maya Okawa, Kento Nishi, Martin Wattenberg & Hidenori Tanaka
CBS-NTT Program in Physics of Intelligence, Harvard University
Department of Physics, Harvard University
Physics & Informatics Lab, NTT Research Inc.
SEAS, Harvard University
CSE, University of Michigan, Ann Arbor
#airesearch
#harvarduniversity
#harvard
#coding
#reasoning

Comments: 33
@code4AI · 1 month ago
With the automatic audio dubbing from YouTube/Google you hear a synthetic voice in your regional language. To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.
@MrRavaging · 21 days ago
I really enjoy your videos, but the one thing I don't understand (because I'm new to programming) is: It looks like the information is being transformed mathematically throughout the stages of inference and backpropagation, but I'm still confused - does in-context learning really mean that the baked-in values of the LLM are permanently affected? Meaning, if I start up a new conversation, will my previous conversation have had a persistent influence on how it processes information in the new conversation? And if so - how can I implement that? Are you able to explain it in layman's terms (without metaphor, of course) to someone only recently familiar with Ollama and LLMs and the world of computer programming? I'm working through Cursor, using a composer agent to create a hypothalamus/hippocampus module to turn probabilistic transformations into purposeful, structured, self-critical reasoning using Python.
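Short answer: no, in-context learning happens entirely at inference time. The weights stay frozen, a new conversation starts from the same unchanged model, and any persistence across conversations must be implemented outside the model by storing history and re-injecting it into the prompt. A minimal sketch, assuming the `ollama` Python client and a locally pulled model (the model name and memory store are illustrative):

```python
# ICL changes NO weights: "memory" across conversations is just stored text
# that we re-send with every request so the frozen model can condition on it.
import ollama

MODEL = "llama3"   # assumed to be pulled locally via `ollama pull llama3`
memory = []        # persistence lives here, not in the LLM's weights

def chat(user_text: str) -> str:
    # Re-inject saved history so the frozen model can condition on it (ICL).
    messages = memory + [{"role": "user", "content": user_text}]
    reply = ollama.chat(model=MODEL, messages=messages)["message"]["content"]
    # Persist both turns ourselves; the model itself retains nothing.
    memory.append({"role": "user", "content": user_text})
    memory.append({"role": "assistant", "content": reply})
    return reply

print(chat("My module's codename is Hippocampus. Please remember that."))
print(chat("What is my codename?"))  # works only because memory was re-sent
```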
@En1Gm4A · 1 month ago
Can't wait for semantic graph memory and task planning on that semantic-graph abstraction. This will enable true AGI
@Wotevz · 1 month ago
Tell me more … running but not released … open to beta testers
@matt.lehodey · 1 month ago
@Wotevz I wanna know more too 🤣
@mrorigo · 1 month ago
Default voice is the best. Took me a couple of weeks to get used to your English, but now it feels super-natural. Keep it coming, super-appreciate your work!
@En1Gm4A · 1 month ago
Let's go - pls more knowledge graph + LLM stuff. This is the future. Think about agents showing a planned path for task execution __BEFORE__ they actually execute it. That path could be displayed on a graph and reviewed and approved :-D Would mean a lot for agent safety
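One way to picture this idea: the planned path is a DAG whose topological order can be shown for human review before anything runs. A minimal sketch with the standard library; task names and the console approval step are illustrative:

```python
# Show an agent's plan as a DAG, preview the execution order, run on approval.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

plan = {  # task -> set of prerequisite tasks
    "fetch_data":   set(),
    "clean_data":   {"fetch_data"},
    "analyze_data": {"fetch_data", "clean_data"},
    "write_report": {"analyze_data"},
    "email_report": {"write_report"},
}

order = list(TopologicalSorter(plan).static_order())
print("Planned execution path:", " -> ".join(order))

if input("Approve plan? [y/N] ").lower() == "y":
    for task in order:
        print(f"executing {task} ...")  # real tool calls would go here
```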
@wgabrys88 · 1 month ago
Dude is sharing knowledge like every day was one year of improvement ❤
@sgttomas · 1 month ago
thank you for providing all the context for this video and for bringing this research to our attention!
@awakenwithoutcoffee · 1 month ago
You have an excellent voice & reasoning for these types of videos, great content as usual. Personally I believe the major pain point in RAG is its overlooked simplification of the back end: the reliance on a single vector store is a major contributor to hallucinations (as you point out). We found that the biggest impact in decreasing hallucinations comes from improved data segregation & preparation pipelines, while not relying solely on vectors (full-text search, BM25, hybrid, etc.). Having said that, it's still an incomplete puzzle, and in-context learning / in-context fine-tuning are very interesting. Cheers!
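The hybrid-retrieval point above can be made concrete: rank the corpus with BM25 and with a dense embedder, then merge the two rankings, for example via reciprocal rank fusion. A minimal sketch, assuming the `rank-bm25` package and a placeholder dense ranking standing in for a real embedding model:

```python
# Hybrid retrieval sketch: fuse lexical (BM25) and dense rankings with RRF.
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "In-context learning reorganizes LLM representations.",
    "BM25 is a classic lexical ranking function.",
    "Hybrid search combines lexical and dense retrieval.",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query_tokens = "hybrid lexical and vector retrieval".lower().split()
scores = bm25.get_scores(query_tokens)
bm25_rank = sorted(range(len(corpus)), key=lambda i: -scores[i])
dense_rank = [2, 0, 1]  # placeholder for a real embedding model's ranking

def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1/(k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

for doc_id in rrf([bm25_rank, dense_rank]):
    print(corpus[doc_id])
```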
@xt-89907 · 1 month ago
This is great. The natural next step is to expand this to more complex tensor-decomposition techniques, even autoencoders, just like with the Anthropic MI paper. If we can get a mapping of this meta knowledge graph, then we can incorporate reinforcement learning to optimize representations dynamically in-context. This could be very powerful for better test-time compute, improved self-awareness of the model, and so on. But just solving online learning and making it sample-efficient would remove a major barrier to the usefulness of agents. What would also be great is to explicitly include a causal graph as an optional bias, writing to change covariate features as necessary. If the LLM is essentially a kind of causal model, you could make active learning very efficient.
@kevinpham6658 · 1 month ago
Geez, left us on a cliffhanger! Can't wait until the next video.
@TheEtrepreneur · 1 month ago
Salutations Mr Discover AI, you're becoming epic. Keep it up. 🏆🏆🏆 P.S. Apple > bird > sand > sun > plane > opera! Got it at first sight, DAGs rock. Is this a 90% computational-efficiency gain over traditional LLM operations? Looks like it. 💥💥
@fdavis1555 · 1 month ago
Fascinating research!
@dmytroaleinykov4088 · 1 month ago
Thank you for your amazing videos!
@augmentos · 1 month ago
Goooood morning ❤
@syntaxstreets · 1 month ago
Thank you, you are awesome, I recommend your channel when someone talks about AI 😀
@gunterstrubinsky9452 · 1 month ago
'elon' is a 4-letter word in the academic sub-net!
@samarthpatel8377 · 1 month ago
This is good! Better alignment and the sauce for AGI
@dairin0d · 1 month ago
Thanks for explaining interesting papers! This kind of reminds me of the idea that knowing the "distances" between all points (concepts) of a dataset (essentially, a weighted graph) is enough to define its "internal geometry"; so maybe these "random/circular walks" dynamically adjust the LLM's representation to match the observed "distances" between "nearby" words/pairs? (Just speculating; I haven't yet read the paper in detail, so maybe this is just a differently phrased view of the same mathematics they describe.) By the way (out of curiosity): have you heard of hyperdimensional computing / vector symbolic architectures? It seems to have quite a bit of overlap with what neural networks are doing geometrically, but what I found especially interesting about it is that it provides a formal mathematical approach to define (and operate on) complex data structures in vector space :-)
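The "distances define internal geometry" intuition is exactly what classical multidimensional scaling (MDS) formalizes: a full pairwise distance matrix determines the point configuration up to rotation and reflection. A minimal sketch with NumPy; the toy "concept" coordinates are only used to build the distance matrix:

```python
# Classical MDS: recover a point configuration from pairwise distances alone.
import numpy as np

# Ground-truth 2D "concepts" (used only to construct the distance matrix).
X = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 2]], dtype=float)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances

# Double-center the squared distances to get the Gram matrix, then decompose.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of centered points
eigvals, eigvecs = np.linalg.eigh(B)
top = eigvals.argsort()[::-1][:2]            # keep the 2 largest eigenvalues
X_rec = eigvecs[:, top] * np.sqrt(eigvals[top])

# Distances are reproduced exactly: the geometry was fully determined by the
# weighted graph of pairwise distances (up to rotation/reflection).
D_rec = np.linalg.norm(X_rec[:, None] - X_rec[None, :], axis=-1)
print(np.allclose(D, D_rec))  # True
```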
@tiagotiagot · 1 month ago
So it might be possible to teach, in-context, spatial relations of arbitrary environments, geometries, etc.? Any way to compact that so it can be overlaid/added to the existing pre-training without wasting context space every time?
@IdPreferNot1 · 1 month ago
Energy efficiency in an LLM seems like an "obvious" organizing principle. Not sure how that translates into representations that look visually similar... I guess any further abstraction away from the true form would require more energy for a transformation?
@maertscisum · 1 month ago
Do you plan to cover KAG?
@sndrstpnv8419 · 1 month ago
Can you share a link to the code or paper?
@thingX1x · 1 month ago
I have a chatbot with a GraphRAG using word2vec. When I add new info, word2vec is retrained on this new info and used for prompt augmentation. Is this ICL? The LLM only generates new data semantically similar per the word2vec. Would appreciate your input, or I could even send it to you. I even have a structured-data .db file that updates structured data per message, file upload, or website scrape.
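For context: the retraining step described above is arguably retrieval augmentation rather than ICL proper, since updating word2vec changes the retriever's weights, while ICL refers to what the frozen LLM does with the retrieved context at inference time. A minimal sketch of the incremental word2vec update with gensim; the corpus, tokens, and query are toys:

```python
# Incrementally extend a word2vec model with new documents, then use nearest
# neighbors for prompt augmentation (the LLM itself stays frozen).
from gensim.models import Word2Vec

base_docs = [["graph", "rag", "retrieval"], ["vector", "embedding", "search"]]
model = Word2Vec(base_docs, vector_size=32, min_count=1, seed=0)

# New info arrives (message, file upload, website scrape): update vocab, retrain.
new_docs = [["graphrag", "retrieval", "augmentation"], ["graphrag", "graph"]]
model.build_vocab(new_docs, update=True)
model.train(new_docs, total_examples=len(new_docs), epochs=model.epochs)

# Retrieve semantically similar terms to augment the LLM prompt.
neighbors = model.wv.most_similar("graphrag", topn=3)
prompt_context = ", ".join(term for term, _ in neighbors)
print("Augment prompt with:", prompt_context)
```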
@sndrstpnv8419 · 1 month ago
good question
@RaviPrakash-dz9fm · 1 month ago
Damn!
@minissoft · 1 month ago
Why do we think in 2D and 3D? We should think in n dimensions.
@justinnine4940 · 1 month ago
Because the input grid structure is 2D, you need to down-project the latent structure to the same dimension in order to see it.
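That down-projection can be sketched directly: center the hidden states and project onto the top two principal components (via SVD) to get 2D coordinates you can plot. A minimal sketch with NumPy; the random matrix stands in for real LLM activations:

```python
# PCA projection of high-dimensional hidden states to 2D for visualization.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 768))        # 200 tokens x 768-dim hidden states

H_centered = H - H.mean(axis=0)
# SVD yields the principal directions; the first two capture the most variance.
U, S, Vt = np.linalg.svd(H_centered, full_matrices=False)
coords_2d = H_centered @ Vt[:2].T      # n-dim -> 2D, ready to scatter-plot

print(coords_2d.shape)  # (200, 2)
```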
@VictorGallagherCarvings · 1 month ago
I don't think that overwriting facts with opinions is a particularly good idea.
@stevehall794 · 1 month ago
Nothing useful to learn here
@IanTindale · 1 month ago
I predict a day in the future where we have 'emptied' LLMs (well, not language, but any capturable variable behaviour out there in the outside world, e.g., ducks suddenly deciding to move over there instead of staying here). These will be like our current LLMs but taken a stage further by 'emptying' them of everything they've learned, leaving behind only the fact that they've had training. These emptied models will then proceed to learn anew like baby animals or people, containing only the minimum or 'instinctual' learning, but empty of factual, causal, experiential, observational 'knowledge' until they have reached out and filled themselves up again. These models will be tiny, just little seeds, and everyone can get their own, or have a few, like pets, and they grow up to have distinct personalities (unless they start networking and sharing their knowledge and discussing things among themselves).
@proterotype · 8 days ago
Hellllloooo communitttyy! Eagerly awaiting part 2 of this video “Change Your LLM” 🧊🌒