NEW: Better In-Context Learning (ICL), Improved RAG (Harvard)

7,537 views

Discover AI

1 day ago

New research on improving In-Context Learning (ICL) in Large Language Models, which also improves the augmentation part of a RAG system.
A deep dive into the learning procedures of a transformer to optimize the in-context learning behavior of AI, with no expensive fine-tuning or pre-training required.
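The setup the paper studies can be pictured in a few lines: the model is given a long random walk over a small concept graph as context and asked to continue it, and its internal representations reorganize to mirror the graph structure. Below is a minimal, illustrative sketch; the graph, node names, and prompt format are assumptions, not the authors' code.

```python
# Sketch of an in-context "graph tracing" prompt: a long random walk over a
# toy concept graph becomes the context an LLM learns from at inference time.
import random

# Toy concept graph (a ring of everyday words): edges define neighbors.
graph = {
    "apple": ["bird", "opera"],
    "bird":  ["apple", "sand"],
    "sand":  ["bird", "sun"],
    "sun":   ["sand", "plane"],
    "plane": ["sun", "opera"],
    "opera": ["plane", "apple"],
}

def random_walk(graph, start, steps, seed=0):
    """Generate a random walk; each step moves to a uniformly chosen neighbor."""
    rng = random.Random(seed)
    node, walk = start, [start]
    for _ in range(steps):
        node = rng.choice(graph[node])
        walk.append(node)
    return walk

# The walk is the in-context "training data": no weights are updated, yet the
# model's latent geometry comes to reflect the graph's structure.
context = " ".join(random_walk(graph, "apple", steps=200))
prompt = context + " apple"   # ask the model to continue the walk
print(prompt[:120], "...")
```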
All rights w/ the authors:
ICLR: IN-CONTEXT LEARNING OF REPRESENTATIONS
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang,
Maya Okawa, Kento Nishi, Martin Wattenberg & Hidenori Tanaka
CBS-NTT Program in Physics of Intelligence, Harvard University
Department of Physics, Harvard University
Physics & Informatics Lab, NTT Research Inc.
SEAS, Harvard University
CSE, University of Michigan, Ann Arbor
#airesearch
#harvarduniversity
#harvard
#coding
#reasoning

Comments: 33
@code4AI · 1 month ago
With the automatic audio dubbing from YouTube/Google you hear a synthetic voice in your regional language. To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.
@MrRavaging · 21 days ago
I really enjoy your videos, but the one thing I don't understand (because I'm new to programming) is: It looks like the information is being transformed mathematically throughout the stages of inference and backpropagation, but I'm still confused - does in-context learning really mean that the baked-in values of the LLM are permanently affected? Meaning, if I start up a new conversation, will my previous conversation have had a persistent influence on how it processes information in the new conversation? And if so - how can I implement that? Are you able to explain it in layman's terms (without metaphor, of course) to someone only recently familiar with Ollama and LLMs and the world of computer programming? I'm working through Cursor, using a composer agent to create a hypothalamus/hippocampus module to turn probabilistic transformations into purposeful, structured, self-critical reasoning using Python.
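Short answer: no, in-context learning happens entirely at inference time. The weights stay frozen, a new conversation starts from the same unchanged model, and any persistence across conversations must be implemented outside the model by storing history and re-injecting it into the prompt. A minimal sketch, assuming the `ollama` Python client and a locally pulled model (the model name and memory store are illustrative):

```python
# ICL changes NO weights: "memory" across conversations is just stored text
# that we re-send with every request so the frozen model can condition on it.
import ollama

MODEL = "llama3"   # assumed to be pulled locally via `ollama pull llama3`
memory = []        # persistence lives here, not in the LLM's weights

def chat(user_text: str) -> str:
    # Re-inject saved history so the frozen model can condition on it (ICL).
    messages = memory + [{"role": "user", "content": user_text}]
    reply = ollama.chat(model=MODEL, messages=messages)["message"]["content"]
    # Persist both turns ourselves; the model itself retains nothing.
    memory.append({"role": "user", "content": user_text})
    memory.append({"role": "assistant", "content": reply})
    return reply

print(chat("My module's codename is Hippocampus. Please remember that."))
print(chat("What is my codename?"))  # works only because memory was re-sent
```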
@En1Gm4A · 1 month ago
Can't wait for semantic graph memory and task planning on that semantic-graph abstraction. This will enable true AGI
@Wotevz · 1 month ago
Tell me more … running but not released … open to beta testers
@matt.lehodey · 1 month ago
@Wotevz I wanna know more too 🤣
@mrorigo · 1 month ago
Default voice is the best. Took me a couple of weeks to get used to your English, but now it feels super-natural. Keep it coming, super-appreciate your work!
@En1Gm4A · 1 month ago
Let's go - pls more knowledge graph + LLM stuff. This is the future. Think about agents showing a planned path for task execution __BEFORE__ they actually execute it. That path could be displayed on a graph and reviewed and approved :-D Would mean a lot for agent safety
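One way to picture this idea: the planned path is a DAG whose topological order can be shown for human review before anything runs. A minimal sketch with the standard library; task names and the console approval step are illustrative:

```python
# Show an agent's plan as a DAG, preview the execution order, run on approval.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

plan = {  # task -> set of prerequisite tasks
    "fetch_data":   set(),
    "clean_data":   {"fetch_data"},
    "analyze_data": {"fetch_data", "clean_data"},
    "write_report": {"analyze_data"},
    "email_report": {"write_report"},
}

order = list(TopologicalSorter(plan).static_order())
print("Planned execution path:", " -> ".join(order))

if input("Approve plan? [y/N] ").lower() == "y":
    for task in order:
        print(f"executing {task} ...")  # real tool calls would go here
```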
@wgabrys88 · 1 month ago
Dude is sharing knowledge like every day was one year of improvement ❤
@sgttomas · 1 month ago
thank you for providing all the context for this video and for bringing this research to our attention!
@awakenwithoutcoffee · 1 month ago
You have an excellent voice & reasoning for these types of videos, great content as usual. Personally I believe the major pain point in RAG is its overlooked simplification of the back end: the reliance on a single vector store is a major contributor to hallucinations (as you point out). We found that the biggest impact in decreasing hallucinations comes from improved data segregation & preparation pipelines, while not relying solely on vectors (full-text search, BM25, hybrid, etc.). Having said that, it's still an incomplete puzzle, and in-context learning / in-context fine-tuning are very interesting. Cheers!
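The hybrid-retrieval point above can be made concrete: rank the corpus with BM25 and with a dense embedder, then merge the two rankings, for example via reciprocal rank fusion. A minimal sketch, assuming the `rank-bm25` package and a placeholder dense ranking standing in for a real embedding model:

```python
# Hybrid retrieval sketch: fuse lexical (BM25) and dense rankings with RRF.
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "In-context learning reorganizes LLM representations.",
    "BM25 is a classic lexical ranking function.",
    "Hybrid search combines lexical and dense retrieval.",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query_tokens = "hybrid lexical and vector retrieval".lower().split()
scores = bm25.get_scores(query_tokens)
bm25_rank = sorted(range(len(corpus)), key=lambda i: -scores[i])
dense_rank = [2, 0, 1]  # placeholder for a real embedding model's ranking

def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1/(k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

for doc_id in rrf([bm25_rank, dense_rank]):
    print(corpus[doc_id])
```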
@xt-89907 · 1 month ago
This is great. The natural next step is to expand this to more complex tensor-decomposition techniques, even autoencoders, just like with the Anthropic MI paper. If we can get a mapping of this meta knowledge graph, then we can incorporate reinforcement learning to optimize representations dynamically in-context. This could be very powerful for better test-time compute, improved self-awareness of the model, and so on. But just solving online learning and making it sample-efficient would remove a major barrier to the usefulness of agents. What would also be great is to explicitly include a causal graph as an optional bias, writing to change covariate features as necessary. If the LLM is essentially a kind of causal model, you could make active learning very efficient.
@kevinpham6658 · 1 month ago
Geez, left us on a cliffhanger! Can't wait until the next video.
@TheEtrepreneur · 1 month ago
Salutations Mr Discover AI, you're becoming epic. Keep it up. 🏆🏆🏆 P.S. Apple > bird > sand > sun > plane > opera! Got it at first sight, DAGs rock. Is this a 90% computational-efficiency gain over traditional LLM operations? Looks like it. 💥💥
@fdavis1555 · 1 month ago
Fascinating research!
@dmytroaleinykov4088 · 1 month ago
Thank you for your amazing videos!
@augmentos · 1 month ago
Goooood morning ❤
@syntaxstreets · 1 month ago
Thank you, you are awesome, I recommend your channel when someone talks about AI 😀
@gunterstrubinsky9452 · 1 month ago
'elon' is a 4-letter word in the academic sub-net!
@samarthpatel8377 · 1 month ago
This is good! Better alignment and the sauce for AGI
@dairin0d · 1 month ago
Thanks for explaining interesting papers! This kind of reminds me of the idea that knowing the "distances" between all points (concepts) of a dataset (essentially, a weighted graph) is enough to define its "internal geometry"; so maybe these "random/circular walks" dynamically adjust the LLM's representation to match the observed "distances" between "nearby" words/pairs? (Just speculating; I haven't yet read the paper in detail, so maybe this is just a differently phrased view of the same mathematics they describe.) By the way (out of curiosity): have you heard of hyperdimensional computing / vector symbolic architectures? It seems to have quite a bit of overlap with what neural networks are doing geometrically, but what I found especially interesting about it is that it provides a formal mathematical approach to define (and operate on) complex data structures in vector space :-)
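The "distances define internal geometry" intuition is exactly what classical multidimensional scaling (MDS) formalizes: a full pairwise distance matrix determines the point configuration up to rotation and reflection. A minimal sketch with NumPy; the toy "concept" coordinates are only used to build the distance matrix:

```python
# Classical MDS: recover a point configuration from pairwise distances alone.
import numpy as np

# Ground-truth 2D "concepts" (used only to construct the distance matrix).
X = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 2]], dtype=float)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances

# Double-center the squared distances to get the Gram matrix, then decompose.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of centered points
eigvals, eigvecs = np.linalg.eigh(B)
top = eigvals.argsort()[::-1][:2]            # keep the 2 largest eigenvalues
X_rec = eigvecs[:, top] * np.sqrt(eigvals[top])

# Distances are reproduced exactly: the geometry was fully determined by the
# weighted graph of pairwise distances (up to rotation/reflection).
D_rec = np.linalg.norm(X_rec[:, None] - X_rec[None, :], axis=-1)
print(np.allclose(D, D_rec))  # True
```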
@tiagotiagot · 1 month ago
So it might be possible to teach, in-context, spatial relations of arbitrary environments, geometries, etc.? Any way to compact that so it can be overlaid/added to the existing pre-training without wasting context space every time?
@IdPreferNot1 · 1 month ago
Energy efficiency in an LLM seems like an "obvious" organizing principle. Not sure how that translates into representations that look visually similar... I guess any further abstraction away from the true form would require more energy for a transformation?
@maertscisum · 1 month ago
Do you plan to cover KAG?
@sndrstpnv8419 · 1 month ago
Can you share a link to the code or paper?
@thingX1x · 1 month ago
I have a chatbot with a GraphRAG using word2vec. When I add new info, word2vec is retrained on this new info and used for prompt augmentation. Is this ICL? The LLM only generates new data semantically similar per the word2vec. Would appreciate your input, or I could even send it to you. I even have a structured-data .db file that updates structured data per message, file upload, or website scrape.
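For context: the retraining step described above is arguably retrieval augmentation rather than ICL proper, since updating word2vec changes the retriever's weights, while ICL refers to what the frozen LLM does with the retrieved context at inference time. A minimal sketch of the incremental word2vec update with gensim; the corpus, tokens, and query are toys:

```python
# Incrementally extend a word2vec model with new documents, then use nearest
# neighbors for prompt augmentation (the LLM itself stays frozen).
from gensim.models import Word2Vec

base_docs = [["graph", "rag", "retrieval"], ["vector", "embedding", "search"]]
model = Word2Vec(base_docs, vector_size=32, min_count=1, seed=0)

# New info arrives (message, file upload, website scrape): update vocab, retrain.
new_docs = [["graphrag", "retrieval", "augmentation"], ["graphrag", "graph"]]
model.build_vocab(new_docs, update=True)
model.train(new_docs, total_examples=len(new_docs), epochs=model.epochs)

# Retrieve semantically similar terms to augment the LLM prompt.
neighbors = model.wv.most_similar("graphrag", topn=3)
prompt_context = ", ".join(term for term, _ in neighbors)
print("Augment prompt with:", prompt_context)
```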
@sndrstpnv8419 · 1 month ago
good question
@RaviPrakash-dz9fm · 1 month ago
Damn!
@minissoft · 1 month ago
Why do we think in 2D and 3D? We should think in n dimensions.
@justinnine4940 · 1 month ago
Because the input grid structure is 2D, you need to down-project the latent structure to the same dimension in order to see it.
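That down-projection can be sketched directly: center the hidden states and project onto the top two principal components (via SVD) to get 2D coordinates you can plot. A minimal sketch with NumPy; the random matrix stands in for real LLM activations:

```python
# PCA projection of high-dimensional hidden states to 2D for visualization.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 768))        # 200 tokens x 768-dim hidden states

H_centered = H - H.mean(axis=0)
# SVD yields the principal directions; the first two capture the most variance.
U, S, Vt = np.linalg.svd(H_centered, full_matrices=False)
coords_2d = H_centered @ Vt[:2].T      # n-dim -> 2D, ready to scatter-plot

print(coords_2d.shape)  # (200, 2)
```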
@VictorGallagherCarvings · 1 month ago
I don't think that overwriting facts with opinions is a particularly good idea.
@stevehall794 · 1 month ago
Nothing useful to learn here
@IanTindale · 1 month ago
I predict a day in the future where we have 'emptied' LLMs (well, not language, but any capturable variable behaviour out there in the outside world, e.g., ducks suddenly deciding to move over there instead of staying here). These will be like our current LLMs but taken a stage further by 'emptying' them of everything they've learned, leaving behind only the fact that they've had training. These emptied models will then proceed to learn anew like baby animals or people, containing only the minimum or 'instinctual' learning, but empty of factual, causal, experiential, observational 'knowledge' until they have reached out and filled themselves up again. These models will be tiny, just little seeds, and everyone can get their own, or have a few, like pets, and they grow up to have distinct personalities (unless they start networking and sharing their knowledge and discussing things among themselves).
@proterotype · 8 days ago
Hellllloooo communitttyy! Eagerly awaiting part 2 of this video “Change Your LLM” 🧊🌒