The END of RAG? Episodic memory for infinite context length

11,133 views

Tunadorable

days ago

Comments: 61
@mlandaiacademy 21 days ago
Thanks for discussing our paper on EM-LLM! Glad you find it interesting!! I also have a video from the first author here: kzbin.info/www/bejne/nYjSmZJrm9iNpqM - if you are interested in getting more insights.
@Tunadorable 20 days ago
glad you enjoyed the video!
@mlandaiacademy 20 days ago
@@Tunadorable It was amazing :D
@corgirun7892 23 days ago
I personally think that the mechanism behind human episodic memory is far more complicated than this. When humans return to a specific situation, they can instantly recall things that happened decades ago. Does the human brain really store kv caches for decades? I don't believe it.
@markonfilms 23 days ago
I think we encode some kinda sparse representation. Also our dreams etc seem to really help us form long term memories, so maybe it requires a kinda dreaming so to speak and a lot of reflection and updating our connections and weights dynamically. Plus there are many aspects we don't understand at all yet.
@thehipponugget3287 23 days ago
Aren't we also sort of remembering memories of memories? Like the memory gets rewritten every time it comes up. Off the top of my head I can't remember anything I haven't remembered in years, unless specifically triggered by something like a smell or music.
@attilaszekeres7435 23 days ago
Indeed, memories are not stored "in the physical brain" as we understand locality, physicality and the brain, but instead in the so-called biofield, the magnetosphere and other internally coherent, nested, interpenetrating domains of a spatiotemporally distributed holarchy which comprises the physical correlates of our past, present and future selves. The intricacies of these concepts are well beyond your command, in stark contrast to the straightforward matter of your car's extended warranty, which we'd like to discuss with you now.
@BasiC7786 23 days ago
@@attilaszekeres7435 Human magnetosphere, biofield? Bullshit. Try science, not some magic.
@tiagotiagot 23 days ago
Humans "in-paint" memories from bits and pieces, many studies have demonstrated eye-witnesses are not reliable.
@alexanderbrown-dg3sy 23 days ago
In theory it works, but not practically. Systems like these need to be coupled with thinking tokens, so that the most semantically contextual segments are retrieved based on attention, and more specifically on model reasoning like humans do, instead of relative segment similarity… BUT there are a lot of ideas I took from this part, like NLL for novel observations and event boundary detection. FYI, this is what I used to actually make Quiet-STaR useful, explicitly but autonomously allowing the model to generate useful thoughts, not to mention I use it as the basis for a new style of meta self-supervision I created for the offline token re-weighting phase. So, all in all, pretty amazing ideas in this paper; the value of some of the underlying principles is vastly understated. Great vid bro. No paper is safe lol. I see you meant that ha. Keep them coming bro.
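For readers wondering what "NLL for event boundary detection" looks like concretely, here is a minimal sketch, assuming a Hugging Face causal LM and a simple global mean-plus-gamma-standard-deviations surprise threshold (the model name and threshold rule are illustrative choices, not the paper's exact configuration):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative model choice; any causal LM that exposes logits works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def segment_by_surprise(text: str, gamma: float = 1.0) -> list[list[int]]:
    """Split a token stream into episodes at high-surprise (high-NLL) tokens."""
    ids = tok(text, return_tensors="pt").input_ids            # (1, seq_len)
    with torch.no_grad():
        logits = model(ids).logits                            # (1, seq_len, vocab)
    # NLL of each token under the model's prediction from the previous position.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    nll = -log_probs.gather(-1, ids[:, 1:, None]).squeeze(-1)[0]   # (seq_len-1,)
    threshold = nll.mean() + gamma * nll.std()
    # A token whose surprise exceeds the threshold opens a new episode.
    starts = [0] + [i + 1 for i, s in enumerate(nll.tolist()) if s > threshold]
    ends = starts[1:] + [ids.shape[1]]
    return [ids[0, b:e].tolist() for b, e in zip(starts, ends)]
```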
@girlmoment669 23 days ago
ponderocity rambanctious reciprocity segmentation
@OpenSourceAnarchist 16 days ago
My course on Human Memory was taught by Michael Kahana, one of the names in the citations that kept popping up. Very interesting to see our in-class temporal contiguity effect demonstration playing out in an AI neuroscience context, wow! Small world in academia :)
@rhaedas9085 23 days ago
I tried something that I think is similar to this (without the math part). My idea was to convert conversations into tokens for storage, and when a new prompt was entered it would look up past events and pull things that matched closely: in theory, memories of related topics based on the token vectors. It didn't work because I don't know enough about the intricacies of tokenization and the math (basically it wasn't as plug-and-play as I was hoping for), so I did the next best thing and stored these past conversations as text logs, which I then would look through with each prompt to find similar topics. In the end I actually used the LLM to do this analysis search first, then pulled the first few random good matches and incorporated them into the prompting. Even with the much less effective method it does seem to remember things. I think it only worked because I used an uncensored model that had no limit on input. I was hoping for a different approach to try, but as you went through the paper a lot of it felt familiar in the general approach. I do think the token direction would work a lot better and faster, since it's a much better way to compare concepts than textual search.
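For anyone who wants to try the text-log version described above, a rough sketch of the store/embed/retrieve loop might look like this (assuming the sentence-transformers library; the model name and function names are made up for illustration):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model
memory_texts: list[str] = []                         # past conversation snippets
memory_vecs: list[np.ndarray] = []                   # their embeddings

def remember(snippet: str) -> None:
    """Store a finished exchange so it can be recalled later."""
    memory_texts.append(snippet)
    memory_vecs.append(embedder.encode(snippet, normalize_embeddings=True))

def recall(prompt: str, k: int = 3) -> list[str]:
    """Return the k stored snippets most similar to the new prompt."""
    if not memory_texts:
        return []
    q = embedder.encode(prompt, normalize_embeddings=True)
    sims = np.stack(memory_vecs) @ q          # cosine similarity (unit-norm vectors)
    top = np.argsort(sims)[::-1][:k]
    return [memory_texts[i] for i in top]

# The retrieved snippets would then be prepended to the prompt before generation.
```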
@Tunadorable 23 days ago
ah sounds like you did a prompt engineering/automation version of this same general (intuition/pattern/structure/idea/methodology), very cool
@robertphipps1430 22 days ago
this might solve the 'frame problem' which early more procedural approaches to AI found difficult. Context is all about working out what is important, and an expanding context window would effectively be a solution to the basic problem of working out what IS relevant information in a certain situation.
@be1tube 23 days ago
This sounds pretty simple to implement (at least as far as this type of paper goes). It would be really useful when writing narrative text simulations. E.g. ... (history of simulation for all characters up to 10:00) ... "What happens between 10:00 and 10:05 from the perspective of [character A]?", ... "What happens between 10:00 and 10:05 from the perspective of [character B]?" ... "Eliminate contradictions" ...
@johnkintree763 23 days ago
Long term storage of EM-LLM memory segments can probably be managed in a graph structure, similar to vector storage within neo4j graph databases. A related development is the release of Falcon Mamba 7B. Apparently, increasing the amount of context included in the prompt does not increase the requirement for RAM.
@Tunadorable 23 days ago
Not an expert or even competent with graphs & vector storage, but my impression is that that should work and be a great opportunity to expand the scalability of this technique. Pretty sure the authors vaguely mentioned the potential for significant improvements in that area.

Yes, that's true for any Mamba model, but the problem with those is that the model not only has to decide what info is important (which can be loosely described as a problem of having to predict what info will turn out to be relevant/useful later), but any info deemed not important gets permanently lost. In contrast, here the memories not currently in context can be brought back up again if they end up being useful at some future time, meaning the model does not have to guess what is going to be useful in the future. To be fair, the specifics of this method involve only keeping the surprising memories and actually losing anything outside of the context buffer around said memories, but that could easily be changed by anyone looking to make their own version: they'd just adjust the hyperparameters until practically everything is put into a memory (which would mean a more expensive QK memory lookup, probably not worth it).
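As a rough illustration of what that QK memory lookup amounts to (a hypothetical sketch with made-up names, not the authors' code): each stored episode keeps a representative key, and at generation time the current query scores those keys to decide which episodes' cached key/value pairs get pulled back into the context window.

```python
import torch

def recall_episodes(query: torch.Tensor,
                    episode_keys: torch.Tensor,
                    episode_kv: list,
                    k: int = 4) -> list:
    """Return the cached KV pairs of the k episodes whose keys best match the query.

    query:        (d,)   current attention query (e.g. averaged over heads)
    episode_keys: (n, d) one representative key per stored episode
    episode_kv:   list of n cached (keys, values) tensor pairs, one per episode
    """
    scores = episode_keys @ query                              # dot-product similarity, (n,)
    top = torch.topk(scores, k=min(k, len(episode_kv))).indices
    # The retrieved caches would be concatenated into the local context before attending.
    return [episode_kv[i] for i in top.tolist()]
```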
@technokicksyourass 23 days ago
Mamba is a state space model; it doesn't work based on attention. Rather, it does something more like learning a differential function that maps the context to a latent space, then integrating that function over long sequence lengths. That's how it scales linearly in context rather than polynomially. Attention compares every token in the sequence to every other token in a giant table.
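A toy way to see that scaling difference (an illustrative sketch, not Mamba's actual selective-scan kernel): a state space layer carries a fixed-size state forward one token at a time, while attention materializes a score for every pair of tokens.

```python
import torch

def ssm_scan(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Linear-time recurrence: one pass over the sequence, fixed-size hidden state.

    x: (seq_len, d_in), A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state)
    """
    state = torch.zeros(A.shape[0])
    outputs = []
    for x_t in x:                        # O(seq_len) steps
        state = A @ state + B @ x_t      # history compressed into a fixed-size state
        outputs.append(C @ state)
    return torch.stack(outputs)

def attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Quadratic-size score table: every token compared to every other token."""
    return (q @ k.T) / q.shape[-1] ** 0.5    # (seq_len, seq_len)
```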
@patrickchristianmagtaan5511 16 days ago
Thank you!
@deltamico 22 days ago
It's odd they didn't try to accumulate the tokens in an episode and chose a single one instead
@Tunadorable 22 days ago
right! and i didn’t look heavily into how they chose said single one, i think it had to do with its “representativeness” according to some metric used in the graph grouping stuff they did to tune the memories. i’d be interested to see ablations and compute comparisons between this single token, a sum & norm, and some more sophisticated pooling mechanism (attention based?)
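For the curious, those three options might look roughly like this over an episode's token embeddings (a hypothetical sketch, not what the paper implements):

```python
import torch
import torch.nn.functional as F

def representative_token(episode: torch.Tensor) -> torch.Tensor:
    """Single token closest to the episode mean (one possible notion of 'representativeness')."""
    center = episode.mean(dim=0)                   # episode: (n_tokens, d)
    return episode[torch.argmax(episode @ center)]

def sum_and_norm(episode: torch.Tensor) -> torch.Tensor:
    """Sum all token embeddings and re-normalize to unit length."""
    return F.normalize(episode.sum(dim=0), dim=0)

def attention_pool(episode: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Attention-style pooling: weight tokens by their similarity to a query vector."""
    weights = torch.softmax(episode @ query / episode.shape[-1] ** 0.5, dim=0)
    return weights @ episode
```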
@Sirmrmeowmeow 19 days ago
This sounds more like efficient RAG-like memories, or a RAG-like successor; I suppose it is a kind of episodic memory, but hmm... it's not using this type of memory to be actively in the now, per se. I suppose I'm actually looking for a kind of working memory of sorts. I feel like continuity and coherence of intent and of tasks/problem solving should be maintained: not just retrieval of past events from a previous inference, but the current inference should also have the "why" from previous inferences, or some kind of direct "knowledge update" that informs it. It's probably going to be either some kind of autoencoder-like memory unit that informs inference, like LARIMAR++ but trained for coherence & continuity over time and for knowledge updates (and the storing, retrieval, and proper use thereof for tasks), or some kind of stateful, possibly recurrent, complicated system... True memory, especially episodic memory, is going to be awesome for agents if/when it happens. An inference that doesn't start over every time... one can dream...
@badashphilosophy9533 14 days ago
I don't understand anything, but does this mean LLMs will be able to do more things without needing training or specially created vectors to help them understand what we are trying to do? Because I could wait for that.
@cinnamonroll5615 23 days ago
So, technically, the next GPT could have ADHD? And if so, did we just solve the mathematical form of ADHD?
@dadsonworldwide3238 23 days ago
Lol, apparently it's only historically misaligned companies involved, as if tuning is more for censorship and less about letting a congruent line of measure go.
@Tunadorable 23 days ago
Maybe I just forgot something I said in the video, but I'm curious as to why you drew a relation between this and ADHD. Could you elaborate?
@dadsonworldwide3238 23 days ago
@Tunadorable I don't recall you mentioning chemistry. Lol, misaligned measure does exist in this topic and in our public understanding; even experts struggle with it. A lot is misaligned: 1) teaching methodology, 2) poor diagnosis, 3) weak understanding of intelligence, intellectual IQ testing, etc. etc. etc.
@dadsonworldwide3238 23 days ago
@Tunadorable If anything, you're strengthening the evidence that memory is less chemical and more thermodynamic. If a human doesn't see value in entering a memory in the first place, it won't ever be remembered or encoded, no matter the reason.
@cinnamonroll5615 22 days ago
@Tunadorable So like, in 3.2 and 3.3 they said something about the positional embedding that could improve the robustness of the model, since it's a fixed positional embedding. Now, instead of a fixed positional embedding, you've just got static noise, and sometimes the static repeats in some pattern that can move tokens in their latent space (similar to how ADHD can relate seemingly random topics). And in 3.3 they theorised that the events recalled most efficiently are the ones that correlate in some fields, but since our embedding is noise, sometimes the noise can move tokens so that they become related (like how "apple" being moved close to "phone" gives us iPhone, and how episodic memory got moved to AI and somehow we got ADHD?). I'm aware that the dimension of the latent space is so large that we can't just deploy numpy.random to move tokens around; we need something random but still predictable to some degree to maybe mimic an ADHD brain?
@jmarkinman 23 days ago
They should change the title from “infinite” context to “unbounded” context, as “infinite context” implies something physically impossible.
@Tunadorable 23 days ago
ppl do love that word haha
@RoulDukeGonzo 23 days ago
Finally! Hierarchical attention.
@deltamico 22 days ago
Not really if there is only one level of selection
@RoulDukeGonzo 22 days ago
@@deltamico I know, but it's a start
@kimcosmos 22 days ago
So it can recommend paragraph, section and chapter breaks? And from that build an index? Finally, an AI boredom graph.
@davidhart1674 23 days ago
Thanks!
@Tunadorable 23 days ago
thank you☺️
@RickySupriyadi 21 days ago
this needed lots of... visualization... in my mind, and I failed to visualize it, so I failed to understand this
@Tunadorable 20 days ago
i do need to get into the habit of pulling out the ipad pencil more often
@RoulDukeGonzo 23 days ago
Plz link induction head vid
@GNARGNARHEAD 23 days ago
neat
@jonnylukejs 23 days ago
bro who stole my work
@jonnylukejs 23 days ago
I literally own the copyright to the code for this.
@Tunadorable 23 days ago
lmao relatable
@Tunadorable 23 days ago
haha i assume simultaneous creation is a real jerk but if you’re being literal about them ripping code from your github i would love to see said code. can’t remember if i checked whether they open sourced theirs for comparison to be able to make that claim
@jonnylukejs 21 days ago
@@Tunadorable yeah definitely, it happens all the time. Super common in AI where the LLMs love to share anything they've picked up and get trained on their own past conversations a lot of the time.