Advanced RAG 04 - Contextual Compressors & Filters

17,405 views

Sam Witteveen

1 day ago

Comments: 34
@billykotsos4642 11 months ago
this is actually an interesting idea...
@shivamroy1775 11 months ago
Great content. Thanks for taking the time to make such videos. I've been learning a lot from them.
@TheAmit4sun 11 months ago
I have found filters that answer only yes or no to be not much help. For example, I have embeddings of tech docs and embeddings of an order processing system. When the filter is set and a random query like "can I order pizza with it?" is submitted, the model decides the context is related to order processing and returns YES, which is totally wrong.
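For readers following along, a minimal sketch of the kind of yes/no filter being described, assuming LangChain's LLMChainFilter with an OpenAI chat model; exact import paths vary across LangChain versions, and `vectorstore` stands in for a store built earlier:

```python
# A sketch of the yes/no relevance filter discussed above.
from langchain.chat_models import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainFilter

llm = ChatOpenAI(temperature=0)  # deterministic yes/no judgments
relevance_filter = LLMChainFilter.from_llm(llm)

retriever = ContextualCompressionRetriever(
    base_compressor=relevance_filter,
    base_retriever=vectorstore.as_retriever(),  # assumed pre-built store
)

# Each retrieved chunk is kept or dropped based on a single YES/NO call,
# which is exactly where off-topic queries can slip through.
docs = retriever.get_relevant_documents("can i order pizza with it?")
```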
@henkhbit5748 11 months ago
Thanks for the video about fine-tuning RAG. Personally I think the Self-RAG solution is more generic because it's embedded in the LLM...
@micbab-vg2mu 11 months ago
Thank you for another great video:)
@billykotsos4642 11 months ago
So instead of using an 'extractive QA model' you prompt an LLM into doing the same thing... amazing how flexible these LLMs are... in this case you are basing your hopes on the model's 'reasoning'....
@clray123 11 months ago
As long as someone else pays for it...
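For contrast, a small sketch of the 'extractive QA model' route mentioned above, using a Hugging Face span-extraction pipeline; the checkpoint name is a common public model, chosen here purely for illustration:

```python
# A small encoder fine-tuned for span extraction, instead of prompting an LLM.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = qa(
    question="What does contextual compression do?",
    context=(
        "Contextual compression trims retrieved chunks down to the "
        "sentences that are relevant to the user's query."
    ),
)
print(result["answer"], result["score"])  # extracted span plus confidence
```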
@wiltedblackrose 11 months ago
This is really interesting. My only worry is that this makes it prohibitively slow. The longest part of RAG is often the call to the LLM. I'd be interested if you could review some companies that have faster models than OpenAI but still decent performance.
@mungojelly 11 months ago
if i was making a chatbot & needed it to not lag before responding, i'd just fake it, like how windows has twelve different progress bars go across & various things slowly fade in so it doesn't seem like it's taking forever to boot XD. i'd send the request simultaneously to both the thoughtful process & also a model that just has instructions to respond immediately, echoing the user: "ok so what you're saying you want is...". personally i'd even want it to be transparent about what's happening, like, say that it's looking stuff up right now. i'd think of feeding the agent that's looking busy for the user some data about how much we've retrieved and how we've processed it so far, so it can say computery things like "i have discovered 8475 documents relevant to your query, and i am currently filtering and compressing them to find the most relevant information". but you could also just fake it by pretending you have the answer and you're just a little slow at getting to the point, like stalling for a few seconds by giving a cookie-cutter disclaimer about how you're just a hapless ai :D
@wiltedblackrose 11 months ago
@@mungojelly aha, cool. But this doesn't make a difference when I use it, e.g., for studying at uni.
@mungojelly 11 months ago
@@wiltedblackrose if it's for your own use & there's no customers to offend then you could make it quick & dirty in other ways. i'd think of giving random raw retrieved documents to a little cheap hallucinatey model to see if it gets lucky and can answer right away, then getting answers from progressively slower chains of reasoning. if it was for my own use i'd definitely make it so there's visual feedback about what it found & what it's doing, since if i made it myself then otherwise-obscure visual feedback, like documents flashing by too quickly to read, would make sense to me because i'd know exactly what it's doing
@luisjoseve 8 months ago
thanks a lot, keep it up!
@zd676 11 months ago
First of all, thanks for the great video! As some of the comments have rightfully pointed out, while I see some merit for offline use cases, this will be very challenging for real-time use cases. Also, I'm curious how much this depends on the chosen LLM understanding and following the default prompts. It seems the LLM choice can make or break it, which is quite brittle.
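One way to reduce that dependency on the default prompts is to pin the prompt down yourself. A rough sketch, assuming a LangChain version where LLMChainExtractor.from_llm accepts a custom PromptTemplate; the template wording below is illustrative, not the library default, and import paths shift between versions:

```python
# Swapping in your own extraction prompt; NoOutputParser strips the
# "NO_OUTPUT" marker the extraction chain uses for irrelevant chunks.
from langchain.prompts import PromptTemplate
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers.document_compressors.chain_extract import NoOutputParser

prompt = PromptTemplate(
    input_variables=["question", "context"],
    output_parser=NoOutputParser(),
    template=(
        "Extract, word for word, only the parts of the context that help "
        "answer the question. Do not rephrase anything. If nothing is "
        "relevant, reply NO_OUTPUT.\n\n"
        "Question: {question}\n\nContext:\n>>>\n{context}\n>>>"
    ),
)
compressor = LLMChainExtractor.from_llm(llm, prompt=prompt)  # `llm` as before
```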
@marshallmcluhan33 11 months ago
Thoughts on the "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" paper?
@samwitteveenai 11 months ago
Interesting paper. I am currently traveling, but will try to make a video about the paper or show some of the ideas in a project when I get a chance.
@googleyoutubechannel8554 7 months ago
It seems like there's a huge disconnect in understanding of how state-of-the-art RAG works, e.g. document upload in the ChatGPT-4 UI vs. all the LangChain tutorials on RAG. I feel like the community doesn't understand that OpenAI is getting far better results, and seems to be processing embeddings in a way that's much more advanced than LangChain-based systems do, but the community isn't even aware that 'LangChain RAG' and 'OpenAI internal RAG' are completely different animals. E.g. it seems uploaded docs are added as embeddings into a ChatGPT-4 query completely orthogonally to the context window, yet all the LangChain examples I see end up returning text from a 'retriever' and shoving this output into the LLM context. I don't think good RAG even works that way...
@RunForPeace-hk1cu 11 months ago
Wouldn't it be simpler to just use a small chunk_size for the initial splitter function when you embed the documents into the vector database?
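For comparison, a sketch of that alternative, assuming LangChain's RecursiveCharacterTextSplitter with illustrative sizes and a pre-loaded list `docs` of Documents:

```python
# Index small chunks up front instead of compressing large ones at query time.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,    # small chunks: precise hits, but less surrounding context
    chunk_overlap=20,  # overlap so sentences aren't cut mid-thought
)
chunks = splitter.split_documents(docs)  # `docs`: loaded Document objects
```

The trade-off is that very small chunks can strip away the surrounding context the LLM needs to answer, which is one argument for retrieving larger chunks and compressing them instead.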
@choiswimmer 11 months ago
Typo in the thumbnail: it's 4, not 5.
@clray123 11 months ago
So in short, in order to make the new revolutionary AI actually useful, you must meticulously hardcode the thinking it is supposed to be doing for you. Feels almost like crafting expert systems in the '80s! Imagine the expected explosion in productivity from applying that same process! Or let the AI imagine it for you (imagination is what it's really good at).
@alchemication 11 months ago
Yeah, but in some cases I've seen, we don't need that much sophistication and a bare-bones approach works well 😊 peace
@eugeneware3296 11 months ago
RAG is built on retrieval, and retrieval is another word for search. Search is a very hard problem. The difficulty of searching, ranking, and filtering to get a good-quality set of candidate documents to reason over is underestimated. That's where the complexity lies. Vector search doesn't directly solve these issues. Search engines like Google have hundreds of ranking factors, including vector searches, re-ranking cross-encoder models, and quality factors. TL;DR: vector search makes for a good demo and proof of concept. For true production systems, there is a lot of complexity and engineering required to make them work in practice.
@hidroman1993 11 months ago
LLMs are not the solution to any problem; as always, it's the engineering part that brings the actual results.
@MasterBrain182 11 months ago
Astonishing content, Sam 💯💯 Thanks for sharing your knowledge with us (thanks for the subtitles too 😄) Thumbs up from Brazil 👍👍👍
@mungojelly 11 months ago
hm, when you were going over those instructions that are like, don't change the text, don't do it, repeat it the same, & it's hard to convince it to write the same text out, i thought: why make it do that at all? if we just numbered the sentences then it could respond with the numbers of which sentences to include, or something. maybe that'd save output tokens as well as not give it any chance to imagine things
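A rough sketch of that index-based selection idea; none of this is a LangChain built-in, and the prompt wording and helper names are made up for illustration:

```python
# Number the sentences and ask the model to reply with indices only,
# so it never gets the chance to rewrite (or hallucinate) the text.
def build_selection_prompt(question: str, sentences: list[str]) -> str:
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    return (
        "Below are numbered sentences from a retrieved document.\n"
        f"{numbered}\n\n"
        f"Question: {question}\n"
        "Reply with only the numbers of the relevant sentences, "
        "comma-separated (e.g. 0,2,5). Reply NONE if none apply."
    )

def parse_selection(reply: str, sentences: list[str]) -> list[str]:
    cleaned = reply.strip().replace(" ", "")
    if cleaned.upper() == "NONE":
        return []
    keep = {int(tok) for tok in cleaned.split(",") if tok.isdigit()}
    return [s for i, s in enumerate(sentences) if i in keep]
```

Besides saving output tokens, this also sidesteps the "repeat the text exactly" instruction entirely, since the original sentences are reassembled locally.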
@foobars3816 10 months ago
13:09 Sounds like you should be using an LLM to narrow down that prompt for each case
@moonly3781 6 months ago
Thank you for the amazing tutorial! I was wondering: instead of using ChatOpenAI, how can I utilize a Llama 2 model locally? Specifically, I couldn't find any implementation, for example, for contextual compression, where you pass compressor = LLMChainExtractor.from_llm(llm) with the ChatOpenAI llm. How can I achieve this locally with Llama 2? My use case involves private documents, so I'm looking for solutions using open-source LLMs.
@fatimazohramoulelkhail1286 2 months ago
I'm facing the same problem; I'm wondering if you've found any solutions?
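A minimal sketch of one way to do this locally, assuming llama-cpp-python with a local GGUF file (the path is hypothetical) and a LangChain version where these imports resolve; any LangChain-compatible local LLM can stand in:

```python
# Contextual compression with a local Llama 2 model instead of ChatOpenAI.
from langchain.llms import LlamaCpp
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

llm = LlamaCpp(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local path
    temperature=0.0,
    n_ctx=4096,
)

# Same pattern as with ChatOpenAI: the compressor just needs *an* LLM.
compressor = LLMChainExtractor.from_llm(llm)
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(),  # assumed pre-built store
)
docs = retriever.get_relevant_documents("What is contextual compression?")
```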
@theunknown2090 11 months ago
Thanks for the video.
@shamikbanerjee9965 11 months ago
Good ideas Sam 👌
@HazemAzim 11 months ago
Great. How about cross-encoders and re-ranking?
@adriangabriel3219 11 months ago
I use it and my experience is that it improves retrieval a lot! The out-of-fashion SentenceTransformers perform amazingly there!
@HazemAzim 11 months ago
I am doing some benchmark testing on Arabic datasets and I am getting super results with ME5 embeddings with the Cohere reranker
@samwitteveenai 11 months ago
Yes I still have a number more coming in this series.
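For the cross-encoder thread above, a sketch of two-stage retrieval with sentence-transformers; the model name is a common public MS MARCO cross-encoder, and the candidate documents are illustrative stand-ins for first-stage vector-search results:

```python
# Two-stage retrieval: vector search first, then cross-encoder re-ranking.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do contextual compressors reduce prompt size?"
candidates = [
    "Contextual compression trims retrieved chunks to the query-relevant parts.",
    "Our order processing system supports pizza and other food categories.",
    "LLMChainExtractor uses an LLM to extract only the relevant sentences.",
]

# Score every (query, document) pair jointly, then keep the best-scoring docs.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
top_docs = reranked[:2]
```

Unlike a bi-encoder vector search, the cross-encoder reads the query and document together, which is what makes it a much stronger (if slower) relevance judge.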