Great content. Thanks for taking the time to make such videos. I've been learning a lot from them.
@TheAmit4sun 11 months ago
I have found filters that give yes/no answers to be of little help. For example, I have embeddings of tech docs and embeddings of an order-processing system. When the filter is set and a prompt is submitted with a random query like "can I order pizza with it?", the model thinks the context is related to order processing and returns YES, which is totally wrong.
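The yes/no filter pattern being criticized here can be sketched in a few lines. `FILTER_PROMPT`, `ask_llm`, and `filter_docs` are hypothetical stand-ins (a toy word-overlap heuristic plays the LLM judge so the example is self-contained), and the sketch reproduces the failure mode described above:

```python
# Sketch of the yes/no relevance-filter pattern discussed above.
# `ask_llm` is a hypothetical stand-in for a real LLM call; a toy
# keyword heuristic plays the judge so the example is self-contained.

FILTER_PROMPT = (
    "Given the following question and context, return YES if the context "
    "is relevant to the question and NO if it isn't.\n"
    "Question: {question}\nContext: {context}\nRelevant (YES / NO):"
)

def ask_llm(prompt: str) -> str:
    # Toy judge: says YES when the question and context share any word.
    q = set(prompt.split("Question: ")[1].split("\n")[0].lower().split())
    c = set(prompt.split("Context: ")[1].split("\n")[0].lower().split())
    return "YES" if q & c else "NO"

def filter_docs(question, docs):
    # Keep only the documents the judge marks as relevant.
    return [d for d in docs
            if ask_llm(FILTER_PROMPT.format(question=question, context=d)) == "YES"]

docs = ["how to place an order in the system", "kubernetes deployment guide"]
print(filter_docs("can i order pizza with it?", docs))
```

The toy judge keeps the order-processing doc purely because "order" appears in both strings, which is exactly the kind of superficial false positive the comment describes; a real LLM judge can fail the same way on ambiguous queries.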
@henkhbit5748 11 months ago
Thanks for the video about fine-tuning RAG. Personally I think the Self-RAG solution is more generic because it's embedded in the LLM...
@micbab-vg2mu 11 months ago
Thank you for another great video:)
@billykotsos4642 11 months ago
So instead of using an 'extractive QA model' you prompt an LLM into doing the same thing... amazing how flexible these LLMs are... in this case you are basing your hopes on the model's 'reasoning'...
@clray123 11 months ago
As long as someone else pays for it...
@wiltedblackrose 11 months ago
This is really interesting. My only worry is that this makes it prohibitively slow; the longest part of RAG is often the call to the LLM. I'd be interested if you could review some companies which have faster models than OpenAI but still decent performance.
@mungojelly 11 months ago
If I were making a chatbot and needed it not to lag before responding, I'd just fake it — like how Windows has twelve different bars go across and various things slowly fade in so it doesn't seem like booting takes forever XD. I'd send the request simultaneously to both the thoughtful process and a model that just has instructions to respond immediately, echoing the user: "ok, so what you're saying you want is...". Personally I'd even want it to be transparent about what's happening, i.e. say that it's looking stuff up right now. I'd think of feeding the agent that's keeping the user busy some data about how much we've retrieved and how we've processed it so far, so it can say computery things like "I have discovered 8475 documents relevant to your query, and I am currently filtering and compressing them to find the most relevant information"... but you could also just fake it by pretending you have the answer and you're just a little slow at getting to the point — stall for a few seconds with a cookie-cutter disclaimer about how you're just a hapless AI :D
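The two-track idea above can be sketched with `asyncio`. The names `instant_echo` and `thoughtful_answer` are hypothetical, and a short `sleep` stands in for the slow retrieval-plus-LLM pipeline:

```python
import asyncio

async def instant_echo(query: str) -> str:
    # Cheap acknowledgement sent to the user immediately.
    return f"Ok, so what you're saying you want is: {query} — looking that up now..."

async def thoughtful_answer(query: str) -> str:
    # Stands in for the slow retrieval + LLM pipeline.
    await asyncio.sleep(0.1)
    return f"Final answer for: {query}"

async def respond(query: str):
    # Launch both tracks at once; show the echo right away,
    # then the real answer when the slow pipeline finishes.
    echo_task = asyncio.create_task(instant_echo(query))
    answer_task = asyncio.create_task(thoughtful_answer(query))
    print(await echo_task)
    print(await answer_task)

asyncio.run(respond("best pizza toppings"))
```

The same shape works for streaming interim status messages ("retrieved N documents, compressing...") instead of a single echo.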
@wiltedblackrose 11 months ago
@@mungojelly Aha, cool. But this doesn't make a difference for when I use it, e.g., for studying at uni.
@mungojelly 11 months ago
@@wiltedblackrose If it's for your own use and there are no customers to offend, then you could make it quick and dirty in other ways. I'd think of giving random raw retrieved documents to a little cheap, hallucinatey model to see if it gets lucky and can answer right away, then getting answers from progressively slower chains of reasoning. If it were for my own use, I'd definitely add visual feedback about what it found and what it's doing — if I made it myself, then otherwise-obscure visual feedback (documents flashing by too quickly to read, or whatever) would still make sense to me, because I'd know exactly what it's doing.
@luisjoseve 8 months ago
Thanks a lot, keep it up!
@zd676 11 months ago
First of all, thanks for the great video! As some of the comments have rightfully pointed out, while I see some merit for offline use cases, this will be very challenging for real-time use cases. Also, I'm curious how much this depends on the chosen LLM understanding and following the default prompts. It seems the LLM choice can make or break it, which is quite brittle.
@marshallmcluhan33 11 months ago
Thoughts on the "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" paper?
@samwitteveenai 11 months ago
Interesting paper. I am currently traveling, but I will try to make a video about the paper or show some of its ideas in a project when I get a chance.
@googleyoutubechannel8554 7 months ago
It seems like there's a huge disconnect in understanding between how state-of-the-art 'RAG' works (e.g. document upload in the ChatGPT-4 UI) and all the LangChain tutorials on RAG. I feel like the community doesn't realize that OpenAI is getting far better results, and seems to be processing embeddings in a way that's much more advanced than LangChain-based systems do — and that the community isn't even aware that 'LangChain RAG' and 'OpenAI internal RAG' are completely different animals. E.g., it seems uploaded docs are added as embeddings into a ChatGPT-4 query completely orthogonally to the context window, yet all the LangChain examples I see end up returning text from a 'retriever' and shoving that output into the LLM context. I don't think good RAG even works that way...
@RunForPeace-hk1cu 11 months ago
Wouldn't it be simpler to just use a small chunk_size for the initial splitter function when you embed the documents into the vector database?
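A minimal sketch of what a smaller chunk_size changes at embedding time, assuming a simple character-based splitter (real splitters, like LangChain's, also try to respect separators such as newlines; `split_text` here is an illustrative helper):

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20):
    """Minimal character splitter in the spirit of a RAG text splitter:
    slide a window of `chunk_size` characters, stepping by
    `chunk_size - overlap` so adjacent chunks share some context."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "word " * 200  # 1000 characters of filler text
small = split_text(doc, chunk_size=50, overlap=10)
large = split_text(doc, chunk_size=500, overlap=50)
print(len(small), len(large))  # smaller chunks -> many more, finer-grained vectors
```

The trade-off is that small chunks give more precise vector hits but strip away surrounding context, which is one reason compressing or filtering *after* retrieval can still help even when the initial chunks are small.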
@choiswimmer 11 months ago
Typo in the thumbnail: it's 4, not 5.
@clray123 11 months ago
So in short, in order to make the new revolutionary AI actually useful, you must meticulously hardcode the thinking it is supposed to be doing for you. Feels almost like crafting the expert systems in the 80's! Imagine the expected explosion in productivity from applying that same process! Or let the AI imagine for you (imagination is what it's really good for).
@alchemication 11 months ago
Yeah, but in some cases I've seen, we don't need that much sophistication and a bare-bones approach works well 😊 peace
@eugeneware3296 11 months ago
RAG is built on retrieval, and retrieval is another word for search. Search is a very hard problem. The difficulty of searching, ranking, and filtering to get a good-quality set of candidate documents to reason over is underestimated — that's where the complexity lies. Vector search doesn't directly solve these issues. Search engines like Google have hundreds of ranking factors, including vector searches, re-ranking cross-encoder models, and quality factors. TL;DR: vector search makes for a good demo and proof of concept, but for true production systems there is a lot of complexity and engineering required to make them work in practice.
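One standard piece of that extra engineering is fusing several rankings (vector search, keyword search, etc.) into one candidate list. A minimal sketch of reciprocal rank fusion, a common technique for this; the document names are illustrative:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists (e.g. vector search + keyword search)
    into one list, scoring each doc as sum(1 / (k + rank)) over the
    lists it appears in. k=60 is the commonly used default constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # ranking from vector search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # ranking from keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in multiple lists (doc_b here) float to the top, which is the behavior that makes hybrid search more robust than vector search alone.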
@hidroman1993 11 months ago
LLMs are not the solution to every problem; as always, it's the engineering part that brings the actual results.
@MasterBrain182 11 months ago
Astonishing content, Sam 💯💯 Thanks for sharing your knowledge with us (thanks for the subtitles too 😄). Thumbs up from Brazil 👍👍👍
@mungojelly 11 months ago
Hm, when you were going over those instructions like "don't change the text, don't do it, repeat it the same", and how hard it is to convince the model to write the same text out, I thought: why make it do that at all? If we just numbered the sentences, it could respond with the numbers of the sentences to include, or something; maybe that would save output tokens as well as remove any chance for it to imagine things.
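A sketch of that index-based extraction idea. `build_prompt` and `extract` are hypothetical helpers, and the model's reply is hard-coded here rather than coming from a real LLM call:

```python
# Number the sentences, ask the model for just the indices of the
# relevant ones, then reconstruct the extract locally — the model
# can't paraphrase text it never has to repeat.

def build_prompt(sentences, question):
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, 1))
    return (f"{numbered}\n\nQuestion: {question}\n"
            "Reply with only the numbers of the relevant sentences, comma-separated:")

def extract(sentences, llm_reply: str) -> str:
    # Parse "1, 3"-style replies and stitch the chosen sentences back together.
    indices = [int(n) for n in llm_reply.replace(" ", "").split(",") if n.isdigit()]
    return " ".join(sentences[i - 1] for i in indices if 1 <= i <= len(sentences))

sentences = ["Paris is in France.", "It rains a lot.", "The Louvre is in Paris."]
# A hypothetical model reply:
print(extract(sentences, "1, 3"))  # → "Paris is in France. The Louvre is in Paris."
```

Besides saving output tokens, this guarantees the extract is verbatim — any hallucination is limited to picking the wrong indices, which is easy to bound-check as above.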
@foobars3816 10 months ago
13:09 Sounds like you should be using an LLM to narrow down that prompt for each case.
@moonly3781 6 months ago
Thank you for the amazing tutorial! I was wondering: instead of using ChatOpenAI, how can I utilize a Llama 2 model locally? Specifically, I couldn't find any implementation, for example, for contextual compression, where you pass compressor = LLMChainExtractor.from_llm(llm) with the ChatOpenAI llm. How can I achieve this locally with Llama 2? My use case involves private documents, so I'm looking for solutions using open-source LLMs.
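One plausible route (a hedged sketch, not tested here): LangChain's `LLMChainExtractor.from_llm` accepts any LangChain LLM, so a local Hugging Face model wrapped in `HuggingFacePipeline` should slot into the same place as `ChatOpenAI`. The model id and pipeline kwargs below are illustrative, and running this requires the transformers stack plus the (gated) Llama 2 weights downloaded locally:

```python
# Hedged sketch: wiring a local Llama 2 model into LangChain's
# contextual compression, assuming langchain + transformers are installed
# and the model weights are available locally.
from langchain.llms import HuggingFacePipeline
from langchain.retrievers.document_compressors import LLMChainExtractor

# model_id is illustrative — point it at your local Llama 2 checkpoint.
local_llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)

# Same pattern as with ChatOpenAI, just with the local LLM swapped in.
compressor = LLMChainExtractor.from_llm(local_llm)
```

Import paths move around between LangChain versions (newer releases put these under `langchain_community`), so check them against the version you have installed. Smaller open models may also need the extractor's prompt tuned before they follow the "extract, don't rewrite" instruction reliably.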
@fatimazohramoulelkhail1286 2 months ago
I'm facing the same problem; I'm wondering if you've found any solutions?
@theunknown2090 11 months ago
Thanks for the video.
@shamikbanerjee9965 11 months ago
Good ideas Sam 👌
@HazemAzim 11 months ago
Great. How about cross-encoders and re-ranking?
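For context: a cross-encoder re-ranker scores each (query, document) pair jointly and re-sorts the retriever's candidates. A toy sketch of the re-ranking step, with a word-overlap score standing in for a real cross-encoder model (which would normally come from a library such as sentence-transformers):

```python
def toy_cross_encoder_score(query: str, doc: str) -> float:
    # Stand-in for a real cross-encoder: fraction of query words in the doc.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def rerank(query, candidates, top_k=2):
    # Score every candidate against the query, keep the best top_k.
    scored = sorted(candidates,
                    key=lambda doc: toy_cross_encoder_score(query, doc),
                    reverse=True)
    return scored[:top_k]

candidates = [
    "the weather in paris is mild",
    "paris travel guide and tips",
    "how to cook pasta",
]
print(rerank("travel tips for paris", candidates))
```

Because every candidate is scored against the query at query time, cross-encoders are more accurate than plain vector similarity but also slower, which is why they are typically applied only to a short candidate list from a first-stage retriever.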
@adriangabriel3219 11 months ago
I use it, and my experience is that it improves retrieval a lot! The out-of-fashion SentenceTransformers perform amazingly there!
@HazemAzim 11 months ago
I am doing some benchmark testing on Arabic datasets, and I am getting super results with ME5 embeddings plus the Cohere reranker.
@samwitteveenai 11 months ago
Yes I still have a number more coming in this series.