How to Make RAG Chatbots FAST

38,666 views

James Briggs

1 day ago

Comments: 72
@WinsonDabbles · 1 year ago
I really appreciate that all, if not most, of your collabs don't use LangChain at all. I really like to see what goes on under the hood, to learn from a first-principles perspective.
@hughesadam87 · 10 months ago
These videos are such a high-quality collection of content for app developers in the AI space who are building apps and are not AI experts (nor really care about the AI itself, just want to use it).
@xflory26x · 1 year ago
Been anticipating this video since seeing the notebook on your GitHub! Thank you so much for your detailed explanations! Would be keen to see your implementations of NeMo Guardrails' moderation pipelines :)
@chrismcdannel3908 · 11 months ago
Great dissection on the "wrapper" visualization to simplify the relationship between the Agent and the Model. I'm going to borrow it, with backlinks of course. Oh, and thanks for the composure in your thumbnails my guy. It's nice to see some professionalism getting the merit it deserves instead of some assclown with his jaw on the floor and pulling his hair up like a chimp on drugs. Classy AF bro. Keep up the good work.
@realCleanK · 6 months ago
Really appreciate you putting this together 🙏
@cmars7845 · 1 year ago
Thanks for the intro to NeMo Guardrails! I kept expecting you to say tools like ... 'google' ... but you seemed to pause and then not say it 😂
@AlgoTradingX · 1 year ago
You glowed up like crazy in your video content! It's so cool!!!!
@jamesbriggs · 1 year ago
thanks Sajid - means a lot coming from you :)
@shortthrow434 · 1 year ago
Excellent, thank you James.
@jamesbriggs · 1 year ago
you're welcome!
@pavellegkodymov4295 · 1 year ago
Thanks, James, very, very useful. Will try to include guardrails in our corporate RAG chatbot.
@uctran1169 · 11 months ago
Can you make a video tutorial on creating data from Wikipedia?
@joshualee6559 · 11 months ago
I want to build this for my research lab so that we can query information about our protocols, standards, etc. This seems really useful. I presume it wouldn't be that hard to then embed it into a Slack chatbot?
@yarikbratashchuk3386 · 11 months ago
Would this approach work if the vectorized data was shop inventory? And the question was something like "how many items do you have?", or about the specifics of a group of items?
@ylazerson · 1 year ago
Great video as always!
@jamesbriggs · 1 year ago
Thanks as always!
@drwho8576 · 11 months ago
Excellent video as always. Thanks for sharing. Is there a way to set up Colang for an "anything but" scenario? So far, I only seem to be able to program what to detect for a workflow. But can I set up a 'default deny' type thing? Anything other than the topic my bot is designed to handle returns an "I'm sorry, Dave. I'm afraid I can't do that"...
@RichardHamnett · 11 months ago
Brilliant mate, also don't forget this could be a massive cost optimizer along with speed :)
@sandorkonya · 11 months ago
Thank you for the super video. I wonder how we can do chain-of-thought (CoT) or tree-of-thoughts with Guardrails, without LangChain?
@unperfectbryce · 10 months ago
Can't you just do kNN with your embeddings to make sure the query isn't out of distribution? Isn't this a pretty quick Euclidean distance operation? Why bother with guardrails? Thanks for the great video! Keep it up.
@user-hh9do9fn1o · 2 months ago
1. Not all queries are straightforward. Complex queries might need more nuanced understanding and contextual analysis, which kNN might not handle well.
2. Guardrails can adapt to new rules and policies quickly, while kNN models might need retraining with new data.
3. Guardrails can provide more interpretable reasons for why a query is out-of-distribution or inappropriate, aiding understanding and transparency.
That said, using both of these together might be more robust.
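The kNN check discussed in this thread can be sketched with plain NumPy. This is a toy example: the 4-dimensional "embeddings" and the distance threshold are made up, and a real system would use your embedding model's vectors plus a threshold tuned on held-out queries.

```python
import numpy as np

def is_in_distribution(query_emb, corpus_embs, k=3, threshold=0.4):
    """Flag a query as in-distribution if the mean cosine distance
    to its k nearest corpus embeddings is below a tuned threshold."""
    # normalize so a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every corpus vector
    top_k = np.sort(sims)[-k:]       # the k nearest neighbours
    mean_dist = 1.0 - top_k.mean()   # mean cosine distance
    return mean_dist < threshold

# toy 4-dim "embeddings": three on-topic corpus vectors, two queries
corpus = np.array([[1.0, 0.1, 0.0, 0.0],
                   [0.9, 0.2, 0.1, 0.0],
                   [0.8, 0.0, 0.2, 0.1]])
on_topic = np.array([0.95, 0.1, 0.1, 0.0])
off_topic = np.array([0.0, 0.0, 0.1, 1.0])

print(is_in_distribution(on_topic, corpus))   # True
print(is_in_distribution(off_topic, corpus))  # False
```

This is the "fast but blunt" side of the trade-off the reply above describes: the check itself is one matrix multiply, but it has no notion of rules or policies beyond distance.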
@aravindudupa957 · 1 year ago
What is the difference in accuracy between reasoning (whether to retrieve) using embedding similarity vs. giving it to an LLM?
@mobime6682 · 7 months ago
Great show, thank you. Question: it seems awfully similar to your (more recent?) videos about semantic router, or have I got the wrong end of the stick? I know, I should do a similarity search on the text for each, I guess 😉! Thanks again.
@shaheerzaman620 · 1 year ago
awesome video James!
@jamesbriggs · 1 year ago
Thanks Shaheer!
@user-jj2mo5sl7p · 1 year ago
Useful work!
@fabianaltendorfer11 · 11 months ago
Great video! Any idea how to deal with screenshots in the documents?
@RichardBurgmann · 11 months ago
Hi James, enjoying your series greatly. A question or suggestion for a future video: I've been seeing a lot of articles on the use of graph data structures to build knowledge graphs to address issues such as hallucinations and weaknesses in logical reasoning in LLMs. I've only found one person who has actually done this, and they had mixed results as far as addressing these issues. Wondering what your experience has been in this area? Do you have an opinion? From what I can see there is not much evidence (yet) that it gives a better result than well-crafted semantic search.
@jamesbriggs · 11 months ago
I never tried it myself, but everyone I know who tried said it was hard to do and the results were either the same as or worse than using vector search - so I haven't had much reason to look into it. Maybe at some point, if I see it being useful for a particular use case and it makes sense given the trade-offs, I'll try it out.
@elrecreoadan878 · 10 months ago
When should one opt for RAG, fine-tuning, or just a Botpress knowledge base linked to ChatGPT? Thank you!
@ThangTran-rj8gt · 11 months ago
Hey! I am researching the topic of answering questions from an open domain, so how can I get data from that domain? Thank you.
@andriusem · 1 year ago
This is what I was searching for! Thanks James, your videos are very informative and easy to follow with Google Colab! My question would be: can we use information extracted from the vector DB, have the LLM analyse it, and provide insights or compare different documents, using guardrails or an agent? Thanks, keep up the great work!
@jamesbriggs · 1 year ago
It depends on what you’re comparing, but I see no reason as to why it couldn’t work! You can select an existing doc at random, perform a semantic search for similar docs and feed them into your LLM with instructions on what you’re comparing - there may be other ways of doing it too - I hope that helps!
@OlivierEble · 1 year ago
I want to start using RAG but I want something fully local. What could be an alternative to Pinecone?
@rabomeister · 1 year ago
Except for Pinecone, almost all of the vector stores are open source. Also, I don't know about Pinecone since it's not free, but the others are mostly similar. I use ChromaDB for my personal projects since I started working on LLMs recently, and it is very user friendly. You will handle it; the problematic part is the data.
@satyamwarghat1305 · 1 year ago
Use Deep Lake - I have been using it for my projects and it is pretty good.
@jamesbriggs · 1 year ago
Yeah, if you want fully local there are open-source alternatives like Qdrant or Weaviate. For the comment above: Pinecone is free, they have a free/standard tier :)
@drwho8576 · 11 months ago
Using pgvector here, directly on top of good ol' Postgres. Works like a charm.
@guanjwcn · 1 year ago
Thanks very much for sharing, James. May I seek your advice on how I can estimate infrastructure requirements, e.g. the number of GPUs, assuming I need to host an open-source model of size 70B on premises, with at most 1,000 concurrent users? Thank you very much.
@jamesbriggs · 1 year ago
You can calculate the number of parameters * the bytes required by each parameter's data type - people do keep asking about this, so I think I can go into more detail in a future video.
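As a rough sketch of that calculation - weights only; KV cache, activations, and serving overhead for 1,000 concurrent users would add substantially on top, so treat these numbers as a lower bound:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return n_params * bytes_per_param / 1e9

# a 70B-parameter model at different precisions
for dtype, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{dtype}: {weight_memory_gb(70e9, nbytes):.0f} GB")
# fp16: 140 GB
# int8: 70 GB
# int4: 35 GB
```

At fp16, 140 GB of weights already means at least two 80 GB GPUs before any serving overhead, which is why quantization matters so much for on-premises hosting.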
@reknine · 11 months ago
@jamesbriggs Would really appreciate that!
@rabomeister · 1 year ago
What do you think about accuracy and other related metrics while using guardrails? It really sounds nice, but if you use LLMs in fields with high risk (finance), does it also promise accuracy, at least similar to standard approaches? Great videos by the way - I guess I've implemented almost all of them. And it's always nice to learn from a professional.
@rabomeister · 1 year ago
Also (if you are OK with that, since you also work for a company), if you could make a video about the hardware side of LLMs and DBs, that would be great. Because at some point there is enough information about coding and software (of course, not enough yet, but one can implement something somehow), but the hardware side really requires theoretical knowledge. I don't want to just check the tables and go buy some NVIDIA GPU; I want to know why. Thanks in advance.
@jamesbriggs · 1 year ago
It's hard to guarantee accuracy; LLMs and the broader field of NLP are generally non-deterministic, so there's always that level of randomness. I'm still figuring out the best way of dealing with it myself - we try to add metrics, or extra LLM analysis steps (like asking "is this answer using information from these sources...") - but it's a difficult problem. I like the GPU hardware idea, would love to jump into it.
@aravindudupa957 · 1 year ago
@jamesbriggs Are there any good "deterministic" ways to check the accuracy of information in the reply (by going through the reply and checking it, for example) against that in the context? I've heard of SelfCheckGPT, which takes multiple iterations, but it's not 'deterministic'. It would be great to have such a technique!
@chrismcdannel3908 · 11 months ago
@@rabomeister Outside of highly specialized and sensitive use cases requiring procurement of a commercial-grade GPU or TPU, and the talent and skill to use it effectively in a business process, there is no real advantage in spending $15-20K or more on the hardware - unless you just have the insatiable desire and urge to do it for the hell of it, because you want your own, and that's OK too my friend. Unfortunately the cloud giants have structured the market in a way that makes getting compute from them still more economically prudent than buying even one of the ASICs they have hundreds of thousands or millions of.
@georgekokkinakis7288 · 1 year ago
Very informative video, thanks. Is there any chance that you know of an open-source LLM that supports the Greek language for retrieval-augmented generation?
@jamesbriggs · 1 year ago
Cohere has a multilingual embedding model - it probably covers Greek; there will also be multilingual sentence transformers you can use too :)
@georgekokkinakis7288 · 1 year ago
Thanks for your response @jamesbriggs. For the embedding part, I have found a multilingual model which does an excellent job of retrieving the document most relevant to the question. What I cannot find is an open-source LLM for the generation part, which would generate the answer to the user's query based on the retrieved document (I am talking about the Greek language). The OpenAI tokenizer is very expensive since, from what I have noticed, it tokenizes Greek words at the character level, so using their model does not fit my task at hand. Anyway, if you ever come across a generative model which supports Greek, please mention it in your upcoming videos, which by the way I have to say have helped me a lot.
@ashraymallesh2856 · 11 months ago
@@georgekokkinakis7288 What about doing the RAG pipeline in English and then translating to Greek for your users? :P
@georgekokkinakis7288 · 11 months ago
@@ashraymallesh2856 If I am not mistaken (please correct me if I am wrong), applying the RAG pipeline in English would first require translating the documents from Greek to English. As I mentioned in a previous post, the documents contain mathematical definitions and terminology. Using a translation model or the Google Translate API wouldn't work because, for example, Google Translate translates the words παραπληρωματικές and συμπληρωματικές both as "supplementary", which is not correct. On the other hand, translating all the documents by hand would be a tedious task. That's why I am looking for an open-source LLM which supports the Greek language. Any ideas are welcome 😁.
@humayounkhan7946 · 1 year ago
This is awesome, thanks James. Out of curiosity, do you know if this can be integrated with LangChain?
@jamesbriggs · 1 year ago
Absolutely - LangChain is code, and we can execute code via actions, like we did with our RAG pipeline here.
@kaustubhnegi1838 · 3 months ago
🎯 Key points for quick navigation:
00:00 🔍 Introduction to retrieval augmented generation with guardrails for building chatbots.
00:27 📂 Utilizing a vector database (Pinecone), an embedding model (RoBERTa), and documents for retrieval.
00:54 🕸️ Two traditional approaches to RAG: the naive approach and the agent approach.
02:25 ⌛ The agent approach is slower but potentially more powerful, with multiple thoughts and external tools.
05:23 🛡️ Guardrails approach: directly embedding the query, checking similarity with defined guardrails, and triggering the retrieval tool if needed.
07:42 🧩 The guardrails approach combines the query and retrieved context, then passes them to the language model for answer generation.
08:23 ⚡ The guardrails approach is significantly faster than the agent approach while still allowing tool usage.
09:03 📋 Step-by-step implementation details, including data indexing, embedding, and Pinecone setup.
13:12 🔄 Defining retrieve and RAG functions as guard actions for guardrails.
14:46 🚫 Guardrails config to avoid talking about politics.
15:15 🤖 Defining a guardrail for the user asking about LLMs to trigger the RAG pipeline.
17:10 🔥 Demonstrating the RAG pipeline via guardrails, showing its effectiveness in answering LLM-related queries.
18:04 🆚 Comparing guardrails without RAG, which lacks information for LLM-related queries.
19:55 💡 The guardrails approach allows agent-like tool usage without the slow initial LLM call, making it faster for triggered tools.
1 year ago
Have you tried to set this up with GPT-4? I'm getting some errors switching from Davinci to GPT-4.
@jamesbriggs · 11 months ago
Hey Andre! I usually avoid generating output with the built-in LLM function; I usually just use guardrails as a mid-decision layer and then use actions to call LLMs like GPT-4.
@user-ib1st1tm9w · 1 year ago
How do I use guardrails and RAG with other LLMs, like Falcon or Llama?
@jamesbriggs · 1 year ago
You can modify the model provider and name in the config.yaml file - they have docs on it in the guardrails GitHub repo :)
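For illustration, the kind of change that reply describes looks roughly like this in a NeMo Guardrails config.yaml - the engine and model names here are examples, not values from the video, so check the repo docs for the providers your version supports:

```yaml
models:
  - type: main
    engine: huggingface_hub        # swap for your provider
    model: tiiuae/falcon-7b-instruct
```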
@AbhayKumar-yh9zs · 9 months ago
For implementing LangChain agents with NeMo Guardrails, do we need to do the below? In the Colang file, first define the action which calls the function that runs the agent execution, like this: $answer = execute custom_function(query=$last_user_message) - and then register it like: rag_rails.register_action(action=custom_function, name="custom_function")? Am I on the right track?
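A minimal sketch of the full pattern that question describes, in the Colang (v1) style used in the video - the flow name, the example utterance, and `custom_function` are all placeholders:

```
define user ask protocol question
  "what does our protocol say about sample storage"

define flow answer protocol question
  user ask protocol question
  $answer = execute custom_function(query=$last_user_message)
  bot $answer
```

On the Python side, the action is registered once on the rails instance before use, e.g. `rag_rails.register_action(action=custom_function, name="custom_function")`.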
@deter3 · 1 year ago
This method is very simple to show with a toy example, but you need lots of hard work in a real business environment to build it and test whether it's really working or not. Using simple sentences + embedding distance for decision-making is not a really reliable solution.
@jamesbriggs · 1 year ago
I use it in production; it can be more reliable at times than LLMs if you define the semantic vector space that should trigger an action well. Typically I view prompt engineering as the broad stroke, and guardrails as the fine-tuning of your chatbot behavior, so for specific RAG workflows like "refer to HR docs", "refer to eng docs", "refer to company Y DB", guardrails can be very helpful. But you're very right - it needs a lot of work, testing, and iterating over the guardrails to get something reliable.
@eightrice · 1 year ago
does this have message history? Does the context carry over from one input to the next?
@jamesbriggs · 1 year ago
In this example, no, but you can bring in a few previous interactions for the embedding.
@eightrice · 1 year ago
@@jamesbriggs Why would you use embeddings on the previous interactions? Can you just use the ChatCompletion endpoint and pass the array of previous messages as `chat_history`?
@jamesbriggs · 1 year ago
@@eightrice The ChatCompletion endpoint is more effective, and is what you do for the "agent approach to RAG" - it's just slower. In real-world use cases I have always used the pure agent approach, but I recently began experimenting with a mix of both: I try to capture obvious queries ("user asks about LLMs") with guardrails and send the single query directly to the RAG pipeline, but more general-purpose queries I direct to the typical agent endpoint (and include conversation history). I'm still experimenting with the best approach, but so far this system seems to be working well for speeding up a reasonable portion of queries.
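The routing idea in that reply can be sketched in a few lines. This is a toy illustration: the 3-dimensional vectors stand in for real embedding-model outputs, and the similarity threshold is a made-up number you would tune in practice.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_emb, canonical_embs, threshold=0.75):
    """Send obvious on-topic queries straight to the fast RAG path;
    everything else goes to the slower agent (which keeps chat history)."""
    best = max(cosine(query_emb, c) for c in canonical_embs)
    return "rag" if best >= threshold else "agent"

# toy embeddings of canonical "user asks about LLMs" utterances
canon = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]

print(route(np.array([0.95, 0.05, 0.0]), canon))  # rag
print(route(np.array([0.0, 0.2, 1.0]), canon))    # agent
```

The speed-up comes from the fact that the "rag" branch skips the initial LLM reasoning call entirely; only queries that fall outside the defined semantic space pay the full agent latency.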
@eightrice · 1 year ago
@@jamesbriggs yup, that hybrid architecture seems optimal if you need both normal chatbot functionality and subject matter knowledge with low latency. Thank you so much for this, I feel like I should be paying a lot for your code and tutorials :)
@jamesbriggs · 1 year ago
@@eightrice Yeah, so far I've liked this approach - haha, no worries, I'm happy it's useful :)
@EarningsNest · 1 year ago
Did u smoke something before recording this?
@jamesbriggs · 1 year ago
I have a relaxed nature 😂
@chrisalmighty · 1 year ago
😂😂😂 Hilarious
@prashanthsai3441 · 10 months ago
@jamesbriggs Why should I use guardrails? I have Dialogflow, which has all the intents and flows (like in a Colang file) - I check the intent confidence, and if it is high I trigger the corresponding intent flow, while if it is low I retrieve the data from the source using the naive retrieval method.