How to Make RAG Chatbots FAST

38,666 views

James Briggs

1 day ago

Comments: 72
@WinsonDabbles · 1 year ago
I really appreciate that all, if not most, of your collabs don't use LangChain at all. I really like to see what goes on under the hood, to learn from a first-principles perspective.
@hughesadam87 · 10 months ago
These videos are such a high-quality collection of content for app developers in the AI space who are building apps and are not AI experts (nor really care about the AI itself, just want to use it).
@xflory26x · 1 year ago
Been anticipating this video since seeing the notebook on your GitHub! Thank you so much for your detailed explanations! Would be keen to see your implementations of NeMo Guardrails' moderation pipelines :)
@chrismcdannel3908 · 11 months ago
Great dissection on the "wrapper" visualization to simplify the relationship between the Agent and the Model. I'm going to borrow it, with backlinks of course. Oh, and thanks for the composure in your thumbnails my guy. It's nice to see some professionalism getting the merit it deserves instead of some assclown with his jaw on the floor and pulling his hair up like a chimp on drugs. Classy AF bro. Keep up the good work.
@realCleanK · 6 months ago
Really appreciate you putting this together 🙏
@cmars7845 · 1 year ago
Thanks for the intro to NeMo Guardrails! I kept expecting you to say tools like ... 'google' ... but you seemed to pause and then not say it 😂
@AlgoTradingX · 1 year ago
You glowed up like crazy in your video content! It's so cool!!!!
@jamesbriggs · 1 year ago
thanks Sajid - means a lot coming from you :)
@shortthrow434 · 1 year ago
Excellent, thank you James.
@jamesbriggs · 1 year ago
you're welcome!
@pavellegkodymov4295 · 1 year ago
Thanks, James, very, very useful. Will try to include guardrails in our corporate RAG chatbot.
@uctran1169 · 11 months ago
Can you make a video tutorial on creating data from Wikipedia?
@joshualee6559 · 11 months ago
I want to build this for my research lab so that we can query information about our protocols, standards, etc. This seems really useful. I presume it wouldn't be that hard to then embed it into a Slack chatbot?
@yarikbratashchuk3386 · 11 months ago
Would this approach work if the vectorized data was shop inventory? And the question was something like "how many items do you have?", or about the specifics of a group of items?
@ylazerson · 1 year ago
Great video as always!
@jamesbriggs · 1 year ago
Thanks as always!
@drwho8576 · 11 months ago
Excellent video as always. Thanks for sharing. Is there a way to set up Colang for an "anything but" scenario? So far, I only seem to be able to program what to detect for a workflow. But can I set up a 'default deny' type thing? Anything other than the topic my bot is designed to handle returns an "I'm sorry, Dave. I'm afraid I can't do that"...
@RichardHamnett · 11 months ago
Brilliant mate, also don't forget this could be a massive cost optimizer along with speed :)
@sandorkonya · 11 months ago
Thank you for the super video. I wonder how we can do chain-of-thought (CoT) or tree-of-thoughts with Guardrails, without LangChain?
@unperfectbryce · 10 months ago
Can't you just do kNN with your embeddings to make sure the query isn't out of distribution? Isn't this a pretty quick Euclidean distance operation? Why bother with guardrails? Thanks for the great video! Keep it up.
@user-hh9do9fn1o · 2 months ago
1. Not all queries are straightforward. Complex queries might need more nuanced understanding and contextual analysis, which kNN might not handle well.
2. Guardrails can adapt to new rules and policies quickly, while kNN models might need retraining with new data.
3. Guardrails can provide more interpretable reasons for why a query is out-of-distribution or inappropriate, aiding understanding and transparency.
That said, using both of these together might be more robust.
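The kNN check discussed in this thread can be sketched with plain NumPy. This is a toy example: the 4-dimensional "embeddings" and the distance threshold are made up, and a real system would use your embedding model's vectors plus a threshold tuned on held-out queries.

```python
import numpy as np

def is_in_distribution(query_emb, corpus_embs, k=3, threshold=0.4):
    """Flag a query as in-distribution if the mean cosine distance
    to its k nearest corpus embeddings is below a tuned threshold."""
    # normalize so a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every corpus vector
    top_k = np.sort(sims)[-k:]       # the k nearest neighbours
    mean_dist = 1.0 - top_k.mean()   # mean cosine distance
    return mean_dist < threshold

# toy 4-dim "embeddings": three on-topic corpus vectors, two queries
corpus = np.array([[1.0, 0.1, 0.0, 0.0],
                   [0.9, 0.2, 0.1, 0.0],
                   [0.8, 0.0, 0.2, 0.1]])
on_topic = np.array([0.95, 0.1, 0.1, 0.0])
off_topic = np.array([0.0, 0.0, 0.1, 1.0])

print(is_in_distribution(on_topic, corpus))   # True
print(is_in_distribution(off_topic, corpus))  # False
```

This is the "fast but blunt" side of the trade-off the reply above describes: the check itself is one matrix multiply, but it has no notion of rules or policies beyond distance.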
@aravindudupa957 · 1 year ago
What is the difference in accuracy between reasoning (whether to retrieve) using embedding similarity vs. giving it to an LLM?
@mobime6682 · 7 months ago
Great show, thank you. Question: it seems awfully similar to your (more recent?) videos about semantic router, or have I got the wrong end of the stick? I know, I should do a similarity search on the text for each, I guess 😉! Thanks again.
@shaheerzaman620 · 1 year ago
awesome video James!
@jamesbriggs · 1 year ago
Thanks Shaheer!
@user-jj2mo5sl7p · 1 year ago
Useful work!
@fabianaltendorfer11 · 11 months ago
Great video! Any idea how to deal with screenshots in the documents?
@RichardBurgmann · 11 months ago
Hi James, enjoying your series greatly. A question or suggestion for a future video: I've been seeing a lot of articles on the use of graph data structures to build knowledge graphs to address issues such as hallucinations and weaknesses in logical reasoning in LLMs. I've only found one person who has actually done this, and they had mixed results as far as addressing these issues. Wondering what your experience has been in this area? Do you have an opinion? From what I can see there is not much evidence (yet) that it gives a better result than well-crafted semantic search.
@jamesbriggs · 11 months ago
I never tried it myself, but everyone I know who tried said it was hard to do and the results were either the same as or worse than using vector search - so I haven't had much reason to look into it. Maybe at some point, if I see it being useful for a particular use case and it makes sense given the trade-offs, I'll try it out.
@elrecreoadan878 · 10 months ago
When should one opt for RAG, fine-tuning, or just a Botpress knowledge base linked to ChatGPT? Thank you!
@ThangTran-rj8gt · 11 months ago
Hey! I am researching the topic of answering questions from an open domain, so how can I get data from that domain? Thank you.
@andriusem · 1 year ago
This is what I was searching for! Thanks James, your videos are very informative and easy to follow with Google Colab! My question would be: can we use information extracted from the vector DB, have the LLM analyse it, and provide insights or compare different documents, using guardrails or an agent? Thanks, keep up the great work!
@jamesbriggs · 1 year ago
It depends on what you’re comparing, but I see no reason as to why it couldn’t work! You can select an existing doc at random, perform a semantic search for similar docs and feed them into your LLM with instructions on what you’re comparing - there may be other ways of doing it too - I hope that helps!
@OlivierEble · 1 year ago
I want to start using RAG but I want something fully local. What could be an alternative to Pinecone?
@rabomeister · 1 year ago
Except for Pinecone, almost all of the vector stores are open source. Also, I don't know about Pinecone since it's not free, but the others are mostly similar. I use ChromaDB for my personal projects since I started working on LLMs recently, and it is very user friendly. You will handle it; the problematic part is the data.
@satyamwarghat1305 · 1 year ago
Use Deep Lake - I have been using it for my projects and it is pretty good.
@jamesbriggs · 1 year ago
Yeah, if you want fully local there are open-source alternatives like Qdrant or Weaviate. For the comment above: Pinecone is free, they have a free/standard tier :)
@drwho8576 · 11 months ago
Using pgvector here, directly on top of good ol' Postgres. Works like a charm.
@guanjwcn · 1 year ago
Thanks very much for sharing, James. May I seek your advice on how I can estimate infrastructure requirements, e.g. the number of GPUs, assuming I need to host an open-source model of size 70B on premises, with at most 1,000 concurrent users? Thank you very much.
@jamesbriggs · 1 year ago
You can calculate the number of parameters * the bytes required by each parameter's data type - people do keep asking about this, so I think I can go into more detail in a future video.
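As a rough sketch of that calculation - weights only; KV cache, activations, and serving overhead for 1,000 concurrent users would add substantially on top, so treat these numbers as a lower bound:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return n_params * bytes_per_param / 1e9

# a 70B-parameter model at different precisions
for dtype, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{dtype}: {weight_memory_gb(70e9, nbytes):.0f} GB")
# fp16: 140 GB
# int8: 70 GB
# int4: 35 GB
```

At fp16, 140 GB of weights already means at least two 80 GB GPUs before any serving overhead, which is why quantization matters so much for on-premises hosting.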
@reknine · 11 months ago
@jamesbriggs Would really appreciate that!
@rabomeister · 1 year ago
What do you think about accuracy and other related metrics while using guardrails? It really sounds nice, but if you use LLMs in fields with high risk (finance), does it also promise accuracy, at least similar to standard approaches? Great videos by the way - I guess I've implemented almost all of them. And it's always nice to learn from a professional.
@rabomeister · 1 year ago
Also (if you are OK with that, since you also work for a company), if you could make a video about the hardware side of LLMs and DBs, that would be great. Because at some point there is enough information about coding and software (of course, not enough yet, but one can implement something somehow), but the hardware side really requires theoretical knowledge. I don't want to just check the tables and go buy some NVIDIA GPU; I want to know why. Thanks in advance.
@jamesbriggs · 1 year ago
It's hard to guarantee accuracy; LLMs and the broader field of NLP are generally non-deterministic, so there's always that level of randomness. I'm still figuring out the best way of dealing with it myself - we try to add metrics, or extra LLM analysis steps (like asking "is this answer using information from these sources...") - but it's a difficult problem. I like the GPU hardware idea, would love to jump into it.
@aravindudupa957 · 1 year ago
@jamesbriggs Are there any good "deterministic" ways to check the accuracy of information in the reply (by going through the reply and checking it, for example) against that in the context? I've heard of SelfCheckGPT, which takes multiple iterations, but it's not 'deterministic'. It would be great to have such a technique!
@chrismcdannel3908 · 11 months ago
@@rabomeister Outside of highly specialized and sensitive use cases requiring procurement of a commercial-grade GPU or TPU, and the talent and skill to use it effectively in a business process, there is no real advantage in spending $15-20K or more on the hardware - unless you just have the insatiable desire and urge to do it for the hell of it, because you want your own, and that's OK too my friend. Unfortunately the cloud giants have structured the market in a way that makes getting compute from them still more economically prudent than buying even one of the ASICs they have hundreds of thousands or millions of.
@georgekokkinakis7288 · 1 year ago
Very informative video, thanks. Is there any chance that you know of an open-source LLM that supports the Greek language for retrieval-augmented generation?
@jamesbriggs · 1 year ago
Cohere has a multilingual embedding model - it probably covers Greek; there will also be multilingual sentence transformers you can use too :)
@georgekokkinakis7288 · 1 year ago
Thanks for your response @jamesbriggs. For the embedding part, I have found a multilingual model which does an excellent job of retrieving the document most relevant to the question. What I cannot find is an open-source LLM for the generation part, which would generate the answer to the user's query based on the retrieved document (I am talking about the Greek language). The OpenAI tokenizer is very expensive since, from what I have noticed, it tokenizes Greek words at the character level, so using their model does not fit my task at hand. Anyway, if you ever come across a generative model which supports Greek, please mention it in your upcoming videos, which by the way I have to say have helped me a lot.
@ashraymallesh2856 · 11 months ago
@@georgekokkinakis7288 What about doing the RAG pipeline in English and then translating to Greek for your users? :P
@georgekokkinakis7288 · 11 months ago
@@ashraymallesh2856 If I am not mistaken (please correct me if I am wrong), applying the RAG pipeline in English would first require translating the documents from Greek to English. As I mentioned in a previous post, the documents contain mathematical definitions and terminology. Using a translation model or the Google Translate API wouldn't work because, for example, Google Translate translates the words παραπληρωματικές and συμπληρωματικές both as "supplementary", which is not correct. On the other hand, translating all the documents by hand would be a tedious task. That's why I am looking for an open-source LLM which supports the Greek language. Any ideas are welcome 😁.
@humayounkhan7946 · 1 year ago
This is awesome, thanks James. Out of curiosity, do you know if this can be integrated with LangChain?
@jamesbriggs · 1 year ago
Absolutely - LangChain is code, and we can execute code via actions, like we did with our RAG pipeline here.
@kaustubhnegi1838 · 3 months ago
🎯 Key points for quick navigation:
00:00 🔍 Introduction to retrieval augmented generation with guardrails for building chatbots.
00:27 📂 Utilizing a vector database (Pinecone), an embedding model (RoBERTa), and documents for retrieval.
00:54 🕸️ Two traditional approaches to RAG: the naive approach and the agent approach.
02:25 ⌛ The agent approach is slower but potentially more powerful, with multiple thoughts and external tools.
05:23 🛡️ Guardrails approach: directly embedding the query, checking similarity with defined guardrails, and triggering the retrieval tool if needed.
07:42 🧩 The guardrails approach combines the query and retrieved context, then passes them to the language model for answer generation.
08:23 ⚡ The guardrails approach is significantly faster than the agent approach while still allowing tool usage.
09:03 📋 Step-by-step implementation details, including data indexing, embedding, and Pinecone setup.
13:12 🔄 Defining retrieve and RAG functions as guard actions for guardrails.
14:46 🚫 Guardrails config to avoid talking about politics.
15:15 🤖 Defining a guardrail for the user asking about LLMs to trigger the RAG pipeline.
17:10 🔥 Demonstrating the RAG pipeline via guardrails, showing its effectiveness in answering LLM-related queries.
18:04 🆚 Comparing guardrails without RAG, which lacks information for LLM-related queries.
19:55 💡 The guardrails approach allows agent-like tool usage without the slow initial LLM call, making it faster for triggered tools.
1 year ago
Have you tried to set this up with GPT-4? I'm getting some errors switching from Davinci to GPT-4.
@jamesbriggs · 11 months ago
Hey Andre! I usually avoid generating output with the built-in LLM function; I usually just use guardrails as a mid-decision layer and then use actions to call LLMs like GPT-4.
@user-ib1st1tm9w · 1 year ago
How do I use guardrails and RAG with other LLMs, like Falcon or Llama?
@jamesbriggs · 1 year ago
You can modify the model provider and name in the config.yaml file - they have docs on it in the guardrails GitHub repo :)
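For illustration, the kind of change that reply describes looks roughly like this in a NeMo Guardrails config.yaml - the engine and model names here are examples, not values from the video, so check the repo docs for the providers your version supports:

```yaml
models:
  - type: main
    engine: huggingface_hub        # swap for your provider
    model: tiiuae/falcon-7b-instruct
```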
@AbhayKumar-yh9zs · 9 months ago
For implementing LangChain agents with NeMo Guardrails, do we need to do the below? In the Colang file, first define the action which calls the function that runs the agent execution, like this: $answer = execute custom_function(query=$last_user_message) - and then register it like: rag_rails.register_action(action=custom_function, name="custom_function")? Am I on the right track?
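A minimal sketch of the full pattern that question describes, in the Colang (v1) style used in the video - the flow name, the example utterance, and `custom_function` are all placeholders:

```
define user ask protocol question
  "what does our protocol say about sample storage"

define flow answer protocol question
  user ask protocol question
  $answer = execute custom_function(query=$last_user_message)
  bot $answer
```

On the Python side, the action is registered once on the rails instance before use, e.g. `rag_rails.register_action(action=custom_function, name="custom_function")`.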
@deter3 · 1 year ago
This method is very simple to show with a toy example, but you need lots of hard work in a real business environment to build it and test whether it's really working or not. Using simple sentences + embedding distance for decision-making is not a really reliable solution.
@jamesbriggs · 1 year ago
I use it in production; it can be more reliable at times than LLMs if you define the semantic vector space that should trigger an action well. Typically I view prompt engineering as the broad stroke, and guardrails as the fine-tuning of your chatbot behavior, so for specific RAG workflows like "refer to HR docs", "refer to eng docs", "refer to company Y DB", guardrails can be very helpful. But you're very right - it needs a lot of work, testing, and iterating over the guardrails to get something reliable.
@eightrice · 1 year ago
does this have message history? Does the context carry over from one input to the next?
@jamesbriggs · 1 year ago
In this example, no, but you can bring in a few previous interactions for the embedding.
@eightrice · 1 year ago
@@jamesbriggs Why would you use embeddings on the previous interactions? Can you just use the ChatCompletion endpoint and pass the array of previous messages as `chat_history`?
@jamesbriggs · 1 year ago
@@eightrice The ChatCompletion endpoint is more effective, and is what you do for the "agent approach to RAG" - it's just slower. In real-world use cases I have always used the pure agent approach, but I recently began experimenting with a mix of both: I try to capture obvious queries ("user asks about LLMs") with guardrails and send the single query directly to the RAG pipeline, but more general-purpose queries I direct to the typical agent endpoint (and include conversation history). I'm still experimenting with the best approach, but so far this system seems to be working well for speeding up a reasonable portion of queries.
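The routing idea in that reply can be sketched in a few lines. This is a toy illustration: the 3-dimensional vectors stand in for real embedding-model outputs, and the similarity threshold is a made-up number you would tune in practice.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_emb, canonical_embs, threshold=0.75):
    """Send obvious on-topic queries straight to the fast RAG path;
    everything else goes to the slower agent (which keeps chat history)."""
    best = max(cosine(query_emb, c) for c in canonical_embs)
    return "rag" if best >= threshold else "agent"

# toy embeddings of canonical "user asks about LLMs" utterances
canon = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]

print(route(np.array([0.95, 0.05, 0.0]), canon))  # rag
print(route(np.array([0.0, 0.2, 1.0]), canon))    # agent
```

The speed-up comes from the fact that the "rag" branch skips the initial LLM reasoning call entirely; only queries that fall outside the defined semantic space pay the full agent latency.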
@eightrice · 1 year ago
@@jamesbriggs yup, that hybrid architecture seems optimal if you need both normal chatbot functionality and subject matter knowledge with low latency. Thank you so much for this, I feel like I should be paying a lot for your code and tutorials :)
@jamesbriggs · 1 year ago
@@eightrice Yeah, so far I've liked this approach - haha, no worries, I'm happy it's useful :)
@EarningsNest · 1 year ago
Did u smoke something before recording this?
@jamesbriggs · 1 year ago
I have a relaxed nature 😂
@chrisalmighty · 1 year ago
😂😂😂 Hilarious
@prashanthsai3441 · 10 months ago
@jamesbriggs Why should I use guardrails? I have Dialogflow, which has all the intents and flows (like in a Colang file) - I check the intent confidence, and if it is high I trigger the corresponding intent flow, while if it is low I retrieve the data from the source using the naive retrieval method.