Thankful for channels like this that go above and beyond the standard tutorials 💪🏾
@codingcrashcourses8533 10 months ago
Thanks for your motivating comment :)
@saurabhjain507 10 months ago
Another helpful video! Please create more videos about LangChain in production.
@codingcrashcourses8533 10 months ago
Next Monday I will release one about monitoring with Langfuse.
@delgrave4786 2 months ago
So I have a doubt regarding the digest generation. The code generates a digest for every document uploaded to it, right? The advantage is that the same document will always generate the same digest, which can be checked against the existing digests and excluded. I guess a better way would be to generate a digest for each page of a PDF, in case a new PDF is uploaded with only a single page changed? Currently the code does not handle cases like this, right? It only generates the digest as part of the metadata and stores it without checking anything, if I haven't missed something.
@codingcrashcourses8533 2 months ago
I generally agree. That is a different approach, since you need to use some kind of observer pattern, which is not so easy, because you have to rely on your data provider to offer that.
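A per-page variant of that digest check can be sketched without any observer pattern, at least for the upload path: hash each page's content and skip pages whose digest is already stored. This is only an illustration; the function names (`page_digest`, `filter_new_pages`) and plain-string pages are hypothetical, not from the video's codebase.

```python
import hashlib


def page_digest(page_content: str) -> str:
    # Stable content hash: identical page text always maps to the same digest.
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def filter_new_pages(pages: list[str], known_digests: set[str]) -> list[str]:
    # Keep only pages whose digest is not already stored, so re-uploading a
    # PDF with one changed page re-embeds just that page.
    fresh = []
    for page in pages:
        d = page_digest(page)
        if d not in known_digests:
            known_digests.add(d)
            fresh.append(page)
    return fresh
```

With this, an unchanged page of a re-uploaded PDF is filtered out before any embedding work happens.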
@navaneeth44 2 months ago
Great content! But I have a question here: does the POST method accept large files as input?
@codingcrashcourses8533 2 months ago
@@navaneeth44 No, I used application/json. But FastAPI also has classes that allow you to accept files. What do you mean by large? The default is not large, a few MB.
@omaralhory8065 10 months ago
Hi, I am following your codebase, and I really like it. I am still unsure why we need to update the data via an API, if we can have an ETEL (Extract, Transform, Embed, Load) data pipeline that runs on a schedule when new data comes in. Why do we give such access to the client, and why is it an API that gives access to deleting records? What would you do differently here? Would you develop a CMS in order to maintain the relationship between the client and the DB?
@codingcrashcourses8533 10 months ago
You could also do it that way, but in this repo I don't have a pipeline or anything. There is more than one way to do it :-). I currently have no good solution for updating data without the additional API layer.
@omaralhory8065 10 months ago
@@codingcrashcourses8533 Thank you for being responsive! Your channel is a gem, btw. Usually RAG data sources aren't predictable; maybe a data lake (Delta Lake by Databricks) can be quite beneficial here. You can use PySpark to build the pipeline, and it works great when connected to Airflow, for example, for scheduling.
@awakenwithoutcoffee 5 months ago
Did you find a solution?
@pvrajanrk 9 months ago
Great video. Can you share your thoughts on state management for maintaining the chat window across different chat sessions? This is another area I see as a gap in LangChain production.
@codingcrashcourses8533 9 months ago
I did another video about this and cover it in my Udemy course. The answer for me is Redis, where you set key-value pairs: the key is the conversation ID and the value is the stringified conversation.
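That key-value pattern can be sketched with a plain dict standing in for Redis; in production you would replace `store` with a Redis client (e.g. `redis.Redis()`) and call `set`/`get` with the same keys. The helper names are hypothetical.

```python
import json

# Stand-in for Redis: key = conversation ID, value = JSON-stringified messages.
store: dict[str, str] = {}


def save_conversation(conversation_id: str, messages: list[dict]) -> None:
    # Serialize the whole conversation; with real Redis you might also set a TTL.
    store[conversation_id] = json.dumps(messages)


def load_conversation(conversation_id: str) -> list[dict]:
    raw = store.get(conversation_id)
    return json.loads(raw) if raw else []
```

Because the state lives in the store rather than in the API process, any server instance can load the history by conversation ID.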
@Pure_Science_and_Technology 10 months ago
When processing a file for RAG, I save its name, metadata, and a unique ID in a structured database. This unique ID is also assigned to each chunk in the vector database. If a file needs updating or deleting, the unique ID in the database is used to modify or remove the corresponding entries in the vector database.
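A toy version of that bookkeeping, with dicts standing in for the structured database and the vector store (all names here are illustrative, not from the video's codebase):

```python
import uuid

documents: dict[str, dict] = {}     # structured DB: file name + metadata per doc ID
vector_store: dict[str, dict] = {}  # chunk key -> chunk text + parent doc ID


def ingest(filename: str, chunks: list[str]) -> str:
    # Assign one unique ID to the file and stamp it onto every chunk.
    doc_id = str(uuid.uuid4())
    documents[doc_id] = {"filename": filename}
    for i, chunk in enumerate(chunks):
        vector_store[f"{doc_id}:{i}"] = {"text": chunk, "doc_id": doc_id}
    return doc_id


def delete_document(doc_id: str) -> None:
    # The shared ID lets us remove the DB row and all its chunks together.
    documents.pop(doc_id, None)
    for key in [k for k, v in vector_store.items() if v["doc_id"] == doc_id]:
        del vector_store[key]
```

Updating a file is then delete-and-reingest under a fresh ID, which keeps the two stores consistent.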
@codingcrashcourses8533 10 months ago
Yes, very robust solution :)
@RobertoFabrizi 10 months ago
Just to see if I understood you right: let's assume you have a file (product catalog, functional software specification, your pick) that is a doc with 100 pages. You use a document loader to load it, then split it with a recursive character text splitter with a chunk size of 1000 and an overlap of 100, then embed those chunks and store them in a vector DB, saving thousands of chunks all created from that one file. Then a single line near the start of the file changes, but that has repercussions for all later chunks: even though they are technically the same data, they are partitioned differently than before (assuming the change before them caused the splitter to create different chunks, e.g. the modified row is longer than before). How do you efficiently update your vector DB in this scenario? Thank you!
@codingcrashcourses8533 10 months ago
@@RobertoFabrizi You won't just read a whole catalog into memory at once. You should keep each page separate as raw data, then split each page into smaller chunks. I would even argue against a fixed chunk size, but that is something I will cover in my next (small) video.
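The per-page idea can be sketched with a simple fixed-size splitter. This is only to illustrate how chunk size and overlap interact per page; in LangChain you would typically run something like `RecursiveCharacterTextSplitter` on each page instead. Because splitting happens per page, an edit on page 1 only reshuffles page 1's chunks, not the thousands of chunks from later pages.

```python
def split_page(page: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    # Slide a window of chunk_size over the page text, stepping forward by
    # chunk_size - overlap so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    return [page[i:i + chunk_size] for i in range(0, max(len(page) - overlap, 1), step)]
```

A 250-character page with `chunk_size=100, overlap=20` yields three chunks, each starting with the last 20 characters of the previous one.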
@nicolascr181 1 month ago
Hello, I don't understand how to upload the document, since the endpoint only receives JSON. Thanks for your content.
@codingcrashcourses8533 1 month ago
Document is just a class. It looks like this: Document(content="xxx", metadata={}). You can serialize it to { "content": "xxx", "metadata": {} }, and then you have the JSON format. The code looks like this (dummy implementation):

```python
class Document:
    def __init__(self, content, metadata):
        self.content = content
        self.metadata = metadata

    def to_dict(self):
        return {"content": self.content, "metadata": self.metadata}
```

Does this help? :)
@zendr0 10 months ago
Have you thought about a caching implementation in RAG-based systems? Curious.
@codingcrashcourses8533 10 months ago
Yes, but currently it is one of the lower-priority topics: since you use the whole conversation history, the cache cannot be hit very often. Have you worked with caching before?
@zendr0 10 months ago
I have used an in-memory cache. Could we do something like this: use the cache to store embeddings, then compute cosine similarity between the new input query's embedding and the ones in the cache. If the score is above a threshold, it is fairly obvious that the query has been asked before, so we just use the cache to answer it. What do you think? @@codingcrashcourses8533
@mcdaddy42069 10 months ago
Why do you have to put your vector store in a Docker container?
@codingcrashcourses8533 10 months ago
Containers are just the way to go. You don't have to, but it makes everything so much easier.
@alchemication 10 months ago
Very nice. Did you consider LangServe before building an in-house solution? Just curious.
@codingcrashcourses8533 10 months ago
LangServe is more about prototyping, in my opinion :)
@alchemication 10 months ago
Interesting take on it. I think they promote it as a production-ready API, but as usual, without actually trying it for real you won't know 😅 Best!
@codingcrashcourses8533 10 months ago
@@alchemication Well, I am quite good with FastAPI and have used it for a very long time, so in general I would prefer not to add an abstraction layer on top of it. My first glance at it was like: "ok, that's quick, but robust code is something different".
@daniel_avila 9 months ago
Hi, thanks for this! I have a question about the digest specifically. I understand it would be a great way to compare page_content for changes, but I'm not sure where to do this programmatically, or where to inspect whether it is happening already. As far as I can tell it is not, and more on this would be helpful to someone new to pgvector. Following how documents are added, it seems embeddings are created regardless.
@codingcrashcourses8533 9 months ago
There is the indexing API to do this. Or do you mean visually, like a git diff?
@daniel_avila 9 months ago
@@codingcrashcourses8533 I was unaware this would involve the indexing API, but that makes sense. However, there's no official async pgvector implementation for the indexing record manager: langchain-ai/langchain, issue #14836
@entrepreneurialyt 10 months ago
Thank you for the videos! Can you please make a video about tools that can be used for both performance measurement and accuracy tracking? Basically, how to build a test environment for a bot before releasing it to production.
@codingcrashcourses8533 10 months ago
RAG performance? Performance of the service with a load test? What would interest you?
@entrepreneurialyt 10 months ago
@@codingcrashcourses8533 Performance of the service with a load test would be super cool!
@entrepreneurialyt 10 months ago
@@codingcrashcourses8533 RAG performance would be cool!
@entrepreneurialyt 10 months ago
@@codingcrashcourses8533 RAG performance would be great!
@entrepreneurialyt 10 months ago
@@codingcrashcourses8533 Attempting to respond to your question for the 101st time: please make a video about RAG performance (please YouTube, don't ban my reply). I am developing a bot that can answer questions based on the transcript of a video lecture and other course materials, to speed up the learning process. If I am not mistaken, the first one will be more relevant? How can I tell that my bot is ready for production? Thank you :)
@DePhpBug 7 months ago
Still new to all the concepts here. I saw the video about having an API on top of the model's API; is this correct? So there is an abstraction layer on top of the model. Am I correct to say my model needs to sit in, let's say, server A, and then I need to create the API on server B to connect to A?
@codingcrashcourses8533 7 months ago
Exactly. Adding one layer is normally crucial; adding more layers can make sense for your use case, but doesn't have to.
@DePhpBug 7 months ago
@@codingcrashcourses8533 Thanks
@swiftmindai 10 months ago
As always, excellent content. I learned about the LangChain indexing API (SQLRecordManager) from your previous content. Now I've learned about using a hashing function (generate_digest). I believe both serve the same purpose. I'm wondering which one would be better, because I don't see a way to measure the performance of either methodology. I'd appreciate your suggestion.
@codingcrashcourses8533 10 months ago
Thank you! I think it's just important to understand WHY LangChain introduces something like that and to learn about the limitations. I found the indexing API hard to use when there is a large number of documents.
@swiftmindai 10 months ago
It took me literally a few days to understand and implement the indexing API concept. I even had to switch to PGVector from the other vector store provider I was using earlier, since the indexing API was only applicable to SQL-based vector stores. But now I love PGVector more than any other. Thank you a lot for your production implementation video; I literally use it as the basis of my latest project.
@sskohli79 10 months ago
Great video, thanks! Can you please also add a requirements.txt to your repo?
@codingcrashcourses8533 10 months ago
I can add that today, yes :)
@sskohli79 10 months ago
Thanks!
@sskohli79 10 months ago
@@codingcrashcourses8533 not there 😞
@omaralhory8065 10 months ago
Can you add it please? I checked and it's not there.
@codingcrashcourses8533 10 months ago
@@omaralhory8065 Really sorry, I forgot about it.
@picklenickil 8 months ago
Just came to comment that maintaining a backend for this will be hard!
@codingcrashcourses8533 8 months ago
What exactly do you mean?
@YerkoMuñoz-q7u 1 month ago
Isn't it quite the opposite? Having a backend like this allows you to have a maintainable infrastructure.
@yazanrisheh5127 10 months ago
Can you show us how to implement memory with LCEL and, if possible, caching responses? Thanks
@codingcrashcourses8533 10 months ago
The memory classes from LangChain are not a good fit for production; they are just for prototyping. In real-world apps you probably want to handle all of that in Redis.
@say.xy_ 10 months ago
Best best best!!!
@codingcrashcourses8533 10 months ago
Thank you :)
@Pure_Science_and_Technology 10 months ago
Will Gemini 1.5 and beyond kill RAG?
@codingcrashcourses8533 10 months ago
I highly doubt it with Gemini 1.5, but beyond that, hopefully. Currently, answers are still bad when your context is larger than 20 documents or so.
@xiscosmite6844 10 months ago
@@codingcrashcourses8533 Curious why you think answers are bad beyond that size and how Gemini could solve that in the future. Thanks for the great video!
@codingcrashcourses8533 10 months ago
@@xiscosmite6844 I don't trust Gemini after I tried it myself :)
@dswithanand 8 months ago
How do I integrate LangChain chat message history with FastAPI?
@codingcrashcourses8533 8 months ago
You don't. You normally want your API to be stateless.
@dswithanand 8 months ago
@@codingcrashcourses8533 I understand that. I am working on an SQL bot and using FastAPI along with it, but the bot is not able to retrieve the conversation context. Can you help with that? LangChain has the ChatMessageHistory class, which can be used for this.