Let's not split chunk-wise; it loses structure. Instead, structure each document into a fixed format: title, content, and tables (plus images, if present). Generate embeddings for all columns. When the user asks a query, decide from the scores whether it is table-based or content-based: compute BM25 and semantic-search scores, then combine them with reciprocal rank fusion. If the table column scores highest, the query is table-based. When passing data to the LLM, we pass the whole record: if the query matched the table column, take the top-scoring table and pass that table along with its corresponding columns, i.e. the title and content. This works really well and preserves the structure too. We can ask queries like "give me the content inside the title named 'example title'": it will match on title, so we pass the title plus its corresponding columns.
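A minimal, dependency-free sketch of the routing idea described above: fuse a BM25 ranking and a semantic ranking per column with reciprocal rank fusion (RRF), then route the query to the column whose best fused score is highest. The per-column rankings and doc ids below are hypothetical placeholders, not from the video.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids into one score per doc.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is the commonly used default).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); best rank contributes most.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return scores

# Hypothetical per-column rankings for one query (best-first doc ids).
bm25_by_column = {
    "title":   ["d2", "d1", "d3"],
    "content": ["d1", "d3", "d2"],
    "table":   ["d3", "d2", "d1"],
}
semantic_by_column = {
    "title":   ["d2", "d3", "d1"],
    "content": ["d2", "d1", "d3"],
    "table":   ["d1", "d3", "d2"],
}

# Fuse both signals per column; a column's routing score is its best fused doc.
fused = {
    col: rrf([bm25_by_column[col], semantic_by_column[col]])
    for col in bm25_by_column
}
routing = {col: max(s.values()) for col, s in fused.items()}
best_column = max(routing, key=routing.get)
```

With these placeholder rankings, "d2" tops both title lists, so the query routes to the title column; the pipeline would then pass that row's title, content, and table together to the LLM, as the comment suggests.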
@KumR · 3 months ago
Hey Sid, great one. Can you expand this to read not only PDF but also CSV, DOCX, and TXT?
@WiseCoder-rp2zn · 12 days ago
So if we change the documents, we have to run the vectorizer file again, right?
@ruznyma · 3 months ago
Great content ❤️ Could you please make videos on LangGraph and AI agents? Thanks for your valuable tutorials 👍
@Siddhardhan · 3 months ago
Sure
@SUDA-u2r · 3 months ago
I have implemented a conversational RAG chain following the LangChain documentation. I don't understand how to implement reciprocal rank fusion or reranking when using the history-aware retriever.
@KumR · 3 months ago
Thanks Sid. What if we want to upload docs and not just point to the data folder? And what if the data lives across multiple files: PDF, CSV, DOCX, TXT, etc.?
@HarshitaAmbre · 11 days ago
How do we proceed with multiple CSV files?
@gauravwankhede9263 · 3 months ago
Can the chunk boundary be set by splitting on " ", so that chunks are split paragraph-wise? Then the structure won't get messed up.
@yazanrisheh5127 · 3 months ago
Yes, using RecursiveCharacterTextSplitter.
@gauravwankhede9263 · 3 months ago
@@yazanrisheh5127 Thanks for the comment. I did this and it works successfully.
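The paragraph-wise chunking discussed in this thread can be sketched without the LangChain dependency. LangChain's RecursiveCharacterTextSplitter applies the same idea with the separator list `["\n\n", "\n", " ", ""]`; the stand-alone version below splits on blank lines and packs whole paragraphs into chunks. Note that a single paragraph longer than `chunk_size` is kept whole here, whereas the real splitter would recurse into smaller separators.

```python
def split_paragraph_wise(text, chunk_size=200):
    """Pack whole paragraphs (blank-line separated) into chunks of
    at most chunk_size characters, never cutting inside a paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = current + "\n\n" + para if current else para
        if len(candidate) <= chunk_size:
            current = candidate          # paragraph still fits: merge it
        else:
            if current:
                chunks.append(current)   # flush the filled chunk
            current = para               # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```

Because chunk boundaries always coincide with paragraph boundaries, no sentence is ever split across two chunks, which is exactly the structural property the comment is after.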
@gamekhela · 2 months ago
How do we implement a cache mechanism here, using GPTCache, or LangChain's in-memory or SQLite cache?
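A dependency-free sketch of the caching idea asked about above: before invoking the LLM, look the normalized query up in a local store and reuse the stored answer on a hit. LangChain's InMemoryCache/SQLiteCache and GPTCache wrap this same pattern around the LLM call; `answer_query` below is a hypothetical stand-in for the real RAG chain, not code from the video.

```python
import hashlib

class QueryCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, query):
        # Normalize so trivially different phrasings share one cache key.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get_or_compute(self, query, compute):
        key = self._key(query)
        if key in self._store:
            self.hits += 1               # cache hit: skip the expensive call
        else:
            self._store[key] = compute(query)
        return self._store[key]

cache = QueryCache()
answer_query = lambda q: f"answer for: {q}"   # stand-in for the RAG chain
first = cache.get_or_compute("What is RRF?", answer_query)
second = cache.get_or_compute("  what is rrf? ", answer_query)  # hit: same key
```

An exact-match key like this only catches repeats and trivial rephrasings; GPTCache goes further by keying on embeddings, so semantically similar queries can also hit the cache.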
@KumR · 3 months ago
Can we use an agent based on the type of document?
@lnstagrarm · 3 months ago
Also, I don't suggest using LangChain. We can't configure it to our needs, and it needs so much machinery in the backend to run that it makes responses slower.
@KumR · 3 months ago
Will LlamaIndex help?
@lnstagrarm · 3 months ago
@@KumR It helps, but you will spend a lot of time if your data is in other formats. Suppose you parsed the data the way you want and want to feed it to LlamaIndex: you need to put it into its expected format first. There are many things like that; some LlamaIndex functions need Document objects directly and will parse them their own way. So I think you should write the code manually, though for some tasks you can use it. I don't suggest LangChain: you will struggle to write the code, as its code modularity is not good.
@KumR · 3 months ago
@@lnstagrarm Thanks for the perspective. It helps newbies in this area.
@junaidamin · 2 months ago
What do you suggest then?
@lnstagrarm · 2 months ago
@@junaidamin I suggest LlamaIndex. It's easy to connect with anything. LangChain always assumes OpenAI, so when it comes to using other models it's really hard to figure out how to do it. LlamaIndex has good documentation and can be configured easily to our needs.
@kavururajesh1760 · 2 months ago
Great content, but this doesn't perform well when we are comparing two documents and asking questions across them.
@PraveenYadavgaming6397 · 3 months ago
magic.from_file(file_path, mime=True) raises "AttributeError: module 'magic' has no attribute 'from_file'", even after pip install python-magic-bin.