Scanned PDF Langchain Hybrid Search

Views: 4,212

AI With Tarun

Days ago

Comments: 28

@ramareddymathsacademy 10 months ago

Great going. Keep up the good work. Let me introduce this to my students. It really benefits their career growth. Thanks

@chiragjain157 11 months ago

I tried this method, and it worked for the use case I was working on. Thank you.

@AIwithTarun 11 months ago

Glad it helped

@bydmatt4499 10 months ago

This is insane! Amazing video, man! When will you release the entire course? I can't wait to check it out!

@vladimirolezka3482 11 months ago

Great video. ❤ Bhai, could you please make a video using Qdrant as the vector store instead of ChromaDB? Thanks 🙏

@AIwithTarun 11 months ago

Noted.

@RashomonAI 11 months ago

Quality video as always

@AIwithTarun 11 months ago

Glad you enjoyed it

@int8float64 11 months ago

Good tutorial! I just have a small problem, please help: the LLM prints the whole template together with the answer. Also, the generated answer is not complete; it stops in the middle. Any help?

@AIwithTarun 11 months ago

Use model_kwargs and set max_new_tokens to 512 so that the answer is not cut off midway. As for the output including the entire context, it seems the prompt template handling has changed. Initially, following the template fixed this, but since the latest update it seems you can pass the prompt directly to the chain.
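A minimal sketch of the max_new_tokens advice, assuming the model is run through the Hugging Face transformers text-generation pipeline (the thread does not say which wrapper is used). The helper name `build_generator` and its `model_id` argument are hypothetical. `return_full_text=False` is an extra pipeline option not mentioned in the reply; it asks the pipeline to return only the newly generated text rather than the prompt plus the answer.

```python
# Generation settings: 512 new tokens, as suggested in the reply above.
# return_full_text=False is an assumption added here, not part of the reply.
GENERATION_KWARGS = {
    "max_new_tokens": 512,
    "return_full_text": False,
}


def build_generator(model_id):
    """Hypothetical helper: build a text-generation pipeline with the settings above."""
    # Imported lazily so this sketch can be read and loaded without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id, **GENERATION_KWARGS)
```

If the model is wrapped by a LangChain class instead, the same two settings would go into whatever keyword-argument dictionary that wrapper forwards to the model.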

@PsPappu-j3o 11 months ago

Thanks for the great video and the effort. Can you please help me print only the assistant's answer rather than the whole output?

@AIwithTarun 11 months ago

Usually the answer should not contain the context information if you follow the prompt template expected by the LLM. But if you did follow it correctly, use a regex or Python list operations to get the response after
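A rough sketch of the regex and list-splitting approach. The reply is cut off before it names the marker, so the marker is left as a parameter here; the `"### Answer:"` string in the usage lines is purely hypothetical and should be replaced by whatever separator your prompt template uses.

```python
import re


def answer_after_split(generated, marker):
    """List approach: split on the marker and keep the last piece."""
    return generated.split(marker)[-1].strip()


def answer_after_regex(generated, marker):
    """Regex approach: capture everything after the last occurrence of the marker."""
    pattern = r".*" + re.escape(marker) + r"(.*)"
    match = re.search(pattern, generated, flags=re.DOTALL)
    # Fall back to the full text when the marker is missing.
    return match.group(1).strip() if match else generated.strip()


# Hypothetical raw output: prompt echoed back, followed by the answer.
raw = "Context: some chunks\nQuestion: What is hybrid search?\n### Answer: It combines keyword and vector retrieval."
print(answer_after_split(raw, "### Answer:"))
print(answer_after_regex(raw, "### Answer:"))
```

Both functions return the full text unchanged (apart from surrounding whitespace) when the marker does not appear.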

@CryptoMaN_Rahul 8 months ago

underrated

@naveenpandey9016 11 months ago

Can we implement this for JSON that contains a large amount of data?

@AIwithTarun 11 months ago

Yes, you can. Refer to: python.langchain.com/docs/modules/data_connection/document_loaders/json
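To illustrate the idea without depending on the linked loader, here is a standard-library sketch that turns each JSON record into a block of "path: value" lines that could then be chunked and embedded. This is not the LangChain JSON loader; `flatten` and `record_to_text` are hypothetical helper names, and the sample data is made up.

```python
import json


def flatten(value, prefix=""):
    """Yield 'path: value' lines for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}{key}.")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}{index}.")
    else:
        yield f"{prefix.rstrip('.')}: {value}"


def record_to_text(record):
    """Join the flattened lines of one record into a single text block."""
    return "\n".join(flatten(record))


# Made-up sample data: one text block per top-level record.
records = json.loads('[{"title": "A", "tags": ["x", "y"]}, {"title": "B", "tags": []}]')
texts = [record_to_text(record) for record in records]
```

For very large files, splitting per record like this keeps each text small enough to chunk and index separately.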

@naveenpandey9016 11 months ago

@AIwithTarun Which LLM is good for generating summaries from large data? Will Mistral perform well?

@Hellow_._ 11 months ago

I don't understand why we used a keyword retriever here, because we don't have the multiple documents that you mentioned in the slides (smartphone, VR, AI). Can't we use only a vector retriever here?

@AIwithTarun 11 months ago

The intention here is to show how to implement the Ensemble Retriever. Once we complete the RAG video series, we will have 2-3 end-to-end project implementations, and those will use multiple documents. I will mention this in the next video. Thanks for pointing this out, I really appreciate it.
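For readers wondering what an ensemble of a keyword retriever and a vector retriever does with two result lists, here is a pure-Python sketch of weighted reciprocal rank fusion, a common way to merge rankings. It is an illustration of the idea, not the library's code and not necessarily what the video uses; the function name, the document ids, the equal weights, and the constant `c=60` are all assumptions.

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns weight / (rank + c) from every list it appears in;
    documents are returned from highest to lowest total score.
    """
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank + c)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]
fused = weighted_rrf([keyword_hits, vector_hits], weights=[0.5, 0.5])
print(fused)
```

Documents that both retrievers return rise to the top, which is why the combination can help even when either retriever alone would miss something.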

@Hellow_._ 11 months ago

@AIwithTarun Thank you for replying. Yesterday an interviewer asked me a question: suppose we are building an application for a publishing house that has data in magazines, podcasts, and video format, and when a user asks a query, it should answer from those documents. So I guess a keyword retriever will work in this case, right?

@AIwithTarun 11 months ago

@Hellow_._ In this case you also need to check how you index the data (create embeddings for the chunks) and what metadata to include. With the help of the metadata, you can then use a keyword retriever. But again, I would prefer testing it with Hybrid Search.
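A toy sketch of the metadata idea for the publisher example: each chunk carries a `source_type` field, and a simple keyword match can be restricted to one source. The field name, the sample chunks, and the term-overlap scoring are all assumptions for illustration; a real setup would use a proper keyword retriever such as BM25.

```python
# Made-up chunks; "source_type" is a hypothetical metadata field.
CHUNKS = [
    {"text": "Hybrid search improves retrieval quality", "metadata": {"source_type": "magazine"}},
    {"text": "Episode on retrieval and search engines", "metadata": {"source_type": "podcast"}},
    {"text": "Cooking pasta at home", "metadata": {"source_type": "video"}},
]


def keyword_search(chunks, query, source_type=None):
    """Rank chunks by the number of shared terms, optionally filtered by metadata."""
    terms = set(query.lower().split())
    scored = []
    for chunk in chunks:
        # Skip chunks from other sources when a filter is given.
        if source_type and chunk["metadata"].get("source_type") != source_type:
            continue
        overlap = len(terms & set(chunk["text"].lower().split()))
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]


all_hits = keyword_search(CHUNKS, "hybrid search")
podcast_hits = keyword_search(CHUNKS, "hybrid search", source_type="podcast")
```

The same metadata can also be attached to the embedded chunks, so the vector side of a hybrid search can be filtered in the same way.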

@Hellow_._ 11 months ago

@AIwithTarun Okay, thanks. Appreciated 🙏

@tirthbhatt3340 10 months ago

Hi Tarun, I have multiple JSON files and I want to create Q&A pairs from them. I also want to fine-tune with GPT-4. Could you please help me decide which approach and tools to choose for that?

@AIwithTarun 10 months ago

Use the TRL library along with open-source LLMs.

@nitinsharma4922 11 months ago

Sir, please make a video on a proper roadmap for ML and AI. If possible, start a program from scratch.

@AIwithTarun 11 months ago

That's a nice suggestion. If the channel hits 3K subscribers, I will start a free AI/ML video series from scratch, based on industry standards.

@scitechtalktv9742 10 months ago

What should I change in the code if my scanned PDF is in German? Surely the embedding model has to change, and perhaps something else? I also need a solution for a whole set of PDFs in English and German, with both scanned and normal digital PDFs among them. What should I do in that case?

@AIwithTarun 10 months ago

You need to check whether there is a bilingual embedding model for English + German. If not, use a multilingual one. You also need to update the system prompt. Unstructured-based loading helps with both scanned and normal PDF data.