Scanned PDF Langchain Hybrid Search

  Рет қаралды 4,212

AI With Tarun

AI With Tarun

Күн бұрын

Пікірлер: 28
@ramareddymathsacademy
@ramareddymathsacademy 10 ай бұрын
Great Going. Keep up good work. Let me introduce to my students. It's really benefits their career growth. Thanks
@chiragjain157
@chiragjain157 11 ай бұрын
I tried this method , It worked for use case I was working. Thank You
@AIwithTarun
@AIwithTarun 11 ай бұрын
Glad it helped
@bydmatt4499
@bydmatt4499 10 ай бұрын
Is insane!!! amazing video man! when will you get the entire course? I can't wait to check it out!
@vladimirolezka3482
@vladimirolezka3482 11 ай бұрын
Great video.❤ Bhai, could you please make a video using Qdrant as vectorstore instead of ChromaDB. Thanks 🙏
@AIwithTarun
@AIwithTarun 11 ай бұрын
Noted.
@RashomonAI
@RashomonAI 11 ай бұрын
Quality video as always
@AIwithTarun
@AIwithTarun 11 ай бұрын
Glad you enjoyed it
@int8float64
@int8float64 11 ай бұрын
good tutorial! I just have a small problem plz help the llm prints generates the whole template as well as the answer together. also the answer generated is not complete, it stops at the middle. Any help?
@AIwithTarun
@AIwithTarun 11 ай бұрын
Use model_kwargs and define max_new_tokens to be 512 to avoid answer generation in middle. Also regarding the answer printing the entire context, it's seems there is kind of changes in prompt template. Initially following template fixes this. But from the latest update it seems you can directly pass the prompt to the chain.
@PsPappu-j3o
@PsPappu-j3o 11 ай бұрын
thanks for great video and efforts , can you please help me how to print only assistant answers rather than whole information in the output.
@AIwithTarun
@AIwithTarun 11 ай бұрын
Usually answer should not contain the context information if you follow the prompt template as per the LLM. But if you followed it in right way, use regex Or Python List to get response after
@CryptoMaN_Rahul
@CryptoMaN_Rahul 8 ай бұрын
underrated
@naveenpandey9016
@naveenpandey9016 11 ай бұрын
can we implement this for JSON which contains large amount of data
@AIwithTarun
@AIwithTarun 11 ай бұрын
Yes you can. Refer: python.langchain.com/docs/modules/data_connection/document_loaders/json
@naveenpandey9016
@naveenpandey9016 11 ай бұрын
​@@AIwithTarunwhich LLM model is good for generating summaries from large data. Will mistral perform well?
@Hellow_._
@Hellow_._ 11 ай бұрын
I dont understand why we have used keyword retriever here, bcz we dont have multiple documents here that you mentioned in the slides i.e (smartphone, VR, AI). cant we use only vector retriever here?
@AIwithTarun
@AIwithTarun 11 ай бұрын
The intention here is to show the implementation of Ensemble Retriever, once we complete the RAG video series, we have 2-3 end-to-end project implementation. During that time we will have multiple documents. I guess I will mention this in next video, thanks for pointing this concern, really appreciate it.
@Hellow_._
@Hellow_._ 11 ай бұрын
@@AIwithTarun thank you for replying. Yesterday Interview asked me one Question - suppose we are building application for publisher house and they have data in magazines, podcasts and in video format. And if user ask query then it should give answer from the documents. So i guess in this keyword retriever will work, right?.
@AIwithTarun
@AIwithTarun 11 ай бұрын
@@Hellow_._ In this case you also need to check how you index the data(create embeddings for the chunks) and what metadata to include. With the help of metadata, you can then use a keyword retriever. But again, I would prefer testing it with Hybrid Search.
@Hellow_._
@Hellow_._ 11 ай бұрын
@@AIwithTarun okay. Thanks. Appreciated 🙏
@tirthbhatt3340
@tirthbhatt3340 10 ай бұрын
Hi Tarun, i have multiple json files and i want to create q and a out of it also want to fine tune it with gpt4 could you please help me which approach and tools i chose for that?
@AIwithTarun
@AIwithTarun 10 ай бұрын
Use TRL library along with Open Source LLMs
@nitinsharma4922
@nitinsharma4922 11 ай бұрын
Sir please make a video on proper roadmap for ML and AI. If possible then start a program from scratch.
@AIwithTarun
@AIwithTarun 11 ай бұрын
That's nice suggestion. If channel hits 3K subscribers I will start AI/ML video series from scratch for free based on industry standards
@scitechtalktv9742
@scitechtalktv9742 10 ай бұрын
What to change in the code if my Scanned pdf is in the German language? Certainly the embedding model has to change? And perhaps something else ? Also I need a solution for when I have a whole set of PDF’s in English and German, and also Scanned and normal digital PDF’s amongst them. What to do in that case?
@AIwithTarun
@AIwithTarun 10 ай бұрын
You need to check if there is any Bilingual embedding model for English + German. If not use any Multilingual. Also you need to update the System Prompt. Unstructured based loading helps with both scanned and normal pdf data
Faster response with Langchain and Streamlit
46:17
AI With Tarun
Рет қаралды 3,5 М.
Каха и дочка
00:28
К-Media
Рет қаралды 3,4 МЛН
Мен атып көрмегенмін ! | Qalam | 5 серия
25:41
1% vs 100% #beatbox #tiktok
01:10
BeatboxJCOP
Рет қаралды 67 МЛН
Beat Ronaldo, Win $1,000,000
22:45
MrBeast
Рет қаралды 158 МЛН
Better RAG: Hybrid Search in Chat with Documents | BM25 and Ensemble
16:08
Prompt Engineering
Рет қаралды 23 М.
Redis as a Vector Database Explained
6:35
Redis
Рет қаралды 4,1 М.
Marker: This Open-Source Tool will make your PDFs LLM Ready
14:11
Prompt Engineering
Рет қаралды 63 М.
Agentic Framework LangGraph explained in 8 minutes | Beginners Guide
8:04
Local GraphRAG with LLaMa 3.1 - LangChain, Ollama & Neo4j
15:01
Coding Crash Courses
Рет қаралды 36 М.
Anthropic MCP with Ollama, No Claude? Watch This!
29:55
Chris Hay
Рет қаралды 16 М.
Python RAG Tutorial (with Local LLMs): AI For Your PDFs
21:33
pixegami
Рет қаралды 339 М.
Каха и дочка
00:28
К-Media
Рет қаралды 3,4 МЛН