Build a PDF Document Question Answering LLM System With Langchain, Cassandra, Astra DB, Vector Database

Views: 63,277

Days ago

Comments: 75

@krishnaik06 9 months ago

Today marks the 5th anniversary of my uploading videos on Data Science and AI to my YouTube channel. I have definitely learnt a lot, and I hope I was able to add value to others' lives. I look forward to continuing this work for as long as I live 😀 Let's set the likes target to 1000 on this occasion :)😊

@hari-415 9 months ago

You deserve a lot more, sir ❤

@squadgang1678 9 months ago

Congratulations 🎉 Appreciated, sir.

@AshuChhabra3 9 months ago

Many congratulations, Krish... keep up the good work 👏👏

@madhugarg7499 5 months ago

I am getting the error 'OpenAI' object has no attribute 'embed_query' after executing "astra_vector_store = Cassandra(....". Can you please tell me how to fix this?

@CrusadeVoyager 9 months ago

Nice, Krish. Your next video will be based on an Indian AI model; Ola has released Krutrim - Feb 24 ;) Glory to Bharat ❤

@out-of-sight 9 months ago

Hi, can we use different embeddings for this project? I don't want to use OpenAI. I want to use an open-source one.

@gauravpatil8351 9 months ago

Use Hugging Face embeddings.

@yashkumar2716 29 days ago

I know I am replying a little late, but if you still haven't found one, go for GoogleGenerativeAIEmbeddings.

@samuelnikhade5612 9 months ago

Sir, can you please create a video on the FAISS index? Thank you for this video; this is amazing information.

@vaishnaviumesh8590 9 months ago

Hi Krish, love all your videos. Can you please make a video on Whisper and Distil-Whisper and how to use them in a project?

@CW-dt2xk 2 months ago

Next Level, Thank You!

@krishnansankaranarayanan1969 5 months ago

It is helpful for my capstone. Thanks

@umeshkumarasamy6608 4 months ago

I also have another doubt. Why are the similarity search results returning the same results twice? Does that mean there are duplicate entries in the vector database? If so, how do you avoid such duplication?
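
One possible cause, assuming the cell that adds texts to the vector store was run more than once, is that the same chunks were stored several times. A minimal sketch of dropping exact duplicates before insertion; `dedupe_chunks` is a hypothetical helper, not part of LangChain or of the code in the video:

```python
import hashlib


def dedupe_chunks(chunks):
    """Return chunks with exact duplicates removed, keeping first-seen order.

    Two chunks count as duplicates when they match after collapsing
    runs of whitespace. (Hypothetical helper for illustration only.)
    """
    seen = set()
    unique = []
    for chunk in chunks:
        normalised = " ".join(chunk.split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

This only guards against duplicates within one batch. Rows already stored by an earlier run would still need to be cleared, for example by recreating the table.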

@madhugarg7499 5 months ago

Hi all. Is anyone else facing the error 'OpenAI' object has no attribute 'embed_query' after executing "astra_vector_store = Cassandra(...."? Please tell me how to fix this.
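
A likely cause, assuming the notebook follows the usual LangChain pattern, is that the `embedding=` argument received the LLM object (`OpenAI(...)`) instead of an embeddings object (`OpenAIEmbeddings(...)`). The toy sketch below reproduces the same kind of error without any external library; `FakeLLM`, `FakeEmbeddings`, and `ToyVectorStore` are hypothetical stand-ins, not real LangChain classes:

```python
class FakeLLM:
    """Stand-in for a text-generation client: it has no embed_query method."""

    def invoke(self, prompt):
        return "answer to: " + prompt


class FakeEmbeddings:
    """Stand-in for an embeddings client: it turns text into a vector."""

    def embed_query(self, text):
        # Toy 2-dimensional "embedding", for illustration only.
        return [float(len(text)), float(text.count(" "))]


class ToyVectorStore:
    """Stand-in for a vector store that needs an embeddings object."""

    def __init__(self, embedding):
        self.embedding = embedding
        # Probing the vector size fails if the object cannot embed text.
        self.dimension = len(self.embedding.embed_query("probe"))


store = ToyVectorStore(embedding=FakeEmbeddings())  # works

try:
    ToyVectorStore(embedding=FakeLLM())  # wrong kind of object
except AttributeError as err:
    error_message = str(err)  # mentions the missing 'embed_query'
```

If that is what happened, passing the embeddings object rather than the LLM to `Cassandra(...)` should resolve it.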

@dipankardandapat5495 7 months ago

Sir, if a PDF contains images and graphs, how do we store them, and how do we ask questions based on a graph or image?

@Balajinicky 9 months ago

Hi Krish, what if the PDF has graphs, images, and scanned copies? How do we embed them and push them into the vector DB? It would be a full-fledged project idea. Also, why are you not using Milvus, which actually works well?

@namanagarwal6549 9 months ago

This is a major problem when extracting data from PDF or DOCX files.

@gaurangabhuyan3760 9 months ago

What if I ask this model "tell me something about this PDF" or ask for a summary of it? How will it do the similarity search, given that the answer requires information from all the chunks?

@aaryaparikh6447 1 month ago

I am dealing with the same problem. If you get an answer for this, please help me.

@mahendrareddy4573 2 hours ago

It's possible through RAG
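
Similarity search only returns the few closest chunks, so a whole-document summary usually needs a separate pass over every chunk. A minimal map-reduce style sketch, assuming `summarize` is a caller-supplied function that wraps whichever LLM is in use (a hypothetical name; nothing here comes from the video):

```python
from typing import Callable, List


def summarize_document(
    chunks: List[str],
    summarize: Callable[[str], str],
    batch_size: int = 4,
) -> str:
    """Summarise each chunk, then merge summaries in batches until one is left."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if not chunks:
        return ""
    # Map step: one summary per chunk.
    summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: merge groups of summaries until a single one remains.
    while len(summaries) > 1:
        summaries = [
            summarize("\n".join(summaries[i:i + batch_size]))
            for i in range(0, len(summaries), batch_size)
        ]
    return summaries[0]
```

Each call to `summarize` sees only one chunk or one small batch of summaries, which keeps every request within the model's context limit.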

@chanchalsahu5023 4 months ago

What is the purpose of specifying a provider when creating the DB?

@krishj8011 3 months ago

Awesome Tutorial...

@umeshkumarasamy6608 4 months ago

At exactly 14:40 you imported something called Concatenate from the typing_extensions library (which I think is now deprecated), but I don't see the imported object used anywhere in the code. Can you explain why the import was done? Am I missing something here?

@darshandodamani6682 8 months ago

I am learning machine learning and I have a bunch of PDFs. Can I use this tutorial to build PDF question answering? The PDFs contain all my shortcut notes. I want to use the PDFs and get detailed information when I ask a question. Is that possible, sir?

@charanya5209 8 months ago

I am planning on doing it. Have you finished it? If so, please guide me.

@darshandodamani6682 8 months ago

@charanya5209 I tried it but couldn't complete it fully. I don't think it can be done with just the GPT free trial; we have to buy tokens.

@charanya5209 8 months ago

@darshandodamani6682 That's OK. Thank you.

@RamaChandran-fc3hp 9 months ago

Sir, is it similar to the Pinecone vector database?

@infobarbosa 6 months ago

Awesome content! Congrats!!!

@CelestiaSecurity 7 months ago

It also gives me details from outside the PDF. How can I restrict it to the PDF only?
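
One common way to reduce answers drawn from outside the document is to instruct the model to use only the retrieved chunks. A minimal sketch with plain string formatting; the template wording and the name `build_prompt` are assumptions for illustration, not the prompt used in the video:

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, reply exactly: I don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question, retrieved_chunks):
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

The resulting string is what gets sent to the LLM. An instruction like this lowers the chance of outside knowledge leaking in but does not guarantee it.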

@aniruddhajoshi4055 4 months ago

Can someone please help me?
1. In the last lecture we used Pinecone for embeddings, and now Cassandra and all that. Why? What is the difference?
2. To read the PDFs, why can't we use the same approach:
    def read_doc(directory):
        file_loader = PyPDFDirectoryLoader(directory)
        documents = file_loader.load()
        return documents
This code also follows the same approach: read data -> divide into chunks -> embeddings.

@sunilkumarpradhan.4376 9 months ago

I am facing a "rate limit exceeded" error even after using a new OpenAI API key. How can I fix this? Also, does the PDF size matter here?

@moussaouikhouloud 2 months ago

Did you find a solution?

@machinesmarts 6 months ago

Is it OK to use your code for personal, educational, and/or commercial applications?

@NerdNetArcade 9 months ago

Hi Sir, I am getting this error: "InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown keyspace articles"", but I am only passing None as the keyspace. Please help me.
    astra_vector_store = Cassandra(
        embedding=embedding,
        table_name="qa_mini_demo",
        session=None,
        keyspace=None,
    )

@mananmaini6968 7 months ago

I am getting the same issue. Any tips to resolve it?

@BhuvanWebOsmo 6 months ago

Can we use Gemini embeddings for this project? (Because they're free.)

@datadumpp 5 months ago

Just amazing

@izainonline 9 months ago

Sir, what if the data size is very large, say in TBs, when using Cassandra DB? Second, OpenAI embeddings are not free; what if we use Hugging Face embeddings?

@manujmehrotra2771 9 months ago

What software is being used here to draw and explain?

@NoDoglapan 9 months ago

Hi there, can you let me know what changes I need to make to the code if I want to use multiple PDFs as my data? Thanks in advance!
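
A minimal sketch of looping over a folder of PDFs, assuming `extract_text` is a caller-supplied function that reads one PDF and returns its text (for instance a wrapper around whichever PDF reader the notebook already uses). Both `load_pdf_folder` and `extract_text` are hypothetical names:

```python
from pathlib import Path


def load_pdf_folder(folder, extract_text):
    """Return one record per PDF in `folder`, sorted by file name.

    Each record keeps the file name next to the text so an answer
    can later be traced back to the PDF it came from.
    """
    documents = []
    for path in sorted(Path(folder).glob("*.pdf")):
        documents.append({"source": path.name, "text": extract_text(path)})
    return documents
```

Each document's text can then be split into chunks and added to the same vector store, ideally with the `source` value stored as metadata.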

@kanagadavid931 7 months ago

Why so many "probably"s?

@bossmanish2879 3 months ago

😂😂😂

@priyanshupandey3148 8 months ago

Does it handle images and tables as well?

@2771237 9 months ago

Thanks a lot for the video, dear :)

@NotesandPens-ro9wx 9 months ago

Sir, why do you concatenate raw_text with the content variable? I didn't get it.

@bablusingh-ij1sw 8 months ago

Can you create a video without using OpenAI?

@ashoksamrat8486 8 months ago

I want to design an ATS system to filter resumes based on a job description. Suppose there are 10,000 candidate resumes and I want the top 50 or 100 that are best suited to the job description. Input: 10,000 resumes in PDF format. Output: the top 50/100 resumes best suited to the job description. How can I achieve this using an LLM, with Streamlit for the UI?
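
One way to approach the ranking step, assuming each resume and the job description have already been turned into embedding vectors by some embedding model, is to score every resume by cosine similarity and keep the best k. `top_k_resumes` is a hypothetical helper, and the vectors in the usage lines are toy values:

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_resumes(job_vector, resume_vectors, k=50):
    """Return the ids of the k resumes most similar to the job description.

    resume_vectors maps a resume id (e.g. a file name) to its embedding.
    """
    scored = (
        (cosine_similarity(job_vector, vector), resume_id)
        for resume_id, vector in resume_vectors.items()
    )
    return [resume_id for _, resume_id in heapq.nlargest(k, scored)]


# Toy usage with 2-dimensional vectors.
ranked = top_k_resumes([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2)
```

For 10,000 resumes this brute-force scan is fast enough. A vector database does the same job at larger scale, and an LLM could then re-rank or explain the shortlisted resumes.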

@afannouni1837 7 months ago

Hi, what version of Python did you use?

@lalitjoshi6528 9 months ago

The error I encountered says the token limit is 4169 and that I am using 4925 tokens in the prompt and 200 in the question. How do I fix that, or how do I modify the code for larger text files of 2 MB?

@krishnaik06 9 months ago

You need to split the text into multiple document chunks and then send them.
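
The chunking idea can be sketched in plain Python as follows. This is a character-based splitter with overlap; `split_text` is a hypothetical helper, and the default sizes are illustrative assumptions rather than the values used in the video:

```python
def split_text(text, chunk_size=800, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence
    cut at a boundary still appears whole in one of them.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Sizes here are in characters, not tokens, so `chunk_size` should be chosen with enough margin that the retrieved chunks plus the question stay under the model's token limit.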

@shraddhamishra5642 9 months ago

Sir, can we use this for a folder that contains at least 50 PDFs? How do we do that? Please tell us, sir.

@Balajinicky 9 months ago

Yes, you can put them in an S3 bucket, read them directly, and process them. It will work.

@AI__Spectrum 9 months ago

Good work man

@liquid_pro2344 8 months ago

How can I find my OPENAI_API_KEY?

@VinodKumar-ku7yi 9 months ago

Thanks, Krish, for sharing this video. Is it possible to query multiple PDFs? In our case we have a pool of resumes in PDF format, and HR wants to query them using an LLM to get the relevant CVs. Please suggest if you have any ideas.

@krishnaik06 9 months ago

Yes, it is possible.

@Pubba_ 9 months ago

@krishnaik06 Can you tell me what changes we need to make to query multiple documents?

@akashramdham7704 9 months ago

I'm also working on the same task, but so far I haven't found anything useful.

@shraddhamishra5642 9 months ago

Yes, please tell me also.

@Mariooo-n1r 9 months ago

Hey dude, can I get your Twitter or Telegram? I have a lot of doubts related to this field, and I guess you're the genius to answer them!

@squadgang1678 9 months ago

This is RAG, right?

@krishnaik06 9 months ago

Yes

@esotericwanderer6473 5 months ago

Great content. Just one suggestion: stop using the word "probably" unnecessarily in every single sentence; it is irritating and distracting. Otherwise a great video, and I learnt a lot. Keep going. Thank you.

@madhugarg7499 5 months ago

@krishnaik06 - Please take some time to answer the comments on your YouTube videos, at least for the recent videos on new technology.

@HrisavBhowmick 9 months ago

Please do a video where we don't use any API key.

@krishnaik06 9 months ago

Use Llama 2; it's an open-source model.

@AvaniPatel-s5t 9 months ago

In the video above, is the LLM responsible for interacting with the Cassandra DB and getting the output for the query? Also, do we need to train the LLM on the PDF data, or does it just integrate with the DB and give us the output? Please also explain prompt templates and where I can use them. Thank you in advance.