Build a PDF Document Question Answering LLM System With LangChain, Cassandra, Astra DB, Vector Database

58,064 views

Krish Naik

I'll show you how to build a powerful PDF question answering application. We're combining the forces of Apache Cassandra, DataStax's Astra DB as a vector database, LangChain, Streamlit, Python, and the incredible GPT-4 to create a game-changing solution.
▶ Link to the code: colab.research.google.com/dri...
▶ To create the Cassandra DB, sign up on the DataStax platform: https://dtsx.io/4adwXRK Use your business email address and get $1,000 to $3,000 in free credits and consulting with your subscription.
Link in bio: dtsx.io/4adwXRK
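
The flow described above (split the PDF text into chunks, embed them, store them, retrieve the closest chunks for a question, then hand them to the LLM) can be sketched without any external service. This is only a toy illustration, not the code from the video: word-count vectors stand in for OpenAI embeddings, a plain Python list stands in for Astra DB, and the names `embed`, `cosine`, `top_k`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question (the list plays the vector store)."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM, restricted to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    chunks = [
        "Cassandra is a distributed database.",
        "Streamlit builds web apps in Python.",
        "The budget grew by ten percent.",
    ]
    question = "What is Cassandra?"
    print(build_prompt(question, top_k(question, chunks, k=1)))
```

In a real setup, the embedding model and the vector store would replace `embed` and the list, while the retrieve-then-prompt shape stays the same.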

Comments: 71
@krishnaik06 7 months ago
Today marks the 5th anniversary of my uploading videos on Data Science and AI to my YouTube channel. I have definitely learnt a lot, and I hope I was able to add value to others' lives. Looking to continue this work till jab tak hai jaan (as long as I live) 😀 Let's keep the likes target at 1000 on this occasion :)😊
@hari-415 7 months ago
you deserve a lot more sir❤
@squadgang1678 7 months ago
Congratulations 🎉 appreciated sir
@AshuChhabra3 7 months ago
Many congratulations Krish...keep up the good work 👏👏
@madhugarg7499 2 months ago
Getting error: 'OpenAI' object has no attribute 'embed_query' after executing "astra_vector_store = Cassandra(....". Can you please tell me how to fix this?
@krishnansankaranarayanan1969 2 months ago
It is helpful for my capstone. Thanks
@CW-dt2xk 3 days ago
Next Level, Thank You!
@infobarbosa 4 months ago
Awesome content! Congrats!!!
@krishj8011 19 days ago
Awesome Tutorial...
@2771237 7 months ago
Thanks a lot for the video, dear :)
@vaishnaviumesh8590 7 months ago
Hi Krish, love all your videos..Can you make a video on whisper and Distil whisper and how to use them in a project please?
@datadumpp 2 months ago
Just amazing
@samuelnikhade5612 7 months ago
Sir, can you please create a video with a FAISS index? Thank you for this video; this is amazing information.
@CrusadeVoyager 7 months ago
Nice Krish, Your Next Vid will be based on Indian AI Model, OLA has released Krutrim - feb 24 ;) . Glory to Bharat ❤
@AI__Spectrum 7 months ago
Good work man
@gaurangabhuyan3760 7 months ago
What if I ask this model "tell me something about this PDF" or for a summary of it? How will it do the similarity search, since that requires information from all the chunks?
@izainonline 7 months ago
Sir, what if the data size is very large, say in TBs, when using Cassandra DB? Second, the OpenAI embedding is not free; what if we use a Hugging Face embedding?
@NoDoglapan 6 months ago
Hi there, can you let me know what changes I need to make to the code if I want to use multiple PDFs as my data? Thanks in advance!
@out-of-sight 7 months ago
Hi, can we use different embeddings for this project? I don't want to use OpenAI. I want to use an open-source one.
@dipankardandapat5495 5 months ago
Sir, if a PDF contains images and graphs, how do we store them, and how do we ask questions based on a graph or image?
@RamaChandran-fc3hp 7 months ago
Sir, is it similar to the Pinecone vector database?
@priyanshupandey3148 6 months ago
Does it handle images and tables as well?
@umeshkumarasamy6608 A month ago
At exactly 14:40, you imported something called Concatenate from the typing_extensions library (which I think is now deprecated), but I don't see the imported function/object used anywhere in the code. Can you explain why the import was done? Am I missing something here?
@sunilkumarpradhan.4376 6 months ago
I am facing "rate limit exceeded" issues even after adding a new OpenAI API key. How can I fix this? Also, does the PDF size matter here?
@user-gu5fz3vn7t 3 months ago
Is it OK to use your code for personal, educational, and/or commercial applications?
@rishiraj2548 7 months ago
🙏🙂👍💯
@umeshkumarasamy6608 A month ago
Also, I have another doubt. Why does the similarity search return the same results twice? Does that mean there are duplicate entries in the vector database? If so, how do you avoid such duplication?
@afannouni1837 5 months ago
Hi, what version of Python did you use?
@manujmehrotra2771 6 months ago
What software is being used here to draw and explain?
@NotesandPens-ro9wx 7 months ago
Sir, why do you concatenate raw_text with the content variable? I didn't get it.
@chanchalsahu5023 2 months ago
What is the purpose of choosing a provider while creating the DB?
@user-fo7vm9px8f 4 months ago
It gives me details from outside the PDF as well. How can I restrict it to the PDF only?
@BhuvanWebOsmo 4 months ago
Can we use Gemini embeddings for this project? (Because it's free.)
@Balajinicky 7 months ago
Hi Krish, what if the PDF has graphs, images, and scanned copies? How do we embed them and push them into the vector DB? That would be a full-fledged project idea. Also, why are you not using Milvus, which actually works well?
@namanagarwal6549 6 months ago
This is a major problem when extracting data from PDF or DOCX files.
@VinodKumar-ku7yi 7 months ago
Thanks, Krish, for sharing this video. Is it possible to query multiple PDFs? In our case we have a pool of resumes in PDF format, and HR wants to query them using an LLM and get the relevant CVs. Please suggest if you have any ideas.
@krishnaik06 7 months ago
Yes it is possible
@Pubba_ 7 months ago
@krishnaik06 Can you tell me what changes we need to make to query multiple documents?
@akashramdham7704 7 months ago
I'm also working on the same task, but so far I haven't found anything interesting.
@shraddhamishra5642 7 months ago
Yes, please tell me also.
@user-sz2mi4bk4l 7 months ago
Hey dude, can I get your Twitter or Telegram? I have a lot of doubts related to this field, and I guess you're the genius to answer them!
@madhugarg7499 2 months ago
Hi all. Is anyone else facing the error 'OpenAI' object has no attribute 'embed_query' after executing "astra_vector_store = Cassandra(...."? Please tell me how to fix this.
@aniruddhajoshi4055 A month ago
Can someone please help me?
1. In the last lecture we used Pinecone for embeddings, and now Cassandra and all that. Why? What is the difference?
2. To read the PDF, why can't we use the same approach:
    def read_doc(directory):
        file_loader = PyPDFDirectoryLoader(directory)
        documents = file_loader.load()
        return documents
Because this code also follows the same approach: read data -> divide into chunks -> embeddings.
@darshandodamani6682 6 months ago
I am learning Machine Learning and I have a bunch of PDFs. Can I use this tutorial to build PDF question answering? The PDFs also contain all my shortcut notes. I want to use the PDFs and get detailed information when I ask a question. Is it possible, sir?
@charanya5209 5 months ago
I am planning on doing it. Have you done it? If so, please guide me.
@darshandodamani6682 5 months ago
@charanya5209 I tried it but couldn't complete it. I don't think it can be done without the GPT free trial; we have to buy tokens.
@charanya5209 5 months ago
@darshandodamani6682 That's OK. Thank you.
@NerdNetArcade 6 months ago
Hi Sir, I am getting this error:
"InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown keyspace articles"
But I am passing None as the keyspace. Please help me.
    astra_vector_store = Cassandra(
        embedding=embedding,
        table_name="qa_mini_demo",
        session=None,
        keyspace=None,
    )
@mananmaini6968 5 months ago
I am getting the same issue. Any tips to resolve it?
@bablusingh-ij1sw 5 months ago
Can you create a video without using OpenAI?
@lalitjoshi6528 7 months ago
The error I encountered says the token limit is 4169, with 4925 tokens used in the prompt and 200 in the question. How can I fix that, or modify the code for larger text files of 2 MB?
@krishnaik06 7 months ago
You need to split the text into multiple document chunks and then send them.
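
A minimal sketch of the chunking suggested in this reply, assuming a simple character-based split with overlap. The function name `split_into_chunks` and the default sizes are illustrative assumptions, not values taken from the video, and the sizes are counted in characters rather than model tokens, so they would need tuning against the actual token limit.

```python
def split_into_chunks(text, chunk_size=800, overlap=200):
    """Split text into pieces of at most chunk_size characters.

    Consecutive pieces share `overlap` characters so that a sentence cut at
    a boundary still appears whole in one of the two neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij"
    print(split_into_chunks(sample, chunk_size=4, overlap=1))
```

Each chunk can then be embedded and stored separately, so only the few most relevant chunks, rather than the whole file, end up in the prompt.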
@liquid_pro2344 5 months ago
How can I find my OpenAI API key?
@ashoksamrat8486 5 months ago
I want to design an ATS system so that I can filter resumes based on a job description. Suppose there are 10,000 candidate resumes and I want to get the top 50 or 100 that are best suited to the job description. Input: 10,000 resumes in PDF format. Output: the top 50/100 resumes best suited to the job description. How can I achieve this using an LLM, with Streamlit for the UI?
@madhugarg7499 2 months ago
@krishnaik06 Please take some time to answer comments on your KZbin videos, at least for the recent and new-technology videos.
@kanagadavid931 4 months ago
Why so many "probably"s?
@bossmanish2879 18 days ago
😂😂😂
@shraddhamishra5642 7 months ago
Sir, can we use this for a folder of PDFs that includes at least 50 PDFs? How do we do that? Please tell me, sir.
@Balajinicky 6 months ago
Yes, you can put them in an S3 bucket, call them directly, and process them. It will work.
@squadgang1678 7 months ago
This is RAG right?
@krishnaik06 7 months ago
Yes
@HrisavBhowmick 7 months ago
Please do a video where we won't use any API key.
@krishnaik06 7 months ago
Use Llama 2; it's an open-source model.
@user-mj7xr5nf7b 7 months ago
In the above video, is the LLM responsible for interacting with Cassandra DB and getting the output for the desired query? Also, do we need to train the LLM on the PDF data, or does it just integrate with the DB and give us the desired output? Please also explain prompt templates and where I can use them. Thank you in advance.
@esotericwanderer6473 2 months ago
Great content. Just one suggestion: stop using the word "probably" unnecessarily in every single sentence; it is irritating and distracting. Otherwise great video, learnt a lot. Keep going. Thank you.
@Nikhil-dp2mw 7 months ago
yellow
@enough200 4 months ago
Can you stop saying "it will probably do this"? It will or it will not. Why probably, basicall
@nnajijuliet5024 3 months ago
You are m@d Create your channel and use what you want