BERT Question Answering System on PDF files using Python

43,648 views

AIEngineering

#datascience #machinelearning #deeplearning
CDQA is an end-to-end closed-domain question answering system built on top of the HuggingFace transformers library. In this video we will use a pretrained BERT model on documents to analyze and search them through a human-like dialogue system.

Comments: 92
@swagatmishra9350 3 years ago
Thank you so much, sir, for the amazing tutorial. It really helped me a lot in completing one of my projects. Also, a note for those who are trying to create a dataframe from a CSV file instead of the PDFs, as in my case, where I had a CSV file with a list of tweets and had to build a Q&A system on it: one column must be named 'paragraphs', and each row of that column MUST BE A LIST; only then will it work. The other column has to be named 'title'.
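The column layout described in the comment above can be sketched as follows. This is a minimal illustration using only the standard library, not the commenter's actual code: the input column names `tweet_id` and `text` and the helper `csv_to_cdqa_records` are hypothetical, and only the output keys `title` and `paragraphs` (a list per row) come from the comment.

```python
import csv
import io


def csv_to_cdqa_records(csv_text, title_col, text_col):
    """Turn CSV rows into records with a 'title' and a list-valued 'paragraphs'."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "title": row[title_col],
            # Each 'paragraphs' cell must be a list, even for a single text.
            "paragraphs": [row[text_col]],
        })
    return records


# Hypothetical sample data standing in for a CSV file of tweets.
sample = "tweet_id,text\n1,BERT answers questions.\n2,cdQA reads documents.\n"
records = csv_to_cdqa_records(sample, title_col="tweet_id", text_col="text")
print(records[0])
```

Passing `records` to `pandas.DataFrame(records)` should then give a frame with the two columns the comment asks for, with a Python list in every `paragraphs` cell.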
@40leenan65 2 years ago
How did you do that?
@ashishbhatnagar9590 3 years ago
Excellent tutorial, sir. Very informative.
@vijaygharge2414 3 years ago
As always, awesome information. Kudos, sir, for all the effort!
@dabo6758 1 year ago
Hello, I am trying the code, but it seems that cdqa is obsolete now. Do you know anything about that? Thank you, great content.
@buhutimanakal 2 years ago
Hi, cdqa is not working now; I think it's outdated. Can you make the same video using haystack and elasticsearch?
@shivamlahane6870 3 years ago
This was so helpful. Can I have a Colab file link or Git repository link, please?
@adityanjsg99 1 year ago
Had to log in to like the video. Great info.
@rohitchitte5614 3 years ago
You earned a subscriber, buddy. Thanks a lot for this great help.
@abheetsingh7238 3 years ago
In the video you said that you fine-tuned the model by passing your own PDF. How can we customize the BERT model for fine-tuning?
@AIEngineeringLife 3 years ago
You can check on fine-tuning the final model itself here: github.com/cdqa-suite/cdQA
@ShaileshSarda-m6z 1 year ago
Not able to install cdqa. @AIEngineering, could you please assist?
@shravanraikar9039 2 years ago
pip install cqda doesn't work. ERROR: No matching distribution found for cqda. Can anything be done?
@abhimalyachowdhury7835 3 years ago
Thank you, sir.
@ayshafathima4162 3 years ago
Hi AIEngineering, your videos have always been my go-to for learning. Can I have access to the code for this video?
@AIEngineeringLife 3 years ago
It should be in this repo: github.com/srivatsan88/KZbinLI
@ANUMALASETTYPOORNAKUMAR2022-CS 18 days ago
Can you please tell us where we can get PDFs if I am building a question answering system for the agricultural sector?
@nikolacubric2061 3 years ago
Great video! I'm trying to follow along, but I'm having issues with !pip install cdqa. My output is:
ERROR: Could not find a version that satisfies the requirement torch==1.2.0 (from cdQA) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 1.7.1, 1.8.0, 1.8.1, 1.9.0)
ERROR: No matching distribution found for torch==1.2.0 (from cdQA)
@jorgerios4091 1 year ago
"cdqa" is not maintained anymore. There is a similar project named "haystack"; hopefully @AIEngineering will make a video about it.
@surajprusty6904 2 years ago
Why can't we install cdqa anymore?
@shikhasingh-c1m8y 7 months ago
I am unable to install cdqa. ERROR: Could not find a version that satisfies the requirement cdqa (from versions: none)
@brijboda 1 year ago
Very informative. Thanks a lot. Can you please explain where I can find the Colab notebook?
@awesomenpc2729 2 years ago
I have undertaken a similar project as my final year project, in which I search for answers in PDFs. Can you please guide me on how to approach this project using machine learning?
@belhadjkhadija506 2 years ago
Hello AwesomeNPC, I have the same project for my final year. Can you share yours with me so I can get an idea? Thanks.
@awesomenpc2729 2 years ago
@@belhadjkhadija506 Do you have any social account where we could get together? Discord?
@Aman-wi9qo 1 year ago
Hi, I have the same problem. Did you find any solution?
@Aman-wi9qo 1 year ago
@@belhadjkhadija506 Hi, did you get any information on this one?
@henkhbit5748 3 years ago
I like the video; it's very informative. How can I use other BERT implementations for other languages? Could you give an example of how to import a BERT model for a different language, for example BERT-NL? CDQA is not supported anymore; are there any alternatives?
@AIEngineeringLife 3 years ago
I will try to implement it with a different package and upload it. Meanwhile, you can check this one: github.com/deepset-ai/haystack
@henkhbit5748 3 years ago
@@AIEngineeringLife Thanks. I see many NLP examples, but they all focus only on English. Luckily we live in a multilingual world ;-). PS: I did find the link about haystack too. It seems a good alternative.
@srijitasaharoy2228 3 years ago
How can I retrain a model so that it can answer questions from a 100-page PDF for any application, such as answering questions from a washing machine manual?
@AIEngineeringLife 3 years ago
Have you tried feeding your PDF files into the pretrained model, as in the video, to see if it works? Ideally it should be able to find answers based on the context in the question. If you face any problem, let me know and I can look into it.
@srijitasaharoy2228 3 years ago
@@AIEngineeringLife It is working fine. Thank you so much for this video. Please make more videos on NLP.
@AIEngineeringLife 3 years ago
@@srijitasaharoy2228 I have a separate playlist on NLP in case you have not seen it: kzbin.info/www/bejne/Z5KlgYuNgcunfNU
@manishnoola7334 1 year ago
Checking this video now in 2023 and getting this error while installing cdqa:
ERROR: Could not find a version that satisfies the requirement cdqa (from versions: none)
ERROR: No matching distribution found for cdqa
Any solutions, please?
@valerysalov8208 3 years ago
I have some questions:
1) Do you think data science has a long-term future? I currently work as a data scientist in a small startup, and I see that many of the processes and much of the infrastructure needed for an AI-first company are not present yet. I got into this field because I have a passion for data science, not because of the hype around it.
2) Do you have people in your network who, with only a BTech degree, work for top product-based companies (not specifically FAANG)? I ask because I see a huge bias toward MS students in the ML/DS roles being offered.
@AIEngineeringLife 3 years ago
Data science has been in existence for quite long, and what we see now is everyone using the term "data science" to sell their product. Does it have a long-term future? Yes, it does.
Infrastructure requires a lot of funds, and startups typically go slow on it until they find some initial clients or strong investors. Even cloud can become expensive over time. The second aspect is that non-practitioners typically lack knowledge of the importance of infrastructure, but things might get better over time.
I do have friends with a BTech in this industry, and I hold the same degree myself. If someone is new to the industry, yes, companies go for MS recruitment. Sadly, that will still happen. For someone with a bachelor's, I would say work on a good and unique portfolio to get attention from employers.
@VLM234 3 years ago
@@AIEngineeringLife Thank you so much for such a nice suggestion. Sir, for freshers it's really very tough to get a job. We don't even get a chance to sit in an interview; resumes get rejected all the time. On LinkedIn, recruiters view resumes and even download them, but there is no further response. What do they expect from freshers? There is too much hype for DS, but the reality seems very strange. I have a kind request: in at least a few of your videos, please talk about opportunities in time series, DS, ML, DL, and NLP. Thank you again for such frequent and knowledgeable videos.
@bhavyakrishnabalasubramani8300 3 years ago
Hi, did you do any preprocessing of the PDF to eliminate the context, title page, etc.?
@satwikram2479 3 years ago
Also, show us how to do the same using haystack, as CDQA is not maintained.
@moreirat 1 year ago
Do you still have this notebook to share? I'm studying models, and it's a good use case to explore!
@divi133 3 years ago
Hi, I have quiz questions and their respective answers, around 100 Q&A pairs. Can I build a front end such that if the user types a question, they get the answer from the PDF? How can I implement this?
@AIEngineeringLife 3 years ago
You can try fitting the Q&A document instead of the PDF I used and see if it works. If it is in a decent format that relates questions and answers, then it should work. The other option is to annotate and fine-tune the BERT model. Have you tried Elasticsearch? I feel that would be an easier solution.
@divi133 3 years ago
@@AIEngineeringLife I'll try to fit my questions and answers instead of the PDF and let you know the results. No, I have not tried Elasticsearch. I'm not as expert as @AIEngineering. Also, for the application I asked about, is it better to use BERT, or can it be trained as a chatbot? I await your suggestion.
@AIEngineeringLife 3 years ago
BERT might be overkill for this unless you have, say, structured paragraphs from which you want to retrieve answers. Elasticsearch might be the simplest option. You can also find sentence similarity functions that find the closest matching question and retrieve its answer. As for a chatbot, I am not quite sure how you would train it for the questions you might have.
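The "closest matching question" idea from the reply above can be sketched with the standard library alone. This is an illustration under stated assumptions, not what the author used: `difflib` character-level similarity stands in for a real sentence-similarity model, and the quiz data, the `answer` helper, and the `cutoff` value are all hypothetical.

```python
import difflib

# Hypothetical quiz data; in practice this would hold the ~100 Q&A pairs.
QA_PAIRS = {
    "What is the capital of France?": "Paris",
    "Who wrote Hamlet?": "William Shakespeare",
    "What is the boiling point of water in Celsius?": "100",
}


def answer(user_question, qa_pairs, cutoff=0.5):
    """Return the answer of the stored question most similar to the query."""
    # Compare in lowercase, but keep a map back to the original keys.
    lowered = {q.lower(): q for q in qa_pairs}
    matches = difflib.get_close_matches(
        user_question.lower(), list(lowered), n=1, cutoff=cutoff
    )
    if not matches:
        return None  # nothing similar enough to any stored question
    return qa_pairs[lowered[matches[0]]]


print(answer("who wrote hamlet", QA_PAIRS))
```

String similarity only catches near-identical wording; paraphrased questions would likely need embedding-based similarity or a search engine, as the reply suggests.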
@myu-musicisuniversal2554 3 years ago
Hello sir. I am trying to run this code on an Ubuntu VM with Python 3.7. I got stuck on an error related to the pdf_converter line. The error is: Unexpected error: Unable to process file . Any suggestions, please?
@sadmanskhan7084 9 months ago
The cdqa package won't install. How do I install cdqa?
@mayureshkamble9946 1 year ago
Sir, I am not able to install cdqa in Colab. Can anyone help?
@skyleong8497 2 years ago
Hello, any idea why I couldn't install cdqa? It shows the error message below:
ERROR: Could not find a version that satisfies the requirement cdqa (from versions: none)
ERROR: No matching distribution found for cdqa
@belhadjkhadija506 2 years ago
Hello Sky Leong, I have the same problem. Did you find a solution?
@rohitchitte5614 3 years ago
Please attach the notebook link in the description.
@ayaelghaysh9778 3 years ago
Is it possible to use this model on Arabic PDF files?
@santoshladi6249 1 year ago
Not able to install cdqa.
@gohelvivek8481 3 years ago
Sir, how can we deploy this?
@AIEngineeringLife 3 years ago
You can save the fitted model using cdqa_pipeline.dump_reader. It can then be deployed using any web app, or any other way you deploy your regular models.
@AIEngineeringLife 3 years ago
Ignore the above; use the line below to deploy:
joblib.dump(cdqa_pipeline, './models/bert_qa_custom.joblib')
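The save-then-load round trip behind that `joblib.dump` call can be sketched as below. A plain dictionary stands in for the fitted `cdqa_pipeline`, the temporary directory replaces `./models/`, and the `pickle` fallback is an assumption added only so the sketch runs where joblib is not installed.

```python
import os
import pickle
import tempfile

try:
    from joblib import dump, load
except ImportError:
    # Fallback (an assumption of this sketch) so it runs without joblib.
    def dump(obj, path):
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)

    def load(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)

# Stand-in for the fitted cdqa_pipeline object from the video.
fitted_pipeline = {"name": "stand-in for a fitted QA pipeline"}

model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "bert_qa_custom.joblib")

dump(fitted_pipeline, model_path)  # done once, after fitting
restored = load(model_path)        # done at startup in the serving app
print(restored == fitted_pipeline)
```

The serving side would need the same library versions installed as the training side, since the file is a pickle of the original objects.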
@georgeye2759 3 years ago
Thank you very much for sharing this video. It is really helpful. I was wondering if you could help me with this: when I use the latest transformers package (4), I get the error "no module named 'transformers.tokenization_bert'". I resolved this by installing transformers 3.5. However, I now run into: AttributeError: module 'transformers.modeling_bert' has no attribute 'gelu'. Any thoughts on this?
@AIEngineeringLife 3 years ago
cdqa is a slightly older module. I am trying to make a new video as well, but for this one to work, look at its dependencies: they use transformers 2.1. See github.com/cdqa-suite/cdQA/blob/master/requirements.txt
@haidaralinehme1644 3 years ago
You need to work with Python 3.7.10 and install cdqa version 1.3.9 with torch version 1.2.0, and I think the transformers version needed is 1.2, not 3.5. Check which versions are running on Colab: type "!pip list" and it will list the libraries and their versions.
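A quick way to compare the installed packages against the versions quoted in this thread is sketched below. The version numbers are taken from the comments above and are unverified (the two replies even disagree on transformers, so 2.1 is used here only as an example); the helpers `matches` and `check_versions` are hypothetical names.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata as metadata

# Versions quoted in the comments above; treat them as unverified.
EXPECTED = {"cdqa": "1.3.9", "torch": "1.2.0", "transformers": "2.1"}


def matches(installed, wanted):
    """True if the installed version starts with the wanted components."""
    parts = wanted.split(".")
    base = installed.split("+")[0]  # drop local tags such as '+cu92'
    return base.split(".")[:len(parts)] == parts


def check_versions(expected):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or not matches(installed, wanted):
            problems[name] = (installed, wanted)
    return problems


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions(EXPECTED).items():
        print(f"{name}: installed={installed}, expected {wanted}")
```

An empty result means every listed package is present at the expected version; anything printed points at the package to reinstall.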
@akhila9413 2 years ago
@@AIEngineeringLife Can you tell me the newest and best approach to this problem?
@my_opiniondemocracy6584 2 years ago
Can you please share this notebook?
@MahendraSingh-je8ym 2 years ago
I am getting PDF conversion errors.
@shivangisrivastava9839 3 years ago
How do you find the directory of the model?
@AIEngineeringLife 3 years ago
Which one? I provided the directory when I downloaded the model.
@shivangisrivastava9839 3 years ago
@@AIEngineeringLife When you download the model in the code, you define dir='./models'. I am using the "bert-large-uncased-whole-word-masking-finetuned-squad" model, but how can I define the directory for this model?
@AIEngineeringLife 3 years ago
@@shivangisrivastava9839 The cdqa package supports only these 2 models: `bert-squad_1.1` and `distilbert-squad_1.1`.
@shivangisrivastava9839 3 years ago
@@AIEngineeringLife OK, thank you, I got it.
@rakeshmk281 1 year ago
Can I use this with a PDF of a book?
@srenisandoori1256 3 years ago
How can I contact you?
@AIEngineeringLife 3 years ago
Through LinkedIn messages. My LinkedIn profile is on my YouTube channel home page.
@michaelanderson5177 9 months ago
Hi, I am trying to install cdqa and I get an error saying it cannot be found.
@shravanacharya4376 2 years ago
I am getting an error while installing the cdQA package. Can you please tell me what can be done, or suggest an alternative?
@atulprajapati7073 2 years ago
Can you share the notebook, please?
@Milley-zo4ed 1 year ago
I don't want to use a pretrained model; I want to train my own model. How can I do that?
@nospamman4443 1 year ago
Buy a really expensive machine or time on a cloud box and run your own models... you'll likely never beat pre-trained models, though.
@abhishekprakash9803 2 years ago
Hey, this model doesn't generate long answers; it only answers factoid questions. How can I generate long answers?
@sanilpanicker3262 3 years ago
Great work, Sri 👍. Have you worked on question answering over database tables?
@AIEngineeringLife 3 years ago
I have tried Google AI's table-based QA just for a POC, but not to any great extent.
@sanilpanicker3262 3 years ago
@@AIEngineeringLife Thanks, it would be great if you could pay some attention to this area. I really appreciate your priceless contributions to academia!
@SudheerKumarKonanki 1 year ago
I am unable to install the cdqa package. Please help me.
@vaibhavparekh2951 10 months ago
Where can I get the code for this?
@user-or7ji5hv8y 3 years ago
Did anybody get this error after running df = pdf_converter(directory_path='./docs/')? ImportError: cannot import name '_is_url'
@12345aniluap 3 years ago
Did you restart the runtime after running pip install cdqa? It's probably the wrong pandas version.
@srinivaspradeep9190 3 years ago
Hi, I am getting the same error while running in Colab. How did you resolve this? Thanks.
@sangeethamr917 3 years ago
@@srinivaspradeep9190 Hi, I am getting the same error while running in Colab. How did you resolve this issue?
@flywithsufi 3 years ago
Hi sir, if I want to extract all the addresses mentioned in my unstructured PDFs, how do I do it? Regular expressions are not helping. Kindly share your thoughts; it would be very helpful.
@AIEngineeringLife 3 years ago
Can you not convert it to text and run named entity recognition to extract the addresses?
@flywithsufi 3 years ago
@@AIEngineeringLife Thanks a lot, sir, for replying. I can convert it to raw text using libraries like pdfminer, but I doubt there is any entity type that identifies full addresses. I don't need to extract only the city, LOC, or GPE entities available in spaCy; I need to extract full addresses. Kindly guide.
@srenisandoori1256 3 years ago
Did you work on the Natural Questions corpus?
@moApps 3 years ago
New to machine learning and BERT, but this video is amazing! Thank you. I ran it in my Colab (by copying and pasting each cell, since I did not know how to open the whole notebook directly in Colab :). Anyway, I get the following error when I run this cell. The strange thing is that if I comment out those two lines, I still get the correct result when I query the PDF at the bottom of the code. In any event, thank you for your video and code. Excellent!
pd.set_option('display.max_colwidth', None)  # by the way, I had changed -1 to None (same result)
df.head()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
      1
----> 2 pd.set_option('display.max_colwidth', None)
      3 df.head()
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/_config/config.py in inner(x)
    770         if type(x) != _type:
    771             msg = "Value must have type '{typ!s}'"
--> 772             raise ValueError(msg.format(typ=_type))
    773
    774     return inner
ValueError: Value must have type '<class 'int'>'
@AIEngineeringLife 3 years ago
Ideally None will not work, as it expects an int. This works for me: pd.set_option('display.max_colwidth', -1). That said, even if that line is not there, it will not have much impact, as it only affects how the output is displayed. You can run the code without those lines as well.
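Whether `None` or `-1` is the right value depends on the installed pandas release. The sketch below picks one from the version string; it rests on the assumption, worth checking against the pandas release notes, that pandas 1.0 is where `None` became accepted and `-1` was deprecated. The helper name `unlimited_colwidth` is hypothetical.

```python
def unlimited_colwidth(pandas_version):
    """Return the 'no truncation' value for display.max_colwidth."""
    major = int(pandas_version.split(".")[0])
    # Assumption: pandas 1.0+ accepts None; older releases only accept
    # an int, where -1 meant "no limit".
    return None if major >= 1 else -1


print(unlimited_colwidth("0.25.3"))
print(unlimited_colwidth("1.3.5"))
```

With pandas imported as `pd`, the call would then read `pd.set_option('display.max_colwidth', unlimited_colwidth(pd.__version__))`.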
@moApps 3 years ago
@@AIEngineeringLife THANK YOU! Wow, I did not realize I would get an answer so fast. I really appreciate it. Yes, commenting out those two lines seems fine. I am having a lot of fun playing with some PDFs of mine :)