How To Read PDF Files in Python using PyPDF2

Views: 77,140


Comments: 53

@himankshekher8645 4 years ago

Keep up the great work, Mukesh. I have learnt a lot from your channel and will continue to do so. You upload short videos, which is good for me since I get bored with hour-long videos and lose interest in the topic. Thanks for all your efforts.

@Mukeshotwani 4 years ago

Hi Himank, thank you so much. I am glad you liked the Python series.

@snicket87 2 years ago

Thank you! Helped my project from across the world! Greetings from Brazil!

@wendyesquivel8179 2 years ago

this is exactly what I was looking for! Thanks :D

@elizabethhanks1042 3 years ago

YESSSS thank you for your help!!

@tenzindorjee7689 2 years ago

Thank you so much bhaiya, really helped my project 🙏🙏🙏❤️

@Mukeshotwani 2 years ago

Thanks Dorjee

@KrishnaReddy-zz4yu 3 years ago

What is the difference between Jupyter Notebook and PyCharm? In Jupyter, we use Pandas to open PDF/CSV/TXT files. Which one is more efficient to learn and apply in real projects?

@emanuelalves593 1 year ago

Very useful! Thank you!

@nitinkumarshukla6967 1 year ago

Can we do the same thing with a PDF uploaded by a user?
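
On the upload question, here is a minimal sketch, assuming the web framework hands you the uploaded file as raw bytes. `PdfReader` in PyPDF2 2.x and later accepts a file-like object, so the bytes can be wrapped in `io.BytesIO` instead of being saved to disk first. The helper names `looks_like_pdf` and `text_from_upload` are hypothetical and not part of PyPDF2.

```python
import io


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF files normally start with the bytes %PDF-."""
    return data.startswith(b"%PDF-")


def text_from_upload(data: bytes) -> str:
    """Extract text from every page of an uploaded PDF held in memory."""
    if not looks_like_pdf(data):
        raise ValueError("upload does not look like a PDF")
    # Imported here so the check above works even without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(io.BytesIO(data))
    # extract_text() can return an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Rejecting non-PDF uploads before parsing keeps the error message clear for the user; the header check is only a first filter, not a full validation.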

@bigbro1231000 2 years ago

Hello, thank you for the informative video. I have a problem running the code: PyPDF gives me the error "progressbar not recognised". How do I solve it, please?

@verkar1965 3 years ago

Hi, is there a way to get separate lines instead of just one merged line? Thank you.

@KhalilYasser 4 years ago

Thanks a lot. Very useful.

@Mukeshotwani 4 years ago

You are welcome, Yasser!

@Kenneth-f5d 1 year ago

How do I get the title of the PDF's content, such as "A Simple PDF File"?
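
One possible approach to the title question, as a sketch that assumes PyPDF2 2.x or later. A PDF can carry a title in its metadata, but that value is often missing or different from the heading you see on the page, so this falls back to the first non-empty line of the first page. `first_nonempty_line` and `guess_title` are hypothetical helper names.

```python
def first_nonempty_line(text: str) -> str:
    """Return the first line of text that is not blank, or '' if none."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def guess_title(path: str) -> str:
    """Prefer the metadata title; fall back to the first visible line."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

    reader = PdfReader(path)
    meta = reader.metadata  # may be None when the PDF has no info dictionary
    if meta is not None and meta.title:
        return meta.title
    return first_nonempty_line(reader.pages[0].extract_text() or "")
```

The fallback is only a guess: text extraction order depends on how the PDF was produced, so the first extracted line is not guaranteed to be the visible heading.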

@SandeepKumar-px4kf 2 years ago

Please can you make a lecture that takes an Excel file as input and produces a .docx file as output for that Excel file?

@sidstarsiddhu9275 3 years ago

Sir, I am getting an AttributeError at line 3:
reader=PyPDF2.pdfFileReader(file)
AttributeError: module 'PyPDF2' has no attribute 'pdfFileReader'

@Mukeshotwani 3 years ago

Hi Sidstar, it seems you have not installed the library properly. Please try installing it again, and if you are working in PyCharm, add it in PyCharm too.

@Kmysiak1 3 years ago

It's PdfFileReader(), not pdfFileReader().
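
Python attribute names are case-sensitive, which is why `pdfFileReader` raises AttributeError while `PdfFileReader` does not. Newer PyPDF2 releases also renamed the class to `PdfReader`, as a later comment in this thread points out. The sketch below picks whichever name a module offers; `resolve_reader_class` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for an old and a new PyPDF2 module, used only for illustration.

```python
from types import SimpleNamespace


def resolve_reader_class(module):
    """Return the module's PDF reader class, preferring the newer name."""
    for name in ("PdfReader", "PdfFileReader"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise AttributeError("no PdfReader or PdfFileReader in this module")


# Stand-ins for an old and a new PyPDF2 module (assumption, for the demo only).
legacy_module = SimpleNamespace(PdfFileReader="legacy reader")
modern_module = SimpleNamespace(PdfReader="new reader", PdfFileReader="legacy reader")

print(resolve_reader_class(legacy_module))
print(resolve_reader_class(modern_module))
```

With the real library this would be used as `import PyPDF2` followed by `Reader = resolve_reader_class(PyPDF2)`.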

@KkdvPrasad 3 years ago

Mukesh, can we have similar logic in Eclipse using Java?

@en_coded 2 years ago

Can you do a simple write? I haven't seen a 'hello world' write; it's always read this PDF and write it to another PDF. What if I just want to start a PDF with strings and images?

@LeZinZin95 2 years ago

It worked, thank you

@Mukeshotwani 1 year ago

You're welcome!

@subhransupanda7052 4 years ago

This will work for a simple PDF file, but for a complex PDF with tables, multiple pages, images, and non-English characters, it would not work. Could you please show us how to read a complex PDF file?

@Mukeshotwani 4 years ago

Hi Subhranshu, the current approach will fetch the data from tables as well. Multiple pages are already covered in the video. When it comes to images, you can open the PDF in rb (read binary) mode, which will return binary data. For non-English characters you can change the encoding. I will try to make a video on this.

@subhransupanda7052 4 years ago

@Mukeshotwani Thanks a lot, Mukesh. Please make a video on this. You are a great inspiration for us; we are waiting for that video.

@NOTHING-j2h 4 years ago

How can we compare two PDFs where the contents of both are the same but positioned at different locations in each PDF? We can't compare line by line.

@Mukeshotwani 4 years ago

Hi Ankur, the above video is for when you need to validate a specific string or keyword in a PDF. When it comes to comparing two PDFs, there are many libraries in Python that can help you. Please explore them. One such library is pypi.org/project/diff-pdf-visually/
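
If only the text matters and not the layout, one simple alternative to a visual diff is to compare word counts instead of lines. This is a sketch under that assumption: it takes the text already extracted from each PDF (for example with `extract_text()`), and the function names are hypothetical. It does not detect reordered sentences, only added or missing words.

```python
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count case-folded words, ignoring line breaks and spacing."""
    return Counter(text.casefold().split())


def same_content(text_a: str, text_b: str) -> bool:
    """True when both texts contain the same words the same number of times."""
    return word_counts(text_a) == word_counts(text_b)


def content_diff(text_a: str, text_b: str):
    """Return (words only in A, words only in B) with their surplus counts."""
    a, b = word_counts(text_a), word_counts(text_b)
    return a - b, b - a
```

Because position and line breaks are discarded, two PDFs with the same fields laid out differently compare as equal, which is the case described in the question.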

@kamal3777 3 years ago

I tried to extract the text, but it just gives an empty string.

@Mukeshotwani 3 years ago

Please debug your code. I have a dedicated video on how to debug your code.

@ahmadrahmatulloyev162 1 year ago

very useful. Thanks

@nazishsultana5273 3 years ago

Sir, please help me. My code is not working; it gives the warning "Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]" 😢😢😢😢

@suhelmallick 1 year ago

A lot of it is old syntax. Mine uses the newer syntax:

import PyPDF2
file = open('Ansible+Roles.pdf', 'rb')
reader = PyPDF2.PdfReader(file)
print(len(reader.pages))
page1 = reader.pages[1]  # index 1 is the second page
# print(page1.extract_text())
pdfdata = page1.extract_text()
assert "PRINCE" in pdfdata
print("PRINCE" in pdfdata)

@srirajid 3 years ago

May I know where you are writing the code?

@Mukeshotwani 3 years ago

Hi Raji, I am using PyCharm: kzbin.info/aero/PL6flErFppaj3FhVG-3RGGQx-Mvj7DXrpX

@centrodoreforco-aulasderef7743 2 years ago

Perfect!

@ayushmittal2754 2 years ago

What should I do to extract all pages of a PDF? I have been searching for this solution for the last 24 hours.

@Mukeshotwani 2 years ago

Hi Ayush, you can run a loop which will iterate over all the pages one by one.

@wilianuhlmann5284 2 years ago

@Mukeshotwani Can you give an example, please?
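
A minimal sketch of such a loop, assuming PyPDF2 2.x or later, where `reader.pages` can be iterated directly. `collect_page_texts`, `read_all_pages`, and `FakePage` are hypothetical names; `FakePage` only stands in for a real page so the demo runs without a PDF file.

```python
def collect_page_texts(pages):
    """Call extract_text() on every page and return the texts in order."""
    return [page.extract_text() or "" for page in pages]


def read_all_pages(path: str):
    """Open a PDF and return one text string per page."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

    with open(path, "rb") as handle:
        reader = PdfReader(handle)
        return collect_page_texts(reader.pages)


class FakePage:
    """Hypothetical stand-in for a PyPDF2 page, used only for the demo."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo_texts = collect_page_texts([FakePage("page one"), FakePage("page two")])
for number, text in enumerate(demo_texts, start=1):
    print(f"--- page {number} ---")
    print(text)
```

With a real file, `read_all_pages("file.pdf")` would return the same kind of list, one entry per page.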

@freedoom4090 2 years ago

very nice! does it work without java?

@Mukeshotwani 2 years ago

Yes. For Java we have a different library.

@freedoom4090 2 years ago

@Mukeshotwani thanks!

@lasnroo 2 years ago

how can I read line by line?

@lokusok5080 2 years ago

from PyPDF2 import PdfReader

reader = PdfReader("file.pdf")
all_pages = reader.pages
for page in range(len(all_pages)):
    text = all_pages[page].extract_text()
    # splitlines() breaks the page text at line breaks, not at spaces
    for line in text.splitlines():
        print(line)

@logapriyas6911 2 years ago

When I follow the above instructions I get a superfluous whitespace error 🙂 Can anyone help me with this issue?

@monicalelli5369 2 years ago

import isn't working, any hint please?

@Jason-ot6jv 2 years ago

Make sure you run 'pip3 install PyPDF2' in the terminal.

@thekyreefuller 2 years ago

@Jason-ot6jv Hi Jason, I did this and my terminal says "Requirement already satisfied". I'm still getting the same "No module named PyPDF2" issue. Any thoughts?
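
A common cause of this combination is that `pip3` installed the package for one Python interpreter while the script (or PyCharm) runs another. The sketch below, using only the standard library, prints which interpreter is running and builds a pip command bound to that same interpreter. `module_available` and `pip_install_command` are hypothetical helper names, and this is one likely explanation rather than a confirmed diagnosis.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if this interpreter can find the named top-level module."""
    return importlib.util.find_spec(name) is not None


def pip_install_command(package: str) -> list:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


print("Interpreter:", sys.executable)
print("PyPDF2 importable here:", module_available("PyPDF2"))
print("Install with:", " ".join(pip_install_command("PyPDF2")))
```

Running the printed command in a terminal installs PyPDF2 into the exact interpreter that failed to import it.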