Extract Text from PDF with Python

Views: 39,997

Chart Explorers

Day ago

Comments: 35

@TheBtrivedi 1 year ago

Amazing. Clear explanation of what's being done. Subscribed.

@MURALIKRISHNAhai 3 years ago

We love the work you do. You probably save someone's day every time you upload a new video. Thank you 😊 The way you include all the viewers in the appreciation after completing a task is awesome: "ohhhoooo... we did great" 👍

@darrenu2939 20 days ago

Nicely done!

@abdullah07757 1 year ago

Fun to watch, hard to comprehend.

@shilpakale2699 2 years ago

Very nicely explained. I would like to know whether a page in a PDF has a header or footer, and to extract the page numbers of the pages that have one. Can this be scripted using PyPDF2? Please advise.

@cbao97 11 months ago

Amazing tutorial. I noticed there is a \n at the end of each line. Is there anything we can do to detect the whole paragraph?
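One possible approach to the trailing-newline question, sketched with the standard library only: treat a blank line as a paragraph boundary and join the remaining lines with spaces. The helper name `join_paragraphs` is hypothetical, and the blank-line rule is an assumption; many PDFs emit no blank line between paragraphs, in which case another cue (such as a line ending in a period) would be needed.

```python
import re

def join_paragraphs(page_text):
    """Merge hard-wrapped lines into paragraphs (hypothetical helper).

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Split on one or more blank lines (lines holding only whitespace).
    blocks = re.split(r"\n\s*\n", page_text.strip())
    paragraphs = []
    for block in blocks:
        # Rejoin words hyphenated across a line break, e.g. "extrac-\ntion".
        # Caveat: this also merges genuine hyphens that fall at a line end.
        block = re.sub(r"(\w)-\n(\w)", r"\1\2", block)
        # Collapse remaining line breaks and runs of spaces into one space.
        paragraphs.append(" ".join(block.split()))
    return [p for p in paragraphs if p]

sample = "Waldo went to the\nstore to buy milk.\n\nHe came back\nan hour later.\n"
print(join_paragraphs(sample))
```

The input would be the string returned by the page's text-extraction call; each list item is then one paragraph with its internal line breaks removed.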

@johnkhan174 2 years ago

Hello, I found a small bug in the code. If 'Waldo' appears in two places on the same page (in different sentences), the second 'Waldo' is not found. Can you provide a fix? Thanks!
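The video's code is not reproduced in this thread, so the following is a guess at the cause rather than a confirmed fix: a search that stops at the first hit (for example `str.find` or `re.search`) reports only one match per page. A minimal sketch that collects every match with `re.finditer` might look like this; the function names `find_all` and `sentences_with` are hypothetical, and the sentence splitter is deliberately naive.

```python
import re

def find_all(page_text, keyword):
    """Return the start offset of every occurrence of keyword in page_text."""
    # re.escape keeps characters such as "." or "(" in the keyword literal.
    pattern = re.compile(re.escape(keyword))
    return [match.start() for match in pattern.finditer(page_text)]

def sentences_with(page_text, keyword):
    """Return every sentence containing keyword (naive split on . ! ?)."""
    # Flatten line breaks first so a sentence wrapped over two lines stays whole.
    flat = " ".join(page_text.split())
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    return [s for s in sentences if keyword in s]

page = "Where is Waldo? Nobody knows.\nI saw Waldo near the tent."
print(find_all(page, "Waldo"))
print(sentences_with(page, "Waldo"))
```

Because every match is collected, the same approach also covers a keyword that appears twice within a single sentence.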

@KyroAtelerix 2 years ago

And if I only want to extract some keywords across multiple pages in 100 PDFs, what should I do? I don't want all the text, only a few words.
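A rough sketch of one way to do this: record only which pages mention each keyword instead of keeping the full text. The names `keyword_hits` and `scan_folder` are hypothetical. `scan_folder` assumes PyPDF2 3.x is installed and uses its `PdfReader` class, not the older `PdfFileReader` seen elsewhere in this thread.

```python
from collections import defaultdict
from pathlib import Path

def keyword_hits(pages, keywords):
    """Map each keyword to the 1-based page numbers where it appears.

    pages is an iterable of page-text strings; matching is case-insensitive.
    """
    hits = defaultdict(list)
    for number, text in enumerate(pages, start=1):
        lowered = (text or "").lower()  # tolerate pages with no extractable text
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits[keyword].append(number)
    return dict(hits)

def scan_folder(folder, keywords):
    """Run keyword_hits over every *.pdf in folder (assumes PyPDF2 >= 3.0)."""
    from PyPDF2 import PdfReader  # imported here so keyword_hits works without it
    results = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        reader = PdfReader(str(pdf_path))
        texts = (page.extract_text() for page in reader.pages)
        results[pdf_path.name] = keyword_hits(texts, keywords)
    return results

# Stand-in page texts so the matching logic can be tried without any PDF.
demo_pages = ["Invoice total: 40 USD", "Nothing here", "TOTAL due on receipt"]
print(keyword_hits(demo_pages, ["total", "receipt"]))
```

Calling `scan_folder` with a folder path and a keyword list would return one dictionary per file, keyed by file name.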

@kingfunny4821 1 year ago

How can I extract only the highlighted text in a PDF?

@ajsunofficial6798 2 years ago

For page extraction, say we want to extract page 2 and page 5. Do we use getobj.pageS(2,5)?
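As far as I know, PyPDF2 has no call that takes two page numbers at once the way the question suggests; pages are usually picked one index at a time, and indices start at 0. A small sketch under that assumption follows. `pick_pages` and `extract_pages` are hypothetical names, and `extract_pages` assumes PyPDF2 3.x with its `PdfReader` class.

```python
def pick_pages(pages, page_numbers):
    """Return the items of pages at the given 1-based page numbers."""
    selected = []
    for number in page_numbers:
        if not 1 <= number <= len(pages):
            raise IndexError(f"page {number} is outside 1..{len(pages)}")
        selected.append(pages[number - 1])  # convert to a 0-based index
    return selected

def extract_pages(pdf_path, page_numbers):
    """Extract text from selected pages (assumes PyPDF2 >= 3.0 is installed)."""
    from PyPDF2 import PdfReader  # imported here so pick_pages works without it
    reader = PdfReader(pdf_path)
    return [page.extract_text() for page in pick_pages(reader.pages, page_numbers)]

# Stand-in list so the indexing logic can be tried without a PDF file.
fake_pages = ["p1", "p2", "p3", "p4", "p5"]
print(pick_pages(fake_pages, [2, 5]))
```

With a real file, `extract_pages("some.pdf", [2, 5])` would return the text of the second and fifth pages; the file name here is only a placeholder.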

@JM-fr9bc 3 years ago

How about using a for loop to extract a text title to another for multiple pdfs?

@ChartExplorers 3 years ago

Hi Johan, good question. I'm not sure exactly how to do this. I'll see what I can figure out, but I won't be able to get to it for a few days. Can you send me another message in a few days' time and see what I have come up with? Thanks!

@JM-fr9bc 3 years ago

@@ChartExplorers Thanks so much for your response. Just to restate, I'm trying to use a loop to extract a particular section, let's say one called "lessons", from a file of PDFs and output it into Excel. Any help would be great! Thanks so much!

@JM-fr9bc 3 years ago

@@ChartExplorers Hi! Were you able to come up with anything? :)

@KyroAtelerix 2 years ago

@@ChartExplorers Did you solve it?

@saurabhverma2155 2 years ago

Sir, can you guide me on how to extract text at specific coordinates from a PDF file?

@ChartExplorers 2 years ago

Hmmmm, good question. I'm assuming by coordinates you mean a specific location on a specific page (e.g., on page 2, from 30 mm down and 20 mm right to 90 mm down and 50 mm right)? Or something to that effect?

@srikrithibharadwaj6779 3 years ago

Well explained. Thank you so much.

@yizzi25 2 years ago

Does the code work if there are multiple keywords in the same sentence?

@vrbaac1641 3 years ago

hi, very nice video ^^... but will this procedure work if I need to extract certain text strings in PDFs generated from Autocad Drawings? thanks ^^

@ChartExplorers 3 years ago

Hi VR Bacc, good question. I'm assuming the text in the AutoCAD PDFs is "image"-like, which makes things trickier. Can you highlight the text in the PDF? If so, you should be able to grab the text using the methods described in this video. If not, you will have to use something called pytesseract to get text from the PDF. Let me know if this is the case for you and I can try to make a video on how to do this.

@vrbaac1641 3 years ago

Hi @@ChartExplorers, thank you for your reply. I have actually tried several Python libraries, such as PyPDF2, PyPDF4 and others, but with no success: the script runs but there is no output. Yes, the text in the PDFs from AutoCAD drawings is selectable regardless of orientation, so I think there should be a way to get that text, but I am not sure how. We would greatly appreciate it if you could make a tutorial about this ^^. Thank you so much.

@ChartExplorers 3 years ago

I'll start working on a video. Would you feel comfortable sending me the PDF? If so, I'll send you my email and I'll practice on your PDF to make sure everything works for your situation. If not, that's perfectly fine; I have a few other examples I can use.

@vrbaac1641 3 years ago

@@ChartExplorers oh!!!! thank you so much... i think I can send you a portion of the drawing, specifically the title block area. The text on the pdf is selectable... I can send the file to you by email ^^...

@ChartExplorers 3 years ago

@VR Baac, awesome. This way I can make sure it works in your specific situation. My email is bradonvalgardson@gmail.com

@cars_worldcw488 3 years ago

Please, how can I avoid the line-break problem in some paragraphs of your result?

@adamrassi3516 2 years ago

Hey, I'm new to this and am using Thonny to edit and run code. When I get to extracting the text, a Notepad file is opened but the text from the PDF is not written there. Any clue why this would happen?

@Atharv-wm3vr 2 years ago

How did you install pypdf? When I typed it, it said "not found".

@bryanl6300 1 year ago

New to Python coding, so sorry for the basic question. I ran the following command:

pip install PyPDF2
Collecting PyPDF2
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
---------------------------------------- 232.6/232.6 kB 4.8 MB/s eta 0:00:00
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1

IDLE throws this error:

from PyPDF2 import PdfFileReader
ModuleNotFoundError: No module named 'PyPDF2'

What am I missing?
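A common cause of this symptom, though not confirmed for this particular machine, is that `pip` installed the package into a different Python installation than the one IDLE runs. The sketch below, run from inside IDLE, prints which interpreter is active and whether it can see the module; `module_status` is a hypothetical helper built only on the standard library.

```python
import importlib.util
import sys

def module_status(name):
    """Report whether the running interpreter can import the named module."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }

status = module_status("PyPDF2")
print(status)
if not status["found"]:
    # Installing through this exact interpreter avoids a pip/IDLE mismatch.
    print(f'Try: "{sys.executable}" -m pip install PyPDF2')
```

Separately, PyPDF2 3.0 replaced `PdfFileReader` with `PdfReader`, so once the module is found, code written for the older name will likely need that rename as well.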