Extract text from Any PDF File (even scanned ones) using OCR pytesseract in 3 SIMPLE STEPS!

Views: 23,102

Comments: 53

@swetharamshetty109 1 year ago

Hi! Thanks a lot for the extraction. I want to convert a scanned PDF to an editable Word doc. In the above video the accuracy is only 97%.

@techwithzoum 1 year ago

Hi Swetha, you're welcome! Can you please elaborate on your question?

@sarasa971 1 year ago

How do I add another language to the code? Thank you for the great explanation 👏🏼
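A minimal sketch of one way to do this, assuming the video's code calls `pytesseract.image_to_string`: that function accepts a `lang` argument, and Tesseract combines several languages with `+`. The helper names `build_lang_arg` and `ocr_image` are made up for illustration, and each language's traineddata file must already be installed for Tesseract.

```python
def build_lang_arg(languages):
    """Join Tesseract language codes the way its -l option expects, e.g. 'eng+fra'."""
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_image(image, languages=("eng",)):
    """Run OCR on a PIL image using the given Tesseract language codes."""
    # Imported lazily so build_lang_arg can be used without pytesseract installed.
    import pytesseract

    return pytesseract.image_to_string(image, lang=build_lang_arg(languages))
```

For example, `ocr_image(page, languages=("eng", "fra"))` would ask Tesseract to recognise English and French on the same page.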

@dyzy2203 2 years ago

Thanks a lot. The code works smoothly. Nice. Can you find and extract a table from a scanned PDF and save it into a dataframe? Thanks

@amanrohada9008 2 years ago

Did you find a way to extract a table from a scanned PDF?

@harshvardhanmishra1256 3 months ago

Have either of you found that? If yes, then please help me out with this. I am approaching the deadline and have to complete the task.
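One possible starting point, not something shown in the video: `pytesseract.image_to_data` returns each recognised word with its position, and words can then be grouped into rows and cells by their coordinates. The sketch below assumes a simple table without ruling lines or wrapped cells; the function names and the `row_tolerance` and `column_gap` pixel thresholds are illustrative guesses that would need tuning per document.

```python
def rows_from_ocr_data(data, row_tolerance=10, column_gap=40):
    """Group OCR words into table rows, starting a new cell at large horizontal gaps."""
    words = [
        (data["top"][i], data["left"][i], data["width"][i], text.strip())
        for i, text in enumerate(data["text"])
        if text.strip()
    ]
    words.sort()

    # Words whose top edge is close to the row's first word belong to the same row.
    grouped = []
    for top, left, width, text in words:
        if grouped and abs(top - grouped[-1][0]) <= row_tolerance:
            grouped[-1][1].append((left, width, text))
        else:
            grouped.append((top, [(left, width, text)]))

    rows = []
    for _, line in grouped:
        line.sort()
        cells = [line[0][2]]
        prev_right = line[0][0] + line[0][1]
        for left, width, text in line[1:]:
            if left - prev_right > column_gap:
                cells.append(text)          # wide gap: next column
            else:
                cells[-1] += " " + text     # narrow gap: same cell
            prev_right = left + width
        rows.append(cells)
    return rows


def table_from_image(image):
    """OCR a PIL image and return the detected rows as a pandas DataFrame."""
    # Lazy imports: pandas, pytesseract and Tesseract itself must be installed.
    import pandas as pd
    import pytesseract

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return pd.DataFrame(rows_from_ocr_data(data))
```

Grouping by pixel position is a heuristic; skewed scans or multi-line cells will likely need deskewing or a dedicated table-detection step first.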

@kenvinmq 1 year ago

Thank you bro, I’ll try that out

@techwithzoum 1 year ago

You are welcome!

@zsuzsannakristof2117 1 year ago

Hi, can you modify the code so that the new file, next to the text, contains the original page settings and structure of the original PDF? That is, the text is in the same place where it was in the original PDF.

@techwithzoum 1 year ago

Hi Zsuzsanna, I am not sure I understand your request. Can you please elaborate for better assistance?

@omuskaikar-gs1cs 1 year ago

There is the OCRmyPDF library with its --force-ocr option; it retains the original format of the PDF.
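A hedged sketch of what that could look like from Python, calling the `ocrmypdf` command-line tool with its `--force-ocr` and `-l` options. It assumes OCRmyPDF is installed and on PATH; the helper names and the file names in the usage note are placeholders.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build the ocrmypdf command line that re-OCRs every page of the input PDF."""
    return ["ocrmypdf", "--force-ocr", "-l", language, str(input_pdf), str(output_pdf)]


def make_searchable_pdf(input_pdf, output_pdf, language="eng"):
    """Run ocrmypdf; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, language), check=True)
```

Calling `make_searchable_pdf("scan.pdf", "scan_ocr.pdf")` would write a new PDF that keeps the original page images and adds a text layer on top, rather than producing a plain .txt file.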

@sjohn-777 6 months ago

Thank you!

@techwithzoum 6 months ago

You're welcome!

@RunRonaldRun 1 year ago

Works great, thank you so much.

@techwithzoum 1 year ago

You're very welcome, Charl!

@kibtiachowdhury6011 2 years ago

Thanks a lot. The code works. I want to get paragraphs and titles without any tables or figures. How can I solve this?

@easylife891 1 year ago

Fantastic work

@techwithzoum 1 year ago

Thank you!

@davisengelis272 1 year ago

Thanks a lot!

@techwithzoum 1 year ago

You're very welcome, Davis!

@hrishishetty9322 3 years ago

Thank you so much for the help!

@techwithzoum 3 years ago

You're welcome! Do not hesitate to suggest ideas for videos!

@cherlynang2965 1 year ago

Does this work on a folder with multiple PDF files?

@techwithzoum 1 year ago

Yes, it does, Cherlynang.
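As a rough illustration of processing a whole folder, here is a sketch that assumes the video's pipeline is pdf2image's `convert_from_path` followed by `pytesseract.image_to_string`. The helper names, the `dpi` default and the one-.txt-per-PDF output layout are choices made for this example only.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


def output_path_for(pdf_path, output_dir):
    """Map 'scan.pdf' to '<output_dir>/scan.txt'."""
    return Path(output_dir) / (Path(pdf_path).stem + ".txt")


def ocr_folder(folder, output_dir, dpi=300):
    """OCR every PDF in `folder` and write one text file per PDF."""
    # Lazy imports: pdf2image, pytesseract, poppler and Tesseract must be installed.
    from pdf2image import convert_from_path
    import pytesseract

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path in list_pdfs(folder):
        pages = convert_from_path(str(pdf_path), dpi=dpi)
        text = "\n".join(pytesseract.image_to_string(page) for page in pages)
        output_path_for(pdf_path, output_dir).write_text(text, encoding="utf-8")
```

Large PDFs are converted fully into memory here; for long documents it may be worth converting a few pages at a time with the `first_page` and `last_page` arguments of `convert_from_path`.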

@chepkoechfancy7553 3 years ago

Can this code work with a PDF given as a URL? If so, kindly help with the lines of code to handle that.
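One way this might be handled, sketched under the assumption that the URL serves the PDF file directly: download the bytes with the standard library and hand them to pdf2image's `convert_from_bytes` instead of `convert_from_path`. The helper names and the header check are illustrative.

```python
from urllib.request import urlopen


def looks_like_pdf(data):
    """Check for the '%PDF-' header that PDF files start with."""
    return data.startswith(b"%PDF-")


def download_pdf(url, timeout=30):
    """Download the URL and return its bytes, refusing anything that is not a PDF."""
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError("the URL did not return a PDF file")
    return data


def ocr_pdf_from_url(url, dpi=300):
    """Download a PDF and return the OCR text of all its pages."""
    # Lazy imports: pdf2image, pytesseract, poppler and Tesseract must be installed.
    from pdf2image import convert_from_bytes
    import pytesseract

    pages = convert_from_bytes(download_pdf(url), dpi=dpi)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)
```

The header check catches the common case where the link returns an HTML viewer page rather than the PDF itself.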

@ravimakwana5290 1 year ago

Sir, can you make a video on how to extract the paragraph under a title from a PDF?

@techwithzoum 1 year ago

Sure, Ravi! I will explore that!

@sivachaitanya6330 9 months ago

What version is used in this? When I use it, it gives me a poppler path error, and a Tesseract installation and path setting error.

@jeyapauldavid5596 5 months ago

"Unable to get page count. Is poppler installed and in PATH?" This error is coming up.

@techwithzoum 5 months ago

This may be because your system cannot find the poppler utilities. Here is how to set it up on a Windows machine:
1. Download the poppler package from this website: poppler.freedesktop.org/
2. Unzip it in the C:\Program Files (x86) folder.
3. Store the path of the bin folder in a variable named as follows: poppler_path = r"C:\Program Files (x86)\poppler-24.02.0\bin"
I hope this helps.
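The `poppler_path` variable from the reply above only helps if it is actually passed on to pdf2image, whose `convert_from_path` accepts a `poppler_path` argument. A minimal sketch, assuming that is the function used in the video; `find_pdfinfo` and `pdf_to_images` are hypothetical helper names, and the check simply looks for poppler's `pdfinfo` executable before converting.

```python
import shutil


def find_pdfinfo(poppler_path=None):
    """Return the full path of poppler's pdfinfo executable, or None if it is not found.

    With poppler_path=None the system PATH is searched instead.
    """
    return shutil.which("pdfinfo", path=poppler_path)


def pdf_to_images(pdf_path, poppler_path=None, dpi=300):
    """Convert a PDF to PIL images, failing early with a clear message if poppler is missing."""
    if find_pdfinfo(poppler_path) is None:
        raise FileNotFoundError(
            "pdfinfo not found; install poppler or pass its bin folder as poppler_path"
        )
    # Lazy import so the check above works even before pdf2image is installed.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_path)
```

On Windows this could be called as `pdf_to_images(pdf_file, poppler_path=poppler_path)` with the variable defined in step 3; on macOS or Linux, where poppler is usually on PATH, the argument can be left out.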

@mohammednisar1458 3 years ago

PDFPageCountError: Unable to get page count.I/O Error: Couldn't open file 'C:\Users\Naseer\Desktop\OCR-main\data\First Cry Image.pdf': No error.

@avinashkrishna8695 1 year ago

I'm getting an error: "Output exceeds the size limit. Open the full output data in a text editor"

@techwithzoum 1 year ago

Hi Avinash, can you tell me more about which line the error occurs on?

@jardanijonovich1951 1 year ago

Hi, I came across your video after multiple failed attempts at converting my file. Can I somehow ignore the headers and footers? Also, I have bullet points in my documents, and some of them continue on the next page; how do I take care of that? Thanks in advance!!

@shainialakumbura5829 3 years ago

PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? Why am I getting this error?

@kiranvanukuri9382 3 years ago

Do you have to load many PDFs at a time?

@mallikarjunyadav591 2 years ago

I am getting the same error.

@emanuelcalderon2912 6 months ago

Run brew install poppler on Macs, or install poppler somehow on Windows.

@QorQar 1 year ago

Could you give an example of how to use the code, where to put it, and how to run it?

@avbendre 1 year ago

Failed to activate VS environment: Could not find C:\Program Files (x86)\Microsoft Visual Studio\Installer\vswhere.exe. Is there any solution to the above error? Please tell me.

@techwithzoum 1 year ago

Can you please refer to this discussion on Stack Overflow? It might be similar to what you are facing: stackoverflow.com/questions/54305638/how-to-find-vswhere-exe-path

@avbendre 1 year ago

@techwithzoum Thank you, the error was resolved when I added the poppler and pytesseract paths to the system variables and installed pytesseract.exe.

@techwithzoum 1 year ago

@avbendre Congratulations!