Extract Tables from Image Documents | Paddle Paddle | Paddleocr | OCR | Text Extraction

53,213 views

Karndeep Singh

Days ago

Comments: 61

@MitvHome-x8y 1 year ago

Thank you so much. I do the same as you, but I always get an empty Excel file. Why would that be?

@hemu2723 10 months ago

Hey, did you find the mistake?

@ram_rahim_creations_officials 1 year ago

Hi @karndeep Thank you for sharing. Will it work if my table doesn't have vertical and horizontal lines?

@ShreyasG-d2n 4 months ago

it should

@nomuchohan 1 year ago

Dude, please explain how to use PP-Structure from PaddlePaddle in our own custom code.

@ajithn7336 10 months ago

I tried, and I always get only an empty Excel file.

@xy4611 24 days ago

same

@niroshiniedayaratne4066 2 years ago

My output is always an empty xlsx file. What could be the reason? Thanks in advance!

@karndeepsingh 2 years ago

Maybe the OCR is unable to read the table content.

@kishoripawar2522 1 year ago

@karndeepsingh Is there any prerequisite for the input image, such as a resolution above X or something like that? For me as well, the output is empty.

@kishoripawar2522 1 year ago

@karndeepsingh Even with a high-resolution image the output is empty. When I checked show.html, the blue box did not correctly locate the table in the image. So I think that, because there is no text inside the blue box, the csv is empty. Please correct me if I am wrong.

@pavitrabiradar6334 1 year ago

@kishoripawar2522 I am also getting an empty xlsx as output. Did you find any solution?

@보라색사과-l1r 1 year ago

Any update on this issue? I am facing it after trying another OCR model... please help.

@avikalchauhan9907 1 year ago

When I run the code, the predict_table.py file is not there.

@kiddicode6897 2 years ago

How can I apply Google Vision after the table is recognized?

@venkatesanr9455 2 years ago

Thanks for the great explanation and video. I have some doubts: 1. Is paddleocr an open source library that anyone can use? 2. Can we fine-tune OCR models from libraries such as easyocr and paddleocr? Kindly reply and share links that will be useful for reading/learning purposes. 3. Does the huggingface library have OCR models?

@karndeepsingh 2 years ago

1. Yes, paddlepaddle is an open source library. 2. You can train an OCR model using paddleocr. 3. Huggingface may not have OCR models.

@venkatesanr9455 2 years ago

@karndeepsingh Thanks for your kind replies. Can you share any links for fine-tuning easyocr/paddleocr models? (I have searched for easyocr but did not find proper links for fine-tuning tasks.)

@karndeepsingh 2 years ago

@venkatesanr9455 You can check the paddleocr GitHub for the same.

@venkatesanr9455 2 years ago

@karndeepsingh OK, thanks a lot.

@NickWindham 2 years ago

@venkatesanr9455 Watch his video titled "OCR Text from PDFs and Image Documents using docTR | Better than Tesseract OCR | Text Extraction".

@ganeshrajv130 2 years ago

Won't this support a long image table?

@eliaweiss1 11 months ago

Thanks, all I get is empty cells

@jayeshnikam3279 1 year ago

This is kind of urgent. What if half of a table is on one page and the other half is on the next page? What can be done in such a situation? Will the model recognize it? I am really hoping for your answer, as I am currently working on this. Thank you! :)

@karndeepsingh 1 year ago

In such situations, you need to look for an identifier on the page that indicates that half of the information continues on the next page. The model can only help you extract or detect the table; on top of that, you need to apply your own logic to know whether it holds the full information or only half.
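
As a rough illustration of the kind of logic described in this reply, here is a minimal sketch that joins tables extracted from consecutive pages. The continuation test (same column count, repeated header dropped) and the function name `merge_split_tables` are assumptions made for this example, not something the video or PaddleOCR provides; a real document may need a different identifier.

```python
from typing import List

Row = List[str]
Table = List[Row]


def merge_split_tables(page_tables: List[Table]) -> List[Table]:
    """Join a table to the previous one when it looks like a continuation.

    Heuristic (an assumption, tune it for your documents): a table continues
    the previous one if both have the same number of columns. A header row
    repeated at the top of the continuation is dropped.
    """
    merged: List[Table] = []
    for table in page_tables:
        if not table:
            continue  # skip pages where nothing was extracted
        if merged and len(merged[-1][0]) == len(table[0]):
            previous = merged[-1]
            # Drop the first row only if it repeats the previous header.
            rows = table[1:] if table[0] == previous[0] else table
            previous.extend(rows)
        else:
            merged.append([list(row) for row in table])
    return merged


if __name__ == "__main__":
    page1 = [["Item", "Qty"], ["Bolt", "4"]]
    page2 = [["Item", "Qty"], ["Nut", "8"]]
    page3 = [["Name", "City", "Zip"], ["Ann", "Oslo", "0150"]]
    for table in merge_split_tables([page1, page2, page3]):
        print(table)
```

Matching on column count alone will also join two unrelated tables that happen to have the same width, so a stricter check (for example a "continued" marker in the page text) is worth adding where the documents have one.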

@poojabhandari631 2 years ago

I am getting this error:

error: legacy-install-failure
× Encountered error while trying to install package.
╰─> PyMuPDF

What should I do?

@Smddlvvs 2 years ago

How can I make this code work on PDF files with multiple pages?

@karndeepsingh 2 years ago

Pass each page of the PDF to the model.

@Smddlvvs 2 years ago

@karndeepsingh I have tried, but I am unable to iterate.

@texasfossilguy 2 years ago

You need to write code to iterate over each page of it. Ask ChatGPT or Google that; I've seen it. If I find it, I'll let you know.

@Smddlvvs 2 years ago

@texasfossilguy Yes, please let me know if you find one.

@AliAlias 2 years ago

Use other Python libraries to convert the PDF pages to images, then OCR them one by one in a loop 😊
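
A minimal sketch of that approach, assuming PyMuPDF (imported as `fitz`) is installed and recent enough for `get_pixmap(dpi=...)`. The helper names `page_image_path` and `pdf_to_page_images` and the file-name pattern are hypothetical choices for this example, not part of PaddleOCR.

```python
from pathlib import Path
from typing import List


def page_image_path(out_dir: Path, pdf_stem: str, page_number: int) -> Path:
    """Build a predictable file name such as report_page_003.png."""
    return out_dir / f"{pdf_stem}_page_{page_number:03d}.png"


def pdf_to_page_images(pdf_path: str, out_dir: str, dpi: int = 200) -> List[Path]:
    """Render every page of a PDF to a PNG and return the image paths."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc, start=1):
            pixmap = page.get_pixmap(dpi=dpi)  # rasterize one page
            image_path = page_image_path(target, Path(pdf_path).stem, index)
            pixmap.save(str(image_path))
            paths.append(image_path)
    return paths
```

Each returned path can then be passed as `--image_dir` to `predict_table.py`, one page at a time, in the same way the single-image commands quoted in this thread do.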

@louieelumbaring1790 2 years ago

How did you get the vqa folder? Sorry, I was trying to follow all the steps you did and hit an error on the last line. I have no idea how to fix it. Thanks in advance!

[Errno 2] No such file or directory: 'PaddleOCR/ppstructure'
/content/PaddleOCR/ppstructure/inference
Traceback (most recent call last):
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 230, in <module>
    main(args)
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 149, in main
    image_file_list = get_image_file_list(args.image_dir)
  File "/content/PaddleOCR/ppocr/utils/utility.py", line 60, in get_image_file_list
    raise Exception("not found any img file in {}".format(img_file))
Exception: not found any img file in /content/PaddleOCR/ppstructure/table/image1.png

@rivamalik9575 1 year ago

Provide the absolute path to the image that is placed in Drive, for example /content/gdrive/MyDrive/PaddleOCR/ppstructure/table/image1.png. Also ensure that the image is actually placed in the table folder mentioned in the exception message.
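
Before launching `predict_table.py`, the paths can be checked with a few lines of Python. This is only a sketch: the function name `preflight` is hypothetical, and the two model file names are taken from the "not find model.pdmodel or inference.pdmodel" error quoted elsewhere in this thread.

```python
from pathlib import Path
from typing import List

# File names taken from the error message quoted in this thread.
MODEL_FILES = ("inference.pdmodel", "model.pdmodel")


def preflight(image_path: str, model_dirs: List[str]) -> List[str]:
    """Return a list of human-readable path problems (empty if none)."""
    problems: List[str] = []
    image = Path(image_path)
    if not image.is_absolute():
        problems.append(f"image path is relative: {image}")
    if not image.is_file():
        problems.append(f"image not found: {image}")
    for model_dir in model_dirs:
        directory = Path(model_dir)
        # A model folder is considered usable if it holds either file name.
        if not any((directory / name).is_file() for name in MODEL_FILES):
            problems.append(f"no .pdmodel file in: {directory}")
    return problems


if __name__ == "__main__":
    issues = preflight(
        "/content/gdrive/MyDrive/PaddleOCR/ppstructure/table/image1.png",
        ["inference/en_PP-OCRv3_det_infer"],
    )
    print("\n".join(issues) or "paths look fine")
```

Relative model folders such as `inference/en_PP-OCRv3_det_infer` are resolved against the current working directory, so a failed `%cd` earlier in the notebook shows up here as a missing model file.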

@pavitrabiradar6334 1 year ago

Hello, I am always getting an empty xlsx file as output. Could you please help me here?

@karndeepsingh 1 year ago

Maybe the OCR is not working that well. You may consider replacing the OCR.

@ShivShankarDutta1 2 years ago

I am getting this error when executing:

#%cd PaddleOCR/ppstructure
!python3 /content/PaddleOCR/ppstructure/table/predict_table.py --det_model_dir=inference/en_PP-OCRv3_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=/content/PaddleOCR/ppstructure/table_2.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ./output/table

Traceback (most recent call last):
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 30, in <module>
    import tools.infer.predict_det as predict_det
  File "/content/PaddleOCR/tools/infer/predict_det.py", line 31, in <module>
    from ppocr.data import create_operators, transform
  File "/content/PaddleOCR/ppocr/data/__init__.py", line 35, in <module>
    from ppocr.data.imaug import transform, create_operators
  File "/content/PaddleOCR/ppocr/data/imaug/__init__.py", line 47, in <module>
    from .ct_process import *
  File "/content/PaddleOCR/ppocr/data/imaug/ct_process.py", line 22, in <module>
    import Polygon as plg
ModuleNotFoundError: No module named 'Polygon'

@rohithuria1168 2 years ago

How do I fix this error?

@goswamidivyang2010 2 years ago

@rohithuria1168 Did you get any fix for that? I am also facing the same error.

@luisvite3766 2 years ago

Me too

@xy4611 24 days ago

maybe !pip install polygon
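
A small sketch for checking which import is missing before running the script. The mapping from the import name `Polygon` to the PyPI distribution `Polygon3` is an assumption based on how that package is usually published; verify it against the `requirements.txt` of your PaddleOCR checkout before installing. The function name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Dict, List

# Import name -> PyPI distribution name (assumed mapping; verify before installing).
PIP_NAMES: Dict[str, str] = {"Polygon": "Polygon3"}


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(["Polygon"]):
        print(f"missing: {name}; try: pip install {PIP_NAMES.get(name, name)}")
```

Note that pip distribution names and Python import names do not have to match, which is why the lookup table is kept separate from the import check.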

@shobhitsadwal6081 11 months ago

It is not working for me.

@rajeshroyal5922 2 years ago

I have tried with VS Code and Colab, but I am getting this error:

python3: can't open file '/PaddleOCR/ppstructure/table/predict_table.py': [Errno 2] No such file or directory

@thepresistence5935 2 years ago

change the path bro

@rajeshroyal5922 2 years ago

@thepresistence5935 I tried changing the path and am still getting the same error.

@thepresistence5935 2 years ago

@rajeshroyal5922 It's working fine for me. Put the path in quotes.

@vogel2499 2 years ago

I suspect the text OCR is independent of the table detection/recognition. You could replace it with easyocr/pytesseract without ruining the structure.

@shwetabhilare9473 1 year ago

I am getting this error, please help:

[Errno 2] No such file or directory: 'PaddleOCR/ppstructure'
/content/PaddleOCR/ppstructure/inference
Traceback (most recent call last):
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 30, in <module>
    import tools.infer.predict_det as predict_det
  File "/content/PaddleOCR/tools/infer/predict_det.py", line 31, in <module>
    from ppocr.data import create_operators, transform
  File "/content/PaddleOCR/ppocr/data/__init__.py", line 35, in <module>
    from ppocr.data.imaug import transform, create_operators
  File "/content/PaddleOCR/ppocr/data/imaug/__init__.py", line 47, in <module>
    from .ct_process import *
  File "/content/PaddleOCR/ppocr/data/imaug/ct_process.py", line 22, in <module>
    import Polygon as plg
ModuleNotFoundError: No module named 'Polygon'

@madhavkumarpancholi9842 1 year ago

get to the point dude.

@anouaraadoud58 1 year ago

[Errno 2] No such file or directory: 'PaddleOCR/ppstructure'
/content/PaddleOCR/ppstructure/inference
Traceback (most recent call last):
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 230, in <module>
    main(args)
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 153, in main
    table_sys = TableSystem(args)
  File "/content/PaddleOCR/ppstructure/table/predict_table.py", line 67, in __init__
    self.text_detector = predict_det.TextDetector(copy.deepcopy(
  File "/content/PaddleOCR/tools/infer/predict_det.py", line 141, in __init__
    self.predictor, self.input_tensor, self.output_tensors, self.config = utility.create_predictor(
  File "/content/PaddleOCR/tools/infer/utility.py", line 199, in create_predictor
    raise ValueError(
ValueError: not find model.pdmodel or inference.pdmodel in inference/en_PP-OCRv3_det_infer

@SaniyaFarash 1 year ago

I am getting the same error. Please tell me how to solve this.

@rajeshroyal5922 2 years ago

I can't open the predict_table.py file; I keep getting the same error:

python3: can't open file '/PaddleOCR/ppstructure/table/predict_table.py': [Errno 2] No such file or directory

How can I resolve this?

@kiddicode6897 2 years ago

%cd /content/PaddleOCR: go to the path
!mkdir inference: create the folder "inference" inside the path "/content/PaddleOCR"
%cd /content/PaddleOCR/inference: go to that path
Download and unzip the file inside "inference"
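
The same steps can be sketched in Python, which avoids depending on the notebook's current directory. This assumes the model has already been downloaded as a `.tar` archive (for a `.zip`, the standard `zipfile` module plays the same role); the function name `unpack_model` and the archive name in the usage line are hypothetical.

```python
import tarfile
from pathlib import Path


def unpack_model(archive_path: str, inference_dir: str) -> Path:
    """Create the inference folder if needed and extract a model archive into it."""
    target = Path(inference_dir)
    target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir inference`
    with tarfile.open(archive_path) as archive:
        archive.extractall(target)  # only extract archives you trust
    return target


if __name__ == "__main__":
    # Hypothetical archive name; use whichever model file you downloaded.
    folder = unpack_model("en_PP-OCRv3_det_infer.tar", "/content/PaddleOCR/inference")
    print(sorted(p.name for p in folder.iterdir()))
```

Listing the folder afterwards is a quick way to confirm that each model directory named in the `--det_model_dir`, `--rec_model_dir` and `--table_model_dir` arguments really exists.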