GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

Рет қаралды 13,901

Күн бұрын

OpenAI recently released the GPT Vision API allowing developers to use the amazing vision analysis capability available inside ChatGPT plus. I wanted to test the results of doing text extract from a picture of a form to see how accurate the OCR capabilities were. Also, how well it structured the data file that was outputted. As you will see in the video, the results were very impressive.
Link to project in GitHub:
github.com/AI-...

Пікірлер: 48

@maciejlegowicz5834 6 ай бұрын

Lovely. Really inspiring. With little knowledge of Python, I managed to do this. Seeing only 5.1k views in 3 months make me happy as it looks like not so many people are interested in this subject ;-) a lot of business opportunities

@aiunleashed509 6 ай бұрын

Really glad you got it going! I'm going to have a similar video on using new google gemini to do this coming up

@murilodsa 8 күн бұрын

Very good. It helped a lot!

@jonathanvandenberg3571 9 ай бұрын

Great stuff, and cool ideas at the end!

@kevon217 9 ай бұрын

Super impressive capabilities these days.

@aiunleashed509 9 ай бұрын

It really is. Google started rolling out Gemini yesterday which will have vision capability as well...getting crazier by the day!

@anasbahttoch0913 5 ай бұрын

Thanks Man, it helped a LOT!

@micbab-vg2mu 9 ай бұрын

This API is great - thank you for the video. I wonder if it is able to recognise hand writing in diffrent languges than ENG.

@aiunleashed509 9 ай бұрын

Thanks! Generally speaking OpenAI says "The models are optimized for use in English, but many of them are robust enough to generate good results for a variety of languages." I have found in my testing that it will usually work pretty well on languages whose character set is close to English (like French for example). But if it's totally different characters like Japanese it doesn't perform as well.

@Great_Muzik 4 ай бұрын

Great video! Please do a tutorial on how to convert scanned pdf files in a Google Drive folder to Excel using GPT-4 Vision. Thanks!

@aiunleashed509 3 ай бұрын

Great suggestion! I will look to add that to future video

@eugenl4340 8 ай бұрын

Thanks for the video!

@benjaminsaladin1345 6 ай бұрын

I had the exact same idea today, especially with the functions calling. Did you manage to get that to work? Cool video, btw🎉

@aiunleashed509 6 ай бұрын

I haven't had a chance but will be doing more auto classification on a new project I am working on so will do some followup videos soon

@vadymivanenko8591 7 ай бұрын

In my case it cannot recognize some characters. He confuses the 2 with Z, 6 with G, etc. It happens only in lines of random characters like (G12300HO). I don't even know how to teach it. I have set the temperature to zero.

@DuneKraftwerk 7 ай бұрын

Nice video btw. Sorry i do not share your excitement as I tried with more beefy images, like engineering drawings like P&ID, Location or connection diagram. I am able to get some information when I zoom the images to a certain level, but full scale GPT tell me to use a OCR software. Also, you cannot get the bonding rectangles of your text for further processing with html and css. So I will stick with Google Vision API for now to do this, i guess it is less expensive than GPT anyway offering a free tier of 1000 images/month and much faster.

@aiunleashed509 7 ай бұрын

Fair enough, my experience has been more on automating corporate document processing. I had good luck with those type of docs like invoices and forms. It seems from the comments it falls apart quickly with more complicated use case like you describe. But remember this API is still in preview, and will get much better from here

@pabloenzozanitti3411 6 ай бұрын

Great video. Quality appears to have degraded heavily since this video. Sometimes it outright refuses to scan images containing names as they're personal information

@aiunleashed509 6 ай бұрын

Thanks for watching. I agree, and have had many comments that the quality has degraded since initial release which is disappointing (it's supposed to get better over time right). That is definitely the advantage of using a local open source model for this purpose. I've got my eye on this for future videos

@i.am.rossalex 7 ай бұрын

If an image is not good quality, but readable for human, it can recognize a text with mistakes. Tested on estimates that was sent via Whatsapp.

@aiunleashed509 7 ай бұрын

Yes that's a great use case. It's also nice in that it can automatically correct spelling and format from the estimate. Lets say its handwritten and the estimator spelled a product wrong, it can usually detect and fix that leading to better data and analytics

@i.am.rossalex 7 ай бұрын

@@aiunleashed509 but it made mistakes in digits, not in words. So it will be difficult to fix.

@uplifthabesha754 Ай бұрын

'message': 'The model `gpt-4-vision-preview` has been deprecated

@aiunleashed509 Ай бұрын

you can just use gpt-4o now, its multimodal including vision. Check my channel I did a video on it.

@kirk7784 5 ай бұрын

I'm curious what are strategies for parsing born digital pdfs, the data is already there so it just needs to go and grab it without ocr right? How would that work?

@aiunleashed509 5 ай бұрын

Yes you are right. If the PDF already has a text layer being digital born, you can check that first. However, sometimes this text layer isn't good, and misses some text that is displayed as graphics. If you are optimizing for quality I would probably use both methods and let the AI structure from both datasets. This will cost much more in API credits though....

@glowmarkdesigns 7 ай бұрын

Hey! I have an excellent use for this to help small business but have no idea how to make it work. Could we talk about it to see if it's something that could be done?

@aiunleashed509 7 ай бұрын

For sure, just email: int.unleashed@gmail.com and we can chat

@remisanlis5344 2 ай бұрын

Hi, very interesting video ! Watching from Paris, France ! I'm currently developing a solution based on that, is it possible to talk together briefly on messenger or something else ?

@aiunleashed509 Ай бұрын

Hey, would love to chat email me: int.unleashed@gmail.com

@SunilSamson-w2l 2 ай бұрын

Does it also work with PDF files instead of .img ?

@aiunleashed509 2 ай бұрын

it says: We currently support PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif). So no you would have to convert it first

@SunilSamson-w2l 2 ай бұрын

@@aiunleashed509 thank you !

@adammessier4431 8 ай бұрын

Hi, is it necessary to pay to get access to this API.

@aiunleashed509 8 ай бұрын

Yes it is billed like the other APIs. Check out the details here under GPT-4 Turbo -> gpt-4-1106-vision-preview. openai.com/pricing It even has a pricing calculator where you put in the resolution of the image and it says how much it would cost to analyze

@pabloenzozanitti3411 6 ай бұрын

It is. You need credits in your OpenAI account, but free credits won't do, you need to fund at least $5 to enable this model

@kurniadrajat4267 7 ай бұрын

this is free api from chatgpt? or pay? , i cant run this program

@aiunleashed509 7 ай бұрын

This is a pay per usage API. because it deals with images which are more token intensive it is more expensive than the standard

@tommydavies2426 7 ай бұрын

Hi there, I have tried to implement something similar and I get the response saying things like this: I'm sorry, but I am unable to access external links or view images, so I cannot analyze the image or read any text from it. My capabilities are limited to processing and generating text-based information. If you can provide the text from the image, I'd be happy to help analyze or discuss it with you. However, if I use GPT4 chat window as normal, upload my invoice, it can read it no problem. Have you came across this?

@aiunleashed509 7 ай бұрын

Hi, I haven't seen that myself, but haven't used the vision API in a few weeks. I heard a new version is coming soon, so maybe some degradation on this preview. Could you send me a link to one of the test images and I will give it a try

@pabloenzozanitti3411 6 ай бұрын

Make sure you have set the vision model instead of a regular gpt model

@abhishekgaikwad6105 6 ай бұрын

Will it work for bad hand written text ??

@aiunleashed509 6 ай бұрын

In my testing it did pretty well, but of course their is limits. I am actually working on testing out the different Vision APIs with handwriting and should have a video about it soon