Automate OCR with GPT-4o and Power Automate

  Рет қаралды 12,221

DamoBird365

DamoBird365

Күн бұрын

Пікірлер: 68
@lumbinibhw4463
@lumbinibhw4463 5 ай бұрын
Great work, Damien! Inspired by your example, I upgraded my existing expense app with GPT-4 OCR capabilities. Thank you so much!
@mannymorales977
@mannymorales977 6 ай бұрын
Damo, nice work - thank you for producing and sharing this excellent information. Kudos for throwing in a bit of debugging insight as well! Looking to continued content!
@mannymorales7913
@mannymorales7913 6 ай бұрын
Great work, Damien! Thank you for producing and sharing the awesome video. It inspires me to keep pressing on with Power Automate integration with AI.
@DamoBird365
@DamoBird365 5 ай бұрын
Fantastic to hear 👍
@duygusalsacmaliklar
@duygusalsacmaliklar 8 ай бұрын
Hi Damo, thanks for the content. It is highly professional with chapters, description, extra links and the way you explain how to do it is really amazing.
@DamoBird365
@DamoBird365 8 ай бұрын
Thank you very much for kind words 👍
@JonDoesFlow
@JonDoesFlow 8 ай бұрын
Damien is the master.
@toluvictor
@toluvictor 8 ай бұрын
Thanks Damo, this is awesome. This is something I wanted to get started with and I was just looking for an ideal use case
8 ай бұрын
Multimodal LLMs just made the OCR simpler... Great use cases Damien and as always superb demo, great you kept the base64 error as well 🔥🚀😍
@DamoBird365
@DamoBird365 8 ай бұрын
Yeah, pretty cool stuff isn’t it. Glad you liked the inclusion of the error. The amount of errors that I edit out too, you wouldn’t believe it.
8 ай бұрын
@@DamoBird365 No doubt. Respect for your time and effort to create such engaging content 😍
@XEQUTE
@XEQUTE 8 ай бұрын
@@DamoBird365 haha
@paolovr1970
@paolovr1970 2 ай бұрын
Amazing thanks 😊
@DamoBird365
@DamoBird365 2 ай бұрын
The low code version has just arrived. I am very close to releasing a video albeit there are troubleshooting issues to overcome.
@tylerkolota
@tylerkolota 7 ай бұрын
I started exploring upgrading some flows based on my OCR & GPT3.5Turbo template to GPT4o and I'm finding there are some limitations not mentioned in the video. First, GPT4o requires the upload documents to be images, so png or jpg. That I expected. But the built-in Microsoft convert PDF to jpg action only converts the 1st page of a PDF to an image. So to use GPT4o on multi-page invoices or other documents would require a subscription to a 3rd party service like Encodian or Adobe in order to convert each multi-page PDF into a set of several images that could be fed to GPT4o. This is likely a no-go as my org doesn't want all our documents with sensitive data going through a 3rd party service. Second, GPT4o takes longer to process images than the OCR & GPT3.5Turbo set-up. So if an application has a user waiting in real-time on the document to be processed before proceeding, then it may be a hinderance to user experience. Overall, I think I'd only switch over to GPT-4o if going from like 96% to 99% field extract accuracy was necessary or if the use case requires processing something not typed text related like whether the document has a signature/stamp on it or not. Otherwise I may wait to see if later models enable uploading PDFs directly.
@DamoBird365
@DamoBird365 7 ай бұрын
Noted Tyler, feel free to explore other options. I’ve also covered AI Builder and Azure Doc Intelligence kzbin.info/www/bejne/nH2rnnitmMxrgNE. Hopefully you gleamed what’s currently possible from my exploration of the use case, but realise it’s a proof of concept, the model is all but 2 weeks live. This video and others will have inevitably given many new ideas of what’s possible now or in the future and I’m sure it won’t be long before agents use these models to orchestrate what we had previously considered to be possible with digital process automation.
@freshy1097
@freshy1097 7 ай бұрын
Can't you just ask chagpt to convert the pdf to jpeg?
@tylerkolota
@tylerkolota 7 ай бұрын
@@freshy1097 You’d have to feed in the base64 string or something of the PDF & ask it to output an array of base64 strings of images for each page. I’m not sure it can handle that type of request as it requires a decent chunk of code & libraries to do. Also base64 strings are really long. I doubt GPT could ever output then in it’s limited output tokens. But I actually just completed a prototype that allows me to do this in Azure without 3rd party connectors. So I’ll put up a community blog post on how to add that to this template.
@freshy1097
@freshy1097 7 ай бұрын
@@tylerkolota Thank you for the response. Looking forward to it!
@DamoBird365
@DamoBird365 7 ай бұрын
⁠@@freshy1097you could probably use a low or pro code OCR model and then sent the text to an LLM. I covered structured options here: Two options for Invoice Processing in Power Platform | AI Builder or Azure Document Intelligence kzbin.info/www/bejne/nH2rnnitmMxrgNE.
@S0ulH0und
@S0ulH0und 7 ай бұрын
Would it be possible to feed chatgpt inside power automate both jpeg and pdf for it to extract data from?
@DamoBird365
@DamoBird365 7 ай бұрын
I don’t believe pdf can be ingested directly but you could convert to text via ocr first.
@rishikeshmishal6986
@rishikeshmishal6986 8 ай бұрын
Hi Damo, Thank you for this great video. I implemented this in my environment, but facing issue "Request Header Fields Too Large" for a small image file content. Do you know how to solve this?
@DamoBird365
@DamoBird365 8 ай бұрын
Yes, it’s the date version in the url. If you jump to earlier in the video you’ll see the date. I got the same error.
@stuartlittle4433
@stuartlittle4433 4 ай бұрын
What 365 license do you need to allow this work flow to use ChatGPT?
@DamoBird365
@DamoBird365 4 ай бұрын
You would need a premium license to be able to access the API. You would also need a gpt4o service which can be deployed in Azure and costs pennies to use. You can also explore: Power Automate Invoice Processing Tutorial AI Builder and Azure kzbin.info/www/bejne/nH2rnnitmMxrgNE
@stuartlittle4433
@stuartlittle4433 4 ай бұрын
@@DamoBird365 Thank you Damien, appreciate your quick reply and help. Keep up the amazing videos!
@radekleusz
@radekleusz 21 күн бұрын
Hi great video, as i see .pdf is still not allowed in this workflow. What effective and error proved workflow would you suggest? Maybe just change pdf into JPEG?
@DamoBird365
@DamoBird365 21 күн бұрын
You can ocr a pdf and send the text to gpt. Here’s a video using prompts: AI Builder and Power Automate for SharePoint File Summaries kzbin.info/www/bejne/ZoO9dI2to72HmZo
@ChandraChandra-wx9em
@ChandraChandra-wx9em 7 ай бұрын
This is just awesome. Thank you
@GeorgeBrown-gi2ku
@GeorgeBrown-gi2ku 8 ай бұрын
Thanks!
@DamoBird365
@DamoBird365 8 ай бұрын
Thank you George 😍
@peterpetrou1904
@peterpetrou1904 8 ай бұрын
Hello Damien, thanks for great video. Can I ask does this solution require premium license in terms of the connector you are using? Thank you so much.
@DamoBird365
@DamoBird365 8 ай бұрын
Yes it would. Either via the http action or as a custom connector.
@bmassimo1966
@bmassimo1966 7 ай бұрын
Hi, very nice video. An OCR recognition would be possible with Mistral Large LLM too?
@DamoBird365
@DamoBird365 7 ай бұрын
I’m not sure to be honest. I’ve used Mistral for text here 👉 Deploy Mistral Large, Integrate with Power Platform #Mistral #MistralAI #PowerPlatform kzbin.info/www/bejne/ZqmkoaGlqpd6ppo
@bmassimo1966
@bmassimo1966 7 ай бұрын
@@DamoBird365 because for Azure OpenAI models are requested company subscriptions
@hammadyounas2688
@hammadyounas2688 7 ай бұрын
How to use it for pdf data extraction?
@DamoBird365
@DamoBird365 7 ай бұрын
I’m not sure right now but you could check Two options for Invoice Processing in Power Platform | AI Builder or Azure Document Intelligence kzbin.info/www/bejne/nH2rnnitmMxrgNE 👍
@hammadyounas2688
@hammadyounas2688 7 ай бұрын
@@DamoBird365 But i have the pdf which is basically a resume. I want to give the pdf file to chatgpt to give the summary of skills and years of experience and degree. Can you provide a tutorial on it.
@hammadyounas2688
@hammadyounas2688 7 ай бұрын
@@DamoBird365 do you have any idea or code for this idea?
@DamoBird365
@DamoBird365 7 ай бұрын
You could use OCR with either low code/pro code and then send the text to an LLM. But I’ll note your request for the multimodal LLM 👍
@hammadyounas2688
@hammadyounas2688 7 ай бұрын
@@DamoBird365 how to code in power automate? I means do you have flow for that?
@Musti1911
@Musti1911 4 ай бұрын
I have an accounting company in Germany and would like to implement it, what are the costs for the programs, do you know that?
@DamoBird365
@DamoBird365 4 ай бұрын
If you’ve got Microsoft 365 you might be halfway there. I demonstrate using GPT4o which is charged separately too. The Power Platform also has AI Builder and Azure Document Intelligence. Check out: Power Automate Invoice Processing Tutorial AI Builder and Azure kzbin.info/www/bejne/nH2rnnitmMxrgNE
@johng5295
@johng5295 16 күн бұрын
Awesome.
@geralddahl9159
@geralddahl9159 8 ай бұрын
Thanks for introducing me to the concept of developing a model in Azure that allows me to harness powerful chat gpt abilities. Can bing ai be used to refine the prompt for the model or must one use chat gpt proper for best results? PS - if my question misses the mark it’s because I have a sloppy understanding of large language model learning etc. PPS - 0.008 seems like 0.01 aka one hundredth of a dollar aka one penny but perhaps I’m getting that wrong too? (No need to spell out any long math remediation) - bottom line, I’m thrilled that you take the time to cost out approximate Azure pay as you go costs. Knowing that I can test something 500 times and only incur as little as 50 cents or as much as $5 is helpful. Thank you.
@DamoBird365
@DamoBird365 8 ай бұрын
I believe you’re right. It’s about a cent a run 👍 clearly my math brain wasn’t working this morning 😂 thanks for the spot.
@logeshkannan9333
@logeshkannan9333 4 ай бұрын
Hi your explanation is too good. I try to connect azure open Ai using powerautomate. but in Http request, im getting retry error. Please suggest any way to solve this issue.
@TrueSpeaker-gx5tw
@TrueSpeaker-gx5tw 7 ай бұрын
A video request: Hi Damien, In our organisation we work in teams. And for new projects we create a channel with an predefined file structure, with a onenote for that specific project and a planner board for that project.... But thats a lot of manuell work... Is there a possibility to automate all these processes?
@DamoBird365
@DamoBird365 7 ай бұрын
Definitely doable. If you drop me more details via dm, I can consider it as a video. damobird365.com/contact-me/
@TrueSpeaker-gx5tw
@TrueSpeaker-gx5tw 7 ай бұрын
@@DamoBird365 I did just now😊
@JonDoesFlow
@JonDoesFlow 8 ай бұрын
This is weapons grade brilliant.
@DamoBird365
@DamoBird365 8 ай бұрын
Cheers Jon, really enjoyed exploring this and the capabilities are impressive. Can’t wait for audio to come to the api too.
@JonDoesFlow
@JonDoesFlow 8 ай бұрын
@@DamoBird365 Hey Damo, so I did initially get an error using preview API end point, "Request headers too large", I changed to the api-version=2024-02-01 and I no longer got that message
@DamoBird365
@DamoBird365 8 ай бұрын
@@JonDoesFlow exactly the same. Hope the video helped. I am glad it wasn’t just me. Now what are you creating 😍
@JonDoesFlow
@JonDoesFlow 8 ай бұрын
@@DamoBird365 I saw another post somewhere about that being a problem. Well at the moment I’ve built similar to you. It would be interesting to check the difference between AI credit cost and the cost of these calls to the API. Could be cheaper on higher volumes going this way.
@theaunsyed
@theaunsyed 8 ай бұрын
Amazing. Can you give it as a subscription? I have clients that want to use it. What would be accuracy for Arabic receipts?
@DamoBird365
@DamoBird365 8 ай бұрын
Interesting 🤔 it does appear that Arabic is supported. openai.com/index/hello-gpt-4o/ whilst my demo is hosted in Azure, you/customer could use the OpenAI api.
@theaunsyed
@theaunsyed 8 ай бұрын
@@DamoBird365 CanI get access to your demo so I can play around with it? Would that be viable for you?
@johnfromireland7551
@johnfromireland7551 7 ай бұрын
@@DamoBird365 That said, the Model still needs to be hosted in the US.
@cbau0809
@cbau0809 8 ай бұрын
You have referred to Chatgpt 4o as optima, and the o stands for omni.
@DamoBird365
@DamoBird365 8 ай бұрын
Very good shout 👍 openai.com/index/hello-gpt-4o thanks for pointing that out. Noted.
@stevedaregmailcom
@stevedaregmailcom 7 ай бұрын
This is a gamechanger! thx for sharing! have you tried to make it work for PDF files? I tried replacing image/png;base64 with application/pdf;base64 but it doens't work, I think the type 'image url' has to be changed in that case?
@DamoBird365
@DamoBird365 7 ай бұрын
I’ve not yet. I couldn’t find it in the docs yet. If you do get it working, let us know 👍
@johnfromireland7551
@johnfromireland7551 7 ай бұрын
Try using Power Automate to, first, convert your pdf into an image and run from there. Though, for multi page PDfs will it convert to one long image? Or perhaps you will have to manage the individual PDF pages as separate images. It's starting to look tricky!. :-(
Power Automate Invoice Processing Tutorial AI Builder and Azure
27:58
Power Automate Solutions - Learn How and Why?
23:56
DamoBird365
Рет қаралды 5 М.
Непосредственно Каха: сумка
0:53
К-Media
Рет қаралды 12 МЛН
УНО Реверс в Амонг Ас : игра на выбывание
0:19
Фани Хани
Рет қаралды 1,3 МЛН
The Lost World: Living Room Edition
0:46
Daniel LaBelle
Рет қаралды 27 МЛН
I Sent a Subscriber to Disneyland
0:27
MrBeast
Рет қаралды 104 МЛН
Fine-Tune GPT-4o Model Step by Step
15:16
Pradip Nichite
Рет қаралды 6 М.
Automate EVERYTHING Through ChatGPT ✨
29:13
No-Code Ireland
Рет қаралды 41 М.
Connect Power Automate & ChatGPT-4o with Custom Connector
20:55
Andrew Hess
Рет қаралды 7 М.
Build A Custom Connector API in Power Platform
24:52
DamoBird365
Рет қаралды 6 М.
GPT PDF & Image Data Extraction (Power Automate)
22:15
Tyler Kolota
Рет қаралды 15 М.
DeepSeek R1 Just Revolutionized AI Forever
21:06
Cole Medin
Рет қаралды 12 М.
Непосредственно Каха: сумка
0:53
К-Media
Рет қаралды 12 МЛН