Get Data from PDFs and Send to EXCEL with Power Automate Desktop!

  Рет қаралды 76,309

Christine Payton

Christine Payton

Күн бұрын

Пікірлер: 161
@kidd32888
@kidd32888 Жыл бұрын
omg you saved my organization hundred of hours and this is not a hyperbole! THANK YOU SO MUCH!!
@bi-ome
@bi-ome Жыл бұрын
aw that's so great to hear!
@PowerBeaver
@PowerBeaver Жыл бұрын
Awesome, thanks for posting this. I started doing such kind of thing using vba, which was quite a journey😅. Didn't think PA Desktop could do the trick. 👍🏻
@bi-ome
@bi-ome Жыл бұрын
Yeah!! I didn't realize VBA could do this, so we both learned something. 😆
@aam_insaan1412
@aam_insaan1412 3 ай бұрын
Thank you so much for saving me hours of copying and pasting data. You're a lifesaver! 😇
@hvidsteen89
@hvidsteen89 10 ай бұрын
Thank you very much for a great tutorial. But do you have any idea, if I can get this to work with all the files combined into a single PDF instead of all my source material being split into seperate files?
@bi-ome
@bi-ome 10 ай бұрын
Probably, you can use a regular expression to collect all matches for your query - I had GPT write one to try, this looks for the word "balance" and gets anything after it so you can see if something similar would work: balance:\s*\$?(\d+[\.,]?\d*) It should put all the matches in a list, then you can loop over those to do things after splitting out whatever you don't need. BUT you might also try connecting to the file with Power Query (it can use PDFs as a source) and see what that gives you. The only reason I didn't use it in the example is because multiple files usually don't have the exact same structure between the files, but if yours is all in one file it might work. Alternately there's a ton of new AI options in the cloud Power Automate for entity extraction that use AI Builder credits, and you likely have credits if you have Power Automate licenses to try.
@kama-j1l
@kama-j1l 2 ай бұрын
@@bi-ome I believe I have similar issue. Based on your example, what if there were multiple lines of Institution Name and Interest Income somewhere in the file, how could I extract those as well? Right now only 1st found match is returned. Thank you for creating this amazing video. Edit: I managed to create variable that covers all examples of "Institution Name" simply by changing from only 1st iteration in Parse text but now im stuck when It writes in excel its only 1st one instead of all.
@bi-ome
@bi-ome 2 ай бұрын
@@kama-j1l The parse text action has an option to allow multiple matches. It sticks them all in an array, so if you want it to get more than one you, you'd probably want to add actions to insert in Excel for each array position (by number in square brackets). If you have a variable number of matches, you might want to just use an OCR extract instead of using this technique because AI will be much better at handling the irregularity.
@craigjenkins6917
@craigjenkins6917 9 ай бұрын
Great video. How would you extract the data from the text above rather than after the line. Thanks
@bi-ome
@bi-ome 9 ай бұрын
I had a conversation with ChatGPT about this, because I'm not a regex expert, it says that regex is designed around finding text after the target and not before. Is there static text before that you can use instead? The wrench is that Power Automate doesn't seem to accept "lookbehinds" in regex like some applications can, so you have to get creative.
@Steph-ho2gl
@Steph-ho2gl 9 ай бұрын
Super tutorial Christine! is it also possible to collect data from table for which response is below the key word, and not after? thank you!
@bi-ome
@bi-ome 9 ай бұрын
If you just want the one value, you can use your target text and (.+) as the regular expression. That should get everything on the next line. If you are wanting to extract the WHOLE table, that depends on the structure of the PDF - if it's just one table, Power BI can sometimes grab it, but if you have multiple tables or a table that spans multiple pages and isn't always in the exact same location, there is entity extraction with AI in AI Builder or Syntex.
@Steph-ho2gl
@Steph-ho2gl 9 ай бұрын
@@bi-ome thank you for your response Christine ! After further investigation, i discovered that Power Automate desktop has not only a functionality to extract text as you exposed in your video, but also the functionality to extract tables from PDF. Both functionalities are very complementary and very powerful. Thank you again !
@bi-ome
@bi-ome 9 ай бұрын
@@Steph-ho2gl Awesome! Want to link to the resource you used? I don't actually use PAD all that often, this video was based on a technique I picked up while scrambling to do a particular project, so I'm interested!
@Steph-ho2gl
@Steph-ho2gl 9 ай бұрын
Hi Christine, I used the following sequence : 1. Select on file using the action Message box - Select file >> Variable = “Selectedfile” 2. Extract table from PDF file [from SelectdFile into “ExtractedPDFTables” variable] 3. Message box where Power automate will list all the tables he identified in the PDF 4. Launch Excel (which I have prefilled with the header column I want to keep. 5. Insert “For each” to loop all the identified tables [Variable used = “ExtractedPDFTable”] a. Then embed one “if” condition and I specified a specific content on one header column. In fact, my PDF file contains 42 tables, and only one is interesting for my purpose because it contains a specific key word in the CurrentItem.datatable.columnHeadersRow Contains Key word”. i. Obtain the first free line in my excel >> It creates a variable “FirstFreeRowOnColumn” ii. Write in excel [CurrentItem.datatable]when condition is met b. End if 6. End the loop started in 5. Hope it can help you! @@bi-ome
@bi-ome
@bi-ome 9 ай бұрын
@@Steph-ho2gl Thank you so much for taking the time to share!
@piratefish5856
@piratefish5856 10 ай бұрын
As a new user doing my best to learn power automate, i thank you, your video has been greatly helpful. Your tutorial works great for me except for one detail. At the 12:30 mark in your video, where you are pulling the second position in the array and enter the [1], i cannot get this to work. None of my variables have % signs around them. i have been searching for how to turn on the % signs, or how to make it work without the % signs. How do i make this work?
@bi-ome
@bi-ome 10 ай бұрын
The % signs are how you reference variables - so you can type %your-variable-name% to use them in formulas. It's just how Power Automate knows you're making a reference vs just normal text - so you can type whatever you want in the box. They won't have percentages in the dropdown menu. 🙂
@piratefish5856
@piratefish5856 10 ай бұрын
@@bi-ome That was one of the first things i tried. It gives me an error that reads "Parameter 'Value to write': Unexpected characters. Characters are used in the formula in an unexpected way." I don't know if i am on a newer version or if there is a setting for the % signs. In your video, all your variables have % signs around them. In my use of power automate, nothing has a % sign around it.
@bi-ome
@bi-ome 10 ай бұрын
@@piratefish5856 You are using PA *desktop*, right? Like from your start menu? Power Automate Cloud is very different and they are often confused--
@piratefish5856
@piratefish5856 10 ай бұрын
@@bi-ome I am using the PA desktop app. Version 2.40.157.24023, which is the most recent version as of 01/29/2024. To try to be more specific, on your Write to Excel worksheet popup, in the Excel Instance block, on your video is shows "%ExcelInstance%" but for me i see "=ExcelInstance"; same thing for the Value to write block, your video shows "%InstitutionName%" by default, my screen shows "=InstitutionName"
@bi-ome
@bi-ome 10 ай бұрын
Hm your comments somehow hit the moderation filter, and when I approved them they disappeared... but you mentioned something about the variables not having % in them? I would go to the step that they are created in, e.g. the step that opens the Excel file, and check that they have percentages there. When I name a variable in a step, it auto-adds the percentage signs, but perhaps yours don't for whatever reason? Also they only show up when you're editing the step and click on the var name in the editor, if that's part of it.
@iracypimenta440
@iracypimenta440 10 ай бұрын
First, thank you for your content. I might be doing something wrong, although I have followed yours instructions step-by-step, because I am only able to get even without the break just one iteration in my pdf file. Thanks again
@bi-ome
@bi-ome 10 ай бұрын
Did you add a breakpoint? The red circle? That will stop it after one loop, I think I added one in the video to show it... you can click the dot to turn it off. Also check what you're looping over for the actions and whether the thing that you're looping over contains all the files you want to loop on (you can check this by looking into the variable content after it runs).
@KomailButt
@KomailButt 11 ай бұрын
Thank you so much for providing such useful information. I need suggestion. I have document library folder in SharePoint where PDFs comes. In each PDF there is a table which is spreaded over multiple pages. PDFs also include text but I am only interested in extracting those tables from the PDFs. Please suggest the possible easier way to do this task. I am not sure whether I should use AI builder model (online) or this deskstop solution.
@bi-ome
@bi-ome 11 ай бұрын
I feel like AI Builder is probably going to be easier for your case if you have the credits for it. The method in this video is better for extracting very particular information from the text using what amounts to trigger words to find the content. It has the advantage of being "free", but is not as flexible as a LLM would be.
@komailbutt2998
@komailbutt2998 11 ай бұрын
​@bi-ome Thanks. I am looking for a free resource if possible in the first place, but let's see. By the way, how much credit does it require for doing this task, and do you know any other way to do this task for less money?
@bi-ome
@bi-ome 11 ай бұрын
@@komailbutt2998 AI builder credits come with most of the Power Platform license packages, so you might have enough floating around to work with. They’re a first-come-first-serve sort of thing at the tenant level.
@CLTPHDdad
@CLTPHDdad 9 ай бұрын
Hi Christine, Wonderful video! I am wondering whether you could offer any direction on an issue I am having. Basically, I have followed the steps in your video to extract PDF text and have viewed the structure of the text. Resultantly, I have identified a static, unchanging text string represented in all PDFs in the file folder (i.e., the report name) on line 35 and I need the dynamic information (client name, client ID, and date) from the subsequent line (i.e., line 36) to be individual variables that will be incorporated into the Excel file in columns A, B, and C, respectively. The format of the string is: "Full Name | ID Number | MM.DD.YYY" and there is some additional unnecessary text that follows the date. Any direction you might provide would be greatly appreciated!
@bi-ome
@bi-ome 9 ай бұрын
Sure, I use ChatGPT to produce the regex because that saves a lot of time. It provided this to get the full text of line 36: (?:.* ){35}(.*) So if you take that with your parse text action and put it in a variable, then use the action that splits text (search for the word split in the action menu) and split on " | ", you will get each of the values you want in an array. You can reference those by position with %yourVarName[0]% for the first item, %yourVarName[1]% for the second, and so on. You can insert those in columns in Excel like we do in the example in your loop that loops over the docs. 😀
@TTtipsforlife
@TTtipsforlife 4 ай бұрын
Thank you for content - just getting into using this tool and already finding it so useful. one question - one of the fields I am trying to pull is three rows down from the "text to find". Took your advice and asked Chatgpt and the expression provided was (?:Exchange rate )(?:.* ){2}(.+).. However the result is giving me the "Text to find" and all the rows data plus the row of information I want. Do you have any easy answer or point me in the direction of guides that can help me solve this? Thanks in advance
@bi-ome
@bi-ome 4 ай бұрын
Yes, that's how it worked in the example too, and we had to split off the part it was searching for to be left with the remainder. Lookaheads and lookbehinds are how you usually avoid getting the extra junk you don't need, but as far as I can tell Power Automate doesn't support those. Is there a character you can split on to remove them? You could try something like splitting on the line break ( ) and then getting the last value in the array that creates, that should leave you with just the text on the last line.
@TTtipsforlife
@TTtipsforlife 4 ай бұрын
@@bi-ome thank you in getting back to me. I managed to solve the issue by using this expression "(?
@bi-ome
@bi-ome 4 ай бұрын
@@TTtipsforlife Oooh clever, thanks for sharing and for the kind words!
@antoniogarzoncalo1478
@antoniogarzoncalo1478 Ай бұрын
Hello, this is super helpfull! but i have an error extracting the lines that are above the line im analyzing you said in the video it was but thats not working for me
@bi-ome
@bi-ome Ай бұрын
Try copying and pasting the paragraph around the chunk of text into Chat GPT and asking it what regex to use to extract the value you want. There can be hidden line break characters and things like that that make it a bit funky, but GPT can see them. Make sure to tell it to not use lookbehinds-
@pearltechnologyllc674
@pearltechnologyllc674 2 ай бұрын
Using the following flow: 1. Get Files from Folder for the PDF, 2. Launching the excel sheet, 3. Saving the excel sheet, 4. Extracting Tables from PDF. The PDf essentially has three tables two towards the top and a third in the middle. Power Automate extracts the three different tables, but i need the top two from each page removed and then the middle bottom one to extract line by line from 30 pdf pages. The top two tables all start with the same words, so those should be easy to ignore. Then the table I want has the same header and each line is number from 1, 2, 3, etc, all the way down the pages of PDF
@bi-ome
@bi-ome 2 ай бұрын
If you're trying to extract full very-regular tables, you might try Power Query (it can use files and folders of PDFs as a source) or one of the AI OCR tools. They're better for more structured data--
@pearltechnologyllc674
@pearltechnologyllc674 2 ай бұрын
@@bi-ome thanks, the data is pretty structured inside the PDF. The problem is that the invoice puts a "header" at the top with the invoice information before the lines items. However, it's not a true Adobe Header that I can remove.
@bi-ome
@bi-ome 2 ай бұрын
@@pearltechnologyllc674 I'm not sure what the issue with having a header is - if you try Power Query or an AI tool, it should let you recognize it as a header, or if it thinks its a row (Power Query does this sometimes), you can remove the top 1 row to get rid of it usually. You get a lot of transformation options with Power Query too, to split out account IDs/fill down the rows/that sort of thing-- the technique in this video is more about how to pull out a few specific single text strings, it's not really meant for extracting full tables.
@cbacca2999
@cbacca2999 6 ай бұрын
What is the link to get the PDF file so I can follow along?
@bi-ome
@bi-ome 6 ай бұрын
Sure, I went back and checked and its not super straightforward to download them from the FFIEC site directly, you have to put in the institution ID to get each file (it's been a while), but I found what looks like the files I used on my PC and uploaded them here: christine-payton.com/resources-files/
@ExecutiveAssistant-z5b
@ExecutiveAssistant-z5b 9 ай бұрын
Christine, this tutorial gets me most of the way to what I need done. The only problem is that the email I get from my vendor contains a link that I need to follow to get a web based copy of the invoice. To your knowledge is there a way to have Power Automate activate the link so I can scrape the webpage for the information I need and then save the webpage as a PDF with the name that I got from the document? In advance, thanks
@bi-ome
@bi-ome 9 ай бұрын
I think this will be a very difficult one but technically possible... you might want to go with Power Automate Cloud for this instead of desktop, since you'd be triggering it when an email is received. There is a comment in this particular thread that has what looks like a viable solution to me, using the HTTP connector in a cloud flow: ideas.powerautomate.com/d365community/idea/94b667cd-89cc-4c61-8982-79b53c1a6019 Alternately, you can parse email text in the desktop version, maybe do regex to get the URLs ( \bhttps?://[^\s]+ ), then use that to launch a web browser with the URL (from a variable), and save them... Web-based PDFs are a particular pain because the "save" button for a PDF is not something PAD recognizes as a UI element typically, so what I've done as a workaround is use "send keys" to print it to PDF (control p). When you're saving it, you can use send keys again to send a particular text string to the filename field based on a variable (I use send keys with "tab" to tab down to that field... lol... because it can't seem to select it with the recorder). Increase the time between keys slightly or it will get "ahead of itself" while things load.
@ExecutiveAssistant-z5b
@ExecutiveAssistant-z5b 9 ай бұрын
@@bi-ome Thank you. If I find a solution I will let you know.
@schmidtnolan
@schmidtnolan Жыл бұрын
Great information! I've modified this slightly to use 'crop text' to extract text between two headers. Would you have any suggestions for when the headers have slight variations? I'm thinking of running a subloop with each potential iteration based on the condition of if the text is found... the initial build will stink but should do the trick. Thoughts?
@bi-ome
@bi-ome Жыл бұрын
Cool! It depends on what your variations are… I’ve used the switch/case cards for this, basically to check if it contains the word I’m looking for before parsing (with one case for each variant, and slightly different regex for each).
@anushkatiwari1506
@anushkatiwari1506 Жыл бұрын
Thanks for the video, but I have a different request. I want to extract data from all the worksheets of a excel workbook using a loop and then print it in a pdf and save it. I want my user to first select the path of excel workbook and the path where the pdf should be saved. How can I do this?
@bi-ome
@bi-ome Жыл бұрын
Printing is tricky because often the UI browser often has trouble selecting menu options from the print menu when it runs. What I have done in the past is use "send keys" to do all the navigation in the print/save menu, using tab to navigate through options and "enter" to "click" selected buttons. I increase the delay between keys to significantly above the default, because if they run too quickly together they will trigger before the prior one is finished applying. You can use the action that allows users to enter an input to collect the save path if it will be different each time, then use the path concatenated with the filename in the filename slot in the save menu - that will save the file to whatever path is entered (e.g. C:\Users\username\Documents\FolderName\filename.pdf). Since this isn't usually a selectable input field, I use "send keys" to send the filename/path to that box - the cursor will be there by default when the window opens, if it's not for whatever reason you can send more keys to get there. 😊
@gioargentati7802
@gioargentati7802 Жыл бұрын
Thank you for posting this informative video. Sometimes, when I try to read text from a PDF using ExtractPDFText, instead of getting readable text, I get strange characters. Do you have any suggestions on how to handle this sort of thing? Thank you in advance, Christine!
@bi-ome
@bi-ome Жыл бұрын
Are they scanned PDFs? Scanned PDFs can be very weird on the text end. I would try opening one that is having the issue and selecting the text inside the PDF and pasting it into a text editor just to see what it looks like - that'll tell you if the issue is the Power Automate action or the PDF itself.
@kbirstein
@kbirstein 10 ай бұрын
@@bi-ome I had the same issue. I just created a PDF with one line of text and I was able to select it in the PDF and copy it and paste it to Notepad and it appeared as text so I'm thinking this only works on special types of PDFs and it's impossible to know which ones it will work with . . .
@bi-ome
@bi-ome 10 ай бұрын
@@kbirstein Do you have an example of what the weird text looks like? It could be an encoding issue.
@maanavjain193
@maanavjain193 4 ай бұрын
This is great stuff. Thank you. I do have a question though. Let's say I have a table in my pdf in which I have 4 subtotals one below the other. And several other PDFs some of which have 3 subtotal figures. How do I parse them when I want the last subtotal figure only from these tables?
@bi-ome
@bi-ome 4 ай бұрын
If it’s always the last line in the subtotal you want, I would go for getting the line above the target and searching for whatever is directly after those subtotals (hopefully it’s something reliable!).
@peterolah1165
@peterolah1165 Жыл бұрын
great video Christine! I got a question: I got many PDFs with multiple values to search and 1-2 of these are missing. I got the index out of range error message (right after the write to excel command), I try the error handling option and an IF sub flow but didn't work.
@bi-ome
@bi-ome Жыл бұрын
Try putting the if-statement above the parse text step - so it will only run the parse text and split steps if the file contains the text you're looking for. That way you don't have to error handle it, because it won't error. :) When you do this, make sure to have a step that resets any variable you were using to store values to blank at the end of the loop, so that doesn't insert the prior loop's value in Excel when the text doesn't exist.
@munbingkong5401
@munbingkong5401 7 ай бұрын
Hello this has been a great help, may i know how do you copy and paste the extracted pdf into the notepad with numbers indicating each line for it ? Because when i did that , it only copy word for word and did not include any line numbers..
@bi-ome
@bi-ome 7 ай бұрын
The numbers were just in the PDF file, they're part of the table text. If you want row numbers in the text editor though, you can use a different text editor - e.g. VSCode would have that.
@munbingkong5401
@munbingkong5401 7 ай бұрын
@@bi-ome Hello thank you for replying me ! i have been doing this power automate for my corporate JDE software and out of 10 times maybe 3 times it will log in successfully , by chance , do you know what's the problem?
@ryannedieu
@ryannedieu 8 ай бұрын
Thank you so much for the tutorial. I know someone has asked about dealing with out of range blank values. Could you elaborate more? I have a fairly simple PDF data extraction with two fields. However, one of them can have a lot of blank values. Thank you.
@bi-ome
@bi-ome 8 ай бұрын
There's lots of ways to do the error handling - you could put the part that selects the value and adds to Excel steps in an error block and set it to continue at the end of the block if it hits an error (which will happen when it tries to do things to a null value). That would basically just have it continue on to the next PDF if it doesn't find a value - not sure if you need to record the blank or not. You could also use the "if" conditional to check the isolated value (after you remove the word you're looking for) or the second position in the array, using a "is not blank" and put the update Excel actions there. Meaning it would only add them to the Excel file if they had a value. You could use "else" to set it to some specific text like "no value" probably too? The error handling takes some trial and error to get working the way you want it to.
@ryannedieu
@ryannedieu 8 ай бұрын
Thank you so much for the response. I’m able to get it work with your suggestions. Thank you!
@JimMott-l8g
@JimMott-l8g 15 күн бұрын
If I can make this work it will very valuable for me, but I only got as far as trying to launch Excel and an error message that "this file is locked for editing by another user" pops up and stops my program. How can I avoid this error? I do make sure that Excel was closed before running the flow. None of my files are shared, but all my files are stored on the organization's OneDrive, per our IT Department's directive. I thought I heard you allude to another way to approach this project if files are stored in the cloud.... Thanks!
@bi-ome
@bi-ome 15 күн бұрын
This is a SharePoint/OneDrive glitch - usually if you close it everywhere and leave it closed for a few hours it will go away. Basically the service is forgetting to release the lock when the file closes... If it gets truly stuck, the legit fix is to remove everyone else's permissions to the file for 24 hours and then re-inherit them (years ago this was a super common SharePoint bug, you see it less now). It' shouldn't have anything to do with Power Automate, specifically. But do note that you can only have one file with the same name open at a time in Windows, so if you are testing it and forget to close it between runs, it'll error out there too--
@JimMott-l8g
@JimMott-l8g 14 күн бұрын
@@bi-ome Thanks!
@deeznuff3245
@deeznuff3245 3 ай бұрын
How do you access the pdf from SharePoint if I might ask?
@bi-ome
@bi-ome 3 ай бұрын
The easiest way is to just use the "sync" button in the library to sync it locally with OneDrive. It's much easier to have it loop over local file paths vs web. In the web, some options are lots of jank with OCR and AI Builder, or programmatically opening each PDF and copying the text to clipboard with "send keys" and then using the clipboard text to extract in PAD.
@DenizSpearo
@DenizSpearo Ай бұрын
Hey Christine, I have numbers (with in the text) i need to read from the pdf. There is a number in each line, lets say for 10 lines. I'm struggeling to insert each number in cell X and following, starting from the first number from the first line. Do you any ideas? Thanks for the video :))
@bi-ome
@bi-ome Ай бұрын
Does the number have other characters, like a period and a space next to it? That can help narrow it down. Regex can get numbers but it helps if the numbers follow some pattern you can use in the expression-
@ExecutiveAssistant-z5b
@ExecutiveAssistant-z5b 9 ай бұрын
This is some really powerful stuff, thank you. In the video you used some reglar expressions that you use, where can I find the complete list?
@bi-ome
@bi-ome 9 ай бұрын
Thanks! There's not a list of regular expressions per se - here's more info on them: coderpad.io/blog/development/the-complete-guide-to-regular-expressions-regex/ ChatGPT/copilot are very good at creating regular expressions, so I usually just start by giving them what I'm looking for and an example and it will usually work. Do make sure to tell it something along the lines of "just give me the regular expression, not Python, and do not use look-aheads or look-behinds" to get the "right" thing to plug in. :)
@kevinmctarsney36
@kevinmctarsney36 Ай бұрын
@@bi-ome But you are right; as you stated in your video ChatGPT does suggest lookforward and lookback. I added instructions in my prompt NOT to do it and that worked until I got to my 4th edit and it forgot. Easy fix though. Appreciate the heads up.
@bi-ome
@bi-ome Ай бұрын
@@kevinmctarsney36 Yeah, it realllly wanted to use them for me too! 😂
@dvwatts
@dvwatts 8 ай бұрын
Thank you for the video! Maybe I missed it but was there an option to deal with multiple choice or check box data types?
@bi-ome
@bi-ome 8 ай бұрын
That's a really good question. I didn't need to get checkboxes so I didn't think to try - I found a PDF with checkboxes to prod at and when I copy and paste a checked box out of the PDF there is no indication that the box exists from that or that it's checked, which makes me think the extract text isn't the way to go here. I tried doing an internet search, too, and didn't find anything that looked like a functioning example of someone doing this, so you've stumped me lol. It's not supported in AI Builder either, apparently, which is usually a good fallback. Though maybe with OCR there it'd be able to "see" the checkmark?
@jomarismeneses3765
@jomarismeneses3765 Жыл бұрын
Good video! How can I do if I have documents with different structures and information and I want to put them in the same Excel document but in different sheets of it? Thank you!
@bi-ome
@bi-ome Жыл бұрын
You can add either an "if" branch or "cases" that checks the PDF text to see if it contains some particular unique string for each circumstance before extracting the text (you'd use a different extraction expression for each branch). There's actions for "get all Excel worksheets" that you could probably combine with "set active Excel worksheet" to change the worksheet before it gets the first free row to insert the data. You'd get-all outside the loop, then inside each conditional branch have it set the active sheet before it inserts. I'm not sure if the action that swaps sheets gives itself sufficient time to complete or not before continuing, you might need to add a short pause between that and inserting (just something to watch for, I've not tried it yet - might be fine). :)
@francosepulcri4283
@francosepulcri4283 3 ай бұрын
hi, how can I get files in a folder that is synced to my pc? thanks
@bi-ome
@bi-ome 3 ай бұрын
The folder location it wants is just the local path, and you can copy it out of the file explorer address bar when you visit it on your PC. If it's synced to your PC, there is a local path by definition. If for whatever reason it obscures the path (happens with a few Windows folders), you can use the up arrow in the file explorer toolbar to go "up a level", then right click the folder -> properties -> location tab and copy from there.
@francosepulcri4283
@francosepulcri4283 3 ай бұрын
@@bi-ome Thank you so much! It worked, I thought you will never answer, so you win a new subscriber.
@bi-ome
@bi-ome 3 ай бұрын
@@francosepulcri4283 I try to respond to all the questions! Whyever would I not?! I don't get so many that I can't keep up, perhaps that will happen someday but today is not that day. :)
@francosepulcri4283
@francosepulcri4283 3 ай бұрын
@@bi-ome continue doing that and I am sure that you will get it! thanks again!
@lucasmenezes6739
@lucasmenezes6739 14 күн бұрын
Is it possible to create a form where the user attaches the PDF and after attaching it is possible to have Excel return the PDF data via email?
@bi-ome
@bi-ome 14 күн бұрын
Yes, but I would go with Power Automate Cloud and one of the AI Builder actions for this instead of what we do here. The desktop app is more for "I have a pile of 5000 PDFs I need to churn through" sorts of applications vs "when an action happens, do a thing".
@JosephHadaCleveland
@JosephHadaCleveland 7 ай бұрын
Is there any chance you could do a video, if it is possible, on how to feed in a list of URLs from excel and save each page as a pdf to a folder?
@bi-ome
@bi-ome 7 ай бұрын
The only way I've gotten the web pages to PDF to work is to "print to PDF" via using "send keys" with hotkeys, with tab to tab through menus and enter to "click" print. Lol. It works okay that way but you need to have pauses in there to space out the send keys actions. You'd just loop over the Excel rows and launch a browser with the link as the address and close the browser at the end of the loop.
@cheerbear9059
@cheerbear9059 5 ай бұрын
Is there a way to do this with google sheets
@bi-ome
@bi-ome 5 ай бұрын
If you sync it with Google drive to your local computer, and then open it with Excel, maybe? The thing is most of the actions for spreadsheets are Excel-specific, so it really needs to modify the file in Excel unless you create a CSV or something instead. A CSV would probably work, but you'd be going about it very differently.
@stanTrX
@stanTrX 5 ай бұрын
Thanks. But it looks like too much time to set this up, so it should be something worth doing for repeated tasks maybe. Another thing is pdf file shall be in the same format. Can you also extract unstructured pdf s ?
@bi-ome
@bi-ome 5 ай бұрын
Yes? I would not generally do this for one handful of PDFs, it's just an example. 🙂 This will work on any PDFs that have selectable text in them, because the expression is just getting text next to or above or below other text, but not scanned PDFs.
@andrada25m46
@andrada25m46 3 ай бұрын
I got here looking for a way to implement my use case. You can extract text from unstructured pdfs but you need to know the data format to implement regex based extraction. Regarding scanned pdfs there is an OCR action. It also doesn’t take all that long to implement this, I have a process that takes hours each time, part of it is just searching which documents I need. I made a flow which in 20 minutes separates hundreads of docs into what needs processing and what doesn’t, you can do sth else in the 20min it runs
@bi-ome
@bi-ome 3 ай бұрын
@@andrada25m46Yes, the AI tools and OCR are great. Regex I would only recommend if you are looking for a solution that doesn’t require a subscription or other additional fees, that was the premise here.
@killiancolombo331
@killiancolombo331 Жыл бұрын
That is pretty cool!
@tateeslick4695
@tateeslick4695 7 ай бұрын
I have my table in the same format but when I view the detracted pdf text, it’s on a new line. Please show or explain what would be different when the heading I want is on a certain line and the text I want is on another line. In the pdf, they are next to each other.
@bi-ome
@bi-ome 7 ай бұрын
It might be "getting" a line break character that you can't see until it's in Excel. You can try one of the methods here and see if one of these works: www.reddit.com/r/excel/comments/zolxmo/how_do_i_clear_this_empty_line_breaks_for_each/
@coolandtrending4497
@coolandtrending4497 5 ай бұрын
Does this not work in power automate cloud version with files in SharePoint folder. This was awesome video but I don't use PA desktop version so asking 😅
@bi-ome
@bi-ome 5 ай бұрын
Good question! Power Automate Cloud does not have regular expressions and pulls the file content via API, meaning it comes in as base64 (garbled random letters). In order to do anything with it, you have to either translate it out of base64 (which requires a third-party connector that is not free) or use OCR on it as if it were an image to "recognize" the text and concatenate it all together. Then you'd use another AI connector on it, typically AI Builder or Azure Open AI, to parse the text and pull out the parts you want to put in Excel. Or use SharePoint Premium, which basically packages the above into a solution for you but is more expensive. So, long story short, since Power Automate Desktop is installed on most Windows machines and is free to use locally, it's often much less of a pita and less expensive to just use that vs cloud for this particular job. Power Automate Desktop essentially is just a robot that controls your computer where you can tell it what actions to take, so it bypasses the base 64 issue and allows regular expressions so you can extract without AI. But it's technically possible to get the same output with either.
@coolandtrending4497
@coolandtrending4497 5 ай бұрын
@@bi-ome you are right. Currently trying AI builder method. Again, thanks for the informative content.
@sebastiansolis5279
@sebastiansolis5279 9 ай бұрын
This is great. Is there any Way to send info to a share list?
@bi-ome
@bi-ome 9 ай бұрын
Yes, there are SharePoint actions in the desktop app, but they will trigger premium licensing. You might consider using the Excel file to feed a PowerShell script, too, that would also work!
@duchoanChu
@duchoanChu 5 ай бұрын
Thanks so much! It was very useful to me
@atrevisi
@atrevisi 3 ай бұрын
amazing !! so helpful thanks
@taesunyoo4637
@taesunyoo4637 8 ай бұрын
Hi Christine thank you for sharing wonderful video!. Quick question: does this work on Power Automate Cloud as well? sounds kinda silly question.
@bi-ome
@bi-ome 8 ай бұрын
Cloud doesn’t have regex per se, but you can extract from PDF with things like AI Builder to get what you’re after via prompt. The issue I have with it is that when it gets the file content from a PDF, it’s in base64 and has to be converted to be usable by AI… There’s various methods and workarounds for that, and alternatives you can use for more $$ like Syntex. The technique in this video is basically the budget DIY version of all that! 😅
@taesunyoo4637
@taesunyoo4637 8 ай бұрын
@@bi-ome thank you for your prompt reply. ah I was searching around KZbin as well and like you mentioned I did see solution relying on AI builder to mark/highlight each data field from PDF file. Too bad that regular expression is not supported by cloud which is a bit shocking for me. thank you very much!!
@bi-ome
@bi-ome 8 ай бұрын
@@taesunyoo4637 You can use regex with Syntex/SP Premium! Just not Power Automate. Syntex is a more holistic version of all of this, it's premium priced but they put a UI around it and let you auto-extract the data and apply it to the item in the SP library.
@kewtheii6764
@kewtheii6764 4 ай бұрын
Hi Christine, quick question, say if I have a PDF with 500 rows of data, when arranged, is spread to about 13 columns of categories in Excel, can power automate extract these information to Excel and sort to assigned columns and most importantly, identify key words i.e. Item codes, and dimensions within lengthy descriptions to input within Excel? Thanks!
@bi-ome
@bi-ome 4 ай бұрын
I'm not sure if you're trying to extract a structured table that's already in the format you want, or if it's less structured. If you're simply extracting table columns as-is, Power Query can do that on files or folders and just yank in whatever you want to Excel (it has a PDF or folder source option). The technique in this video is meant to be used where the structure is irregular or where the content is mixed with other text. Regular expressions can pull out item codes as long as they follow a particular format (e.g. 3 letters, hyphen, 3 numbers, or any kind of very regular pattern). If your item codes etc are not super formulaic, you might have better success with an AI tool. There's a bunch of different ways to do that; I'm partial to just sending the text to Azure OpenAI with a prompt and asking it to extract whatever you want and send it back as JSON with a particular format. If you give it an example of the desired structure, it'll be able to send it back to you in the same format, and then you can expand it as a table with Power Query. That will let you extract a handful of things at a time with good structure to it.
@Parth-e9r
@Parth-e9r 9 ай бұрын
You just extracted the values once what should I do to run a loop and extract all the values from my data? Is there a way to do so?
@bi-ome
@bi-ome 9 ай бұрын
It's running in a loop on the files, so it'll extract each thing you configure it to once per document. and create a row in Excel for each. If your files are structured differently, you can provide more info about what you're trying to do and I can try to offer a suggestion. :)
@Parth-e9r
@Parth-e9r 9 ай бұрын
@@bi-ome See in my pdf I just dont have one id or one name I have multiple and I want to extract all of them one by one in excel so is there a way to do it? Like run a loop until all the values are extracted and then stop?
@bi-ome
@bi-ome 9 ай бұрын
@@Parth-e9r Yeah, there is a toggle in the "parse text" action where you can toggle whether it gets the first occurrence only or all. So if the names always occur after the text "Name: " you'd use Name: (.+) for your regex, toggle off "first occurrence only" and it should just put them all in an array. You can reference those based on array position, or for-each loop over the items in the array to do things with them. That said, if your PDFs are complex, it might be a lot easier to use AI tools like Syntex (but more expensive).
@Parth-e9r
@Parth-e9r 9 ай бұрын
@@bi-ome Actually I used a python script to extract the data, the normal Parse Text was not working, so I dont know how to reach to the solution of my problem.
@starkybaxter4867
@starkybaxter4867 8 ай бұрын
Hi, Is this also works on Leads data pdf sheet? I want them to extract to csv file
@bi-ome
@bi-ome 8 ай бұрын
I'm not sure what a Leads data pdf sheet is, but it will work on any selectable text data in PDFs. If you want it to go to CSV, you can save the Excel file as CSV type at the end of the flow. There's also an action to go directly to CSV from a data table variable, skipping Excel, but we're not using data table variables in this particular tutorial - either will work
@theretheheheheee
@theretheheheheee 9 ай бұрын
I have like a hundred pdf personal data, like ID card, which state name, dob, etc. Can I use this to transform it to excel rather than to typing it one by one ?
@bi-ome
@bi-ome 9 ай бұрын
Yes, as long as they follow a similar structure (e.g. state name is always after "State:") and the text is selectable in the PDF (not scanned). If you're working with scanned ID cards, you need to use OCR on the text first, which you could do with a call to an API service.
@yandisafanqa6050
@yandisafanqa6050 5 ай бұрын
Thank you for this…exactly what I need
@user-di3rl6ww2d
@user-di3rl6ww2d 9 ай бұрын
Hi what if the pdf info is all image and unable extract the data
@bi-ome
@bi-ome 9 ай бұрын
You would need to use OCR on it in the flow to get the text. There’s various tools to do this, easiest to use is probably AI Builder in Power Automate Cloud-
@narmididi6336
@narmididi6336 11 ай бұрын
Hello. How do i extract from PDF address info which is not on the same line? Thanks
@bi-ome
@bi-ome 11 ай бұрын
Regular expressions are really commonly used to parse addresses, my guess is Chat GPT or the internet has them readily available. I would try to get the address, then split on the line break and comma to separate out the street, city, and zip-
@davidsides5398
@davidsides5398 Жыл бұрын
I get an error that says Index '1' is out of range, when it tries to write the second line item. Any idea?
@bi-ome
@bi-ome Жыл бұрын
That usually means it’s blank, that there’s nothing in it. I would check to see what ends up in your variable when it runs, your expression may need adjustment-
@partththombre7446
@partththombre7446 6 ай бұрын
​@@bi-ome I have got the same error , there is a [] sign besides my variable name please can you help me with the same??
@Gaurav19882001
@Gaurav19882001 9 ай бұрын
Thank You So Much for your lovely Support please make some more other tutorials on PDF to Exel mam Please it will be great help for All
@bi-ome
@bi-ome 9 ай бұрын
Is there something in particular you’re trying to do? The thing with Power Automate is it can do SO many things, it helps to know what people are looking for
@jomelreyes9133
@jomelreyes9133 9 ай бұрын
Hi can you help, I cant proceed to Value to write field on Write to Excel worksheet action my flow variable is Invoice. In the latest power automate the default is =Invoice not %Invoice% and when I attemping to put [1] or =Invoice[1] it gives me an error "unexpected Chatacter"
@bi-ome
@bi-ome 9 ай бұрын
Do you have this feature turned on? "Power Fx enabled (Preview)." I heard from another commenter that this setting was on and it was changing the structure of all of the variable references. You might toggle it off and see if that does it. This video was created before this feature was added.
@jomelreyes9133
@jomelreyes9133 9 ай бұрын
​@@bi-omeIt worked, because I couldn't figure out how to disable Power FX, what I did was I just created a new flow, and yes, it worked. Thank you for your help.
@KutzKraftsNKreationz
@KutzKraftsNKreationz 7 ай бұрын
I need it for my business but i zoned out after 3 mins. has nothing to do with the video. I am just not tech savvy. how can I get a one on one to see if i can use it for my business?
@bi-ome
@bi-ome 7 ай бұрын
If you have any particular questions you can ask here and I'll try to answer. Any automation tool will require a technical person, or at least someone who can make it through a video or two lol. I would recommend this tool for people who use Microsoft 365, thus are already invested in Microsoft stack products. I used it here because it's what was available to me at the time.
@jerrysaen
@jerrysaen 8 ай бұрын
Thank you very much! This helps me with my business
@lucasyoung9594
@lucasyoung9594 4 ай бұрын
I went through these steps, but it doesn't work with PDF forms that have filled in information (e.g., the PDF has a field that someone has filled in with info like their name, etc. On the pdf, you'll see those texts in the blue areas which can be edited). Basically the text extracted is everything except the information input by the user. Is there a way to get to that text and extract it?
@bi-ome
@bi-ome 4 ай бұрын
That's a good question... I found this thread where someone has a similar issue - they're using another tool but I think the same logic applies, where you can flatten or print to pdf to make the text readable. There are third-party tools that can extract field data too, but ofc they will have some associated cost. It's not super straightforward. www.reddit.com/r/excel/comments/16u54ah/how_do_i_extract_data_entered_by_a_user_into_a/
@akilalaksiri9172
@akilalaksiri9172 7 ай бұрын
Thank you very much! You saved me :)
@AsaelCosentino
@AsaelCosentino 7 ай бұрын
Getting the problem all the time: "file is open in another application". Someone knows how to fix?
@bi-ome
@bi-ome 7 ай бұрын
You can only have the Excel file open in one instance. So if you are testing your flow, you need to remember to close the file between runs so that it’s not already open when the flow tries to open it :)
@Payton-Prescott
@Payton-Prescott 3 ай бұрын
This is great, but extraktAI works a lot better for me. Great video either way!
@bi-ome
@bi-ome 3 ай бұрын
Thanks - this technique is the low-tech free method. All the AI tools for this that I'm aware of have a cost associated to use at scale.
@Payton-Prescott
@Payton-Prescott 3 ай бұрын
@@bi-ome Yes, you are absolutely correct. This is an amazing free solution, thank you!
@Eli22clem
@Eli22clem 9 ай бұрын
Didn't work (edit) After mapping the shared folder my pdfs were on to a drive it did work! Now just have to get it to pull all instances in each pdf and not just the first.
@bi-ome
@bi-ome 9 ай бұрын
Yes, quite literally nothing you will ever make in Power Automate desktop will work the first time you run it - it requires troubleshooting to get things working. That's part of the process. :)
@tsoares1982
@tsoares1982 5 ай бұрын
EXCELENT!!!
@shivarajaprasadambati9632
@shivarajaprasadambati9632 Жыл бұрын
I have pdf where data having in other languages
@bi-ome
@bi-ome Жыл бұрын
You can look for other language words with regular expressions too!
@jl6523
@jl6523 9 ай бұрын
She's good!
@MCAES01
@MCAES01 9 ай бұрын
you record your extra large screen and when we watch it on small screens or smartphones the words become microscopic get a dislike
@123456789santia
@123456789santia Жыл бұрын
Do you work for google? cause your channel is full of knowledge that takes too much time to learn.
@bi-ome
@bi-ome Жыл бұрын
Lol, no, thanks though - I have been using most of these tools a really long time. PA Desktop is pretty new for me, but I picked up this particular trick out of desperation trying to complete a particular project while doing a lot of mental flailing around trying to learn the tool. Thought it might save someone else some stress to share. 🤣
@lizzyvallejo8298
@lizzyvallejo8298 10 ай бұрын
Necesito extraer el Saldo Actual de un extracto bancario el dato se muestra así: Saldo Actual $ 1.234.567.890.23 y cuando hago el paso de analizar texto no me trae información, sabes que debo cambiar?
@bi-ome
@bi-ome 10 ай бұрын
Puede ser complicado cuando tienes que lidiar con comas y signos de dólar, porque el PDF es texto y no necesariamente reconoce que esos son parte del número. Yo intentaría algo así (usé GPT para obtener esta fórmula, es muy bueno para hacer expresiones regulares). Current Balance \$\s*(\d{1,3}(,\d{3})*) Explicación: Current Balance \$: Coincide con el texto literal "Current Balance $". El signo de dólar se escapa porque $ es un carácter especial en regex. \s*: Coincide con cero o más caracteres de espacio en blanco. (\d{1,3}(,\d{3})*): Captura el número. \d{1,3}: Coincide con 1 a 3 dígitos (para el primer grupo del número). (,\d{3})*: Coincide con cero o más grupos de una coma seguida exactamente por tres dígitos.
@aycampos
@aycampos 9 ай бұрын
@@bi-ome thank you so much for putting this video together, I have watched 3-4 times already and I almost finish my flow, the only thing is the same issue or similar as above, but my amounts do not have a dollar sign and are not next to the word but down below in a different row or underneath, and I have up to millions for the amount, could you please help me with the expression? Thank you
@bi-ome
@bi-ome 9 ай бұрын
​@@aycampos Yeah, the expression TargetText: (.+) should get whatever is on the line below the target text (replace target text with the key word you're looking for). I'd run it with that and see if you get something you can split parts off of. It depends on if you have other text in that second line or not. For all of these I am just using ChatGPT to help me write regular expressions, just give it an example of what your content looks like and ask it to write "regex without look-aheads or look-behinds". It's very good at regular expressions. 🙂
@aycampos
@aycampos 9 ай бұрын
@@bi-ome oh wow, thank you for getting back to me so quick, I'll try both the expression and ChatGPT.
Properly Convert PDF to Excel
11:28
Leila Gharani
Рет қаралды 1 МЛН
Power Automate Desktop - 5 real life use cases
8:40
Mathias Kjeldsen
Рет қаралды 905
Леон киллер и Оля Полякова 😹
00:42
Канал Смеха
Рет қаралды 4,7 МЛН
Cheerleader Transformation That Left Everyone Speechless! #shorts
00:27
Fabiosa Best Lifehacks
Рет қаралды 16 МЛН
She made herself an ear of corn from his marmalade candies🌽🌽🌽
00:38
Valja & Maxim Family
Рет қаралды 18 МЛН
Web Scraping Made EASY With Power Automate Desktop - For FREE & ZERO Coding
13:11
Use This Trick to Automate Any Excel Task (Better Than Macros)
10:53
Kenji Explains
Рет қаралды 111 М.
PDF to Excel Converter
22:34
Kevin Stratvert
Рет қаралды 238 М.
How to Extract Data from PDF with Power Automate
29:30
Anders Jensen
Рет қаралды 240 М.
Import Outlook to Excel with Power Automate Tutorial
13:33
Kevin Stratvert
Рет қаралды 370 М.
EASILY Make an Automated Data Entry Form in Excel
14:52
Kenji Explains
Рет қаралды 948 М.
📊 How to use Power BI DAX - Tutorial
37:28
Kevin Stratvert
Рет қаралды 1,1 МЛН
My JBL GO Collection 🔊
0:11
Bass Super Test
Рет қаралды 1,4 МЛН
Samsung S4 is the Goat 💀 #trollface #trending #trending #viral #shorts
0:19
Cómo meter un TELÉFONO en un aula📲 #cortos
0:28
Jelly Jail Spanish
Рет қаралды 1,1 МЛН
Máy Đếm Tiền Tính Tổng Chính Xác #shorts
0:26
Review Máy Đếm Tiền
Рет қаралды 904 М.