Damn, that was awesome! I can only imagine what goes through your head when you write lines of code as if they were normal sentences and they actually work! Well done, you've earned your like and subscribe.
@PythonicAccountant 3 years ago
Thank you!!
@chriskeo392 4 years ago
Incredibly talented. The way you graciously remedied, remolded, and reimagined the landscape of that PDF report. Inspired.
@surafeltilahun7404 2 years ago
I am just mesmerised by the way you play around with Python scripts and regex. Mind-blowing!
@pedrocolangelo2458 3 years ago
Sir, I have no words to express my gratitude for this video. That was exactly what I needed. Thank you!
@justyclacky 4 years ago
YES! This is excellent! I am eight hours into reports that I was loading into pdf2xl and finishing in Power Query. LOVE THIS!
@PythonicAccountant 4 years ago
Great!
@redfriedrice356 7 months ago
Superb detail and very straightforward. Thank you for showing me your workflow and for an insightful guide to parsing semi-structured data.
@PythonicAccountant 7 months ago
Thanks!
@abhishekbharti8353 2 years ago
Thank you so much, man! Earlier I was working with PyPDF2, but it was not preserving proper spacing when reading the file; with pdfplumber I fixed the issue, and now the code runs completely fine. Thanks for pointing me to the library.
@JonathanGarza-cn8ym 6 months ago
pdfplumber has worked well when PyPDF2 doesn't.
@tobias6361 3 years ago
Thank you, dude! I finally found a real tutorial that helps me get data out of a PDF that isn't easy and well structured. I have another problem that you aren't facing here: my invoice has items that are split across multiple pages, and I need to figure out how to assign the data from the second page to the invoice item on the page before!
@PythonicAccountant 3 years ago
That's a fun challenge, similar to when you have multiple lines you want to assign to one item. You keep a variable that tracks where you are in the page/document and use it to determine when to combine or assign info as needed.
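A minimal sketch of that tracking-variable idea, assuming each page's text has already been extracted (for example with pdfplumber's extract_text()), that repeating page headers/footers have been removed, and that every new item starts with a code like "A-1001" (that pattern is hypothetical). Any line that doesn't start a new item is attached to whichever item is still open, even if it began on the previous page.

```python
import re

# Hypothetical item pattern: each new invoice item starts with a code like "A-1001".
ITEM_START = re.compile(r"^[A-Z]-\d{4}\b")


def collect_items(pages):
    """Group lines into items, carrying the open item across page breaks.

    pages: a list of page texts, one string per page, already stripped of
    repeating page headers/footers (otherwise they would be attached too).
    """
    items = []
    current = None  # the item we are currently "in", kept between pages
    for page_text in pages:
        for line in page_text.splitlines():
            if ITEM_START.match(line):
                current = [line]  # a new item begins
                items.append(current)
            elif current is not None and line.strip():
                # Continuation text: belongs to the open item, even if that
                # item started on an earlier page.
                current.append(line.strip())
    return [" ".join(parts) for parts in items]


pages = ["A-1001 Widget 10.00\nA-1002 Gadget", "continued description 5.00\nA-1003 Bolt 1.00"]
for item in collect_items(pages):
    print(item)
```

The same loop also covers the single-page case where one item's text wraps onto several lines.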
@tobias6361 3 years ago
@@PythonicAccountant yes, I'm hyped to test it tomorrow. I tried some other libraries too, but none of them worked for me. Originally I wanted to do it with OCR but quickly found out there are easier options when the PDF is searchable :D But I also face another problem where the PDF isn't searchable, and these PDFs contain handwritten text plus copied invoices. I worked with EasyOCR at some point in a different project, but it's not working for me right now.
@adnanpramudio6109 3 years ago
Now I'm convinced to learn regex and lambda to extract text from PDFs, thank you!
@darioserrano3967 3 years ago
Bro... you saved me AF... I was trying to get data from a PDF and export it to .csv or .xlsx, and I was about to die trying to solve it :p. BTW I'm very new to scripting. Cheers from Argentina. You deserve millions of likes and subscribers! (Now I'm one of them.)
@PythonicAccountant 3 years ago
Glad to help!!
@IgorRegly 2 years ago
WOW!! You are on fire with this code, man! That was awesome!
@raginisharma9302 4 years ago
Excellent tutorial, thank you for sharing 👏
@nmogral 2 years ago
Excellent! Thanks a lot for taking the effort to solve a realistic problem!
@rogerredhat1431 4 years ago
Thank you so much, Pythonic Accountant, for sharing your awesome skills in this video, it helped me a lot!
@PythonicAccountant 4 years ago
My pleasure, thanks for watching!
@gregocanepa 3 years ago
Dude, I was just looking for something like this, and this video saved me hours of work. Awesome content, thanks for sharing!
@PythonicAccountant 3 years ago
Glad I could help!
@moek7644 1 year ago
Absolutely fantastic video. Quick question: what changes would you make to the code if the PDF has, say, 350 pages or more with the same form of data throughout? That would be the biggest help. Thank you
@PythonicAccountant 1 year ago
Thanks for the question! The only change I'd make is to start a list, iterate through each page in the PDF, and append each page's extracted results to the list. That would only hold one PDF page in memory at a time, and the list wouldn't get that big.
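Sketched in code, assuming pdfplumber-style page objects that expose extract_text(), plus a parse_page function of your own (hypothetical here) that turns one page's text into a list of rows. FakePage only stands in for a real page so the sketch runs without a PDF.

```python
def extract_all(pages, parse_page):
    """Parse every page and collect all rows in a single list.

    pages: any iterable of objects with an extract_text() method, such as
    pdfplumber's pdf.pages. parse_page: turns one page's text into rows.
    """
    rows = []  # created once, before the loop, so no page's rows are lost
    for page in pages:
        text = page.extract_text() or ""  # guard against pages with no text
        rows.extend(parse_page(text))
    return rows


class FakePage:
    """Stand-in for a pdfplumber Page, just for this demo."""

    def __init__(self, text):
        self.text = text

    def extract_text(self):
        return self.text


def parse_page(text):
    # Hypothetical parser: one row per non-empty line, split on whitespace.
    return [line.split() for line in text.splitlines() if line.strip()]


rows = extract_all([FakePage("700 Smith 10.00"), FakePage(None), FakePage("815 Acme 5.00")], parse_page)
print(rows)  # [['700', 'Smith', '10.00'], ['815', 'Acme', '5.00']]
```

With a real file the call would sit inside a "with pdfplumber.open(path) as pdf:" block as extract_all(pdf.pages, parse_page), and the list can then go into a DataFrame.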
@moek7644 1 year ago
@@PythonicAccountant Cool Cool 👊
@dmax9281 3 years ago
Very nice approach to solving this kind of problem. Thank you.
@Tshepom24 3 years ago
How did you fix that error at 8:07? I get the same error.
@mowburnt 3 years ago
Love it... I'm new to Python and learning because I want to do some data analysis on a dataset that can only be generated as a poorly laid-out PDF (don't ask why!). So this is a great intro/guide to some fantastic functions, which gives me hope :-)
@PythonicAccountant 3 years ago
Awesome! Enjoy the journey!
@SharonLurye-p8n 1 year ago
This is very helpful, thank you. I was wondering: is there a way to search for properties besides the text itself? For example, say every vendor number is always highlighted in blue, or always bold and in a larger font size, or always directly above a thick horizontal line. Is there a way for pdfplumber to search for text that matches those particular properties?
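pdfplumber exposes each character's properties in page.chars, including "fontname" and "size", so one possible approach (untested against any real report) is to filter characters before joining them. The bold check is a heuristic that assumes the font name contains "Bold" (e.g. "Helvetica-Bold"), which is common but not guaranteed; the size threshold is also an assumption.

```python
def text_with_style(chars, min_size=None, bold=False):
    """Join the text of characters that match simple style rules.

    chars: dicts shaped like pdfplumber's page.chars entries, which carry
    "text", "fontname" and "size" keys.
    """
    kept = []
    for c in chars:
        if min_size is not None and c["size"] < min_size:
            continue  # too small
        if bold and "bold" not in c["fontname"].lower():
            continue  # heuristic: bold fonts usually have "Bold" in the name
        kept.append(c["text"])
    return "".join(kept)


sample = [
    {"text": "V", "fontname": "ABCDEF+Helvetica-Bold", "size": 12},
    {"text": "1", "fontname": "Helvetica-Bold", "size": 12},
    {"text": "x", "fontname": "Helvetica", "size": 9},
]
print(text_with_style(sample, min_size=11, bold=True))  # V1
```

Characters are joined without spaces or line breaks, so on a real page you would probably group them by their "top" coordinate first.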
@badalrathod2435 2 years ago
Awesome! Thanks for such an informative session. Can you please continue the NLP lecture on data extraction from different files like JSON, web scraping, and PDFs?
@PythonicAccountant 2 years ago
I do have several on web scraping and more on PDFs, so check out my other videos! I don't have much on JSON, but I do have a lot of experience working with that format, so I can certainly look at adding some videos on it!
@jenkinssthomson8879 2 years ago
Very nice! Thank you for sharing!!
@user-zh8ni2hb7z 3 years ago
I'm so glad I just found this channel. Haven't watched yet, but thank you.
@TheSuperUser 2 years ago
Wow, fantastic video! Very informative, and it shows how to parse PDFs in the real world! Well done. This will help me learn about Pythonic parsing from various data sources. Please make an equivalent for MS Word .docx files. Thank you! 👍
@auradkhan5065 3 months ago
You are a literal Grandmaster.
@cvadodaria07 3 years ago
Awesome, a difficult example done with ease. Thanks for sharing your knowledge.
@tomasjsierra 2 years ago
This video was very helpful. Thank you very much.
@blinky1892 1 year ago
Thank you so much! Would you say data scraping is easier from Word than from PDF?
@PythonicAccountant 1 year ago
It depends on the structure and what you're trying to get out of it.
@saurabhyadgire7282 3 years ago
I have a pre-downloaded PDF to work with, but I get this error when I try to read it: PDFSyntaxError: No /Root object! - Is this really a PDF?
@raginisharma9302 4 years ago
This is super helpful. Thank you so much for sharing. One question I have is about missing values. I see that your invoice happens to have no missing values in the columns. What if I have missing values in a few columns? Is it possible to read those missing/blank values of the table while reading the text, to keep the columns consistent?
4 years ago
Good question!
@tridunghuynh5573 2 years ago
Thank you so much for your explanation
@JyotiJoshi-mm5oj 11 months ago
It's good, but I have a doubt: how would it work dynamically? I think re is checking for a particular number range.
@waylonbailey349 2 years ago
Thanks for the videos, this has been extremely helpful!
@omrangroup8712 2 years ago
Thank you so much. I am working with a 100-page PDF file. I ran your code, but only 5 rows of data from 1 page were exported to the df, even though all of the pages are extracted. Is there any solution?
@PythonicAccountant 2 years ago
It really depends on a lot of factors; you'd need to do some debugging and look at the format to see why it's missing the rest.
@mannycalavera121 1 year ago
Very cool, thanks for making this video
@PythonicAccountant 1 year ago
You bet!
@SonuSingh-sn8qg 2 years ago
This is so damn awesome. Great work
@PythonicAccountant 1 year ago
Thanks!!
@SonuSingh-sn8qg 1 year ago
@@PythonicAccountant I'm trying to extract data from a nested table in a PDF. Is it possible for you to make a video on how to do that?
@spaghetticarbonara8066 2 years ago
This video is amazing! Thank you!
@SolidBuildersInc 2 years ago
This regular expressions stuff is a Swiss Army knife on steroids. In your experience, is it used for web scraping? Is there another library you know of that's a jewel like this in the toolbox? I feel like I'm gulping from a river flowing with pure water.
@vivianno363 3 years ago
Not sure it's necessary to do this: every invoice is actually generated by the data system, so it should be in Excel format from the very beginning. Why don't we use SQL to generate all the info?
@horzaclosp3058 3 years ago
Very informative, thanks. I have a problem, though, constructing a regular expression where the invoice may or may not have a "Disc Amount". At the moment the expression assumes it is always blank.
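One way to handle a column that is sometimes missing is to make that part of the pattern optional with a non-capturing group followed by "?". This is a sketch with a hypothetical line layout (invoice no., date, gross, optional discount, net); when no discount is printed, the named group comes back as None, so every row still has the same keys.

```python
import re

# Hypothetical layout: invoice no., date, gross amount, optional discount, net amount.
LINE_RE = re.compile(
    r"^(?P<inv>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+"
    r"(?P<gross>[\d,]+\.\d{2})\s+"
    r"(?:(?P<disc>[\d,]+\.\d{2})\s+)?"  # the whole discount column may be absent
    r"(?P<net>[\d,]+\.\d{2})$"
)


def parse_line(line):
    """Return a dict of fields, with disc=None when no discount is printed."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


print(parse_line("INV100 01/05/20 100.00 2.00 98.00"))
print(parse_line("INV101 01/06/20 50.00 50.00"))
```

This only works while the count of amounts tells you which one is missing; if two optional columns can each be blank, the words' horizontal positions are a more reliable signal.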
@talaleveati3793 3 years ago
Thanks, that was really helpful. Just a short side note: you should have zoomed the screen to make it more readable.
@asheeshmathur 11 months ago
Good tutorial. Does it work for Bulgarian PDFs as well? Does it support other Cyrillic languages too?
@ajinkyadhoke4713 3 years ago
How can we print a line based on a word we are searching for in the PDF?
@PythonicAccountant 3 years ago
if word in line: print(line)
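The same check wrapped in a small loop, as a sketch: it assumes the page text has already been extracted (e.g. with extract_text()), and the case-insensitive option goes beyond the one-line reply.

```python
def lines_containing(text, word, case_sensitive=False):
    """Return every line of text that contains word."""
    needle = word if case_sensitive else word.lower()
    found = []
    for line in text.splitlines():
        haystack = line if case_sensitive else line.lower()
        if needle in haystack:
            found.append(line)
    return found


page_text = "Task A  2h\nTask B  3h\nTotal  5h"
for line in lines_containing(page_text, "task"):
    print(line)
```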
@ajinkyadhoke4713 3 years ago
See, I have a PDF, and from it I want to extract the task names and only the time taken by them.
@michaelkonstantinovsky2184 3 years ago
Thank you! Perfectly explained, and your code is easy to follow. I have a question related to Hebrew/Arabic: when extracting the text, it comes out reversed. How can I fix it? Thank you!
@PythonicAccountant 3 years ago
line = line[::-1]  # reverses the order of the text
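Applied to a whole page of extracted text, that one-liner can be wrapped like this. It is a sketch that assumes every line came out fully reversed; because plain slicing reverses everything, digits and any left-to-right words inside a line get flipped too, so mixed lines may need extra handling.

```python
def reverse_rtl_lines(text):
    """Reverse the characters of each line while keeping line order.

    Caveat: this also reverses digits and any left-to-right words inside a
    line (e.g. "2023" becomes "3202"), so mixed lines may need extra care.
    """
    return "\n".join(line[::-1] for line in text.splitlines())


# ASCII stand-in so the effect is easy to see in any terminal.
print(reverse_rtl_lines("olleh\ndlrow"))
```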
@michaelkonstantinovsky2184 3 years ago
@@PythonicAccountant Works!!!!! Thank you very much!
@amerinosf 3 years ago
When I enter the code written at the 3:06 mark, it only returns the last line of text from my PDF. Anyone else have this problem?
@crisansou 3 years ago
Amazing video, thank you so much!
@mowburnt 3 years ago
One question... can you add the invoice number to each line item too?
@PythonicAccountant 3 years ago
As long as it's in the data and can be identified/extracted from its pattern, definitely.
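A sketch of one way to do that: remember the most recent invoice number in a variable and stamp it onto every line item that follows. Both regex patterns are hypothetical stand-ins for whatever the real report's lines look like.

```python
import re

# Hypothetical patterns: an "Invoice # 1001" header line and
# "MM/DD/YY description amount" item lines.
INVOICE_RE = re.compile(r"^Invoice\s*#?\s*(\w+)")
ITEM_RE = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(.+?)\s+([\d,]+\.\d{2})$")


def items_with_invoice(text):
    """Return (invoice, date, description, amount) for every line item."""
    rows = []
    invoice = None  # most recent invoice number seen so far
    for line in text.splitlines():
        header = INVOICE_RE.match(line)
        if header:
            invoice = header.group(1)
            continue
        item = ITEM_RE.match(line)
        if item:
            date, desc, amount = item.groups()
            rows.append((invoice, date, desc, float(amount.replace(",", ""))))
    return rows


sample = "Invoice # 1001\n01/05/20 Paper 1,250.00\n01/06/20 Ink 30.50\nInvoice # 1002\n02/01/20 Toner 99.99"
for row in items_with_invoice(sample):
    print(row)
```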
@ivangomez3835 3 years ago
Hi, I'd like to apply this to a table where every cell has many rows of text. I can't use this method because of the multiple lines in each cell of the table.
@ernestocarreras7652 2 years ago
Great tutorial! I agree, it's the best tutorial I've found so far for working with unstructured PDF files. I have a question, although my docs are forms rather than invoices. Some of these PDFs have graphics, i.e., blue circles/ellipses used to "select" the best answer to some questions. Is there a way to identify these graphics and then select the text enclosed in them? If needed, I can share one of the PDFs so you can see what I mean. TIA
@YoussefHIDSI 5 months ago
Please, I have many invoices, each with a different structure of line items; sometimes the description comes first, then the quantity...
@baobaocai1518 4 years ago
Amazing video! Thank you for posting! Just curious, is it possible to extract the vendor # and invoice # as well, even though some are empty in the PDF? I have a similar problem with a Xero journal report. The debit and credit figures are in two separate columns, and I'm not sure how to use re to tell which number is a debit and which is a credit.
@PythonicAccountant 4 years ago
Yep, that's a challenge, but it can usually be done. There are ways to use spacing to determine which column a value comes from, but it gets a little tricky sometimes. If you want to send a sample page with any sensitive data redacted, I'd be happy to give it a shot: pythoniccpa@gmail.com
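A rough sketch of the spacing idea, assuming the page is read with pdfplumber's extract_words() (which reports each word's x0 position) instead of plain text, and that the words have already been grouped into printed rows, e.g. by their "top" value. The column boundary is an assumed number you would read off your own report.

```python
import re

AMOUNT_RE = re.compile(r"^-?[\d,]+\.\d{2}$")


def split_debit_credit(words, credit_x0):
    """Decide whether a row's amount sits in the Debit or the Credit column.

    words: the words of ONE printed row, as dicts with "text" and "x0" keys
    (the shape pdfplumber's page.extract_words() returns).
    credit_x0: hypothetical x-position, in PDF points, where the Credit
    column starts; measure it from your own report.
    """
    debit = credit = None
    for w in words:
        if AMOUNT_RE.match(w["text"]):
            value = float(w["text"].replace(",", ""))
            if w["x0"] >= credit_x0:
                credit = value  # starts right of the boundary
            else:
                debit = value
    return debit, credit


row1 = [{"text": "Rent", "x0": 50.0}, {"text": "1,200.00", "x0": 300.0}]
row2 = [{"text": "Cash", "x0": 50.0}, {"text": "500.00", "x0": 420.0}]
print(split_debit_credit(row1, credit_x0=400))  # (1200.0, None)
print(split_debit_credit(row2, credit_x0=400))  # (None, 500.0)
```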
@baobaocai1518 4 years ago
@@PythonicAccountant Just sent. Thank youuuuu!!!
@PythonicAccountant 4 years ago
Enjoy! kzbin.info/www/bejne/hpCodICNm6Zmg7c
@lenalzheng 2 years ago
I have the same scenario as you with the vendor number and vendor name. How do I get it to print all vendors and not just the one on the last page? And how do I use regex if my vendor number and name are separated by a hyphen? For example, 700 - Smith, Joe. Sorry, I'm a Python newb.
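A sketch covering both parts of that question, assuming vendor lines look exactly like "700 - Smith, Joe" (number, spaces, hyphen, spaces, name). Requiring whitespace around the hyphen keeps dates such as 2020-01-05 from matching. The other key point is creating the list once, before looping over the pages, so vendors from earlier pages aren't overwritten.

```python
import re

# Vendor header like "700 - Smith, Joe": digits, spaced hyphen, then the name.
VENDOR_RE = re.compile(r"^(\d+)\s+-\s+(.+)$")


def find_vendors(page_texts):
    """Collect (number, name) pairs from the text of every page."""
    vendors = []  # created once, outside the loop, so earlier pages are kept
    for text in page_texts:
        for line in text.splitlines():
            m = VENDOR_RE.match(line.strip())
            if m:
                vendors.append((m.group(1), m.group(2).strip()))
    return vendors


pages = ["700 - Smith, Joe\n01/02/20 Paper 5.00", "815 - Acme Supply Co."]
print(find_vendors(pages))  # [('700', 'Smith, Joe'), ('815', 'Acme Supply Co.')]
```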
@tanmayeebhavsar3570 4 years ago
Can we do the same for medical forms? Regex does not seem to work for that.
@PythonicAccountant 4 years ago
Of course, regex doesn't care what profession it's supporting :) The biggest factor is whether you are dealing with a scanned PDF or a computer-generated PDF. If it is computer-generated, then you need to make sure the text you're trying to capture is not in an image. If the PDF is scanned, or you are trying to capture data from an image, then you will need a different approach than just regex, as you'll need to perform OCR. One way to do that is with Tesseract, but there are other options as well.
@DavidChandra 3 years ago
An amazing way to extract data with regex. Btw, what editor are you using?
@callan816 3 years ago
file_loc = 'directory'
with pdfplumber.open(file_loc) as pdf:
    page = pdf.pages[0:148:]
    text = page.extract_text()
    print(text)
It just returns: AttributeError: 'list' object has no attribute 'extract_text'. Please help.
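The error comes from the slice: pdf.pages[0:148] returns a list of pages, not a single page, and a list has no extract_text(). A minimal sketch of a fix, assuming the same pdfplumber setup, is to call extract_text() on each page and join the results. FakePage only stands in for a real page so the sketch runs without a PDF.

```python
def extract_text_from_pages(pages):
    """Join the text of several pages into one string.

    pdf.pages[0:148] is a plain list of Page objects, so extract_text()
    has to be called on each page in turn.
    """
    texts = []
    for page in pages:
        texts.append(page.extract_text() or "")  # a page may have no text layer
    return "\n".join(texts)


class FakePage:
    """Stand-in for a pdfplumber Page so the sketch runs without a PDF."""

    def __init__(self, text):
        self.text = text

    def extract_text(self):
        return self.text


pages = [FakePage("page one"), FakePage("page two")]
print(extract_text_from_pages(pages[0:148]))
```

Inside the original with block, that would read text = extract_text_from_pages(pdf.pages[0:148]).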
@artist.podium 3 years ago
Extremely helpful and easy. I'm looking to extract transaction details from bank or credit card statements covering, say, 3 years for analysis, and I need to come up with a spending pattern or project spending in each category. The statement data is laid out in different boxes and views. I'd like to know if you have any videos on this.
@damianpatino3853 3 years ago
Hello, would you recommend using this with Python 3.9? Thank you.
@PythonicAccountant 3 years ago
It's always best practice to use the most current version of Python available.
@archismannanda7106 1 year ago
Why didn't you capture the invoice number?
@PythonicAccountant 1 year ago
Probably didn't think of it :)
@atharvsalunke5390 4 years ago
Could we do something different? Instead of being given a complete spreadsheet to extract data from, suppose we generate such PDFs for each transaction, and we want a script that takes those generated PDFs directly from the Chrome browser, gathers the name, invoice no., date, and amount (amount + GST = total amount), formats them into a spreadsheet, and finally builds a sheet with the total amount collected at the end of the day or whenever the user specifies. Hope it isn't as difficult as it looks; eagerly waiting for your reply.
@gouthamkarakavalasa4267 2 years ago
Hi, do you know how to read an entire PDF document using pdfplumber? I guess we are reading only one page at a time. Can't we read an entire PDF doc with, say, 4 or more pages? I am unable to do this. Please, can anyone share their thoughts?
@PythonicAccountant 2 years ago
Yes. But pdfplumber is not very efficient (at least in my experience), so it takes a while on a large PDF document. There are other options for extracting text that may work better for larger files.
@varungupta7226 3 years ago
For me, in the xlsx file, dates are being rendered as ######## with a 'possible error loss' message in Excel. Any idea why? (My DataFrame looks completely fine.) How should I resolve it? Thanks!
@AbdullahKhan-pv1qz 2 years ago
Is it somehow possible to use Python to find and LOCATE mathematical equations in PDFs?
@Ndofi 1 year ago
Can you share the file for testing, please? Sorry for my ignorance.
@PythonicAccountant 1 year ago
Here you go! www.tabs3.com/support/sample/apreports.pdf
@meetmeraj2000 4 years ago
Hey, excellent video. Can you make a video on extracting text from multiple documents at once? Thanks
@tereslok2063 4 years ago
Fantastic video! It just solved my work problem easily! But I also tried to extract the description, which is on a line after all the invoice line items. I'm not sure how to get the description and append it as the last column in the DataFrame. I've found no text pattern for the description, so I can't use re.compile to fetch it. Could you give me a tip on that, please?
@PythonicAccountant 4 years ago
You can usually grab descriptions based not on the pattern of the description itself, but on the pattern of everything around it. For example, if there are three sets of numbers that always show up before the description, use those three numbers as the pattern that tells you the description is coming next. If you want to send me a redacted example, I could try and take a look: pythoniccpa@gmail.com
@tereslok2063 4 years ago
@@PythonicAccountant thank you for the prompt reply ^^. Unfortunately, there is only the description on that line, nothing else around it :(
@PythonicAccountant 4 years ago
Teres Lok, that makes it even easier! Just grab the whole line, and that's your description!
@tereslok2063 4 years ago
Pythonic Accountant, how can I identify the description line that comes right after the invoice line? I'm stuck here 😢
@PythonicAccountant 4 years ago
Teres Lok, you can use a flag for that, maybe "in_line_items = False". When your regex picks up that you're in the line items, change it to "in_line_items = True". Then, on each new line, if in_line_items is True AND the regex doesn't match a line item, you know you're on the first line after the line items, which is your description line.
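The flag approach described in that reply, as a minimal sketch. The line-item regex is a hypothetical stand-in (date, reference, amount); the flag is reset after each description so the next block of line items is handled the same way.

```python
import re

# Hypothetical line-item layout: date, reference, amount.
ITEM_RE = re.compile(r"^\d{2}/\d{2}/\d{2}\s+\S+\s+[\d,]+\.\d{2}$")


def find_descriptions(lines):
    """Return the first line after each run of line items."""
    descriptions = []
    in_line_items = False
    for line in lines:
        if ITEM_RE.match(line.strip()):
            in_line_items = True  # we are inside the line items
        elif in_line_items:
            # First non-item line after the items: that's the description.
            descriptions.append(line.strip())
            in_line_items = False  # wait for the next block of line items
    return descriptions


lines = [
    "Vendor 700",
    "01/05/20 A1 10.00",
    "01/06/20 A2 20.00",
    "Office supplies for Q1",
    "Vendor 815",
    "02/01/20 B1 5.00",
    "Toner",
]
print(find_descriptions(lines))  # ['Office supplies for Q1', 'Toner']
```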
@manishdeshpande 4 years ago
Wonderful video. Thanks so much
@PythonicAccountant 4 years ago
Thanks!
@emmadewhurst3565 3 years ago
This is amazing! Could you use this code if you wanted to grab text from multiple pages in a PDF?
@trispham5215 2 years ago
Can you help me extract tabular data from an image, pleaseee
@PythonicAccountant 2 years ago
I would suggest trying it out yourself, and if you need help, perhaps post in the r/learnpython subreddit to see if someone can help you.
@finansalmodeller3468 1 year ago
Thank you so much! Could you please share the code 🙏
Sir, I have a PDF in Punjabi, an Indian language, but I couldn't extract it using your technique, so please help me.
@hayathbasha4519 3 years ago
Hi, in tabula, is it possible to get the page number a DataFrame came from? Thanks in advance
@welbsantos 4 years ago
Excellent !!! Great video !!!
@PythonicAccountant 4 years ago
Thank you! Cheers!
@emmal45 3 years ago
Just what I was looking for, thank you
@mairinixon1 1 year ago
Awesome video!
@PythonicAccountant 1 year ago
Thanks!
@catkin60 4 years ago
I have a really complex PDF table that I have to read. Would you mind helping me with it?
@erfanebrahimi9748 4 years ago
Compared to pdfminer, how do you rate pdfplumber? Is it possible to draw black boxes over specific parts of the PDF using pdfplumber? From a short glance, it seems pdfplumber doesn't write the result to a PDF the way pdfminer does, but it does seem to keep the structure of the PDF, unlike pdfminer, which is very helpful. Any ideas?
@PythonicAccountant 4 years ago
Check out the method im.draw_rect(bbox_or_obj, fill={color}, stroke={color}, stroke_width=1) from github.com/jsvine/pdfplumber
@erfanebrahimi9748 4 years ago
@@PythonicAccountant Awesome, thanks.
@williamseewald 2 years ago
Incredible work! Congrats! Do you think it's possible to extract non-patterned text from a PDF file and move it to another text file?
@PythonicAccountant 2 years ago
Yes, as long as there's a way to identify which portion of the text you want to grab.
@AgbiJoseph 2 years ago
How do you read PDFs already downloaded to your system without creating a URL for them?
@PythonicAccountant 2 years ago
That's even easier: just point to the local file path. You can skip the part that downloads it.
@sibghatullah2736 4 years ago
Amazing tutorial. My question is: is there any method in Python for doing the same with scanned images as well, like .png, .jpg, etc.?
@PythonicAccountant 4 years ago
Hey Sibhat, yes, but it is a little different. It's basically the same as using a scanned PDF file. I'll try to do a video on that soon.
@sibghatullah2736 4 years ago
@@PythonicAccountant, extracting the original layout of any scanned document and exporting it to Word, Excel, whatever, is easy :)
@harshitagupta8436 4 years ago
@@PythonicAccountant sir, please make the video on this
@PythonicAccountant 4 years ago
Harshita Gupta, here you go! kzbin.info/www/bejne/nZ2tmmaCd8yhb7c
@divyebhardwaj241 4 years ago
It was very helpful, thank you for sharing!!
@sameekghosh 10 months ago
Excellent ...
@TheBtrivedi 1 year ago
Superb. ❤
@PythonicAccountant 1 year ago
Thanks!
@academysolution8074 1 year ago
Is it possible to extract only the text that is in a red font from a PDF, using the font properties?
@PythonicAccountant 1 year ago
I think so, but I haven't tried it.
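Since that reply is untested, here is a hedged sketch of one way to try it. pdfplumber records each character's fill colour under the "non_stroking_color" key of page.chars, but the value's shape depends on the PDF's colour space; RGB tuples such as (1, 0, 0) are assumed here (greyscale or CMYK files will look different), and the target and tolerance are assumptions to check against your own file.

```python
def text_in_color(chars, target=(1, 0, 0), tol=0.05):
    """Join the text of characters whose fill colour is close to target.

    chars: dicts shaped like pdfplumber's page.chars entries. Only 3-value
    (RGB) colours are compared; anything else is skipped.
    """
    out = []
    for c in chars:
        color = c.get("non_stroking_color")
        if isinstance(color, (list, tuple)) and len(color) == 3:
            if all(abs(a - b) <= tol for a, b in zip(color, target)):
                out.append(c["text"])
    return "".join(out)


sample = [
    {"text": "D", "non_stroking_color": (1, 0, 0)},
    {"text": "u", "non_stroking_color": [0.98, 0.01, 0]},
    {"text": "e", "non_stroking_color": (0, 0, 0)},
    {"text": "!", "non_stroking_color": None},
]
print(text_in_color(sample))  # Du
```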
@kornflakesss 3 years ago
Thanks man, you are amazing! Keep it up!
@MaltrebaManalu 4 years ago
Wow. Found this cool tutorial after struggling with a pricey proprietary app. Btw, how do you extract more than one line as a single value? For instance, an invoice with 3 lines in its description. Thank you.
@PythonicAccountant 4 years ago
Absolutely doable! It takes a few extra steps, particularly identifying which row you are in and when you reach the last row, at which point you combine all the data. I will keep this in mind for a future video.
@ivangomez3405 3 years ago
@@PythonicAccountant that would be awesome, I have the same problem: a "cell" in a table with more than one row
@steventezcan689 4 years ago
This is amazing! Thanks for posting! How did you learn to code like this? Hands-on? Do you have a website or blog where we can get the code?
@PythonicAccountant 4 years ago
Thanks Steven! I learned pretty much everything I know about Python through a few Coursera courses, a few Python books, and some of the TalkPython trainings. Then it was just a lot of playing around and having fun, all hands-on, trying to solve problems. Lots of googling too; that's a programmer's best friend no matter how experienced you are. I've included the code for most of these videos on GitHub at github.com/danshorstein/pythonic-accountant
@steventezcan689 4 years ago
Sincerely, thank you! A great learning repository for individuals like me! Please keep posting!
@expat2010 4 years ago
When you are as brilliant as this guy, everything is easy :). The code is in the GitHub link above.
@PANDURANG99 8 months ago
Is it possible to read a PDF from an online location like Google Drive or SharePoint using Python, without downloading the PDF?
@PythonicAccountant 7 months ago
Yessir! Check out kzbin.info/www/bejne/bnungoCNbLWoi7s
@mohammadhegazy1285 8 days ago
That was great, thank you
@PythonicAccountant 7 days ago
Glad you enjoyed it!
@madhavkumarpancholi9842 1 year ago
pdfplumber is not working with my PDF. I don't see any text with the extract_text function. What should I do?
@PythonicAccountant 1 year ago
Is it a scanned or a computer-generated PDF?
@madhavkumarpancholi9842 1 year ago
It's scanned. I had to use pytesseract to extract the data.
@PythonicAccountant 1 year ago
@@madhavkumarpancholi9842 it's going to perform a lot less accurately on text that has gone through OCR, but I think you should still be able to use pdfplumber; you just need to be careful, since some of the characters may not come through correctly.
@madhavkumarpancholi9842 1 year ago
I am still working on it. I have a scanned PDF from which I have to extract text data, and some of the data is inside a table in the file. The problem is that no library, such as pdfplumber or tabula, works on it; they give me no output. I tried OCR with pytesseract, but even in that output I'm not getting any data from the table, only the text outside it. Do you have any solution for that?
@RahulPrajapati-jg4dg 3 years ago
Hello, I need some help. How can I contact you?
@PythonicAccountant 3 years ago
You can email me at pythoniccpa@gmail.com
@RahulPrajapati-jg4dg 3 years ago
@@PythonicAccountant I mailed you, sir
@hari-codes 4 years ago
Awesome tutorial. Really informative, and I learned a lot. I have a question: suppose I have a set of application forms (PDFs/images) that users fill in by hand, say "Name: _________". Here "Name:" is computer-printed text, but the blank is filled in with handwriting, and there are multiple fields with the same issue that need to be extracted. How can I extract the data from these?
@PythonicAccountant 4 years ago
Your best bet is OCR, with something like Tesseract. Here's one of many tutorials that can help with that: www.pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/
@hari-codes 4 years ago
But that OCR (Tesseract) is only for digitally typed text, right? I don't think it can be used for handwritten text detection.
@PythonicAccountant 4 years ago
Hari Vamsi, oh right, that's different. Much more difficult, and I haven't done much with that, but I think TensorFlow or PyTorch could be your best bets for building a handwriting recognition model. Something like link.medium.com/QkxEiMcJN6
@GAURAVRAUL95 2 years ago
Thanks a lot, man 👍
@manjubadiger2902 4 years ago
Can this model be used to extract the structure of invoice images?
@PythonicAccountant 4 years ago
I'm not exactly sure what you're looking to do. Can you clarify with an example?
@JayadevLenka 1 year ago
Excellent!!
@PythonicAccountant 1 year ago
Many thanks!
@gvenagas 7 months ago
I found that by opening a PDF file in Mozilla Firefox and inspecting it with the developer tools, you can collect its text (with the help of JavaScript) after the browser has converted it to HTML, and maybe save it for further processing in some programming language.
@PythonicAccountant 7 months ago
Yeah, there are quite a few ways to get the text from a PDF file. It's just a matter of whether you need to keep information about where the text sits on the page, and whether you need an automated process versus manual clicking; that will affect which method ends up being best.
@davidm3894 4 years ago
Why is there a "*" by vend_name (at 3:36)?
@PythonicAccountant 4 years ago
That's used for tuple unpacking when there is a varying number (0 or more) of values in that part of a list or tuple. See the "Asterisks in tuple unpacking" section of this post (treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/#Asterisks_in_list_literals) for more details.
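A small illustration of that star-unpacking pattern on a hypothetical vendor line (the exact line format used in the video isn't shown here, so this layout is an assumption): the first and last fields are fixed, and *vend_name collects however many words sit in between, always as a list.

```python
def split_vendor_line(line):
    """Split 'number name... amount' when the name has a variable word count."""
    vend_num, *vend_name, amount = line.split()  # vend_name is always a list
    return vend_num, " ".join(vend_name), amount


print(split_vendor_line("700 Smith Joe 125.00"))  # ('700', 'Smith Joe', '125.00')
print(split_vendor_line("815 Acme 9.99"))         # ('815', 'Acme', '9.99')
print(split_vendor_line("901 5.00"))              # ('901', '', '5.00')
```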
@SK-jv2ro 3 years ago
Where can we find the code you created here?
@PythonicAccountant 3 years ago
It's in the description: github.com/danshorstein/pythonic-accountant/tree/master/015%20Extract%20line%20items%20from%20PDF%20AP%20listing
@my_opiniondemocracy6584 1 year ago
Can you show a method to do this without regex?
@PythonicAccountant 1 year ago
Eh, not easily. GenAI isn't good enough yet, from my initial experiments. And other methods would mostly still require pattern matching, though they may not be as difficult as regex. I suggest trying another pattern-matching library.
@daveg5857 1 year ago
I think AI Builder in Power Automate is becoming a viable alternative to using Python for this. This is a little above my head, but thank you.
@PythonicAccountant 1 year ago
Thanks, I'll have to check that out
@daveg5857 1 year ago
@@PythonicAccountant I lied about one thing: it's a lot above my head! Haha. But there's literally never been a better time to learn, especially with resources like this.
@PythonicAccountant 1 year ago
@@daveg5857 haha, you'll get there, just practice and have fun!
@rushabhukani2021 2 years ago
Can you make a video for PDFs with unstructured data?
@PythonicAccountant 2 years ago
Maybe! Do you have any example PDFs you can reference?