UiPath Document Understanding: Extract Tables Out of PDFs

Anders Jensen

This full video tutorial shows how to extract table data from PDFs into Excel with the Document Understanding package from UiPath. With the help of a template, we can extract data from PDFs in our UiPath workflows. The example here is a structured PDF document: a set of time sheet forms.
You could also watch:
🔵 UiPath Document Understanding - Invoice Data Extraction - • UiPath Document Unders...
🔵 How to extract data from PDF's with RegEx - • How to extract data fr...
0:00 Use case presentation
We have a folder of time sheet PDFs, and we want to extract data from each of them into Excel with the Document Understanding package from UiPath. The data is structured, meaning that the layout is fixed across all the documents. We will use a template-based approach to determine which data to read and collect.
1:41 Install packages
We install three packages, (1) UiPath.DocumentUnderstanding.ML.Activities, (2) UiPath.IntelligentOCR.Activities and (3) UiPath.OmniPage.Activities, to enable the activities we will use in this use case.
2:25 Load Taxonomy
We use the Load Taxonomy activity to define the document types and fields for the extraction. You can edit the taxonomy later if you missed something, so don't worry. We create a group and a category for our Time Sheet case. We extract a single text field (employee number) and a table of the time registrations.
5:17 Digitize Document
We now digitize the document, capturing the text and its location. The output is a string (the text itself) and a Document Object Model (information and properties about the text). You can use any of the OCR engines, but I prefer OmniPage.
6:53 Data Extraction Scope
Based on our rules, we can very easily extract the PDF data. I recommend you install Notepad++ (simply Google it and download it). We use the program to open our taxonomy JSON. We need to copy the DocumentTypeID from the JSON to use in our workflow. Because we have structured/form data, we can use the Form Extractor activity. Use the default endpoint, then go to your UiPath Automation Cloud and get the API key (it's free). Afterwards we create a template, where we define what our document looks like and then specify which data we want to extract. In Configure Extractors, just pick everything.
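If you would rather not hunt for the DocumentTypeID by eye, the lookup can be sketched in a few lines of Python. This is only an illustration and not part of the UiPath workflow: the JSON below is a made-up, simplified taxonomy, and the key names (DocumentTypes, DocumentTypeId, Group, Category, Name) are assumptions about the file's shape, so check them against your own taxonomy JSON.

```python
import json
from typing import List

# Hypothetical, simplified taxonomy content for illustration only.
# A real taxonomy file contains more keys; the names used here are assumptions.
TAXONOMY_JSON = """
{
  "DocumentTypes": [
    {
      "DocumentTypeId": "TimeSheets.Forms.TimeSheet",
      "Group": "TimeSheets",
      "Category": "Forms",
      "Name": "TimeSheet"
    }
  ]
}
"""


def document_type_ids(taxonomy_text: str) -> List[str]:
    """Return the DocumentTypeId of every document type in the taxonomy."""
    taxonomy = json.loads(taxonomy_text)
    return [doc_type["DocumentTypeId"] for doc_type in taxonomy["DocumentTypes"]]


for type_id in document_type_ids(TAXONOMY_JSON):
    print(type_id)
```

The printed value is what you would paste into the workflow wherever the DocumentTypeID is required.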
13:23 Export Extraction Results
We take our extraction results and output them into a DataSet.
13:50 Understand the output data
With a For Each, an Output Data Table and a Write Line, we can take a look at the data. Remember to use the Tables property of our DataSet. We now have two DataTables with our data that we can work with.
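The same "loop over the tables and print each one" idea, as a rough Python sketch. The DataSet is modelled here as a plain dictionary of table name to rows, and the table names and values are invented for the example; the real DataSet from the export step will name and shape its tables differently.

```python
def describe_tables(dataset):
    """Return one text block per table: the name, a header line, then the rows."""
    blocks = []
    for name, rows in dataset.items():
        headers = list(rows[0].keys()) if rows else []
        lines = ["[" + name + "]", ",".join(headers)]
        lines += [",".join(str(row[h]) for h in headers) for row in rows]
        blocks.append("\n".join(lines))
    return blocks


# Made-up stand-in for the exported DataSet: one table for the single field,
# one table for the time registrations.
dataset = {
    "Simple Fields": [{"Field": "Employee Number", "Value": "1042"}],
    "Time Registrations": [
        {"Date": "2021-03-01", "Hours": "7.5"},
        {"Date": "2021-03-02", "Hours": "8"},
    ],
}

for block in describe_tables(dataset):
    print(block)
```

Printing every table once like this is a quick way to see which table holds the employee number and which holds the time sheet rows before building anything on top of them.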
16:02 Build DataTable for output data
We create a DataTable with just one column (Employee Number), which we can later merge with our time sheet table data. Apart from the column header, it is empty.
17:08 Merge Data Table
We merge the newly created DataTable, which holds only the Employee Number header, with the DataTable containing the extraction results (the time sheet table).
17:54 Iterate through our output and add data
Using a For Each Row, we iterate through our extracted data and add our employee number to each row as a string.
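To make the build, merge and fill steps concrete, here is a minimal Python sketch of the same logic. It is not the UiPath implementation: tables are modelled as lists of dictionaries, the helper names merge_schema and fill_column are hypothetical, and the sample rows are invented.

```python
def merge_schema(target_columns, source_rows):
    """Combine the header-only table with the data rows.

    Columns of the header-only table come first; columns found only in the
    data rows are appended. Cells without a value start out as None.
    """
    columns = list(target_columns)
    for row in source_rows:
        for column in row:
            if column not in columns:
                columns.append(column)
    merged = [{column: row.get(column) for column in columns} for row in source_rows]
    return columns, merged


def fill_column(rows, column, value):
    """Write the same value, as a string, into one column of every row."""
    for row in rows:
        row[column] = str(value)
    return rows


# Invented sample data standing in for the extracted time sheet table.
time_rows = [
    {"Date": "2021-03-01", "Hours": "7.5"},
    {"Date": "2021-03-02", "Hours": "8"},
]

columns, rows = merge_schema(["Employee Number"], time_rows)
fill_column(rows, "Employee Number", 1042)
print(columns)
print(rows[0])
```

After the merge, every row has an Employee Number cell that is still empty, which is why the separate fill loop is needed.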
19:02 Write the extracted data to Excel
Use the Write Range activity to write the data to Excel. Remember to Add Headers.
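Why the headers option matters is easy to see in a small Python sketch. Note that this writes CSV text rather than a real Excel workbook, purely as a stand-in; the function name write_table and the add_headers flag are made up for the illustration.

```python
import csv
import io


def write_table(stream, columns, rows, add_headers=True):
    """Write rows to the stream as CSV, optionally preceded by a header line."""
    writer = csv.DictWriter(stream, fieldnames=columns, lineterminator="\n")
    if add_headers:
        writer.writeheader()
    writer.writerows(rows)


# Invented sample row.
columns = ["Employee Number", "Date", "Hours"]
rows = [{"Employee Number": "1042", "Date": "2021-03-01", "Hours": "7.5"}]

with_headers = io.StringIO()
write_table(with_headers, columns, rows)
print(with_headers.getvalue())

without_headers = io.StringIO()
write_table(without_headers, columns, rows, add_headers=False)
print(without_headers.getvalue())
```

Without the header line, the output contains only the values and nothing tells you which column is which.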
20:26 Extract multiple PDF files
We expand our solution to handle the case where we have more than one PDF file. We use a For Each and the Directory.GetFiles method. Remember to change the TypeArgument to String. Drag our activities in and change strDocumentPath to item. Furthermore, we need a final DataTable that starts out completely empty, to which we add the data from each iteration.
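The pattern of "start with an empty final table, then append the rows from each file" looks roughly like this in Python. It is a sketch under assumptions: extract_rows is a hypothetical placeholder for the whole per-document extraction and returns invented rows, and the folder is a temporary one created just for the demo.

```python
import tempfile
from pathlib import Path


def extract_rows(pdf_path):
    """Placeholder for the per-document extraction; returns made-up rows."""
    return [{"Source File": pdf_path.name, "Hours": "8"}]


def process_folder(folder):
    """Collect the rows of every PDF in the folder into one final table."""
    final_rows = []  # the final table starts out completely empty
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        final_rows.extend(extract_rows(pdf_path))
    return final_rows


# Demo with empty dummy files; only the two .pdf files are picked up.
with tempfile.TemporaryDirectory() as folder:
    for name in ("sheet_a.pdf", "sheet_b.pdf", "notes.txt"):
        (Path(folder) / name).touch()
    for row in process_folder(folder):
        print(row)
```

Creating the final table once, outside the loop, is what keeps the rows from earlier files from being overwritten by later ones.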
💼 Get the files from the video: andersjensen.org/uipath-docum...
Connect with me:
🔔 Subscribe - kzbin.info...
💼 LinkedIn - / andersjensens
👥 Facebook - / andersjensenorg
💌 Email Newsletter - andersjensen.org/email-newsle...
#uipath #rpa #documentunderstanding
