Oracle APEX: How to extract text from inside a PDF or Word Doc

3,732 views

Chip Baber

1 day ago

In this video we show you how to quickly extract all the text from inside a PDF or Word document and store it in a CLOB for quick retrieval.
Our video begins with a table containing a BLOB column that holds several PDF and .doc files. We start by creating an Oracle Text index on that column, using the CTXSYS.CONTEXT index type with the CTXSYS.AUTO_FILTER filter.
Next we create a table with a CLOB column to store the text from inside each document. Once it is created, we showcase a short segment of code that uses the ctx_doc.filter API to process all the BLOBs and extract the raw text into the CLOB.
With the text inside a CLOB, Oracle APEX can display it to users and process it in a fraction of the time it would take to display or download the BLOB.
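The steps described above might look roughly like the sketch below. This is not the code from the video's repository: the table, column, and index names (my_docs, my_docs_text, my_docs_idx) are hypothetical, the source table is assumed to have a numeric primary key, and the schema is assumed to have the privileges Oracle Text needs (for example the CTXAPP role).

```sql
-- Assumed (hypothetical) source table:
--   my_docs (id NUMBER PRIMARY KEY, filename VARCHAR2(255), doc BLOB)

-- 1. Oracle Text index on the BLOB column. AUTO_FILTER detects the
--    document format (PDF, Word, ...) and extracts its text.
CREATE INDEX my_docs_idx ON my_docs (doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER');

-- 2. Target table holding the extracted text, one row per document.
CREATE TABLE my_docs_text (
  doc_id   NUMBER PRIMARY KEY REFERENCES my_docs (id),
  doc_text CLOB
);

-- 3. Filter every BLOB to plain text and store the result.
DECLARE
  l_text CLOB;
BEGIN
  -- Identify documents by primary key value rather than ROWID.
  CTX_DOC.SET_KEY_TYPE('PRIMARY_KEY');

  FOR r IN (SELECT id FROM my_docs) LOOP
    -- Passing a CLOB variable as restab returns the result in memory;
    -- plaintext => TRUE asks for plain text instead of HTML.
    CTX_DOC.FILTER(
      index_name => 'MY_DOCS_IDX',
      textkey    => TO_CHAR(r.id),
      restab     => l_text,
      plaintext  => TRUE);

    INSERT INTO my_docs_text (doc_id, doc_text)
    VALUES (r.id, l_text);

    -- Release the temporary LOB before the next iteration.
    DBMS_LOB.FREETEMPORARY(l_text);
    l_text := NULL;
  END LOOP;

  COMMIT;
END;
/
```

An APEX report or dialog could then select doc_text from my_docs_text instead of touching the BLOB.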
Sample Code leveraged in this video:
github.com/chipbaber/apex_tex...
Related Videos of Interest:
How to print a CLOB inside a Dialog Window in Oracle APEX
• How to Print a CLOB in...
How to enable Full Text Search on a BLOB
• Oracle APEX: How to en...
Oracle APEX: How to add an Image as BLOB to Existing Table/Form/Report
• Oracle APEX: How to ad...

Comments: 14
@organismisimbiotici 4 months ago
wonderful! I've been trying to convert a PDF/BLOB to CLOB in APEX for days!! Thank you!
@chipbaber 4 months ago
glad it could help
@asifiqbal5877 2 years ago
Hello, can we upload records from oracle apex to an MDB format file?
@chipbaber 2 years ago
I am probably not the expert on MS Access, but I found this in the Oracle forums: asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9523935800346847870
@deepakdakhore 2 years ago
Hi, I am getting an error with the CTX_DOC package. I am using APEX version 21.2 and Database 12c. ORA-20000: Oracle Text error: DRG-50857: oracle error in ctx_doc.filter ORA-20000: Oracle Text error: DRG-11207: user filter command exited with status 127
@chipbaber 2 years ago
Check to see if the user running ctx_doc has the create table and create trigger privileges. A Text index internally creates some tables like DR$$R, DR$$I, DR$$K. Those privileges are required to create them.
@kauecastelani4417 10 months ago
Here on Oracle 11g it does not work. It just shows a ' - ' after the process succeeds. Do you know why?
@chipbaber 10 months ago
What if anything do you see when you query your filtered_doc table? Or is this error slightly before that? This demo was done on a DB 21c.
@uselvan 9 months ago
Hi, in the same way, can we extract the images from a Word/docx file?
@chipbaber 9 months ago
I don't believe there are any native libraries in the database for this today. It can be done though in Java, found this example. gist.github.com/aspose-com-gists/7af5b641d0ab658dbddce3292649c227 So one path could be to use Oracle functions and java to consume the doc and output the images, then save the images inside the database or in object storage.
@luisf.rodriguezgarcia2888 1 year ago
Hi, I have done all the steps and it works really well, but when I try to convert a large PDF (54 pages) it only gives me a string like this, SKM_C3320i23022111200, in the filtered_docs table. I'm wondering if there is any size limit with this index?
@chipbaber 1 year ago
So the filtered docs table stores the result of the index inside a CLOB. The max size of a CLOB should easily handle 54 MB. docs.oracle.com/en/database/oracle/oracle-database/19/refrn/datatype-limits.html . Couple small checks you probably tried already but just in case. After upload make sure to rebuild the index, example at bottom of markdown page resumeAdmin.Batch_Create_Filtered_Docs(); If that doesn't work and the PDF is something you can share I can take a look at it if you shoot me a link.
@luisf.rodriguezgarcia2888 1 year ago
@chipbaber Thanks!
@luisf.rodriguezgarcia2888 1 year ago
@chipbaber Hi Chip, I sent you an email