Master Scanned Book Processing: Acrobat Pro: Comprehensive Guide: Optimal Efficiency, Searchability

18,036 views

Digitize Your Books

Comments: 53
@stevenwoodfield1658 11 months ago
Thank you so much for your videos! I've found them a perfect starting point for my own digitizing journey. Before I go on to the next step in the process, I come back to reference your videos. Thank you for saving me time and a headache, blessings to you!
@DigitizeYourBooks 9 months ago
Thank you for the kind words! I'm glad I could help.
@etiennedegaulle3817 3 years ago
FYI, as of the December 2018 version of Acrobat DC, "the embedded index in the PDF is no longer used for searching." Hopefully they are adding an internal optimized index by default. Great video by the way! I just started digitizing most of my library. This video saved me lots of time and trial and error!
@DigitizeYourBooks 3 years ago
Thanks for the update! Cheers!
@larrythibodeaux7236 10 months ago
Thank you so much for this! I bought the CZUR Shine book scanner, and it scans flat paper OK, but its conversion of the text into a searchable PDF is not good at all. This is way better than CZUR! Adobe doesn't produce unknown characters, and it doesn't combine two words into one. It also doesn't change the order of the paragraphs when you copy and paste. Thank you for showing me this!
@DigitizeYourBooks 6 months ago
👍
@UncleMatte 5 years ago
Thanks for this video, it showed me LOTS of things I was doing wrong, and how to improve things way beyond what I was doing. I was hoping to ask you a REALLY long question, probably too long to put here. Is there any way to send it to you, or can I put it on my Dropbox account for you to read if you have a spare moment or two? .... I can barely use Facebook, no clue about Twitter or any of the "other ones" .... I can post it here, but it might bore everyone! Thanks Again !!!
@scotto0010 6 years ago
This is an amazing set of videos. Very concise when it can be but then plenty of detail here in the final video where it is really needed. You have a very good system and seem to have thought about everything. I would love to pick your brain about this type of project but replace the word "books" with "magazines". There are a whole host of additional issues to deal with on the magazine side. Thanks for your time and effort.
@PeterMosier 6 years ago
scotto0010 Thank you so much for the kind words. You’ve made my day! As for magazines: when I first started experimenting with and learning how to do this, I did a few magazines. The results weren’t quite as good as text-only books, which benefit from Clear Scan, but they were still pretty good. Some “moire” patterns in the photos, but I didn’t play enough to improve that totally. If you have a few questions, feel free to ask in this thread. Maybe I can help, or perhaps make a video about digitizing mags. Thanks again.
@scotto0010 6 years ago
That would be great. The main things about magazines are: #1, how do you deal with yellowing of old magazine pages? Also, what is the best DPI to scan them at? Is it best to start big and down-sample from there? How would Clear Scan work with magazines, especially with colored pages and colored text? I suppose a lot of the questions, though, would entail the post-processing in Photoshop in order to make it look right. What size files would we be looking at for a "good", highly readable product? It seems that really only a few things would overlap between creating text-only books and picture/image-heavy magazines.
@larrythibodeaux7236 10 months ago
Also, can you do a video of audiofying your books with the software balabolka? Meaning making them into audio books? And buying a text to speech voice like IVONA Amy voice?
@DigitizeYourBooks 6 months ago
Ooohhh... this touches a nerve for me. I have produced some audiobooks the old-fashioned way, by performatively reading the text and then painstakingly editing the production (voice.mosier.ca/). Automated text-to-speech (TTS) is destroying the low end of the audiobook narration vocation. Having said that, I might do a video on this topic just to compare the results to a human reader. Thanks for the suggestion!
@larrythibodeaux7236 6 months ago
@@DigitizeYourBooks Haha that sounds great!
@gabrielcastejon7914 2 years ago
You're a great samaritan
@DigitizeYourBooks 2 years ago
Glad to help!
@MrsCalabresesTeachingChannel 6 years ago
Great information! Thanks! Mac user here, I often find Adobe difficult to navigate, this helps!
@DigitizeYourBooks 6 years ago
Glad you liked it!
@rodolfo6168 3 years ago
Recommend Downsample: 300 dpi
@DigitizeYourBooks 3 years ago
I scanned at 300 dpi (see 10:15 in video). Are you recommending down-sampling to something lower than that? Thanks!
@theanthropic8114 7 months ago
Thanks for the tips. For my part, I found it odd that after combining my .tiff files using Adobe Acrobat DC Pro, then converting them to searchable images at 300 dpi (originally 600 dpi), then to editable text and images, the file size somehow increased from 9 MB (after searchable images) to over 10 MB (after editable text and images). Any idea why? Thanks.
@DigitizeYourBooks 6 months ago
🤔 I have no idea why it swelled after editable text & images. Fortunately, going from 9 MB to 10 MB is not a big difference. I wouldn't worry about it -- but I'm still wondering why it happened!
@koritz123 6 years ago
Would the Adobe Acrobat Pro 2015 upgrade do an equivalent job, as opposed to leasing the 2017 version for a month? The 2015 upgrade is available for $65. I checked and, if I'm not mistaken, the 2017 version of Adobe DC leases for approximately $25 a month as of December 2017.
@DigitizeYourBooks 6 years ago
Hi koritz123, thanks for asking, I am flattered that you asked. However, I do not know the answer. The key question is: does the 2015 version do the "Editable Text and Images" OCR feature? I think it was called "ClearScan" back then, and I don't know if Adobe made any changes to the algorithm when they changed the name. You should also know that the 2017 subscription includes other services that may, or may not, be important to you, so you can consider that. Having said that, if the 2015 version has ClearScan (aka Editable Text and Images OCR) and you don't need any newer features, then the 2015 version should be OK for you.
@koritz123 6 years ago
Digitize Your Books I found something online about the new version of Adobe Acrobat Pro DC 2017 being able to resize or rescale pages so they are more easily read in Kindle and other e-reader software. That being the case, I like the idea of being able to resize a PDF rather than just cropping off the white part of the perimeter. So this goes along with your comments about the 2017 version potentially having more useful features than an antiquated version that's a few years old.
@gr3yg0at 5 years ago
Great video. When I followed this for a PDF I have (578 pages), the final file size was more than double the original file size. It started out as a 43 MB file; I did the first OCR ("searchable" text) and the file size was reduced to 31 MB. On the next OCR ("editable" text) the final file size jumped to 116 MB. That doesn't seem right. I have gone through the process twice with the same results. Any ideas?
@DigitizeYourBooks 5 years ago
Hmmmm, that is a real head-scratcher. I've not seen that before. Perhaps there is something unusual about this book? Perhaps lots of diagrams that are more difficult to convert to "editable images"? Or maybe lots of weird fonts? Just a guess. This reminds me: I wish Adobe gave the option for "editable text" without also trying to create "editable images". I never want the images made editable, and have found that for some engineering texts it really messes up parts of some images when making them editable (in dangerously subtle ways). The good news: because you followed my 2-step solution, you can now ditch the 2nd (larger) version, knowing that it is needlessly big. That is why I always do it as a 2-step: in case there is a problem with the "editable text" version. Cheers!
@gr3yg0at 5 years ago
This book does have a lot of pictures and illustrations. Since you have mentioned them, I am starting to think that is what's causing this issue. My engineer brain is thinking there must be a way to exclude the images. Now I know how I'm spending my weekend.
@wjhyde 5 years ago
Thank you for this video. Very helpful.
@DigitizeYourBooks 5 years ago
Glad it was helpful. Have fun digitizing your books!
@UncleMatte 5 years ago
I tried to contact you through Twitter. What a nightmare: it endlessly looped me to "how to" do this and that, but gave me no way to sign up and fix things! A bit dizzy, I'll try again later. Either way, your help is GREATLY appreciated!
@UncleMatte 5 years ago
Before I start going farther down the "Rabbit Hole": I have an older Acrobat 7 Pro version. Is it worth it for me to spend $100 to $150 on a much newer version of Acrobat? Thanks!
@DigitizeYourBooks 5 years ago
I would only suggest upgrading software if your current software is missing a feature you need. Specifically, I personally MUST have “Editable text and images” feature, as explained in this video. That feature has had different names in previous versions, and I don’t know whether or not v7 has that feature. If it does (by any name) then probably no need to upgrade. Cheers.
@UncleMatte 5 years ago
@@DigitizeYourBooks Thanks, I upgraded to version XI. A little different, but I'll get the hang of it eventually. Thanks for your advice.
@lkj234 6 years ago
Great content! Keep up the good work!
@DigitizeYourBooks 6 years ago
Glad you liked it! Thanks for commenting.
@gr3yg0at 4 years ago
I just finished digitizing another book. As I started to read through it I noticed Adobe Acrobat had changed some of the words. I compared it to the original scan and the paper book itself and confirmed words were being changed. What I have found is words are being changed during the step when changing text with the editable text and image option. I'm curious if anyone else is seeing this happen.
@DigitizeYourBooks 4 years ago
I haven't noticed this but it may be possible. I have noticed where engineering graphs and drawings get modified during the "Editable text and Images" process. It is for this reason that I first do a conventional OCR, and then repeat the OCR using "Editable Text" -- just in case the "Editable" process messes up. My guess for what is happening: OCR is not perfect. And when using "Editable Text" method, the image of the word(s) is replaced by the OCR result. So if there is an error in the OCR, that error is now "baked in" the final text. Thanks for commenting. Cheers!
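For anyone who wants to check for changed words without proofreading the whole book, here is a minimal sketch. It assumes the text of both versions (regular OCR and "Editable Text") has already been exported to plain text; the function name `changed_words` and the sample sentences are hypothetical and only illustrate the idea using Python's standard `difflib` module.

```python
import difflib

def changed_words(searchable_text, editable_text):
    """Return (before, after) word groups that differ between two OCR passes."""
    a = searchable_text.split()
    b = editable_text.split()
    # autojunk=False keeps common words from being ignored on long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

# Hypothetical sample text standing in for the two exported versions.
step1 = "The modern reader expects clean text"
step2 = "The modem reader expects clean text"
for before, after in changed_words(step1, step2):
    print(f"{before!r} -> {after!r}")
```

Each reported pair is only a candidate: either pass could be the one that misread the word, so it still needs a look at the page image.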
@gr3yg0at 4 years ago
@@DigitizeYourBooks I'm also curious why using the "editable text" more than doubles the file size. My book went from 22 MB to 88 MB. The book did have a lot of images, and I wonder if this is what's causing the jump in file size.
@DigitizeYourBooks 4 years ago
Another YouTube viewer had the same issue. I suspect you are correct: images seem not to be handled well by Acrobat's "Editable Text and Images" option. I really, REALLY, wish Adobe would give us an option for "Editable Text" which doesn't try to make the images editable. In addition to file size growing, I have seen it mangle the images, but so subtly that it isn't obvious -- truly dangerous for an engineering textbook. That is the main reason that my process is two-step: (1) regular OCR and (2) Editable Text/Images OCR. Hope this info helps. Cheers!
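To put numbers on the file-size jumps reported in this thread, here is a small sketch of the arithmetic. The function names and the 1.5x cutoff are assumptions made for illustration, not a rule from the video; pick whatever limit suits your own library.

```python
def growth_factor(before_mb, after_mb):
    """How many times larger the second file is than the first."""
    return after_mb / before_mb

def version_to_keep(searchable_mb, editable_mb, max_growth=1.5):
    """Keep the editable version unless it grew past the (assumed) limit."""
    if growth_factor(searchable_mb, editable_mb) > max_growth:
        return "searchable"
    return "editable"

# Sizes reported in this thread, in MB: (before, after) the editable pass.
for searchable, editable in [(9, 10), (31, 116), (22, 88)]:
    factor = growth_factor(searchable, editable)
    keep = version_to_keep(searchable, editable)
    print(f"{searchable} MB -> {editable} MB: x{factor:.2f}, keep {keep}")
```

With these figures, the 22 MB to 88 MB case is a fourfold increase, not merely a doubling, while 9 MB to 10 MB stays well under the cutoff.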
@DanielRamos-zx1kh 7 years ago
Hi Peter, is there any way to private-message you?
@DigitizeYourBooks 7 years ago
I am on Twitter @PeterMosier you can follow me, and then DM me there if you like. What did you want to talk about?
@DanielRamos-zx1kh 7 years ago
I just tried to DM you on Twitter, but I can only do that if you follow me. Anyway, I just OCRed a scanned book, but there are some passages that the OCR doesn't recognize. Look here: i.imgur.com/avlUkD4.jpg Do you know how to get these passages recognized?
@DigitizeYourBooks 7 years ago
I suspect the problem is poor contrast. That is, instead of black and white (the normal for most books) your example had light grey text on a non-white background. From my experience, poor contrast confuses OCR. You can try playing with the contrast settings in your scanning software to try to increase the contrast so that it works. However, you might not be able to ever get it to OCR correctly, especially for the very light grey text.
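As a rough illustration of what "increasing the contrast" means, here is a minimal sketch of linear contrast stretching on grayscale values. It is an assumption about the general technique, not a description of what any particular scanning software or Acrobat does internally; the function name and the sample pixel row are hypothetical.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values (0-255) to span the full range."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return list(pixels)  # flat image: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

# Hypothetical row: light grey text (150) on an off-white background (220).
row = [220, 220, 150, 150, 220]
print(stretch_contrast(row))
```

After stretching, the grey text maps to black and the background to white, which is the kind of separation OCR tends to prefer. For real image files, Pillow's `ImageOps.autocontrast` applies the same idea.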
@DanielRamos-zx1kh 7 years ago
Digitize Your Books Thanks for your response! And do you know how to type that specific part manually?
@DigitizeYourBooks 7 years ago
Manual corrections may or may not be possible, depending on which software you are using.
@daithiocinnsealach1982 5 years ago
That book Voodoo Science looks interesting, but the cover is awful. I'm even more shocked to see it's an Oxford Press book. It looks like an attempt by an amateur self-publisher, rather than a professional cover made by one of the largest and most prestigious publishers in the world...
@DigitizeYourBooks 5 years ago
Agreed: not a very impressive cover on this book. Contents are interesting, though.