Types of PDF - Computerphile

129,174 views

Computerphile

2 years ago

"Just send me a PDF!" - but what kind of PDF? As Professor Brailsford explains, PDF is simply a wrapper which can contain a variety of joys!
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 396
@isaac10231 2 years ago
Life goal - finding something to be as passionate in life as this man is about crispy text.
@skuzzbunny 2 years ago
crispy text is the best!!!!!D
@unlokia 2 years ago
CRISP, *_not_* "crispy". This is a silly error that seems to be propagating net-wide +as usual we can blame the yanks!!+ A brand of creme donuts' products are named "crispy", images and text are *CRISP!!*
@CJT3X 2 years ago
@unlokia no need to be so crispy ‘bout it
@DryPaperHammerBro 2 years ago
@skuzzbunny {o{obi,l. K.l k I 98xd
@kokoinmars 2 years ago
Crispy text is nothing to scoff at.
@martinbean 2 years ago
Imagine saying something as innocuous as “I’ll send you a PDF” to this guy and then getting a 2-hour lecture in response…
@FriedEgg101 2 years ago
Maybe you could cut the lecture short by following up with "it'll be PDF Normal".
@erwinmulder1338 2 years ago
Professor Brailsford can lecture me all day.
@michaeldamolsen 2 years ago
That would be the best day of the month for sure!
@swiftfox3461 2 years ago
I'd listen closely and turn off my phone to make sure I didn't miss anything.
@amicaaranearum 2 years ago
Professor Brailsford definitely made this video in response to receiving a low-quality PDF scanned from a photocopy.
@sedawk 2 years ago
“I asked someone to send me a PDF and all I got was this lousy bit map” - would make a great t-shirt.
@SomethingUnreal 2 years ago
Complete with blocky JPEG artifacts all around the text, of course!
@frankharr9466 2 years ago
Don't tempt me.
@naughtiusmaximus789 2 years ago
Grand Theft Auto : Vice City 100% completion reward
@StraightOuttaJarhois 2 years ago
What PDF says to me isn't quality, but uniformity, as in it'll look the same no matter what device or software you're using to view it, even if it's a sheet of paper instead of a screen. (I know this isn't actually the case, but as I understand it, it's how it _should_ work.) So when I get a PDF, I trust that each line and character is exactly where it's supposed to be, and not shifted due to text reflow or different fonts or whatever. From that perspective it doesn't matter if it's using razor sharp vectors or blocky bitmaps.
@max15half 2 years ago
Well, you could be reasonably sure that a bitmap will not misplace your lines and characters.
@StraightOuttaJarhois 2 years ago
@max15half Sure, but there are other qualities of bitmaps that make them less than ideal for text. PDF has the same advantages as other document formats while feeling more trustworthy than, say, a .doc or a .html, even if they're not always used to the fullest.
@Platoqp 2 years ago
I think that is how it started too. That said, if a professor asks for a PDF, that fairly implies they expect some care over layout.
@hirmuolio 2 years ago
@max15half But how are those bitmaps viewed by the receiver? The images are numbered, but the viewer tries to open them in alphabetical order, size order or age order (whatever is the default on their image viewer). The image sizes vary and the image viewer scales them in stupid ways. PDF is still a good system even if the content is just bitmaps. It keeps them all at the correct scale and in the correct order.
@ccreutzig 2 years ago
@hammerhals These days, not everything in PDF is "statically linked." Many PDF viewers, including Acrobat, have a JavaScript engine, and for the modern type of PDF forms, where you may be able to add table rows etc., you kind of need that. That in turn means some people embed code in their PDF to, say, render animations.
@StevenSeiller 2 years ago
🤓me before video: "Finally time to learn the differences between PDF/X, PDF/E, and PDF/A!" 🤷‍♂️me after video: "Where is PDF(FTG), PDF(I), or PDF(I+HT) in my Adobe Save As...???"
@greatquux 2 years ago
Brailsford’s eyesight is better than mine, he can use xterm at the default font size!
@randalthor17 2 years ago
69 likes, nice
@mastertacosmith 2 years ago
This man needs a 40” ultrawide so he can truly enjoy a good typeface at scale
@thuokagiri5550 2 years ago
How much we missed prof Brailsford
@mikefochtman7164 2 years ago
Reminded of a similar issue we had with old mechanical, piping, and electrical drawings, the kind that were literally 'blueprints'. They had been photographed onto microfiche and the originals worn out or lost. We took the microfiche cards and had them scanned (causing even more loss of quality). Then a team of graphic artists would import the scanned image as a 'background' into a modern drafting tool and literally 'trace' over each marking on the original. This basically re-drew the drawings using the scanned background image as the template. The final step was to 'hide' the background and voila! A modern vector drawing that was searchable and could be manipulated with modern tools. If anyone suspected a mistake in the redrawing, we would 'unhide' the background to look at the scanned image, or even go back to the microfiche (we kept a 30-year-old viewer on hand). I forget how much that cost, but it was about 3 graphic artists working over a year to do several hundred drawings. :(
@ToSMaster12345 2 years ago
I was smiling in total bliss throughout the video! Finally I feel understood! This is the reason why I write all my documents in LaTeX and using vector images for figures that have embedded text! So that even the scalebar and axis labels in my plots can be selected or searched via text! Reject Bitmap! Embrace PDF-FTG! :D
@carlosmspk 1 year ago
I mean, anyone with an academic background would understand you
@IIARROWS 2 years ago
I got worse: an Excel sheet with a picture pasted inside it. And not a picture of a table, a screenshot of the application I was working on.
@olik136 2 years ago
my architectural software has a library folder with a drawing file that contains a screenshot of that library folder telling you that certain files are hidden and can only be found with windows explorer...
@recklessroges 2 years ago
I'll send you a screen-shot of that in an HTML email ;-) /s
@david.mcmahan 2 years ago
I once had a client take a screenshot of their full desktop (with an opened PDF among many windows), paste it into a Word doc., crop it down to just a signature graphic, and then scale it back up because the signature was too small. This was their method of "extracting" the signature image from a PDF. Fair enough, but it was because they wanted the version of the signature we had already cleaned up to look better in print.
@JNCressey 2 years ago
@david.mcmahan, can whoever they give the Word document to tell Word to show the full image, and so see everything they had open in the screenshot?
@david.mcmahan 2 years ago
@JNCressey Yes, I could see everything they had opened on the screen. There was nothing bad, but it could have been a security incident.
@1337Unlucky 2 years ago
He clearly has strong views on PDFs, it's funny because it reminds me of me but explaining formats for photography and how to preserve quality. God, I hate when they send photos via social media without using .zip or .rar and all the photos get ultra compressed. It's not only about photos and not only about PDFs, I understand the man, it's about PRESERVATION. The world needs to understand better formats and ways to preserve content. I just love this man.
@ZaneDaMagicPufferDragon 2 years ago
💯 Preservation!!! I’m a Preservationist At Heart ❤️😉
@LordMegatherium 2 years ago
If it's about preservation then rar should be out of the picture because it's a closed format. It's unlikely that we won't be able to open them in 50+ years especially since we have a libre decompression implementation but the point still stands.
@Entertainment- 1 year ago
That's why I love Telegram, it does the compression too, but it also allows you to send pictures, or any file for that matter, in its original size
@nikolayrayanov2895 2 years ago
This is gold. I've tried to explain to people at work about different types of PDFs for years.
@jlivewell 2 years ago
Every time I watch a video by Dr. Brailsford, PhD, I add a new life regret: that I didn’t meet him when I was 17 and learn everything from him.
@jackkraken3888 2 years ago
With someone like him you can never learn everything.
@noferblatz 2 years ago
This professor is positively the best you feature. His enthusiasm and his ability to explain complex technical concepts in a simple way are unmatched.
@drskelebone 2 years ago
I'm in a completely different field, and when the Professor states "if you want a straight line, you just say Line()" he is 100% talking to my soul and speaking the truth I have wanted to shout into so many faces. ty!
@Sam-th4jl 2 years ago
i think i could listen to him talk about literally anything and find it interesting just because of his delivery
@kasamikona 2 years ago
Prof Brailsford you're a very brave man pronouncing PNG as "ping" around these parts...
@TheAstronomyDude 2 years ago
How does post office OCR work? Sorting centers read the address off an envelope in a fraction of a second and they've been doing it for decades; long before Adobe.
@666Tomato666 2 years ago
fundamentally the same technology, but they have the benefit that the address is highly redundant; can't read the full postcode? check the city and street name
@bluedeath996 2 years ago
Combined with a very standardised way to format addresses. There is also a "lost letter" centre where a person decodes things the OCR can't read, but newer tech is better at the job.
@the_lenny1 2 years ago
@666Tomato666 yeah, and on top of that the most important information is the postcode, which is only numbers.
@deansundquist9601 2 years ago
Striving for excellence in typesetting is very noble. As always, thanks for the wonderful content, Prof. Brailsford.
@m47h4r 2 years ago
This was a joy to watch! I respect people like him very much. Being genuinely interested in something and actually putting the time in to learn about its ins and outs. Never mind the fact that he uses Linux with a bunch of open terminals, that's just the cherry on top!
@RhinoBlindado 2 years ago
Prof B looking quite dapper today. Loved the video!
@harshjinger 2 years ago
Thanks... I rely on open source information to learn about computer-based things that occurred even before I was born. Recently, I was looking into this exact question for a project of my own, and this is a perfect resource. I have never used Adobe's official software, being a novice undergrad student besides being broke, so this serves as a great reference. Thanks a lot again...
@YingwuUsagiri 2 years ago
As someone in an administrative job when someone says send me a PDF they mean "any quality yet not easily edited". Invoices for example are never allowed to be easily editable like Word or Excel (and yes that happens often enough). If they want infinitely scalable they'll ask for a Vector and if they want something that's super sharp made in InDesign etc. they'll ask for an INDD. In my almost decade of working in administrations PDF just means can't be edited (easily, because I am very well aware that you still can somehow).
@Starguy256 2 years ago
I edit PDFs every day in my work. Sometimes our software prints the wrong thing and instead of going in and trying to fix it, just edit it on the PDF before you send it. As long as it's FTG (as anything not produced by a photocopier should be) you just hit "Edit PDF" in Acrobat.
@lawrencedoliveiro9104 2 years ago
The irony is that using vector graphics and actual text objects makes it easier to edit the PDF file. The hardest type to edit is the one where every page is a bitmap.
@balmar3 2 years ago
Yesss! Professor is using Alpine, one of the best emailers out there. You should make some videos on the awesome power of terminal-based utilities.
@JNCressey 2 years ago
Some interesting weird things I've encountered with PDFs:
1. I remember some time last year I copied a JPEG out of a PDF container and found it had a slightly different format than regular JPEGs. I think normal JPEGs have the word "JFIF" at the beginning of the file, but I think this had something else, maybe "ADOBE", though I don't exactly remember; it could have been a different word.
2. Just today I found out there are two options to save a PDF from Microsoft Edge, "Save as PDF" and "Microsoft Print to PDF", and "Microsoft Print to PDF" produced a file that was significantly larger and slower to load when viewing.
3. Some PDFs I've seen allow you to search and select text, but don't let you copy or print. I think it's called "secured PDF". I'm not sure why PDF viewers from companies other than Adobe would respect those restrictions. Is there something in the file that fundamentally makes these actions impossible, or does it just ask the program to disallow them?
@neumdeneuer1890 2 years ago
Response to point 3: Yes, the PDF just asks nicely to not allow copying. There are no technical restrictions, and more than enough programs ignore such requests.
@hanelyp1 2 years ago
And a fair selection of the software you could use to read the open format PDF is open source. If such software did pay attention to a "no copy" flag it would be possible to alter the software to ignore it.
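The "asks nicely" behaviour discussed in this thread can be sketched in a few lines. As far as the PDF specification's standard security handler goes, the restrictions live in an integer called /P in the encryption dictionary, and each permission is a single bit that the viewer is expected to honour. The helper name `decode_permissions` and the sample value -3904 are illustrative assumptions, not taken from any file mentioned in the comments.

```python
# Hypothetical helper: decode the /P permissions integer from a PDF's
# standard-security-handler encryption dictionary.
# Bit positions are 1-indexed, as in the PDF specification.
PERMISSION_BITS = {
    "print": 3,     # print the document
    "modify": 4,    # modify the contents
    "copy": 5,      # copy or extract text and graphics
    "annotate": 6,  # add or modify annotations, fill in forms
}


def decode_permissions(p: int) -> dict:
    """Return which actions the /P value asks viewers to allow."""
    p &= 0xFFFFFFFF  # /P is a signed 32-bit integer; view it as unsigned
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # Illustrative value that clears all four bits above.
    print(decode_permissions(-3904))
```

Nothing here stops a program from reading the page content: when no user password is set, a viewer can typically decrypt the file anyway, so these bits are a request to the software rather than a lock on the data.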
@PhilReynoldsLondonGeek 2 years ago
The only real *problem* with PDF is that many organisations provide you with their forms as images. If they could be done as proper forms it would be far easier to actually use them.
@turpialito 2 years ago
But isn't it that it's not actually a PDF problem, but rather people not using the proper PDF generator; in this case Adobe Forms (which AFAIR is bundled with Acrobat)?
@ophello 2 years ago
This isn’t a problem with PDF. It’s a problem with organizations.
@geirtwo 10 months ago
I wish this channel had more satisfying visuals.
@magacacciari3565 2 years ago
Huge fan of Professor B and his computer lores.
@DaimlerSleeveValve 2 years ago
It surprised me that for the last couple of years, Google has been running OCR on the contents of PDFs which contain only images. I've located names mentioned only on signs visible in the backgrounds of pictures of something else.
@johnno4127 2 years ago
The searchable nature of image plus hidden text (or image with text replaced by an actual font) is fantastic! The vast quantity of extra spaces and line returns can get frustrating when trying to use that OCR text, though. It's also a pain when Adobe puts a random space in the middle of a word or between EACH LETTER and now you can't find what you're looking for.
@TheFakeVIP 2 years ago
I feel it also bears pointing out that correctly typeset text in PDF files that is reproduced from a font, not a bitmap, significantly increases the accessibility of such documents for people who use assistive technologies such as screen readers. PDF files are often ripped to shreds by the blind community for this exact reason. Even correctly produced PDFs that are, for instance, produced from a word processor, often cause problems for screen readers depending on how the text is drawn, and the competency of the software to add accessibility hints where appropriate. A common example of this is text in columns: quite often assistive technologies don't expect this, and so read it linearly (i.e. they read both columns at once). Properly tagging important landmarks such as headings can also be a great help, as screen reader users frequently navigate (or even summarise) a document simply by jumping between headings.
@williamchamberlain2263 2 years ago
Yes
@lawrencedoliveiro9104 2 years ago
DJVU format deals with this by storing searchable text objects which are not rendered, separate from the actual page rendering. I think PDF allows this also.
@mickjames73 2 years ago
PDF variability is very frustrating for blind or low-vision people. You would often receive a document or instruction manual which was rendered as an image only, and we used to have to print, rescan and OCR them (often quite tricky with complex page layouts). Luckily there is now a fairly accurate built-in OCR engine in things like Acrobat Reader. The other issue with PDF variation is that many PDFs don't conform to accessibility standards and thus become unusable, or difficult to use, when viewed with accessibility features turned on.
@Jebusankel 2 years ago
I was frustrated recently that my auto insurance documents are all in bad bitmap PDF format. But if I complain to them and claim to be blind, I think they'll have some follow up questions. 😜
@jorisschellekens4630 2 years ago
The way most PDF libraries or programs handle OCR is by something the spec calls "optional content groups". Optional content groups allow you to mark any content in the pdf content stream with a particular tag (typically the layer name). Programs like Adobe will then show you a listing of all the layers. So you could imagine being able to toggle OCR on and off.
@okusa7750 2 years ago
Feel like David Attenborough just lectured me about the types of PDF. Amazing passionate storyteller
@ajayrangishetti5515 2 years ago
Please do a video explaining the Pentium processor architecture, and how multi-core processors perform out-of-order execution.
@DrSteveBagley 2 years ago
We’ve done out of order execution
@ajayrangishetti5515 2 years ago
@DrSteveBagley thank you, I got it!!👍
@Baxtexx 2 years ago
Urg this reminds me of a software I was working on that was consuming pdfs and rebranding them. There were so many edge cases all the time!
@tjarko72 2 years ago
I always thought that PDF(FTG) was closely related to PostScript, so I would have expected a mention of PostScript. More modern, also PDF/A.
@ZedaZ80 2 years ago
PostScript is lovely
@nezZario 2 years ago
It is.
@squishmastah4682 2 years ago
"[PDF] covers a multitude of sins." Yes. Especially at Hustler Magazine.
@Richardincancale 2 years ago
Do you remember desk-top search engines? I used to test them by hiding the word ‘marmalade’ in a PowerPoint in a zip file to test their ability to find and index text :-)
@ShankarSivarajan 2 years ago
Did that work?
@CJT3X 2 years ago
You mean like an early version of Spotlight/Alfred?
@Richardincancale 2 years ago
@CJT3X I recall that both Altavista and Google had desktop indexing tools. Yes, it worked and found my hidden marmalade!
@Richardincancale 2 years ago
@ShankarSivarajan Yup
@Gnsdtc 2 years ago
This is beautiful. The OCR version is PDF I+HT!
@adrianalexandrov7730 1 year ago
That's kinda how DjVu worked: saving text as a highly detailed foreground and compressing the background. It was a miracle how a scanned book of hundreds of pages could fit into just a few MB
@Yupppi 2 years ago
I see a new Computerphile with Prof. Brailsford's face and my week is immediately better. I even got to walk inside his home a little bit this time! After seeing bad photocopies of 80's device manuals, I too can get behind their obsession about PDF quality. Even the manufacturer's archives have that poor photocopy, and the original print could've been subpar.
@Graham_Rule 2 years ago
The photocopier/scanner at work can scan to PDF/A which generates searchable text by doing OCR. Being internet enabled it can then send a copy by email (possibly bcc'd to Xerox or other third parties without our knowledge).
@lablnet 2 years ago
Nice, I'd love to see more videos like these
@SeanBZA 2 years ago
Also, different types of PDF creator give different file size outputs. Firefox PDF is massive, often bigger than the original, as it is a PDF of the page as it would be sent to the printer, but the PDF output from Debian is a lot smaller, just a file with the fonts and text, as the original document had.
@delhatton 2 years ago
OCR for pure text. Maybe OK. It will still require editing. OCR for numerical data, like some Excel sheets, by the time you've verified all the numbers, you might as well have retyped it.
@MartinOmander 2 years ago
Excellent video! I have a request for future videos: please consider keeping the camera still if the subject is stationary. The shakycam effect unfortunately made me seasick and distracted from the professor's excellent performance.
@AleksyGrabovski 2 years ago
Can you also do a video on DJVU format?
@MrBoubource 2 years ago
My internship topic is to find the paragraphs containing some keywords in PDFs that come in 4 different formats depending on the provider. I am beginning to hate it.
@DT-dc4br 2 years ago
Might be a job for a Linux shell script with awk / grep & sed
@MrBoubource 2 years ago
@DT-dc4br I went with Python (and regexes) because I'm most familiar with it... But holy, what a mess it is to convert PDF to HTML and plain text..
@etziowingeler3173 2 years ago
Hahaha I can imagine
@jashaswimalyaacharjee9585 2 years ago
I am totally convinced that Prof. Brailsford uses this machine 9:58 as his occasional-use Computer. What Peeping Toms like me can observe, there's Alpine 2.21 (fairly latest software compared to the system)
@jashaswimalyaacharjee9585 2 years ago
@ComputerphiIe KZbin Hahaha!
@iabervon 2 years ago
Midway through the video, I was distracted by recognizing that Professor Brailsford uses the same program for email that I do. I often solve crossword puzzles that I get as PDFs, and it's interesting to see whether the program that made the PDF put the text of the clues in the logical order that you'd read them, or if it went top to bottom, left to right, ignoring columns.
@zombiegeorge749 2 years ago
2:42 what's up with the edges of the screen?
@Computerphile 2 years ago
if you read the small text on the "newspaper" it helps explain it a little :) -Sean (basically I rotated it a little to fix my wonky camerawork and missed zooming it in)
@pierreabbat6157 2 years ago
Many of my programs output PostScript, which can be converted to PDF. I've seen many PS files get bigger when converted to PDF; I just checked one which is 4.5 times as big in PDF as in PS. I also once wrote a PS file using the random number generator and converted it to PDF. The converted file lost the randomness. I'm a surveyor and download maps in PDF from register of deeds sites. The old ones are scanned, of course. But the ones drawn with CAD are, I think, also scanned. They should be taken from the PDF output of the CAD program, except that the signature is written on paper (or clear plastic sheet), which poses a problem. Digitizing the numbers from a printed copy of the plat can result in illegible numbers (is that a 6, an 8, or a 9?).
@saranchance5650 2 years ago
PDF has additional accessibility features that the variants you described make possible
@henke37 2 years ago
Fun fact: the pdf format is so complex that it literally includes functionality for executing arbitrary shell commands. As a feature.
@No0utlet 2 years ago
At 2:30, it appears that the video of Prof. Brailsford is overlaying a video of the paper on his table and is rotated a very slight amount. Are there any video editors out there that could explain how that might happen by accident?
@soccerox817 2 years ago
Exactly why I can't stand when people just ask for a PDF or send a poorly rendered PDF. Gotta write documents in LaTeX and export a quality PDF
@peterwhitey4992 2 years ago
LaTeX is overrated.
@miran248 2 years ago
@peterwhitey4992 Wouldn't say overrated, but maybe overkill in most cases. Something like Markdown should be more than enough for simple stuff (w/o math equations, ..)
@peterwhitey4992 2 years ago
@miran248 - I know it's practical to write in, but it's the result that I find overrated. You can always tell when a paper/book is written in LaTeX. They all look the same. Especially textbooks written in LaTeX are generally not very good.
@Platoqp 2 years ago
@peterwhitey4992 It is excellent for writing that includes mathematics and other scientific formulas
@michaelb2047 2 years ago
@peterwhitey4992 I would say most natural science textbooks are written in LaTeX. You can change everything so you won’t notice that it was actually written with LaTeX. You notice it only if they use the default template / font. Also, they are often much cleaner / more consistent than "Word" books, for example.
@marsgal42 2 years ago
In a past life I did a lot of work with PostScript and one product we developed was a PostScript sanitizer that would take any deranged PostScript you threw at it and output well-behaved well-structured PostScript suitable for further processing. We got the idea from generating PDF then printing it to a file with Adobe's PostScript printer driver.
@superfluidity 2 years ago
If you can, don't just aim for the highest quality that your audience demands - aim for quality far beyond that. That will give you more freedom to rework the document later if you want to.
@lawrencedoliveiro9104 2 years ago
12:03 It looks like a scan that has been quantized into a bilevel (black and white only, no greys) bitmap. Those little hairy extensions on the edges are characteristic of that.
@TimothyWhiteheadzm 2 years ago
Expecting a certain quality of content from the pdf format is as ridiculous as expecting quality content on a web page. A container is just that. It can contain flowers, or manure. As for the OCR feature, that is great, but one wonders if that is part of 'pdf' or part of the tool that creates the pdf?
@harshjinger 2 years ago
Idk... About this... I would love to know more... Commenting for any followups
@majorgnu 2 years ago
It's a feature of the software that produced the PDF, obviously. Even if the format was extended at some point with features that facilitate this kind of use, the file itself still only contains the *result* of the OCR process, which was performed by whatever applications were used to produce it.
@drawapretzel6003 2 years ago
Well, it's not in the free version of Adobe Reader, that's for sure. There's lots of free OCR software that can OCR a PDF for you, but yes, it's included in the tools for actual PDF creation software too.
@HetareKing 2 years ago
The actual OCRing happens in the creation tool, but this whole notion of having a bitmap overlay invisible text has to be encoded into the file and so the format has to support it. And since this functionality only really makes sense in the context of the OCR feature, I think it's fair to say it's part of "PDF".
@JNCressey 2 years ago
I suppose if the creator of the pdf has a bitmap with text that is obviously unOCRable (maybe stylised text) they would manually add the hidden text, getting the same effect but without OCR. Styles that come to mind that OCR wouldn't work well on could be extra objects between the letters (google doodles), people posing in letter shapes (it's fun to stay at the YMCA), drawing just the negative space, bubble text or drawing just the shadows of the text, leaving out lines (E as 3 horizontal lines, A without the horizontal part), or using characters of other alphabets that look similar (like in r/grssk).
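For the "invisible text over a bitmap" idea discussed in this thread, here is a minimal sketch of what the hidden layer can look like inside a page's content stream. It assumes the common approach of text rendering mode 3 (the `3 Tr` operator, meaning the glyphs are neither filled nor stroked), and a font resource named /F1 that is purely hypothetical. Real OCR tools also position and scale each word to sit over its place in the scan, which is left out here.

```python
def invisible_text_ops(text: str, x: float, y: float,
                       font: str = "F1", size: float = 10) -> str:
    """Build PDF content-stream operators for one run of hidden text.

    The font name is an assumed resource name; a real page must define it.
    """
    # Backslashes and parentheses must be escaped in PDF literal strings.
    escaped = (text.replace("\\", "\\\\")
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    return "\n".join([
        "BT",                     # begin text object
        f"/{font} {size:g} Tf",   # select font and size
        "3 Tr",                   # rendering mode 3: invisible
        f"{x:g} {y:g} Td",        # move to the text position
        f"({escaped}) Tj",        # show (but do not paint) the string
        "ET",                     # end text object
    ])


if __name__ == "__main__":
    print(invisible_text_ops("crisp (not crispy)", 72, 700))
```

Because the text is still present in the page, a viewer can search and select it even though only the scanned image is visible, and the same operators work whether the string came from OCR or was typed in by hand.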
@PemboCycling 2 years ago
Didn't Techmoan do a video on the company that was purchased by Adobe for OCR, as they made a text-to-speech program that the Techmoan video was covering?
@trollhunter200 2 years ago
You are just awesome Professor. 👍👍👍
@b391i 2 years ago
Awesome as usual 😇
@PswACC 2 years ago
What software on Linux are you using to activate OCR search ability?
@Ice_Karma 2 years ago
Prof. Brailsford, do you still use PINE, or Alpine? =D (PINE user since 3.87...)
@xelaxander 2 years ago
What’s the software Prof. Brailsford is using? I’d really love to search through some older mathematical books.
@ieperlingetje 2 years ago
4:24 Sean often gets camera settings wrong and things come out blurry, so here's an animation to hide that.
@kakka4462 2 years ago
2:31 whole clip is tilted showing background clip of table rug?
@davidgillies620 2 years ago
I primarily generate PDFs with pdflatex, using EPS or PNG for embedded graphics, so I get searchable, arbitrary-resolution output. It looks very nice.
@ZaneDaMagicPufferDragon 2 years ago
PDF FTG FTW 🙌🏻 I LOVE ❤️ PDF AND ITS PROGRESS IS AMAZING 🤩 GREAT VIDEO PROFESSOR 👨🏻‍🏫 BRAILSFORD!!!
@Chobungus 2 years ago
Can someone clarify for me, when he is going over the "hideously complex mathematical equations" @ 9:19, he says that you do not want to have to type that out character-by-character. Yet he then demonstrates that he is able to zoom in greatly while preserving quality. So how did he translate the bitmap image to that high quality type set?
@Computerphile 2 years ago
In this case that's exactly what the Prof is working on, recreating this important document page by page using similar software to what Dennis would have had available - Professor Brailsford talks about it in a recent video but it has been an almost full time job for him for a while now! -Sean p.s. if you see the two pictures early in this video you'll see that a version of the Thesis Dennis held was damaged but one his friend had reviewed is OK - The damaged one has amendments so this is a difficult task!
@Chobungus 2 years ago
@Computerphile Thanks for the reply! Great video!
@UncleKennysPlace 2 years ago
My day job is assembling documents in PDF format for aviation certification. It's shocking how many engineers send everything as PDF, even bitmaps, when I know they had to convert them, despite instructions saying we can work with any format that their native applications produce.
@bhargavk1515 10 months ago
Sir, how do I learn PDF format encoding? Is there any guide?
@jeromethiel4323 2 years ago
I worked for a company, and we had electrical prints that were paper only. We paid a company to generate CAD files of the prints. What they did was insert scans of the paper copy into the CAD software, which isn't what we wanted. They basically screwed us over big time. The whole point of having them in CAD format was so that we could edit the bloody things!
@danielmnet 2 years ago
If Prof. Brailsford is explaining, I am interested; the subject doesn't matter.
@bhargavk1515 10 months ago
Can you make a tutorial (or is there one already) on how Prof. Brailsford restored the bitmap PDF into PDF encoding?
@Rubrickety 2 years ago
Fascinating video with perhaps the least clickbaity title in history.
@HugoOneYT 2 years ago
To me, PDF is about compatibility. There's a reason why all invoices are PDF: everything can open it.
@Smogshaik 2 years ago
I would love a video about the PDF/A format!
@bartas9693 2 years ago
It's OK, I'll send you a PDF.
@SimGunther 2 years ago
Yeah, but what? Image, full, text?
@anarchist 2 years ago
8:40 4:3 monitor because nothing can throttle Brailsford's brain power. Not PDF, but something that tickled me when working with TIFFs was the joke that it stands for "Thousands of Incompatible File Formats".
@ahmetardaedogan6697 2 years ago
Could you explain Harris corner detection?
@PhilipStorry 2 years ago
How do I subscribe to Vague Magazine? If it has high quality reminiscing from Professor Brailsford, then I need a subscription! 😉
@Fre1maurer 2 years ago
My first PDF was the manual of the flight simulator game TFX back in 1994; it was the budget re-release without a printed manual. There was Adobe Acrobat Reader for MS-DOS on the game CD, and holy crap was the quality of the document bad (and the clumsy Reader itself was not much better). They obviously simply scanned a real printed manual and saved it as images with something like 4-bit grayscale, and the text sections looked like plain 1-bit black-and-white without any anti-aliasing. I never thought this text for the poor called PDF could be a thing in the future.
@unlokia 2 years ago
Prof Brailsford: The font of all PDF knowledge.
@LoesserOf2Evils 2 years ago
If you can decompose the PDF into the text and the graphics and then recreate them in a word processing document, that can help. Then drop the document into Adobe InDesign for a better and tighter layout. I admit that's a lot of effort, but sometimes it's worth it; and if the PDF standard changes in the future and it's important to produce the document in the new standard, it'll be far easier.
@SteveMacSticky 2 years ago
Very well explained
@travcollier 2 years ago
Do folks not use the terms vector and raster anymore?
@samuelworsnop9983 2 years ago
I really want to know what Professor Brailsford's favourite font is!
@DrSteveBagley 2 years ago
Optima I suspect.
@lakompee 2 years ago
Comic Sans
@johnno4127 2 years ago
@lakompee Papyrus
@johnholland7497 2 years ago
I'd love to know which software you used to convert the PDF with just bitmaps into one with searchable text. Is it open source?
@igorthelight 2 years ago
I know about "ABBYY FineReader PDF", which is neither open source nor free. Maybe there are others.
@beakmann 2 years ago
There is Tesseract.
@pedrofurla 2 years ago
How does DjVu fit into the great scheme of PDFs?
@oposkainaxei 2 years ago
4:30 OCR Systems
@power-max 2 years ago
2:30 Why is the video askew? And why is it that when I google that word, the search results are askew?
@PhilBoswell 2 years ago
As to the first, see Sean's answer elsewhere on this page; as to the second, that's a Google Easter Egg.
@peterwhitey4992 2 years ago
Isn't it obvious why the results are askew?
@power-max 2 years ago
@PhilBoswell Yeah, I know, that was just the joke.
@sweting 2 years ago
Please enable auto-generated captions if you are unable to provide custom captions. Removing auto-generated captions when they are automatically provided means that people who need assistance with hearing will have nothing to fall back on.
@UnOrigionalOne 2 years ago
One could argue similar points for video.
@John_Fx 2 years ago
He barely scratched the surface of the complexity of PDF formats. Didn't even cover PDF/A or why you should never redact a PDF and send out that original file.
@Jebusankel 2 years ago
There is a true Redact function in Adobe Acrobat. You just have to use that instead of drawing a box on top. Ditto on PDF/A though.
@StevenSeiller 2 years ago
🔆 The request for a PDF should be followed by the question, “What for?” Its intended use will dictate how it should be generated. ⁉️On a related note, isn’t it so fun to be asked for a specific file format by someone who doesn’t know why, nor the necessary specifications, while they assert that you are the one who is making the process complicated by asking so many questions?!? 🤔
@volodyadykun6490 2 years ago
4:18 great newspaper
@miran248 2 years ago
.5btc - that's one expensive newspaper :)
@klaxoncow 2 years ago
@miran248 Or maybe not. Depends how well Bitcoin's doing at the time. Virtual currency, yes. Anchored currency, no.
@rudiklein 2 years ago
A great talk, scrolling printer paper and a flashy shirt. What else does a video need?