Optical Character Recognition (OCR) - Computerphile

190,547 views

Computerphile


OCR isn't just about scanning documents and digitizing old books. Explaining how it can work in a practical setting is Professor Steve Simske (Honorary Professor at the University of Nottingham as well as Director & Chief Technologist at HP Labs' Security Printing Solutions).
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscom...
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 220
@JeaneAdix 7 years ago
More of this guy please. He's a genius
@ahmedal-attar2393 6 years ago
It's a must.
@retop56 7 years ago
This guy is a beast with mental math. Also I just tried this with the Google Translate app and it's incredible. It just translates text almost instantaneously. So cool!!!!!!!!!
@marcofuhrer3872 6 years ago
Well, it looked fast-paced, and the 3.85 is correct, but 3.85 divided by 5 is not .79. Otherwise a very nice video with dense information.
@SuperYtc1 3 years ago
A lot of the math is wrong/slightly off.
@micmacha 2 years ago
This is probably the most fluid explanation of OCR that I've gotten, thank you!
@AfonsodelCB 7 years ago
the world needs more people like this man, able to explain so much in so little time
@srdjadzogaz286 7 years ago
Now this is an excellent video.
@ComputersRULE 7 years ago
Was it just me or is the audio not synced with the video?
@chewyfruitloop 7 years ago
PmMeInsteadOfReplyI'mSwitchingChannels looks like the start is repeated in the middle, but the audio is fine
@snowballeffect7812 7 years ago
Excellent guest and video.
@ASHWIN7575 3 months ago
A detailed explanation; I couldn't have expected any more. I would like Professor Steve Simske to explain OCR again now, given the improvements in the field of deep learning.
@karranz 6 years ago
It's amazing when you watch a video featuring someone who knows what he is talking about. Really masterful.
@RobertT1999 7 years ago
I've always looked at OCR as a way to be able to recreate a text document that is printed out when a virtual copy doesn't exist rather than it being a form of compression. That's a very good way of looking at it.
@unbreakablefootage 7 years ago
lool 0:06 i love how the word "Recognition" changes to "recognition" and then formats a little further
@RobertT1999 7 years ago
unbreakable footage That's some mighty fine editing. I didn't even notice that until you said.
@BlackHoleForge 7 years ago
I appreciate all the little tidbits, and the basics of who did what, which lets us be where we are today.
@JustinWarkentin 7 years ago
I like this guy, but I want to know so much more. Bring him back!
@ThatManMelvin 7 years ago
The further I get into my Computer Science studies, the more of the content of these videos I already understand and have implemented once or twice. Shows me I actually learned some stuff in uni haha
@simonstrandgaard5503 7 years ago
Great walkthrough. OCR is hard. I wonder what self driving cars do to recognize signs and texts?
@Rising_Pho3nix_23 7 years ago
Break the image chunk into a 3x3 grid using pixel color and edge detection. Let's say I feed it an R. Going row by row, then column by column, and using a grayscale value to measure stroke density, I get a signature of {1,1,0.5}, {1,1,0.5}, {0.25,0,0.25}. For row 1, column 1: 1 = "BDEFMNPRSW", 0.5 = "...". Once you have gone through the 4 possible grayscale levels for all 9 segments of the character (36 calculations), you can work out what the character is by asking "Which characters fit into 90% of these arrays?" This will accommodate handwriting and computer fonts. If more than one letter fits, use Zipf's law to estimate the likelihood of each letter. (Zipf's law is the observation that the most common letter or word occurs roughly twice as often as the 2nd most common, three times as often as the 3rd most common, and so on. It holds approximately across many languages and contexts, and extends well beyond the scope of this application.)
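The grid signature described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the glyph is already binarised into rows of 0/1 pixels and is at least 3x3, the four density levels are taken to be 0, 0.25, 0.5 and 1 as in the example signature, and the names `grid_signature` and `match_fraction` are hypothetical, not from any OCR library.

```python
# Assumed density levels, read off the example signature in the comment.
LEVELS = (0.0, 0.25, 0.5, 1.0)


def grid_signature(glyph, rows=3, cols=3):
    """Return a rows x cols grid of ink densities snapped to LEVELS.

    `glyph` is a list of equal-length rows of 0/1 pixels (1 = ink).
    """
    height, width = len(glyph), len(glyph[0])
    signature = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        row_sig = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            density = sum(cell) / len(cell)
            # Snap the raw density to the nearest allowed level.
            row_sig.append(min(LEVELS, key=lambda v: abs(v - density)))
        signature.append(row_sig)
    return signature


def match_fraction(sig_a, sig_b):
    """Fraction of grid cells on which two signatures agree."""
    pairs = [(a, b) for ra, rb in zip(sig_a, sig_b) for a, b in zip(ra, rb)]
    return sum(a == b for a, b in pairs) / len(pairs)


# A tiny 6x6 test glyph: ink only in the top-left corner.
glyph = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
signature = grid_signature(glyph)
```

A candidate letter would then be accepted when `match_fraction` against its stored signature clears some threshold. Note that with only 9 cells, a 90% threshold means all 9 must agree, so a real implementation would likely need a finer grid or a softer comparison.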
@anywallsocket 7 years ago
i just love how this CS dude chops up string theory in a way physicists themselves can't swallow.
@marcmarc172 7 years ago
High quality interview with only one bad string theory reference.
@Htarlov 7 years ago
Great video. There are a few more problems that good OCR engines solve:
- What if characters are broken up by gaps? How do you connect the pieces?
- What if some characters are merged together, as in handwriting?
- How do you connect words into lines of text, especially when the scan does not have straight lines and there may be gaps?
- How do you correct for perspective and rotation?
Those problems add to the complexity of an already difficult task.
@symbioticcoherence8435 7 years ago
Wow, he was fast at dividing in his head there
@camillolukesch6217 7 years ago
Symbiotic Coherence .83 not .82, please.
@MilanNedic94 7 years ago
Caught my attention, too!
@patrickssj6 7 years ago
The /5 part? It's the same as /10 *2 which can be done pretty fast.
@oOBL4CKH4WKOo 7 years ago
3.85/5 = 0.77 and not 0.79
@xbronn 7 years ago
yep thats important
@askmiller 7 years ago
"I can see that very quickly that this is above .8 In fact it's .82" that was the most fascinating part of this whole video for me. This is edited right? There's no way he just found an average like that in under 5 seconds.
@Flankymanga 7 years ago
One of the best computerphile videos ever! Hats down... thank you!
@djaksonclebergoncalvesfilh9513 3 years ago
My God, what a flawless class.
@XSpImmaLion 7 years ago
Very nice explainer on an extremely complex subject to talk about with people who don't know how it works...
@steveregan7149 7 years ago
This is really interesting, thank you for the video.
@braggyaggie9750 7 years ago
Love the detail, set of applications, and history in this video
@sneaky_tiki 7 years ago
No need to apologize for detailed explanations Mr. Simske :D
@_taylor_v 7 years ago
This is a great video; he hit the perfect level of abstraction for this type of video. Going to check out the tesseract github repo, thanks!
@DFPercush 7 years ago
I know there's only a certain level of detail you can get into in a 15 minute video, but I'm curious about the part where you come up with the numbers that say how well the characters match. That seems like the key to the whole thing.
@pwarelis 7 years ago
Give this man money so he can guide humanity forward
@Linkmat97 7 years ago
Is there a video explaining the basic algorithm behind Shazam? If not, it should be an excellent video to watch.
@mystwalker479 5 years ago
Shazam basically compares the recorded clip against its music library. It checks whether the frequencies in the recording match a track for at least 5 seconds, and from there the program makes a final decision.
@animenosekai_edit 3 years ago
Btw “stop” signs in France are also red with “STOP” written on them
@AngryArmadillo 7 years ago
Fantastic guest! We need more of him.
@louishuynh951 5 years ago
Great explanation! Best overall match score is actually .83 (not .82) but the point is clear anyhow.
@felipeperilla 6 years ago
Great video, guys. Thanks! I'd like to ask a couple of questions. Is ABBYY the best option available for OCR in Spanish? What about OCR for handwritten texts? Is there something already available? Would I be better off using Python and Tesseract? I would very much appreciate your help.
@Nexfero 7 years ago
OCR-A is the coolest looking font, you should go into fixed-width versus proportional fonts
@gamesandstuffs 1 year ago
Would OCR be suitable for highlighting a document that requires redaction for confidentiality, i.e. flagging certain keywords that were disclosed through human error? Cool stuffffff
@sergheiadrian 7 years ago
I can certainly appreciate the complexity of the OCR, the font matching and all that, but 99.9% of the time, what I want from my OCR software is the raw text which I will then export into a word processor and sort it out myself.
@Nemorosum 6 years ago
This video helped tremendously with a project I'm working on for work.
@Utkarshkharb 3 years ago
Very articulate and beautifully explained.
@Integralsouls 4 years ago
tesseract, my fav band and now my project library
@user-fy5go3rh8p 4 years ago
Best explanation so far, just perfect. Thank you!
@ChristopherSprance 7 years ago
This guy sounds a lot like Will Forte. Great videos as always
@larskendall7101 7 years ago
Christopher Sprance Hahaha I was thinking the exact same thing!
@Dimiranger 7 years ago
This guy speaks so eloquently, impressive.
@henrikwannheden7114 7 years ago
Wow! Excellent! More from this guy!
@allie-ontheweb 7 years ago
This OCR better be easier than that A-Level Unified Physics paper last week
@mennonis 7 years ago
does this man experience the world at half speed? hes so fast
@Stubrok 7 years ago
Great guest, great explanation, great vid, thanks.
@xilluminati 3 years ago
He would make a great professor!
@Tahgtahv 7 years ago
I'm surprised he didn't even mention image skew or rotation. That's probably an important step in the process, but I'm not exactly sure if you'd do it before a (local) binarization, or during the pattern matching stage.
@allmycircuits8850 7 years ago
In ScanTailor (one of the programs for processing scans before passing them to OCR or converting them to PDF/DjVu), skew correction comes first, then determining the working area, and in the final step Otsu binarization. For automatic skew correction, though, the image actually gets downsampled (say, from 300..600 dpi to 100 dpi) and binarized first, because these algorithms usually work on black-and-white images as well. They rely on some pretty simple ideas, like finding rows that are all white (so they lie between lines of text) and then turn very dark overall (the tops and bottoms of letters), etc.
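For readers who have not met the Otsu binarization mentioned here, the following is a minimal pure-Python sketch of the standard method: pick the grey level that maximises the between-class variance of the histogram. It assumes 8-bit grey levels supplied as a flat list; the function names are illustrative and this is not ScanTailor's code.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    `pixels` is a flat iterable of integer grey levels in 0..255. Pixels with
    value <= t form the dark class, pixels with value > t the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # light class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (ink, dark) or 0 (background, light)."""
    return [1 if p <= threshold else 0 for p in pixels]


# Toy image: half dark text pixels, half light paper pixels.
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
mask = binarize(sample, t)
```

On real scans with uneven lighting a single global threshold like this often fails, which is one reason local (adaptive) binarization comes up in the thread.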
@1cheryl 7 years ago
Wonderful video... explains the subject very well. I could learn a lot from more videos like this!
@HoD999x 7 years ago
i've been wondering how our brains can do it. contrary to what is explained here, i can read text even when its rotated or entirely upside down
@josedihego 3 years ago
Do you know Read4Me? It is an app for Android and iOS that is pretty decent and works entirely offline, no data plan or wifi needed. It saves time by allowing you to use the extracted text in so many ways.
@ELYESSS 7 years ago
In France, the stop signs say STOP
@plemli 7 years ago
ILYES "Arrêtez-vous" doesn't quite fit so well on an octagonal road sign. قف does.
@EebstertheGreat 7 years ago
Apparently "stopper" is attested in French since 1792, obviously from the English "stop." But in Quebec, signs may read either "STOP" or "ARRÊT" (and in other parts of Canada, sometimes both are used on the same sign). Presumably the noun form is used because it is shorter than the verb "ARRÊTEZ".
@Blast-Forward 7 years ago
FYI there are also yield signs in Australia with YIELD written on them, while e.g. in most european countries these signs are just blank.
@roidestrolls4934 6 years ago
yeah the guys who wanted to translate a STOP sign was a moron or it was just a sht example.
@ezzywizzy1049 5 years ago
I can confirm.
@GregoryMcCarthy123 7 years ago
This guy knows his stuff!!!!!!
@abraxis_602 5 years ago
very high quality video
@RossumAI 6 months ago
We love this video!
@igt3928 7 years ago
Great episode
@mohamadhallak8644 3 years ago
It is an awesome video, thank you very much.
@Holobrine 7 years ago
I'm more fascinated with machine learning OCR.
@DeathBean89 7 years ago
What he demonstrated in the video is basically how it's done in "ML" OCR. Pre-process the data (thresholding), classification of the objects to their respective letters (through one of the machine learning algorithms that he mentioned -- SVM, deep learning, HMM, etc.), and then determine the word and the font once you have the letters (which may be done using population stats, as described in the video, or through some other ML algorithm if you want to get fancy).
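The last step in this description, picking the word once each glyph has per-letter scores, can be sketched as a simple average over positions. The scores below are made up for illustration and are not the numbers from the video; `word_score` and `best_word` are hypothetical names, and a real engine would also weight candidates by how common each word is.

```python
def word_score(position_scores, word):
    """Average the per-position match scores for the letters of `word`.

    `position_scores[i]` maps each candidate character at position i to a
    match score in [0, 1]; a character with no entry scores 0.
    """
    if not word or len(word) != len(position_scores):
        return 0.0
    total = sum(scores.get(ch, 0.0) for scores, ch in zip(position_scores, word))
    return total / len(word)


def best_word(position_scores, candidates):
    """Return the candidate word with the highest average match score."""
    return max(candidates, key=lambda w: word_score(position_scores, w))


# Made-up classifier output for a four-glyph image.
scores = [
    {"S": 0.90, "5": 0.80},
    {"T": 0.95, "I": 0.40},
    {"O": 0.70, "0": 0.75},
    {"P": 0.85, "R": 0.60},
]
winner = best_word(scores, ["STOP", "5T0P", "SIOR"])
print(winner)
```

Here "STOP" wins even though "0" scored slightly higher than "O" at the third position, because the average over the whole word is what gets compared.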
@The141192 4 years ago
Nice explanation. I just have one question: can OCR extract data from different documents? Consider the case of suppliers sending delivery notes to the OEM plant. Each supplier has a different format; can OCR help to extract data from those delivery notes? Of course the data to be extracted is always the same, e.g. Order ID, Qty., Material, etc. Thank you.
@navneetkrc 4 years ago
Check the solution by Nononets. That might be of some help. Anyway, I feel that you need to provide a decent amount of training data. How to get training data: take your important data from a data source and print it or export it to PDF in different formats, and there you have unlimited training data for each data type.
@parasjuneja7709 5 years ago
excellent explanation sir
@kevindeland9079 4 years ago
Great choice of green highway sign. “We were somewhere around Barstow, on the edge of the desert, when the drugs began to take hold.”
@MrNateSPF 7 years ago
I've got Optical Character Detection. That is I can tell if there's a character, but have no clue what it is.
@PrajwalSingh15 7 years ago
Perfect, thanks for the video on OCR
@AxelWerner 7 years ago
Well... that is "state of the art" OCR today. But what about the early days of OCR, like en.wikipedia.org/wiki/Westminster_(typeface), when there were just one or two "fonts", no CPU power at all, no machine learning, and yet still a need to automate "reading"? How did they do this "simple OCR" technically back then? Please go more into the history, the low level, the beginnings.
@marioprawirosudiro7301 7 years ago
I went to your Wikipedia page and found a link to this: en.wikipedia.org/wiki/Magnetic_ink_character_recognition Hopefully it sates your curiosity :)
@pranayreddy2728 3 years ago
Just one request: can we get subtitles for the videos, so we don't miss the important documentation names when they are mentioned?
@SouravTechLabs 7 years ago
Is "classifization" even a word? However, this was really an awesome video!!
@tiagoandrade7945 1 year ago
amazing video explanation :)
@hamidtavakolian7121 3 years ago
Excellent, thank you.
@indiansoftwareengineer4899 6 years ago
Is there any library for handwritten character recognition (HCR) in Python that can output text from supplied images, or in real time from video?
@t_kon 5 years ago
You might want to combine OpenCV and Tesseract to get the text out of an image or video.
@xanokothe 7 years ago
Excellent video!
@peto348 7 years ago
I have never seen OCR that just works. I have tried tesseract-ocr on screenshots (with no rotation and distortion), and still too many words contain errors. Yes, Google translate is nice, but it can afford to guess what word you are looking at even if there are few wrongly recognized characters. Replace some letters with similar and it still shows translation. Or point it on random icons or noise on your screen. Clearly there is no text, but it still shows random guesses. So yes, google translate tries to match ocr results to words - which I consider cheating. If you want from OCR software to find mistakes and show what is really written, it won't know if it is mistake in text or wrong character recognition. Y0u Can cleariy see these nistakes and you can be sure that it's not problem of your in-head OCR.
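The dictionary matching this comment objects to can be illustrated with Python's standard `difflib` module. This is a sketch of the general idea only, not how Google Translate or Tesseract actually work; the vocabulary and the `correct_word` helper are assumptions made for the example.

```python
import difflib


def correct_word(raw, vocabulary, cutoff=0.6):
    """Return the closest vocabulary word to `raw`, or `raw` if none is close.

    Similarity is difflib's ratio in [0, 1]; `cutoff` is the minimum ratio
    a vocabulary word needs to be accepted as a correction.
    """
    matches = difflib.get_close_matches(raw.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


# Tiny made-up vocabulary covering the sentence quoted in the comment.
vocabulary = ["you", "can", "clearly", "see", "these", "mistakes"]
noisy = ["Y0u", "Can", "cleariy", "see", "these", "nistakes"]
cleaned = [correct_word(word, vocabulary) for word in noisy]
```

The sketch also shows the limitation the comment points out: once a word has been snapped to the vocabulary, there is no way to tell whether the original text contained the mistake or the recogniser introduced it.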
@Cruzz999 7 years ago
Would it be possible / worth the effort to have the OCR program look for common shapes that contain the text? Rectangles, for example, and use this in relation to the letters, as a method to fix letters that are otherwise skewed and difficult to read?
@mgord9518 1 year ago
Yes, but there's no guarantee that what appears to be a rectangle is actually one. If a document is photographed head-on and contains a trapezoid or diamond, the software might interpret them as rectangles, and unskewing the "rectangle" will skew the text, causing a negative effect.
@ahmedal-attar2393 6 years ago
I tried to learn more about OCR, and this video covers it all concisely. The problem with Tesseract for the last 3 years has been a lack of resources, and my project is still stalled because of that. Please advise on OCR training.
@mariosactron7617 7 years ago
Wow this guy is fantastic!
@davidgoffredo1738 7 years ago
This guy's the real deal.
@Rising_Pho3nix_23 7 years ago
The "T" could also be a fence post, not an actual letter at all. Just like an O could be a wedding ring, letter, number, uppercase, lowercase, doughnut. These things need to be considered when building this up around a camera. People like to play around with apps and find funny things to make them do. It would be hilarious if an OCR dev knew about the O bug and when you showed it a picture of a doughnut instead of it showing a letter O, it said "yummy" :)
@Maisonier 1 year ago
Now there are websites that detect the fonts in any image. Why not start with this: detect the font. If it isn't detected, ask the user to make some selections, then start the OCR by comparing against that font.
@mystwalker479 5 years ago
But from a programmer's perspective, you'd have to read all the pixels from top to bottom. Single letters are easy to detect, but what if it was a sentence? With one vertical scan line of pixels you'd hit dots belonging to different letters. How do we know the X and Y extents of each character so that each one can be scanned individually?
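One common answer to this question is a projection profile: count the ink in every pixel column of a binarised text line and cut wherever a column is empty. The sketch below assumes the line is already binarised, deskewed, and printed with gaps between characters; `split_columns` is a hypothetical helper, and connected-component labelling is the usual alternative when characters overlap horizontally.

```python
def split_columns(binary_line):
    """Split a binarised text line into (start, end) column ranges, one per glyph.

    `binary_line` is a list of equal-length rows of 0/1 pixels (1 = ink).
    A glyph is taken to be a maximal run of columns that contain any ink;
    `end` is exclusive.
    """
    width = len(binary_line[0])
    ink_per_column = [sum(row[x] for row in binary_line) for x in range(width)]

    spans, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                 # first inked column of a new glyph
        elif not ink and start is not None:
            spans.append((start, x))  # blank column closes the glyph
            start = None
    if start is not None:
        spans.append((start, width))  # glyph touching the right edge
    return spans


# Two-row toy line containing three separate blobs of ink.
line = [
    [1, 1, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
]
spans = split_columns(line)
```

Running the same idea over rows instead of columns gives each glyph's vertical extent. The approach breaks down when characters touch or are split by gaps, which is where the harder segmentation methods come in.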
@djdexcat 7 years ago
I spent all of last night setting up a PC game that was released exclusively in China. I took tons of pictures of option menus and the like with the Google Translate app to set it all up to my liking. And as soon as I needed a break from that, what do I see in my subscription box? This. What are the odds?
@geurra6559 7 years ago
The secret brother of Woody Harrelson? Great video btw!
@educationaltechnology8363 2 years ago
Inspiring! Can this be done using a Raspberry Pi or similar?
@davidrubio.24 3 years ago
He felt that deep learning was too hard to explain, so he made a string theory analogy to help us understand.
@GroovingPict 7 years ago
Why oh why is there still no OCR software that can even remotely adequately do Fraktur or other blackletter typefaces? Considering the amount of historic text there is in such typefaces, you'd think this would be considered important by some to invest in. I know, I know, ABBYY had a dedicated blackletter version a while back, but it too was absolute shite at the job it was supposedly designed to do.
@Shivampandey-cb2oq 5 years ago
Thank you, man ! you make it great!
@RichardReikowsky9005 7 months ago
I've got all the files and folders for Tesseract open; how do I install it from the list? Thanks in advance.
@Lostpanda123 7 years ago
Great tutor!
@obiwantschernobyl5650 6 years ago
It's actually 0.77. I guess he first took 3.50 out of 3.85, which divided by 5 equals 0.70. Then he divided the rest by 5, which equals 0.07 and not 0.09. 0.70 + 0.07 = 0.77. He mistook the rest for 0.45 instead of 0.35. That's why he got 9 at the end...
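The arithmetic in this comment can be checked in a few lines. The sketch uses `Decimal` so the result is exact rather than subject to binary floating-point rounding; the value 3.85 is taken from the comments above, and the "mistaken" line only reproduces the slip this commenter suggests, which is a guess on their part.

```python
from decimal import Decimal

total = Decimal("3.85")  # sum of the five scores, as quoted in the comments
n = 5

# Split 3.85 into 3.50 + 0.35 and divide each part by 5.
whole_part = Decimal("3.50") / n   # 0.70
remainder = Decimal("0.35") / n    # 0.07
average = whole_part + remainder   # 0.77

# The suggested slip: treating the remainder as 0.45 instead of 0.35.
mistaken = Decimal("3.50") / n + Decimal("0.45") / n  # 0.79
```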
@amorphant 7 years ago
Is that a young Ira Graves?
@BrentAureliCodes 7 years ago
It comes out at '.83'. Sorry, couldn't let it go.
@AmlanjyotiSaikia 7 years ago
We had to implement Otsu's method in MATLAB as part of our Introduction to Digital Image Processing course. It's a pretty nifty little algorithm.
@hamadosman8073 7 years ago
Numberphile vs Computerphile
@xandercrox 7 years ago
Love 'em both! I like the contrast in this video of the dot matrix sketches from brown paper of Numberphile. Brady films in a very human, obviously handheld over the shoulder shot. Here, the Sketches are shown from a static tripod, from an inhuman, oddly unsettling, though at the same time much more effective angle (Is that shot new BTW? stood out as novel to me).
@Computerphile 7 years ago
+Alex W clamped second camera on bookshelf pretty much above him so the angle was a little odd as was off to the side but hopefully made the diagrams clear! >Sean
@procactus9109 7 years ago
How far would you get if you put 3 letters through google and get the correct guess ?
@scottwatschke4192 7 years ago
Very educational.
@MaGFarqui 7 years ago
Sometimes I dream I'm looking at something and I know there are letters in there and words but I just can't read them... and it's just so frustrating. I wish I had ocr built into my mind in those cases.
@NikiDaDude 7 years ago
, hon hon hon
@JuusoAlasuutari 7 years ago
Tesseract 4 is amazing compared to Tesseract 3. The difference is that T4 uses an LSTM neural net.
@y2ksw1 7 years ago
I have coded a simple OCR engine which does just that 😊
@MARQUITOSGUALACBA 5 years ago
Hi! This program doesn't recognise numbers on a background. What do you recommend?
@dangee1705 7 years ago
Can you do a video on elliptic curve cryptography? Thanks.