Defining Regular Expressions (RegEx) - Computerphile

  Рет қаралды 83,082

Computerphile

Computerphile

Күн бұрын

Ahead of an upcoming Python implementation, Professor Thorsten Altenkirch goes through the details and definitions of Regular Expressions.
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharanblog.com
Thank you to Jane Street for their support of this channel. Learn more: www.janestreet.com

Пікірлер: 306
@jmvanick
@jmvanick 5 ай бұрын
Regular expressions... something that looks like your cat just walked across your keyboard, but somehow magically work when you have them just right.
@SarahKchannel
@SarahKchannel 5 ай бұрын
well i normally called it chicken feet patterns...
@lauralhardy5450
@lauralhardy5450 3 ай бұрын
Put an AI on it. It'll be able to work out if its still a cat
@DantalionNl
@DantalionNl 5 ай бұрын
"Are you from Germany?", "Yes!, suprise to many, I am from Germany". that was a good laugh, good sense of humor.
@Desmaad
@Desmaad 5 ай бұрын
As if the name and accent didn't give him away.
@jenshauser6351
@jenshauser6351 5 ай бұрын
What accent 😂
@Yxcell
@Yxcell 5 ай бұрын
I honestly thought he was Scandinavian. 😅
@earlgrey8611
@earlgrey8611 5 ай бұрын
@@Yxcellfound the American
@Yxcell
@Yxcell 5 ай бұрын
@@earlgrey8611 haha, you got me. To be fair, though, with a name like "Thorsten" it's no wonder I thought he was Scandinavian.
@3snoW_
@3snoW_ 5 ай бұрын
11:40 - Here I think that the first and third * aren't supposed to be there, the expression should be (b+Ɛ)(ab)*(a+Ɛ) . I believe that the language he was aiming for was "an alternating sequence of 'a's and 'b's". Also (b+Ɛ)* is the same as just b*, which would make adding the Ɛ redundant.
@PeterNickson
@PeterNickson 5 ай бұрын
Glad someone else posted this. Was thinking this myself
@Infermity
@Infermity 5 ай бұрын
Yeah I'm pretty sure thats correct as klein star always includes the empty word in the resultant language.
@BeheadedKamikaze
@BeheadedKamikaze 5 ай бұрын
I think you are mistaken. The * allows any number of 'a's (or none) before the sequence of alternating 'ab's, followed by any number of 'b's (or none). Without the *, you must supply only either zero or one of each.
@lukmigindnuforhelved
@lukmigindnuforhelved 5 ай бұрын
"((ab)+a?|(ba)+b?)"
@GodShiru
@GodShiru 5 ай бұрын
​@@BeheadedKamikazeThat is what we want, though. Either 0 or 1 b, followed by any quantity of ab, then 0 or 1 a. That would allow for an alternating sequence of a and b, starting and ending with either a or b.
@bogdanstamenic2836
@bogdanstamenic2836 5 ай бұрын
I honestly don't understand all the hate for regex. Granted, I'm not a full-time SW developer (I'm an electrical engineer), but as long as you're matching something simple, it's perfectly fine. If it starts to become a mess, because I want to match 50 things, then I'll just switch to writing parser code instead
@MrBluelightzero
@MrBluelightzero 5 ай бұрын
Because it's usually more important that code is readable than writeable.
@Quasi84
@Quasi84 5 ай бұрын
Regular expressions are not known for clarity and speed.
@krum1987
@krum1987 5 ай бұрын
I use regular expressions all the time for quick work and I tend to avoid them in programming unless they make sense to use. It is much easier to read and understand a de-serialized class/struct containing the data from a value. That being said, I use regex quite often for redundant file modifications ALL the time. for example in notepad++ we can assume the data "Test " Test1 Test2 ....etc If I select "Test (\d)" and then replace it with "$1" it will replace "Test 1" with "1" and "Test 2" with "2", because you can grab the group values from the regex. This is of course extremely simplified, but it is very handy!
@sadhappy8860
@sadhappy8860 5 ай бұрын
It gets messy really quickly and mistakes are... always going to happen.
@mage1over137
@mage1over137 5 ай бұрын
Honestly my problem with RE is that there is usually a better solution in python.
@saranobutt
@saranobutt 5 ай бұрын
Just letting you know I used to watch you when I was literally 10 years old and I'm still watching you
@JuusoAlasuutari
@JuusoAlasuutari 5 ай бұрын
You should really take a bathroom break
@BenRHarsh
@BenRHarsh 5 ай бұрын
That doesn't really mean much to anyone else unless you mention how old you are now.
@usopenplayer
@usopenplayer 5 ай бұрын
Well, she joined KZbin 7 years ago. We could guess since she was an avid KZbin watcher at 10, but couldn't create an account until she was 13 (because there is no way she could sign up before this age according to KZbin's guidelines 😂) 7+3 is 10 years of watching the channel.
@OneMadGamer
@OneMadGamer 5 ай бұрын
​@@BenRHarshfr. They could be 11 now.
@vadrif-draco
@vadrif-draco 5 ай бұрын
And may you continue watching and learning
@GijsvanDam
@GijsvanDam 5 ай бұрын
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski, Netscape engineer
@xeus
@xeus 5 ай бұрын
This is one of the stupidest quotes I know from the field of IT. Honestly, if someone uses this quote in all seriousness, I think they are either too lazy to learn RegEx or simply not smart enough to do it. A regular expression may look like a horrible mess, I grant that, but so does any code, even well written, to someone who doesn't understand the language. If someone thinks they understand RegEx, but also think it's a problem, they don't in fact understand RegEx.
@UmaiKayu
@UmaiKayu 5 ай бұрын
@@xeus Devil's advocate: "Honestly, if someone doesn't write all of their code in the native assembly for their CPU architecture, they're simply not smart enough to do it."
@xeus
@xeus 5 ай бұрын
@@UmaiKayu I find your analogue quite unfair, so let me fix it for you: "Honestly, if someone finds native assembly language a mess, they are either too lazy or not smart enough to learn it." It's still a bad analogue, but now it's just a forgivable reductio ad absurdum instead of some utterly pointless misrepresentation of the actual argument.
@UmaiKayu
@UmaiKayu 5 ай бұрын
@@xeus I was hoping we could agree the appropriate solution to increasing problem complexity is using sufficiently high level language constructs, as "just think real hard" scales poorly.
@AlexRoseGames
@AlexRoseGames 5 ай бұрын
@@UmaiKayu but writing in assembly makes a task slower, regex makes a task 100x quicker. 99.9% of the time I use regex I'm not writing it into code, I'm using it to find strings, find and replace, extract text from websites, find files on my computer, query etc. etc. - in those cases I am solving a problem with regex and I now have 0 problems
@Erik_The_Viking
@Erik_The_Viking 5 ай бұрын
I have a lot of respect for regular expressions - they're incredibly powerful tools in the right hands. Sadly I rarely use them in my work, but when I've needed them, OMG what a difference it made!
@BenRHarsh
@BenRHarsh 5 ай бұрын
I love and hate RegEx at the same time. It's so goddamn useful and elegant but so semantically dense it's hard to wrap my mind around it when trying to use it.
@AlexRoseGames
@AlexRoseGames 5 ай бұрын
just do regexcrosswords and don't stop until you've done every single one. it will take 1 weekend only and you will be able to use it for life
@martinbakker7615
@martinbakker7615 5 ай бұрын
I use tdd to get a grip on what I want from a set random strings. Re are really difficult.
@toby9999
@toby9999 5 ай бұрын
They don't look elegant to my eyes. They're a mess.
@JuusoAlasuutari
@JuusoAlasuutari 5 ай бұрын
I'm also dense, that's why I find regexes fairly understandable.
@williamli55
@williamli55 2 ай бұрын
I find some aspects of regex to be relatively straightforward, and fairly easy to get right on the first try. But things like negative lookahead and lookbehind, and nested groups boggle my mind to no end.
@ivarkrabol
@ivarkrabol 5 ай бұрын
Interesting video, but there are several problems: 9:46 "(b+c)*(a(b+c)*a)*(b+c)*" does not include ALL words with an even number of "a"s, such as "abababa". The final "(b+c)*" should be moved inside the preceding parethesis: "(b+c)*(a(b+c)*a(b+c)*)*" 11:37 "(b+Ɛ)*(ab)*(a+Ɛ)*" includes words that do not have alternating "a"s and "b"s, such as "bbaa". The first and last parenthesis should not have the asterisk: "(b+Ɛ)(ab)*(a+Ɛ)"
@ANunes06
@ANunes06 5 ай бұрын
10:55 - The "game" is called Regex Golf. You try to match set A while NOT matching set B in the fewest characters possible. ^(?!(..+)\1+$) for example, exclusively matches strings with length equal to any prime number. For reasons that I *barely* understand let alone explain.
@Stratelier
@Stratelier 4 ай бұрын
... I think I can explain that (at least part of it). ?! is an assertion that the subsequent pattern must _not_ match, and (..+) is a sequence of any two or more characters, also flagging it for repeat as "group 1", with \1 being a reference to "group 1". Note that when dealing with variable-length keywords (e.g. + meaning "one or more") the regex engine must, in principle, brute-force every possible permutation of lengths to find a match (and whether it should prioritize longer or shorter matches is an iceberg unto itself). So the patterns inside this regex are essentially trying to "factor" the input string into two-or-more groups of two-or-more characters each. If no such combination exists then the input string must have a length that is unfactorable (i.e. a prime number), the negative assertion succeeds, and the expression as a whole is declared a match.
@thebarnold7234
@thebarnold7234 4 ай бұрын
I remember Thorsten teaching this during my second year of University (im on my third now), this kind of stuff is really interesting and VERY useful if you want to make your own programming language
@iabervon
@iabervon 5 ай бұрын
I think a lot of the difficulty of regular expressions in practice is that they get specified using the same character set that the words they are trying to recognize are from, and that there's a short form of (a+b+...+z) but it's hard to read. This video would be a lot harder to follow if Sigma contained ( and ) and + and *, which aren't good for examples but are kind of necessary if you want to use regular expressions to tokenize programs in a compiler.
@WormHunter
@WormHunter 4 ай бұрын
Man this series is really taking me back to my compilers class in University.
@AndersJackson
@AndersJackson 4 ай бұрын
I have learned to use "|" and not "+" as an "or" operation in RE. I think it is easier to read, but that is just me. I also define "a+" as equal to "a(a*)", so it can be one or more and not just zero and more, like "a*". That is useful.
@Misantronic
@Misantronic 5 ай бұрын
the syntax of these regexes is so different from what I use in javascript etc. + here means OR + in JS means 1 or more occurrences
@MystbornYT
@MystbornYT 5 ай бұрын
yup, confused me for a sec
@fragglet
@fragglet 4 ай бұрын
I honestly wonder if he forgot the syntax and didn't bother to refresh his memory before making the video
@liamwelsh5565
@liamwelsh5565 4 ай бұрын
⁠@@fraggletThis is the standard syntax when using regex’s in a theoretical context.
@TheDmviper
@TheDmviper 5 ай бұрын
Im pretty sure the RE at 0:31 is binary numbers that are multiples of 3. My reasoning is that the construction 10(1+00)*01 is built specifically to keep parity. Since 1, 2, 4, 8, 16, 32, ... Is +1, -1, +1, -1, +1, -1 from a multiple of 3, 11 is the smallest number where the pariry cancels out and as long as you add an even number of 0's between you're fine. However, you could in the case of 1001, go to 10101 because all 3 bits will have the same parity and cancel out due to the zeros in between. This is the reason it's not just 1(1+00)*1. The ... + 0 + 11 represent both base cases and the final klenne star on top is because if you concatenate two multiples of 3 you gev a multiple of 3. I figured since nobody put an answer in the comments yet I would shoot my shot in case I missed something. Was definitely a fun puzzle!
@BenAlternate-zf9nr
@BenAlternate-zf9nr 3 ай бұрын
Oh, are those 1's? I was waiting for him to explain what a lambda or caret character meant in a regex.
@ash__borne
@ash__borne 2 ай бұрын
Had a test the other day, one of the questions was to write a regex that would allow an odd number of a specific symbol. Can't believe I didn't think of that simple trick. This is amazing as it helps me deepen my understanding of the branch of CS. Thank you real much (plus the German accent is super)
@tracyrreed
@tracyrreed 5 ай бұрын
I love regex! It's so useful. I don't understand how Windows folks get by without it. It's built into so many linux/unix tools.
@AndrewTSq
@AndrewTSq 4 ай бұрын
I was blown away the other day, when someone had made one of the advent of code in regular expresions :D that was really cool.
@spaceshipable
@spaceshipable 5 ай бұрын
The numbers of times I've used regex and the number of times I've googled how to write regex are the same.
@OlafDoschke
@OlafDoschke 5 ай бұрын
You can subdivide regular expressions into the categories of real expressions, rational expressions, imaginary expressions, irregular expressions, and complex expressions. Most of them are in the last two categories.
@lxathu
@lxathu 4 ай бұрын
I love RE's and they've been a part of my job for decades. But I'm glad I didn't have to learn them by starting from math's formalism (although I like math pretty much, too) because otherwise our close relation wouldn't have became so strong.
@rajbhai-tp7cf
@rajbhai-tp7cf 5 ай бұрын
Is that right at 11:40? Looks like bbbbababaaaaa would match. I don't think the first and third *s were intended.
@3snoW_
@3snoW_ 5 ай бұрын
Yeah, just commented the same thing, then loaded some more comments and found yours lol.
@BitJam
@BitJam 5 ай бұрын
@9:58 the regular expression ((b+c)*(a(b+c)*a)*(b+c)* does not match all words with an even number of a's. However this should do it (b+c)*(a(b+c)*a(b+c)*)* The expression in the video requires that all the a's except possibly the first and the last appear in pairs.
@smuecke
@smuecke 4 ай бұрын
Can you give a counterexample of a string with an even number of a's that the first expression does not match?
@BitJam
@BitJam 4 ай бұрын
@@smuecke "abababa" Note that after the 2nd "a" is matched, the original regex can only match another "a" or (b+c) to the end of the string. If you still don't see then try looking at this related expression: (a(b+c)*a)(a(b+c)*a)* It's not the same expression but the requirement that a's, other than the first and the last, must come in consecutive pairs should be more obvious.
@ProjektRakete
@ProjektRakete 5 ай бұрын
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
@sadhappy8860
@sadhappy8860 5 ай бұрын
Amen.
@AlexRoseGames
@AlexRoseGames 5 ай бұрын
luddite cope
@bopcity5785
@bopcity5785 5 ай бұрын
spent a long time being confused about the man talking about regex while not writing regex before realising the math is more general
@danieldarroch4775
@danieldarroch4775 5 ай бұрын
A regex for the language of words not containing 'ab', over the alphabet {a,b,c}, is the expression (aa*c+b+c)*a*
@danieldarroch4775
@danieldarroch4775 4 ай бұрын
This can be further simplified to the expression (b+a*c)*a*
@danieldarroch4775
@danieldarroch4775 4 ай бұрын
As others have pointed out, the regex given for the language over alphabet {a,b,c} of words containing an even number of a's is not correct. Other's have posted a few different regexes which work, but I think the following is simpler: (b+c+a(b+c)*a)* This clearly always forces the a's to appear in pairs, so it only includes words with an even number of a's. But it also allows any word on {b,c} to appear before the first a, after the last a and between any pair of a's.
@Nnyerix
@Nnyerix 5 ай бұрын
0:06 made me check my Outlook :D
@Computerphile
@Computerphile 5 ай бұрын
sorry about that - we did realise and mute it! -Sean
@warrenarnold
@warrenarnold 5 ай бұрын
​@@Computerphiledont worry he just underwent a deterministic state change
@shdon
@shdon 5 ай бұрын
Unless I'm mistaken, the regex that is supposed to match alternating sequences of A and B (ABABAB or BABABA etc) would also match the empty string or just single A or single B - three edge cases that I don't think qualify though it is arguable. In the notation that is used there, I feel ((ba(ba)*(b+Ɛ))+(ab(ab)*(a+Ɛ))) would be a better match for the spoken description.
@napukapu
@napukapu 5 ай бұрын
Look into how RegEx handles greedy vs non greedy matching. What you described here is called greedy behavior. It's too long to get into in a youtube comment.
@shdon
@shdon 5 ай бұрын
@@napukapu I know all about greedy and non-greedy, as I use RegExes extensively in my day job. This is not that concept. It is a case of the regular expression and spoken description not being the same. And I'm explicitly using the (rather strange, for people familiar with Perl compatible RegExes) notation used in the video.
@stro5179
@stro5179 5 ай бұрын
I think yours would match babbb, but it's an easy fix. We can change the trailing b* and a* to (b + epsilon) and (a + epsilon) respectively. Did I parse that right?
@shdon
@shdon 5 ай бұрын
@@stro5179Yes, correct. It should be that (now fixed in an edit). I was thinking about the ? in perl regexes.
@Misantronic
@Misantronic 5 ай бұрын
to everyone wanting to get into regular expression in practice, I can only encourage to write you own little template-language like f.e. handlebars. I did this ~8 years ago to get more into it and since then I am very good in that field
@Misantronic
@Misantronic 5 ай бұрын
function T(t, c, r, _) { try { return eval( // set context "with(c||{}){_='';" + // match html-tags t[r = "replace"](/(?: |^)\s*(
@JuusoAlasuutari
@JuusoAlasuutari 5 ай бұрын
Here's some tediously boring trivia: In the C language specification, there's no definition for a decimal integer constant with a zero value. At all. In other words, in C it is not possible to express the number zero as a base-10 literal. Maybe you're wondering how that makes any sense. You've seen lines like "int i = 0;", so how is that not a zero? Because in C, "0" is an octal. And if you think about how to match integers as regexp, it makes perfect sense.
@wattmeter
@wattmeter 5 ай бұрын
Please cover Parsing Expression Grammars/Generators next!
@natescode
@natescode 4 ай бұрын
I second this
@philliph2581
@philliph2581 5 ай бұрын
As a frequent Regex user in coding, I found this hard to follow due to the non-standard operators Prof. Altenkirch uses. That is, he uses `(a+b)` instead of `(a|b)` for the "or" or alternation operator and `(a+ε)` instead of `a?` for the optional operator. Is this coming from somewhere, like a historical or mathematical variant of Regex rather than the more common modern coding styles used by `grep`, Python, or PCRE? Also, as others have pointed out, he should have written `(b+ε)(ab)*(b+ε)` for the alternating example, which would be `b?(ab)+a?` in more standard regex. FWIW, I find regular expressions are almost always more readable than their parser alternatives, but then anything you use all the time becomes second nature.
@user-go5ri2yg5f
@user-go5ri2yg5f 5 ай бұрын
This guy looks like he is the installation wizard.
@grndkntrl
@grndkntrl 5 ай бұрын
He reminds me of Brent Spiner's character (Dr. Brackish Okun) who is the lead scientist at Area 51 in Independence Day
@YuTv1408
@YuTv1408 5 ай бұрын
THORNSTEN!!!! This guy is a Genius!!
@velho6298
@velho6298 5 ай бұрын
I always thought regex is something magical but the more and more I use it, this assumption hasn't changed it is still as magical as regex can be
@coolcodingcat
@coolcodingcat 5 ай бұрын
I thought + meant 1 or more of the preceeding group and | meant this one or that one
@julianmahler2388
@julianmahler2388 5 ай бұрын
As a fellow German, I like how he doesn't even try to hide his accent
@MartinMaat
@MartinMaat 5 ай бұрын
+ does not mean "OR" in regex, that would be the pipe character (|). Plus means "the preceding must occur and may be followed by". So quite confusing to anyobe vaguely familiar with regex.
@_garicas
@_garicas 4 ай бұрын
He's not writing in any specific syntax. If you stop to search, there are many implementations of regular expressions syntax, even one being incompatible with another. You're just used to modern languages implementations which tends to follow some sort of standard, but even tho there are nuances from implementation to implementation.
@MartinMaat
@MartinMaat 4 ай бұрын
@@_garicas Alright. I know about the nuances, this radical deviation from the norm is new to me though.
@Yupppi
@Yupppi 5 ай бұрын
Could you make a video where you have someone talk about the difference of compile time vs runtime and why would you prefer something over the other for which situations? I've heard the terminology passed around regarding C++ or Rust or Swift, talking about how you can make something runtime calculated or compile time calculated and I don't quite understand the advantages of each.
@kiseitai2
@kiseitai2 4 ай бұрын
Compile time computation just means that the compiler precomputes a value while processing your code. It then stores the value in the program. When a program calls the function, the assembly under the hood is simply some mov or return operation as opposed to going through n operations for that function. Basically, the compiler builds a cheat cheat ahead of time so the program can perform a look up instead of the actual computation. Bonus points if the result of your function is some value that gets called so many times it ends up in the cpu cache.
@joergbrueggmann
@joergbrueggmann 5 ай бұрын
An even number of the letter 'a' in an alphabet of {a, b, c} is "((b+c)*a(b+c)*a(b+c)*)*", isn't it? Even more elegant is the one from @BitJam, which is "(b+c)*(a(b+c)*a(b+c)*)*", don't you agree? The video location is 9:57. Let me know if I am wrong. I am curious as to see the conversion from RegEx to NFA.
@MyName-tb9oz
@MyName-tb9oz 5 ай бұрын
Someone should write a regex training program that gives you a crossword puzzle and requires you to write a regex that it will check against a dictionary file and give you a list of words that fit. I'd only allow you to choose the word from the list your RE gives you. It should also be able to let you specify exactly which 'regex' you are using. Is this already out there somewhere? I think it would be very useful. No, I can't write it myself.
@spot1401
@spot1401 5 ай бұрын
I think someone wrote a safe cracking game which had to do with RegEx but I might be mistaken.
@negativeseven
@negativeseven 5 ай бұрын
RegexGolf is somewhat close to that
@MyName-tb9oz
@MyName-tb9oz 5 ай бұрын
RegexGolf looks like it's the opposite of what I'm thinking of, @@negativeseven. With RegexGolf you get two lists and have to match everything in one and nothing in the other. What I'm thinking of is where you have a set of letters in specific positions and you want to compare them to everything in a dictionary. More like grep golf...
@Diamonddrake
@Diamonddrake 5 ай бұрын
Regex crossword exists
@ANunes06
@ANunes06 5 ай бұрын
@@negativeseven Is that available outside of the one you can find online with google and the telehack version? I love me some regex golf.
@Aquarii_Z
@Aquarii_Z 5 ай бұрын
So what is the short English description of the regular expression at 0:30?
@paulb75
@paulb75 5 ай бұрын
(Spoiler below ...) I reckon they are the multiples of three in binary. Seems to be that way in practice at least, and I can see why 0 and 11 are in the regex (sort of) but I haven’t explained to myself why the other element captures all the "other" multiples of 3. Would welcome any insight
@themightiestofbooshes9443
@themightiestofbooshes9443 4 ай бұрын
Funny thing, I opened explorer++ on my PC today to find all files of a particular name and type, and one of the options was "include regular expressions." I was like wtf are those? So I look it up and couldn't find anything I could understand with my novice brain. So thanks for this, and being recommended to me in short order xD
@fragglet
@fragglet 4 ай бұрын
Kind of odd that he's not using the usual syntax for regexps. What he writes as h(e+a)llo is usually written h[ae]llo
@liamwelsh5565
@liamwelsh5565 4 ай бұрын
In a theoretical context, this is the standard.
@philevans4021
@philevans4021 5 ай бұрын
Am I missing something, or should it be "h(e|a)llo" rather than "h(e+a)llo" - the latter looks for h, followed by one or more e's, followed by allo. Or is he not talking about PCREs?
@JuusoAlasuutari
@JuusoAlasuutari 5 ай бұрын
Also, if that's just an extended regexp (e.g. "grep -E") the parens are unnecessary.
@Eagle3302PL
@Eagle3302PL 5 ай бұрын
He's talking about regular expressions as a concept, so he's not bound by the RegEx syntax, RegEx is just a standardised syntax spec for writing regular expressions.
@JuusoAlasuutari
@JuusoAlasuutari 5 ай бұрын
@@Eagle3302PL ambiguous syntax is not the same as describing abstract concepts, not even when the ambiguous syntax happens to be pseudocode.
@lasdernas
@lasdernas 5 ай бұрын
just academics.. not using regular symbols for regular expressions is one of their specialities
@professorpwerrel
@professorpwerrel 5 ай бұрын
h[ea]llo
@00Ruben
@00Ruben 5 ай бұрын
I'm a simple person. I see RegEx and I like the video.
@redryder3721
@redryder3721 5 ай бұрын
The sequence defined in the Regex is possibly OEIS A182632: "Toothpick sequence on the hexagonal net starting from a node." - 0, 3, 9, 21, 33, 45... Sorry if this was covered somewhere in the video, I didn't see the explanation.
@kennythegamer1
@kennythegamer1 4 ай бұрын
I'm pretty sure the main reason most people don't like Regular Expressions is that there are about 12 common syntaxes, and few things specify which they use.
@gerardvanwilgen9917
@gerardvanwilgen9917 5 ай бұрын
I don't even try to analyse the regular expressions that I encounter in code. If one must be changed, I just start again from scratch.
@guitoo1918
@guitoo1918 5 ай бұрын
Regular expression are not technically Type 3 regular languages descriptors. There is a construct that allows you to search for exact match of a previously matched pattern. That requires having memory and a dfa doesn't have any.
@megaing1322
@megaing1322 5 ай бұрын
Because most regex engines don't actually implement regular languages and are just misusing the term. PCRE style regex for example are actually significantly more powerful than even CFGs and are potentially turing complete or at least linear bounded automatons.
@TofuBug24
@TofuBug24 5 ай бұрын
Maybe I'm not used to this form of grammar since I'm used to doing them in programing where + is a token like * and ? that represents at least one to infinity of the token that precedes it, also that the pipe | is used for OR instead of +. That being said couldn't the abab... or bababa... regex just simply be (ba+ab)* I know at least in my code ((ab|ba)*) will return any string of ababa.... or bababa..
@helleye311
@helleye311 5 ай бұрын
This seems like an odd regex syntax to me. In all the examples + is an or, which I've never heard about. Is this actually used anywhere (as in, can you actually write those things into some program to highlight/replace those patterns) or is it regex "pseudocode"/mathematical notation? Genuinely curious, if someone more knowledgeable in theoretical stuff could enlighten me it would be great.
@halfsourlizard9319
@halfsourlizard9319 5 ай бұрын
It's maths notation; in most implementations a pipe (or an escaped pipe) is used.
@helleye311
@helleye311 4 ай бұрын
@@halfsourlizard9319 that's what I'm familiar with. They don't really teach finite automata or regex theory here, at least not at the level I was studying. Thanks!
@halfsourlizard9319
@halfsourlizard9319 4 ай бұрын
@@helleye311 And, of course, in most implementation, + is used for >0 repetition, rather than disjunction/alternative.
@halfsourlizard9319
@halfsourlizard9319 4 ай бұрын
s/implementation/implementations/
@Sharaton
@Sharaton 4 ай бұрын
There's a slight miss with the RE for an even number of a:s. Every pairing of a:s can only have b:s and c:s between themselves, and not between other pairings. So (using b for every (b+c)*) babaabab is allowed, but not babababab. A corrected RE would be (b+c)*(a(b+c)*a(b+c)*)*, ((b+c)*a(b+c)*a)*(b+c)*, or (a(b+c)*a+b+c)*.
@devttyUSB0
@devttyUSB0 5 ай бұрын
I work with regexs daily but this is deep ;)
@joergbrueggmann
@joergbrueggmann 5 ай бұрын
Usually, I am using POSIX Basic Regular Syntax (BRE) to express RegEx like "[bc]*(a[bc]*a[bc]*)*", which would be the equivalent to "(b+c)*(a(b+c)*a(b+c)*)*". Which is the standard used in this video?
@DavidLindes
@DavidLindes 4 ай бұрын
In the places where I've seen + supported, it's like * but 1-to-many instead of 0-to-many, so "(b+c)*" (to pick on just part of your example) would be "(bb*c)*" in BRE. I haven't quite gotten through this video yet, so I'm not ceratain, but that's how I was interpreting the initial challenge... (Will respond again if I find something else as I watch.)
@DavidLindes
@DavidLindes 4 ай бұрын
Ahh, no, 6:18 shows he's using it quite differently. I don't know where this syntax comes from, but yes, in seeing it defined that way, I think your equivalency is valid. But yeah, where on earth is this syntax from? The world of pure theory, I guess... because it's not PCRE or anything else I know.
@computersciencebyd-m-3323
@computersciencebyd-m-3323 5 ай бұрын
10:30 Exercise: is the RE that does NOT contain sequence 'ab' given by (b+c)*((a*c)*(b+c)*)*a* ? Correct? Thank you for your comments :)
@inigo8740
@inigo8740 3 ай бұрын
I've checked out your solution and it seems right to me. After seeing your solution I came up with my own: ((a*c)+b+c)*(a*+epsilon).
@eternaldoorman5228
@eternaldoorman5228 4 ай бұрын
0:34 if you use Bourne shell bracket-expansion you can get an idea of what set this is, but I don't see much of a pattern there, apart from "Well, it starts 10, ... Try this: echo {,10{,{1,00},{1,00}{1,00}}{01,0,11}} which gives 1001 100 1011 10101 1010 10111 100001 10000 100011 101101 10110 101111 1010001 101000 1010011 1000101 100010 1000111 10000001 1000000 10000011 and you can push it out as far as you like by duplicating the expression and leaving out the first comma,..
@nickpalance3622
@nickpalance3622 5 ай бұрын
I use a few simple regular expressions on occasion. This made my head hurt. 🤯
@williamli55
@williamli55 2 ай бұрын
I'm currently writing a POSIX-compliant parser for creating Linux utilities and the most frustrating aspect of the code base is the regex. I attempt to avoid regex at all costs, because there are usually simpler and more elegant ways to get the job done, but some portions of the code do need regex. Every time I write unit tests for complex regex, I have to prepare myself for a world of pain, suffering, and disappointment.
@ScottLahteine
@ScottLahteine 4 ай бұрын
I tend to think the best way to approach a new topic is to start with a more high level and practical view and only later delve into the deeper theory and relationships, but then again I’m not a designer of databases.
@AgentM124
@AgentM124 5 ай бұрын
I usually use IrregularExpressions
@lymnjuice
@lymnjuice 5 ай бұрын
I'm pretty good at regular expressions and this made me forget all of it... his use of + instead of | hurt my head.
@Kwauhn.
@Kwauhn. 4 ай бұрын
Maybe it's just me, but I feel like the follow-through from initial presentation to complete explanation isn't really there. I mean, he explained the formal language and all the components of a regular expression, but I don't feel like I ever really understood _how_ all the definitions give rise to the useful tool it is. I've used regex before for string search and in programming, I'm just having a hard time connecting the practical concept with the all the formal math presented in this video.
@Teambr00klyn
@Teambr00klyn 4 ай бұрын
ive used regex a few times, this is the first time ive watched a tutorial where i feel i now know less about the subject
@RobBulmahn
@RobBulmahn 4 ай бұрын
I often use regex, and I am extremely confused by whatever notation he's using. For example, he says that "h(e+a)llo" is "H, either E or A, L L O" but that's not what it would match as a regex expression. That would be "H, 1 or more Es, A L L O." As a regex expression for what he's saying, that would be "h[ea]llo", so I really have no idea what syntax he's using. Repeat this throughout pretty much all the examples throughout the video. Just about any expression he gives, I'm sitting here thinking "That is *absolutely* not how you would write that."
@ShadowManceri
@ShadowManceri 5 ай бұрын
My favorite regular expression is horror when I try to fix production pattern.
@cryptotharg7400
@cryptotharg7400 5 ай бұрын
I was always mystified by regular expressions. I still am. 😎
@ncktbs
@ncktbs 2 ай бұрын
you should try to get a video on the Vesuvius challenge reading the herculaneum scrolls
@kiseitai2
@kiseitai2 4 ай бұрын
I guess I’m in the minority? I love regexes, but I get they can become difficult to follow after a certain length. However, they make certain pattern lookups so consistent in a succinct form . Makes life easier in combination with interpreted languages where you don’t have the convenience of optimizing out the steps to a non-regex solution. I prefer using regexes as simple filters for an unknown input. It’s very handy for input sanitization.
@eproulx
@eproulx 4 ай бұрын
I thought that the plus sign + is cardinality symbol for "at least one". Not "or" which i thought was the vertical bar |.
@Bianchi77
@Bianchi77 4 ай бұрын
Nice video, thanks :)
@jp-ny2pd
@jp-ny2pd 5 ай бұрын
RegEx is the closest you get to being a grey bearded wizard writing out spells in ancient runes to get the database to import.
@belialofeden
@belialofeden 5 ай бұрын
Regular expressions. Walter wallis the perl poet knows all about them Edit: had to fix spelling error added 3 l's in all
@jorgelandrian4847
@jorgelandrian4847 5 ай бұрын
It is weird the use of + as an or. The regex normally use + for one or more instance
@rizkiaprita
@rizkiaprita 5 ай бұрын
yay!
@nathanaelsmith3553
@nathanaelsmith3553 5 ай бұрын
I love regular expressions. I use them loads at work. They can also be used to generate fractal art.
@patton72010
@patton72010 5 ай бұрын
Professor Thorsten Altenkirch looks like Dave Grohl and Taylor Hawkins have morphed into 1 person and quit Foo Fighters for Foo(x) Pythons.
@randy7894
@randy7894 5 ай бұрын
Both fascinating and ptsd triggering.
@rdubb77
@rdubb77 4 ай бұрын
Psst, use an LLM based helper to write RE for you. Also good for non trivial CSS 😉
@reecenoone3113
@reecenoone3113 5 ай бұрын
Honestly, I clicked on the video to see how he pronounced RegEx. Unless I missed it, it seems controversy was tactfully avoided.
@Qbe_Root
@Qbe_Root 5 ай бұрын
It's the GIF debate all over again!
@huawafabe
@huawafabe 5 ай бұрын
he's german so there's no confusion there 😆
@igrim4777
@igrim4777 5 ай бұрын
For me galavanting(\saround){0,1} \sin\sfar\saway\splaces is a regular expression.
@nirualleer4482
@nirualleer4482 5 ай бұрын
through path of exile i found out abouut regex and i love it, want to become an expert
@momohLBY
@momohLBY 5 ай бұрын
"Yes im from Germany, are you surprised?!" 😆
@noelsherron
@noelsherron 5 ай бұрын
I have a problem finding which computer file to watch. So I watch the video on regular expressions. Now I have two problems.
@proosee
@proosee 5 ай бұрын
Hearing the accent, I was NOT surprised that professor is from Germany 😅
@martinseal1987
@martinseal1987 5 ай бұрын
This is the most complicated explanation of regex I've ever heard
@stelliosbonadurer359
@stelliosbonadurer359 5 ай бұрын
ChatGPT is a godsend for regex
@cottonfoo
@cottonfoo 5 ай бұрын
Only if you already know regular expressions, because if you don't you won't be able to spot its mistakes.
@dylanclarke9497
@dylanclarke9497 5 ай бұрын
That sounds like a nightmare to debug when something goes wrong. "Sorry for the confusion" feedback loop incoming
@tron121
@tron121 5 ай бұрын
regex101 ftw
@BigWhoopZH
@BigWhoopZH 5 ай бұрын
​@@cottonfooOr write tests.
@mikiex
@mikiex 5 ай бұрын
@@cottonfoo You just test them with an online regex tester
@henrygonzales9666
@henrygonzales9666 15 күн бұрын
What is an irregular expression?
@nicolasvecchione6016
@nicolasvecchione6016 5 ай бұрын
I thought + meant one or more of previous character?
@landsgevaer
@landsgevaer 4 ай бұрын
The example about any even number of a's near 9:40 is wrong though. It shouldn't be (b+c)*(a(b+c)*a)*(b+c)* but (b+c)*(a(b+c)*a(b+c)*)* imo.
@enriqueparodiYT1
@enriqueparodiYT1 5 ай бұрын
Honestly guys, this explanation was very confusing. It manages to be structured and extremely unstructured at the same time.
@xelaxander
@xelaxander 5 ай бұрын
„Are you from Germany?“ - How do you not recognize that accent.
@andrewjknott
@andrewjknott 5 ай бұрын
Regex for no "ab" using the notation from the video: a + (b + c + ac)*
@GaborRevesz_kittenhuffer
@GaborRevesz_kittenhuffer 5 ай бұрын
how does this you give you "baa"?
@Spax_
@Spax_ 5 ай бұрын
regex is epic
@ROBOTRIX_eu
@ROBOTRIX_eu 5 ай бұрын
@coolcodingcat
@coolcodingcat 5 ай бұрын
This seems to be teaching more than just regex its teaching set theory of regex, etc
@pilotavery
@pilotavery 4 ай бұрын
Okay I just got to say, anyone who looks like this obviously could be trusted to talk about computer science
@hcn6708
@hcn6708 5 ай бұрын
0:06 This made me check my emails
Python Regular Expressions - Computerphile
22:16
Computerphile
Рет қаралды 51 М.
Turing Machine Alternative (Counter Machines) - Computerphile
26:17
Computerphile
Рет қаралды 53 М.
小路飞第二集:小路飞很听话#海贼王  #路飞
00:48
路飞与唐舞桐
Рет қаралды 18 МЛН
Мы играли всей семьей
00:27
Даша Боровик
Рет қаралды 2,4 МЛН
Regular Expressions - Computerphile
17:19
Computerphile
Рет қаралды 237 М.
Graphs, Vectors and Machine Learning - Computerphile
23:08
Computerphile
Рет қаралды 92 М.
Cracking Enigma in 2021 - Computerphile
21:20
Computerphile
Рет қаралды 2,4 МЛН
Can programmers do math? What is a real number, really?
10:13
Sheafification of G
Рет қаралды 4,1 М.
Kernelless Kernel Programming (eBPF) - Computerphile
19:12
Computerphile
Рет қаралды 70 М.
Using Regular Expressions - Computerphile
11:39
Computerphile
Рет қаралды 124 М.
Where GREP Came From - Computerphile
10:07
Computerphile
Рет қаралды 930 М.
Has Generative AI Already Peaked? - Computerphile
12:48
Computerphile
Рет қаралды 329 М.
What's special about 288? - Numberphile
9:11
Numberphile
Рет қаралды 240 М.
Coding a Web Server in 25 Lines - Computerphile
17:49
Computerphile
Рет қаралды 315 М.
小路飞第二集:小路飞很听话#海贼王  #路飞
00:48
路飞与唐舞桐
Рет қаралды 18 МЛН