"Not everyone can do that, mainly because nobody knows how to write a Regex" TBH this got me off guard!! I even choked on my own saliva for a brief moment. In the end, still very educational and fantastic content as usual sir!
@FrederickMarcoux 2 years ago
But it's so true. Nobody understands Regex.
@mandoschMUh 2 years ago
That one is pure gold, I agree :D
@robertnull 2 years ago
I'd say that most people can write a regular expression, but nobody can read it after that, including the author. It's easier to rewrite it than to understand it ;)
@GufNZ 2 years ago
It's not true tho - I am very fluent in Regex, in various dialects, and for everyone else there's RegexBuddy.
@GufNZ 2 years ago
@@FrederickMarcoux I do, very well.
@pilotboba 2 years ago
I know this video wasn't about email... but... I think MS or some other people have determined there is no way to really verify an email with a regex. I think even MS changed it so they basically look for a single @ in the string to call it valid email format. The way to validate it is to send an email with a confirmation link. :)
@Mario-cr1ik 2 years ago
This approach is the recommended way mentioned somewhere in the MS docs
@rezataba6204 2 years ago
What about the login situation? It's not common to send verification emails for logins.
@pilotboba 2 years ago
@@rezataba6204 Make the account pending until the email has been verified. Pending accounts get no access.
@albe8479 2 years ago
@@rezataba6204 For login to an existing verified account it does not matter. If a user with that email as login exists, it's all OK. Maybe just do a length check.
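A minimal Python sketch of the pending-account idea from this thread, not anything shown in the video. The `Account` class, the `confirm` and `can_log_in` helpers, and the token length are all hypothetical choices made up for illustration; actually sending the email is left out.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Account:
    email: str
    verified: bool = False
    # Random token that would be embedded in the emailed confirmation link.
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))


def confirm(account: Account, presented_token: str) -> bool:
    """Mark the account verified only if the token from the link matches."""
    if secrets.compare_digest(account.token, presented_token):
        account.verified = True
    return account.verified


def can_log_in(account: Account) -> bool:
    """Pending (unverified) accounts get no access."""
    return account.verified


acct = Account("user@example.com")
print(can_log_in(acct))      # account is still pending here
confirm(acct, acct.token)    # simulates the user clicking the emailed link
print(can_log_in(acct))
```

The point of the design is that no pattern has to judge the address at all: an address counts as valid once its owner proves they received mail at it.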
@frossen123 2 years ago
In cybersecurity, abusing regex like this is a category of DoS attack called ReDoS
@rapzid3536 1 year ago
Interesting, we call it the same thing outside the cybersecurity industry.
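To make the ReDoS point concrete, here is a small Python sketch. The pattern `^(a+)+$` is a textbook backtracking-prone shape chosen for illustration, not the email regex from the video, and `time_failed_match` is a made-up helper. On a backtracking engine such as Python's `re`, the time for a failed match should grow sharply as the input gets longer; exact timings depend on the machine, so none are quoted.

```python
import re
import time

# Nested quantifiers: an illustrative pattern, not the one from the video.
EVIL = re.compile(r"^(a+)+$")


def time_failed_match(n: int) -> float:
    """Time a match attempt that must fail: n 'a's followed by one 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' makes a full match impossible
    return elapsed


# Kept deliberately small so the demo finishes quickly.
for n in (12, 14, 16, 18):
    print(f"n={n:2d}  {time_failed_match(n) * 1000:8.2f} ms")
```

An attacker only needs to submit one such string to a field that is validated with a vulnerable pattern, which is why the timeout and non-backtracking safety nets discussed in the comments matter.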
@KevinInPhoenix 2 years ago
There is an old saying: If you have a problem that requires Regex then you now have two problems.
@Kommentierer 2 years ago
Everything I see on your channel is super interesting and special. I never knew about those issues, but it is nice to know how to fix them. Sharing this with my colleagues.
@Denominus 2 years ago
Excellent video and great advice. We've fallen prey to this twice in the past. First an attack directly against one of our APIs and then during Cloudflare's global outage due to a bad regex on their side (not our fault in this case, but still an outage). At the time we changed the regex, but there are only a handful of people who know how to do this confidently on a complex regex. I really like these "safety net" approaches.
@myroslavberlad4428 2 years ago
If you have a problem and you have a solution via Regex - now you have two problems
@LCTesla 2 years ago
Do people really believe that, or is it about making a cute "zinger" for the uncritical masses?
@myroslavberlad4428 2 years ago
@@LCTesla Yes, they do. And there are reasons for that: hard to master, hard to debug, hard to update without breaking existing cases. It's not that the tool is bad. Regexes are actually a powerful instrument and there are good places to use them for sure, but they are hard to master. That is why this saying was born.
@LCTesla 2 years ago
@@myroslavberlad4428 Seems like just applying the KISS principle and restricting its use to appropriate use cases counters all that. The fact that a tool can be misused is a case against the user, not the tool.
@myroslavberlad4428 2 years ago
@@LCTesla I do agree
@DuelingTreeMike 2 years ago
Amazing find sir. I had no idea backtracking can be so dangerous. Thank you so much for creating this video.
@Hamza-Shreef 2 years ago
this kinda thing has been really useful keep it up bro
@RayanMADAO 2 years ago
that regex visualization site is really cool
@a13w1 2 years ago
That timeout option is quite cool when you know how long a normal regex will take to pass even under load. I plan to use it the next time it makes sense when I write a regex.
@antonmartyniuk 2 years ago
nice call on the Regex problem!
@HalasterBlackmantle 2 years ago
What's the downside to using NoBacktracking? Or rather, what would be a scenario where you would not want to use it?
@tmhchacham 2 years ago
Very nice, as usual. Keep it up!
@anon0 2 years ago
ooh very cool i just started doing my phd on symbolic automata regex. glad to see it being relevant
@5hunt3r 2 years ago
just a note: don't try to validate emails. It's nearly impossible to check if a mail is valid because so many special cases exist where it looks invalid but still is valid.
@nickchapsas 2 years ago
The actual RFC regex is HUUUUGE
@humanesque 2 years ago
Pretty much this; about the furthest you can go is checking if the domain exists, short of asking the receiving server if it will accept it. Useless checks like these are worse than blindly copying code (which is what this RegEx is) and being surprised when it goes wrong.
@orterves 2 years ago
My understanding is the best way to validate an email is to send a verification email.
@nooftube2541 2 years ago
@@nickchapsas The real RFC regex does not exist 😂 because an email address, like the domain, cannot be parsed with regex. Actually there are 2 normal solutions: either check for the @ sign and that there are characters before and after it, or check that the email is real. But the second option does not handle localhosts...
@EmptyGlass99 2 years ago
The only 100% guaranteed way to validate an email is to force the user to respond to an email sent to them i.e. sending a validation link or one-time validation code.
@magashkinson 2 years ago
Very useful video. Didn't know about this problem
@HadrielWonda 2 years ago
Thanks for the insight Nick
@FunWithBits 2 years ago
Regex is super powerful. I just wish people would format it a little bit more. Usually I see regex and it is just a line of characters. RegEx code can be much easier to read when there is spacing, multiple lines, different indenting, comments, etc. Programmers don't put CSharp code in a single line with no spaces or comments, but in regex this is accepted. (And because it's hard to read, it's impossible to see any performance issues it might have.)
@chriskruining 2 years ago
Could you give me an example of such formatted regex? Because I always assumed it had to be a line of chars, since every space and newline used to format is part of the query as far as I am aware. So I am curious how you do this, because I love clearly formatted code :D
@robertnull 2 years ago
@@chriskruining There is a (?x) regex modifier that enables free-spacing mode, i.e. you can put spaces and newlines in your regex and they will be ignored, so you can make your expression multi-line, with each line containing a part that captures something significant. What's more, in this mode you can even use # comments at the end of each line!
@PeterK6502 2 years ago
@@robertnull True, but most input to be parsed is dependent on spaces, therefore this mode is useless in that situation (you could add comments however to increase readability).
@robertnull 2 years ago
@@PeterK6502 Fret not, kind sir, for in free-spacing mode you just escape spaces with a backslash to make them part of the important expression and not part of the unimportant formatting :)
@PeterK6502 2 years ago
@@robertnull I did not know that, thanks for the info.
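The free-spacing mode described in this thread can be sketched in Python, whose `re` module supports the same `(?x)` modifier (also available as `re.VERBOSE`). The `NAME` pattern and the sample input are made up for illustration; the thread itself is about .NET, so treat this as an analogous example rather than the exact syntax of any other engine.

```python
import re

# (?x) turns on free-spacing (verbose) mode: unescaped whitespace in the
# pattern is ignored and '#' starts a comment that runs to end of line.
NAME = re.compile(r"""(?x)
    ^
    (?P<first> [A-Za-z]+ )   # first name
    \                        # an escaped space matches a literal space
    (?P<last>  [A-Za-z]+ )   # last name
    $
""")

m = NAME.match("Ada Lovelace")
print(m.group("first"), m.group("last"))  # Ada Lovelace
```

The backslash-space line is the trick mentioned above: it keeps a meaningful space in the pattern while all the surrounding layout whitespace is ignored.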
@IvanRandomDude 2 years ago
Chapsas flexing with 32 cores on us mortals @4:52
@AcidNeko 2 years ago
and rtx 4090 and 128gb of ram :) it can run 100 instances of Rider, or 6 instances of Visual Studio 2022
@parkercrofts6210 2 years ago
Thank u for this ❤❤
@brianviktor8212 2 years ago
10 seconds to check if a given string is a valid e-mail? Sounds great! I mean I could do it with a little custom algorithm in ~0.001µs, but hey, it's regex!! We all love regex, don't we guys?! An e-mail is set up like this: [text]@[domain].[ending] - Either split at the @ or get the index of it. If the result is != 2 elements in the array or -1 as index, you have either no @ or more than one. Both should return "false" for the check. After that you get the last index of "." (apparently you can have multiple dots?). If it's -1, return false. Otherwise the first part is the domain, the second part is the ending. Here you can verify if it's a valid e-mail address. It's really simple... I thought everybody would do this? Why even bother with Regex for this?
@brianviktor8212 2 years ago
@@billy65bob - Hmm yeah, that would require adjustments then. I've never seen those before though. In the worst case I'd have to loop through every char manually, but only once.
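The split-and-index check described above could look roughly like this in Python. `looks_like_email` is a hypothetical name, and the 254-character length guard is an assumption added here, not something from the comment. As noted elsewhere in the comments, this deliberately rejects standard-legal oddities such as a quoted local part containing an @.

```python
def looks_like_email(address: str, max_length: int = 254) -> bool:
    """Rough structural check for text@domain.ending, nothing more."""
    if len(address) > max_length:      # cheap guard; the limit is an assumption
        return False
    parts = address.split("@")
    if len(parts) != 2:                # no '@' at all, or more than one
        return False
    local, domain = parts
    dot = domain.rfind(".")            # last '.', so multiple dots are fine
    if dot == -1:
        return False
    name, ending = domain[:dot], domain[dot + 1:]
    return bool(local and name and ending)


print(looks_like_email("nick@example.com"))   # True
print(looks_like_email("nick@n.n.n.n.n.c"))   # True
print(looks_like_email("nick@@example.com"))  # False
```

Every character is inspected a bounded number of times, so the running time grows linearly with the input instead of blowing up the way a backtracking pattern can.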
@matthewsheeran 2 years ago
Brilliant!
@shingok 2 years ago
I wonder if the Source Generator version was slower because it was compiled as debug. Maybe the dynamically compiled version generates optimized code regardless of compilation mode.
@carmineos 2 years ago
DataAnnotations should be safe as RegularExpressionAttribute has a default timeout of 2s (at least from .NET 5, idk before)
@peledzohar 2 years ago
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~ Jamie Zawinski
@tanglesites 2 years ago
Excellent video as usual! I was wondering if anyone knows of any resources on how to scan Assemblies. I'm trying to build a setup for a minimal API project I am working on. I would like to pull all the classes that are using a particular interface or interfaces and register them in the IoC, so that it kind of auto-magically works. Do you, Nick, have any videos on this, or does anyone know of anywhere I can look? Everything I have found covers particular use cases. Sorry, new to C#. I could figure it out, I'm sure, given enough time; just looking to speed up development a little and make the code a little more organized. Again, great content. You have taught me more in the last month than I have learned in a year, and it's more than beginner level, loving it.
@rbogdan8980 2 years ago
Thanks!
@masonwheeler6536 2 years ago
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
@PeterK6502 2 years ago
This kind of behaviour is frequently solved by using lazy capture instead of greedy capture; for example, you should use ()+? instead of ()+. I can see at least one greedy capture group in the shown expression. You should always try to avoid greedy captures, because of backtracking. Use ()*? or ()+? instead of ()* or ()+.
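For anyone unsure what the lazy forms actually change, here is a small Python illustration with a made-up input string. It shows the difference in what gets matched. It does not measure backtracking, and a lazy quantifier can still backtrack, so whether the swap helps with a slow pattern depends on the pattern and should be checked case by case.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ first grabs as much as it can, then gives characters back.
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? grabs as little as it can, then extends one character at a time.
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```

Note that the two patterns return different matches, so swapping quantifiers is a behaviour change as well as a possible performance change.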
@IAmFeO2x 2 years ago
Great video as always! Personally I avoid Regex like the devil - it always takes so long to read and understand them in code.
@infeltk 2 years ago
I use Regex for simple things. Everything has its purpose and limitations. And the problem described in this episode is described on the Microsoft Learn page under .NET fundamentals - it is not secret information.
@pmashurenko 2 years ago
Well, it's also worth reading RFC 2821, and very quickly it will become clear that regular expressions are a bad tool for email validation - the variety of options for names and domain names is so huge that it makes almost no sense to check beyond the point that there's an "@" that isn't preceded by a "\" character somewhere in the middle there.
@billy65bob 2 years ago
Not even that is foolproof, as a @ inside quotes is also escaped. :) Granted, no one uses quotes in their email addresses, but it is allowed by the standard.
@casperhansen826 2 years ago
I use Regex for small strings with simple use cases.
@nickhubbard3671 2 years ago
The best way to avoid issues with Regex is to not use it; and to avoid people that do use it!🙃
@cn-ml 2 years ago
Thanks for the video, i already started using timeouts for regex wherever possible. However I don't fully understand what the non-backtracking option does. Why does it change the performance of the regex and what changes with the results?
@humanesque 2 years ago
Non-Backtracking is basically lazy evaluation for your regular expressions, and it's implementation dependent. Unless you're using it for a throwaway match (instead of parsing, which is what regex is for), it will introduce weird, platform specific bugs and grief.
@cn-ml 2 years ago
@@humanesque okay thanks, so it's basically unsafe but faster
@TribalBoss 2 years ago
Few years ago I had to check if an HTML string contained any email addresses using Regex. Needless to say, I had to reboot the Azure server after pushing to production 😂
@coced 2 years ago
6:36 I felt it
@djupstaten2328 1 year ago
These patterns overuse capturing groups. (x) should be (?:x) more often than not, i.e. non-capturing groups. It makes a ton of difference in regards to bloat and lag.
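A quick Python illustration of the (x) versus (?:x) distinction, using a made-up date pattern. It only shows that non-capturing groups record nothing; it makes no attempt to measure the performance difference claimed above, which will vary by engine and pattern.

```python
import re

capturing = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
non_capturing = re.compile(r"(?:\d{4})-(?:\d{2})-(?:\d{2})")

print(capturing.groups)       # 3 groups are tracked on every match
print(non_capturing.groups)   # 0 groups are tracked

print(capturing.match("2022-11-05").groups())     # ('2022', '11', '05')
print(non_capturing.match("2022-11-05").group())  # 2022-11-05
```

Both patterns accept the same strings, so switching to (?:x) is safe whenever the captured text is never read afterwards.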
@deepakkulkarni5356 2 years ago
Hey Nick, does SQL validation also get exponentially slower with more records? Can you share any document link which proves the same?
@klekaelly 2 years ago
I thought the same thing, SQL validation uses Regex a lot
@rumplin 2 years ago
What a subtle way to show us that you have an RTX 4090 :)
@nickchapsas 2 years ago
It’s the only reason I made the video
@zxopink 2 years ago
What are the drawbacks of NonBacktracking?
@adassko6091 2 years ago
The option can’t be used in conjunction with RegexOptions.RightToLeft or RegexOptions.ECMAScript, and it doesn’t allow for the following constructs in the pattern: atomic groups, backreferences, balancing groups, conditionals, lookarounds, and start anchors (\G).
@TonoNamnum 2 years ago
Regex are not extremely hard lol. If you study them for about a week you should be able to create very powerful stuff. And also the secret to regexes in my opinion is to separate them in little chunks. When you study them you definitely learn what Nick is describing. I don't regret learning/using regexes. I also agree that they are not the most efficient option but if you understand what you are doing it saves a lot of time.
@ProtectedClassTest 2 years ago
well, wait until you maintain other people's regex and come back here cryin hahaha
@RealMathewAdams 2 years ago
You aren't coding for yourself, you are coding for the future. Regex can be unmaintainable if the use-case is non-trivial.
@TonoNamnum 2 years ago
@@ProtectedClassTest the crying will be for people that do not understand them like you 🤣
@TonoNamnum 2 years ago
Also this video encourages you to use them kzbin.info/www/bejne/iGallHt_gr-ArsU and that channel has a lot of subscribers. I guess the bottom line is you have to understand what you are doing just like everything else.
@nickandrews1985 2 years ago
My second biggest takeaway from this video is that Nick already has himself an RTX 4090 LOL
@billy65bob 2 years ago
2:30 that is some very bad and inefficient code for that pattern. I'm guessing this tool is more to break down what the various regex implementations will do in an easy to understand manner, rather than to generate something actually worth using. I had looked at the specification of email addresses some time ago; I wanted to know what was valid, and how sub addressing was defined. Just the bits in common use are very complicated, and that's before you get to all the weird emails that no one sane would use, but are actually allowed by the standard, such as using quotes, escaping quotes inside the quotes, double @'s, non-ascii symbols, a % to set the route, sub addressing, etc. What the standard allows is insane, and trying to handle it via regex is a fool's errand. You're way better off writing a small program (or library) for the dedicated purpose of validating emails, by having it identify fragments, and validating them as defined.
@KanashimiMusic 2 years ago
I find it funny that people keep saying "nobody knows how to write RegEx", because I don't find it TOO difficult. I mean it still takes me a while to do anything remotely complex, but like, it's manageable imo. Usually I will have RegExr open in another tab, since it contains a cheat sheet with the most important features, and it quickly lets me validate that my RegEx works the way it should
@KanashimiMusic 2 years ago
@@karlfimm I really need to start using GitHub copilot.
@jerryjeremy4038 2 years ago
Wow that's a monster computer! Too many cores
@Victor_Marius 2 years ago
It once froze my browser tab while I was testing a regex for matching file paths (in JS). It wasn't because of the length of the input but more like some spaces in the input. Why does it use backtracking? Can it be avoided with the format of the regex? If you use something as simple as /w0rd/ is it still going to use backtracking?
@gerakore8948 2 years ago
I've never decided to bother with regex. I see how it can be useful but it's a cluttered mess. Debugging and code maintenance would be a nightmare. I've done a lot of parsing and I doubt regex would be able to handle some of the inputs I've dealt with. For instance, receipts with various formats printed, cut off mid-receipt and with inconsistent headers/footers, scanned in low quality into an image format and placed into a PDF on which I would have to use OCR to extract the text. As you can imagine, all the text is scrambled: 5's turn into S's, 1's turn into I's, etc. Sometimes characters are missing and you can't really rely on identifiable tags.
@nothingisreal6345 2 years ago
My rule of thumb is: if possible, avoid regex. Hard to write. Extremely hard to read for others. If you use properly typed data you will not need it. And no matter how much effort you put into testing and thinking about edge cases, there are still too many times it will fail. For many strings there are alternative ways to verify them: IP address, URI, file path… Very often the need for regex is based on bad design or on having to connect to legacy systems.
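As a hedged illustration of those "alternative ways", here is how two of the mentioned cases might be handled with Python's standard library instead of a pattern. The helper names `is_ip_address` and `is_http_url` are made up, and restricting URLs to http and https is an assumption for the example.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_address(text: str) -> bool:
    """Let the standard library parse IPv4/IPv6 instead of a regex."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def is_http_url(text: str) -> bool:
    """Accept only http(s) URLs that have a host part (an assumed policy)."""
    try:
        parts = urlsplit(text)
    except ValueError:          # e.g. malformed IPv6 brackets in the host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_ip_address("192.168.0.1"), is_ip_address("999.1.1.1"))  # True False
print(is_http_url("https://example.com/a"), is_http_url("not a url"))  # True False
```

A dedicated parser returns typed data you can keep working with, which is the "properly typed data" point made in the comment.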
@FunWithBits 2 years ago
That's odd. I wrote a longer comment and saw it in the comments but then it disappeared after a few minutes. Maybe the YouTube engine removed it after post-processing?
@nickchapsas 2 years ago
YouTube is notorious for auto deleting comments especially in programming content. I don’t delete any comments so maybe try to repost it
@FunWithBits 2 years ago
@@nickchapsas - I think that happened before on other channels too. I wish YouTube would be more careful about what they delete, as it had nothing negative/bad. I'll repost. Thank you for the awesome channel - I learn so much here. I also like how you consider performance a higher priority in most of your videos.
@zedmagdy 2 years ago
I've tried this regex with PHP preg_match and it works fine. I don't know if it's CSharp specific or what?
@attribute-4677 2 years ago
Which version is the NonBacktracking enum in? I'm targeting .NET Framework 4.8 and it can't seem to find it (VS2022 automatically selects the language version, but even when forced to C# 8 it fails to find it).
@nickchapsas 2 years ago
It’s .NET 7+
@attribute-4677 2 years ago
@@nickchapsas Ahh thanks! I mistook it for C# 7.
@nooftube2541 2 years ago
I love that regex for email... and it doesn't work, because email cannot be parsed by regex.
@ToadieBog 2 years ago
To me, Regex has always had the smell of something confusing to use, that I never really cared for. I'm looking forward to a replacement that humans can actually read.
@GZPlays_uno 2 years ago
Soo you got the 4090, wondering what kind of games you are playing :P
@janneforsell525 2 years ago
Once again I've opened a PR during the video 😅
@mastermati773 2 years ago
Validating emails is so ubiquitous that I wonder why tf Regex can't have a special symbol only for emails xD
@dmytrk 2 years ago
In some cases, I write my own algorithm to scan the string, so I can actually debug that.
@McNerdius 2 years ago
This is why i love the new regex source generators, being able to view and step through the C# equivalent is a great learning aid for me. I comprehend the basics of regex but if a string + nontrivial regex combo doesn't pass a unit test or whatever and i can't figure out why... i can step through that particular scenario now, yay !
@pilotboba 2 years ago
Developer has a problem. Developer uses RegEx. Developer now has 2 problems. :)
@ws_stelzi79 2 years ago
Well, what is the saying: "If you try to solve one problem with RegEx, you now have two problems!"
@anonimzwx 2 years ago
Regex is very easy to do tbh. Does the NonBacktracking option affect the result??
@codeforme8860 2 years ago
Does anyone actually know how to use Regex
@ryanzwe 2 years ago
Nope, I can't read or write it
@RougeEric 2 years ago
I think it's fair to assume that anyone who's spent enough time with it can comfortably create some shorter regex and know what they're doing. But as soon as you start playing with complex nested systems and tons of lookahead stuff, even with significant practice, I have to test things extensively just to make sure they are doing what I think they're supposed to.
@geomorillo 2 years ago
regwhat?
@stevejohnny1111 2 years ago
Nice RTX4090 & 128gb ram, bro!
@tarsala1995 2 years ago
Wut? You already have RTX 4090? 5:00
@gregcyrus2739 2 years ago
Hate regex! If you re-engineer foreign code you will never know what it was intended to validate. The LIKE operator is not that flexible but I could always validate everything (maybe with a sequence of LIKE-lines - and it was human readable)
@mirabilis 2 years ago
No backtracking will break the regex.
@theMagos 2 years ago
128 GB RAM? Yikes...
@FunWithBits 2 years ago
Maybe for video editing?
@nickchapsas 2 years ago
I wish I had a good reason….but I don’t….
@GaryJohnWalker1 2 years ago
Regex kills my brain so why not the computer too
@claudiufarcas 2 years ago
Nice seeing you in person @dotnetdays. Keep doing great things! You're awesome!
@alirezanet 2 years ago
Nick I know regex 😊 stop saying that if you don't man 😂 PS. just kidding ... I just can write regex but after a while only god knows what it is doing 😂😅
@StasAbrosimov 2 years ago
If you decide to solve the problem with regular expressions... You now have two problems: the original problem and the regular expression. It's an old joke....
@abhishekbagchi6052 2 years ago
Clicked so fast
@Max_Jacoby 2 years ago
nick@n.n.n.n.n.n.n.n.n.c should be a CPU benchmark.
@katerinaandrasko3755 2 years ago
how about - don't do regex?... i know, crazy, but with emails check if there is an "@" symbol; if there is, cool, accept it. applications should try to send you that email to continue with whatever you want. want to register? cool - type in the verification code. want to recover your account? cool, click on the link in your email. at the end of the day that's what truly validates your email address - you get an email.
@AvenDonn 2 years ago
Brb gonna go try signing up to everything with nick@n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.c
@jedimastermaniac 2 years ago
lol. we still have to take into account for every action that the end user is gonna end up being a notoriously stupid bastard :D :P