How Regex in C# can kill your app

Views: 31,829

Nick Chapsas

Comments: 129
@nidustash6964 1 year ago
"Not everyone can do that, mainly because nobody knows how to write a Regex" TBH this caught me off guard!! I even choked on my own saliva for a brief moment. In the end, still very educational and fantastic content as usual, sir!
@FrederickMarcoux 1 year ago
But it's so true. Nobody understands Regex.
@mandoschMUh 1 year ago
That one is pure gold, I agree :D
@robertnull 1 year ago
I'd say that most people can write a regular expression, but nobody can read it after that, including the author. It's easier to rewrite it than to understand it ;)
@GufNZ 1 year ago
It's not true tho - I am very fluent in Regex, in various dialects, and for everyone else there's RegexBuddy.
@GufNZ 1 year ago
@FrederickMarcoux I do, very well.
@pilotboba 1 year ago
I know this video wasn't about email... but... I think MS or some other people have determined there is no way to really verify an email with a regex. I think even MS changed it so they basically look for a single @ in the string to call it a valid email format. The way to validate it is to send an email with a confirmation link. :)
@Mario-cr1ik 1 year ago
This approach is the recommended way, mentioned somewhere in the MS docs.
@rezataba6204 1 year ago
What about the login situation? It's not common to send verification emails for logins.
@pilotboba 1 year ago
@rezataba6204 Make the account pending until the email has been verified. Pending accounts get no access.
@albe8479 1 year ago
@rezataba6204 For login to an existing verified account it does not matter. If a user with that email as login exists, it's all OK. Maybe just do a length check.
@frossen123 1 year ago
In cybersecurity, abusing regex like this is a category of DoS attack called ReDoS.
@rapzid3536 1 year ago
Interesting, we call it the same thing outside the cybersecurity industry.
@KevinInPhoenix 1 year ago
There is an old saying: If you have a problem that requires Regex then you now have two problems.
@myroslavberlad4428 1 year ago
If you have a problem and you have a solution via Regex - now you have two problems
@LCTesla 1 year ago
Do people really believe that, or is it about making a cute "zinger" for the uncritical masses?
@myroslavberlad4428 1 year ago
@LCTesla Yes, they do. And there are reasons for that. Hard to master, hard to debug, hard to update without breaking existing cases. It is not that the tool is bad. Regexes are actually a powerful instrument and there are good places to use them for sure, but they are hard to master. That is why this saying was born.
@LCTesla 1 year ago
@myroslavberlad4428 Seems like just applying the KISS principle and restricting its use to appropriate use cases counters all that. The fact that a tool can be misused is a case against the user, not the tool.
@myroslavberlad4428 1 year ago
@LCTesla I do agree
@Kommentierer 1 year ago
Everything I see on your channel is super interesting and special. I never knew about those issues, but it is nice to know how to fix them. Sharing this with my colleagues.
@RayanMADAO 1 year ago
that regex visualization site is really cool
@Denominus 1 year ago
Excellent video and great advice. We've fallen prey to this twice in the past. First an attack directly against one of our APIs and then during Cloudflare's global outage due to a bad regex on their side (not our fault in this case, but still an outage). At the time we changed the regex, but there are only a handful of people who know how to do this confidently on a complex regex. I really like these "safety net" approaches.
@HalasterBlackmantle 1 year ago
What's the downside to using NonBacktracking? Or rather, what would be a scenario where you would not want to use it?
@IvanRandomDude 1 year ago
Chapsas flexing with 32 cores on us mortals @4:52
@AcidNeko 1 year ago
and RTX 4090 and 128 GB of RAM :) it can run 100 instances of Rider, or 6 instances of Visual Studio 2022
@Hamza-Shreef 1 year ago
this kinda thing has been really useful, keep it up bro
@DuelingTreeMike 1 year ago
Amazing find sir. I had no idea backtracking can be so dangerous. Thank you so much for creating this video.
@FunWithBits 1 year ago
Regex is super powerful. I just wish people would format it a little bit more. Usually I see regex and it is just a line of characters. Regex code can be much easier to read when there is spacing, multiple lines, different indenting, comments, etc. Programmers don't put C# code on a single line with no spaces or comments, but in regex this is accepted. (And because it's hard to read, it's impossible to see any performance issues it might have.)
@chriskruining 1 year ago
Could you give me an example of such formatted regex? Because I always assumed it had to be a line of chars, since every space and newline used for formatting is part of the query as far as I am aware. So I am curious how you do this, because I love clearly formatted code :D
@robertnull 1 year ago
@chriskruining There is a (?x) regex modifier that enables free-spacing mode, i.e. you can put spaces and newlines in your regex and they will be ignored, so you can make your expression multi-line, with each line containing a part that captures something significant. What's more, in this mode you can even use # comments at the end of each line!
@PeterK6502 1 year ago
@robertnull True, but most input to be parsed is dependent on spaces, therefore this mode is useless in that situation (you could add comments however to increase readability).
@robertnull 1 year ago
@PeterK6502 Fret not, kind sir, for in free-spacing mode you just escape spaces with a backslash to make them part of the important expression and not part of the unimportant formatting :)
@PeterK6502 1 year ago
@robertnull I did not know that, thanks for the info.
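The free-spacing exchange above can be illustrated with a minimal sketch. The pattern, the group names `first` and `last`, and the sample input are all made up for illustration; only the `(?x)` modifier, `#` comments, and the backslash-escaped space come from the comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Free-spacing mode: whitespace in the pattern is ignored and '#' starts a comment.
// A literal space has to be escaped ("\ ") to stay part of the pattern.
const string pattern = @"(?x)
    ^(?<first>[A-Za-z]+)   # first name
    \                      # one literal space (escaped)
    (?<last>[A-Za-z]+)$    # last name
";

Match match = Regex.Match("Ada Lovelace", pattern);
Console.WriteLine(match.Success);               // True
Console.WriteLine(match.Groups["last"].Value);  // Lovelace
```

In .NET the same mode can also be switched on with `RegexOptions.IgnorePatternWhitespace` instead of the inline `(?x)`.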
@a13w1 1 year ago
That timeout option is quite cool when you know how long a normal regex will take to pass even under load. I plan to use it next time it makes sense when I write regex.
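A minimal sketch of the timeout approach mentioned here, assuming a 100 ms budget and a deliberately bad pattern (`^(a+)+$`) picked only for illustration; neither is taken from the video. The helper name `IsMatchOrReject` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern with nested quantifiers, chosen only because it can backtrack badly.
var regex = new Regex(@"^(a+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(100));

// Returns false instead of hanging when the match exceeds the time budget.
bool IsMatchOrReject(string input)
{
    try
    {
        return regex.IsMatch(input);
    }
    catch (RegexMatchTimeoutException)
    {
        return false; // treat "took too long" as "invalid input"
    }
}

Console.WriteLine(IsMatchOrReject("aaaa"));                     // True
Console.WriteLine(IsMatchOrReject(new string('a', 40) + "!"));  // False
```

Whether rejecting on timeout is the right policy depends on the application; logging the offending input length is a common addition.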
@5hunt3r 1 year ago
Just a note: don't try to validate emails. It's nearly impossible to check if a mail is valid because so many special cases exist where it looks invalid but still is valid.
@nickchapsas 1 year ago
The actual RFC regex is HUUUUGE
@humanesque 1 year ago
Pretty much this; about the furthest you can go is checking if the domain exists, short of asking the receiving server if it will accept it. Useless checks like these are worse than blindly copying code (which is what this regex is) and being surprised when it goes wrong.
@orterves 1 year ago
My understanding is the best way to validate an email is to send a verification email.
@nooftube2541 1 year ago
@nickchapsas The real RFC regex does not exist 😂 Because an email, like the domain, cannot be parsed with regex. Actually there are 2 normal solutions: either check for the @ sign and the existence of symbols before and after it, or check that the email is real. But the second option does not handle localhosts...
@EmptyGlass99 1 year ago
The only 100% guaranteed way to validate an email is to force the user to respond to an email sent to them, i.e. sending a validation link or one-time validation code.
@brianviktor8212 1 year ago
10 seconds to check if a given string is a valid e-mail? Sounds great! I mean, I could do it with a little custom algorithm in ~0.001µs, but hey, it's regex!! We all love regex, don't we guys?! An e-mail is set up like this: [text]@[domain].[ending] - Either split at the @ or get the index of it. If the result is !=2 elements in the array or -1 as index, you have either no @ or more than one. Both should return "false" for the check. After that you get the last index of "." (apparently you can have multiple dots?). If it's -1, return false. Otherwise the first part is the domain, the second part is the ending. Here you can verify if it's a valid e-mail address. It's really simple... I thought everybody would do this? Why even bother with Regex for this?
@brianviktor8212 1 year ago
@billy65bob - Hmm yeah, that would require adjustments then. I've never seen those before though. In the worst case I'd have to loop through every char manually, but only once.
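The hand-written check described above could look roughly like this. It is a sketch of the split-at-@ and last-dot idea only; the function name `LooksLikeEmail` is hypothetical, and rejecting quoted local parts is an assumption of the sketch, not something the standard requires.

```csharp
using System;

// Structural check only: exactly one '@', a non-empty local part,
// and a domain with a '.' that is neither its first nor its last character.
// Quoted local parts such as "a@b"@example.com are allowed by the standard
// but are rejected here; that trade-off is an assumption of this sketch.
static bool LooksLikeEmail(string input)
{
    if (string.IsNullOrWhiteSpace(input)) return false;

    int at = input.IndexOf('@');
    if (at <= 0 || at != input.LastIndexOf('@')) return false;

    string domain = input.Substring(at + 1);
    int dot = domain.LastIndexOf('.');
    return dot > 0 && dot < domain.Length - 1;
}

Console.WriteLine(LooksLikeEmail("nick@example.com")); // True
Console.WriteLine(LooksLikeEmail("nick@example"));     // False
```

Each character is visited a constant number of times, so the cost grows linearly with input length. As several comments point out, a confirmation email is still what proves the address works.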
@tmhchacham 1 year ago
Very nice, as usual. Keep it up!
@antonmartyniuk 1 year ago
nice call on the Regex problem!
@anon0 1 year ago
ooh very cool, I just started doing my PhD on symbolic automata regex. Glad to see it being relevant.
@magashkinson 1 year ago
Very useful video. Didn't know about this problem.
@HadrielWonda 1 year ago
Thanks for the insight, Nick
@peledzohar 1 year ago
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~ Jamie Zawinski
@nickhubbard3671 1 year ago
The best way to avoid issues with Regex is to not use it; and to avoid people that do use it!🙃
@parkercrofts6210 1 year ago
Thank u for this ❤❤
@matthewsheeran 1 year ago
Brilliant!
@TribalBoss 1 year ago
A few years ago I had to check if an HTML string contained any email addresses using Regex. Needless to say, I had to reboot the Azure server after pushing to production 😂
@shingok 1 year ago
I wonder if the source generator version was slower because it was compiled as debug. Maybe the dynamically compiled version generates an optimized version regardless of compilation mode.
@rbogdan8980 1 year ago
Thanks!
@IAmFeO2x 1 year ago
Great video as always! Personally I avoid Regex like the devil - it always takes so long to read and understand them in code.
@infeltk 1 year ago
I use Regex for simple things. Everything has its purpose and limitations. And the problem described in this episode is described on the Microsoft Learn page on .NET fundamentals - it is not secret information.
@rumplin 1 year ago
What a subtle way to show us that you have an RTX 4090 :)
@nickchapsas 1 year ago
It's the only reason I made the video
@carmineos 1 year ago
DataAnnotations should be safe as RegularExpressionAttribute has a default timeout of 2s (at least from .NET 5, idk before)
@coced 1 year ago
6:36 I felt it
@TonoNamnum 1 year ago
Regexes are not extremely hard lol. If you study them for about a week you should be able to create very powerful stuff. And also the secret to regexes in my opinion is to separate them into little chunks. When you study them you definitely learn what Nick is describing. I don't regret learning/using regexes. I also agree that they are not the most efficient option, but if you understand what you are doing it saves a lot of time.
@ProtectedClassTest 1 year ago
well, wait until you maintain other people's regex and come back here cryin hahaha
@RealMathewAdams 1 year ago
You aren't coding for yourself, you are coding for the future. Regex can be unmaintainable if the use-case is non-trivial.
@TonoNamnum 1 year ago
@ProtectedClassTest the crying will be for people that do not understand them like you 🤣
@TonoNamnum 1 year ago
Also this video encourages you to use them kzbin.info/www/bejne/iGallHt_gr-ArsU and that channel has a lot of subscribers. I guess the bottom line is you have to understand what you are doing, just like everything else.
@tanglesites 1 year ago
Excellent video as usual! I was wondering if anyone knows of any resources on how to scan assemblies. I'm trying to build a setup for a minimal API project I am working on. I would like to pull all the classes that are using a particular interface or interfaces and register them in the IoC, so that it kind of auto-magically works. Do you, Nick, have any videos on this, or does anyone know of anywhere I can look? Everything I have found covers particular use cases. Sorry, new to C#. I could figure it out, I'm sure, given enough time; just looking to speed up development a little and make the code a little more organized. Again, great content. You have taught me more in the last month than I have learned in a year, and it's more than beginner level, loving it.
@masonwheeler6536 1 year ago
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
@KanashimiMusic 1 year ago
I find it funny that people keep saying "nobody knows how to write RegEx", because I don't find it TOO difficult. I mean it still takes me a while to do anything remotely complex, but like, it's manageable imo. Usually I will have RegExr open in another tab, since it contains a cheat sheet with the most important features, and it quickly lets me validate that my RegEx works the way it should.
@KanashimiMusic 1 year ago
@karlfimm I really need to start using GitHub Copilot.
@pilotboba 1 year ago
Developer has a problem. Developer uses RegEx. Developer now has 2 problems. :)
@PeterK6502 1 year ago
This kind of behaviour is frequently solved by using lazy capture instead of greedy capture; for example, instead of using ()+ you should use ()+? I can see at least one greedy capture group in the shown expression. You should always try to avoid greedy captures, because of backtracking. Use ()*? or ()+? instead of ()* or ()+
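To make the greedy versus lazy distinction concrete, here is a small sketch with a made-up input string. It only shows how the two quantifiers differ in what they match. Both forms still run on a backtracking engine by default, and a lazy quantifier changes the order in which alternatives are tried rather than removing backtracking, so it is worth measuring against the actual pattern before relying on it as a fix.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as possible, then gives characters back until '>' fits.
string greedy = Regex.Match(input, "<.+>").Value;

// Lazy: .+? grabs as little as possible, then extends one character at a time.
string lazy = Regex.Match(input, "<.+?>").Value;

Console.WriteLine(greedy); // <b>bold</b> and <i>italic</i>
Console.WriteLine(lazy);   // <b>
```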
@pmashurenko 1 year ago
Well, it's also worth reading RFC 2821; very quickly it becomes clear that regular expressions are a bad tool for email validation - the variety of options for names and domain names is so huge that it makes almost no sense to check beyond the point that there's an "@" that isn't preceded by a "\" character somewhere in the middle.
@billy65bob 1 year ago
Not even that is foolproof, as an @ inside quotes is also escaped. :) Granted, no one uses quotes in their email addresses, but it is allowed by the standard.
@casperhansen826 1 year ago
I use Regex for small strings with simple use cases.
@djupstaten2328 1 year ago
These patterns overuse capturing groups. (x) should be (?:x) more often than not, i.e. non-capturing groups. It makes a ton of difference with regard to bloat and lag.
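A small sketch of the difference between `(x)` and `(?:x)`, using a made-up date string. It shows only that non-capturing groups leave nothing to record; how much time that saves depends on the pattern and is not measured here.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "2024-01-15";

Match capturing = Regex.Match(input, @"(\d+)-(\d+)-(\d+)");
Match nonCapturing = Regex.Match(input, @"(?:\d+)-(?:\d+)-(?:\d+)");

// Groups[0] is always the whole match, so the counts are 4 and 1.
Console.WriteLine(capturing.Groups.Count);    // 4
Console.WriteLine(nonCapturing.Groups.Count); // 1
```

In .NET, `RegexOptions.ExplicitCapture` has a similar effect without rewriting the pattern: unnamed parentheses stop capturing and only named groups are kept.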
@billy65bob 1 year ago
2:30 that is some very bad and inefficient code for that pattern. I'm guessing this tool is more to break down what the various regex implementations will do in an easy to understand manner, rather than to generate something actually worth using. I had looked at the specification of email addresses some time ago; I wanted to know what was valid, and how sub-addressing was defined. Just the bits in common use are very complicated, and that's before you get to all the weird emails that no one sane would use, but are actually allowed by the standard, such as using quotes, escaping quotes inside the quotes, double @'s, non-ASCII symbols, a % to set the route, sub-addressing, etc. What the standard allows is insane, and trying to handle it via regex is a fool's errand. You're way better off writing a small program (or library) for the dedicated purpose of validating emails, by having it identify fragments and validate them as defined.
@zxopink 1 year ago
What are the drawbacks of NonBacktracking?
@adassko6091 1 year ago
The option can't be used in conjunction with RegexOptions.RightToLeft or RegexOptions.ECMAScript, and it doesn't allow for the following constructs in the pattern: atomic groups, backreferences, balancing groups, conditionals, lookarounds, and start anchors (\G).
@janneforsell525 1 year ago
Once again I've opened a PR during the video 😅
@cn-ml 1 year ago
Thanks for the video, I already started using timeouts for regex wherever possible. However, I don't fully understand what the non-backtracking option does. Why does it change the performance of the regex and what changes with the results?
@humanesque 1 year ago
Non-Backtracking is basically lazy evaluation for your regular expressions, and it's implementation dependent. Unless you're using it for a throwaway match (instead of parsing, which is what regex is for), it will introduce weird, platform specific bugs and grief.
@cn-ml 1 year ago
@humanesque okay thanks, so it's basically unsafe but faster
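For reference, the documented purpose of `RegexOptions.NonBacktracking` (.NET 7 and later) is to guarantee processing time linear in the length of the input, in exchange for not supporting the constructs listed earlier in this thread. A minimal sketch, using a hypothetical worst-case pattern and input that are not taken from the video:

```csharp
using System;
using System.Text.RegularExpressions;

// Requires .NET 7 or later. The pattern is a hypothetical worst case for a backtracking engine.
var regex = new Regex(@"^(a+)+$", RegexOptions.NonBacktracking);

string hostile = new string('a', 10_000) + "!";

Console.WriteLine(regex.IsMatch("aaaa"));   // True
Console.WriteLine(regex.IsMatch(hostile));  // False
```

Whether a given pattern matches the same text under both engines is worth covering with unit tests before switching an existing pattern over.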
@jerryjeremy4038 1 year ago
Wow that's a monster computer! Too many cores
@nooftube2541 1 year ago
I love that regex for email... and it doesn't work, because email cannot be parsed by regex.
@deepakkulkarni5356 1 year ago
Hey Nick, does SQL validation also increase exponentially with more records? Can you share any document link which proves the same?
@klekaelly 1 year ago
I thought the same thing, SQL validation uses Regex a lot
@mastermati773 1 year ago
Validating emails is so ubiquitous that I wonder why tf Regex can't have a special symbol only for emails xD
@nickandrews1985 1 year ago
My second biggest takeaway from this video is that Nick already has himself an RTX 4090 LOL
@gerakore8948 1 year ago
I've never decided to bother with regex. I see how it can be useful but it's a clustered mess. Debugging and code maintenance would be a nightmare. I've done a lot of parsing and I doubt regex would be able to handle some of the inputs I've dealt with. For instance, receipts with various formats printed that are cut off mid-receipt and with inconsistent headers/footers, scanned in low quality into an image format and placed into a PDF on which I would have to use OCR to extract the text. If you can imagine, all the text is scrambled: 5's turn into S's, 1's turn into I's, etc. Sometimes characters are missing and you can't really rely on identifiable tags.
@Victor_Marius 1 year ago
It once froze my browser tab while I was testing a regex for matching file paths (in JS). It wasn't because of the length of the input but more because of some spaces in the input. Why does it use backtracking? Can it be avoided with the format of the regex? If you use something as simple as /w0rd/ is it still going to use backtracking?
@nothingisreal6345 1 year ago
My rule of thumb is: if possible, avoid regex. Hard to write. Extremely hard to read for others. If you use properly typed data you will not need it. And no matter how much effort you put into testing and thinking about edge cases, there are still too many times it will fail. For many strings there are alternative ways to verify them: IP address, URI, file path… Very often the need for regex stems from bad design or from having to connect to legacy systems.
@speakoutloud7293 1 year ago
Soo you got the 4090, wondering what kind of games you are playing :P
@ToadieBog 1 year ago
To me, Regex has always had the smell of something confusing to use, that I never really cared for. I'm looking forward to a replacement that humans can actually read.
@zedmagdy 1 year ago
I've tried this regex with PHP preg_match and it works fine. I don't know if it's C#-specific or what?
@FunWithBits 1 year ago
That's odd. I wrote a longer comment and saw it in the comments but then it disappeared after a few minutes. Maybe the KZbin engine removed it after post-processing?
@nickchapsas 1 year ago
KZbin is notorious for auto-deleting comments, especially in programming content. I don't delete any comments, so maybe try to repost it.
@FunWithBits 1 year ago
@nickchapsas - I think that happened before on other channels also. I wish YouTube would be more careful about what they delete, as it had nothing negative/bad. I'll repost. Thank you for the awesome channel - I learn so much here. I also like how you consider performance a higher priority in most of your videos.
@attribute-4677 1 year ago
Which version is the NonBacktracking enum in? I'm targeting .NET Framework 4.8 and it can't seem to find it (VS2022 automatically selects the language version, but even when forced to C# 8 it fails to find it).
@nickchapsas 1 year ago
It's .NET 7+
@attribute-4677 1 year ago
@nickchapsas Ahh thanks! I misunderstood it as C# 7.
@ws_stelzi79 1 year ago
Well, what is the saying? "If you try to solve one problem with RegEx you have now two problems!"
@dmytrk 1 year ago
In some cases, I write my own algorithm to scan the string, so I can actually debug that.
@McNerdius 1 year ago
This is why I love the new regex source generators; being able to view and step through the C# equivalent is a great learning aid for me. I comprehend the basics of regex, but if a string + nontrivial regex combo doesn't pass a unit test or whatever and I can't figure out why... I can step through that particular scenario now, yay!
@anonimzwx 1 year ago
Regex is very easy to do tbh. Does the NonBacktracking option affect the result??
@jspesh 1 year ago
Nice RTX 4090 & 128 GB RAM, bro!
@codeforme8860 1 year ago
Does anyone actually know how to use Regex?
@ryanzwe 1 year ago
Nope, I can't read or write it
@RougeEric 1 year ago
I think it's fair to assume that anyone who's spent enough time with it can comfortably create some shorter regex and know what they're doing. But as soon as you start playing with complex nested systems and tons of lookahead stuff, even with significant practice, I have to test things extensively just to make sure they are doing what I think they're supposed to.
@geomorillo 1 year ago
regwhat?
@mirabilis 1 year ago
No backtracking will break the regex.
@GaryJohnWalker1 1 year ago
Regex kills my brain so why not the computer too
@tarsala1995 1 year ago
Wut? You already have RTX 4090? 5:00
@gregcyrus2739 1 year ago
Hate regex! If you re-engineer foreign code you will never know what it was intended to validate. The LIKE operator is not that flexible, but I could always validate everything (maybe with a sequence of LIKE lines - and it was human readable).
@theMagos 1 year ago
128 GB RAM? Yikes...
@FunWithBits 1 year ago
Maybe for video editing?
@nickchapsas 1 year ago
I wish I had a good reason….but I don't….
@StasAbrosimov 1 year ago
If you decide to solve the problem with regular expressions... You now have two problems: the original problem and the regular expression. It's an old joke....
@abhishekbagchi6052 1 year ago
Clicked so fast
@alirezanet 1 year ago
Nick I know regex 😊 stop saying that if you don't man 😂 PS. just kidding ... I just can write regex but after a while only god knows what it is doing 😂😅
@claudiufarcas 1 year ago
Nice seeing you in person @dotnetdays. Keep doing great things! You're awesome!
@Max_Jacoby 1 year ago
nick@n.n.n.n.n.n.n.n.n.c should be a CPU benchmark.
@katerinaandrasko3755 1 year ago
How about - don't do regex?... I know, crazy, but with emails, check if there is an "@" symbol; if there is, cool, accept it. Applications should try to send you that email to continue with whatever you want. Want to register? Cool - type in the verification code. Want to recover your account? Cool, click on the link in your email. At the end of the day that's what truly validates your email address - you get an email.
@AvenDonn 1 year ago
Brb gonna go try signing up to everything with nick@n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.c
@jedimastermaniac 1 year ago
lol. For every action we still have to take into account that the end user is gonna end up being a notoriously stupid bastard :D :P