Undefined Behavior in C++: What Every Programmer Should Know and Fear - Fedor Pikus - CppCon 2023

23,045 views

CppCon

3 months ago

cppcon.org/
---
github.com/CppCon/CppCon2023
This talk is about You-Know-What, the thing in our programs we don’t mention by name.
What is this undefined behavior every C++ programmer has grown to fear? Just as importantly, what isn’t it? If it’s so scary, why is it allowed to exist in the language?
The aim of this talk is to approach undefined behavior rationally: without fear but with due caution. We will learn why the standard allows undefined behavior in the first place, what actually happens when a program does something the standard calls “undefined,” and why it must be taken seriously even when the program “works as-is.” As this is a practical talk, we will have live demos of programs with undefined behavior and sometimes unexpected outcomes (if you are very lucky, you might see demons fly out of the speaker’s nose). Also, as this is a practical talk, we will learn how to detect undefined behavior in one’s programs, and how to take advantage of the undefined behavior to gain better performance.
---
Fedor Pikus
Fedor G Pikus is a Technical Fellow and head of the Advanced Projects Team in Siemens Digital Industries Software. His responsibilities include planning the long-term technical direction of Calibre products, directing and training the engineers who work on these products, design, and architecture of the software, and researching new design and software technologies.
His earlier positions included a Chief Scientist at Mentor Graphics (acquired by Siemens Software), a Senior Software Engineer at Google, and a Chief Software Architect for Calibre PERC, LVS, and DFM at Mentor Graphics. He joined Mentor Graphics in 1998 when he made a switch from academic research in computational physics to the software industry.
Fedor is a recognized expert in high-performance computing and C++. He is the author of two books on C++ and software design, has presented his works at CPPNow, CPPCon, SD West, DesignCon, and in software development journals, and is also an O'Reilly author. Fedor has over 30 patents and over 100 papers and conference presentations on physics, EDA, software design, and C++ language.
---
Videos Filmed & Edited by Bash Films: www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd: events.digital-medium.co.uk
---
Registration for CppCon: cppcon.org/registration/
#cppcon #cppprogramming #cpp

Comments: 52
@ArminHasitzka 4 months ago
Love Fedor's unique presentation style, and very important talk to listen to for everyone!
@dickheadrecs 3 months ago
GCC “wait forever? you’re the boss!” Clang “Don’t be ridiculous”
@temdisponivel 3 months ago
I don't think the compiler (current or future) would optimize the original g() to return true. The compiler would optimize it to: g(int i) { if i == INT_MAX return false; else return true; } The compiler can only assume that "i" is not INT_MAX inside f(), not inside g(), because g() explicitly checks for the INT_MAX case. What am I missing?
@bernb 3 months ago
That's exactly what I got as a result when I tested it with the latest gcc and clang.
@francoisandrieux7954 2 months ago
You are correct. I suspect the example used was oversimplified from an originally correct example. Calling `g(INT_MAX)` is required to return `false`.
@DMStern 1 month ago
I suspect there's a typo in the slides, and that the first line of g() was supposed to read "if (i == INT_MAX) return true;"
@JacksonBockus 28 days ago
I think the intention was to say that f(i) always returns true, since any time the result is defined it is true, and the compiler doesn't care about what happens if the result is undefined.
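A minimal sketch of the f/g example as the commenters in this thread describe it. The slide itself is not reproduced on this page, so the exact bodies of `f` and `g` are an assumption, and `g_folded` is a hypothetical name for the form the commenters say an optimizer may reduce `g` to.

```cpp
#include <cassert>
#include <climits>

// f has undefined behavior only for i == INT_MAX (signed overflow in i + 1),
// so a compiler may assume that case never happens and fold f to "return true".
bool f(int i) { return i + 1 > i; }

// g as the commenters read it: the INT_MAX check runs before f is called,
// so f never sees INT_MAX and g itself contains no undefined behavior.
bool g(int i) {
    if (i == INT_MAX) return false;
    return f(i);
}

// Hypothetical name: what g may legally be reduced to once f is folded.
// The guard must survive, because g(INT_MAX) is well defined.
bool g_folded(int i) { return i != INT_MAX; }

// Small self-check: the guarded and folded forms agree, including at INT_MAX.
void check_guard() {
    assert(g(INT_MAX) == false);
    assert(g(0) == g_folded(0));
}
```

Under this reading, reducing `g` to an unconditional `return true` would change the result for a well-defined input, which is the objection raised above.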
@feisty-trog-12345 3 months ago
In the example shown at 5:00, `g(INT_MAX)` is blatantly not UB and must return false. That supposed second half of the optimization that no compiler currently performs would be a miscompilation. Something clearly went wrong with the presentation here, I'd expect anyone giving a talk on UB to notice that the example they're currently testing on multiple compilers simply doesn't cause any UB.
@gamekiller0123 3 months ago
Isn't the first example wrong? If i is INT_MAX, then there is no undefined behavior because the else branch won't be taken. If i is any other value, then calling f is fine.
@sjswitzer1 3 months ago
The classic example of a contract that’s too expensive to validate is that a binary search must be given sorted data. Validating the ordering is as expensive as a linear search, making the binary search pointless.
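A short sketch of that trade-off, using the standard `std::binary_search` and `std::is_sorted`. The function names `contains_sorted` and `contains_checked` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Precondition (not checked): data is sorted in ascending order.
// The search itself is O(log n); on unsorted data the result is meaningless.
bool contains_sorted(const std::vector<int>& data, int value) {
    return std::binary_search(data.begin(), data.end(), value);
}

// Checked variant: std::is_sorted walks the whole range, so the check alone
// is O(n), the same cost as a linear search for the value.
bool contains_checked(const std::vector<int>& data, int value) {
    assert(std::is_sorted(data.begin(), data.end()));
    return std::binary_search(data.begin(), data.end(), value);
}
```

Because the check is an `assert`, it disappears when `NDEBUG` is defined, which is one common way to pay for the validation only in debug builds.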
@gfasterOS 3 months ago
I'm still not convinced that making g never return false in the first example is a valid optimization. f assumes the input will never be INT_MAX, but the check before it diverts control flow and strictly dominates the call. If that were valid, that would allow the compiler to optimize out all checks for UB, including null pointer checks. It would be literally impossible to guard against any operation capable of causing UB.
@LucasSantos-ji1zp 3 months ago
The code at 6:05 does not have undefined behavior. The branch checks if it is legal to call f, and, if so, calls it. If this were undefined behavior, it would be impossible to prevent undefined behavior in any program. The generated machine code is correct; I just checked on godbolt (and it always will be, unless the compiler has a bug).
@bernb 3 months ago
Thanks for pointing it out. That's what I wondered. "If the check gets optimized away, how do you even avoid UB?".
@Kriby88 3 months ago
Fedor is a fantastic speaker, always love to see talks from him!
@djouze00 3 months ago
This topic is always interesting! Thank you!!
@paradox8425 3 months ago
Great talk! UB affecting previous code and what happens with the debugger was truly eye-opening.
@AlfredoCorrea 3 months ago
23:55 At the end of slide 29, it is clear that at some point Fedor mixed up x and y: what he said about x he really meant for y.
@Roibarkan 3 months ago
23:56 [slide 29] I think Fedor meant that a compiler might have optimized this code in case the y variable was declared to be “const int” AND the call f(x) would have been changed to f(y)
@Bolpat 3 months ago
23:30 I don’t think it’s UB to cast away const, it’s UB to cast away const _and_ change the value. A common pattern for getters is to overload a non-static member function of a class on the const’ness of the instance and in the mutable version, you cast the object to const, call the member function for const, and cast away const of the result. That is valid as the original object wasn’t const. An example could be std::vector::operator[]. For a given index, it returns a reference to the exact same integer only typed const if the vector was const. The mutable version of the function doesn’t actually mutate anything, it differs from the const version only by preserving the non-const’ness in the type system.
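A minimal sketch of the getter pattern described in this comment. The class `Samples` and its member `at` are hypothetical; whether any particular standard library implements `std::vector::operator[]` this way is not something this sketch claims.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container used only to illustrate the pattern.
class Samples {
public:
    explicit Samples(std::vector<int> v) : values_(std::move(v)) {}

    // The const overload holds the real lookup logic.
    const int& at(std::size_t i) const { return values_[i]; }

    // The non-const overload reuses it: add const to *this, call the const
    // overload, then cast const away from the result. This is valid because
    // this overload can only be called on an object that is not const.
    int& at(std::size_t i) {
        return const_cast<int&>(static_cast<const Samples&>(*this).at(i));
    }

private:
    std::vector<int> values_;
};

// Usage: write through the non-const overload, read through the const one.
int write_then_read() {
    Samples s(std::vector<int>{1, 2, 3});
    s.at(1) = 20;
    const Samples& view = s;
    return view.at(1);
}
```

The cast is safe only because the underlying object is known to be non-const; applying the same cast to a reference whose origin is unknown would not carry that guarantee.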
@aniketbisht2823 2 months ago
The first example does not have any UB whatsoever. No precondition of any invoked expression is violated. If "i" equals INT_MAX then g() returns false; otherwise f() is invoked, and since (i+1) cannot overflow there, it is a valid expression. The compiler knows that (i+1) is always greater than "i" (because signed integer overflow would mean UB), so it simplifies the call to f() to returning true. With that, the invocation of g() is simplified to (i != INT_MAX). Now, the optimization that Fedor is talking about might be triggered if f() were somehow called unconditionally, because then the compiler could assume that (i != INT_MAX) and return true. Something like this ... bool f(int i) { return i+1 > i;} bool g(int i) { f(i); return i != INT_MAX; }
@Digrient 3 months ago
Thanks for that talk, very interesting! I’m still not entirely clear though on why compilers don’t emit more warnings when they optimize away code based on the assumption of the absence of undefined behavior, when it in fact seems much more likely that the programmer has intended something else or made a mistake.
@johnmcleodvii 3 months ago
I've written a line of code with undefined behavior that destroyed my hard disk twice. The second time I was single-stepping through the code and went one line too far. Long = short * short without casts. So the 2 shorts could multiply to be too large for a short. Sometimes that would set the sign bit. The next step was to seek from the start of the file, then write some data. That hit the engineering sectors of the disk.
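A sketch of the same class of bug on a present-day platform, assuming `int` is 32 bits wide: the product is computed in the narrow type before it is stored in the wide one. The function name `file_offset` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bug pattern (shown as a comment only, since it overflows for large inputs):
//     std::int64_t offset = record * record_size;
// With 32-bit operands the multiplication is done in 32 bits, and the
// widening to 64 bits happens only after the overflow has occurred.

// Fix: widen one operand first, so the multiplication is done in 64 bits.
std::int64_t file_offset(std::int32_t record, std::int32_t record_size) {
    return static_cast<std::int64_t>(record) * record_size;
}
```

On the commenter's platform the operands were `short` and the result `long`, but the reasoning is the same: the type of the destination does not affect the type in which the arithmetic is performed.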
@n0ame1u1 3 months ago
I still don't understand how the example with f and g is undefined behavior. As written, f is never called if i is INT_MAX, and f is valid for all other i, so there is no case in which UB happens. What am I missing?
@n0ame1u1 3 months ago
I also couldn't get the optimization to happen on godbolt
@austinsiu2351 3 months ago
9:53 I remember having to put `asm("nop");` inside the while loop to state that it is intentional. I had an ncurses program where I simply wanted to make sure it inits the screen properly. I put in the empty infinite loop and clang decided to remove it.
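One portable alternative to the `asm("nop")` trick, sketched under the assumption that the loop is really waiting for something: a loop that performs an atomic operation is not one the compiler may assume to terminate. The names `keep_waiting` and `wait_until_released` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag that some other part of the program clears.
std::atomic<bool> keep_waiting{true};

// The atomic load in the condition counts as observable progress, so the
// compiler cannot treat this loop as empty and delete it.
void wait_until_released() {
    while (keep_waiting.load(std::memory_order_acquire)) {
        // intentionally empty: spin until the flag is cleared
    }
}
```

A plain `while (true) {}` has no such operation in it, which is what lets an optimizer remove it in the situation the comment describes.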
@polmarcetsarda 3 months ago
Great presentation! I just wanted to point out that the code snippet about integer overflow is not true; otherwise pointer guards would be useless. I'm sure that was a small mistake while changing the code to fit in the slides, and this does not change at all the point of the presentation
@Bolpat 3 months ago
IIRC, your name is from the conspiracy talk. That one was hilarious and I hope this one gets some good laughs out of me as well. The topic most definitely allows for it.
@TerjeMathisen 2 months ago
UB is a good (maybe even sufficient?) reason to switch to Rust. I have written C since around 1983, C++ a bit later, and in the beginning C was in fact defined to be a "machine-independent, portable assembler replacement", and early compilers did just that, i.e. they would output the expected asm for pretty much all constructs. In that world incrementing an int until it wrapped around was perfectly fine. The same goes for the classic pointer check to make sure it was non-NULL, it would always be there in the compiled program unless the code was inlined and the compiler could see that in this particular instance, it could not be NULL. What happened a lot later was that C was coopted to be this compiler research exercise where someone/some group thought it was a great idea to use UB to silently remove a lot of code, even though the actual speed improvements for real production code have been shown to be trivial. As Fedor stated, some sanity is returning, in the form of (too slowly) moving stuff from UB to Implementation Defined which does at least obey the least surprise principle.
@GeorgeTsiros 3 months ago
omg Fedor 🥰
@X_Baron 3 months ago
Does Example 01 on slide 9 (4:17) imply that, to be completely correct, the numeric limits check must always be inside the function that uses the int (or in another function called by that function)? This seems like a pretty severe limitation.
@kuhluhOG 24 days ago
18:05 So, what's with hardware where (on kernel level) you have to access things at address 0 (aka null) for certain operations because the hardware dictates it? Does this mean that you theoretically just can't use C++ on such hardware?
@LaserFur 3 months ago
4:19 I don't think that is a good example. The compiler should not be making an assumption on a code path that never happens. f() never gets called with INT_MAX due to the check and return. So it can't assume that i can't be INT_MAX when g() is called. I agree that the optimization of f() is correct as it can only return true, but in this case the compiler would be guessing at UB that can't happen.
@Peregringlk 2 months ago
In the first example I think Fedor meant `i > INT_MAX`, or maybe he was thinking about INT_MAX as the "next after the last", as if it were the upper bound of a range.
@ConceptInternals 3 months ago
Can someone explain how g returned true? I get that f returned true, but that should result in g() to be `return i != INT_MAX;` by compiler instead of `return true;`, correct?
@sverkeren 3 months ago
g() cannot simply return true. He is WRONG, you are right.
@rssszz7208 3 months ago
Please add timestamps to every video; it would be helpful.
@clementdato6328 3 months ago
Why is const cast-able to non-const? Does that mean if I see a function taking const ref as input, it is not correct to assume it does not alter the input?
@Digrient 3 months ago
Trying to modify a const value via const_cast is undefined behavior. The only legitimate use of const_cast that I remember is when you need to use a legacy function/API (like a C function) where the parameter is not defined as const but you know from the documentation that the argument value will not be modified.
@woodandgears2865 2 months ago
Yes, a poor programmer might do a const cast and mess with the const & value. I think you'll have general acceptance from the C++ community to block that code at review time. The interesting bit here is whether the object such bad code modifies was itself declared const. If it was, UB. If not, just sad code.
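A sketch of the legacy-API use of `const_cast` mentioned in this thread. `legacy_length` is a hypothetical stand-in for an old C-style function whose parameter is not declared const even though, by its documentation, it only reads; `length_of` is likewise a hypothetical wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical legacy function: takes char* but only reads from it.
std::size_t legacy_length(char* text) {
    std::size_t n = 0;
    while (text[n] != '\0') ++n;
    return n;
}

// Wrapper with a const-correct signature. Casting const away is fine here
// because nothing is ever written through the resulting pointer.
std::size_t length_of(const std::string& s) {
    return legacy_length(const_cast<char*>(s.c_str()));
}
```

If `legacy_length` did write through the pointer, the call would be undefined behavior whenever the string passed in was itself a const object, which is the distinction the replies above draw.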
@kwitee 3 months ago
It's a shame that the first example (at 5:00, with f and g functions) is analysed wrongly (as others have pointed out). There are valuable optimizations that can result from UB. A Fortran example:
    program fortran
      implicit none
      integer :: az,xw
      xw = 42
      call test(az,xw)
      stop xw
    contains
      subroutine test(a,x)
        integer, intent(out) :: a ! starts undefined
        integer, intent(inout) :: x
        integer:: h
        read(*,*) h
        if (even(h+1)) a = 666 ! legs akimbo
        if (even(h)) x = a
      end subroutine test
      logical pure function even(h)
        integer, intent(in) :: h
        even = mod(h,2) == 0
      end function even
    end
can be optimized down to
    program fortran
      read (*,*)
      stop 42
    end
and I am not even sure whether the read can be omitted (probably not).
@ABaumstumpf 3 months ago
Why did you first give example functions "f" and "g", and then go and introduce A DIFFERENT "g" ? Cause the original "g" does not introduce UB as it explicitly prevents that by checking against INT_MAX. This is just unnecessarily error-prone. Undefined behaviour would be a lot less of a problem if it wasn't silently introducing problems and also being so corrosive - many naive checks and attempts to avoid it will be optimised away. This would actually be a good use of attributes or some other alternatives (if attributes weren't also fundamentally broken and useless as of C++23). Give us something that allows programmers to influence (aka defining it) undefined behaviour: For the signed integer overflow that would be "overflow_saturating", "overflow_wrapping", "overflow_exception" or even "overflow_unspecified". Now the compiler, for that specific section of code, must check the target platform against the given specifier and act accordingly. With "unspecified" we don't care what the actual behaviour of that operation is, the compiler is just not allowed to introduce UB into the rest of the code. With "wrapping" on a lot of hardware it wouldn't need to do anything. This would be a simple mechanism to allow all the optimisations of UB to still exist while also giving programmers better control (and specially prevents UB from causing bugs).
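The specifiers proposed in this comment do not exist in the language, but two of the behaviors can be written by hand today. A minimal sketch, with the hypothetical helper names `add_wrapping` and `add_saturating`:

```cpp
#include <cassert>
#include <climits>

// Wrapping: do the addition in unsigned arithmetic, where overflow is defined
// as wrap-around. Converting the result back to int is modular since C++20;
// before that it is implementation-defined (an assumption to check for your
// compiler, not undefined behavior).
int add_wrapping(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

// Saturating: test against the limits before adding, so the signed addition
// is only executed when it cannot overflow.
int add_saturating(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}
```

Both helpers keep the overflow handling local to one operation, which is the property the comment asks for; what they cannot do is change the meaning of a plain `a + b` elsewhere in the program.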
@zachansen8293 3 months ago
It sure seems like there are better talks on this topic in many different years of cppcon.
@woodandgears2865 2 months ago
Links?
@davidsicilia5316 3 months ago
that first example of UB makes no sense to me
@anon_y_mousse 3 months ago
I still don't get why there's so much hoopla because of overflow. Every major platform defines it in basically the same way and it's a natural function of 2's complement negation. It's easy to account for it at the compiler level because x86 and ARM processors both set a flag that can be conditionally jumped due to it being set, and it's easy to avoid it in your own code by simply checking any important calculations either before or after a series of operations. This covers at least 95% of platforms in regular use, maybe more, and yet people keep complaining about it. If a calculation needs to be error free, and you don't know the possible outcome based on the inputs, then check it, but ultimately, I think this boils down in large part to not sanitizing user input and that's far more problematic than the possibility of integer overflow.
@err6910 2 months ago
My opinion on integer overflow is that it does not matter if it's UB or not, if your operation overflows, then it's most probably a bug anyway (99% of the time).
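A sketch of the "check it before the operation" approach discussed in this thread, assuming C++17 for `std::optional`. The helper name `checked_add` is hypothetical. The check is done on the operands, before the addition, because a test on the result would itself depend on the overflow having already happened.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Returns the sum, or an empty optional if a + b would not fit in an int.
std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return std::nullopt;  // would exceed INT_MAX
    if (b < 0 && a < INT_MIN - b) return std::nullopt;  // would go below INT_MIN
    return a + b;  // reached only when the addition cannot overflow
}
```

A caller can then treat the empty result as the bug report the second comment has in mind, instead of continuing with a wrapped value.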
@robertolin4568 3 months ago
This shows how awkward C++ has become. For a high-level language, the only obvious way to prove or disprove that undefined behavior exists before runtime is to "manually inspect the assembly output." And people do that in every CppCon talk, completely giving up the fact that it should be a high-level language. What an irony!
@kristiannyfjell8097 3 months ago
It has always been this way; even the first C standard had UB, etc. This is why people stress "follow the (ISO) standard when programming." Also, C and C++ are not meant to be high-level languages. They were meant to be 'system languages': fast, efficient, and only assembly underneath. C++ has its Core Guidelines. If you follow them, you will not get any form of UB.
@robertolin4568 2 months ago
@@kristiannyfjell8097 C is not meant to be high level, but C++ is. Otherwise you wouldn't need everything after C++14 and all those alleged "memory safety" features. C is much better in that it is rather consistent in its concepts and standards. There are only so many ways things can go wrong (although quite recurring ones). And most new features have been in, say, GNU C and the Linux kernel for a long time before being added to the standard. C++ is not like that. Every year you see the "How C++XX changes the way we write code" talks. Most of them reject at least some of the "best practices" mentioned at CppCon the previous year. C can at least have consistency. C++ is a total disaster.
@johannesschneider1784 2 months ago
But most modern compilers have sanitizers, right?
@voxel1554 3 months ago
I love hrt 🏳️‍⚧️