CppCon 2016: Chandler Carruth “Garbage In, Garbage Out: Arguing about Undefined Behavior...”

86,932 views

CppCon.org
-
Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/cppcon/cppcon2016
-
There has been an overwhelming amount of tension in the programming world over the last year due to something that has become an expletive, a cursed and despised term, both obscene and profane: **undefined behavior**. All too often, this issue and the discussions surrounding it descend into unproductive territory without actually resolving anything.
In this talk, I'm going to try something very bold. I will try to utterly and completely do away with the use of the term "undefined behavior" in these discussions. And I will unquestionably fail. But in the process of failing, I will outline a framework for understanding the actual root issues that the software industry faces here, and try to give constructive and clear paths forward, both for programmers and the programming language.
And, with luck, I will avoid being joined on stage by any unruly nasal demons.
-
Chandler Carruth
Google
C++ Lead
San Francisco Bay Area
Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google’s distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google’s codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master’s thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.
-
Videos Filmed & Edited by Bash Films: www.BashFilms.com
*-----*
Register Now For CppCon 2022: cppcon.org/registration/
*-----*

Comments: 132

@jacob_90s 3 years ago

Not gonna lie; hearing Chandler criticize the standard put a big ass smile on my face. I've lost count of the number of times I've tried to discuss both possible, and legitimate issues with languages with other developers over the years, only to have them fanboy up and refuse to admit there could be anything wrong with their perfect little language. This isn't to say I was right in all cases, but the general stubbornness to admit there could possibly be an issue, or that something could be done better, has just absolutely driven me nuts over the years. So frankly it's refreshing to just hear someone else criticize one for a while, much less someone with as much weight and authority as Chandler.

@kamilziemian995 1 year ago

I often feel the same.

@teekwick 7 years ago

Having Chandler in the committee gives me hope for the future of the language. Good talk as always!

@MrAbrazildo 7 years ago

I guess the best thing for the "narrow vs wide contract" is compiler warnings: keeps C/C++ narrow(ness), but not without trying to save us.

@NoNameAtAll2 3 years ago

47th slide is so confusing... "good" and "bad" words don't correspond to UB...

@UGPepe 5 years ago

people are not upset because the effects of violations are latent (hence, write more sanitizers) but because the contracts themselves are stupid

@flatfingertuning727 4 years ago

Not only that, but compilers' interpretation of the C Standards is in direct defiance of the Committee's intentions as stated in the published Rationale. "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." The authors of the Standard note (in discussion of translation limits) that it makes no attempt to forbid someone from contriving an implementation that is "conforming" but "succeeds at being useless"; the Standard's failure to mandate that all implementations process something usefully hardly implies any judgment that quality implementations shouldn't do so anyway when practical.

@UGPepe 5 years ago

@flatfingertuning727 4 years ago

Not only that, but platforms where x

@jmille01 7 years ago

"Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/cppcon/cppcon2016" I don't see the presentation at the supplied location.

@Yupppi 5 months ago

The biggest issue with undefined behavior to me is that it's poorly named. Undefined behavior doesn't sound scary. It sounds like "I don't know if it's cloudy tomorrow"; nobody's scared of it being cloudy or not tomorrow, they'll still wake up and go to work despite not knowing beforehand which it is. They dress accordingly. And then you also see these "this is actually undefined behavior" moments, like in some Sean Parent talks I remember, and it ends up being something that doesn't do anything interesting, really nothing to worry about, but the standard has not defined accurately what it should do. And some code depends on undefined behavior. It just doesn't sound too scary because it's such an enigma: nobody has said what it should be doing, so technically it could do anything imaginable and unimaginable (but many times it also won't do anything bad, and may even do what was desired). So what I'm understanding is that the committee can't fix bad programming and illegal use of the language?

@MrMidjji 4 years ago

Why not make a shift by more than the number of bits in the type a compile error then? It's super common for these to be known at compile time. For the runtime case, make the operator autocast to a range type which has compiler-configurable behaviour and can be either free and undefined if wrong, or raise an exception, or perform modulo sizeof(type)*8?

@joestevenson5568 10 months ago

It is - that's why he had to slap volatiles on it to stop the static analysis telling him it was a bug
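
The policies suggested in this thread (reduce the count modulo the bit width, report the error at run time, or reject it at compile time) can be sketched roughly as follows. This is only an illustration for 32-bit unsigned operands; the names `shl_masked`, `shl_checked` and `shl_static` are hypothetical and do not come from the talk or from any library.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Number of value bits in a 32-bit unsigned operand.
constexpr unsigned kBits = std::numeric_limits<std::uint32_t>::digits;

// "Modulo" policy: reduce the count to [0, kBits) before shifting,
// so every count has a defined result.
constexpr std::uint32_t shl_masked(std::uint32_t value, unsigned count) {
    return value << (count % kBits);
}

// "Checked" policy: report an out-of-range count instead of shifting.
constexpr std::optional<std::uint32_t> shl_checked(std::uint32_t value,
                                                   unsigned count) {
    if (count >= kBits) return std::nullopt;
    return value << count;
}

// Counts known at compile time can be rejected outright.
template <unsigned Count>
constexpr std::uint32_t shl_static(std::uint32_t value) {
    static_assert(Count < kBits, "shift count must be below the bit width");
    return value << Count;
}
```

With these helpers, `shl_masked(1u, 33u)` yields 2, `shl_checked(1u, 32u)` yields an empty optional, and `shl_static<32>(1u)` fails to compile. The first two pay for an extra operation or branch on targets whose shift instruction does not already behave that way.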

@gregkrimer1000 3 years ago

Very insightful talk. Thank you!

@CppCon 3 years ago

You are so welcome!

@grisevg 7 years ago

What if you have "char*" that you increment. Does it do same nasty wrap around handling or is it as fast as signed int?

@Berdes1 4 years ago

For the people still watching this and reading the comments, this is as fast as signed int. If you increment a "char*", I'm assuming your char* is pointing to an element of an array of char. In that case, the result of the increment must be pointing to an element of that array or point to one-past-the-end. If the result is not one of these, it is undefined behavior. Given that your array is contiguous in memory, it cannot wrap around.

@smiley_1000 3 years ago

36:36 this error is actually not related to overflows at all. it behaves *exactly* as expected: we allocate 16 + (0-1)*8 = 16 + (-1)*8 = 16 - 8 = 8 bytes. so basically, we say that since we don't want any of the 8-byte rtunions, we'll just not allocate any - not even the one in the latter 8 bytes of the 16-byte rtvec_def. which means we'll try to allocate 8 bytes for a 16-byte struct, which is of course an error. but the error is a completely logical error and is not related to overflows at all.

@mononix5224 2 years ago

It is due to overflow, since `typeof(sizeof(x)) = size_t`, so `(−1) * 8 ≠ −8`. The `−1` is converted to a `size_t` which results in the value `2**bitsize(size_t) − 1` (** is used for exponentiation). On a 64-bit arch (assuming 2's complement) this will result in `((2**64 − 1) * 8) % 2**64 = (2**64 − 8) % 2**64 = 2**64 − 8`, then we still need to add 16: `(2**64 − 8 + 16) % 2**64 = (2**64 + 8) % 2**64 = 8`. So the result is still 8, but it was _due_ to overflow.
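
The arithmetic in this thread can be checked with a small sketch. It assumes only the numbers quoted in the comments (a 16-byte header that already holds one 8-byte element); `kHeaderSize`, `kElemSize` and `alloc_size` are hypothetical names, not the identifiers from the code shown in the talk.

```cpp
#include <cstddef>

// Sizes taken from the comments above: a 16-byte struct whose last
// 8 bytes are the first element.
constexpr std::size_t kHeaderSize = 16;
constexpr std::size_t kElemSize = 8;

// Same shape as the discussed expression: header + (n - 1) * elem.
// n - 1 is computed as int and then converted to size_t, so for n == 0
// the factor becomes the largest size_t value and the arithmetic wraps.
constexpr std::size_t alloc_size(int n) {
    return kHeaderSize + static_cast<std::size_t>(n - 1) * kElemSize;
}

static_assert(alloc_size(1) == 16, "one element fits inside the header");
static_assert(alloc_size(3) == 32, "two extra elements");
static_assert(alloc_size(0) == 8, "wraps and lands below the header size");
```

The result for `n == 0` is 8 for any width of `size_t`, because both the multiplication and the addition are reduced modulo a power of two. Since unsigned wrap-around is defined behavior, a tool cannot flag it as an error on those grounds alone.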

@kered13 4 years ago

37:00 Doesn't the math actually work out here to produce the expected result? Chandler says that unsigned multiplication is defined as modular arithmetic, so the calculation should go as follows:
Promote -1 to unsigned -> 2^32 - 1 (or 2^64 - 1, doesn't matter)
(2^32 - 1) * 8 mod 2^32 = 2^32 - 8
8 + (2^32 - 8) mod 2^32 = 0
And 0 is the exact size you would expect when the input is n = 0.

@smiley_1000 3 years ago

exactly. but it's still a memory error, because you'll be allocating 8 bytes for a 16-byte struct. so the error isn't related to the overflow at all.

@ephimp3189 4 years ago

the simplest way to detect cyclic graph is to keep a single counter for node traversal and check it against total graph size

@ChristianBrugger 1 year ago

What if you don't know the graph size? In the example I think only a pointer to a node was given.

@J-Random-Luser 1 year ago

@@ChristianBrugger I feel if you're passing an arbitrary node instead of an overall "graph" structure that doesn't keep track of the number of nodes, then you have bigger problems. As nodes are added to the graph, you can just increment it.

@dannystoll84 8 months ago

@@J-Random-Luser Having an arbitrary node in a graph is a common situation. Often, the graph is not even a latent structure in memory, but rather a mathematical object that arises due to iteratively applying a function. As a concrete example, consider Pollard’s rho algorithm for integer factorization.
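
When only a pointer to one node is available and every node has at most one successor (the situation that arises from iterating a function, as in Pollard's rho), a cycle can be detected without knowing the graph size. Below is a minimal sketch using Floyd's "tortoise and hare" technique. `Node` and `has_cycle` are hypothetical names, and this is not the graph type from the talk; graphs with several outgoing edges per node need a different approach, such as a visited set.

```cpp
// Minimal singly linked node. Only a pointer to one node is available,
// so the total number of nodes is unknown to the caller.
struct Node {
    const Node* next = nullptr;
};

// Floyd's cycle detection: the fast pointer advances two steps per
// iteration, the slow pointer one step. They can only meet again if the
// chain loops back on itself. Needs O(1) extra memory and no node count.
inline bool has_cycle(const Node* start) {
    const Node* slow = start;
    const Node* fast = start;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return true;
    }
    return false;
}
```

The check still costs time proportional to the number of reachable nodes. That cost is the reason a function may prefer a narrow contract ("the input is acyclic") over verifying it on every call.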

@valetprivet2232 7 years ago

Why when we multiply int and unsigned int at 35:39 we convert int to unsigned, not the other way around, which seems more logical?

@smiley_1000 3 years ago

you're right that it'd make sense to convert to signed int, but they both in fact produce the exact same output with two's complement

@kostikvl 7 years ago

Interesting talk. Example why one should prefer signed integers is really great. I think, the key problem with UB (and why it is so hateful) is that compiler allowed not only do something terrible, but also not to do something. For instance: for (int i = 0; i < 10; ++i) cout

@PieterKockx 7 years ago

Great example! GCC does warn about "iteration 3 invoking undefined behavior" under "aggressive-loop-optimizations" but I haven't managed to actually trigger the optimization.

@JuddMan03 7 years ago

Konstantin Vladimirov great example. I can think of how this might happen, but if a compiler likes to optimise code so aggressively, surely it can see that a loop of length 10 is more efficient than one of length infinity? It should clearly choose to unroll the loop and discard all iterations that invoke undefined behaviour, which is a true improvement down to 3 iterations. But to take examples from the Twitter posts in the slides, it could also be excused for discarding the loop itself, the function that called it, the program containing the function. It would stop at deleting the whole OS because that would be increasing the number of operations required

@jonesconrad1 5 years ago

could you explain a bit more please, ? I don't understand why the loop would be made infinite

@animowany111 5 years ago

+Conrad Jones A value of `i` larger than 3 is "impossible", because the multiplication would then overflow, and the compiler can assume UB never happens. Since with that assumption `i` is always smaller than 4, `i < 10` always evaluates to true, the compiler then "simplifies" the conditional, making the loop infinite.
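
The snippet in the original comment is cut off, so the exact loop body is unknown. The sketch below assumes a multiplication by 1000000000, which is consistent with the replies (overflow of a 32-bit `int` starting at iteration 3) but remains a guess. It shows the presumed problematic form only as a comment, together with a well-defined variant that widens before multiplying; `scaled_values` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Presumed shape of the loop under discussion (the multiplier is an
// assumption inferred from the replies, not taken from the comment):
//
//     for (int i = 0; i < 10; ++i) std::cout << i * 1000000000 << '\n';
//
// With a 32-bit int, i * 1000000000 overflows from i == 3 onwards, which
// is undefined behavior, so the compiler may assume i never reaches 3.

// Well-defined variant: widen to 64 bits before multiplying, so no
// iteration overflows and the loop bound cannot be reasoned away.
inline std::vector<std::int64_t> scaled_values() {
    std::vector<std::int64_t> out;
    for (int i = 0; i < 10; ++i) {
        out.push_back(static_cast<std::int64_t>(i) * 1000000000);
    }
    return out;
}
```

Here every product fits in 64 bits, so the loop runs exactly ten times on any conforming implementation.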

@MsJavaWolf 4 years ago

@@animowany111 The loop is actually fine, the value of i doesn't change in the print statement.

@MrMidjji 4 years ago

It would be a good thing if the compiler could statically find contract violations compiletime though, a function which takes AcyclicGraph not Graph e.g.

@SolomonUcko 3 years ago

30:54 Isn't left-shifting a negative number always UB? "Otherwise, if E1 has a signed type and non-negative value, and E1*2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined." (expr.shift, 5.8.0.2, www.open-std.org/JTC1/SC22/WG21/docs/papers/2010/n3092.pdf#page=131, N3092 page 116)

@iamvfx 7 years ago

47:53 This should be a compile error. Like, "Error: you can't use 32 bit values for 64 bit pointer indexing". The compiler has all the information for that. No silent promotions to 64 bit are needed. One can use (u)int_fast32_t types for platform specific size.

@MrAbrazildo 7 years ago

And what if you want to pack 2 32-bits values in a 64-bits var?

@iamvfx 7 years ago

Value packing is not related to pointer indexing, or at least I can't think of any example of that right now.

@BlairdBlaird 2 years ago

Well yes and modern languages do that (the downside being that it often leads to very noisy code). However C++ inherits integer promotion rules from C, and because integer promotion is a thing its use is *ubiquitous*. Furthermore, because it doesn't technically trigger a bug (with respect to abstract interpretation) I don't think any compiler will warn on this. On implicit narrowing conversions maybe, optionally (because there's a risk of information loss), but not widening. In fact there are regularly proposals of adding "convenience" implicit integer widening on languages which are currently stricter than that (like Go, Swift, Rust, ...).

@EvanED 7 years ago

I don't know if you're reading, Chandler, but re. the signed/unsigned optimization discussion around 48:00, if you had your druthers would you suggest using signed types even for sizes and such, over size_t? E.g. it looks like the LLVM SmallVector templates' size/max_size/capacity/etc. use size_t. Is that more legacy stuff, or is it doing the right thing?

@dizekat 1 year ago

Regarding integer overflow, the sensible behavior would be for the compiler to apply optimizations when it can prove that the overflow won't occur. When it can't prove that overflow won't occur, and it is doing that optimization, that's a security exploit waiting to happen.

@SolomonUcko 3 years ago

10:50 Isn't an infinite loop without side-effects UB in C++? 6.5.0.1 (in stmt.iter) states: "A loop that, outside of the for-init-statement in the case of a for statement,
- makes no calls to library I/O functions, and
- does not access or modify volatile objects, and
- performs no synchronization operations (1.10) or atomic operations (Clause 29)
may be assumed by the implementation to terminate. [Note: This is intended to allow compiler transformations, such as removal of empty loops, even when termination cannot be proven. - end note]" (www.open-std.org/JTC1/SC22/WG21/docs/papers/2010/n3092.pdf#page=143, N3092 page 128)

@ronensuperexplainer 2 years ago

That happened to me once

@JackAdrianZappa 5 years ago

47:20 "yes, if you use size_t that will also avoid the problem." How does it avoid the problem? Is there something special about size_t that says that it won't wrap, stopping the compiler from adding extra code?

@DaNikeTrations 4 years ago

If you use size_t there, on that specific platform, it will be 64 bits, and will wrap at the 64 bit boundary, just like the 64 bit addressing mode. It still wraps, it just wraps at the right size.

@dannystoll84 8 months ago

The example on slide 48 actually produces the mathematically correct value when n=0. You would get the same value if the numbers were signed. The issue isn’t the overflow, it’s that 0 should never have been input in the first place (as the resulting 8 bytes are insufficient space for the rtvec_def object).

@ManOfThrills 7 years ago

I thought the title referred to SG12 meetings...

@szirsp 2 years ago

I understand that not all UB can be defined, but a lot of it could at least be implementation defined (or require diagnostic output, and not be silent UB). For example, just saying that casting a byte/character pointer type to a pointer of float type is undefined behavior could mean that the compiler can just stop generating instructions for the code after it encountered UB (since it's UB anyway, it doesn't matter what the rest of the code would do). But we know that this should be fine on all platforms if the pointer is properly aligned, and it's fine on some platforms even if it is not aligned. This could/should be implementation defined even if it just says the behavior is architecture dependent, but the compiler (or toolchain) can guarantee that it will at least output machine code/instructions and not just give up on you. I'm fine if the compiler says to me "Hey, we don't know what this will do, but at least we tried. Maybe you should look at this once more if this is really what you want." instead of the compiler doing 'Look at this stupid human! I recognize that this is UB, so I'm just gonna stop trying to even generate code, and I'm not gonna tell anyone, not gonna notify the user/programmer'.

@199t8 1 year ago

firstly, it is impossible to read a float from a char * unless you explicitly say that this is what you want to do via reinterpret_cast. also, unless you know exactly how your compiler and platform works (unlikely for any modern tech stack), you do not know whether this is fine. you can somewhat achieve the behavior you want in your last paragraph by turning off optimizations. it is pointless to make stuff like this implementation-defined because implementers would have two options: maintain docs on the inner workings of their entire compiler, os, and hardware (impractical for both author and reader when considering code optimization, address randomization, speculative execution, etc), or just define it as ub in the implementation spec, making things worse, since now you have to read the docs for the compiler, the os and the cpu in addition to the standard to determine if something is ub.
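
For the specific case discussed in this thread, reading a `float` out of a byte buffer, there is a well-defined way to express it that relies neither on `reinterpret_cast` nor on alignment. A minimal sketch follows; `load_float` is a hypothetical helper name.

```cpp
#include <cstring>

// Reads a float from raw bytes without ever forming a float* into the
// buffer. std::memcpy places no alignment requirement on its source.
inline float load_float(const unsigned char* bytes) {
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

The value obtained still depends on the platform's `float` representation and byte order, so the result is implementation-specific. The program's behavior, however, is defined.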

@sinom 1 year ago

What happened to expects/ensures/assert? It's still not in the standard afaik. Was there a reason for that?

@TheLeontheking 4 years ago

Good documentation of the library and language-features, careful and conscious programming, as well as good compilers should be the ways to not get into undefined territory.. A library which constantly does runtime-checking, or tries to circumvent UB even in obvious cases of contract-misuse does not sound like a good option to me.

@d.m.3316 7 years ago

Isn't incrementing a uint32 the same as incrementing a uint64 but ignoring the higher 32 bits? If so, why can't the compiler generate the optimised code, and make sure it doesn't go on to rely on the higher 32bits? [I'm talking about slides 49-50]

@ManOfThrills 7 years ago

How can the compiler not rely on the higher 32 bits when it needs a full 64 bit address? It's adding a 32 bit unsigned int to a 64 bit address on slide 49, and that base address can have some lower bits set. The compiler cannot easily emulate overflow wrapping of the original 32 bit index after it has added the index to the 64 bit base address.

@OMGclueless 7 years ago

It could. The problem is that "make sure it doesn't go on to rely on the higher 32bits" means applying a bitmask, probably with the AND bitwise operator. It turns out this is even slower than the "leal" instruction which is why the compiler does that instead.

@douggale5962 7 years ago

x86-64 zeros the upper 32 bits whenever it stores a 32 bit value to a register. Consider: movabs $0x123456789ABCDEF0,%rax ; mov %eax,%eax. this mov will zero the upper 32 bits of %rax. If the second instruction were add $0,%eax it would also zero the upper 32 bits of %rax.

@Asdayasman 4 years ago

Yo I'm only 21:00 in but what about dual implementation of things that can exhibit UB? The wide contract shitty slow one runs when a compiler flag is given to say "yo be slow and tell me if I'm stupid", and the narrow contract good and sexy one runs without that flag. Compile with that flag, run the test suite (you wrote full coverage, right?), and away you go.

@MrAbrazildo 7 years ago

- I got a crash on gcc/Linux, 64 bits, using: for ( ; bits; bits >>= 1). It was fixed after for (int j=0; j < MAX_MEANINGFUL_BITS; j++, bits >>= 1)
- Inside a gcc header file, it warns us that the reverse iterator, (pointing) to the (reverse) end of a container, may be an INVALID pointer!
- Also on gcc/Linux, I got a Segmentation Fault (not a compile error!) writing: int some_f () { blablablablablabla; extensive calculation ready to be sent; //WITHOUT the return keyword. }

@cptroot 5 years ago

It's worth noting that swapping the uint32_t on slide 49 for size_t also does the job. Which one you prefer probably depends on how register limited the other parts of the code are.

@JackAdrianZappa 5 years ago

Why is that? What is special about size_t that would prevent modulo arithmetic? Is it because it must be the max width of the register size, thus it will just automatically wrap?

@cptroot 5 years ago

@@JackAdrianZappa Chandler mentions the reason for the behavior change is that size_t wraps at the same point that the register does. This means that they can use the instruction that wraps from 2^64 to 0.

@richardbarrell4043 7 years ago

I'm having trouble understanding the bug on slide 48. (size_t)-1 is SIZE_MAX. (SIZE_MAX * 8) rolls over to (SIZE_MAX - 7). 16 + (SIZE_MAX - 7) should roll over to 8? under the assumption that all arithmetic on size_t values takes place modulo some power of two. Are there multiple conversions that I'm missing or something, please?

@Som1Lse 7 years ago

The bug is that the value passed to `obstack_alloc` is huge when in fact no more memory is needed.

@OMGclueless 7 years ago

I think the problem is just that the code allocated 8 bytes of memory, when sizeof(struct rtvec_def) is 16. He doesn't show this happening, but it's easy to assume someone later dereferences that memory as a (struct rtvec_def *) and runs into a problem.

@ManOfThrills 7 years ago

What's the point of the example on slide 48? Is it that unsigned calculations here are correct by "mere coincidence"? Had they been all signed, it would have neither fixed the bug, nor hit signed's undefined behavior, nor let Chandler's hypothetical tool give a warning about signed/unsigned multiplication. So what is the beneficial narrow contract he's talking about? The only thing that would help here is his suggested distinct unsigned integer types with undefined overflow behavior, but they are not mentioned in this example.

@vaughncato 7 years ago

Slide 48 is a bit unclear to me as well. He seems to be arguing along the lines that if unsigned overflow weren't legal, then you could have a tool that would show that the code would be incorrect in the case of n==0. I think this is supposed to support the general idea that narrow contracts can be useful, as opposed to a specific recommendation about how to improve the code.

@OneWheelGuy1 7 years ago

That was my analysis also. And, since rtvec_def contains a rtunion, and it isn't needed (because size is zero) then everything works out perfectly. Now, on a 64-bit system it would not work as well, I don't think, because -1 would convert to UINT_MAX-1 not SIZE_MAX-1, and then we end up trying to allocate ~32 GB of RAM.

@OneWheelGuy1 7 years ago

47:08 - Today it's fairly easy to get enough bits, without having to use the sign bit. 47:26 - The downside of using size_t is if ... is in a data structure it uses more space. So, it is easy to get enough bits, except that it's not. I think that using size_t is the ideal solution. It gives you efficient code *and* it ensures that the code will work if you ever need to sort 10 GB of data. If your structures don't need to support more than 4 GB of data then you can store uint32_t in your structures and load into a size_t local variable.

@naxaes7889 3 years ago

But he's talking about the sign bit. `ssize_t` is signed and gives you 2^63 bytes of addressable data, which is much larger than 10 GB.

@derekli3604 7 years ago

can't find slides for this talk

@aaaab384 7 years ago

They're basically empty slides... Why would anyone want them?

@GeorgeTsiros 1 year ago

I am curious, if any of you knew who "rygorous" is...

@annazolkieve9235 2 years ago

Why UB instead of IB?

@MrMidjji 4 years ago

Still think signed int should wrap around. In practice today, portable code needs to use a wrapper around signed integer types which guarantees wraps when needed. That has a penalty, but which processor architecture does not wrap signed ints on addition?

@andik70 3 years ago

Actually asserts should stay in production releases (or production asserts should exist) and not become a precondition which, if violated, becomes UB.

@lunakid12 2 years ago

"Production asserts" do exist: they are just our normal runtime error checks.

@WilhelmDrake 1 year ago

@~30min I think the lesson here is don't do bit manipulation with anything but unsigned.

@kwinzman 7 years ago

What's the easiest way to get an std library that uses ssize_t instead of size_t ? Edit: ssize_t may or may not be the type I am looking for (it seems to be intended for negative error codes).

@grisevg 7 years ago

I can't reproduce it - signed and unsigned assembly is nearly identical on both latest gcc and clang: godbolt.org/g/KysL1T

@ManOfThrills 7 years ago

Clang's assembly for signed and unsigned are very much different and correspond to what Chandler shows, especially if you add a couple more iterations. GCC's assembly is inefficient in signed case, I think because it doesn't treat signed overflow as undefined and faithfully implements two's complement signed wrap-around.

@connorhorman 5 years ago

What I would really like is for signed integer overflow/underflow to be well defined. It's well defined in Java, and I have hashcode functions that can easily overflow a signed integer, which have to be signed because of interop.

@connorhorman 5 years ago

And then you show me the optimization of that code, and it makes me question that statement. (47:00 - 50:00) Oof. Still kind of annoying that I either need to write UB, or have additional code to cast out of signed. Interop with C++ is hard.

@flatfingertuning727 4 years ago

@@connorhorman What's needed for real optimizations is a class of situations with *loosely*-defined behavior, which *programmers could safely allow to occur* in cases where all of the allowable behaviors would meet requirements. For example, recognizing a class of implementations where "x+y > z" would be guaranteed never to do anything other than yield 0 with no side-effects or yield 1 with no side-effects would mean that a compiler could transform the above into "y > 0" in cases where it could prove that x and z are equal--something which it wouldn't be able to do if the programmer had written the expression as "(int)((unsigned)x+y) > z" for purposes of avoiding UB.
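
The "cast out of signed" workaround mentioned in this thread can be sketched as follows. The helper names (`to_signed`, `wrapping_add`, `java_style_hash`) are hypothetical. The hash follows the well-known `h = 31 * h + c` recurrence used by Java's `String.hashCode`, here applied to bytes rather than UTF-16 code units, so it matches Java only for ASCII input.

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

// Reinterprets the 32 bits of an unsigned value as a signed one.
// std::int32_t is required to use two's complement, so copying the
// bytes gives the wrapped value without relying on conversion rules.
inline std::int32_t to_signed(std::uint32_t bits) {
    std::int32_t out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Wrapping signed addition, carried out in unsigned arithmetic, where
// overflow is defined as reduction modulo 2^32.
inline std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return to_signed(static_cast<std::uint32_t>(a) +
                     static_cast<std::uint32_t>(b));
}

// Java-style string hash with the same wrap-around semantics.
inline std::int32_t java_style_hash(std::string_view text) {
    std::uint32_t h = 0;
    for (unsigned char c : text) {
        h = 31u * h + c;
    }
    return to_signed(h);
}
```

For example, `wrapping_add(INT32_MAX, 1)` gives `INT32_MIN`, and `java_style_hash("hello")` gives 99162322, the same value Java reports for `"hello".hashCode()`. As the reply above points out, the price is that the compiler can no longer assume the arithmetic does not wrap.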

@UGPepe 5 years ago

defining behavior for all platforms doesn't mean that the behavior has to be the same for all platforms, how on earth did that implication come about?

@BatmanAoD 4 years ago

Yeah, this completely baffles me, especially given that someone actually asks "why not make it implementation defined".

@GRHmedia 7 months ago

I've never really had an issue with it. It is my job as the programmer to make sure the program does what I want it to do. That means using the language properly. I've noticed, though, that it is mostly the current generation of programmers who have issues with it. I think in part this is because of how we were raised. They are looking for other people or a system to make it easier on them, whereas we were taught to make do with the tools we had and, if we didn't like them, to create our own. Don't expect others to solve your problems so much. I've been programming since 1983. I've never had an issue with pointers of that nature. It is now 2023; that is 40 years without an issue. I don't think I am someone spectacularly exceptional. So how is this a problem? To me it is just making an issue out of something that has never been an issue to me.

@sirhenrystalwart8303 5 months ago

Tens of thousands of CVEs written by your generation would like a word.

@Fetrovsky 7 years ago

48:50 Aren't 32-bit registers available in AMD64 Long Mode? Also, the comments about wrapping don't make sense because int32_t's will also wrap, except at half the distance (from [(2^31)-1] to [-(2^31)]). So there's really no advantage to using int over uint.

@Myriachan 7 years ago

The difference is that the compiler is allowed to assume that signed integers can't overflow, because to do so would be undefined behavior.

@Fetrovsky 7 years ago

Does C++ not assume 2's complement representations in CPUs?

@Myriachan 7 years ago

Quoting the Standard: "this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types." (Note that it theoretically could support others.)

@Fetrovsky 7 years ago

Ah, ok. It now makes sense.

@Fetrovsky 7 years ago

Thanks!

@metallitech 1 year ago

Damn, this makes c++ look like a dog's breakfast. I have to decide whether to get into Rust now.

@FalcoGer 9 months ago

I don't see what the problem is. Define what it means for a null pointer to be dereferenced. On a C6502 it means dereferencing address 0. on linux running on x86 it means a segfault, in kernel code it means a panic. In fact, why bother defining it. Let the hardware manufacturer or the operating system developers decide what it means. Because at the end of the day, this will generate some assembly instructions. xor eax, eax; mov edi, [eax]; or similar will be generated. The cpu will attempt to run that code. What happens next is not your concern anymore. Just generate the assembly like a good little compiler and be done with it.
10:00 well you just defined all the cases. congratulations, your UB is now defined. and what defined it? the implementation and the platform it runs on. that's a programmer's job. writing a program to run on a platform. The program defines the behavior for the platform it was written for. Again, no problem.
20:00 fine, let's talk language features then. Say a standard library function with a narrow API that accepts only sorted containers as an input, take std::equal_range for example. It's undefined to give it a container which is not sorted. Or is it? I say it is very well defined because the code is right there in the library! It probably gives you the subrange from the first occurrence to the last occurrence that is consecutive to that first occurrence. Whatever it does, just describe it and you're done. Boom, defined behavior. Yes, you might be using it wrong, or you might want exactly what it does anyway. And what it does is described in the code.
25:40 fine. then define it individually for each platform. generate the assembly and let the CPU deal with it. the programmer knows the platform he's dealing with.
27:00 but it already IS implementation defined. on a 32 bit machine this is apparently a bug, even though it does what I would expect on an x86 architecture if I run it on such. and on 64 bit it just runs fine anywhere.
What about integer overflow? not a single cpu since at least 1980 has had anything other than 2's complement. why is integer overflow STILL not defined? It's silly is what it is.
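
To make the `std::equal_range` point concrete: a wide-contract wrapper would have to verify sortedness before searching. A minimal sketch follows; `checked_count` is a hypothetical helper, not a standard library function.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Counts occurrences of `value` via std::equal_range, but first verifies
// the sortedness precondition. The check is O(n), while the search itself
// is O(log n). That extra cost is what the narrow contract of
// std::equal_range lets callers avoid.
inline std::optional<std::size_t> checked_count(const std::vector<int>& data,
                                                int value) {
    if (!std::is_sorted(data.begin(), data.end())) return std::nullopt;
    const auto range = std::equal_range(data.begin(), data.end(), value);
    return static_cast<std::size_t>(range.second - range.first);
}
```

For `{1, 2, 2, 3}` and the value 2 this returns 2; for the unsorted `{3, 1, 2}` it returns an empty optional instead of whatever a particular binary-search implementation happens to produce.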

@UGPepe 5 years ago

"narrow contract" easily includes the set of all absurdly narrow contracts

@andreashohmann5994 7 years ago

Is the video broken? I always see a grey half of the screen.

@pointlessone3702 4 years ago

bzip example is dishonest. bzip was conceived when there was no 64bit arch around. When compiled for 32bit x86 it's as efficient as the modified code on 64 bit arch. If anything this example demonstrates that C portability is not quite as perfect as one could wish.

@UGPepe 5 years ago

but I am paying for something I'm not using by having to mask the shift because you're throwing portability down my throat

@smiley_1000 3 years ago

well yes, you'll have to do the thing the compiler would otherwise do for you. but if you hide it behind a #ifdef and make it depend on the architecture (basically what the compiler would do otherwise), it incurs no overhead

@origamibulldoser1618 7 years ago

Maybe I'm missing the point, but this sounds like the same kind of defensive talk on C++'s behalf that Bjarne gave this year, where it all boils down to: "we can't fix it because we can't do both a and b if a and b aren't orthogonal features." I understand some of the reasons given, but do not see a possible solution to any of this. Ref the shift examples: If this language is supposed to be platform agnostic, why does the shift operator even exist? Sounds like C is violating it's own principle of abstracting the assembly code. Shift is an intrinsic and has no place in a platform agnostic language. Well. I'm only a regular developer, so maybe it isn't for me to understand what the point of all this is supposed to be.

@kostikvl 7 жыл бұрын

There is nothing wrong with the shift operation. Bitwise operations should be present in every language because they are present in every architecture and they are useful. The problem is how to define those operations in the most portable way. And sometimes undefined behavior for corner cases is the best answer.

@origamibulldoser1618 7 жыл бұрын

But they're apparently not implemented the same way, and that abstraction leaks through, if I understand Chandler correctly. Anyway, I thought about what I said, and I don't really have the experience to make these kinds of sweeping, general statements, so someone else will have to continue this discussion, if there's a point to be made.

@origamibulldoser1618 7 жыл бұрын

aa haha. oh look, a troll.

@aaaab384 7 жыл бұрын

_"C is violating _*_IT'S_*_ own principle"_

@origamibulldoser1618 7 жыл бұрын

Hahahah, go fuck yourself.

@0xCAFEF00D 7 жыл бұрын

Well the conclusion I can draw here is that C++ is a very poor language for most people, because almost nobody writes programs that fit the narrow path you have to walk to avoid all the UB/programming issues. Not knowingly anyway. Because almost nobody knows enough to manage that or can even hold all the edge cases in their heads. So most of us are stuck with dealing with the latent bugs, always. Unless you make (for example) an expensive shift and a cheaper shift where I have to deal with the potential issues. Leaving you more potential for making consistent results across platforms or whatever situation you're in. Also Chandler seems really upset over people making jokes on Twitter. Quite likely nobody thinks your compiler will delete files when there's UB. Or make a male cat pregnant. Edit: Watching this again (4 months later), because it's a good talk, I realise that Chandler makes the exact suggestion about a slow vs fast shift except he does it for the compression example. I wouldn't mind having a lot of these options. But maybe there should be facilities in the language to express expectations rather than having a bunch of different types that imply behavior. If programmers could annotate when they expect or want modular integer behavior or not you'd have a more pleasing consistency between unsigned and signed types while still having all the benefits of narrow contracts.

@Myriachan 7 жыл бұрын

The problem with undefined behavior in C++ is not that there's a problem with undefined behavior itself. The problem is that C++ considers too many things to be undefined behavior that should not be. Signed integer overflow comes to mind.

@Spillerrec 7 жыл бұрын

Back when C was created, different platforms had different representations of signed integers so there was no way to define the behavior of signed overflow. Just think of the performance penalty if you had to check for overflow on every single arithmetic expression (which isn't unsigned) on certain platforms... Maybe not that relevant today, but I don't know the consequences of trying to change it now.

@andik70 5 жыл бұрын

@@Spillerrec then why not make it implementation defined. UB is a very different beast

@Spillerrec 5 жыл бұрын

@@andik70 That is certainly a valid point and I don't know why they decided on that. Do you have any specific use cases for signed overflow btw? However due to recent experiences with finding several unexpected overflows using UBSan I'm actually happy that it is UB and wish I had a similar tool for unsigned overflow. Since overflow is rarely what we expect with a random arithmetic expression, I think it would be neater if it was always an error and you had to annotate somehow that it should have a defined overflow behavior (and which for signed) for those special cases. (Like the fallthrough attribute for switch statements.) It would help the programmer spot the intended use of overflow, it would make it easier to catch incorrect arithmetic at run-time, and it would make otherwise implementation defined behavior cross-platform.
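
One way to approximate the "overflow is always an error unless annotated" idea in today's C++ is an explicit checked operation. This is only a sketch, assuming C++17 for `std::optional`; `checked_add` is a hypothetical name, not a standard function.

```cpp
#include <cassert>
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the mathematical
// result does not fit in int. The comparisons are arranged so that no
// overflowing arithmetic is ever evaluated.
inline std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return std::nullopt;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return std::nullopt;
    return a + b;
}
```

The call site then has to say what happens on overflow, which is roughly the annotation being asked for, at the cost of a branch per operation.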

@UGPepe 5 жыл бұрын

standards leave too much undefined while compilers act like an AI that exploits every loophole to reach its simplistic objective function which is to make code run faster.

@UGPepe 5 жыл бұрын

then compiler writers have to justify their eagerness to optimize with straw men such as "since you can't define every single behavior anyway... might as well run wild"

@akashpatel2898 5 жыл бұрын

The actual Car Here is something tricky involving undefined behaviour and no it is not “To Nasal Demons”. By the way, I do NOT work here. Too Low Performance and they pick high power electronics: We created a blackout.. yes energy used up. Maybe put up the no outlet on the road with dead ends or live with blackouts. And yes the second part is nonsense… it really is plugs and power. Why It's Not Prisoner’s Dilemma: If it were Prisoner’s Dilemma and I am player 2 then I would have the choices there but it's not so it can’t be prisoner’s dilemma.. before someone argues it is. By the way, game theory is actually something I am fairly good at.. I know in about a second what the best move is. The Pointers in Functions Approach: Well this program itself has quite a few program optimizations left in it but it is already using very low level C style code.. the kind most people believe to be faster but isn’t and also less maintainable. Pointers without Restrict Aliasing and not efficient algorithms.. I have the second most efficient algorithm. By the way, data structures and algorithms really are not quantifiable… text and hexadecimal and “colors” (or just a theoretical construct) are not quantitative on the level of measurement. Unless you’re a computer scientist and take a = 64 and b = other number and text being a list of numbers… then yes it's quantitative. The Simplex Algorithm Without Known Center of Mass known. Builds off the work of Prof. Elegnbogen… his research into linear programming. This works on doubles or floats or something that was written as scalars in linear algebra. Without Splitting Vector Quantities it’s a 1d.. split vectors to get 2d. Experimental Heuristic When one part moves whole rest of it is still. Also uses 2d Newtonian physics on that one part with rest still. Worked okay on a robot for a few years… later on would be worried it breaks / something goes horribly wrong.
Semi-Definite Quadratic Constraint Quadratic Objective Function These are allowed to be non-negative on each number.. This one corrects the simplex algorithm for forces that are either quadratic or linear forces involved. By the way, for moving car with moving part, this also has something to do with relative motion and possibly adding on another force. Anyways, adding another force is only another row or col in the linear algebra (after vectors split up). This still uses the simplifying assumption that the center of mass point is somewhere on the diagram rather than anywhere. Without it.. normative question: should the car continue self driving or be put in manual. Undefined, Implementation Defined, and In The Standard First, our code is required to be meeting the C++ standard fully for this, including preconditions and postconditions. If it is undefined behaviour then the optimizer is a little bit less compared to undefined behaviour. The optimizer… uses undefined behaviour (which I am okay with). Now here comes the tricky part - it shaves off bits from double or floating point so that it is lower power. Hence, the exception to R^3. By the way, it is pretty much never a scalar type like that.. polynomial for the real number that it is. This is where the disagreement is in… undefined behaviour here with the shaving off from bits to reduce the energy used. By the way, in C++ choices they have include: floating point and double type. Use tolerance analysis on distances of bridges built - now you have a choice to make. Which is affected by that other one… I will pick double and let bits be shaved off. Or if we are allowed this as alternative: -ffast-math without bits not there. Go ask them, I am perfectly fine with the first one.

@UGPepe 5 жыл бұрын

the fact that you have programming errors that you cannot detect or would be prohibited to detect is not an argument for anything. it doesn't give you license to make languages unusable by humans by riddling them with UB

@UGPepe 5 жыл бұрын

a+b is defined behavior on every platform for the native number types that they support. i'm ok with that, i don't want you to abstract over that. my contract is with the platform. you're in my way. just do the optimizations that work best for each platform and stop there. the C standard is misinforming you about what your role should be and it's giving me, the user, a crappy language
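
If what you want is the hardware's wrapping a+b for signed values, it can be spelled out in the language today. A minimal sketch, assuming the fixed-width type `std::int32_t` is available (it is two's complement by definition); `wrapping_add` is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: two's complement wrapping addition on int32_t.
// The sum is computed on unsigned values, where wraparound is well defined,
// and the bits are copied back with memcpy so the result does not depend on
// the pre-C++20 implementation-defined unsigned-to-signed conversion.
inline std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    std::uint32_t ua;
    std::uint32_t ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    std::uint32_t sum = ua + ub;  // converted back to uint32_t, so modulo 2^32
    std::int32_t result;
    std::memcpy(&result, &sum, sizeof result);
    return result;
}
```

On mainstream compilers this typically compiles down to a single add instruction, so the explicit spelling mainly costs readability at the call site.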