Post-CppCon update! The final approved wording for C++20 is in P1236R1 (as voted by the committee in November 2018 in San Diego). It has math-y wording (instead of my engineering wording), leaves a bit more implementation freedom for bool, and doesn't resolve the LWG3047 atomic compound-assignment issue (the Library Working Group will resolve it separately for C++20, including the same issue in atomic_ref and atomic).
@MatthijsvanDuin 6 years ago
Is the text online somewhere? The link I find is wg21.link/p1236r1 but it's not (publicly) accessible.
@Bourg 5 years ago
@@MatthijsvanDuin It's available now. Mailings come out a few weeks after each meeting.
@movax20h 4 years ago
Leaving bool a bit implementation-defined is good, and making the storage two's complement is a good move forward; it simplifies the language a lot. Thanks for your work on this.
@fdwr 6 years ago
55:30 As an American, there is no need to apologize. That is the sane way to write dates. Nice to see the practical reality (that integers are 2's complement) and the spec align.
@TranscendentBen 1 year ago
There's so much here... I learned sign-magnitude, ones' and two's complement in the late 1970s, and from the 8- and 16-bit microprocessors of the time it was clear things were leaning toward two's complement. I learned (or started learning) C in 1986, and indeed by then virtually everything used two's complement. I knew that things such as dereferencing a null pointer were "undefined behavior", but not that signed integer overflow was. I forget when I finally learned that, but it was years or decades later. It always bothered me, because I KNOW what the equivalent assembly/machine code does, and I saw the purpose of a compiler as generating code equivalent to the source; overflow was a natural consequence of exceeding the bounds of the integer size, and everyone knows what the two's complement result will be.

That's another thing: somewhere along the way, "int" size was "clearly defined" by what original C standards there were as the size of the register word, but at least 16 bits (thus compilers generated 16-bit code even for 8-bit processors), and it commonly became 32 bits as compilers targeted newer 32-bit processors. C99 introduced the int8_t, uint8_t, int16_t, uint16_t etc. types, and I thought to myself 20 years ago: why do people still use int? If you're not sure what processor you're targeting (my career has been embedded, so it could be 8-bit to 32-bit word length), or you're targeting several(!), you don't know how big an int is! So I started using the new types exclusively, so I always know variable size at a glance.

Interesting that you mention MATLAB doing saturation, but that's also the standard operation on most DSPs. I was reading in the late 1990s how "mainstream" processors were adding DSP instructions (such as MAC, multiply-accumulate: multiply two numbers and add the product to a register), and as I recall they may have been adding a saturation mode as well. Saturation is a much more appropriate way to handle overflow than wraparound in signal processing (a better approximation to what the signal "should be"). Of course saturation is NOT part of any C or C++ standard that I've heard of, yet C and C++ are used almost exclusively for DSP programming. Programmers just know and accept that that's how DSPs work.

But I can (now) see where different people expect certain things, and the standards committees have to somehow take these things into account. I've read similar things about Microsoft Windows: people wrote production code that called Win system functions with wrong values, but the code still did something useful, and rather than "fixing" things MS has to make sure newer versions of Windows still work with such improper calls so that older apps don't break.
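A minimal sketch of the saturating add being described, in portable C++ that avoids signed overflow entirely (real DSPs do this in a single instruction):

    #include <climits>

    // clamp to INT_MAX / INT_MIN instead of wrapping
    int sat_add(int a, int b) {
        if (b > 0 && a > INT_MAX - b) return INT_MAX;  // would overflow up
        if (b < 0 && a < INT_MIN - b) return INT_MIN;  // would overflow down
        return a + b;
    }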
@TranscendentBen 1 year ago
58:18 Bool always guaranteed to be 0 or 1 may seem insignificant at first, but it's a great feature. You can now write:
out_of_bounds_count += (value > max_value); // comparisons always return type bool
which compiles to straight-line code, rather than:
if (value > max_value) out_of_bounds_count++;
which (unless the compiler is Really Smart, often true but not always) compiles to branching code, which on modern processors may flush the pipeline on a mispredicted branch, slowing things down substantially. The first expression is also quite easy to read, once you see that the comparison returns 0 or 1.
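As a sketch (names illustrative), the guarantee makes a whole counting loop straight-line code, one compare plus one add per element:

    // bool converts to exactly 0 or 1, so no branch is needed
    unsigned count_out_of_bounds(const int* v, int n, int max_value) {
        unsigned count = 0;
        for (int i = 0; i < n; ++i)
            count += (v[i] > max_value);
        return count;
    }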
@OnWhenReady 6 years ago
Cool story :-) Awesome talk btw!!
@Bourg 6 years ago
Exactly the response I was looking for :)
@movax20h 4 years ago
I think it is good that signed integers are being defined as two's complement, but that by itself is not going to make signed integer overflow defined. Non-two's-complement representation was always implementation-defined, and is just cruft worth removing from the standard. No machines have used it for the last 30 years. Maybe there were some emulators for the PDP-11 using it, but that is all. If you want to run old stuff on those machines (and there are probably fewer than 10 people doing so), just stick to an old compiler version. Done.
@TheMrKeksLp 4 years ago
I really don't get their reasoning. If integers are stored as two's complement, why not also define them as wrapping? That's what every platform will do, so why not define it. Processors with non-wrapping arithmetic are even rarer than ones not supporting two's complement; in fact I doubt _any_ exist.
@hemerythrin 3 years ago
@@TheMrKeksLp I know this comment is 1 year late, but for anyone else reading this in the future: Overflow on signed types is not defined, because that provides optimization opportunities for the compiler. Basically, the compiler assumes that every math operation on signed integers cannot overflow, and so it can generate more efficient code. This is not just theoretical, compilers really do use this. So if you just defined overflow for signed types, that would make existing code slower, sometimes much, much slower. Now you might say, "Why did they decide to tie the wrapping behavior to the signedness? Instead of forcing `unsigned` to also always implicitly mean `wrapping`, and `signed` to also mean `overflow_impossible`, why didn't they just add distinct `wrapping signed int` and `wrapping unsigned int` types?" And you would be completely right, that's probably what they should have done. But now C exists, and C++ is compatible with it, and people don't want their existing code to get slower. But hey, maybe one day they could fix this design mistake that's still haunting us so many decades later. I'm not holding out much hope, though.
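A concrete instance of the optimization being described, a sketch that is easy to verify on gcc or clang at -O2:

    // because signed overflow is UB, the compiler may assume x + 1 > x
    // always holds for int, and fold this function to `return true`
    bool always_true(int x) {
        return x + 1 > x;
    }
    // compiled with -fwrapv (wrapping defined), the comparison must stay,
    // since x == INT_MAX would make it false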
@TranscendentBen 1 year ago
I did mention DSPs and their use of saturation (it's not just MATLAB), and there's a lot more than 10 people using saturation, but still it's apparently too niche of a feature.
@movax20h 4 years ago
12:25 Damn. What a prophet. I watched this video, checked how the D compiler deals with this on my platform, and it actually did well on the "obvious" code (the primary reason is that D has defined behavior on integer overflow and a defined integer representation), but not so well on the "workaround" cases. So yes, I did file some bugs against gcc and llvm. :D Fortunately I can use __builtin_sadd_overflow in gdc very easily, and yes, it produces optimal code (especially after inlining).
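For anyone curious, a sketch of that builtin as it exists in GCC and Clang; it reports overflow and stores the wrapped result, typically compiling to a plain add plus a check of the CPU's overflow flag:

    // returns true if a + b overflowed; *out receives the wrapped result
    bool add_checked(int a, int b, int* out) {
        return __builtin_sadd_overflow(a, b, out);
    }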
@styleisaweapon 11 months ago
Good evidence from the math guys who sum infinite series that two's complement is more fundamental than programmers realize.
@User-cv4ee 1 year ago
So, does this mean one can now rely on and assume a two's complement implementation once this passes the committee?
@thomasweller7235 1 year ago
3:22 How's that supposed to work? What about overflows(2, -1)? Don't consider UB at this stage; that code won't work in the first place.
6:40 Won't that give a compiler warning, since an unsigned is compared to a signed?
@filippol1138 6 years ago
So, if you make the storage two's complement but integer overflow is still UB, then you cannot really rely on addition wrapping on overflow? Then I do not really see the point... unless you do the addition yourself, but that is way less expressive than writing a+b or using builtins.

I do not really see the point of many of the suggestions. The overflow thing, for example: the only thing it would fix is that the overflow check (which is too weird to me anyway; it is much more expressive to cast to unsigned and then check) would be nicer, and a bunch of infinite loops would disappear due to optimization, but is it really worth it? In the end, if I write something like (a+b) < a for natural numbers, I just wrote a statement which is always false for positive b, and integers are supposed to represent integer numbers. So the overflow check at the beginning is just madness to me, because you are reasoning in terms of internal storage instead of what an integer is supposed to represent...
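A sketch of the cast-to-unsigned style of check mentioned above; the conversion back to int is implementation-defined before C++20 and modular from C++20 on:

    // do the add in unsigned, where wraparound is well-defined, then
    // flag overflow when the result's sign comes out wrong
    bool add_overflows(int a, int b) {
        int sum = (int)((unsigned)a + (unsigned)b);
        return (a >= 0) == (b >= 0) && (sum >= 0) != (a >= 0);
    }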
@iddn 6 years ago
Wouldn't making both unsigned and signed overflow UB break some std::hash algorithms?
@seditt5146 5 years ago
@Peterolen There are some optimizations for ring buffers as well that rely on overflow behaving properly, to avoid needing a check on every single iteration. It's much easier to make the container size a power of two and use bit operations to wrap; it can greatly increase performance, as you save on every single lookup or write into the buffer.
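A minimal sketch of that power-of-two trick; the free-running unsigned indices wrap with well-defined behavior and are masked on every access:

    #include <array>
    #include <cstdint>

    template <typename T, std::size_t N>   // N must be a power of two
    struct RingBuffer {
        static_assert((N & (N - 1)) == 0, "N must be a power of two");
        std::array<T, N> data{};
        std::uint32_t head = 0, tail = 0;  // wrap harmlessly at 2^32

        void push(T v) { data[head++ & (N - 1)] = v; }  // no range check
        T pop()        { return data[tail++ & (N - 1)]; }
    };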
@TheMrKeksLp 4 years ago
Unsigned and signed overflow ARE always UB
@yasserarguelles6117 3 years ago
@@TheMrKeksLp No, only signed integer overflow.
@seditt5146 5 years ago
18:50 Atomic Gandhi was pretty much disproven by the developers. They just did not read the state in a way where overflow could have mattered. Cool story, just sadly not real.
@TranscendentBen 1 year ago
Modern Use of Something Other Than 2's Complement (and it's not just MATLAB): en.wikipedia.org/wiki/Saturation_arithmetic#Implementations
@timothymusson5040 6 years ago
If volatile goes away, how does memory mapped IO work?
@nullplan01 6 years ago
The way it works right now, using inline assembly to force reads and writes to happen in program order. And the inline assembly is typically portable, because it contains no actual code.
@timothymusson5040 6 years ago
Could you elaborate or point to an example? Setting up with ‘volatile uint32_t* m_StatusReg = m_BaseAddr + m_STATUS_REG_OFFSET’ and then using m_StatusReg in a straightforward and obvious way has been working great. Is there unexpected code generated for this direct memory access?
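That pattern still works as described; a self-contained sketch, with made-up register addresses for illustration:

    #include <cstdint>

    // hypothetical register layout, for illustration only
    constexpr std::uintptr_t kBaseAddr     = 0x40000000;
    constexpr std::uintptr_t kStatusOffset = 0x04;

    volatile std::uint32_t* const status_reg =
        reinterpret_cast<volatile std::uint32_t*>(kBaseAddr + kStatusOffset);

    void wait_ready() {
        // every read is a real load, in program order: the compiler may
        // not cache, reorder, or eliminate accesses to a volatile lvalue
        while ((*status_reg & 0x1u) == 0) { /* spin */ }
    }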
@styleisaweapon 11 months ago
Memory-mapped {fill in the blank} is driven by either exceptions or hardware translation tables on all modern hardware... there really isn't anything about any programming language here.
@Verrisin 3 years ago
EDIT: EVERY f-ing time..... I write a comment and, like magic, it's addressed a minute later XDXD

Regarding overflow... it's insane that people have to write code to check for it. I remember from school that the CPU will _tell you*_ in a register. Shouldn't there be a built-in way to check? Some kind of "add with check", like:
add a b
rslt = eax
didOverflow = ...
I always assumed this is how it is implemented...
* and I remember the nice diagram showing the carry bit setting the overflow flag
@Verrisin 3 years ago
Yeah: the status register; just looked it up. There is no way to access it from C++ without target-specific asm???
@obfuscator2 2 years ago
5:32 how is that code working? If lhs is INT_MAX and rhs is 1, you'll end up with an unsigned int with the value of "INT_MAX +1", which is roughly UINT_MAX/2, and isn't less than INT_MAX. So you're not detecting overflow from positive to negative ints, are you?
@styleisaweapon 11 months ago
It's your last assertion (that it isn't less than INT_MAX) that is in error. Converted back to int, the result of that addition is INT_MIN, which is most certainly less than INT_MAX.
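A tiny worked demo, assuming 32-bit int and a two's complement conversion back to int (that conversion is implementation-defined before C++20 and modular from C++20 on):

    #include <climits>
    #include <cstdio>

    int main() {
        unsigned u = (unsigned)INT_MAX + 1u;  // wraps to 0x80000000, no UB
        int r = (int)u;                       // INT_MIN on two's complement
        std::printf("%d\n", r < INT_MAX);     // prints 1
    }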
@cbehopkins 6 years ago
sizeof(void *) == 8: is this implying that C++ is not for use in the embedded (32-bit) world?
@chrishopkins2506 5 years ago
It's an interesting world if you're bothered enough about optimisation to use C++, but not bothered enough to use 32-bit pointers when you can get away with it. I'm not saying it could not exist, but the embedded world is almost certainly decades off being able to abandon 32-bit pointers.
@flatfingertuning727 5 years ago
@Peterolen If an application needs to store a large number of pointers, but accesses less than four gigs of storage, keeping everything needed by the program within a 4-gig region of address space and using 32-bit pointers would likely improve cache performance even on a 64-bit machine. Given that many applications would have no need to access even four megs of storage--much less four gigs--I would expect that the performance benefits of 32-bit pointers to remain on any platforms that continue to support them.
@ssl3546 2 years ago
A better solution: we sell a cheaply available CPU that uses ones' complement and one that uses sign-magnitude (like the Raspberry Pi) so that people can test and fix their non-portable code. If code does not work on a big-endian, sign-magnitude machine, it is broken.
@MatthijsvanDuin 6 years ago
54:10 _what_? Why on earth would you consider "char" to be signed, given that in practice it means "a byte from a UTF-8 string, or maybe a string that uses some legacy 8-bit encoding"?
@Bourg 6 years ago
Because in practice it is signed?
@MatthijsvanDuin 6 years ago
@@Bourg In practice it is architecture-dependent. char is unsigned on ARM for example.
@MatthijsvanDuin 6 years ago
@@Bourg It is also unsigned on PowerPC.
@Hauketal 6 years ago
It was the original sin of the IBM PC. K&R wrote in their C manual, before the ANSI C editions, in one of the first paragraphs, that char is unspecified with respect to signedness, but all machine character set values were positive. The PC extended ASCII to 8 bits, but C compilers for it never acknowledged that as an extension, so they continued with the traditional char for Intel being signed. That's what you get for not RTFM. Now, about 40 years later, we still have to deal with it. :-(
@MatthijsvanDuin 6 years ago
@@Hauketal Bad or careless decisions getting enshrined really sucks. I also really hate that integer division is specified as being round towards zero rather than round down (with the obnoxious fallout that -1 % 4 is -1 instead of 3, and x/2 is not the same as x>>1)
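For anyone wanting the round-down behavior, a small sketch built on top of the round-toward-zero operators:

    // floor_div(-3, 2) == -2 (matches -3 >> 1); floor_mod(-1, 4) == 3
    int floor_div(int a, int b) {
        int q = a / b, r = a % b;
        return (r != 0 && (r < 0) != (b < 0)) ? q - 1 : q;
    }
    int floor_mod(int a, int b) {
        int r = a % b;
        return (r != 0 && (r < 0) != (b < 0)) ? r + b : r;
    }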
@joshingaboutwithjosh 2 years ago
Ah, Atomic Gandhi, we meet again
@kwinzman 6 years ago
Good talk. But could you make the font on the slides a bit smaller? It's still readable sometimes.
@Bourg 6 years ago
ᴵ ᶜᵒᵘˡᵈ ᵐᵒˢᵗ ᶜᵉʳᵗᵃⁱⁿˡʸ ᵐᵃᵏᵉ ᵗʰᵉ ᵗᵉˣᵗ ˢᵐᵃˡˡᵉʳ!
@NicolayGiraldo 2 years ago
I would like to have a fixed-point representation for numbers between 0 and 1. Seems both very fast and relevant now for neural networks.
@user-ni2od5lu6j 6 years ago
Deprecating volatile-qualified member functions (P1152) is a mistake. If you miss one volatile qualifier in old code, you get a compiler error (calling a non-volatile member function through a volatile ref/pointer) or still-correct behavior (volatile lost near the root pointer, but the member function called is still volatile-qualified, and maybe a warning from some tools). But if you delete all the volatiles from the top of the hierarchy and from member-function qualifiers, putting them only on built-in data types, you can easily lose one std::byte, for example (and a checking tool can lose it too). Then, if you are really unlucky, all tests and even test rocket launches pass fine, but some years later a new compiler may decide to optimize access to that std::byte the other way around, and you get a rocket blow-up. If a whole region of memory is marked volatile, any pointer/reference that points inside it (including custom aggregate types with member functions) should have a volatile qualifier. (No tool would warn about using memcpy from such a pointer if the volatile qualifier is removed.)
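A sketch of the compile-time safety net being described; the volatile qualifier on the member function is what makes the compiler reject an accidental non-volatile access path:

    #include <cstdint>

    struct StatusReg {
        std::uint32_t bits;
        std::uint32_t read() volatile { return bits; }  // bits read as volatile
        std::uint32_t fast_read()     { return bits; }  // unqualified
    };

    void poll(volatile StatusReg& r) {
        r.read();         // OK: volatile-qualified member function
        // r.fast_read(); // error: discards volatile -- the compiler
        //                // catches the missing qualifier for you
    }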
@TranscendentBen 1 year ago
I've done lots of embedded work, and I don't know how a compiler would know not to optimize away, or not to delay, a write to a register (one that otherwise looks like any memory location) without the volatile keyword. Otherwise it thinks, "the value at this location is never read back anywhere else, so I don't have to write it."
@FalcoGer 1 year ago
Name one CPU architecture, not even one targeted by C++, that has a signed integer representation and doesn't use two's complement for it. Two's complement is the natural choice because it allows for easy addition, which in turn means fewer transistors, which makes chips smaller, cheaper, and lower power. I think we can ditch the 0.000000000002% of programmers who deal with super special and niche hardware and make them do the workaround (or buy sane hardware), instead of everyone else having to deal with the compiler destroying our code.

What are the most common architectures? The Intel family and Intel-compatible amd64 make up nearly the entire market, then the ARM family, PowerPC, z80, whatever Apple produces, RISC, Atmel, and then a whole array of other microcontrollers, most, if not all, of which use two's complement. Why would you want to support 70-year-old processors? Anything before 1970 doesn't exist anyway; time began on the first of January 1970, after all.

If overflow is a bug, then it's a bug. But unsigned integer overflow being a bug is also still a bug. The compiler silently optimizing out parts of my code is simply worse. If wrapping or trapping doesn't fix it, it would at least be a more noticeable error. If you have a bug in your program, you want it to give a positive signal that the programmer or the user has to handle, not silently do something. Throw an exception or trap. Spit out a stack trace.

How would you fix Pac-Man or Donkey Kong? Using a larger integer type would just push the problem back. Would players actually reach level 2^32? Probably not. But it's still a bug.
@BlackBeltMonkeySong 6 years ago
Listened to the talk, still not sure why this is important.
@Bourg 6 years ago
Read your comment, still not sure how it's relevant.
@User-cv4ee 1 year ago
@@Bourg The talk was great! However, it did leave me wondering what we gained by defining the storage while still not being able to rely on it, since the arithmetic is undefined. Can you expand on that, please? Much appreciated.
@dipi71 6 years ago
I wish C++ had an elementary, built-in and highly optimized integer type that never overflows but transparently expands the range of a specific integer value, like in Ruby. At 47:03 the objection to this is »I don't want my addition to allocate«. Well, I don't mind, especially if this extra allocation occurs about once in millions of additions. As soon as such a »BigInt« is to be incorporated into a fixed-size data structure you'd have to check its size, but for storage purposes BigInts can be thought of just as variable-sized Unicode strings. Computers have gotten pretty good at those, or so I've heard. Cheers!
@brenogi 6 years ago
Is it possible to implement that without doing a check for every operation to know if it will overflow and allocate?
@dipi71 6 years ago
Breno Guimarães replied (although his comment is not showing up here): »Is it possible to implement that without doing a check for every operation to know if it will overflow and allocate?« - I don't think it's possible to avoid _some_ kind of check if you're aiming for correct results. The cases where you have to squeeze every bit of performance out of integer arithmetic may be less common than you think, though: maybe data moshing for fast randomization, some kinds of real-time DSP where overflows just become part of the noise, or algorithms where you can afford to perform a large number of unchecked integer calculations and then check the overflow bit once at the end. Exceptions aside, and if overflow bugs like those listed by JF Bastien in this video from 16:30 to 24:24 are to be avoided, we ought to value _robust and correct code_ over the theoretical extremes of the fastest execution possible. (This guideline ought to apply to hardware as well; consider debacles like Rowhammer, Meltdown and the Spectre variants, which have one thing in common: fetishizing speed over safety and putting performance over security.)
@brenogi 6 years ago
@@dipi71 Well, in my world, those checks are unacceptable. I use C++ (also) because I need every bit of performance I can get. I don't want to pay for what I don't care about. And if I need the BigInt, there are library solutions for that, which are more than enough for where I need it. But I have no idea what type of code is out there, so I can't say what is the preference of the majority of the C++ codebases. I can only add my 2 cents.
@zekilk 6 years ago
The datatype you're looking for is a bignum. The C++ standard library doesn't have a bignum class, but you can easily create your own with basic C++. Bignums incur a lot of overhead, since most processors don't have built-in mechanisms to support them, and they're overkill for most projects. If you are that keen on a number that'll never overflow in proper program execution, you can always use a signed 64-bit integer. It can represent numbers lower than -9'000'000'000'000'000'000 and higher than 9'000'000'000'000'000'000. It'll still perform much faster than the most optimal bignum implementation.
@dipi71 6 years ago
@@zekilk Again, I stress the importance of safe and robust and correct code over maximum execution speed. Yes, lacking a proper native data type, you can use a Bignum class, but it will affect the readability of the code. And of course, every CPU I know has a mechanism for fast overflow checks: the overflow bit.