What Happens After The Compiler in C++ (How Linking Works) - Anders Schau Knatten - C++ on Sea 2023

12,195 views

cpponsea

cpponsea.uk/
---
We all know roughly what the compiler does: it translates your source code into machine code. Thanks to Compiler Explorer, many of us now even look at the generated assembly code.
But wait a minute, that code is full of labels and function names, and the CPU knows of no such things! Most of these are also defined in different compilation units, so how can we jump to code when we don't know where it comes from? And even for our own compilation unit, how can the compiler know where in memory the machine code will eventually be loaded, so it can generate the right jumps? Even worse, what if that function comes from a dynamic shared object?
This talk gives an introduction to how the compiler, linker, loader and operating system cooperate to get from a compilation unit to a running process. We'll look at static and dynamic linking, relocations, position-independent code, sections and segments, and virtual memory. The talk covers Linux only, but similar principles apply on Windows and Mac.
---
Slides: github.com/phi...
Sponsored By think-cell: www.think-cell...
---
Anders Schau Knatten
Anders started programming in Turbo Pascal in 1995, and has been programming professionally in various languages since 2001. He's currently a principal developer at Ascenium, working on a new general-purpose CPU design. He's the author of cppquiz.org and the blog C++ on a Friday.
---
C++ on Sea is an annual C++ and coding conference, in Folkestone, in the UK.
- Annual C++ on Sea, C++ conference: cpponsea.uk/
- 2023 Program: cpponsea.uk/20...
- Twitter: / cpponsea
---
YouTube Videos Filmed, Edited & Optimised by Digital Medium: events.digital...
#cpp #cpponsea #compiler

Comments: 23
@hedgechasing 1 year ago
Around 8:50, the mov eax, 0 before the call is not about main returning zero when you don't specify a return value; it is part of the C ABI. The functions here are written with nothing in the parentheses, as is normal in C++, but in C empty parentheses do not mean no arguments. They declare a K&R-style function with an unknown number of arguments of unknown types. The actual definition would need to specify the arguments in order to use them, but callers could just write extern void whatever() to declare it, since K&R function calls are not type checked. What the 0 specifically represents is the number of vector registers used for arguments (usually floating-point values) passed to the function. Variadic functions need to know how many registers to save, and that value gives them an upper bound. If the empty parens were replaced with (void), the mov eax, 0 would go away even at -O0; without that change it will persist even at higher optimization levels (assuming the two functions are actually in two translation units, so the function doesn't get inlined).
@andersknatten 11 months ago
Thanks for correcting that! I had forgotten about this difference between C and C++. I'm mostly writing C++, I guess it shows. :)
@nitsanbh 11 months ago
I learned so much from this talk, thank you!
@deckard5pegasus673 11 months ago
50:14 -fpic = enforces a limit on the size of the GOT (on targets that have such a limit). -fPIC = no size limit for the GOT.
@Byynx 6 months ago
This video is a gem!!!
@widnyj5561 1 year ago
The part about function calling near the end was the most interesting.
@VincentZalzal 1 year ago
Excellent talk, the clearest I've seen on this topic!
@cpponsea 11 months ago
Great to hear! Thank you for your comment.
@andersknatten 11 months ago
Thank you! I'm very happy to hear that.
@alx9r 11 months ago
I can also recommend James McNellis’ “Everything you wanted to know about DLLs” on this topic.
@denisfedotov6954 11 months ago
Nice talk! However, lazy binding is disabled by default in many modern Linux distributions as an attack mitigation, so that the GOT entries used by the PLT are read-only during program execution. This is known as full RELRO.
@andersknatten 11 months ago
Thanks, I didn't know that.
@rezwanarefin3493 8 months ago
18:05 Actually, the compiler does know which compute() function you are calling in this example, because compute() is in the same file. In fact, even at -O1 it will remove the call and inline compute(). The compiler wouldn't know that if compute() were not available in the same translation unit.
@dascandy 1 year ago
@6:21 The middle line on the right has "48 89 e5", which is the start of your compute function; the bottom left has "b8 01 00 00 00", which moves 1 into eax, followed by 5d (pop rbp) and c3 (ret).
@andersknatten 1 year ago
Yeah, that's what I'm trying to point out at @7:25 too.
@Danielm103 1 year ago
Awesome talk. I'd be interested to know what Use Link Time Code Generation and other optimizations like COMDAT folding and /OPT:REF do.
@rinket7779 4 days ago
He didn't explain why it says "call 21" (at 11:42), nor why a relocation is necessary at this point. He might explain it later, but it's super confusing that he didn't explain WHY at the beginning, especially since 'compute()' is in the same translation unit as 'main'.
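A possible reading of the "call 21" display, offered as a sketch rather than a statement of what the slide contains: in an unlinked object file the 4-byte operand of a near `call` is typically still zero, so a disassembler computes the target as the address of the next instruction. That is 0x21 if the 5-byte call sits at offset 0x1c. The linker later fills in the operand using the PC-relative formula S + A - P from the x86-64 ELF ABI. The helper names, the load address 0x401000 and the symbol address 0x401100 used below are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decode where an x86-64 `call rel32` (opcode 0xe8)
// at offset call_off would jump, given the address `base` the code is
// assumed to be placed at. The CPU adds rel32 to the address of the
// NEXT instruction, which is 5 bytes after the start of the call.
std::uint64_t call_target(const std::vector<std::uint8_t>& code,
                          std::uint64_t base, std::size_t call_off) {
    assert(call_off + 5 <= code.size());
    assert(code[call_off] == 0xe8);
    std::int32_t rel = 0;
    std::memcpy(&rel, &code[call_off + 1], sizeof rel);  // little-endian host assumed
    return base + call_off + 5 + static_cast<std::int64_t>(rel);
}

// Hypothetical helper: apply a PC-relative 32-bit relocation.
//   S = address of the symbol, A = addend, P = address of the patched field.
// For a call, A is -4 because P points at the operand, while the CPU
// measures the jump from the end of the operand, 4 bytes later.
void apply_pc32(std::vector<std::uint8_t>& code, std::uint64_t base,
                std::size_t field_off, std::uint64_t symbol,
                std::int64_t addend) {
    assert(field_off + 4 <= code.size());
    const std::int64_t p = static_cast<std::int64_t>(base + field_off);
    const std::int64_t value = static_cast<std::int64_t>(symbol) + addend - p;
    const std::int32_t rel = static_cast<std::int32_t>(value);
    std::memcpy(&code[field_off], &rel, sizeof rel);
}
```

With a buffer whose call opcode is at 0x1c and whose operand is zero, call_target(code, 0, 0x1c) gives 0x21. After apply_pc32(code, 0x401000, 0x1d, 0x401100, -4), the same call decodes to 0x401100.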
@mikefochtman7164 11 months ago
Boy, this explains a lot of nitty-gritty details. We had an application that required several separate processes to have access to a large block of common memory (about 64 KB). We did this by defining a large int array in a shared object and initializing it to non-zero. This was 20-some years ago; it might not still work, I don't know. But because it was initialized, the array was put in the shared .data segment. So each process had access to the same large array, and one process could 'see' what another process wrote. (Yes, there were other concerns about collisions and such, but the gist of it was that the DLL and its .data segment were shared by all.)
@kayakMike1000 10 months ago
Well, I suppose you could put a lock on that shared memory to ensure concurrent integrity.
@gustavbw 11 months ago
53:20 (on lazy-loading): I understand the concept as partially preparing data when declared, and only loading the full extent when used (or not even then), or as disguising access to some data as actually fetching it first, meaning it is declared and you can reference it, but it's not actually there. Instead, the instructions to get it there are. What you're describing sounds to me like caching, i.e. storing the output of some functionality in an easily accessible way so that you do not have to invoke said functionality again. But I might be off here (also, I come from a very much not systems/compiler background, so I completely understand if "lazy-loading" is the term used for it in your field). Side question: Would this mean that you could have runtime dynamic linking if you implemented cache invalidation for this step of the process? (i.e. be able to change bits of the machine code as stored on disk, which would take effect when the invalidation occurs?)
@andersknatten 10 months ago
Yes, lazy loading is a good way to describe this! I guess you could do some sort of runtime dynamic linking if you had some way of resetting the GOT to point back into the PLT stub and then convince the dynamic linker/loader to load something different next time, provided that you have prepared GOT/PLT entries for everything during compilation. It depends on what you mean by runtime dynamic linking, of course; I'm just replying very generally here. Note, btw, that we never change any *machine code* here, we only change data. It's just pointers in the GOT that are updated, from pointing at the stubs in the PLT to pointing at the real functions.
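The pointer update described above can be modelled in a few lines. This is a toy sketch, not how the dynamic linker is actually implemented: got_slot, resolver_stub, real_compute, call_compute and reset_slot are all hypothetical names, and an ordinary function pointer stands in for a GOT entry.

```cpp
#include <cassert>

// Toy model of lazy binding. Every name here is hypothetical; a real
// PLT and GOT are generated by the toolchain, not written by hand.
using Fn = int (*)(int);

// Stands in for a function that lives in a shared object.
int real_compute(int x) { return x * 2; }

// Stands in for the PLT stub plus the dynamic linker's resolver.
int resolver_stub(int x);

// Stands in for one GOT entry: plain data, initially pointing at the stub.
Fn got_slot = resolver_stub;

// Counts how often the (pretend) symbol lookup ran.
int resolve_count = 0;

int resolver_stub(int x) {
    // Only reachable while the slot has not been patched yet.
    assert(got_slot == resolver_stub);
    ++resolve_count;
    got_slot = real_compute;  // patch the data slot; no code is modified
    return got_slot(x);       // finish the call the caller asked for
}

// Every call site goes through the slot, never directly to the target.
int call_compute(int x) { return got_slot(x); }

// The "invalidation" idea from the question: point the slot back at the
// stub so that the next call resolves again.
void reset_slot() { got_slot = resolver_stub; }
```

The first call_compute(21) goes through the stub and returns 42; later calls jump straight to real_compute, so resolve_count stays at 1 until reset_slot() is called.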
@rudalert 11 months ago
Thank you for the interesting talk! Question about the last chapter: will the loader copy ("load") the function from the shared object into the .got section? I am confused about how the state (if the function has any) is differentiated between the processes using the same shared object.
@andersknatten 11 months ago
What kind of state are you thinking of? If you're thinking of function arguments and local variables, these go on the stack or in registers, which are unique to each process and in fact each invocation in that process. If you're thinking of local static variables, these go in data sections like `.data`, which each process gets a unique copy of. Only the read-only segments are shared between processes.
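The per-process copy of static data can be observed with a small experiment. This is a sketch under assumptions: it uses fork() on a POSIX system instead of two programs loading the same shared object, on the assumption that the relevant mechanism (writable pages are private to each process) is the same. The names next_id, ForkResult and run_fork_demo are hypothetical.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// A function with state: the counter is a local static, so it lives in a
// writable data section (.bss here because it is zero-initialized; it
// would be .data with a non-zero initializer).
int next_id() {
    static int counter = 0;
    return ++counter;
}

struct ForkResult {
    int child_value;   // last value the child saw, or -1 on failure
    int parent_value;  // first value the parent sees afterwards
};

// Forks, lets the child bump the counter three times, then checks what
// the parent's own copy of the counter looks like.
ForkResult run_fork_demo() {
    const pid_t pid = fork();
    if (pid < 0) {
        return ForkResult{-1, -1};  // fork failed
    }
    if (pid == 0) {
        next_id();
        next_id();
        _exit(next_id());  // child: report its counter via the exit status
    }
    int status = 0;
    const pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    ForkResult result;
    result.child_value = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    result.parent_value = next_id();  // parent's copy was never touched
    return result;
}
```

Called once in a fresh process, run_fork_demo() reports 3 for the child and 1 for the parent: the child's increments landed in its own copy of the data page.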