Me: "Hello, world" in C Chris Lattner: programs in LLVM IR 💀
@TheCodingGopher 2 months ago
We all start somewhere 😂... except if you're Chris Lattner
@empathy_monster 2 months ago
@TheCodingGopher True lol! Btw you should check out the MLIR (Multi-Level IR) he's working on. Really fascinating stuff! Definitely not surprised it's his new venture.
@TheCodingGopher 2 months ago
@empathy_monster Will check it out! Also looks to be a potential future video idea
@TheCoder-1924 1 month ago
@empathy_monster how would MLIR be better than normal IR?
@HamzaAli-hh7ub 2 months ago
A very good video. I like the concept of reusing one optimization pipeline for different languages by just swapping the front-end.
@TheCodingGopher 2 months ago
It's an excellent, modular architecture - glad you enjoyed :)
@gackmcshite4724 1 month ago
Keep in mind, the front is an end, the back is an end, the middle is just the middle. The ends are at the front and back.
@TheCoder-1924 1 month ago
@gackmcshite4724 isn't the middle also an end? it supports the front and back
@Ntare_12 2 months ago
Thanks for the great explanation
@TheCodingGopher 2 months ago
You're welcome!
@rubensr28 2 months ago
Just found your channel. Congratulations. The content is really good
@TheCodingGopher 2 months ago
Thank you for the kind words!
@dosergiobr 1 month ago
Fantastic!!
@TheCodingGopher 1 month ago
❤
@kvelez 1 month ago
Very interesting.
@TheCodingGopher 1 month ago
🚀
@caffeine01 1 month ago
Great video! Explained very well! Inspires me to create my own LLVM frontend for fun!
@TheCodingGopher 1 month ago
That's the best outcome of the video - glad you enjoyed 😊
@bjo004 1 month ago
Can you please compare GCC with LLVM?
@TheCodingGopher 1 month ago
Yes - stay tuned. That's going to be a follow-up video
@b1ank_0111 2 months ago
Very nice explanation 👌
@TheCodingGopher 2 months ago
Thanks!
@sidreddy7030 1 month ago
Wow, this is actually such a great video. Loved it
@TheCodingGopher 1 month ago
Thank you! 👊
@TheCoder-1924 1 month ago
really really good content
@AjinkyaJoshi-g5j 2 months ago
Is the middle-end optimization done entirely in assembly?
@TheCodingGopher 2 months ago
It's done on LLVM IR (which retains more high-level info than assembly). All of the optimizations (e.g. inlining, loop unrolling, constant propagation, etc.) happen on this low-level code representation
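To make that concrete, here's a rough source-level sketch of what inlining + constant propagation let the middle-end do (illustrative only - the real transformations happen on the IR, and the outcome depends on the optimization level):

static int scale(int x) { return x * 4; }    // tiny helper - an easy inlining candidate

int answer(void) {
    int base = 10;             // known constant
    return scale(base) + 2;    // after inlining scale() and folding constants,
                               // the middle-end can reduce this body to "return 42;"
}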
@TomLeg 2 months ago
Wonderful marketing video! But I would have liked a comparison with the other popular system, GCC. What are the benefits and downfalls of each? After all, GNU supports more languages, to varying degrees, and supports more backends.
@TheCodingGopher 2 months ago
Good question! LLVM is modular and has clear diagnostics, more flexible optimizations, and a permissive license - so, it's strong for custom tooling / commercial use. The downside is that it supports fewer languages and architectures than GCC (as you mentioned). As for GCC - it supports more languages and architectures with mature optimizations, but it’s less modular and has less clear diagnostics - so it's ideal for traditional compiling but harder to customize. I'll make a follow-up video distinguishing between the two :)
@redcrafterlppa303 2 months ago
I'm currently debating whether to write a full compiler for my language or just an LLVM frontend. The latter is considerably easier, but it limits me to what's possible in the IR, possibly making the complex low-level features I want in my language more difficult to implement.
@vilian9185 1 month ago
What limitations would the IR have?
@redcrafterlppa303 1 month ago
@vilian9185 In my case I have a special stack layout that is complicated to keep sound. I'm not sure to what extent I would/will be able to describe that in IR. Also, the optimizer might introduce bugs when it encounters such unique code and memory layout. But I haven't tried it yet, so I can only speculate.
@TheCodingGopher 1 month ago
@redcrafterlppa303 If you’re aiming for a unique stack layout and tight control over low-level features, a full compiler will give you more control over optimization. LLVM will definitely be easier implementation-wise - but you can get unexpected behaviour with very specialized code.
@gackmcshite4724 1 month ago
You're going to need a parser at least. If you keep the interface clean, you can output to your own IR, or switch that to LLVM IR. Your own IR can be a serialized version of the parser results, so no rework required in that case.
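A minimal sketch of that "clean interface" idea (the names here are made up, just to show the shape):

#include <cstdint>
#include <string>

// The parser only talks to this interface; each implementation is one target.
struct IREmitter {
    virtual ~IREmitter() = default;
    virtual void beginFunction(const std::string& name) = 0;
    virtual void emitIntConstant(std::int64_t value) = 0;
    virtual void emitReturn() = 0;
};

// One implementation serializes the parser results as your own IR...
struct OwnIREmitter : IREmitter {
    void beginFunction(const std::string&) override { /* write your own format */ }
    void emitIntConstant(std::int64_t) override {}
    void emitReturn() override {}
};
// ...a second one could forward the same calls to llvm::IRBuilder, and the parser never changes.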
@redcrafterlppa303 1 month ago
@gackmcshite4724 I'm already writing the parser. The question is whether I write my own IR and a backend for it, or use LLVM IR and its backend.
@nnov_tech_chan7891 1 month ago
There is the Tiny C Compiler (TCC for short), which does the same thing but is simpler
@TheCodingGopher 1 month ago
TIL!
@LifeIsACurse 2 months ago
A great video @The Coding Gopher! Explained it spot on... also perfectly timed, since I was just reading up on LLVM IR for a day or two, which made it feel a bit uncanny to get the recommendation now lol. It's for a reverse engineering project. Do you happen to know if LLVM IR is readable or can be shown with a public tool? (Just to gain more understanding of how it looks and works.) In particular I am interested in tools/libraries that can reverse the process by generating LLVM IR from machine code. So far I have Remill and angr in my sights for future testing, but nothing done yet.
@TheCodingGopher 2 months ago
Hey, glad you liked the video! 😄 It’s always cool when things sync up like that. To answer your question: yes, you can definitely view LLVM IR with some tools. E.g. llvm-dis (part of the LLVM suite) can turn LLVM bitcode into a readable text format, which is great for inspecting how the IR looks and works. Remill can lift machine code to LLVM IR - and is ideal for binary analysis. Based on a cursory look, angr does something similar but not for LLVM IR - it uses VEX IR. Will do some digging, and see what else I find
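For a quick start (assuming you have the LLVM tools installed):
clang -S -emit-llvm hello.c -o hello.ll   (compiles C straight to textual IR)
llvm-dis hello.bc -o hello.ll             (turns binary bitcode back into text)
The resulting .ll file is plain text you can read in any editor.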
@LifeIsACurse 1 month ago
@TheCodingGopher you ain't obligated to, just if it's interesting to you ^^
@TheCoder-1924 1 month ago
@LifeIsACurse big coincidence
@modolief 1 month ago
"Middle end?" How would it be an "end" if it's in the middle?
@PerriPaprikash 1 month ago
the actual term is layer: front-end, back-end, middle layer
@TheCoder-1924 1 month ago
@PerriPaprikash I think layer is a better term. But it sounds like front-end and back-end come from full stack?
@izumiosana 2 months ago
How is it even possible for Clang to use LLVM while LLVM itself is written in C?
@TheCodingGopher 2 months ago
Both are written in C++. Clang and LLVM communicate through IR, so LLVM's implementation language (in this case, C++) doesn’t affect their compatibility. Clang generates LLVM IR from source code - and LLVM then optimizes and compiles that IR.
@trendysupastar 2 months ago
I was also confused the first time I heard something like a compiler could be written in the language it is supposed to compile (like C compiler built with C). But here’s how I think about it now: Imagine LLVM as a tool that converts IR to machine code. What LLVM itself is written in doesn’t really matter, it's just a program that does a job. In this case, the job is to take IR and produce machine code from it. Using C, or any language really, you can write a compiler frontend (like Clang) that reads C files as input and generates IR, which is then fed to LLVM to get machine code. So, whether LLVM is written in C++, Rust, or anything else, it’s just a tool in the toolchain that does what it’s designed to do.
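As a toy illustration of that last hop (a rough sketch using LLVM's C++ API - real frontends are far more involved, and exact headers/signatures shift a bit between LLVM versions):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

int main() {
    llvm::LLVMContext ctx;
    llvm::Module mod("toy", ctx);            // holds the generated IR
    llvm::IRBuilder<> builder(ctx);

    // "Frontend" work: build a function i32 answer() that returns 42.
    auto* fnType = llvm::FunctionType::get(builder.getInt32Ty(), /*isVarArg=*/false);
    auto* fn = llvm::Function::Create(fnType, llvm::Function::ExternalLinkage, "answer", &mod);
    builder.SetInsertPoint(llvm::BasicBlock::Create(ctx, "entry", fn));
    builder.CreateRet(builder.getInt32(42));

    mod.print(llvm::outs(), nullptr);        // dump the textual LLVM IR
    return 0;
}

Building it needs the LLVM development headers and libraries (e.g. the flags from llvm-config --cxxflags --ldflags --libs core); the printed IR is then something opt and llc can consume.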
@TheCodingGopher 2 months ago
@trendysupastar Nice explanation! 💯
@TheCoder-1924 1 month ago
@trendysupastar how would you decide to build something in C, C++, or Rust? are there some rules to follow?
@alikhatami6610 2 months ago
thank you chatgpt !!!
@TheCodingGopher 2 months ago
Gemini :)
@suchiman123 2 months ago
One should note, though, that LLVM's JIT is hardly used because it is just too slow: JIT compilers run under tight time constraints and must focus on the optimizations that matter most, which LLVM doesn't
@TheCodingGopher 2 months ago
Yes - good point! Would like to point out that LLVM's JIT is still used in some languages like Julia. But yes, in cases where fast JIT responses are needed / there are tight timing constraints, we tend to go for more lightweight options like V8 or LuaJIT.
@kshitijbhagawati1157 2 months ago
great video
@TheCodingGopher 2 months ago
Thank you
@AterNyctos 2 months ago
Nice video
@TheCodingGopher 2 months ago
Thanks for watching
@dadecky5276 1 month ago
wow nice video
@dadecky5276 1 month ago
keep it up man
@TheCodingGopher 1 month ago
@dadecky5276 👊👊!
@hyunhocho7106 2 months ago
I wonder why GCC doesn't have a framework like LLVM.
@oserodal2702 2 months ago
AFAIK, GCC generates its IR in bytecode in memory during compilation.
@TheCodingGopher 2 months ago
It's mainly due to GCC and LLVM having different architectures. LLVM is designed as a modular framework (i.e. has reusable components like LLVM IR). GCC is more focused on generating its IR on-the-fly in memory (direct compilation), which limits modular reuse.
@TheCoder-1924 1 month ago
@oserodal2702 I thought only Java used bytecode - or is it just something analogous?
@ellehooq 2 months ago
Bro recorded this in a cave
@TheCodingGopher 2 months ago
That's a good one 😂
@1Lll_llllllLLLLllllll_llL1 1 month ago
man u have a nice icon!
@TheCodingGopher 1 month ago
It's so cute right?
@whtiequillBj 1 month ago
Wouldn't loop unrolling reduce the benefit of optimization due to possible improper branch prediction? If you optimize bad code, you get badly optimized code. You don't get good coding practices from good optimization.
@TheCodingGopher 1 month ago
Correct, but I wouldn't say that as a blanket statement - it depends on the specific code/hardware. Loop unrolling can impact branch prediction negatively if the control flow becomes irregular; but - in many cases, unrolling reduces the number of branches (which improves prediction/performance). Optimization works best when the code is already well-structured, where it refines existing logic rather than correcting fundamental issues (which it fails to do!). Good optimization and good coding practices are complementary. :) Good point!
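For a picture of the "fewer branches" case, here's roughly what unrolling by four does to a simple loop (hand-written sketch; compilers do this on the IR automatically, and the factor and code are illustrative):

void scale(float* a, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {   // one loop-branch check now covers four iterations
        a[i]     *= 2.0f;
        a[i + 1] *= 2.0f;
        a[i + 2] *= 2.0f;
        a[i + 3] *= 2.0f;
    }
    for (; i < n; ++i)             // remainder iterations
        a[i] *= 2.0f;
}

The trade-off is larger code size, which is where the instruction-cache and branch-prediction concerns come in.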
@StevenSiew2 2 months ago
Julia uses LLVM. This gives it automatic speed improvements that are independent of the high-level Julia language itself.
@kenamreemas3295 2 months ago
So, we transpile our code to LLVM code (something like bytecode in Java, but from a range of languages) and then compile that LLVM code?
@TheCodingGopher 2 months ago
Yep! The flow looks like:
1. Source code → LLVM IR (conceptually similar to Java bytecode)
2. Optimization passes over the IR
3. LLVM IR → machine code
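A tiny end-to-end illustration (the output shown is approximate - it depends on the LLVM version and flags):

// square.c
int square(int x) { return x * x; }

// Step 1: clang -S -emit-llvm -O1 square.c   gives roughly:
//   define i32 @square(i32 %x) {
//     %mul = mul nsw i32 %x, %x
//     ret i32 %mul
//   }
// Step 2: opt runs optimization passes over that IR (.ll / .bc)
// Step 3: llc lowers the optimized IR to target assembly, which becomes machine code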
@Elektrolite111 2 months ago
May I suggest using YouTube subtitles instead of embedded subtitles, so we have the option of turning them off
@TheCodingGopher 2 months ago
Thank you for the suggestion. I'm a bit conflicted at the moment - on one hand, the subtitles may make the video more engaging (and hence improve retention). On the other hand, some viewers like yourself prefer them off. I think I should A/B test this
@Elektrolite111 2 months ago
@ I think if you add subtitles on YouTube they are on by default, so it would be almost the same thing
@ramos_4892 2 months ago
@TheCodingGopher Keep the embedded subtitles. It does improve engagement
@w花b 1 month ago
@TheCodingGopher I'm definitely more engaged in the video with these. With YouTube subtitles, I just disable them manually and end up leaving the video. Gotta force-feed us your vegetables, or they'll end up on the edge of the plate if given the choice.
@TheCodingGopher 1 month ago
@ramos_4892 Thanks for the feedback
@guilherme5094 2 months ago
👍
@TheCodingGopher 2 months ago
🚀
@Person-who-exists 18 days ago
Better than JVM!
@rursus8354 2 months ago
Well... the technical description of LLVM is correct; it's just that other compiler infrastructures do it the same way - in particular GCC - while the video seems to insinuate they don't, and that these features are the huge advantage of LLVM. That's not it. The great advantage of LLVM is that it is better documented, and (probably) that the organisation around it is more bazaar-like, in contradistinction to the pretty cathedral-like GCC development model.
@vilian9185 1 month ago
GCC doesn't compile to an IR and optimize there
@TheCodingGopher 1 month ago
You’re right that LLVM’s IR isn’t unique - GCC has similar features. What sets LLVM apart is its modularity, easier integration, and its permissive licensing (Apache 2.0), which makes it more flexible / developer-friendly. A future video will distinguish between the two :)
@raheelrehmt 1 month ago
5+3+0=80
@TheCodingGopher 1 month ago
🍄
@raheelrehmt 1 month ago
&:₩!
@raheelrehmt 1 month ago
@TheCodingGopher 🌵
@raheelrehmt 23 days ago
🥦
@TheCodingGopher 23 days ago
@raheelrehmt You got me confused 😂
@revengerwizard 2 months ago
Definitely don’t use LLVM for a JIT compiler, unless you want abysmal performance…
@TheCodingGopher 2 months ago
Yes - this is true but context-dependent. LLVM is slower than other options for JIT compilation because it prioritizes generating highly optimized code. For use cases that demand extremely fast JIT compilation (e.g. dynamic scripting environments / cases with frequent recompilation), LLVM will feel "heavy". That being said, LLVM is still widely used in JIT compilers for projects that need optimization quality (e.g. Julia). So, if high peak performance is critical and slower compile time can be tolerated, LLVM is still a solid choice. But if you really need faster compilation with decent optimization, LuaJIT is probably a better choice.
@TheCoder-1924 1 month ago
@revengerwizard isn't that a main use case?
@revengerwizard 1 month ago
@TheCoder-1924 Definitely not. I mean, for a static language it might be good enough. JIT compilation is based on the theory that it would take less time to compile to native code than it would take to execute it normally (usually in some kind of interpreter). LLVM doesn’t satisfy this criterion
@TheCoder-1924 1 month ago
@revengerwizard Why only for static languages? I thought the idea was to check whether there is frequently executed code and then pre-compile it, or just re-use it. Maybe I am misunderstanding, but I think there are two paths: 1 is interpreted, 2 is JIT-compiled. Is this correct?