1000: Ben Titzer
1:47:14
4 months ago
111: Kay Li
1:18:12
4 months ago
110: Rick Altherr
2:16:38
4 months ago
101: Matt Godbolt
2:29:23
5 months ago
100: Nathanael Huffman
1:47:46
5 months ago
Comments
@axelBr1
@axelBr1 28 days ago
Compiler Explorer is amazing. One of the design choices in C++ (and possibly other languages such as Java or JavaScript; it's a long time since I used them) is where to create your instances. A best practice is to create them where they are used, but what happens if that is in a loop? I always thought that, for performance reasons, it would be better to create the instances required in the loop before entering the loop. Then one day it dawned on me that compilers are pretty smart: I can create the instance within the loop and the compiler will move the creation of the instance to before the loop. Using Compiler Explorer, I was surprised to find that the compiler is even smarter than that: because it knows the instance isn't used outside of the loop, it doesn't need to create the instance at all.
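The observation above can be tried out with a minimal sketch like the following, assuming a small, trivially constructible type (the `Point` struct and both function names are hypothetical, made up for illustration). With optimization enabled, compilers will often keep such an object in registers in both variants, but that is not guaranteed, and types with non-trivial constructors (for example ones that allocate) may behave differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical small value type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Variant A: the instance is created once, before the loop, and reused.
int sum_with_instance_outside(const std::vector<int>& values) {
    int total = 0;
    Point p{0, 0};
    for (int v : values) {
        p.x = v;
        p.y = 2 * v;
        total += p.x + p.y;
    }
    return total;
}

// Variant B: the instance is created where it is used, inside the loop.
int sum_with_instance_inside(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        Point p{v, 2 * v};
        total += p.x + p.y;
    }
    return total;
}

// Both variants must return the same result; only the generated code may differ.
void check_variants_agree(const std::vector<int>& values) {
    assert(sum_with_instance_outside(values) == sum_with_instance_inside(values));
}
```

Pasting both functions into Compiler Explorer and comparing the output at `-O0` and `-O2` is one way to see whether a given compiler treats them the same.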
@Dygear
@Dygear 2 months ago
No, they are putting a racing slick on it tho.
@compu85
@compu85 2 months ago
Great interview!! Thanks for posting it!
@MichaelKahle
@MichaelKahle 2 months ago
Nice interview. I'd be interested to know WHAT parts of the Oxide stack are they not able to get support and code from the vendor. What incentive does a hardware manufacturer have to keep their drivers closed source?
@THB192
@THB192 2 months ago
On their podcast they've said they don't own the PSP, and they've said they also don't own the firmware on some of the power supply parts. Probably some other stuff, too. I'm pretty sure there is no open FTL layer on any SSDs, so that's probably also closed off.
@jasonleschnik9780
@jasonleschnik9780 3 months ago
Years of abstraction and poorly designed firmware to "get the job done" have accumulated an enormous amount of technical debt. This feels like a really refreshing approach, "Clean slate Cloud" with a hint of Bryan wanting a similar visibility in Hardware that DTrace afforded software. Bravo.
@24playermaker
@24playermaker 3 months ago
Kay seems to be a very smart guy, but lacks understanding of the full ASIC product life cycle. Throughout the conversation, he made statements that were ambiguous or flat-out wrong. For instance, claiming that the reason XXX company's chip was delayed was the escape of a bug... While that may be true or false, there's always more to a chip delay, ranging from poor initial planning and last-minute features to process technology yield issues, etc.
@24playermaker
@24playermaker 3 months ago
I would have to disagree with the statement made about verification. Most of the work is in verification, hence most people end up doing it, no matter if they were top of their class. In fact, I know many highly smart individuals who have only worked in verification.
@newtonchutney
@newtonchutney 3 months ago
Dan, what IEMs do you use? It kinda looks like your spects' arms reach into your ears.. 😅 Anyhow, please do continue making clips, I really wish to listen to your podcasts, but I end up not having time.. 🥲 Awsm job!
@angeldude101
@angeldude101 4 months ago
The biggest issue that I see is that people rely on this crutch _so much_ that an immense amount of effort and money has been put in to making it as fast as possible, to the point that it can often be faster than the much simpler integer arithmetic.
@adama7752
@adama7752 4 months ago
1:45:00 Web Assembly could catch up to hardware of 20 years ago (Thread/MultiCPU).
@alecsei393ify
@alecsei393ify 4 months ago
Very good content!
@KabelkowyJoe
@KabelkowyJoe 4 months ago
Tried to listen to all of it; he was so annoying. T H, who was involved in making Itanium (I can't even mention his name, since YT's AI removes any comment because he debunks the climate scam), got me interested in Itanium. This guy doesn't even understand this small part. Such an unlucky architecture; it was, just like Transmeta, real VLIW, not fake like theirs.
@borealis8uno
@borealis8uno 4 months ago
If RAM speed is the bottleneck, the Z80 wins. The Amstrad/Schneider CPC is an example: if designed with the MOS 6502, it would have run it at 1 MHz, while with the Z80 it was able to run at 4 MHz (~3.3 effective, but still somewhat more computing power compared to the MOS)...
@BGBTech
@BGBTech 4 months ago
For testing my hobby CPU core, I have mostly been using Verilator to good effect (and only occasionally get around to running on an actual FPGA), but thus far it mostly works OK. Though, I also make fairly heavy use of an instruction set level emulator, as it can run programs at roughly the same speeds they would run on the FPGA, whereas Verilator is several orders of magnitude slower. But, I can also note that I don't use any 3rd party logic or modules (and pretty much no budget; as my CPU/ISA is a single-person hobby project), ... I don't think I have done too badly though (and it can pass the "does it run Doom?" test at least). Nevermind the seemingly never ending battle with deficiencies and bugs on the software side of things (the "OS" I am running on it still falls well short of a real OS; and porting another more mainstream OS to my custom ISA seems like a fairly big effort; as I am also using a non-mainstream C compiler, with binaries based on a modified PE/COFF, etc...).
@johnjakson444
@johnjakson444 4 months ago
I was in need of a Verilog cycle simulator some 30 years ago. I was writing Verilog-style code in C and it was a horrible experience; I even had a crude C-to-Verilog converter for FPGA synthesis. I took a few weeks off to create a cross compiler from Verilog to C (V2C) and within a short time had a decently useful tool that could handle single-clock-domain RTL code. There was a rule that had to be followed: every Verilog module had to have all outputs registered, and all assign statements had to be in time/event order as if it were C code. If those constraints are adhered to, then 1000s of lines of Verilog could be exploded into 10,000s of lines of C code, with the global registers at the end as C global ints. The only snag I hit was that the Visual C++ compiler could not compile a function bigger than maybe 500 LOC, so I had to add a partitioner to break the 10,000 LOC single function into 100s of functions of about 100 LOC each. All the states of the chip's modules become an int array of maybe 100,000 nets. The final performance was that every Verilog assignment would be replaced by a similar C-style assignment, with the tool taking care of building the rat's nest. It's a shame I did not maintain the code and keep it up to date. I have been meaning to go back to it, but I don't do ASIC/FPGA design anymore. Such a tool could be useful in allowing C++ code to be written as a mix of C procedural code with HDL/RTL blocks in C form that can be guaranteed to perform properly as if they were hardware. I even envisage a GUI frontend written as hardware with RTL widgets.
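The translation rule described above (assigns evaluated in order, registered outputs updated at the end of the cycle, all state held in an int array of nets) can be sketched roughly as follows. This is not the V2C tool's actual output, which is not shown in the comment; the 4-bit counter design and every name here (`Net`, `step_cycle`, `run_cycles`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical net names for a 4-bit counter with an enable input.
enum Net : std::size_t { kEnable, kCountD, kCountQ, kNetCount };

// All design state lives in one flat int array of nets.
using Nets = std::array<int, kNetCount>;

// One clock cycle of the translated design.
void step_cycle(Nets& nets) {
    // Combinational assign, evaluated in dependency order like plain C code.
    nets[kCountD] = nets[kEnable] ? ((nets[kCountQ] + 1) & 0xF) : nets[kCountQ];
    // Registered output, updated at the end of the cycle (the clock edge).
    nets[kCountQ] = nets[kCountD];
}

// Runs the design for a number of cycles and returns the registered count.
int run_cycles(Nets& nets, int cycles) {
    assert(cycles >= 0);
    for (int i = 0; i < cycles; ++i) {
        step_cycle(nets);
    }
    return nets[kCountQ];
}
```

Because every output is registered and assigns are already ordered, a single pass per cycle is enough; no event queue is needed, which is where a cycle simulator of this kind would get its speed.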
@womp6338
@womp6338 4 months ago
They don’t
@kayakMike1000
@kayakMike1000 4 months ago
x86... Sadly you can't really know because of the management engine. Arm has Trustzone, which is only as good as you trust Arm or the vendor.
@Zaniahiononzenbei
@Zaniahiononzenbei 4 months ago
It's so regrettable that we haven't advanced much from 1970s software. Backwards compatibility is great, but jesus, the structuring of PowerShell/NuShell is so much better.
@andrewgrant788
@andrewgrant788 4 months ago
The BBC Micro used a 6502, the ZX Spectrum used a Z80. Clearly the 6502 must be better.
@HarshKapadia
@HarshKapadia 4 months ago
Interesting! Thank you!
@franciscotoro827
@franciscotoro827 4 months ago
I have a question I hope someone can shed some light on for me. These 2 processors were developed in the 70s and used throughout the 80s, even into the 90s, and are even in use today for some things. Question 1: was the 6502 made in the 70s exactly the same as the ones made in 1988? A modded 6502 was used in the 2600, but then the standard 6502 was in the 400 and 800, the 5200 and the Lynx, and it was used in all the early Commodores and the Apple 1, 2, 3... My question: were processors super simple in their function back then, or so capable that the rest of the computer (RAM, co-processors, video components) had to catch up with them? If you look at games from 1980 on home PCs and systems compared to a game from 1989, the difference is almost as big as the jump from Wolfenstein to Halo, but now imagine I told you both games were run on the same CPU... I think you would have questions.
@BobBeatski71
@BobBeatski71 4 months ago
Hmmm, Java byte code to MC68000 assembly... brb....
@jk55.
@jk55. 4 months ago
👍
@testolog
@testolog 4 months ago
Anyway, China will destroy half the internet in the world when they invade Taiwan, because a lot of telecommunication companies just buy electronics made in China.
@josephlunderville3195
@josephlunderville3195 4 months ago
Love the slow pace and the combination of history and expertise here. I don't know if what you're doing here will be broadly popular but it's deeply meaningful to me as a technologist who loves knowing more about how the tools I work with came to be, and I bet it's important to a swathe of historians in the future too.
@capability-snob
@capability-snob 4 months ago
2:01:43 that's The Mill. I'm not entirely on board with Matt's point here. It's less about trusting compilers, and more about putting the power in the hand of the developer - whether that's someone writing a compiler, a language runtime, or hand cobbling assembler. It sure would be nice to have better tools for indirect branches in ia64, but it's hard to argue with advance and speculative loads, and being able to branch on whether they are successful. No spectres here.
@Calilasseia
@Calilasseia 4 months ago
Ah, so the Z80 only had a 4-bit ALU? That explains how they were able to implement a half carry bit for the DAA instruction in the flags register! And solves neatly a problem for emulator writers too. Wish I'd known this years ago!
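For emulator writers, the half carry and the decimal adjust can be sketched as below. This is a simplified model, assuming only the addition case (N flag clear) and tracking only the carry and half-carry flags; the struct and function names are hypothetical, and it says nothing about how the silicon computes the flag internally.

```cpp
#include <cassert>
#include <cstdint>

// Result of an 8-bit ALU operation with the two flags modelled here.
struct AluResult {
    std::uint8_t value;
    bool carry;       // carry out of bit 7
    bool half_carry;  // carry out of bit 3 (low nibble into high nibble)
};

// Binary 8-bit addition that also records the half carry.
AluResult add8(std::uint8_t a, std::uint8_t b) {
    const unsigned sum = static_cast<unsigned>(a) + b;
    const bool half = ((a & 0x0F) + (b & 0x0F)) > 0x0F;
    return {static_cast<std::uint8_t>(sum & 0xFF), sum > 0xFF, half};
}

// Decimal adjust after an addition: fix up each nibble that left the 0-9 range.
AluResult daa_after_add(AluResult r) {
    unsigned correction = 0;
    bool carry = r.carry;
    if (r.half_carry || (r.value & 0x0F) > 9) {
        correction |= 0x06;
    }
    if (r.carry || r.value > 0x99) {
        correction |= 0x60;
        carry = true;
    }
    const auto adjusted = static_cast<std::uint8_t>((r.value + correction) & 0xFF);
    // half_carry is passed through unchanged; the real flag update after DAA is not modelled.
    return {adjusted, carry, r.half_carry};
}

// Adds two packed-BCD bytes (two decimal digits each).
AluResult bcd_add(std::uint8_t a, std::uint8_t b) {
    assert((a & 0x0F) <= 9 && (a >> 4) <= 9);
    assert((b & 0x0F) <= 9 && (b >> 4) <= 9);
    return daa_after_add(add8(a, b));
}
```

For example, `bcd_add(0x19, 0x28)` goes through the binary sum `0x41` with the half carry set, and the adjust step adds `0x06` to reach `0x47`.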
@chaitanyakumar3809
@chaitanyakumar3809 4 months ago
Where should one look for documentation of these reverse engineering processes?
@johnjakson444
@johnjakson444 4 months ago
In 1979 I had the privilege of reverse engineering about a dozen processor chips: F8, 1802, 8080, 8085, Z80, 9900, 8086, 68000, Z8000, and others. By far my favourites were the 16/32-bit machines. For its MOS simplicity the 9900 was a marvel. It was really only a 1-bit serial CPU, so it was very economical with transistors, and at speed it could really only run at maybe 250 kips, taking at least 18 cycles for 16-bit register ops, but it had the grown-up architecture of the minicomputer it was derived from, the TI-990. I had programmed the 9900 at TI too, so I was biased. Later the 68000 would be my wife. As for the 6502 vs Z80: 2 very different beasts, neither of which I would want to write grown-up code for. Of course my real job at the time was to work on the design of a UK microprocessor for parallel programming. I was impressed with my BBC though, a decent piece of kit with loads of language compilers for it. I was much less impressed by that Sinclair QL POS with the crippled 68008 in it, but that was replaced with a Mac ASAP.
@imgod2u
@imgod2u 4 months ago
Itanium wasn't the only failure of VLIW for general computing. You all are old enough to remember Transmeta and their efforts and its spiritual and technical successor, nVidia's Project Denver. The argument of VLIW being capable of general compute works fine (even with the complexity of software memory management) if and only if you avoid variable latency. Which can be done if you can fit everything inside SRAM. As soon as you go to DRAM that entire argument gets thrown out. Especially if you go to DRAM and you're not the only processor trying to access DRAM. For less general compute workloads (DSP's, AI processors), sure, VLIW is great. Yet -- as almost every AI accelerator architect is learning -- there's still a lotta general compute that they need to stick a (usually small) CPU like a SiFive or even a not-so-small CPU (like Grace) because in between chained models (not to mention dynamism within models), you need some branchy code and real-time buffer management that simply can't be offline compiled.
@KabelkowyJoe
@KabelkowyJoe 4 months ago
0:45 Wait, what? Wasn't MacOS compiled for other platforms ever since NeXTSTEP in the 90s!? Wasn't it compiled for ARM already? Oh yeah, "quickly port their hardware performance analysis tools". Their hardware PERFORMANCE analysis TOOLS! Debuggers, profilers, because it wasn't cross compiled anymore.
@KabelkowyJoe
@KabelkowyJoe 4 months ago
1:45 PDF kinda thing
@KabelkowyJoe
@KabelkowyJoe 4 months ago
2:10 That is sign of lie
@ssl3546
@ssl3546 4 months ago
I wish the full videos would have the video, since you already recorded it and use it for the snippet videos you publish. It's way more interesting to watch a video when you can see people's faces.
@capability-snob
@capability-snob 4 months ago
The am29k is such a beautiful chip, outstanding you got Philip in!
@ssl3546
@ssl3546 5 months ago
it's so wild that a guy with under a thousand subscribers gets these interviews and does a good job
@haiphamle3582
@haiphamle3582 5 months ago
What an inspiring story! Godbolt did not become a verb for no reason. It helps people take a deeper look into what is happening under the hood.
@andrewdunbar828
@andrewdunbar828 5 months ago
I'm an old Z80 coder on the Speccy and only ever wrote the tiniest little bit of 6502 once on the Apple II and I always just believed the 6502 was faster, because everybody always says so. But then the CERBERUS came out about two years ago, with a 6502 and Z80 on it and all of their tests, including BBC Basic running on both, showed the Z80 was actually faster than the 6502.
@Calilasseia
@Calilasseia 4 months ago
The picture is a little more complicated. Memory copies on a Z80 are far faster, because after the register initialisation, you run a single instruction - LDIR. You have to write a software loop to achieve the same result on a 6502. But 6502 table indexing is faster, because the instructions to do so were native from the beginning, not bolt-ons post-8080. If you need to use IX or IY for your table indexing, that's slower than the 6502 equivalent. Tasks that take advantage of the Z80 being able to perform 16 bit arithmetic in one instruction, or take advantage of goodies such as LDIR, will always outpace other 8-bit CPUs, but interrupt handling is complicated on a Z80 (sometimes requiring support chips) and pushing 8 registers will always be slower than pushing 4. Also, any Z80 instruction requiring a prefix byte (DD/FD for IX/IY register use, CB for BIT instructions etc) will run more slowly than instructions on a CPU that doesn't use a prefix byte.
@andrewdunbar828
@andrewdunbar828 4 months ago
@@Calilasseia In Speccy game code nobody uses LDIR and friends in fast loops because they're famously slow. The fastest way to clear memory is to move the stack pointer to the end of the memory block you want to clear and push all the registers over and over. First game I disassembled that did this was 1985's Starion. There's a similar technique for copying. Nobody uses IX/IY in fast loops either.
@Calilasseia
@Calilasseia 4 months ago
@@andrewdunbar828 ... that approach won't work for COPYING memory blocks though, which is the use case I specified.
@andrewdunbar828
@andrewdunbar828 4 months ago
@@Calilasseia The approach does work and is used. KZbin deletes comments with links but Google for "How To Write ZX Spectrum Games - Chapter 13" "Double Buffering". Another source if you Google "Chasing the raster on the ZX Spectrum in Sidewize". There's probably a bunch out there.
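The disagreement in this thread comes down to cycle counts, which can be roughly estimated. The sketch below assumes the nominal Z80 timings (LDIR: 21 T-states per byte, 16 for the last one; PUSH: 11 T-states for 2 bytes; POP: 10 T-states for 2 bytes) and deliberately ignores Spectrum memory contention, loop overhead, and the cost of saving and restoring SP, so the real gap is smaller than these numbers suggest. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Nominal T-states for copying `bytes` bytes with a single LDIR.
std::uint64_t ldir_tstates(std::uint64_t bytes) {
    assert(bytes > 0);
    // 21 T-states per byte while BC != 0, 16 for the final byte.
    return (bytes - 1) * 21 + 16;
}

// Nominal T-states for filling `bytes` bytes with repeated PUSH instructions.
std::uint64_t push_fill_tstates(std::uint64_t bytes) {
    // Each PUSH writes 2 bytes in 11 T-states; odd sizes round up.
    return ((bytes + 1) / 2) * 11;
}

// Nominal T-states for copying via POP from the source and PUSH to the destination.
std::uint64_t pop_push_copy_tstates(std::uint64_t bytes) {
    // POP (10) plus PUSH (11) moves 2 bytes; SP switching between areas is not counted.
    return ((bytes + 1) / 2) * (10 + 11);
}
```

For a 6144-byte block (the size of the Spectrum's pixel area), these estimates give 129019 T-states for LDIR, 33792 for a PUSH fill, and 64512 for a POP/PUSH copy.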
@smoothemoveexlax
@smoothemoveexlax 5 months ago
Can we get support for other CPU platforms including ARM and RISC-V targets? That would be super useful.
@Evan490BC
@Evan490BC 4 months ago
There is support for both ARM and RISC-V, as far as I know.
@momoanddudu
@momoanddudu 5 months ago
Re predicting branches before they're decoded: the CPU does it based on the address at which the instruction is stored. That is, when the PC is the address of the branch, the branch predictor predicts where the branch instruction stored there, and seen in the past, will go. Possibly that memory block holds a DLL, and it was replaced by the time execution returned to the same address. That means that at decode time, the CPU has to handle the possibility that the address no longer contains a branch or, a bit trickier, that it contains a different branch. Usually the CPU can easily tell it's a different instruction, flush the pipe and branch predictor, and restart. If it happens to contain exactly the same branch instruction (as in binary memory content), which would behave differently due to the preceding code, I don't know if CPUs actually detect this or suffer a few branch misses until they learn the new behavior.
@haiphamle3582
@haiphamle3582 5 months ago
For the case having the same address, that should be when a context switch happens, and the same virtual address appears from another process, right? Maybe hardware will also flush the branch predictor upon a context switch. For the case where the branch behaves differently, based on previous conditions, CPU designers have many different strategies to cope with, for example, the CPU can store a few recent results and use them to select the final result. There are many other techniques described by Agner Fog (Also mentioned by Godbolt) here: www.agner.org/optimize/microarchitecture.pdf
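The address-only lookup discussed in this exchange can be illustrated with a textbook bimodal predictor: a table of 2-bit saturating counters indexed by PC bits. This is a minimal sketch, not a description of any particular CPU; the table size, the index function, and all names are assumptions. It shows why two different branches whose addresses map to the same entry share (and disturb) each other's history.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Textbook bimodal predictor: one 2-bit saturating counter per table entry.
class BimodalPredictor {
public:
    // Predict taken when the counter is in one of the two "taken" states (2 or 3).
    bool predict(std::uint64_t pc) const { return table_[index(pc)] >= 2; }

    // Move the counter towards the actual outcome, saturating at 0 and 3.
    void update(std::uint64_t pc, bool taken) {
        std::uint8_t& counter = table_[index(pc)];
        if (taken) {
            if (counter < 3) ++counter;
        } else {
            if (counter > 0) --counter;
        }
    }

private:
    static constexpr std::size_t kEntries = 1024;  // assumed table size

    // Only address bits are used; the predictor never sees the instruction itself.
    static std::size_t index(std::uint64_t pc) {
        return static_cast<std::size_t>((pc >> 2) % kEntries);
    }

    std::array<std::uint8_t, kEntries> table_{};  // all counters start at 0
};

// Applies the same outcome to one branch address several times.
void train(BimodalPredictor& predictor, std::uint64_t pc, bool taken, int times) {
    assert(times >= 0);
    for (int i = 0; i < times; ++i) {
        predictor.update(pc, taken);
    }
}
```

With this index function, addresses `0x1000` and `0x2000` land in the same entry, so training one changes the prediction for the other, whatever instruction actually lives there.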
@AnnatarTheMaia
@AnnatarTheMaia 5 months ago
I dream of building an OpenSPARC T2 server someday, with the first prototype implemented in FPGA.
@AnnatarTheMaia
@AnnatarTheMaia 5 months ago
SPARC is still one of my favorite architectures, and I still do a lot of porting of modern open source software to SPARC in 2024.
@andrewdunbar828
@andrewdunbar828 5 months ago
through which they went by
@andrewdunbar828
@andrewdunbar828 5 months ago
"Sine Nordic Country" = Denmark
@insu_na
@insu_na 5 months ago
I bet he must hate `consteval`, people not just compiling their code on his AWS instances but also letting it compute stuff 😂
@surters
@surters 5 months ago
Yeah, it's quite wild that it's actually the instruction fetcher that must guess where to go, as the decode is 6-7 cycles down and the retirement is maybe 200 cycles down... all this through the power of XOR and saturating counters taking a vote on which saturating counter to use.
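The "XOR and saturating counters" part can be sketched with the textbook gshare scheme: the PC is XORed with a global history of recent outcomes to pick a 2-bit counter. This is an assumption-laden illustration, not how any specific core does it; the history length and all names are hypothetical, and the chooser that votes between several predictors is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Textbook gshare: index = (PC bits) XOR (global branch history).
class GsharePredictor {
public:
    bool predict(std::uint64_t pc) const { return table_[index(pc)] >= 2; }

    void update(std::uint64_t pc, bool taken) {
        std::uint8_t& counter = table_[index(pc)];
        if (taken) {
            if (counter < 3) ++counter;
        } else {
            if (counter > 0) --counter;
        }
        // Shift the outcome into the global history register.
        history_ = ((history_ << 1) | (taken ? 1u : 0u)) & kMask;
    }

private:
    static constexpr unsigned kBits = 10;  // assumed history and index width
    static constexpr std::uint64_t kMask = (std::uint64_t{1} << kBits) - 1;

    std::size_t index(std::uint64_t pc) const {
        return static_cast<std::size_t>(((pc >> 2) ^ history_) & kMask);
    }

    std::array<std::uint8_t, std::size_t{1} << kBits> table_{};  // counters start at 0
    std::uint64_t history_ = 0;
};

// Feeds a taken/not-taken alternating stream for one branch and counts mispredictions.
int mispredictions_on_alternating(GsharePredictor& predictor, std::uint64_t pc, int n) {
    assert(n >= 0);
    int misses = 0;
    for (int i = 0; i < n; ++i) {
        const bool taken = (i % 2 == 0);
        if (predictor.predict(pc) != taken) ++misses;
        predictor.update(pc, taken);
    }
    return misses;
}
```

Because the history differs before a taken and a not-taken outcome, the alternating pattern ends up using two separate counters, and after a warm-up the sketch predicts it without misses, something a purely PC-indexed counter cannot do.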
@oidpolar6302
@oidpolar6302 5 months ago
So, "Parallela" became "Tilera"?
@jamesphilemon8010
@jamesphilemon8010 5 months ago
It's so good to hear Australian and English technologists telling their stories in entertaining ways as only they can.
@walterpark8824
@walterpark8824 5 months ago
And thank you so much for introducing me to Agner Fog's work.
@AnnatarTheMaia
@AnnatarTheMaia 5 months ago
Will you add Sun Studio compilers too? (If it must be, there are Sun Studio compilers for GNU/Linux.)
@MattGodbolt
@MattGodbolt 5 months ago
We'll add pretty much anything that installs easily and folks submit a PR for. There's two PRs usually required; one to add the installation to our infra repo and then another to configure it (if it's simple and looks like clang/gcc). If it requires more work there are per compiler customisation points. Google for "how to add a compiler to compiler explorer" if you're interested 🎉
@SueDoeNym-b4d
@SueDoeNym-b4d 5 months ago
Great talk 🦜
@Heckatomba
@Heckatomba 5 months ago
The name mentioned at 1:45 is Agner Fog.