"We're not gonna run this on the iMac, we're not gonna run it on the Pi, we're gonna run it on the Atari behind me" I love this guy so much.
@TheSulross4 жыл бұрын
Nick1wasd, he's taking CPU caching out of the equation, but on modern architectures CPU-cache-friendly data structures can make an enormous difference in performance, and arrays can be designed to be more CPU cache friendly
@radzdahound30994 жыл бұрын
TheSulross You probably already know this by now, but he goes into the major speed differences due to the CPU cache later in the video
@farhanyousaf56163 жыл бұрын
Let’s run it on the Amiga and C64 and 800xl
@monsterhunter4452 жыл бұрын
@@TheSulross isn't that the point of why you use arrays more than linked lists, just the fact that they're CPU cache friendly?
@Triantalex6 күн бұрын
ok?
@flavioalarcon55336 жыл бұрын
I learned Java programming at an institute. There I basically learned the basics of how to program, develop websites and things like that, but they never even began to explain how things actually work. It still worries me how little I actually know about computers, since all I know is basically what years of progress in the computer world have achieved, and I'm just using the tools they gave me without questioning them. Videos like this are very important for someone like me. I appreciate it.
@abdulhfhd Жыл бұрын
they didn't teach you data structures?
@jaydavis6357 Жыл бұрын
@@abdulhfhd In my opinion this doesn't really have too much to do with data structures; it's more about highlighting compilers and CPUs
@abdulhfhd Жыл бұрын
@@jaydavis6357 you are right
@Nerdnumberone7 жыл бұрын
The more important difference is how the performance scales for different operations and what you actually plan to use the data structure for. Both of these were O(n) operations, but selecting an arbitrary element is O(1) in an array and O(n) in a linked list. However, dynamically increasing the size of an array is O(n) (since you have to make a new array and copy the data), while a linked list can add another item to the front in O(1). Adding or removing items next to a node you're currently looking at is O(1) for linked lists and O(n) for arrays (though it would be O(n) for linked lists too if you had to walk to the element you want to change). They are different tools for different things. If you're only using operations where the two structures are comparable (like the summation mentioned in the video) and speed is particularly important, then such a test makes sense, but what you use the data structure for is likely to be far more important.
@ChrisHinton427 жыл бұрын
I agree with your point about choosing the right tool for the job, but insertion into a dynamically resizing array is amortized θ(1) if you change the size geometrically.
@mastablation7037 жыл бұрын
Chris Hinton I think Nerdnumberone was referring to inserts into the middle or beginning of the array, which is O(n).
@ivoivanov74077 жыл бұрын
"Amortized" means allocate once, expand if needed, use many times. Yes, this is the case when array has to be used. But in case you have to do many insert/delete operations, there is no amortization.
@Nerdnumberone7 жыл бұрын
Having worked mostly on web since school, I jumped to the absolute stupidest way to resize an array: making a new array that is one element larger and copying the data into it, assuming that any array is necessarily a contiguous block of memory. Sorry, I haven't dug into the low-level implementation of data structures since college, where an array was the contiguous block and anything else was either a different data structure or a hand-waved black box. That naive approach also assumes the programmer failed to allocate enough memory for their application in the first place. If I'm skimming this Wikipedia article correctly, dynamic arrays just allocate more memory than they currently need, just in case. You still pay the copying cost every time you hit that limit, but if you double the allocation each time you hit your cap, it should be rare enough to be negligible.
@hellterminator7 жыл бұрын
+Ivo Ivanov That is not what amortized means at all. Amortized θ(1) means that *on average* the time per operation is constant. Yes, whenever you have to reallocate the array, that particular insertion will take θ(n) time, but because you're increasing the size of the array geometrically, the interval between such reallocations grows in proportion to the array's size, so *on average* every insertion is performed in constant time.
@AlexandriaRohn7 жыл бұрын
05:22 We're going to visit every single element (125k) in an array & linked list and add up all their values.
11:18 Running the linked list version on the Atari takes ~166 clock ticks.
13:07 Running the array version on the Atari takes ~179 clock ticks.
17:55 On the Atari, traversing the array backwards takes ~114 clock ticks.
18:20 Why? The instructions the Atari machine can execute favour walking backwards.
21:00 Summary of results for Atari, Raspberry Pi, and iMac. On the Raspberry Pi and the iMac, the array is much faster than the linked list.
22:15 What's happening? Caching. The Atari doesn't have a cache: every bit of data has to be fetched from memory each time, but the CPU isn't much faster than the memory. On the iMac and the Pi, the CPU is much faster and has to wait for things to be fetched from memory.
@gwenynorisu68836 жыл бұрын
Almost, but you haven't properly annotated *why* the linked list is marginally faster than plain forwards walking on the ST (your next address is prebaked in the node and can be loaded directly, so you don't have to depend on the CPU's own, slightly slower address calculation), nor why the cache speeds things up so dramatically for the plain array in both directions vs the linked list on the others (reading an uncached memory address actually pulls a whole block of data surrounding that one location into the cache; if you're simply traversing a large chunk of memory either wholly linearly or with very small jumps for each step, you could potentially get a few dozen extra cache hits after the initial miss before crossing the next block boundary, massively accelerating the processing speed; the linked list, if it doesn't allocate each new value close to the previous one but instead at an essentially random location, cannot benefit from that speedup).

Also, going backwards on a 68k-series chip is markedly faster because of one particular instruction group: Decrement and Branch on condition (DBcc), typically used as decrement-and-branch-until-zero. It will spin rapidly through a loop that does some particular task, finishing only when the content of a particular register hits zero (and sets its easily testable zero flag). There's no particularly easy or fast equivalent for counting upwards; instead you have to repeatedly test whether your loop-count register is equal to some particular arbitrary value, and simply supplying *that* value and doing the more complicated equal/not-equal test takes a nontrivial amount of extra processor (and possibly memory bus) time. You can somewhat split the difference by using a decrementing counter to control the loop while either incrementing a separate counter to drive the forwards motion, or setting a register to minus the decrementing one, but that only shaves the penny and will still be quite a bit slower than just running through the array backwards.
The memory in those older machines has no burst output mode (which typically only works in incrementing-address mode) or even anything much like FPM or EDO (which allow reasonably fast, zero-page-esque random addressing within a single block of memory), so you get no compensating speed advantage in terms of reduced memory latency; wholly random access is exactly as fast as wholly linear forward stepping. Which in a way is also why the linked list is faster, besides the lack of cache. That particular instruction is also why, if the code were better optimised for later 68k-series chips, the program would show rather curious results on the Falcon (I wouldn't want to try to predict them, in fact) or any other similarly engineered 68030 machine, and show an even stronger bias towards backwards array walking on an ST upgraded to a 68010 (or maybe an 020, which has a rather poorer cache than the 030): the very tight "Loop Mode" introduced with the 010 is literally engineered for doing nothing more than blasting through very simple memory-walking routines bounded by exit conditions such as decrement-to-zero. For the most applicable uses, optimising code to use that mode can give a 2 to 3 times speedup vs the more explicitly written-out plain 68000 version, and in mixed-instruction code that can occasionally make use of it you still get acceleration measured in dozens of percent. Essentially, if you already thought 68k code was strange because you're more used to how x86 does things, Loop Mode just makes it even weirder... the 010 and later processors are pretty much tuned, in cases where burst mode or caches are unavailable or no use, for skimming through memory, very quickly, _backwards._
@blinded65024 жыл бұрын
I wish people didn't create such bloated videos in the first place. Thanks for the condensed info!
@MrDeeb004 жыл бұрын
thx, nice summary
@aidangarvey70493 жыл бұрын
@@gwenynorisu6883 as someone who's programmed for the 68k a bit, I'm curious, how would a data register being decremented by DBRA speed up array access when walking backwards? Wouldn't you have to increase or decrease a value in an address register in every iteration of the loop? How does the value in the data register get used to calculate the address of an array element? Off the top of my head I can't really think of anything that would be faster than just adding a constant value (i.e. the size in bytes of one element) to an address register every iteration.
@Triantalex6 күн бұрын
??
@gormster7 жыл бұрын
"Confusing variable names, but hey, I'm a C programmer" 😂
@zyaicob4 жыл бұрын
As a student learning C I can confirm that variable names are either long and confusing or short and confusing
@anuraghooda84394 жыл бұрын
@MLU8811 Same
4 жыл бұрын
hell0
@pedronori89993 жыл бұрын
@MLU8811 Did exactly the same AND like the video
@kahnfatman3 жыл бұрын
Say that to the Assembly programmer, they don't even have VARIABLEs.
@Nathan57917 жыл бұрын
This stuff is worth gold. Not many modern computer science degrees teach this, because modern CPUs have caches and compilers have optimizations, which means there is virtually no difference in speed. So people have started assuming that everything runs just as quick. Scale the CPU down and you see the true answer. Very well put video! Worth the 30 minutes.
@i.donthaveanameanymore7 жыл бұрын
Any "Computer Science degree" that doesn't teach linked lists is not computer science.
@Rififi507 жыл бұрын
What do you mean by "this"? Teach about caches? Teach about unrolling loops?
@BubbaYoga7 жыл бұрын
What practical difference is there if there is no difference? Academic?
@markallen977 жыл бұрын
There's effectively no difference in small problems, or most common teaching assignments (which are usually fast enough to be instant). Then you go out into the real world with huge datasets or untidy problems where the differences can be a LOT more noticeable, especially working in games, where performance can be critical and you're sometimes working with very limited hardware. This is one of the things I always try to drum into new graduates.
@BubbaYoga7 жыл бұрын
Thanks, I understand what you're saying but I guess I was just hung up on the word practical. As in, in practice. If there is a difference, there is a difference.
@stephenelliott70714 жыл бұрын
Steve is by far the clearest when it comes to explaining anything Computerphile. Clear, Concise, C programmer - all the C's! :)
@DeusExAstra7 жыл бұрын
Between caches and prefetchers, on modern hardware arrays are almost always better than linked lists and many other data structures. Especially in things like game engine architecture, putting items in arrays rather than linked lists of randomly allocated elements (i.e. items scattered all over memory) can yield very significant speed benefits.
@D4no002 жыл бұрын
that is true, however this is mostly because Intel has made its backdoor optimizations; I can bet that the performance on ARM or AMD is much different. The problem with this is that at the application level you have no idea what is going on and you can't optimize for it. It is just a question of time until this falls off, just like DirectX compared to Vulkan.
@dale116dot72 жыл бұрын
@@D4no00 The same performance hit happens in PowerPC architectures, and for that matter, any processor with a data cache. Also, arrays are safer than pointers stored in memory. Some safety critical coding standards prohibit stored pointers.
@JulianColeman037 жыл бұрын
videos like this make me really appreciate C as a language
@PauloConstantino1673 жыл бұрын
C is the ultimate language.
@soraaoixxthebluesky5 жыл бұрын
“...which means I need a very high tech piece of equipment” *showed us a floppy disk*
@MasterNeiXD7 жыл бұрын
Second term computer science students, rejoice. Computerphile has come to save you.
@KevinFu51007 жыл бұрын
i swear, this channel has saved me many times.
@wilsenhc7 жыл бұрын
Saved me when studying recursion
@Stigsnake57 жыл бұрын
I've already graduated but better late than never, although I'm doubtful it'll be useful in my job
@knightshousegames7 жыл бұрын
I remember when Computerphile was brand new, I was in my Data Structures and Algorithms class, and doing horribly. I think I would have done a lot better if they had started this channel about a year sooner. I did tell my professor about this channel on one of the last days of class, so I never found out if he ever used it in any of his lectures or not.
@Ultraice987 жыл бұрын
Damn son. Saved me again.
@TheJaguar19836 жыл бұрын
First impression. Depends on the operation. I tend to use arrays when I need contiguous memory, or I have a known length, and if I need random access. I use linked lists when building lists with unknown length and where random access is not required.
@philippbeer3 жыл бұрын
A fantastic video. It's set up great. And the fact you do "myth busting" and teach people how to test a question like this is simply awesome.
@MorRobots7 жыл бұрын
This will completely be a function of use case and implementation. I'd love to see what you all set up for this test.
@WeatherWonders7 жыл бұрын
He's talking about complex computer stuff and I'm just intrigued by the little ASCII spinner at 9:37. Edit: Haha, they talk about it later on. :D
@taragnor5 жыл бұрын
lol same here. The first thing I wondered was "I wonder how many valuable CPU cycles are being wasted animating that."
@rustycherkas82293 жыл бұрын
Used that spinner on VDTs, back in the day... { putchar('\b'), putchar( "\\|/-"[ a++ & 03 ] ); } His was displayed inside "()", so I suspect some kind of "printf()" that certainly wouldn't help speed things along during initialisation (on the Atari)... Not sure why he's generating random numbers to initialise all those 'p's... Any (smallish) number would do just as well: struct.p = loopcounter; Having written lots of code for diverse tasks, imo, each approach has its use and purpose. Arrays when their size is not prone to fluctuation, single or double linked lists when there's lots of inserting/deleting to be done. There's no one "optimal" scheme that is best in all circumstances. (A recipe book has a fixed number of recipes, but each recipe has a variable number of ingredients... Thankfully 'C' has dynamic heap allocation and pointers...)
@rustycherkas82293 жыл бұрын
@S B C's "volatile" keyword makes optimisation a little less secure about its assumptions of what stays and what can be replaced. Scary will be the time when optimisation replaces some code's explicit linked list with array operations, or vice versa, and coders come to rely on that. The demo would work just as well with recursive function calls for the linked list example using the stack as the 'chain' of object instances.
@subsystemd3 жыл бұрын
Wow. That has to be one of the best videos on the topic of computer hardware differences and how we as programmers should not make any assumptions about how a machine is supposed to function when executing code. I love the "I'm writing in a high level language" when actually it's C, one of the least abstract languages you can use without resorting to assembly. Just shows you how much this man knows his stuff.
@SHUBHAMGI7 жыл бұрын
a + b = b + a you can watch Numberphile for that 😂😂 LoL
@ruben3077 жыл бұрын
more like LoL are three letters => Half life 3 confirmed
@MazelTovCocktail7 жыл бұрын
SABER - FICTIONAL CHARACTER Not in computer science. ab does not equal ba.
@gdogawesome7 жыл бұрын
seems legit
@vadimuha7 жыл бұрын
But you forget that we can extend list easily but if we want to add 1 extra element to an array we must create new one
@panda42477 жыл бұрын
wait, I thought there was a conjecture that a+b=c, so is that not valid?
@DanRos2Coder Жыл бұрын
I stumbled across this video this evening, some 5 years after it was published. Brilliant work, mate! I'll be searching to watch more of your videos right now!
@nab-rk4ob7 жыл бұрын
Great info about the differences in the way the chips and the compilers handle the same simple code. Stuff like that fascinates me. Thanks.
@vavassor7 жыл бұрын
Also, depending on the fragmentation of the allocator from which the linked list nodes are taken, on a cached CPU a list can be much, much slower than an array, which always has its elements on the same cache lines relative to one another. And then there's a funny case: supposing you have an allocator which allocates things linearly and you allocate the whole linked list at once, it basically gives you an array with a bunch of (useless) pointers to next nodes between the elements. This will perform not quite as fast as the array, because it won't fit as many elements per cache line due to the space wasted by the pointers, BUT it'll be much closer to the array than before. Overall, the speed of operating on a full linked list on a cached CPU will vary depending on where the nodes were put in memory relative to one another. Another thing to note is that, in my work anyhow, you're very unlikely to see a list as big as 125k elements, so the effect of the cache on full array loops is increased even more.
@RakeshPGopal7 жыл бұрын
Ironically I see this video next on the auto-play list "Bjarne Stroustrup: Why you should avoid Linked Lists"
@tanveerhasan23825 жыл бұрын
why indeed
@notthedroidsyourelookingfo40263 жыл бұрын
@@tanveerhasan2382 Watch it, then you'll know. But generally: because growing an array geometrically reduces the time you need for expanding/shrinking it, and because insertions/deletions, which are a linked list's strength, still require traversing it in O(n). From my own and limited experience, I've not yet come across a case where anything was actually faster than an array. In one case a hash map came close (might even have been the same speed, and was definitely more convenient because I had key/value pairs), but I ended up ditching it and using a sorted std::vector, because I needed the simplicity for MPI compatibility.
@dragonore20096 жыл бұрын
8:11 "p inside p, were using p inside p, confusing variable names, but then I'm a C programmer" LOL.
@Spacecookie-7 жыл бұрын
Thank you for clearing that up. I came from programming in the end of the 1980's. I didn't get the linked lists thing and why arrays weren't used. People were talking about linked lists instead.
@RobBCactive3 жыл бұрын
A lot of it was about allocating just the memory you used, then recycling it, which favoured dynamic approaches storing structs/records together. Traditional arrays required sizing and static allocation, which put arbitrary limits on things. You didn't want to allocate a massive array on start-up for some feature that wasn't used, or that would restrict what the system could do. The languages used weren't necessarily as flexible as C, which limited options for long-running programs. Allocate at the beginning, process, and terminate worked fine for short-lived programs running jobs on multi-user systems. As users began running larger GUI programs, static pre-allocation became less feasible.
@bus7 жыл бұрын
Good call on running it on a simple machine. More entertaining and more consistency, win win!
@jabersyed9787 жыл бұрын
Would love to see Dr. Jason Atkin next time. Had him for my C++ module. He is amazing. Dr. Bagley is awesome as well.
@UntouchedWagons7 жыл бұрын
gcc can unroll loops too, it's not a clang-specific feature.
@hanneshauptmann7 жыл бұрын
Every decent compiler can.
@recklessroges7 жыл бұрын
for(p=0; p
@naikrovek7 жыл бұрын
He wasn't talking about unrolling a loop, he was talking about SIMD.
@nicholaswallingford36137 жыл бұрын
When GCC switched to the GPLv3 license, Apple forked GCC. I believe the GCC version shipped on OSX is something like 4.2, mainline GCC is up to 7.1. This was in like 2006 or something. GCC has been significantly improved since then. So it's not unlikely that his version of GCC won't unroll and vectorize the code, even though real GCC will.
@alexandrupana60377 жыл бұрын
Reckless Roges Fails to compile. You smartass...
@ArjanvanVught7 жыл бұрын
When using the Raspberry Pi 3 there is the NEON co-processor. Compiling with -O3 could generate vectorized code. To be sure it is not, vectorization should be disabled. Vectorized code would increase the speed of array manipulation.
@cyberbadger7 жыл бұрын
It's always a good idea to have an understanding of what's going on under the hood.
@Madsy97 жыл бұрын
I can understand that the benchmark had to be trivial in order to run on the Atari ST, but I think simply summing integer elements in a loop gives a bad indication of the performance differences between linked lists and arrays/vectors on modern architectures. If the pointers in a linked list point to a larger data structure, it can absolutely thrash the cache and cause big performance hits. It is even worse if the elements in the linked list are scattered around in memory. With the three levels of cache we have now, cache misses incur a hefty penalty. Using arrays/vectors is almost always the right choice nowadays when performance is important, even if you don't do random access. The only exception I can think of is huge data sets with frequent insertions and removals.
@madeso7 жыл бұрын
Madsy9 Worth noting: if you don't care about the order of items in your array, both insert and remove operations are O(1)
@jellorelic7 жыл бұрын
The point wasn't to address a typical use case. The point was to illustrate that the fundamental architecture underlying your code can in fact cause things to behave in unexpected ways. Yes, most of the time X may be true... but *assuming* it's a universal case can cause problems.
@markallen977 жыл бұрын
It depends on your use case, I often end up in cases where an unsorted array and a bit of brute force is faster than a tree or hash based set when your data size is tiny (for example under 10) but you're doing a huge number of lookups.
@bigbenhebdomadarius62527 жыл бұрын
+Madsy9 Sorting is another area where linked lists can be faster, especially if the record size is large, since sorting a list only requires altering the pointers, not moving entire records around.
@SyntheToonz7 жыл бұрын
Possibly the compiler is doing something less than optimal when making the code for an array reference. While in the Air Force in the early 90s we had a problem writing a document paginator on a worst case 5MB report file. (Ultrix workstations). The program sat there and churned for over 30 seconds while the paginator routine determined the starting location of every page in 17 different page formats. Users found this waiting intolerable. I read an article then about how C compilers may generate a multiplication to reference array elements, and that traversing an array by pointer reference instead of subscripts would not include this extra math behind the scenes and should run faster. I tried this idea out on my Amiga 3000 and the paginator using pointers ran in 12 seconds for an equivalent 5MB text file. I was pretty sure the 25MHz 030 should not be a lot faster than the Ultrix boxes at work. The recoded pointer version of the paginator ran in 5 seconds on the ultrix workstations. Happy users.
@123123raoul7 жыл бұрын
Taking the median would give more accurate results. The OS scheduler might interrupt the program sometimes resulting in distorted results when taking the average.
@softwarelivre23892 жыл бұрын
True
@ResandOuies7 жыл бұрын
But in the end the array was faster on all the CPU with a cache, which should be all CPUs now. Right?
@jellorelic7 жыл бұрын
Generally, yes... across most use cases and particularly with modern cpu/compiler/memory architecture arrays will be faster than linked lists. But the point of the video is that it's not a fundamental given. Even in cases where you can math out the number of steps involved in the logic, your real-world systems that the code runs on can affect the actual results.
@bruterasta7 жыл бұрын
Even CPUs considered obsolete have caches. CPUs without one are old enough to consider them as antiques.
@rylaczero37407 жыл бұрын
Correct me if I am wrong: linked lists are slower but memory efficient, slower than arrays as they don't provide random access, but heaps can overcome all that stuff. Is there anything more to it in the video?
@n00dle_king7 жыл бұрын
Not all modern processors have caches. For instance, most Arduinos do not have a cache.
@kantdestroyer7 жыл бұрын
Linked lists are not memory efficient, as they require more data to be kept (every node stores its key and the addresses of the previous and next nodes). However, it is easier to add or remove nodes from the middle of the list. If you search for data structures on Wikipedia, I think they give both the memory efficiency and the time efficiency for different operations.
@TimJSwan4 жыл бұрын
"There's technically a cache but not really but essentially no but there's something like a prefetch"
@Chriva7 жыл бұрын
I'm certain Mr. Bagley already knows about this, but on the 68k specifically an array will ALWAYS be faster than a linked list when coding in assembly, since you can predecrement/postincrement the address pointer (as in, direction doesn't matter), ie:

lea.l "address of array", a0
clr.l d0
clr.l d1
move.l #1048576, d2 (number of iterations)
Loop:
move.b (a0)+, d1
add.l d1, d0
subq.l #1, d2
bne.b Loop

d0 will contain a checksum32. You could also add 1048576 to the address of the array and do "move.b -(a0), d1" inside the loop to read it in reverse. Speed should be more or less the same. With a linked list you also have to fetch a new address for every iteration, so there's no chance that is faster on these specifically :)
@Chaisz3r07 жыл бұрын
So in all fairness you'd need to enable loop unrolling in gcc (experimental) and compare again. That should significantly speed up the array-based RPi compilation.
@panda42477 жыл бұрын
why is backwards array faster than array on Atari? I'd say that it comes down to the line for (i=0, i
@Deveron47 жыл бұрын
It'd be great if Dr. Bagley was recorded in front of a darker background. The over exposed window background is rather distracting.
@MasterNeiXD7 жыл бұрын
Damodara Kovie Maybe it's part of who he is. I like to think it is intentional, after all they're computer scientists; they know how to use a damn DSLR.
@Deveron47 жыл бұрын
Bryan Keller Fair enough, but I'm appealing to Sean Riley, the videographer and editor.
@NicholasLeader05 жыл бұрын
Just needs some front lighting
@Triantalex6 күн бұрын
??
@OrbitalCookie3 жыл бұрын
If you need linked lists on a modern architecture, store them in arrays and use indices instead of pointers. That can be faster than a pointer-based linked list, but of course you still need to measure. The general approach for fast code is to put all required data relatively close together in memory. Allocating one big array chunk more or less ensures that (lol, there are OS caveats still), as long as the elements aren't pointers.
@lasersbee7 жыл бұрын
9:03 I thought I was the only one still using floppies for some applications... ;) 21:40 I would have liked to see the results on an Intel Windows computer.
@watashiwahatchi5 жыл бұрын
Loved this video! Thanks guys. Great takeaway: "if you're not sure, come up with some tests and collect the real data". This echoes Mike Acton's 'truths': 1. Hardware is the platform. 2. Design around the data.
@utl947 жыл бұрын
8:14 "Confusing variable names, but then, I'm a C-programmer." Haha!
@floatingpointbr4 жыл бұрын
Those computerphile nerds are so friendly. If all teachers were like them...
@chrissxMedia6 жыл бұрын
well... you should really re-run it with C++11 range-based for-loops, because those have a kinda big impact on the CPU performance of arrays
@saeedbaig42497 жыл бұрын
15:20 - "But if you think about it, if we started at the end and moved backwards, it would be exactly the same as starting from the beginning and moving forwards, so there's no point in testing it." Doesn't that seem like a rather unfair assumption to make? I mean, most people would have assumed looping backwards through an array would be "exactly the same" as looping forwards, but that turned out to be false. Why would you then assume, without even testing it, that going backwards through a list would be just the same?
@mido112k64 жыл бұрын
Man, I really wish I could switch my major right now... I've always had something for computers
@AJSquirrel533 жыл бұрын
Do it
@joevero45687 жыл бұрын
VSCode, I see you're a man of culture
@MisterFanwank7 жыл бұрын
Being a C programmer is no excuse for confusing or non-descriptive variable names. As much as possible variable names should explain the purpose of their data unambiguously. I recognize there are times when this is not easy, but with modern IDEs there is no justification for bad variable names. Don't ignore the power of your tools.
@abdullahnoman86186 жыл бұрын
He is a computer scientist not an industrial programmer
@LudwigvanBeethoven26 жыл бұрын
Ok so i will name it "Pointer_that_Points_To_Next_Element_In_The_Array_List"
@sangramjitchakraborty78456 жыл бұрын
♫♪Ludwig van Beethoven♪♫ Or you could just name it nextpointer#
@CYBERBEAST216 жыл бұрын
Butthurt.
@checka90296 жыл бұрын
He said "asleep" programmer, not a C programmer, so he had an excuse
@BeAManPodkast6 жыл бұрын
The most important thing that everyone should learn is this: BOTH ARE FAST ENOUGH! Use whatever method works for your application.
@tengelgeer7 жыл бұрын
8:15 "But then, I'm a C programmer." xD
@filker05 жыл бұрын
I'm 2 years too late, but I think there is one change to the linked list version that would speed it up on the cached processors. I assume the elements of the linked list are added at the beginning as they are generated. As the summing loop walks the list, the memory address of each successive element decreases, making it harder for the CPU to predict and do speculative prefetch. If newly allocated elements are appended instead, the general direction is toward higher addresses, which are easier for the prefetch mechanism to predict. I worked on a commercial C compiler for the Atari ST back in the 1980s, and benchmarks were important to the success of the product. Optimization strategies were weighted toward better performance on things like the Dhrystone, with the knowledge that there was no cache.
@DrSteveBagley5 жыл бұрын
Oooh, which C compiler was that?
@filker05 жыл бұрын
@@DrSteveBagley mwc (Mark Williams C) for the Atari ST.
@PEGuyMadison5 жыл бұрын
Actually... you may want to look at the index-calculation time for the array on the older machine. Some compilers didn't optimize for power-of-two element sizes and used a multiply for the index, which was pretty painful back in the day. Remember the ARM processor had a serial multiplier on some models, but a 5 or 6 clock multiply was also a possibility. People forget this today; typically it's 1 clock. IIRC the 80486 had a 3-clock integer multiply.
@xteuk7 жыл бұрын
And I thought I knew how it's working. This was a brilliant answer to this "trivial" question. I learned a lot, which is some-divided-by-zero from what I expected first 😀 Bravo et merci
@FalcoGer5 жыл бұрын
It's quite obvious. Linked lists are of complexity O(n) for lookup, insertion or deletion. Arrays are of complexity O(1) for the same thing. You take the starting address and add (n * sizeOfElem). Arrays are fixed size; linked lists are of variable size and more flexible. At the same time, memory allocation is very slow. If you put your array on the stack, allocation is again literally a single CPU instruction (adjusting ESP). Running over all the elements would still be faster in an array, since in a linked list any element might be anywhere in memory, and thus might not be in the cache in a sufficiently large list, while the stack is almost certainly in the cache. Also you need to dereference a pointer, i.e. read one memory address to fetch the address of the next element, which is 2 memory accesses, while the array access is again literally ESP + offset. Also, linked lists are larger in memory, since you have to store that extra bit of information.
@elkeospert91883 жыл бұрын
"It's quite obvious. Linked lists are of complexity O(n) for lookup, insertion or deletion. Arrays are of complexity O(1) for the same thing." No - arrays only have complexity O(1) for lookups. When it comes to inserts and deletions, on average half of the data in the array has to be moved/copied, so the complexity is O(N) for inserts/deletions in arrays. Arrays are (compared to linked lists) faster for random reads but slower at inserting or removing elements. They are also faster for sequential reads, where all elements starting from the first one are read, when optimized by a modern compiler. Normally you need the calculation startingAddress + index * sizeOfElement to access the element in the array with a given index. But if you have a loop which increases the index by one every cycle, a modern compiler can get rid of that multiplication by introducing a variable "offset" which is set to startingAddress before the loop and to which sizeOfElement is added after each cycle of the loop.
@aioli1217 жыл бұрын
So you're saying the node pointer is Mario, the node is a remote floating platform, the "next node" pointer is a chuckster, and that my neighbors have already reported me to the police for blaring the a cappella theme on max volume. Well, I must admit, I like what you have to say, sir. I like it a lot.
@romanemul17 жыл бұрын
A question which interests me most now: is the floppy HD format or DD?
@finkelmana6 жыл бұрын
"DD" with a hole drilled to make HD
@gwenynorisu68836 жыл бұрын
Why would you do that, as it's being used in a machine that, odds on, only has a double density drive? (I know HD was at least a later option for the MSTe, but STs in the main have 720k drives) As well as bothering with that trick when proper HD floppies remain cheap (heck, pretty much free these days) and plentiful, and have formulation that's a better match for the drive characteristics? PC drives including USB ones can still happily read and write 720k discs, btw.
@Clairvoire7 жыл бұрын
If the linked list stores its data like an array, it'll likely be about as fast. But in practice, linked lists see insertions, deletions and concatenations, all of which fragment the data and create cache-miss opportunities. Context matters in this case, which is why it's best to profile these things in the situation you need them in.
@SHUBHAMGI7 жыл бұрын
finally, something academic 😅
@HonsHon2 жыл бұрын
A lot of people in the comments do not know (or at least 4 years ago did not know) that you can insert linked list elements at the head in O(1) if order does not matter. It will reverse the order of the incoming elements, of course, but it gets rid of the need for an O(n) insert or for tracking the last element in the linked list.
@nbpierson7 жыл бұрын
Isn't the Atari array code running slow vs the linked list because the index-to-memory-address conversion uses multiplication? It's more related to C array syntax and the math that happens because of the index parameter. I expect this (multiplication) is a larger factor than memory access and the lack of caches on the old Motorola 68000 Atari system. Linked list code involves no multiplication. Did you look at the assembly code generated?
@josephcote61206 жыл бұрын
Did you notice that it was an array of objects, and not an array of primitives? Makes the array testing come out far, far worse than it otherwise would.
@trevinbeattie48886 жыл бұрын
Joseph Cote, This is C, not Java. They're called "structs". ;)
@josephcote61206 жыл бұрын
So what? My point still stands. Arrays of pointers will always be less efficient than an array of primitives.
@MrMrsirr6 жыл бұрын
Joseph Cote While I don't care about the whole "object" vs "struct" thing, it is important to note that, unlike in Java, data structures in C aren't referenced by pointers unless the code specifies that they are. In C (and other languages), any structure, regardless of its type, can be accessed either directly or through a pointer. In this case, the array was a contiguous sequence of structures, with no pointers involved other than the pointer to the array on the heap.
@josephcote61206 жыл бұрын
Let me make my point even more plain. Linked list. Pointer to the head. Each item has a data and the pointer to the next "thing". For each item in the list you have to get the address of the item either from the head pointer or the previous item. Memory access to get the data, memory access to get the next pointer. Perhaps even more processing to resolve the actual addresses within the "thing." Primitive data array: Using the base address of the array and the index, calculate the address to read (very fast all in cpu) access memory to retrieve data, go to next item. For each item ONE address calculation and done. This is my ONLY point: Arrays of primitive values will ALWAYS be much more efficient in tests like this than any kind of fooling around with pointers and objects or whatever you want to call those "things." The video compared a linked list of "things" to an array of "things", NOT a big difference.
@gregf91607 жыл бұрын
As a programmer, this was _Very, very_ interesting! Thanks, Dr. Bagley. More like this, please :)
@Zebsy7 жыл бұрын
Dear me, this is dodgy really. The choice between data structures should be made depending on what your use case is, e.g. is random access by index needed? Are inserts needed? Is iteration needed? I found this video interesting in some ways, but it fails to address the important points of how to choose a data structure.
@D4no002 жыл бұрын
Actually this is a very interesting video. Many might have thought that the RPi is just an intermediate device for running tests, but it is actually running on a totally different architecture, arm64. Many servers are migrating to arm64 nowadays as it is much more energy efficient, but those cycle counts are alarming, as the clock speeds of those processors are no higher than the Intel ones.
@charlescox2907 жыл бұрын
"The compiler is being clever", which is why you don't optimize code to compare algorithm speeds. Also, this is just traversal. How about speed testing insertion in the middle, which is what linked lists excel at?
@NickNackGus7 жыл бұрын
Things become a bit more interesting for embedded real-time systems, where finding a memory location to allocate can take too long, so you have to allocate all of your memory in advance or write your own allocation algorithm. This shouldn't make a significant difference for reading the array/linked list, but it should affect creation time.
@aem5smruasn7 жыл бұрын
insert word association programming humor here
@oisnowy53687 жыл бұрын
That could probably be done faster with a linked list than with an array.
@newvocabulary7 жыл бұрын
THC IS IN MY P
@CountElectric7 жыл бұрын
Very informative video on this topic. And actually very clear about what is happening. Great job.
@JimLeonard7 жыл бұрын
Extremely disappointed this was a comparison of a linear scan of both data structures. What about arbitrary insertion and deletion? Those are some of the use cases linked lists were invented to address (no pun intended).
@gasquake7 жыл бұрын
Couldn't agree more. The entire video I couldn't stop thinking about how this comparison is unfair.
@JimLeonard7 жыл бұрын
On top of that, something was seriously wrong with the compiler implementation on the Atari ST, that a linear forward scan of an array was somehow not faster than a linked list. And noting the speed of list/array creation -- something that is typically done only once -- was sort of pointless.
@Sonny_McMacsson7 жыл бұрын
Arbitrary insertion and deletion of large structures (100's or 1000's of bytes) is going to be slow in an array. Then there's making a list out of items that may not be contiguously allocated, like a list of request structures for I/O coming from different threads that they allocate themselves, possibly on their own stacks. Of course, for the former, you can create an array of pointers to the big structures and move those pointers around the array if that suits you.
@Jimbo19206 жыл бұрын
I found this disappointing too. It's billed as a comparison of data structures and ends up being a comparison of hardware architectures. When I was in school we used the concept of an abstract machine, and everyone learned the pros and cons of arrays and lists for sequential access, random access, inserts, deletes and moves. If a linked list beats an array for sequential or random access, something is definitely screwed up.
@ReedHarston6 жыл бұрын
While I definitely agree that the choice really comes down to what you need the structure for, I found this information important as well, since it explains another aspect of choosing a structure that some people might not think about and that isn't considered as often as big-O notation. I guarantee you that if he had done the video we expected, there would be people complaining in the comments that he didn't consider how they run on different architectures. Besides, at 30 minutes this video was long enough as is without comparing all the different use cases (and the previous video did mention some of them anyway).
@danielgrace78877 жыл бұрын
I just learnt about "-O3" setting in GCC. Nice one, thanks!
@anshul52434 жыл бұрын
FWIW, -O3 has a tendency to break code, -O2 is a safer, albeit slightly slower alternative
@glowingone17743 жыл бұрын
@@anshul5243 ozone mode bad
@thechubbypanda5 жыл бұрын
Did he actually say "doog yrev"!? lmao
@nemplayer17764 жыл бұрын
Did eeh sey
@revenevan114 жыл бұрын
That's what it was!!!
@iggienator7 жыл бұрын
I love these in depth looks on computing processes, keep it up!
@neres9097 жыл бұрын
Very insightful benchmark. At first I was worried that you wouldn't mention the cache's role in it, but you did. The only thing missing is that this is a synthetic test, and real-life applications would show far different results. The reason is not only the cache misses you mentioned, but also that on x64 a linked list node is twice the size of an array element. This has consequences for how much memory is prefetched into the cache and for cache pollution, and thus bigger performance hits during execution. Also, as this is a synthetic test, all memory for the linked list is allocated one node after another anyway, giving fewer cache misses than the more likely real-life scenario, where nodes would be scattered through memory and a single fetch would likely be a miss.
@jellorelic7 жыл бұрын
Everything you say here is true, but beside the point. He's trying to teach a more fundamental fact about CompSci in general - just because X is faster/better/whatever on paper doesn't mean that will always turn out to be true when the code hits metal. While it's true in modern cases that most programs can ignore the underlying architecture in most cases, sometimes it can result in unexpected behaviors.
@wolverine96327 жыл бұрын
Extra credit earned for using the Atari! And I'm pretty sure the reason why it's faster when run in reverse is because no compare instruction is needed in the low-level loop, only a check for zero. (I see you clarified this at the end, and I'm right. Atari 2600 programmer here...)
@MACIEJ4545457 жыл бұрын
How about sorting? When you have big enough objects in an array/list wouldn't sorting a list be quicker when you can manipulate pointers instead of moving whole objects around in an array?
@schetefan247 жыл бұрын
For BIG structs this should be true, but first you must come up with an efficient in-place algorithm where you need no random access. I'm not quite sure whether you can do selection sort or something comparable with a singly linked list (this example should work with doubly linked lists); with quicksort this should be even harder...
@MACIEJ4545457 жыл бұрын
skeliton11211 I was meant to add that to the question to be honest. Of course an array of pointers would be much quicker. When you have a warehouse full of containers you make a checklist where everything is and organise it and not move all the containers around.
@MACIEJ4545457 жыл бұрын
Stefan Gräber I realised that later. sorting one way list might be problematic since you have to relink the neighbouring objects and insert the object in its new place. It results in a far more pointer manipulation than an array of pointers. My question was more about the "pure" version of a structure
@adisander7 жыл бұрын
I would imagine it depends on what you're sorting, and what sorting algorithm you use. First, on the data being sorted: Arrays have to swap the contents, so the time is dependent on the size of that content. Linked lists have to swap the pointers, so constant. These two are equal if the data being sorted is the size (or smaller) of a pointer, otherwise the linked list (I must assume, I should point out that I'm not an expert) will win out. As for how you sort: Insertion sort moves backwards along the array/list, and so is an outright no for linked lists (unless, of course, you use a doubly linked list, i.e. a linked list linked in both directions, which would have even greater memory use than a single linked list and would require two pointer swaps rather than just one for every data swap). Heap sort relies on random access, so would take a hit on linked lists. Merge sort and quick sort (properly implemented) work well on both arrays and linked lists. Radix sort I'm not sure about. On the one hand, from what I recall, what you're sorting is specifically the sort of thing that would benefit from linked lists only needing to swap pointers rather than data. On the other, I would also think these things to be things where the array would only contain pointers to them, for example strings (assuming you aren't accepting the restriction of all strings being the same maximum length, and probably all close to it (otherwise you're wasting lots of space)). Provided you keep in mind which sorting algorithms to use and which not, and that your array elements aren't unreasonably large, the difference is (probably) negligible: sorting an array/list from scratch is something you probably won't be doing that often, or with data large enough, for it to really matter which of the two you use, at least in comparison to other operations you're likely to use on the same data. 
Lastly, I suspect that if you're sorting an array or list, you're probably going to want to do other things to it that linked lists are better suited to, that is inserting and removing items. That bit however is purely subjective and I don't have anything to really back that up.
@angeldude1017 жыл бұрын
Quicksort can be done in place so there's no need to do large copies. Even still, moving small structures is still often faster than moving pointers in a linked list, again, because of the cache which can preform the move operations in bulk.
@linardssmeiksts65583 жыл бұрын
Nice test and comprehensive explanation as always. Love your videos. 😊
@k1ngjulien_7 жыл бұрын
Wouldn't it have been faster without printing stuff on the screen in between? I know that many of my C and C++ programs run much quicker if you leave out the print statements.
@heyandy8897 жыл бұрын
that's a good point, writing to the screen can take some time. however both versions print to the screen, so they are subject to the same slowness. it's like if you had two people run a race through syrup - you can still tell who's faster than who.
@statphantom6 жыл бұрын
Yes, however the percentage speed difference is exactly the same, as they are printing text of the same length (minus one or two chars) the same number of times, so it won't affect the comparison.
@TheMaginor6 жыл бұрын
They did the prints outside the timings, so it would not affect the recorded time.
@gwenynorisu68836 жыл бұрын
If you're finding that printing half a line of low rez text to the console makes a significant difference to the loop speed of your otherwise ~850ms routine, even on an ST, then you've got bigger problems than whether or not to print said text. Like, realising that you're actually stuck in a runtime interpreter for a particularly shoddy variety of BASIC when you thought you were making compiled C programs. If they're routines that take closer to 850us, then, sure, maybe it'll be worth it. But the text only prints after the loop is done (which is how it knows how long said loop took) and before the start of the next one, so it's not going to affect the reported results any, and in most cases the difference between printing or not printing 100 half-lines of text will save you, ooh, maybe one video frame's worth of runtime?
@babel_7 жыл бұрын
Can Dr Bagley do another shootout, this time between lookup tables and live calculation? That would be amazing to see how modern processors differ from older ones.
@Gordon9727 жыл бұрын
Steven: *uses VSCode* me: MAH BOIiii
@SnowRaptor7 жыл бұрын
Also, the iMac has SSE4 and AVX instructions, which can add 8 or 16 array items in one cycle.
@angeldude1017 жыл бұрын
Which is facilitated by the loop unrolling, and then the cache makes it so that all 8 or 16 of those elements are accessed in a single read.
@MrCOPYPASTE6 жыл бұрын
MEMORY IS AN ARRAY!!!!!!!!!
@avro549B7 жыл бұрын
It's not necessary to pre-declare arrays in Perl. They grow to whatever size is required. (It may be slightly faster to create large ones in one go, if it's known in advance how many will be needed.)
@tcocaine7 жыл бұрын
Yesterday I fell asleep while watching this video, and when the outro popped, I swear, I woke up THE WHOLE HOUSE with the greatest scream humanity has ever heard.
@hoisinholdup7 жыл бұрын
VS Code master race reporting in
@CoituselPingo7 жыл бұрын
i think is C
@hoisinholdup7 жыл бұрын
Yes, but the text editor he was using to write it on the Mac was VS Code
@AlastairMontgomery6 жыл бұрын
Dream selection of retro kit in that office :)
@Zadster7 жыл бұрын
Doogy rev, doogy rev. Gronda gronda Rangdo!
@philp46847 жыл бұрын
wopsim
@technowey3 жыл бұрын
Instead of using an array index, use a pointer to the first element of the array, and increment the pointer. I think that will be even faster. Modern Intel processors handle adding an index to a pointer and dereferencing the incremented pointer all in a single instruction. The reverse speed is likely due to how the cache works. (I haven’t seen the end of the video yet).
@rlee00017 жыл бұрын
Your original test is invalid. You unnecessarily used a different type of loop, with the loop used for iterating over the array doing a bunch of useless loop-maintaining arithmetic and comparisons that any competent programmer would have either removed or accounted for in the benchmark. Pro tip: summing the elements of an array is in fact faster, since it reads half the amount of data from memory for each access (only the number being summed, and no pointer). Since the x86 has more registers, the optimizing compiler had better luck keeping your loop counter in a register and not in memory, whereas on the 68k the loop counter was stored as a local variable (on the stack, in memory), and you access it 3 times per iteration: once to compare it, once to read it for incrementing, and once to write it back. That's completely unfair to the array implementation.
@vytah7 жыл бұрын
68k has 8 data registers and 7 address registers, it should be enough for all three programs. The operation that slowed down the forward array iteration was probably comparing the index with the constant, which requires one more instruction than comparing it to zero.
@heyandy8897 жыл бұрын
it's not "unfair." the point is that the speed depends on the architecture's bottlenecks. imagine we were dealing with HDD instead of RAM. you would want a data structure that minimized reads from the disk because it's so much slower than memory. On the other hand with an SSD, you are not penalized so much for excessive reads and writes. Therefore you can use different data structures and algorithms without incurring the performance hit of the HDD.
@rlee00017 жыл бұрын
That's not how Big O works.
@iljuro7 жыл бұрын
The test is unfair and should be invalid, but not because the 68k is inferior: the for loop uses more complex mechanics and thus is significantly slower than the while loop. The ascending for loop is the slowest because it uses loop-counter mechanics and has to do an active end-of-loop check every iteration. The descending for loop is faster because it uses the very nice decrease-and-branch-all-in-one instruction and uses the decrease instruction's result to compare against zero instead of a separate compare instruction, but it still has the loop-counter mechanics to deal with. The linked list is fastest because it just loads the next address and checks whether it's zero.
@K3rbalSpace3 жыл бұрын
Loving the Atari MegaST in the background!
@clarkd19557 жыл бұрын
Commenting on an Atari isn't particularly relevant, as modern computers generally use Intel chips that are very cache-heavy. Many tests have been done on these computers showing that almost all optimizations amount to next to nothing compared to cache effects. The bottom line, as illustrated by many, is that arrays are almost always much faster than linked lists, and therefore linked lists should only be used in special cases. Arrays should always be considered the default. My first computer was an 8080 running at 1 megahertz (linked lists were probably OK on that computer), but that isn't very important to what works on modern computers. There is no argument: don't use linked lists.
@pleasedontwatchthese95937 жыл бұрын
To be fair that was not the point of the video. It was that testing was king.
@Anfros.7 жыл бұрын
Modern computers are not "generally" using Intel chips. Intel chips are dominant for PCs, but PCs do not represent a majority of even consumer computer systems. As an example, take the 3 big gaming consoles of the 7th generation (Xbox 360, Wii and PS3). Between them they have over 260 million units sold, and none of them use Intel chipsets. Furthermore, most cellphones today run on Qualcomm chipsets. That's not even mentioning the non-consumer market. There the Atari test is absolutely relevant, as many embedded chipsets do not have a cache.
@satannstuff7 жыл бұрын
Pretty sure the majority are ARM chips. That's what you're going to find in just about anything that doesn't need a high performance CPU and in some things that do, like most smartphones or that Raspberry Pi for instance.
@gwenynorisu68836 жыл бұрын
Maybe if you're using something dismal like an Atom or an embedded / microcontroller core based off a 386SX or what-have-you, but I don't think that was the point of the test. But even if it was, you can use the 68000 in the ST as a fairly prime example of such; embedded CPUs based on the 68k core (e.g. the Coldfire) were extremely popular up until a few years ago when ARMs and ATmegas started offering greater power and lower energy consumption alongside a lower price tag. So we already have the answer - sophisticated large instruction set, large cache chips prefer regular arrays, and that includes the higher end ARMs. Simpler, cacheless chips prefer linked lists, unless they also display unusual behaviour like being over 50% faster when traversing an array from the bottom up rather than top down.
@gwenynorisu68836 жыл бұрын
Hmm... another thing with the Falcon is that it makes use of the 68030's burst mode memory transfer mode (sort of an early attempt at EDO, which naturally makes loading large blocks faster than true random-access), as well as the cache. It'd be interesting to see the code run again on the MSTe at full 16MHz speed with cache enabled, to see which program is faster in that case, or if it ends up being pretty much 50:50.
@RetroDawn3 жыл бұрын
Unfortunately, the Falcon030 doesn't support burst mode. The TT did, though. Both Alive and New Beat document this deficiency. I agree that it would be interesting to compare with the MSTe in 16MHz mode with cache enabled, though.
@mangethegamer7 жыл бұрын
I don't buy this. Give us a pastebin of the source code so we can see if there are any other differences between the source code.
@saf2718287 жыл бұрын
Even more, I want to see the assembly language listings of the compiled output. p[i].p is well known to be less efficient than using pointer arithmetic (even inside an array); however, some (but not all!) compilers can recognize when p[i] is used inside a loop and automatically substitute pointer-based code. While I expect this to occur for x86-based code considering gcc's/clang's relative maturity, I question whether the 68K compilers provide similar optimizations.
@recklessroges7 жыл бұрын
github or it didn't happen
@Mike-gs7eo7 жыл бұрын
"p[i].p is well known to be less efficient than using pointer arithmetic". My understanding is that p[i] is equivalent to *(p + i), why would it be any slower?
@Stephon90157 жыл бұрын
Mike Scott My guess is that, although equivalent, the computer still needs to compute p[i].p by way of *(p+i).
@saf2718287 жыл бұрын
Mike Scott Because it involves a multiply and an addition, versus p++ which is just a simple addition. This becomes particularly visible with multiple dereferences to p[i].
@hellterminator7 жыл бұрын
NOTE: This 30 minute video analyzes in depth the one task where arrays don't crush linked lists by default. There is no operation for which the asymptotic complexity on an array is higher than the asymptotic complexity on a linked list; in most cases, array wins.
@Dngrcrw7 жыл бұрын
This video was just quite misleading and clickbaity. Very disappointing compared to your typical videos. I've got to put a dislike on this video for a few reasons: - The description, which made it sound as though arrays are slower in the general case - He made out as though he was NOT going to be running it on a modern machine - just on the Atari - The fact that one has to watch the first 22 minutes (thank god for YT x2 speed) before he properly addresses the fact that there are MAJOR differences between the machines
@jpphoton7 жыл бұрын
Excellent break down. Another gem.
@StagnatedOrc7 жыл бұрын
"We're using P inside P... confusing variable names, but then I'm a C programmer" 😂 LoL
@marcvanleeuwen59866 жыл бұрын
While there is some amusement value to this video, it provides very little information about what is actually going on. We do not know how the nodes of the linked list are ordered in memory, and this is very relevant. The fact that the nodes were all generated in one swoop just before the test suggests they are probably linearly ordered, with each link pointing a fixed distance forward from itself; by contrast, in a linked list that has been used for some time in the ways linked lists are made for (insertions, deletions, permutations of links), the nodes would tend to be scrambled in memory in a way that is detrimental to locality and therefore to cache performance. We also don't know whether the address arithmetic implicit in finding the address of p[i] is actually performed each time, or whether (quite likely) the compiler has optimised this away by just incrementing the address of the previous array access. Unless we have some idea of what happens at the assembly level, we are just watching a race between contenders we hardly know at all. No mention is made of locality of access, though this is the unique feature that makes the data cache relevant: in the example, no item in memory is ever used more than once (I'm assuming the few local variables are in registers), so the _only_ gain from the cache is that it fetches not only the item requested by the CPU but a whole cache line, so that the next several requests might be cache hits. Caches (and many other things) have been designed specifically to optimise linearly repetitive access to arrays, so it is not so surprising that an array program benefits most from such optimisations. On the other hand, mention of the instruction cache is beside the point: both solutions are small loops that can reside entirely in the instruction cache.
If you really want to compare the intrinsic efficiency of arrays and linked lists as data structures, I would suggest a test that forces the array to be accessed non-linearly and the list to be non-linearly ordered in memory. For instance, just insist that the items be visited in increasing order of the q-component while using the p-components (maybe doing something other than summation, for which order matters). In the array case, leave everything in place and pre-compute the permutation giving the order of visits, while in the list case (where we cannot do random access) pre-sort the list by increasing q with a sorting method that only adjusts the links. The net effect would be that the linked list is scrambled in memory in the same way the array accesses are scrambled. Then the playing field is levelled in several respects (neither has much locality, so caching has little influence; array indexing cannot be optimised away), and the tests would be a lot more interesting.
@sergiosanchez54397 жыл бұрын
26:28 "the fact that the value is precalculated in memory..." what exactly does this mean? what is he referring to?
@Harve69885 жыл бұрын
I had the same question. I think he probably means that indexing has to do a calculation to get the address; i.e. getting the 5th element of the array requires computing "arrayheadptr + (5 * element_size)" or something like that. This doesn't have to be done in the linked list version, because the memory location pointed to by each next pointer is already 'precalculated'.
@pounet25 жыл бұрын
Not sure.... I didn't pay attention to that particular part. Maybe he's talking about the fact that modern processors do not run each instruction in a single step, but through a pipeline. Say you want to compute a+b. The addition is processed in several steps, for instance 3:
(a,b) -> STEP 1 | STEP 2 | STEP 3 -> a+b
Imagine that each step takes 1 unit of time, so the time to complete one addition is 3 units. Now imagine you want to compute a+b+c+d. Three additions have to be performed, so the time to compute a+b+c+d should be 3*3=9, right? Well... in fact you can go faster, because each stage of the pipeline can work independently. Here is how the compiler can schedule the computation of a+b+c+d:
t=1: STEP 1 of a+b
t=2: STEP 2 of a+b AND STEP 1 of c+d
t=3: STEP 3 of a+b AND STEP 2 of c+d
t=4: STEP 3 of c+d
t=5: STEP 1 of (a+b)+(c+d)
t=6: STEP 2 of (a+b)+(c+d)
t=7: STEP 3 of (a+b)+(c+d)
So it takes only 7 time units instead of 9!!!! As you can see, (c+d) is in some way "precomputed" during the computation of (a+b) itself. And as everyone knows, (a+b)+(c+d)=((a+b)+c)+d [Numberphile probably has a video on it], so you get the expected result (but faster).
@williamweller81007 жыл бұрын
Hahaha, good shit. I was really really concerned when the linked list turned out to be faster on the Atari, just because I figured an array must be faster. Huge sigh of relief with the results on the mac and pi; I'm not insane.
@alcesmir7 жыл бұрын
A huge advantage of arrays is, as mentioned, cache friendliness (data locality). But more than that, newer CPUs also include SIMD instructions that perform an operation on a lot of data at once, say adding 4 or 8 ints in parallel. Add this in and the array will beat the linked list even more soundly on modern machines. I would try compiling and running with -march=native (along with -O3), which allows CPU-specific optimizations, or use some intrinsics to add the needed SSE or AVX instructions manually.
@MorRobots7 жыл бұрын
OK, so some real computer science talk here regarding array vs linked list. A linked list is a modular data structure that can grow or shrink to accommodate data in real time; an array has to be completely reallocated and copied to change size. Linked lists also require an extra bit of data (the next pointer) in each element, while arrays do not, and a list must be walked from front to back unless you saved a pointer to an item of interest in advance. It should also be noted that the overhead of a for loop adds more instructions to this operation than just following a pointer. A for loop requires an index value such as i, a comparison operation, and an increment, while a linked list traversal only needs to go to the next item until that next item is 0. Both sound simple and the same, however they are not at the assembly level: for the list you would just move the next item's address into a register, compare it against 0, and keep going, but for the loop you would have to compare the index counter to the length of the array and then increment the index counter. Since the for loop involves two variables (the length of the array and the index counter), those values may have to be pushed onto and popped off the stack, or moved into and out of registers, as your application steps deeper into its functions. With the linked list you only need to check whether the pointer stored at a fixed offset from the current node is 0 or not, which is a single load and compare. I'm not sure I'm explaining this well, but essentially a for loop uses more steps to maintain the loop, while a linked list loop just keeps going until a condition is met.
@1Maklak11 ай бұрын
I expected the reverse loop to be fastest, because that's one subtraction less. With a normal loop for (int i=0; i&lt;n; i++), the comparison i&lt;n is effectively a subtraction on each iteration; with a reverse loop for (int i=n-1; i&gt;=0; i--) there is no subtraction and i is directly compared to 0. In any case, testing speed optimizations is needed. I sometimes theorized optimizations that turned out to actually be slower than something simpler.
@zxcxcsdfasdfasf7 жыл бұрын
Great deep dive into what I thought would be a trivial problem !