Arrays vs Linked Lists - Computerphile

491,373 views

Computerphile


7 years ago

Which is faster? The results may just surprise you. Dr 'Heartbleed' Bagley gives us an in-depth shoot-out - Arrays vs Linked Lists...
Link to code can be found in Dr Bagley's tweet: t.co/4kg3AZDA57
Sun Server: • Sun Microsystems (Re-E...
Megaprocessor: • MegaProcessor - Comput...
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 922
@nick1wasd
@nick1wasd 5 жыл бұрын
"We're not gonna run this on the iMac, we're not gonna run it on the Pi, we're gonna run it on the Atari behind me" I love this guy so much.
@TheSulross
@TheSulross 4 жыл бұрын
Nick1wasd, he's taking CPU caching out of the equation, but on modern architectures CPU-cache-friendly data structures can make an enormous difference in performance - arrays can be designed to be more CPU cache friendly
@radzdahound3099
@radzdahound3099 4 жыл бұрын
TheSulross You probably already know this by now but he goes into the major speed differences made due to CPU cache later into the video
@farhanyousaf5616
@farhanyousaf5616 2 жыл бұрын
Let’s run it on the Amiga and C64 and 800xl
@monsterhunter445
@monsterhunter445 Жыл бұрын
@@TheSulross Isn't that exactly why you use arrays more than linked lists - the fact that they're CPU cache friendly?
@gormster
@gormster 6 жыл бұрын
"Confusing variable names, but hey, I'm a C programmer" 😂
@zyaicob
@zyaicob 4 жыл бұрын
As a student learning C, I can confirm that variable names are either long and confusing or short and confusing
@anuraghooda8439
@anuraghooda8439 4 жыл бұрын
@MLU8811 Same
4 жыл бұрын
hell0
@pedronori8999
@pedronori8999 3 жыл бұрын
@MLU8811 Did exactly the same AND like the video
@kahnfatman
@kahnfatman 2 жыл бұрын
Say that to the Assembly programmer, they don't even have VARIABLEs.
@Nerdnumberone
@Nerdnumberone 7 жыл бұрын
The more important difference is how the performance scales for different operations and what you actually plan to use the data structure for. Both of these were O(n) operations, but selecting an arbitrary element is O(1) in an array and O(n) in a linked list. However, trying to dynamically increase the size of an array would be O(n) (since you have to make a new array and copy the data), while a linked list can add another item to the front in O(1). Adding or removing arbitrary items next to a node you're currently looking at is O(1) for linked lists and O(n) for arrays (though this would be O(n) for linked lists too if you had to walk to the element you want to change). They are different tools for different things. If you're only using operations where the two structures are comparable (like the summation mentioned in the video) and speed is particularly important, then such a test makes sense, but what you use the data structure for is likely to be far more important.
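(For illustration: a minimal C sketch, with made-up names, of the two cases above - an O(1) prepend to a singly linked list versus the O(n) copy needed to grow a plain array by one element. Error handling omitted.)
#include <stdlib.h>
#include <string.h>
struct node { int value; struct node *next; };
/* Linked list: pushing to the front only touches the new node and the head pointer. */
struct node *push_front(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = head;
    return n;                        /* new head */
}
/* Plain array: growing by one element means allocating a new block and copying everything. */
int *grow_by_one(int *a, size_t len, int value) {
    int *b = malloc((len + 1) * sizeof *b);
    memcpy(b, a, len * sizeof *b);   /* the O(n) part */
    b[len] = value;
    free(a);
    return b;
}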
@ChrisHinton42
@ChrisHinton42 7 жыл бұрын
I agree with your point about choosing the right tool for the job, but insertion into a dynamically resizing array is amortized θ(1) if you change the size geometrically.
@mastablation703
@mastablation703 6 жыл бұрын
Chris Hinton I think Nerdnumberone was referring to inserts into the middle or beginning of the array, which is O(n).
@ivoivanov7407
@ivoivanov7407 6 жыл бұрын
"Amortized" means allocate once, expand if needed, use many times. Yes, this is the case when array has to be used. But in case you have to do many insert/delete operations, there is no amortization.
@Nerdnumberone
@Nerdnumberone 6 жыл бұрын
Having worked mostly on web since school, I jumped to the absolute stupidest way to resize an array: making a new array that is one element larger and copying the data into it, assuming that any array is necessarily a contiguous block of information. Sorry, I haven't dug into the low-level implementation of data structures since college, where an array was the contiguous block and anything else was either a different data structure or a hand-waved black box. This assumption was idiotic of me and required the programmer to fail to allocate enough memory for their application. If I'm skimming this Wikipedia article correctly, then dynamic arrays just allocate more memory, just in case. You still have this problem every time you hit that limit, but it should be rare enough to be negligible (as in, every time you hit your cap you double the memory allocation).
@hellterminator
@hellterminator 6 жыл бұрын
+Ivo Ivanov That is not what amortized means at all. Amortized θ(1) means that *on average* the time per operation is constant. Yes, whenever you have to reallocate the array, that particular insertion will take θ(n) time, but because you're increasing the size of the array geometrically, the interval between such insertions also grows linearly, so *on average* every insertion is performed in constant time.
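(For illustration: a rough sketch of that amortized-growth idea - a dynamic array that doubles its capacity, so the O(n) realloc/copy happens rarely enough that the average cost per push stays constant. Names are made up; error handling is minimal.)
#include <stdlib.h>
struct vec { int *data; size_t len, cap; };
void vec_push(struct vec *v, int value) {
    if (v->len == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 8;          /* geometric growth */
        int *p = realloc(v->data, newcap * sizeof *p);    /* the occasional O(n) move */
        if (!p) abort();
        v->data = p;
        v->cap = newcap;
    }
    v->data[v->len++] = value;                            /* the common, cheap case */
}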
@flavioalarcon5533
@flavioalarcon5533 6 жыл бұрын
I learned Java programming at an institute; there I basically learned the basics of how to program, develop websites and things like that, but they never even began to explain how things actually work. It still worries me how little I actually know about computers, since all I know is basically what years of progress in the computing world have achieved, and I'm just using the tools they gave me without questioning them. Videos like this are very important for someone like me - I appreciate it.
@abdulhfhd
@abdulhfhd Жыл бұрын
they didn't teach you data structures?
@jaydavis6357
@jaydavis6357 9 ай бұрын
@@abdulhfhd In my opinion this doesn't really have too much to do with data structures; it's more about highlighting compilers and CPU processors
@abdulhfhd
@abdulhfhd 9 ай бұрын
@@jaydavis6357 you are right
@Nathan5791
@Nathan5791 7 жыл бұрын
This stuff is worth gold. Not many modern computer science degrees teach this, because modern-day CPUs have caches and compilers have optimizations, which means there is virtually no difference in the speed. So people have started assuming that everything runs as quick. Scale the CPU down and you understand the true answer. Very well put video! Worth the 30 minutes.
@FalconGames109
@FalconGames109 7 жыл бұрын
Any "Computer Science degree" that doesn't teach linked lists is not computer science.
@Rififi50
@Rififi50 7 жыл бұрын
What do you mean by "this"? Teach about caches? Teach about unrolling loops?
@BubbaYoga
@BubbaYoga 7 жыл бұрын
What practical difference is there if there is no difference? Academic?
@markallen97
@markallen97 7 жыл бұрын
There's effectively no difference in small problems, or most common teaching assignments (which are usually fast enough to be instant), then you go out into the real world with huge datasets or untidy problems where the differences can be a LOT more noticeable. Especially working in games where performance can be critical and you're sometimes working with very limited hardware - this is one of the things I always try to drum into new graduates
@BubbaYoga
@BubbaYoga 7 жыл бұрын
Thanks, I understand what you're saying but I guess I was just hung up on the word practical. As in, in practice. If there is a difference, there is a difference.
@AlexandriaRohn
@AlexandriaRohn 6 жыл бұрын
05:22 We're going to visit every single element (125k) in an array & linked list and add up all their values.
11:18 Running the linked list version on the Atari computer takes ~166 clock ticks.
13:07 Running the array version on the Atari computer takes ~179 clock ticks.
17:55 On the Atari computer, traversing the array backwards is ~114 clock ticks.
18:20 Why? What's happening is that the instructions the Atari machine can execute favor walking backwards.
21:00 Summary of results for Atari, Raspberry Pi, and iMac. On the Raspberry Pi and the iMac, the array is much faster than the linked list.
22:15 What's happening? Caching. The Atari doesn't have a cache. Every bit of data has to be fetched from memory each time. But the CPU isn't much faster than the memory. But for the iMac and the Pi, the CPU is much faster and has to wait for things to be fetched from memory.
@gwenynorisu6883
@gwenynorisu6883 5 жыл бұрын
Almost, but you haven't properly annotated *why* the linked list is marginally faster than plain walking forwards on the ST (your next address is prebaked and can be used as a JMP IMMediate, so you don't have to depend on using the CPU's own, slightly slower address calculation), nor why the cache speeds things up so dramatically for the plain array in both directions vs the LL on the others (reading an uncached memory address actually pulls a whole block of data surrounding that one location into the cache; if you're simply traversing a large chunk of memory either wholly linearly or with very small jumps for each step, that means you could potentially have a few dozen extra cache hits after the initial miss before crossing the next block boundary, massively accelerating the processing speed; the linked list, if it doesn't allocate each new value close to the previous one but instead at an essentially random location, cannot benefit from that speedup)...

Also, going backwards on a 68k-series chip is markedly faster because of one particular instruction group: Decrement and Branch if a condition is met - typically decrement-and-branch-until-zero, i.e. it will spin rapidly through a loop that does some particular task, finishing only when the content of a particular register hits zero (and sets its easily testable zero flag). There's no particularly easy or fast equivalent for counting upwards - instead you have to repeatedly test whether your loop-count register is equal to some particular arbitrary value, and simply supplying *that* and doing the more complicated equal/not-equal test takes a nontrivial amount of extra processor (and possibly memory bus) time. You can somewhat split the difference by using a decrementing counter to control the loop while either incrementing a separate counter to control the forwards motion, or setting a register to minus the decrementing one, but it's only going to shave the penny and will still be quite a bit slower than just running through the array backwards.

The memory in those older machines has no burst output mode (which typically only works in incrementing-address mode) or even anything much like FPM or EDO (which allow reasonably fast, zero-page-esque random addressing within a single block of memory), so you get no compensating speed advantage in terms of reduced memory latency; wholly random access is exactly as fast as wholly linear forward stepping. Which in a way is also why the linked list is faster, besides the lack of cache.

That particular instruction is also why, if the program were better optimised for later 68k-series chips, it would show rather curious results on the Falcon (I wouldn't want to try and predict them, in fact) or any other similarly engineered 68030 machine, and show an even stronger bias towards backwards array walking on an ST upgraded to a 68010 (or maybe 020, which has a rather poorer cache than the 030) - the very tight "Loop Mode" introduced with the 010 is literally engineered for doing nothing more than blasting through very simple memory-walking routines bounded by exit conditions such as decrement-to-zero. For the most applicable uses, optimising code to use that mode can give a 2 to 3 times speedup vs the more explicitly written-out plain 68000 version, and in mixed-instruction code that can occasionally make use of it you still get acceleration measured in dozens of percent. Essentially, if you already thought 68k code was strange because you're more used to how x86 does things, Loop Mode just makes it even weirder... the 010 and later processors are pretty much tuned, in cases where burst mode or caches are unavailable or no use, for skimming through memory, very quickly, _backwards._
@blinded6502
@blinded6502 4 жыл бұрын
I wish people didn't create such bloated videos in the first place. Thanks for the laconized info!
@MrDeeb00
@MrDeeb00 3 жыл бұрын
thx, nice summary
@aidangarvey7049
@aidangarvey7049 2 жыл бұрын
@@gwenynorisu6883 as someone who's programmed for the 68k a bit, I'm curious, how would a data register being decremented by DBRA speed up array access when walking backwards? Wouldn't you have to increase or decrease a value in an address register in every iteration of the loop? How does the value in the data register get used to calculate the address of an array element? Off the top of my head I can't really think of anything that would be faster than just adding a constant value (i.e. the size in bytes of one element) to an address register every iteration.
@soraaoixxthebluesky
@soraaoixxthebluesky 4 жыл бұрын
“...which means I need a very high tech piece of equipment” *showed us a floppy disk*
@JulianColeman03
@JulianColeman03 6 жыл бұрын
videos like this make me really appreciate C as a language
@PauloConstantino167
@PauloConstantino167 3 жыл бұрын
C is the ultimate language.
@DeusExAstra
@DeusExAstra 6 жыл бұрын
Between caches and prefetchers, in modern hardware arrays are almost always better than linked lists, or other data structures much of the time. Especially in things like game engine architecture, putting items in arrays versus linked lists of randomly allocated elements (i.e. items all over in memory) can yield very significant speed benefits.
@D4no00
@D4no00 Жыл бұрын
that is true, however this is mostly because intel have made their backdoor optimizations, I can bet that the performance on an arm or amd is much more different. The problem with this is that at application level you have no idea what is going on and you can't optimize it, it is just a question of time until this will fall off, just like directx compared to vulkan.
@dale116dot7
@dale116dot7 Жыл бұрын
@@D4no00 The same performance hit happens in PowerPC architectures, and for that matter, any processor with a data cache. Also, arrays are safer than pointers stored in memory. Some safety critical coding standards prohibit stored pointers.
@stephenelliott7071
@stephenelliott7071 4 жыл бұрын
Steve is by far the clearest when it comes to explaining anything Computerphile. Clear, Concise, C programmer - all the C's! :)
@MasterNeiXD
@MasterNeiXD 7 жыл бұрын
Second term computer science students, rejoice. Computerphile has come to save you.
@KevinFu5100
@KevinFu5100 7 жыл бұрын
i swear, this channel has saved me many times.
@WilsenHernandez
@WilsenHernandez 7 жыл бұрын
Saved me when studying recursion
@Stigsnake5
@Stigsnake5 7 жыл бұрын
I've already graduated but better late than never, although I'm doubtful it'll be useful in my job
@knightshousegames
@knightshousegames 7 жыл бұрын
I remember when Computerphile was brand new, I was in my Data Structures and Algorithms class, and doing horribly. I think I would have done a lot better if they had started this channel about a year sooner. I did tell my professor about this channel on one of the last days of class, so I never found out if he ever used it in any of his lectures or not.
@Ultraice98
@Ultraice98 6 жыл бұрын
Damn son. Saved me again.
@TheJaguar1983
@TheJaguar1983 6 жыл бұрын
First impression. Depends on the operation. I tend to use arrays when I need contiguous memory, or I have a known length, and if I need random access. I use linked lists when building lists with unknown length and where random access is not required.
@SHUBHAMGI
@SHUBHAMGI 7 жыл бұрын
a + b = b + a you can watch Numberphile for that 😂😂 LoL
@ruben307
@ruben307 7 жыл бұрын
more like LoL are three letters => Half life 3 confirmed
@MazelTovCocktail
@MazelTovCocktail 7 жыл бұрын
SABER - FICTIONAL CHARACTER Not in computer science. ab does not equal ba.
@gdogawesome
@gdogawesome 7 жыл бұрын
seems legit
@vadimuha
@vadimuha 6 жыл бұрын
But you forget that we can extend a list easily, whereas if we want to add one extra element to an array we must create a new one
@panda4247
@panda4247 6 жыл бұрын
wait, i thought that there was a conjecture, that a+b=c so it that not valid?
@philippbeer
@philippbeer 3 жыл бұрын
A fantastic video. It's set up great. And the fact you do "myth busting" and teach people how to test a question like this is simply awesome.
@MorRobots
@MorRobots 7 жыл бұрын
This will completely be a function of use case and implementation. I'd love to see what you all set up for this test.
@DanRos2Coder
@DanRos2Coder Жыл бұрын
I stumbled across this video this evening, some 5 years after it was published. Brilliant work, mate! I'll be searching to watch more of your videos right now!
@WeatherWonders
@WeatherWonders 6 жыл бұрын
He's talking about complex computer stuff and I'm just intrigued by the little ASCII spinner at 9:37. Edit: Haha, they talk about it later on. :D
@taragnor
@taragnor 5 жыл бұрын
lol same here. The first thing I wondered was "I wonder how many valuable CPU cycles are being wasted animating that."
@rustycherkas8229
@rustycherkas8229 2 жыл бұрын
Used that spinner on VDTs, back in the day... { putchar('\b'), putchar( "\\|/-"[ a++ & 03 ] ); } His was displayed inside "()", so I suspect some kind of "printf()" that certainly wouldn't help speed things along during initialisation (on the Atari)... Not sure why he's generating random numbers to initialise all those 'p's... Any (smallish) number would do just as well: struct.p = loopcounter; Having written lots of code for diverse tasks, imo, each approach has its use and purpose. Arrays when their size is not prone to fluctuation, single or double linked lists when there's lots of inserting/deleting to be done. There's no one "optimal" scheme that is best in all circumstances. (A recipe book has a fixed number of recipes, but each recipe has a variable number of ingredients... Thankfully 'C' has dynamic heap allocation and pointers...)
@rustycherkas8229
@rustycherkas8229 2 жыл бұрын
@S B C's "volatile" keyword makes optimisation a little less secure about its assumptions of what stays and what can be replaced. Scary will be the time when optimisation replaces some code's explicit linked list with array operations, or vice versa, and coders come to rely on that. The demo would work just as well with recursive function calls for the linked list example using the stack as the 'chain' of object instances.
@nab-rk4ob
@nab-rk4ob 6 жыл бұрын
Great info about the differences in the way the chips and the compilers handle the same simple code. Stuff like that fascinates me. Thanks.
@RakeshPGopal
@RakeshPGopal 6 жыл бұрын
Ironically I see this video next on the auto-play list "Bjarne Stroustrup: Why you should avoid Linked Lists"
@tanveerhasan2382
@tanveerhasan2382 5 жыл бұрын
why indeed
@notthedroidsyourelookingfo4026
@notthedroidsyourelookingfo4026 2 жыл бұрын
@@tanveerhasan2382 Watch it, then you'll know. But generally, because growing an array geometrically reduces the amount of time you need for expanding/shrinking it, and because insertions/deletions, which are a Linked List's strength, still require traversing it in O(n). From my own and limited experience, I've not yet come across a case where anything was ever actually faster than an array. In one case, a hash map came close. Might even have been the same speed, and was definitely more convenient because I had key/value pairs, but I ended up ditching it and using a sorted std::vector, because I needed the simplicity for MPI compatibility..
@iggienator
@iggienator 7 жыл бұрын
I love these in depth looks on computing processes, keep it up!
@vavassor
@vavassor 7 жыл бұрын
Also, depending on the fragmentation of the allocator from which the linked list nodes are taken from, on a cached CPU it can be much much slower than an array which always has its elements on the same cache lines relative to one another. And then there's a funny case where supposing you have an allocator which allocates things linearly and you allocate the whole linked list at once, it basically gives you an array with a bunch of (useless) pointers to next nodes between each element. This will basically perform not quite as fast as the array, because it won't be able to fit as many elements per cache line due to the space wasted by the pointers, BUT it'll be much closer to the array than prior. Overall, the speed of operating on a full linked list on a cached CPU will vary depending on where the nodes were put in memory relative to one another. Another thing to note is how in my work anyhow, you're very unlikely to see a list as big as 125k elements. So, the effects of the cache on full array loops is increased even moreso.
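(For illustration of that last point: a tiny bump-allocator sketch, with made-up names, that hands out linked-list nodes from one contiguous block, so walking the list touches memory almost as linearly as an array would.)
#include <stdlib.h>
struct node { int value; struct node *next; };
static struct node *pool;
static size_t pool_used, pool_cap;
void pool_init(size_t count) {
    pool = malloc(count * sizeof *pool);      /* one contiguous block for every node */
    pool_cap = count;
    pool_used = 0;
}
struct node *node_alloc(void) {
    if (pool_used == pool_cap) return NULL;   /* pool exhausted; no resizing in this sketch */
    return &pool[pool_used++];
}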
@Spacecookie-
@Spacecookie- 6 жыл бұрын
Thank you for clearing that up. I came from programming in the end of the 1980's. I didn't get the linked lists thing and why arrays weren't used. People were talking about linked lists instead.
@RobBCactive
@RobBCactive 2 жыл бұрын
A lot of it was about just allocating the memory you used, then recycling it, which favoured dynamic approaches storing structs/records together. Traditional arrays require sizing and static allocation which put arbitrary limits. You didn't want to allocate a massive array on start up for some feature which wasn't used, or would restrict what the system could do. The languages used weren't necessarily as flexible as C which limited options for long running programs. Allocate at beginning, process and terminate worked fine for short lived programs running jobs on multi-user systems. As a user began using larger GUI programs, static pre-allocation was less feasible.
@subsystemd
@subsystemd 2 жыл бұрын
Wow. That has to be one of the best videos on the topic of computer hardware differences and how we as programmers should not make any assumptions about how a machine is supposed to function when executing code. I love the "I'm writing in a high level language" when actually it's C and it's one of the least abstract languages you can use without resorting to assembly. Just shows you how much this man knows his stuff.
@CountElectric
@CountElectric 7 жыл бұрын
Very informative video on this topic. And actually very clear about what is happening. Great job.
@bus
@bus 7 жыл бұрын
Good call on running it on a simple machine. More entertaining and more consistency, win win!
@dragonore2009
@dragonore2009 5 жыл бұрын
8:11 "p inside p, were using p inside p, confusing variable names, but then I'm a C programmer" LOL.
@jabersyed978
@jabersyed978 7 жыл бұрын
Would love to see Dr. Jason Atkin next time. Had him for my C++ module. He is amazing. Dr. Bagley is awesome as well.
@tcocaine
@tcocaine 6 жыл бұрын
Yesterday I fell asleep while watching this video, and when the outro popped, I swear, I woke up THE WHOLE HOUSE with the greatest scream humanity has ever heard.
@ArjanvanVught
@ArjanvanVught 7 жыл бұрын
When using the Raspberry Pi 3 there is the NEON co-processor. Compiling with -O3 could generate vectorized code. To be sure it is not, the vector optimalization should be disabled. Vectorized code would be increasing the speed on array's manipulation.
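(For illustration, assuming GCC: something along these lines keeps -O3 but switches auto-vectorization off; the file names are just placeholders.)
gcc -O3 -fno-tree-vectorize -fno-tree-slp-vectorize -o sumtest sumtest.c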
@UntouchedWagons
@UntouchedWagons 7 жыл бұрын
gcc can unroll loops too, it's not a clang-specific feature.
@HauppiLP
@HauppiLP 7 жыл бұрын
Every decent compiler can.
@recklessroges
@recklessroges 7 жыл бұрын
for(p=0; p
@naikrovek
@naikrovek 7 жыл бұрын
He wasn't talking about unrolling a loop, he was talking about SIMD.
@nicholaswallingford3613
@nicholaswallingford3613 6 жыл бұрын
When GCC switched to the GPLv3 license, Apple forked GCC. I believe the GCC version shipped on OSX is something like 4.2, mainline GCC is up to 7.1. This was in like 2006 or something. GCC has been significantly improved since then. So it's not unlikely that his version of GCC won't unroll and vectorize the code, even though real GCC will.
@alexandrupana6037
@alexandrupana6037 6 жыл бұрын
Reckless Roges Fails to compile. You smartass...
@linardssmeiksts6558
@linardssmeiksts6558 2 жыл бұрын
Nice test and comprehensive explanation as always. Love your videos. 😊
@clementfoyer7785
@clementfoyer7785 7 жыл бұрын
If I may, in this case, it may have been interesting to also compare between the array of structures you had and a structure of arrays (one for p, the other for q) to enforce the vectorization of your summing instructions, and optimize the cache use. Although, as you used the -O3 optimization, may it be possible that the q part of the structures had been opt-out ?
@Madsy9
@Madsy9 7 жыл бұрын
I can understand that the benchmark had to be trivial in order to run on the Atari ST, but I think simply summing integer elements in a loop gives a bad indication on the performance differences between linked-lists and arrays/vectors on modern architectures. If the pointers in a linked list point to a larger data structure, it can absolutely trash the cache and cause big performance hits. It is even worse if the elements in the linked list are scattered around in memory. With three levels of cache which we have now, getting cache misses gives a hefty penalty. Using arrays/vectors is almost always the right choice nowadays when performance is important, even if you don't do random-access. The only exception I can think of is for huge data sets with frequent insertions and removals.
@madeso
@madeso 7 жыл бұрын
Madsy9 worth noting is that if you don't care about the order of items in your array, both insert and remove operations are O(1)
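(A small sketch of that swap-with-last trick, with made-up names: removal from an unordered array in O(1), at the cost of not preserving element order.)
#include <stddef.h>
void remove_unordered(int *a, size_t *len, size_t i) {
    a[i] = a[*len - 1];   /* overwrite the removed slot with the last element */
    (*len)--;             /* shrink; order is not preserved */
}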
@jellorelic
@jellorelic 7 жыл бұрын
The point wasn't to address a typical use case. The point was to illustrate that the fundamental architecture underlying your code can in fact cause things to behave in unexpected ways. Yes, most of the time X may be true.. but *assuming* it's a universal case can cause problems.
@markallen97
@markallen97 7 жыл бұрын
It depends on your use case, I often end up in cases where an unsorted array and a bit of brute force is faster than a tree or hash based set when your data size is tiny (for example under 10) but you're doing a huge number of lookups.
@bigbenhebdomadarius6252
@bigbenhebdomadarius6252 7 жыл бұрын
+Madsy9 Sorting is another area where linked lists can be faster, especially if the record size is large, since sorting a list only requires altering the pointers, not moving entire records around.
@SyntheToonz
@SyntheToonz 6 жыл бұрын
Possibly the compiler is doing something less than optimal when making the code for an array reference. While in the Air Force in the early 90s we had a problem writing a document paginator on a worst case 5MB report file. (Ultrix workstations). The program sat there and churned for over 30 seconds while the paginator routine determined the starting location of every page in 17 different page formats. Users found this waiting intolerable. I read an article then about how C compilers may generate a multiplication to reference array elements, and that traversing an array by pointer reference instead of subscripts would not include this extra math behind the scenes and should run faster. I tried this idea out on my Amiga 3000 and the paginator using pointers ran in 12 seconds for an equivalent 5MB text file. I was pretty sure the 25MHz 030 should not be a lot faster than the Ultrix boxes at work. The recoded pointer version of the paginator ran in 5 seconds on the ultrix workstations. Happy users.
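(For anyone curious what that change looks like, a minimal sketch - not the original paginator - of the two traversal styles. The struct and names are invented; on a compiler that doesn't strength-reduce the index arithmetic, the pointer form avoids a multiply per element.)
#include <stddef.h>
struct item { int p; int q; };
/* Subscript form: address = base + i * sizeof(struct item) on every iteration. */
long sum_by_index(const struct item *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i].p;
    return total;
}
/* Pointer form: just bump the pointer by one element each iteration. */
long sum_by_pointer(const struct item *a, size_t n) {
    long total = 0;
    for (const struct item *end = a + n; a < end; a++)
        total += a->p;
    return total;
}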
@cyberbadger
@cyberbadger 6 жыл бұрын
It's always a good idea to have an understanding of what's going on under the hood.
@watashiwahatchi
@watashiwahatchi 5 жыл бұрын
Loved this video! Thanks guys. Great takeaway: "if you're not sure, come up with some tests and collect the real data". This echoes Mike Acton's 'truths': 1. Hardware is the platform 2. Design around the data.
@xteuk
@xteuk 7 жыл бұрын
And I thought I knew how it's working. This was a brilliant answer to this "trivial" question. I learned a lot, which is some-divided-by-zero from what I expected first 😀 Bravo et merci
@123123raoul
@123123raoul 7 жыл бұрын
Taking the median would give more accurate results. The OS scheduler might interrupt the program sometimes resulting in distorted results when taking the average.
@softwarelivre2389
@softwarelivre2389 2 жыл бұрын
True
@Chaisz3r0
@Chaisz3r0 6 жыл бұрын
So in all fairness you'd need to enable loop unrolling in gcc (experimental) and compare again. That should significantly speed up the array-based RPi compilation.
@danielgrace7887
@danielgrace7887 7 жыл бұрын
I just learnt about "-O3" setting in GCC. Nice one, thanks!
@anshul5243
@anshul5243 4 жыл бұрын
FWIW, -O3 has a tendency to break code, -O2 is a safer, albeit slightly slower alternative
@glowingone1774
@glowingone1774 3 жыл бұрын
@@anshul5243 ozone mode bad
@babel_
@babel_ 6 жыл бұрын
Can Dr Bagley do another shootout, this time between lookup tables and live calculation? That would be amazing to see how modern processors differ from older ones.
@ResandOuies
@ResandOuies 7 жыл бұрын
But in the end the array was faster on all the CPU with a cache, which should be all CPUs now. Right?
@jellorelic
@jellorelic 7 жыл бұрын
Generally, yes... across most use cases and particularly with modern cpu/compiler/memory architecture arrays will be faster than linked lists. But the point of the video is that it's not a fundamental given. Even in cases where you can math out the number of steps involved in the logic, your real-world systems that the code runs on can affect the actual results.
@bruterasta
@bruterasta 7 жыл бұрын
Even CPUs considered obsolete have caches. CPUs without one are old enough to consider them as antiques.
@rylaczero3740
@rylaczero3740 7 жыл бұрын
Correct me if I am wrong: linked lists are slower but memory efficient - slower than arrays as they don't provide random access - but heaps can overcome all that stuff. Is there anything more to it in the video?
@n00dle_king
@n00dle_king 7 жыл бұрын
Not all modern processors have caches. For instance, most Arduinos do not have a cache.
@jakubivanicky3413
@jakubivanicky3413 7 жыл бұрын
Linked lists are not memory efficient, as they require more data to be kept (every node keeps its key and the addresses of the prev. and next nodes). However, it is easier to add or remove nodes from the middle of the list. If you search data structures on Wikipedia, I think they list both memory efficiency and time efficiency for different operations
@TimJSwan
@TimJSwan 4 жыл бұрын
"There's technically a cache but not really but essentially no but there's something like a prefetch"
@jpphoton
@jpphoton 6 жыл бұрын
Excellent break down. Another gem.
@OrbitalCookie
@OrbitalCookie 2 жыл бұрын
If you need linked lists on modern architecture store them in arrays, and instead of pointers use indices. It can be faster than linked list, but of course you still need to measure that. General approach for fast code is to put all required data relatively nearby in memory. Allocating the big array chunk kind-of ensures that (lol there are OS caveats still), as long as elements aren't the pointers.
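(A minimal sketch of that index-linked idea, with made-up names: nodes live in one array and 'next' is an index rather than a pointer, with -1 marking the end of the list.)
struct inode { int value; int next; };
long sum_ilist(const struct inode *nodes, int head) {
    long total = 0;
    for (int i = head; i != -1; i = nodes[i].next)   /* follow indices instead of pointers */
        total += nodes[i].value;
    return total;
}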
@romanemul1
@romanemul1 6 жыл бұрын
a question which interest me most now. Is the floppy HD format or DD ?
@finkelmana
@finkelmana 6 жыл бұрын
"DD" with a hole drilled to make HD
@gwenynorisu6883
@gwenynorisu6883 5 жыл бұрын
Why would you do that, as it's being used in a machine that, odds on, only has a double density drive? (I know HD was at least a later option for the MSTe, but STs in the main have 720k drives) As well as bothering with that trick when proper HD floppies remain cheap (heck, pretty much free these days) and plentiful, and have formulation that's a better match for the drive characteristics? PC drives including USB ones can still happily read and write 720k discs, btw.
@mido112k6
@mido112k6 4 жыл бұрын
Man, I really wish I can switch my major rn.. I've always had smth for computers
@AJSquirrel53
@AJSquirrel53 3 жыл бұрын
Do it
@alcesmir
@alcesmir 7 жыл бұрын
A huge advantage of arrays is as mentioned cache friendliness (data locality). But more than that newer CPUs also include SIMD instructions to perform operations on a lot of data as once, say adding 4 or 8 ints in parallel. Adding in this the array will beat a linked list even worse on modern machines. I would try compiling and running with -march=native (along with -O3), which will allow for CPU specific optimizations. Or use some intrinsics to manually add the needed SSE or AVX instructions.
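(Rough illustration only - an SSE2 intrinsics sketch, not the video's code. It assumes a plain int array whose length is a multiple of four and ignores overflow in the 32-bit lanes.)
#include <emmintrin.h>
#include <stddef.h>
long sum_sse2(const int *a, size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 4)                  /* four ints per iteration */
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(a + i)));
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);           /* spill lanes, then sum them */
    return (long)lanes[0] + lanes[1] + lanes[2] + lanes[3];
}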
@mykilpee
@mykilpee 6 жыл бұрын
I'm curious about the Atari: since it has to fetch so much from memory with the array, and since arrays are generally stored in a contiguous block of memory, you could create a pointer to the beginning of the array (or the end) and increment the pointer through the array to add it up - would that speed it up, match it, or slow it down? It's a question of how pointers are used and stored on that system and how the compiler handles it.
@lasersbee
@lasersbee 6 жыл бұрын
9:03 I thought I was the only one still using floppies for some applications... ;) 21:40 I would have liked to see the results on an Intel Windows computer.
@chrissxMedia
@chrissxMedia 5 жыл бұрын
well...you should really re-run it with c++11 iterative for-loops, because these have a kinda big impact on the cpu-performance of arrays
@pleasedontwatchthese9593
@pleasedontwatchthese9593 6 жыл бұрын
This video is great. Make more of these!
@DLCaster
@DLCaster 6 жыл бұрын
Well, yes, test and measure things for sure-at some point. You could also read the book on the processor too. The Intel optimization manual is very helpful if you are really trying to get good performance in real-world situations on x86 processors.
@Deveron4
@Deveron4 7 жыл бұрын
It'd be great if Dr. Bagley was recorded in front of a darker background. The over exposed window background is rather distracting.
@MasterNeiXD
@MasterNeiXD 7 жыл бұрын
Damodara Kovie Maybe it's part of who he is. I like to think it is intentional, after all they're computer scientists; they know how to use a damn DSLR.
@Deveron4
@Deveron4 7 жыл бұрын
Bryan Keller Fair enough, but I'm appealing to Sean Riley, the videographer and editor.
@NicholasLeader0
@NicholasLeader0 4 жыл бұрын
Just needs some front lighting
@tengelgeer
@tengelgeer 7 жыл бұрын
8:15 "But then, I'm a C programmer." xD
@DarkLorgan
@DarkLorgan 4 жыл бұрын
Awesome comparison, great video!
@Chriva
@Chriva 6 жыл бұрын
I'm certain Mr. Bagley already knows about this, but on 68k specifically an array will ALWAYS be faster than a linked list when coding in assembly, since you can predecrement/postincrement the address pointer (as in, direction doesn't matter), i.e.:
    lea.l   "address of array", a0
    clr.l   d0
    clr.l   d1
    move.l  #1048576, d2    ; number of iterations
Loop:
    move.b  (a0)+, d1
    add.l   d1, d0
    subq.l  #1, d2
    bne.b   Loop
d0 will contain a checksum32. You could also add 1048576 to the address of the array and do "move.b -(a0), d1" inside the loop to read it in reverse. Speed should be more or less the same. With a linked list you also have to fetch a new address for every iteration, so there's no chance that is faster on these specifically :)
@PEGuyMadison
@PEGuyMadison 5 жыл бұрын
Actually... you may want to look at the index calculation time on the array on the older machine. Some compilers didn't optimize for powers of two and used a multiply for the index. This was pretty painful back in the day.. remember the ARM processor had a serial multiplier on some models but a 5 or 6 clock multiply was also a possibility. People forget this today, typically its 1 clock. IIRC the 80486 had a 3 clock integer multiply
@neres909
@neres909 7 жыл бұрын
Very insightful benchmark. At first I was worried that you wouldn't mention the cache's role in it, but you did. The only thing that's missing is that this is a synthetic test, and real-life applications would show far different results. The reason is not only those cache misses that you mentioned, but also that on x64 a linked list node would be 2x greater in size than an array element. This has consequences for how much memory will be prefetched into cache and for cache pollution, and thus greater performance hits during execution. Also, as this is a synthetic test, all memory for the linked list is allocated one node after another anyway, so there are fewer cache misses than in the more likely real-life scenario where nodes would be scattered through memory and a single fetch would likely be a miss.
@jellorelic
@jellorelic 7 жыл бұрын
Everything you say here is true, but beside the point. He's trying to teach a more fundamental fact about CompSci in general - just because X is faster/better/whatever on paper doesn't mean that will always turn out to be true when the code hits metal. While it's true in modern cases that most programs can ignore the underlying architecture in most cases, sometimes it can result in unexpected behaviors.
@tljstewart
@tljstewart 5 жыл бұрын
How were you able to compile on your iMac's x86_64 CPU and run on the Atari's Motorola chip?
@jakejakeboom
@jakejakeboom 7 жыл бұрын
I remember calculating things like average memory access time (AMAT) in my computer architecture course - kinda boring but enlightening. Shows how big a deal a fast+big cache can be for memory-intensive programs. Loop unrolling also would be a factor, although each of the computers tested is probably taking advantage of that given the modern compiler with -O3 optimization.
@nbpierson
@nbpierson 6 жыл бұрын
Isn't the Atari array code running slow vs the linked list because the index-to-memory-address calculation uses multiplication? It's more related to C array syntax and the math occurring because of the index parameter. I expect this (multiplication) is a larger factor than memory access and the lack of caches on the old Motorola 68000 Atari system. Linked list code will not involve multiplication. Did you look at the assembly code generated?
@josephcote6120
@josephcote6120 6 жыл бұрын
Did you notice that it was an array of objects, and not an array of primitives? Makes the array testing come out far, far worse than it otherwise would.
@trevinbeattie4888
@trevinbeattie4888 5 жыл бұрын
Joseph Cote, This is C, not Java. They're called "structs". ;)
@josephcote6120
@josephcote6120 5 жыл бұрын
So what? My point still stands. Arrays of pointers will always be less efficient than an array of primitives.
@MrMrsirr
@MrMrsirr 5 жыл бұрын
Joseph Cote While I don't care about the whole "object" vs "struct" thing, it is important to note that, unlike in Java, data structures in C aren't referenced by pointers unless the code specifies that they are. In C (and other languages), any structure, regardless of its type, can be accessed either directly or by pointer. In this case, the array was a contiguous sequence of structures, no pointers involved other than the pointer to the array on the heap.
@josephcote6120
@josephcote6120 5 жыл бұрын
Let me make my point even more plain. Linked list. Pointer to the head. Each item has a data and the pointer to the next "thing". For each item in the list you have to get the address of the item either from the head pointer or the previous item. Memory access to get the data, memory access to get the next pointer. Perhaps even more processing to resolve the actual addresses within the "thing." Primitive data array: Using the base address of the array and the index, calculate the address to read (very fast all in cpu) access memory to retrieve data, go to next item. For each item ONE address calculation and done. This is my ONLY point: Arrays of primitive values will ALWAYS be much more efficient in tests like this than any kind of fooling around with pointers and objects or whatever you want to call those "things." The video compared a linked list of "things" to an array of "things", NOT a big difference.
@utl94
@utl94 6 жыл бұрын
8:14 "Confusing variable names, but then, I'm a C-programmer." Haha!
@zxcxcsdfasdfasf
@zxcxcsdfasdfasf 7 жыл бұрын
Great deep dive into what I thought would be a trivial problem !
@K3rbalSpace
@K3rbalSpace 3 жыл бұрын
Loving the Atari MegaST in the background!
@SHUBHAMGI
@SHUBHAMGI 7 жыл бұрын
finally, something academic 😅
@thechubbypanda
@thechubbypanda 5 жыл бұрын
Did he actually say "doog yrev"!? lmao
@nemplayer1776
@nemplayer1776 4 жыл бұрын
Did eeh sey
@revenevan11
@revenevan11 3 жыл бұрын
That's what it was!!!
@TristanBailey
@TristanBailey 6 жыл бұрын
can you do a video some time on why you have so much dot matrix printer paper to draw on still? Or just unused left overs?
@PieroUlloa
@PieroUlloa 5 жыл бұрын
You could also add that on the iMac the array could fit entirely in cache, as opposed to the linked list, and optimizing the code for loop unrolling gave it all the speed boost. So theoretically it might never have accessed main memory except to load the program from disk when running the array version.
@FalcoGer
@FalcoGer 4 жыл бұрын
It's quite obvious. Linked lists are of complexity O(n) for lookup, insertion or deletion. Arrays are of complexity O(1) for the same thing. you take the starting address and add (n * sizeOfElem). Arrays are fixed size, linked list are of variable size and more flexible. At the same time memory allocation is very slow. if you put your array on the stack it's again literally a single cpu instruction (add to esp). running over all the elements still would be faster in an array still, since in a linked list, any element might be anywhere in memory, and thus it might not be in the cache in a sufficiently large list while the stack is almost certainly on the cache. also you need to dereference a pointer, meaning looking at the memory address to fetch the memory address at which the next element is, which is 2 memory accesses, while the array is again literally ESP + offset. Also linked lists are larger in memory, since you have to store that extra bit of information.
@elkeospert9188
@elkeospert9188 2 жыл бұрын
"It's quite obvious. Linked lists are of complexity O(n) for lookup, insertion or deletion. Arrays are of complexity O(1) for the same thing." No - Arraya only have the complexity O(1) for lookups - when it comes to inserts and deletions in average half of data in the array has to be moved/copied so complexity is O(N) for inserts/deletions in arrays. Arrays are (compared to linked lists) faster in random read but slower in inserting or removing elements. They are also faster for sequential reads where all elements starting from the first one have to be read when optimized by modern compiler. Normally you need the calculation startingAddress + index * sizeOfElement when you want to access the element in the Array with the given index. But if you have a loop which increases the index every cycle by one a modern compiler can get rid of that muplication by introduing a variable "offset" which is set to startingAdress before the loop and to which sizeOfElement is added after each cycle in the loop..
@BREDCLAN
@BREDCLAN 7 жыл бұрын
insert word association programming humor here
@oisnowy5368
@oisnowy5368 6 жыл бұрын
That could probably be done faster with a linked list than with an array.
@newvocabulary
@newvocabulary 6 жыл бұрын
THC IS IN MY P
@psjw12
@psjw12 7 жыл бұрын
Great video. The information was really useful and informative
@Ebonmourn
@Ebonmourn 6 жыл бұрын
Also, if you decide to manage the memory for the linked list yourself, you could allocate a block of memory in which the linked list gets stored, with new blocks getting allocated when space runs out. It would mean you get fewer cache misses and it would run a bit faster
@charlescox290
@charlescox290 6 жыл бұрын
"The compiler is being clever", which is why you don't optimize code to compare algorithm speeds. Also, this is just traversal. How about speed testing insertion in the middle, which is what linked lists excel at?
@Zebsy
@Zebsy 7 жыл бұрын
Dear me, this is dodgy really. The choice between data structures should be made depending on what your use case is, e.g. is random access by index needed? Are inserts needed? Is iteration needed? I found this video interesting in some ways, but it fails to address the important points of how to choose a data structure
@DrewryPope
@DrewryPope 7 жыл бұрын
My favorite video in a bit. Very nice!
@panda4247
@panda4247 6 жыл бұрын
why is backwards array faster than array on Atari? I'd say that it comes down to the line for (i=0, i
@JimLeonard
@JimLeonard 6 жыл бұрын
Extremely disappointed this was a comparison of a linear scan of both data structures. What about arbitrary insertion and deletion? Those are some of the the use cases linked lists were invented to address (no pun intended).
@gasquakestudios
@gasquakestudios 6 жыл бұрын
Couldn't agree more. The entire video I couldn't stop thinking about how this comparison is unfair.
@JimLeonard
@JimLeonard 6 жыл бұрын
On top of that, something was seriously wrong with the compiler implementation on the Atari ST, that a linear forward scan of an array was somehow not faster than a linked list. And noting the speed of list/array creation -- something that is typically done only once -- was sort of pointless.
@Sonny_McMacsson
@Sonny_McMacsson 6 жыл бұрын
Arbitrary insertion and deletion of large structures (100's or 1000's of bytes) is going to be slow in an array. Then there's making a list out of items that may not be contiguously allocated, like a list of request structures for I/O coming from different threads that they allocate themselves, possibly on their own stacks. Of course, for the former, you can create an array of pointers to the big structures and move those pointers around the array if that suits you.
@Jimbo1920
@Jimbo1920 6 жыл бұрын
I found this disappointing too. It's billed as a comparison of data structures and ends up being a comparison of hardware architectures. When I was in school we used the concept of an abstract machine, and everyone learned the pros and cons of arrays and lists for sequential access, random access, inserts, deletes and moves. If a linked list beats an array for sequential or random access, something is definitely screwed up.
@ReedHarston
@ReedHarston 5 жыл бұрын
While I definitely agree that the choice really comes down to what you need the structure for, I found this information important as well, as it explains another aspect of choosing a structure that some people might not think about and that isn't considered as often as big-O notation. I guarantee you that if he had done the video we had expected, then there would be people complaining in the comments saying that he didn't consider how they run on different architectures. Besides, at 30 minutes this video was long enough as is without comparing all the different use cases (and the previous video did mention some of it anyway).
@MACIEJ454545
@MACIEJ454545 7 жыл бұрын
How about sorting? When you have big enough objects in an array/list wouldn't sorting a list be quicker when you can manipulate pointers instead of moving whole objects around in an array?
@schetefan24
@schetefan24 7 жыл бұрын
For BIG structs this should be true, but first you must come up with an efficient in-place algorithm where you need no random access. I'm not quite sure whether you can do selection sort or something comparable with a singly linked list (this example should work with doubly linked lists); with quicksort this should be even harder...
@MACIEJ454545
@MACIEJ454545 7 жыл бұрын
skeliton11211 I was meant to add that to the question to be honest. Of course an array of pointers would be much quicker. When you have a warehouse full of containers you make a checklist where everything is and organise it and not move all the containers around.
@MACIEJ454545
@MACIEJ454545 7 жыл бұрын
Stefan Gräber I realised that later. sorting one way list might be problematic since you have to relink the neighbouring objects and insert the object in its new place. It results in a far more pointer manipulation than an array of pointers. My question was more about the "pure" version of a structure
@adisander
@adisander 7 жыл бұрын
I would imagine it depends on what you're sorting, and what sorting algorithm you use. First, on the data being sorted: Arrays have to swap the contents, so the time is dependent on the size of that content. Linked lists have to swap the pointers, so constant. These two are equal if the data being sorted is the size (or smaller) of a pointer, otherwise the linked list (I must assume, I should point out that I'm not an expert) will win out. As for how you sort: Insertion sort moves backwards along the array/list, and so is an outright no for linked lists (unless, of course, you use a doubly linked list, i.e. a linked list linked in both directions, which would have even greater memory use than a single linked list and would require two pointer swaps rather than just one for every data swap). Heap sort relies on random access, so would take a hit on linked lists. Merge sort and quick sort (properly implemented) work well on both arrays and linked lists. Radix sort I'm not sure about. On the one hand, from what I recall, what you're sorting is specifically the sort of thing that would benefit from linked lists only needing to swap pointers rather than data. On the other, I would also think these things to be things where the array would only contain pointers to them, for example strings (assuming you aren't accepting the restriction of all strings being the same maximum length, and probably all close to it (otherwise you're wasting lots of space)). Provided you keep in mind which sorting algorithms to use and which not, and that your array elements aren't unreasonably large, the difference is (probably) negligible: sorting an array/list from scratch is something you probably won't be doing that often, or with data large enough, for it to really matter which of the two you use, at least in comparison to other operations you're likely to use on the same data. Lastly, I suspect that if you're sorting an array or list, you're probably going to want to do other things to it that linked lists are better suited to, that is inserting and removing items. That bit however is purely subjective and I don't have anything to really back that up.
@angeldude101
@angeldude101 7 жыл бұрын
Quicksort can be done in place so there's no need to do large copies. Even still, moving small structures is still often faster than moving pointers in a linked list, again, because of the cache which can preform the move operations in bulk.
@jorgerochagualtieri
@jorgerochagualtieri 7 жыл бұрын
Really good stuff. Congrats.
@Clairvoire
@Clairvoire 6 жыл бұрын
If the Linked List stores its data like an Array, it'll likely be about as fast. But in practice, Linked Lists will have insertions, deletions, concatenations, all of which fragment this data and create cache miss opportunities. Context matters in this case, which is why it's best to profile these things in the situation you need them in
@MrCOPYPASTE
@MrCOPYPASTE 5 жыл бұрын
MEMORY IS AN ARRAY!!!!!!!!!
@k1ngjulien_
@k1ngjulien_ 7 жыл бұрын
Wouldn't it have been faster without printing stuff on the screen inbetween? I know that many of my c and c++ programs run much quicker if you leave out the print statement.
@heyandy889
@heyandy889 6 жыл бұрын
that's a good point, writing to the screen can take some time. however both versions print to the screen, so they are subject to the same slowness. it's like if you had two people run a race through syrup - you can still tell who's faster than who.
@statphantom
@statphantom 6 жыл бұрын
yes, however the percentage difference in speed is exactly the same, as they are printing the same length (minus one or two chars) the same number of times, so it won't affect the comparison
@TheMaginor
@TheMaginor 6 жыл бұрын
They did the prints outside the timings, so it would not affect the recorded time.
@gwenynorisu6883
@gwenynorisu6883 5 жыл бұрын
If you're finding that printing half a line of low rez text to the console makes a significant difference to the loop speed of your otherwise ~850ms routine, even on an ST, then you've got bigger problems than whether or not to print said text. Like, realising that you're actually stuck in a runtime interpreter for a particularly shoddy variety of BASIC when you thought you were making compiled C programs. If they're routines that take closer to 850us, then, sure, maybe it'll be worth it. But the text only prints after the loop is done (which is how it knows how long said loop took) and before the start of the next one, so it's not going to affect the reported results any, and in most cases the difference between printing or not printing 100 half-lines of text will save you, ooh, maybe one video frame's worth of runtime?
@NickNackGus
@NickNackGus 7 жыл бұрын
Things become a bit more interesting for embedded real-time systems - when finding a memory location to allocate takes too long, so you have to allocate all of your memory in advance, or write your own allocation algorithm. This shouldn't make a significant difference for reading the array/linked list, but should affect creation time.
@markallen97
@markallen97 7 жыл бұрын
Makes me curious - on the Atari How would it behave if rather than indexing the array each time, you increment/decrement a pointer to walk along the array, avoiding the extra instructions to produce a memory address. My assumption would be better than any of the tests done, with the caveat that it only make sense if you do indeed want to walk the entire array for your problem. Also, is there a reason forward iteration was slightly faster than reverse on the Pi/iMac? My best guess would be cache prediction preferring to go forwards (which I've seen on console hardware before), but that's a hell of a grab-at-straws!
@Gordon972
@Gordon972 6 жыл бұрын
Steven: *uses VSCode* me: MAH BOIiii
@MisterFanwank
@MisterFanwank 6 жыл бұрын
Being a C programmer is no excuse for confusing or non-descriptive variable names. As much as possible variable names should explain the purpose of their data unambiguously. I recognize there are times when this is not easy, but with modern IDEs there is no justification for bad variable names. Don't ignore the power of your tools.
@abdullahnoman8618
@abdullahnoman8618 6 жыл бұрын
He is a computer scientist not an industrial programmer
@LudwigvanBeethoven2
@LudwigvanBeethoven2 6 жыл бұрын
Ok so i will name it "Pointer_that_Points_To_Next_Element_In_The_Array_List"
@sangramjitchakraborty7845
@sangramjitchakraborty7845 6 жыл бұрын
♫♪Ludwig van Beethoven♪♫ Or you could just name it nextpointer#
@CYBERBEAST21
@CYBERBEAST21 6 жыл бұрын
Butthurt.
@checka9029
@checka9029 5 жыл бұрын
He said "asleep" programmer, not a C programmer, so he had an excuse
@cthree87
@cthree87 6 жыл бұрын
if you used a while loop and pointer arithmetic to index into the array instead of using a for loop and accessing it by subscript would that change the performance of the array addition on the atari?
@joshualandry3160
@joshualandry3160 6 жыл бұрын
No because the compiler is going to optimize it away. This is the generic caution given to new CS students that i++ and i=i+1 are actually identical to the compiler. 99% of students don't absorb this, so there are many who will insist otherwise. Compilers these days are very good at finding optimizations.
@obvioustruth
@obvioustruth 5 жыл бұрын
Excellent video!!! I had Atari Falcon 030. It was extraordinary at its time. :)
@Zadster
@Zadster 7 жыл бұрын
Doogy rev, doogy rev. Gronda gronda Rangdo!
@philp4684
@philp4684 7 жыл бұрын
wopsim
@hoisinholdup
@hoisinholdup 7 жыл бұрын
VS Code master race reporting in
@CoituselPingo
@CoituselPingo 6 жыл бұрын
i think is C
@hoisinholdup
@hoisinholdup 6 жыл бұрын
Yes, but the text editor he was using to write it on the Mac was VS Code
@cidercreekranch
@cidercreekranch 2 жыл бұрын
For cached architectures, it would be interesting to see the performance difference with the linked list version if you malloc'ed a single contiguous block of memory and then 'sub-allocated' the structures in the memory block.
@filker0
@filker0 4 жыл бұрын
I'm 2 years too late, but I think that there is one change to the linked list version that will speed it up on the cached processors. I assume that the elements of the linked list are added to the beginning as they are generated. As the summing loop walks the list, the memory address for each successive element is decreasing, making it more difficult for the CPU to predict and do speculative prefetch. If the newly allocated elements are appended, the general direction is toward higher addresses, which are easier to predict for the prefetch mechanism. I worked on a commercial C compiler for the Atari ST back in the 1980s, and benchmarks were important to the success of the product. Optimization strategies were weighted toward better performance on things like the Dhrystone with the knowledge that there was no cache.
@DrSteveBagley
@DrSteveBagley 4 жыл бұрын
Oooh, which C compiler was that?
@filker0
@filker0 4 жыл бұрын
@@DrSteveBagley mwc (Mark Williams C) for the Atari ST.
@clarkd1955
@clarkd1955 7 жыл бұрын
Benchmarking on an Atari isn't particularly relevant, as modern computers generally use Intel chips that are very cache-heavy. Many tests have been done on these computers showing that almost all other optimizations amount to next to nothing compared to the cache effects. The bottom line, as illustrated by many, is that arrays are almost always much faster than linked lists, and therefore linked lists should only be used in special cases. Arrays should always be considered the default. My first computer was an 8080 running at 1 megahertz (linked lists were probably fine on that computer), but that isn't very relevant to what works on modern computers. There is no argument: don't use linked lists.
@pleasedontwatchthese9593
@pleasedontwatchthese9593 6 жыл бұрын
To be fair, that was not the point of the video. The point was that testing is king.
@Anfros.
@Anfros. 6 жыл бұрын
Modern computers are not "generally" using Intel chips. Intel chips are dominant for PCs, but PCs do not represent a majority of even consumer computer systems. As an example, take the 3 big gaming consoles of the 7th generation (Xbox 360, Wii and PS3). Between them they have over 260 million units sold, and none of them use Intel chipsets. Furthermore, most cellphones today are running on Qualcomm chipsets. That's not even mentioning the non-consumer market. There the Atari test is absolutely relevant, as many embedded chipsets do not have a cache.
@satannstuff
@satannstuff 6 жыл бұрын
Pretty sure the majority are ARM chips. That's what you're going to find in just about anything that doesn't need a high performance CPU and in some things that do, like most smartphones or that Raspberry Pi for instance.
@gwenynorisu6883
@gwenynorisu6883 5 жыл бұрын
Maybe if you're using something dismal like an Atom or an embedded/microcontroller core based off a 386SX or what-have-you, but I don't think that was the point of the test. But even if it was, you can use the 68000 in the ST as a prime example of such; embedded CPUs based on the 68k core (e.g. the ColdFire) were extremely popular up until a few years ago, when ARMs and ATmegas started offering greater power and lower energy consumption alongside a lower price tag. So we already have the answer - sophisticated, large-instruction-set, large-cache chips prefer regular arrays, and that includes the higher-end ARMs. Simpler, cacheless chips prefer linked lists, unless they also display unusual behaviour like being over 50% faster when traversing an array from the bottom up rather than top down.
@rlee0001
@rlee0001 7 жыл бұрын
Your original test is invalid. You unnecessarily used a different type of loop, with the loop used for iterating over the array doing a bunch of useless loop-maintaining arithmetic and comparisons that any competent programmer would have either removed or accounted for in the benchmark. Pro tip: summing the elements in an array is in fact faster, since it reads half the amount of data from memory for each access (only the number being summed, and no pointer). Since the x86 has more registers, the optimizing compiler had better luck keeping your loop counter in a register and not in memory, whereas on the 68k the loop counter was stored as a local variable (on the stack in memory), and you access it 3 times per iteration: once to compare it, once to read it for incrementing, and once to write it for incrementing. That's completely unfair to the array implementation.
@vytah
@vytah 6 жыл бұрын
The 68k has 8 data registers and 7 usable address registers; that should be enough for all three programs. The operation that slowed down the forward array iteration was probably comparing the index with the constant, which requires one more instruction than comparing it to zero.
@heyandy889
@heyandy889 6 жыл бұрын
it's not "unfair." the point is that the speed depends on the architecture's bottlenecks. imagine we were dealing with HDD instead of RAM. you would want a data structure that minimized reads from the disk because it's so much slower than memory. On the other hand with an SSD, you are not penalized so much for excessive reads and writes. Therefore you can use different data structures and algorithms without incurring the performance hit of the HDD.
@rlee0001
@rlee0001 6 жыл бұрын
That's not how Big O works.
@iljuro
@iljuro 6 жыл бұрын
The test is unfair and should be invalid, not because the 68k is inferior, but because the for-loop uses more complex mechanics and is thus significantly slower than the while-loop. The ascending for-loop is the slowest because it uses loop-counter mechanics and has to do an active end-of-loop check every iteration. The descending for-loop is faster because it uses the very nice decrement-and-branch-all-in-one instruction and uses the decrement's result to compare against zero instead of needing a separate compare instruction, but it still has the loop-counter mechanics to deal with. The linked list is fastest because it just loads the next address and checks if it's zero.
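For reference, the descending pattern being described looks roughly like this in C; whether a given 68k compiler really maps it onto a DBRA-style decrement-and-branch depends on the compiler, so treat this purely as an illustration:

#include <stddef.h>

/* Counting down lets the compiler test the decremented counter directly
   (compare against zero, or use the 68000's decrement-and-branch) instead
   of comparing the index against the array length on every iteration. */
long sum_descending(const int *a, size_t n)
{
    long total = 0;
    size_t i = n;
    while (i-- > 0)        /* post-decrement: a[i] sees n-1 down to 0 */
        total += a[i];
    return total;
}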
@swiftfox3461
@swiftfox3461 7 жыл бұрын
Can you do Array Lists as well? I feel that often they are a good compromise, especially when you later realise that the list you thought would remain immutable actually needs an ability to add and delete its items later on.
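For reference, a minimal growable-array ("array list") sketch in C; the names are illustrative, not from the video:

#include <stdlib.h>

/* Contiguous storage like a plain array, but push grows the capacity
   geometrically, so appending is amortised O(1). */
struct vec {
    int    *data;
    size_t  len;
    size_t  cap;
};

int vec_push(struct vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;             /* out of memory; old data still valid */
        v->data = p;
        v->cap  = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Usage: struct vec v = {0}; vec_push(&v, 42); ... free(v.data); */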
@auxchar
@auxchar 6 жыл бұрын
Well, there's another problem here. Is your routine that measures time between each add operation pushing anything out of cache? I'd like to see the test run again on the iMac, but measuring the total time to sum everything up, as opposed to the average time of each operation.
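A rough sketch of that kind of one-shot measurement on a POSIX system; sum_list() here is a hypothetical stand-in for whichever traversal is being timed, not a function from the video's code:

#include <stdio.h>
#include <time.h>

extern long sum_list(void);   /* hypothetical: the summation under test */

int main(void)
{
    struct timespec t0, t1;

    /* Time the whole summation once, so the measurement code itself stays
       out of the inner loop (and out of the cache). */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long total = sum_list();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("sum = %ld, elapsed = %.6f s\n", total, elapsed);
    return 0;
}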
@mangethegamer
@mangethegamer 7 жыл бұрын
I don't buy this. Give us a pastebin of the source code so we can see if there are any other differences between the implementations.
@saf271828
@saf271828 7 жыл бұрын
Even more, I want to see the assembly language listings of the compiled output. p[i].p is well known to be less efficient than using pointer arithmetic (even inside an array); however, some (but not all!) compilers can recognize when p[i] is used inside a loop and automatically substitute pointer-based code. While I expect this to occur for x86-based code considering gcc's/clang's relative maturity, I question whether the 68K compilers provide similar optimizations.
@recklessroges
@recklessroges 7 жыл бұрын
github or it didn't happen
@Mike-gs7eo
@Mike-gs7eo 7 жыл бұрын
"p[i].p is well known to be less efficient than using pointer arithmetic". My understanding is that p[i] is equivalent to *(p + i), why would it be any slower?
@Stephon9015
@Stephon9015 7 жыл бұрын
Mike Scott My guess is that, although equivalent, the computer still needs to compute p[i] as *(p + i).
@saf271828
@saf271828 7 жыл бұрын
Mike Scott Because it involves a multiply and an addition, versus p++ which is just a simple addition. This becomes particularly visible with multiple dereferences to p[i].
@williamweller8100
@williamweller8100 6 жыл бұрын
Hahaha, good shit. I was really, really concerned when the linked list turned out to be faster on the Atari, just because I figured an array must be faster. Huge sigh of relief with the results on the Mac and Pi; I'm not insane.
@D4no00
@D4no00 Жыл бұрын
Actually, this is a very interesting video. Many might have thought the Raspberry Pi is just an intermediate device for running tests, but it is actually running on a totally different architecture, ARM64. Many servers are migrating to ARM64 nowadays as it is much more energy efficient; however, those cycle counts are striking, given that the clock speeds of those processors are no higher than the Intel ones.
@mdmenzel
@mdmenzel 6 жыл бұрын
But when was the last optimizing compiler for the Atari ST made?