Optimizing Loops In Go | Prime Reacts

  83,579 views

ThePrimeTime

1 day ago

Comments: 185
@abdia-lb2yx 7 months ago
I ain't gonna lie, having influencer-style videos in the coding world (just entertainment, not education) helps me stay on track when I don't feel motivated to study. Keep it up, bro.
@Lewini 7 months ago
Hard agree. Keeps my algorithms in check
@cyrilemeka6987 7 months ago
Agreed.
@isthatjake 1 month ago
It'll help with work, too
@DjoumyDjoums 7 months ago
My guess on why 3 is slower than 2: index bounds checks. Compilers can very often eliminate them when you loop on the actual length of the array, not a pre-cached numerical value.
@grifon97 7 months ago
sounds like a very good guess to me! :D
@Nenad_bZmaj 7 months ago
@@chigozie123 Not true.
@Pipe0481 7 months ago
@@chigozie123 That's just completely false lol
@chigozie123 7 months ago
@@Pipe0481 I should clarify that I'm comparing it to what some C/C++ compilers do. If you look up optimizations done by the Go compiler, you will find that the only optimization even worth mentioning is the conversion of "zeroing out a slice", to a simple memclr instruction. In C, we get that for free using `memset`. Maybe also function inlining, but that's nothing given that in C, the `inline` keyword is free to use. That's just the tip of the iceberg. Compare that list to what GCC does, and you will see why it's difficult to claim that the go compiler is an optimizing compiler. The main reason why I said that go doesn't do optimizations is that in C, there is no reason why the second and third code snippets should not run at the same speed. The compiler should have optimized the len call out of the second, to make it pretty much the same as the third.
@Nenad_bZmaj 7 months ago
@@chigozie123 The for loop with the range clause is clearly more optimized by the compiler compared to standard for loop with the condition clause, as he has shown. I agree with the len call - it should have optimized it, but he measured with the time package, not benchmarked it, and did not show us standard deviations. In Go you can opt for no inline. I simply disagree with the statement that Go compiler doesn't do exhaustive optimizations, as long as they are worth doing and as long as they would not bloat the compile duration.
@ColinFox 7 months ago
Go uses symbol lifetimes to determine whether to allocate on the stack or on the heap. If the object survives after the function returns, it's on the heap. Pretty straightforward.
@susiebaka3388 7 months ago
"The more you write standard code, the more likely you're generally doing well." - ThePrimeagen
@samsqu4red 7 months ago
Found the channel just a few weeks ago. Coming from a background in Webflow, with only a grasp of HTML, CSS, and a bit of JavaScript, I see myself as a puppy. Recently, I've decided to take coding more seriously because it's interesting, and I also have a web app idea that I'm genuinely determined to bring to life. Though 90% of what the man is saying still flies right over my head, I feel like my understanding is expanding. I've never attended school, worked in-house with a team, or studied with guidance. It seems like I found a semblance of a mentor who's bombarding me with jargon I can't possibly comprehend, but ultimately is helping me see through the fog in a way. Feels like little by little I'm starting to grasp how everything is connected.. weird.
@andygarfield6529 7 months ago
There’s all sorts of things he could have done here. Changing the order of the loops can give you wild improvements because it can change the proportion of cache hits. Also loop unrolling. Also using arrays in Go (which are a real thing) instead of slices.
@CjqNslXUcM 7 months ago
the loop order is correct, it goes through the innermost slice first.
@SaHaRaSquad 7 months ago
If you do something like manual loop unrolling always benchmark it, because most likely it will actually make your code slower by blocking much better automatic optimizations.
@Nenad_bZmaj 7 months ago
In my experience, array gives me worse performance than slice. After I checked that, instead of declaring a := [5]something, I always do: a := make([]something, 5, 5)
@Zullfix 7 months ago
Agreed. His single-array test could have been much faster if coordinates that are accessed together were stored contiguously.
@Yawhatnever 7 months ago
Loop unrolling might be one of those things where you end up fighting the compiler (would need to measure), but loop tiling might have a real chance of increasing cache hits
@HairyPixels 7 months ago
The compiler writes better code than you but what if you're writing a compiler? BRAIN MELTING
@ninocraft1 7 months ago
if you write your own compiler, it's probably gonna be even slower xD
@Patterner 7 months ago
just compile the compiler with the compiler. repeat. ... profit
@OkAtheistExplainVim 7 months ago
if the compiler compiles your compiler is it really your compiler?
@HairyPixels 7 months ago
@@Patterner In Soviet Russia compiler compiles you.
@joranmulderij 7 months ago
@@OkAtheistExplainVim If you wrote the compiler that wrote the compiler then yes
@jagc2206 7 months ago
There is a big distinction between arrays and slices in Go. This video uses slices, which are pointers to arrays. A static array has a compile-time length and allows the compiler to do a lot more optimization.
@Nenad_bZmaj 7 months ago
My experience with performance is the opposite, so I stick with slices.
@eni4ever 7 months ago
blue hair no?
@dejangegic 7 months ago
Old video
@sparrow243 7 months ago
He's getting bald
@greendsnow 7 months ago
@@sparrow243 hahahahahah
@yobson 7 months ago
no
@XDarkGreyX 7 months ago
Brain no?
@philipp04 7 months ago
A year ago my prof had us write Gaussian elimination, but the algorithm we had to implement was slightly different for each of us. I had to find the inverse of a matrix using Cholesky decomposition, which is designed to be fast for large symmetric matrices. I'll cut to the chase: the code I wrote was slow as hell and I couldn't figure out what was going on, though I could tell that I had tons and tons of cache misses. The professor could tell what I did wrong immediately: the matrix was stored as a 1d array row by row, but I was traversing it columnwise. I transposed the matrix and swapped all indices in my code, and the total runtime decreased by a factor of 20 (on 8000x8000 matrices). Moral of the story: go through your data sequentially, it makes a huge difference.
@samhughes1747 7 months ago
EDIT: this is wrong. Only the comprehension-inlining bit made the boat in 3.12. Hey, Python doesn't have a loop problem anymore. There's still the "it's Python" problem, but Python 3.12 introduced loop unrolling.
@ruroruro 7 months ago
source?
@samhughes1747 7 months ago
​@@ruroruro, oof. Yeah, that didn't get approved for 3.12, and I got that mixed up with the comprehension-function inlining that did get released in 3.12.
@g.c955 7 months ago
He could do a follow-up by profiling the code to see where CPU and memory are spent, then analyze the assembly output to understand why.
@rt1517 7 months ago
The video author should have disassembled the resulting binary to see if he would be able to figure out what the hell is going on.
@swedishpsychopath8795 7 months ago
I'm planning to throw a SIMULA-67 party in 3 years to celebrate its 60 years as the first OOP language. Hate it or love it, the Norwegian SIMULA-67 invented Object Oriented Programming (OOP) with classes, inheritance, encapsulation, overloading, overriding and all the other good stuff. It is crazy to think that this paradigm shift in programming is approaching 60 (SIXTY) years already!! Press LIKE if you would like to get an early invitation!
@0dsteel 7 months ago
Slices are just references to an array with set length and capacity. They are reallocated if the values wouldn't fit said capacity. On the other hand, arrays *may* be stack allocated, if the compiler is certain they don't escape the given scope and, well, fit in the stack frame. (this is not legal advice, please ask your friendly neighbourhood debugger)
@miroslavhoudek7085 6 months ago
A guy watches a guy who doesn't know what's going on, while also not knowing what's going on, while people in chat suggest what might be going on, but they can't be all right. Good fun overall, would recommend, but don't know why.
@defeqel6537 7 months ago
With optimization videos like this, I'd always want to know the size of the data, the theoretical memory bandwidth, etc., to have some actual context. No point trying to optimize if you are already limited by the hardware (or if the optimization would need to be radically different).
@shapelessed 7 months ago
Omg Go... I only had a week with the language when I tried it, but I love the errors-as-values concept. To the point that I started using it all over my JS.
@0dsteel 7 months ago
haters gonna hate, but you mention their uncaught runtime exceptions and they'll be in shambles :)
@max3446 7 months ago
I like errors as values but Go would be so much nicer if they had monadic operations on errors (or at the very least, some sort of shorthand for `if err != nil { return err }`).
@debemdeboas 7 months ago
reading contiguous memory is generally faster because the hardware can prefetch sequential cache lines instead of chasing a new pointer (and potentially missing cache) for every index
@Satook 7 months ago
The 2nd form in part 1 is pretty easy for a compiler to optimise with loop-invariant code motion. The x and y don't change inside the innermost loop, so those sub-expressions can be hoisted up out of it.
@Nenad_bZmaj 7 months ago
The difference of ca. 250 ms for about one billion elements is to be expected if additional GC sweeps are triggered. He should have isolated the three scenarios and benchmarked them separately, with standard deviations, not consecutively in one test. It seems that an additional GC sweep was turned on sometime after the second test. Also, the "time" package, in my experience, may interfere somehow with either GC or the execution. There are better tools for benchmarking, although I do use "time" to get a rough sense.
@freedomextremist7921 7 months ago
It's a slice not a list. And Go does also have arrays which are contiguous regions in memory.
@Leeway4434 7 months ago
yes, in Go arrays are stack allocated and are created with a fixed length. [5]int is an array of length 5 on the stack, and []int{1,2,3,4,5} is a slice of length 5 on the heap, with a pointer to it on the stack. I believe there is a size limit for stack-allocated arrays; at some point the compiler puts them on the heap to avoid stack overflows.
@Nenad_bZmaj 7 months ago
@@Leeway4434 All true.
@Leeway4434 7 months ago
@@Nenad_bZmaj I actually looked into it some more and it is not guaranteed for slices to be put on the heap. The compiler has heuristics for whether or not something should end up on the heap or the stack. Large values usually end up on the heap. Small values (including slices of sufficiently small constant length) can often be allocated on the stack.
@Nenad_bZmaj 7 months ago
@@Leeway4434 Thanks. I didn't know that.
@rumplstiltztinkerstein 7 months ago
The notion of "code optimization" is mostly popular because of JavaScript. We invented compilers so we don't have to worry about what the CPU is doing in every clock cycle.
@yugioh8810 7 months ago
Using only the index with range is usually faster, especially if the array is long or holds heavy structs, since it eliminates the copy into the loop variable: for i := range x is faster than for _, val := range x.
@miracleinnocent2649 7 months ago
Go's allocations all depend on its escape analysis; sometimes even interfaces can be stack allocated, and I think the limit for a stack allocation is 32 KB.
@michaelrobb9542 7 months ago
You shouldn't put syscalls in for loops. Calls like time.Now are calls to the OS; they will produce inconsistent results on each run.
@Nenad_bZmaj 7 months ago
True. There is a bigger for loop in this particular test, encompassing all three tests.
@MattBolt35 7 months ago
There are ways to pass flags to Go's compiler to show which values escape. Slices are typically heap allocated. You can stack-allocate an array if you use a fixed length (as he mentions).
@Nenad_bZmaj 7 months ago
But only small arrays.
@AlexnderMisuno 7 months ago
Once I implemented a Minesweeper game in Go using the "normal" approach to store data about the field, cell state, etc. Afterwards I did the same using bit-wise operations (just for fun). To my surprise, not only was the performance the same, but so was the memory consumption.
@georgehelyar 7 months ago
I'm not a Go expert, but it sounds like the reason 3 is slower than 2, and reverse is faster than forwards, is bounds checking.
@defeqel6537 7 months ago
which I think is something go should probably be able to optimize away in cases like this
@georgehelyar 7 months ago
@@defeqel6537 is it a jagged array though? If len(a[0]) can be different from len(a[1]) then it could potentially go out of bounds, but again I'm not a Go expert. C#, for example, has multi-dimensional arrays of contiguous memory, indexed with a[x,y,z], where the lengths of each dimension are guaranteed; or you could make a jagged array by making an array of arrays where they aren't, and then you'd have to use a[x][y][z]. Reverse being faster in several languages always seemed like something that could be optimised, though. Reverse is faster because it always knows that the lower bound is 0 regardless of the array, so if it starts inside the bounds and never goes below 0, it doesn't have to check bounds on each iteration. And of course dynamic arrays could change size at runtime. Rust probably does bounds checking at compile time.
@Nenad_bZmaj 7 months ago
@@defeqel6537 I believe that, too. The very typical difference of ca. 200 ms for arrays of this size tells me that probably an additional GC sweep was triggered.
@haraldfielker4635 7 months ago
I think the range version works better with CPU caches (in that scenario).
@Gusto20000 7 months ago
I did the same thing a week ago after 1.22 released, benchmarking with testing.B, and measured exactly the same result for each case... So I decided a simple range is good enough. Bill Kennedy talks about "mechanical sympathy" in "Mastering Go" regarding this exact problem: it's all about low-level caching and prefetching by the CPU and compiler.
@Nenad_bZmaj 7 months ago
Regarding some comments: I did try arrays instead of slices with length known at compile time, but performance was always slightly worse. In my experience, Go programmers can forget about arrays, since slices seem to be very well optimized. And I like using slices when I have to do something different in the last iteration or in distinct subranges. So instead of for i := 0; i < k; i++ {...}; for i := k; i < len(a); i++ {...}, I can write: b := a[:k]; c := a[k:]; for _, el := range b {...}; for _, el := range c {...}.
@flga_ 7 months ago
I'm not sure what you did, but when switching to arrays you need to be careful to avoid copies. There is no inherent reason for arrays to perform worse, other than that.
@Nenad_bZmaj 7 months ago
@@flga_ I passed them by reference. Very small arrays, for instance small hash tables with short slices of bytes as elements, or arrays of ints. But passing pointers to them apparently always triggers a GC sweep. When I turn them into slices, the execution is faster overall by the duration of one GC sweep, which on my computer, for several programs I wrote, apparently (inferred) takes about 200 ms on average.
@flga_ 7 months ago
@@Nenad_bZmaj I don't think you were measuring things correctly then; I highly doubt GC is a factor. Passing a pointer to an array will require a nil check; that's the only difference, and it can be done by you up front instead of by the compiler per iteration. Put in other terms, using arrays is asymptotically equivalent to using a 1d slice. Though for small sizes the distinction is unlikely to be meaningful, as the chance of everything being cached in either scenario is high.
@Nenad_bZmaj 7 months ago
Small arrays can be on stack, but slices are always allocated on the heap (in Go).
@RenThraysk 7 months ago
His problem is that the Go compiler doesn't perform loop unrolling; this is a deliberate decision by the Go devs. If he manually unrolled the loop he'd get an improvement.
@Jarikraider 7 months ago
Personally, I thought a 100ms improvement on 1 billion elements was pretty good.
@MethodOverRide 7 months ago
Write smooth easy code? Sounds like something that genius Tom would do with JDSL.
@sotiriosmantziaris6097 7 months ago
The compiler decides based on escape analysis where to put things.
@joyboricua3721 7 months ago
@6:22 somewhere a hacker says "Oh yes we could"
@awesomedavid2012 7 months ago
My assumption is that if you create a slice with a fixed compile time size and it doesn't outlive the function, Go will simply implement it as an array on the stack. The compiler knows its size and it knows how long it lives. Notice that even if you call append, it constructs a new slice. So the original could still be an array on the stack in implementation as long as you don't pass it outside of the function.
@Nenad_bZmaj 7 months ago
Most arrays and ALL slices are allocated to the heap in Go.
@Nenad_bZmaj 7 months ago
Also, when you call append, the Go runtime doesn't necessarily create a copied array of twice the capacity. If there is enough capacity in the original array, appending to a slice that points to it doesn't create a new array. (As an aside, appending is written as a reassignment of the slice header: a = append(a, el).)
@what1heh3ck 7 months ago
@@Nenad_bZmaj I thought append always creates a new slice header (still pointing to the original array if it has enough capacity); am I missing something here?
@awesomedavid2012 7 months ago
@@Nenad_bZmaj I agree about the append thing. Though I still think the compiler could theoretically put a slice on the stack if you never call append, give it a constant size, and don't return it. When you say it's always on the heap in Go, do you mean that's how the main Go compiler implements it, or is it part of the Go standard?
@Nenad_bZmaj 7 months ago
@@awesomedavid2012 I remember that this was the standard behaviour, although not in the language specification (it doesn't belong there), and it may change in the future without changing the specs. Somebody just told me that he looked at this more closely and that the current compiler does keep smaller slices with the properties you describe on the stack.
@OnFireByte 7 months ago
really interesting that the flattened slice is a bit slower than the actual 3d slice
@CjqNslXUcM 7 months ago
a slice is a pointer with a type and a len, so every hop from one innermost slice to the next risks a cache miss. the flat slice is one fat blob of memory: no cache misses.
@OnFireByte 7 months ago
@@CjqNslXUcM that's why I'm surprised the flattened one is slower
@Nenad_bZmaj 7 months ago
Something I noticed: you'll get slightly worse performance if you write involved expressions as indexes. Hint: assign such an expression to a variable first, use that as the index, and let the compiler optimize it. Simpler expressions are better put in the index directly, e.g. a[i+3], but write ind := b[i] and then do something with a[ind], rather than doing something with a[b[i]]. Not a big deal, though, since the difference is marginal.
@metropolis10 7 months ago
I liked that he said the 100 ms improvement wasn't worth it. How often do people admit the technically faster thing isn't worth the headaches?
@blenderpanzi 7 months ago
I'm at 1:53. I think the 2nd one could be the fastest, because an intelligent compiler could inline len() and eliminate bounds checks in that case. It can't eliminate bounds checks in the last version. Let's see if my intuition is right.
@RandomGeometryDashStuff 7 months ago
08:39 "a.length = 0" doesn't create a new object, so there's less stuff to GC
@adambright5416 6 months ago
The fact that he used time.Now in one function instead of the built-in benchmarks is just... wow.
@RandomGeometryDashStuff 7 months ago
08:55 "a[a.length]=1" is slower than "a.push(1)" in v8
@rosehogenson1398 7 months ago
Go has escape analysis, which can decide to place slices on the stack if it can guarantee they don't escape the current scope.
@Nenad_bZmaj 7 months ago
True for some types of variables; not sure it's true for slices. I think they are always on the heap.
@Bluesourboy 7 months ago
These complications from loops are why I believe C/C++ will be around FOR a WHILE longer.
@reactjs1900 7 months ago
I think slices in Go are like vectors and stored on the heap.
@bbrodriges 7 months ago
Slices can be stack allocated. In fact they are by default.
@metaltyphoon 7 months ago
No, they're not. The slice header, which is a pointer + length + capacity, will be stack allocated, but the data won't be on the stack.
@antoniusupov 7 months ago
All pointer values in Go will go to the heap. Go check out the runtime package and see what's inside. Only objects of fixed length live on the stack; in Go even strings contain pointers and go to the heap.
@Nenad_bZmaj 7 months ago
The so-called slice header will be on the stack (basically 3 numbers). It points to the memory allocation on the heap.
@lukekurlandski7653 7 months ago
These are dynamic arrays, right? I feel like you should be able to call a .contiguous() function or something that moves all the elements of the tensor into a contiguous arrangement. Then again, I'm guessing there is some other data structure in Go that is closer to a traditional array or something...?
@Nenad_bZmaj 7 months ago
Yes, an array: a := [k]something, as opposed to a slice: a := make([]something, k). But in my experience, slices perform better.
@huge_letters 7 months ago
wouldn't it be faster to access the underlying slice at each loop layer? Instead of

for x { for y { for z { t += arr[x][y][z] } } }

you do

for x { xa := arr[x]; for y { ya := xa[y]; for z { t += ya[z] } } }

That way you do the access on x and y only when needed and not every iteration? If you're saying memory access is likely the issue.
@Nenad_bZmaj 7 months ago
No difference. You still have to access the element of the array via xa; xa and arr[x] are both just pointers, and the array is the same one. The compiler will transform your code into the one in the video anyway, because xa and ya are just transient variables of slice (pointer) type.
@huge_letters 7 months ago
@@Nenad_bZmaj I mean that in option 1 you do the access by x x*y*z times, while in option 2 you do it only x times. Same for y: y*z times vs y times.
@Nenad_bZmaj 7 months ago
@@huge_letters Oh, I wish. But it's simply not possible. No matter how you write it, the array contains x*y*z elements, and you need to access them all.
@k98killer 7 months ago
Go will store slices in stack memory if they are small enough (less than 2^16 bytes). Allegedly.
@masterchief1520 2 months ago
Thought it all depended on escape analysis?
@k98killer 2 months ago
@@masterchief1520 I think it is a mix, but it has been a while since I last looked into this, so idk for sure.
@FunkyELF 7 months ago
Python's range() is not like ranges in other languages; it just gives back numbers, that's it. I think Go's range is more equivalent to general iteration in Python with the "in" keyword: `for thing in things: ...`
@davidzwitser 7 months ago
Meanwhile in APL languages you just write ‘matrix + number’ and it’s fully optimised
@justbrad_v3906 7 months ago
yeah, that sounds like buffer overflow hell lol
@arkie87 7 months ago
Numba lets you write fast Python loops by compiling with LLVM. Also, am I the only one who pronounced numpy in their head as num-pee for the longest time?
@GaborGyebnar 7 months ago
Case 5 is fastest because the code is buggy: y is never used. It's just reading the same row multiple times, cache hits make it fast but the result is incorrect.
@RandomGeometryDashStuff 7 months ago
09:19 "while" is usually the worse enemy
@Arwahanoth 7 months ago
I guess the range statement avoids bounds checking.
@someman7 7 months ago
Whoa, his parentheses are different colors. It only took us what, 30 years to see "such" progress in syntax highlighting?
@Zullfix 7 months ago
Rather than trying to optimize iteration, it sounds like his array needs to be split into partitions (or chunks, if you play Minecraft) to allow multithreading the calculations.
@ErazerPT 7 months ago
Not everything can be parallelized. It all comes down to whether or not your next step depends on the results of the previous step. If no, you can; if yes, you can't. Simple example: a loop that steps through a vector adding the previous element to the current one in place. You can't parallelize it because you need the previous step's result before you execute the next.
@Zullfix 7 months ago
@@ErazerPT You don't parallelize by computing steps out of order, you parallelize by partitioning your space into chunks, computing each chunk on a separate thread, then combining the results. Obviously nothing is that simple, but at a surface level this is *the* technique professionals use to create multi-threaded physics simulations, and it very likely can be applied to whatever the guy in the video is trying to compute.
@ErazerPT 7 months ago
@@Zullfix Read the example I gave you. It CAN'T be parallelized. You CAN'T partition things if ANY of the subsequent operations depend on the results of their predecessors. As for physics simulations, be it rigid-body dynamics or fluid, search around and you'll find a simple "some parts can, some parts can't" answer. Think of it like this: in a perfectly elastic collision of billiard balls in a perfect line, only the cue ball and the last ball will move. But the last ball can't know it will move before the one-before-last knows it should try to move. And the one before that won't know... etc. Things like "grass simulation" can be done IF you take some "artistic license", e.g. assuming wind interacts with grass but grass blades don't interact with each other. THAT is when you can partition things, because wind is a common entity and each blade of grass does its own thing.
@atijohn8135 7 months ago
@@ErazerPT for your example, you could still optimize it by adding up parts of the array in parallel, then for each part adding the final total of the previous one. This would be faster for large enough arrays, since you can further parallelize the subsequent block adds no problem. Many small dependencies like that can be eliminated by experienced parallel programmers; it's really more common than you seem to think. You can't just say an algorithm can't be parallelized because "not everything can be parallelized".
@ErazerPT 7 months ago
@@atijohn8135 Jesus Mary, you people really don't give up... YOU CAN'T parallelize that example (and be more efficient). THINK. If one thread is adding a[0] to a[1], another CAN'T add a[1] to a[2] because it DOESN'T know what a[1]'s final value will be before the first thread finishes. But let's say we ignore that, and have a cpu with unlimited (real) threads. So, one does a[0]+a[1],while another does a[1]+a[2], etc until a[n-1]+a[n]. When they all finished, you now have to run another batch, look at differentials from the previous ops, and do it all over again for a[1] to a[n]. But... it has a "carry effect", so, you have to do it again for a[2] to a[n], then for each element upto a[n-1]+a[n]. Congratulations, you just parallelized something and ended up being less efficient than a single thread, because you still had to run n-1 steps, do more work per step and waste more memory because you had to keep track of both old values and new to calculate differentials. Great job.
@martinpata2899 7 months ago
I like the idea, but he should be using Go's benchmark tooling for these tests.
@dejangegic 7 months ago
I guess Go really is K.I.S.S. from the bottom-up
@ninocraft1 7 months ago
S.S.I.K. ?
@AlexanderNecheff 7 months ago
I used to believe the compiler knew best, and especially that any bug in the code was my fault, never the compiler's fault: then I started having to use the Intel compilers... And oh boy howdy, some of the worst compilers ever. Poor standards compliance, lots of weird bugs in both the front and back end, horrible diagnostics to boot. I still generally think the compiler usually knows best, but now I have a disclaimer attached to that rule of thumb.
@mannycalavera121 7 months ago
We Gorupies now
@yoloswaggins2161 7 months ago
Can't you just disassemble the binary and look at what they're actually doing?
@GermanClaus 7 months ago
Try out some Kotlin ;)
@alexpyattaev 7 months ago
1 billion elements... The problem is likely memory bound. Use a smaller data type and it will run faster.
@Nenad_bZmaj 7 months ago
Yeah. Also, the ratio of used to total memory influences how frequently GC runs (kinda; it's more complicated). A GC sweep for this size typically takes 200 ms (on my old computer with an old single-core Athlon CPU).
@alexpyattaev 7 months ago
@@Nenad_bZmaj GC does not traverse an array of integers. As far as GC is concerned, it is one "leaf" object in the object graph, so it doesn't get traversed no matter how big it is. Now, if it contained other objects, like strings, the GC would have to walk it.
@Nenad_bZmaj 7 months ago
@@alexpyattaev I am not talking about GC traversing an array, but about the GC mark-and-sweep cycle occurring during the execution at some time points, at which times "the world is frozen". The way he structured this test - you don't know when the GC was active, so benchmarking three test-cases consecutively in one outer loop is not going to be informative for comparisons... I mentioned size dependency, because the program runs longer on larger arrays and GC runs more times. I see that in my programs. Although these programs involve quite complex data processing, the mere size of input array dictates how many times GC will run or for how long in total it freezes the world. When my input slices point to array of 50 Mbytes, one GC sweep takes about 50 ms on average, and when the array is 250 Mb - 250 ms. (on my very old comp.) I assume that when each innermost slice is done its array can be marked for sweeping since it is not going to be used anymore, and so on upwards.
@alexpyattaev 7 months ago
@@Nenad_bZmaj Oh, I see: with more memory allocated in general, it just runs GC more often. That is crazy; so the more RAM you use, the slower your code runs (as it does more pauses)? Brrr....
@Nenad_bZmaj 7 months ago
@@alexpyattaev It is highly optimized, though, so it uses the least number of cycles possible for the amount of garbage it needs to collect. Plus, Go gives you options for the GC, so you can optimize it for your specific program during the build. If your program approaches the memory limit, the Go GC will take that into account when calculating how often it must sweep, and adjust its cycles so you don't run out of memory. You can opt out of this behavior. When you have enough memory, the GC doesn't take a large fraction of total time even if you have a lot of garbage. If you have a multithreaded CPU, then you almost won't notice the GC.
@mohamedaityoussef9965 7 months ago
Blue hair gone nooooooo
@lancemarchetti8673 7 months ago
Nice
@dixztube 7 months ago
Man, the new React/Next stuff sucks! I just popped out an htmx and Go project. Silky smooth, fast, and sexy. Been bumming around this React nonsense for two months now after getting decent at the last version of React.
@jony1710 7 months ago
ITT nobody knows what the stack is
@gavipk 6 months ago
solution: c++
@ofekshochat9920 7 months ago
arr[a][b][c] = arr[a * second_dim * third_dim + b * third_dim + c] (row-major flat layout). This shouldn't have hops...
@leeroyjenkins0 3 months ago
They're slices so you can append to them and thus they can move. They're not contiguous memory. If you want contiguous memory you use fixed-size arrays.
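The contiguous layout described above (one flat backing slice indexed as `a*dim2*dim3 + b*dim3 + c`) can be sketched like this; the names `flatIndex`, `dim2`, and `dim3` are illustrative, not from the video:

```go
package main

import "fmt"

// flatIndex maps a 3D coordinate onto a single backing slice laid out
// row-major: element (a, b, c) lives at a*dim2*dim3 + b*dim3 + c.
func flatIndex(a, b, c, dim2, dim3 int) int {
	return a*dim2*dim3 + b*dim3 + c
}

func main() {
	const d1, d2, d3 = 2, 3, 4
	// One contiguous allocation: no pointer hops between "rows".
	flat := make([]int, d1*d2*d3)

	for a := 0; a < d1; a++ {
		for b := 0; b < d2; b++ {
			for c := 0; c < d3; c++ {
				flat[flatIndex(a, b, c, d2, d3)] = a + b + c
			}
		}
	}
	// Same value arr[1][2][3] would hold in a nested layout.
	fmt.Println(flat[flatIndex(1, 2, 3, d2, d3)]) // → 6
}
```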
@bloomingbridges 7 months ago
🐼
@nevokrien95 7 months ago
Pandas is slow, use numpy
@BlitzDjingga 7 months ago
This is a really shocking result, but I do believe the second one is the better approach most of the time, while the third one is so super-optimized that it is tiring to code and maintain. Too bad the video does not show memory performance for this. I believe what is happening is that the second and third ones operating on the heap makes the performance slower, while the first one is lucky enough not to encounter memory overflow. The thing is, when you do `range` it makes a copy of the object; albeit with a 3D array, maybe the host machine has RAM big enough to account for all of that iterating. It would be more educational to show the garbage collection and memory/space performance for each of the methods.
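On the "`range` makes a copy" point: ranging over a slice never copies the backing array, only the slice header and, per iteration, the element into the loop variable. A small sketch showing the difference (the `scale` helper is my own name, not from the video):

```go
package main

import "fmt"

// scale multiplies every element in place; writing through s[i]
// modifies the backing array, so the caller sees the change.
func scale(s []int, k int) {
	for i := range s {
		s[i] *= k
	}
}

func main() {
	s := []int{1, 2, 3}

	// The loop variable v is a copy of each element; mutating it
	// does not touch the slice.
	for _, v := range s {
		v *= 10
		_ = v
	}
	fmt.Println(s) // still [1 2 3]

	scale(s, 10)
	fmt.Println(s) // [10 20 30]
}
```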
@ryankimeu9013 7 months ago
i am an arch user
@uzbekistanplaystaion4BIOScrek 7 months ago
that's so poggers
@RT-. 7 months ago
Here's your trophy
@cyrusturner6111 11 days ago
,yamv
@pyaehtetaung 7 months ago
Black?
@deathzombee 7 months ago
First
@adjbutler 7 months ago
But Python is strongly typed and JS is weakly typed... why do you hate on Python so much? It is not as slow as it used to be....
@svetlinzarev3453 7 months ago
Golang is such a crappy language
@spartan_j117 7 months ago
Hahahaha Prime! So you DO reupload your old videos! Now that's lame...
@LengCPP 7 months ago
admit it you watch your own videos
@moneypennysloverboy 7 months ago
react equals pain
@dickheadrecs 7 months ago
uh excuse me i prefer my memory malloc carte