Just Buy More Cores (1min to 1sec no optimization)

Views: 29,277

Tsoding Daily


A day ago

Previous Episodes: • Data Mining in C
References:
- Source Code: github.com/tsoding/data-minin...
- Less is More: Parameter-Free Text Classification with Gzip - arxiv.org/abs/2212.09410
Support:
- BTC: bc1qj820dmeazpeq5pjn89mlh9lhws7ghs9v34x9v9
- Servers: zap-hosting.com/en/shop/donat...
Chapters:
- 00:00:00 - Announcement
- 00:00:41 - Intro
- 00:03:44 - Hosting
- 00:04:33 - Why is the load so high?
- 00:06:33 - Parallelization Idea
- 00:10:50 - Refactoring klassify_samples()
- 00:12:47 - Getting the amount of cores
- 00:19:44 - Splitting the Samples
- 00:41:49 - Learning Math
- 00:44:52 - Atomics
- 00:46:26 - Memory Management
- 01:00:39 - Atomics again
- 01:07:05 - Predicting the Class
- 01:12:19 - Testing on urmom
- 01:15:42 - REPL
- 01:36:40 - Footnote

Comments: 77
@nulligor 4 months ago
The most underrated programmer on earth (right after Terry Davis). Shoutout fr fr!
@bebre_2288 4 months ago
Where p in your nickname?
@nulligor 4 months ago
@@bebre_2288 This is not even my final form.
@rubyciide5542 4 months ago
Terry daviS Over DosING
@dkaaakd 4 months ago
To split the workload more evenly, you can increment the chunk size for the first N % T threads. This way the number of samples per thread will differ by at most 1 (instead of at most T - 1).
@ryan-heath 4 months ago
I was looking for this comment.
Splitting with remainder:
9: 3,3,3
10: 3,3,4
11: 3,3,5
12: 4,4,4
Smooth remainder more evenly:
9: 3,3,3
10: 4,4,2
11: 4,4,3
12: 4,4,4
The effect is even greater with more processors.
@tekno679 4 months ago
Or you could just do a ceil division. so `n_chunks = ceildiv(n_tasks, n_cores)`.
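The two suggestions in this thread can be compared with a minimal C sketch. The names `chunk_range` and `ceil_div` are hypothetical and are not taken from the video's source code; the sketch only assumes that samples are addressed by index and that each thread receives a half-open range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the video's source): half-open range
 * [*begin, *end) of chunk `i` when `n` samples are split across `t`
 * threads. The first n % t chunks receive one extra sample each, so
 * chunk sizes differ by at most 1. */
void chunk_range(size_t n, size_t t, size_t i, size_t *begin, size_t *end)
{
    assert(t > 0 && i < t);
    size_t base = n / t;                      /* minimum chunk size */
    size_t rem = n % t;                       /* leftover samples */
    size_t extra_before = i < rem ? i : rem;  /* extras handed out before chunk i */
    *begin = i * base + extra_before;
    *end = *begin + base + (i < rem ? 1 : 0);
}

/* Ceiling division, as in the `ceildiv` reply: the chunk size if every
 * thread takes the rounded-up share. */
size_t ceil_div(size_t a, size_t b)
{
    assert(b > 0);
    return a / b + (a % b != 0 ? 1 : 0);
}
```

With 10 samples and 3 threads, `chunk_range` yields sizes 4, 3, 3, while a fixed chunk size of `ceil_div(10, 3)` yields 4, 4, 2 once the last chunk is clamped to the end of the array. With ceiling division the trailing chunks can come out empty (for example 9 samples on 8 threads), so the caller would need to clamp both bounds.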
@nephew_tom 4 months ago
00:41:49 - Learning Math
I would say that what Tsoding said is the right way to learn math, physics and programming (and probably anything...). I took lots of higher math and physics courses at an engineering university, but since I didn't care much about them (I just wanted to pass the exams), I really didn't learn that much. Learning through something you care about a lot improves the learning by a huge amount.
@Kul3Kow101 4 months ago
First of all, I absolutely love your content. recreational programming ftw! When splitting the samples, my first thought was to distribute the remainder among the first few chunks. So if rem=1, the first chunk gets an extra sample. if rem=2, the first two chunks get an extra sample. That way the distribution is more even. You can also express this in an elegant way: when iterating over the range 0..nprocs and spinning up the threads, you can take the final chunk size to be (chunk_size + (i
@berndeckenfels 4 months ago
10:30 not sure if you meant this, but there is no guarantee that the first sample in a partition will show up in the k-first merged samples. (But you can abort the merger after k)
@anon_y_mousse 4 months ago
Just so you know for the future, htop has a help screen that you can bring up by pressing F1, C11 also added threads, and Pascal does have pointers. If you meant that it doesn't have untyped pointers, like C has the void pointer, well, it does have those too. For general types, foo: type; bar: ^type; bar := @foo; bar^ := value; and there's the generic `pointer` type. Pascal is actually why my language has a with clause and uses := for assignment in conditional headers. So no confusing = and == in an if, you can assign or compare with impunity.
@pedropesserl 4 months ago
he said pascal didn't have pointers *originally*
@anon_y_mousse 4 months ago
@@pedropesserl Okay, except it always had them.
@pedropesserl 4 months ago
@@anon_y_mousse oh
@cobbcoding 4 months ago
im surprised urmom could handle 8 cores at once ngl
@AntonioNoack 4 months ago
Usually the nodes with lots of memory w/could be called urmom though. (they're called fat)
@Blubb3rbub 4 months ago
Would also be a great problem for dabbling in OpenMP parallelization. To get more balanced chunks you can use `thread_index*(n/thread_count) + min(thread_index, n % thread_count)` as the start index and `n/thread_count + (thread_index < n%thread_count ? 1 : 0)` as the chunk size. This will add 1 extra element to the first few chunks, so you don't end up waiting for a single thread at larger thread_counts because of work imbalance. Another, even easier, approach is to just increment by the thread_count instead of 1 and offset the start indices by the thread_index. But then you don't have contiguous chunks, which might be slower?
@AntonioNoack 4 months ago
If you have fewer than 2^64 items * cores, you can also use
i0 = (threadIndex * numItems / numThreads)
i1 = ((threadIndex+1) * numItems / numThreads)
That gets rid of the error-prone ifs.
@Blubb3rbub 4 months ago
@@AntonioNoack Ah true! it puts the extra elements on the later cores, but is a lot easier to calculate. Nice! Thanks for sharing.
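The multiply-then-divide boundaries from the reply above can be sketched in C as follows. `chunk_begin` is a hypothetical name; the only assumption beyond the comment is that `thread_index * num_items` fits in `size_t`, which is the overflow caveat the reply mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the video's source): start index of the
 * chunk owned by `thread_index`. The end of that chunk is simply
 * chunk_begin(thread_index + 1, ...), so no remainder handling is needed.
 * Assumes thread_index * num_items does not overflow size_t. */
size_t chunk_begin(size_t thread_index, size_t num_items, size_t num_threads)
{
    assert(num_threads > 0 && thread_index <= num_threads);
    return thread_index * num_items / num_threads;
}
```

For 27 items on 7 threads the boundaries are 0, 3, 7, 11, 15, 19, 23, 27, giving chunk sizes 3, 4, 4, 4, 4, 4, 4. The sizes never differ by more than 1, and the larger chunks land on the later threads, as noted in the thread.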
@re_detach 4 months ago
This topic is so fascinating. You are literally showing the relationship between compression and learning/intelligence/pattern detection in regards to humans and AI
@diegorocha2186 4 months ago
Pretty impressive how you achieve amazing stuff with code that's both readable and simple!!!
@sago27 4 months ago
yo,this type of programming content is something else.🔥
@spacewad8745 4 months ago
great video to pair with a relaxing evening... or so i thought
@nephew_tom 4 months ago
1:00:10 "cool memory management, who needs garbage collectors if you have aren-aaasssss" - that deserves a tweet! 🤣
@BboyKeny 4 months ago
On the topic of learning math through programming: I think it's a really good approach, since math can be hard and abstract when you don't have a context or application to use it in. When you learn the math because you need to apply it, you already have an application. Also look up how math symbols map to programming. For example, Σ(xs) means for x in xs: sum += x, where xs is a list of numbers.
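The Σ example in the comment above is written as Python-like pseudocode. In C, the language used in the video, the same idea might look like the sketch below. `sum_of` is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the mathematical sum of xs[0] .. xs[n-1]
 * written as a plain loop. */
double sum_of(const double *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += xs[i];  /* one term of the sigma per iteration */
    }
    return sum;
}
```

Calling `sum_of` on the array {1.0, 2.0, 3.5} with n = 3 returns 6.5, and an empty range returns 0.0, which matches the convention that an empty sum is zero.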
@adamjasinski1463 4 months ago
Ada took a lot of inspiration from Pascal, also has great type system and the 'use' clause My professor told me that I should try it out, it was painful at first, for a beginner programmer like me that used Go for most of his time, but it's so delightful to be productive in this language, I guess same goes for Pascal.
@jakestewart5915 4 months ago
you should try doing shorts. you explore interesting ideas and give funny/worthwhile takes on them. i, and i imagine many others, rarely have the time to watch 1 1/2 hour videos. i imagine you doing shorts would do quite well
@thatstupiddoll 4 months ago
I get that with text editors over ssh too, try running "reset" and then pasting stuff, that sometimes helps
@mhamdmarch8709 4 months ago
The one who always gives me a reason to do some weird projects, keep going especially with mathematics and operating system stuff ❤❤❤🎉🎉🎉🎉
@KellyMurphy 4 months ago
I wonder if you were to take and classify a bunch of articles , then add them to training as they are classified does it get more or less accurate.
@dnkreative 4 months ago
If the best compression option makes the compression algorithm dependent on content, it's better not to use it, since it can mess up results: the individual and concatenated versions of the same data might be compressed at different ratios, which will affect the classification result.
@xulitol 4 months ago
43:36 This thing u said inspires me a lot I always afraid of doing dumb things and writing shitcode but I want to improve Also your English is so Russian so I feel like a native English speaker listening you
@vvarhand3985 4 months ago
The Azozinator delivers, simple as
@dar4891 4 months ago
omg tsoding uploaded
@bukitoo8302 4 months ago
You can add an option to store the classification result in a file so you can reuse it from run to run, without needing to re-parse everything.
@SiiKiiN 4 months ago
Perhaps next up could be an ANN, or approximate nearest neighbor
@shubhamsingh3068 4 months ago
The solution you have as of timestamp 40:54 is good enough but not elegant. For instance, if the `count` is 27 and the `chunks` is 7, your solution will divide it into 7 chunks of 3, 3, 3, 3, 3, 3, 9, which is okay. But still the difference between the smallest and largest chunk comes out to be 9 - 3 = 6.
What you're truly looking for in this case (in my opinion) is regularization, where you want to divide the `count` into the `chunks` in nearly equal sizes, i.e. minimizing the size difference between the smallest and largest chunk. Here is how I would've done it:
int remaining_count = count;
int remaining_chunks = chunks;
while (remaining_chunks > 0) {
    int chunk_size = remaining_count / remaining_chunks;
    remaining_count -= chunk_size;
    remaining_chunks -= 1;
    printf("%d ", chunk_size);
}
printf("\n");
The output of the above logic when `count` = 27 and `chunks` = 7 will be "3 4 4 4 4 4 4", which minimizes the size difference between the smallest and largest chunk.
@AntonioNoack 4 months ago
Not very elegant either. I prefer
i0 = (threadIndex * numItems / numThreads)
i1 = ((threadIndex+1) * numItems / numThreads)
@yagamilight2166 4 months ago
Similar to jai context, you could make temp allocator buffer thread local.
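A thread-local temporary buffer could be sketched in C11 roughly as below. This is not the allocator from the video: `temp_alloc`, `temp_reset` and `TEMP_CAPACITY` are hypothetical names, and the fixed-size bump allocator is an assumption made only to show what `_Thread_local` changes.

```c
#include <assert.h>
#include <stddef.h>

#define TEMP_CAPACITY (64 * 1024)  /* hypothetical per-thread capacity */

/* Each thread gets its own copy of these two variables, so threads can
 * allocate from "the" temp buffer without locks or atomics. */
static _Thread_local _Alignas(16) unsigned char temp_buffer[TEMP_CAPACITY];
static _Thread_local size_t temp_size = 0;

/* Bump-allocates `size` bytes (rounded up to 16) from the calling
 * thread's buffer. Returns NULL when the buffer is exhausted. */
void *temp_alloc(size_t size)
{
    size_t aligned = (size + 15) & ~(size_t)15;
    if (aligned < size || aligned > TEMP_CAPACITY - temp_size) return NULL;
    void *result = &temp_buffer[temp_size];
    temp_size += aligned;
    return result;
}

/* Frees everything the calling thread allocated so far. */
void temp_reset(void)
{
    temp_size = 0;
}
```

The trade-off is that memory obtained from `temp_alloc` belongs to the allocating thread's buffer, so results handed to another thread must be copied out before that thread calls `temp_reset` or exits.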
@AntonioNoack 4 months ago
As a recommendation for your next video, analyze the scaling behavior (#cores -> #speedup?), and look whether you can find out how many cores your system has without using a dedicated system call (would be quite educational, I think).
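For reference, the conventional way of asking the operating system for the core count looks like the sketch below. Whether the video uses this exact call is an assumption; `online_core_count` is a hypothetical name. `sysconf(_SC_NPROCESSORS_ONLN)` is not part of the POSIX standard itself but is available on Linux (glibc), macOS and the BSDs.

```c
#include <unistd.h>

/* Hypothetical helper: number of processors currently online, or 1 if
 * the system cannot report it. Relies on the widely supported but
 * non-standard _SC_NPROCESSORS_ONLN sysconf name. */
long online_core_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

A scaling experiment like the one suggested above could run the classifier with 1 up to `online_core_count()` threads and record the wall-clock time of each run.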
@matteovalentino4890 4 months ago
Zap hosting is down at the moment? Can't seem to successfully donate
@XORfun 4 months ago
Love you! MmmA!
@berndeckenfels 4 months ago
I am more of a calloc guy for struct members in arrays
@dnkreative 4 months ago
tail is very easy, just add 1 to each work batch count while remainder is > 0
@kevinnguyen163 4 months ago
Don't bring my mom into this
@egk_nix 4 months ago
yandex geepeetee can answer english questions in english, but not all of them, and you have to explicitly ask in russian for it to reply in english, and it's really weird. At one point it started giving me answers in english to questions in russian.
@AndrieMC 4 months ago
hello tsoding
@TsodingDaily 4 months ago
Hi!
@ecosta 3 months ago
I came for the programmer... I love his programming skills... But the chat jailbreak was the best part of this video.
@DanelonNicolas 4 months ago
:set paste Will help to paste code into vim 😊
@anon_y_mousse 4 months ago
This might also be a terminal problem. I use Konsole and regularly paste into Vim without putting it in insert mode and it works without issue.
@yousefsayed6380 4 months ago
what about parallelizing on the gpu for poor people like me who can't buy a better cpu
@millieno 4 months ago
Then you use OpenCL or CUDA or HIP
@Maik.iptoux 4 months ago
20:06, you could check if the amount is zero.
@Mozartenhimer 4 months ago
The ugliest hack would have been making the temp buffer thread-local. That would have been gross.
@TheMelopeus 4 months ago
Just throw computation and it will fix everything
@thomasziereis330 4 months ago
yes until Amdahl's law kicks in
@lizardy2867 4 months ago
This is why we have GPU compute cores.
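Amdahl's law, mentioned in this thread, bounds the speedup from adding cores: if a fraction p of the run time can be parallelized, the speedup on n cores is at most 1 / ((1 - p) + p / n). A small C sketch of that formula follows; `amdahl_speedup` is a hypothetical name, and the value of p for the classifier in the video is not known here.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on speedup according to Amdahl's law.
 * parallel_fraction is the share of the run time that can be spread
 * across cores (0.0 = fully serial, 1.0 = fully parallel). */
double amdahl_speedup(double parallel_fraction, unsigned cores)
{
    assert(cores > 0);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / (double)cores);
}
```

For example, if 90% of the work were parallelizable, 8 cores would give at most roughly a 4.7x speedup, and no number of cores could exceed 10x.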
@forayer 4 months ago
Khello!
@berndeckenfels 4 months ago
1:11:01 23/25s real vs. user is not that much parallel, but I guess the poor machine is struggling
@namefreenargrom5694 4 months ago
I think you can get rid of the sort. ??
@user-gz3zp8hw7z 4 months ago
I would enjoy these videos so much more if I had any clue what is going on ....
@God-vl5tk 4 months ago
Can you stream on weekends? I cannot watch during the work. I go to work
@-rya1146 4 months ago
1:04:26 lmao
@luijia 4 months ago
Caspal
@blastygamez 4 months ago
Try visual programming languages for a video
@leifmessinger 4 months ago
Imagine getting an instance of 8 shared cores on the cloud rather than having an 8+ core computer
@undefinedchk 4 months ago
this man using google chrome?
@xulitol 4 months ago
Hello
@mindasb 4 months ago
Wait - you don't consider parallelization as a type of optimization? LOL
@AndrieMC 4 months ago
no its just doing two stuff at once
@mindasb 4 months ago
@@AndrieMC *just* sure bro. Keep thinking that.