Comments
@abdimajid684 14 hours ago
As someone who has done a lot of DP coding in Python: one common problem with something like Cython is that it doesn't support many NumPy functions (good luck using argmax).
@N____er 16 hours ago
How do you optimise python performance without any external libraries or programs? Just native python3 with the standard pre-installed libraries.
@dougmercer 12 hours ago
Hmm, I guess the only way would be to write efficient code. I'd profile the code to see which functions are taking the most time, then focus on improving the slow or frequently called ones. Use the right data structures and algorithms, and consider using functools.cache to memoize anything that would benefit from caching. Re-profile your code after each change to quantify which changes were helpful. You can technically write your own C extensions if your system has a C compiler, but that's probably not what you want.
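For example, here's a minimal sketch of the caching plus profiling idea using only the standard library (the fib function is just a stand-in, not anything from the video):
```
import cProfile
import functools


@functools.cache  # memoize: repeated calls with the same argument return instantly
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    # Profile before and after each change to see which functions dominate the runtime
    cProfile.run("fib(300)", sort="cumulative")
```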
@joaovitormeyer7817 22 hours ago
For anyone interested, I copied his C++ code and his example with 30,000 elements in each vector, and on my computer it ran in ~25 seconds (my PC is slow). By simply compiling with -Ofast, it got down to ~5 seconds, still without modifying the code at all. I'm not hating on the content of the video, which is in fact great.
@dougmercer 17 hours ago
I compiled the C++ with -O3 for this video: gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@Saddl3r 1 day ago
Great editing. What programs do you use? A "making-of" video would be very interesting!
@dougmercer 1 day ago
Thanks! For this video I used a mixture of Manim to render the syntax-highlighted code and DaVinci Resolve to do the callouts. In later videos I use exclusively DaVinci Resolve (plus a Python script to turn my code into a syntax-highlighted Fusion composition). I'm actually working on writing my own animation library that's similar to Manim, but more tailored for me and the sort of videos I make. I plan on making a video about it as one of my next two videos, so stick around for that =]
@Saddl3r 1 day ago
@@dougmercer Thanks!! 🌟
@Macatho 3 days ago
I'm curious how fast this would run with a GPU implementation. I loved this video; hope you'll extend it with one :)
@dougmercer 3 days ago
Someone did a Dask + cuDF implementation. Seems super fast: github.com/gunnarmorling/1brc/discussions/487
@Macatho 3 days ago
You can't possibly be such a purist that you don't use numpy or pandas 😅😅
@dougmercer 3 days ago
I tried them! Way slower for this task. Using this pandas implementation github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py takes about 150s, whereas Polars takes 11-12s.
@SimpaTheImba 4 days ago
14:10 my man, can't agree more
@dougmercer 4 days ago
Hahaha absolutely =]
@arnabchatterjee2094 5 days ago
Basically, Prefect is Celery on steroids.
@dougmercer 5 days ago
That's typically how I use the open-source library! I recently spoke with someone from Prefect at PyCon, and they said the automation features and more Celery-esque features are coming to the open-source library soon. So keep an eye on that =]
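For anyone who hasn't tried it, here's a bare-bones sketch of what the open-source library feels like to use (the pipeline itself is made up, not from the video):
```
from prefect import flow, task


@task(retries=2)
def extract() -> list[int]:
    # Pretend this pulls rows from an API or database
    return [1, 2, 3]


@task
def transform(data: list[int]) -> list[int]:
    return [x * 10 for x in data]


@flow(log_prints=True)
def etl():
    print(transform(extract()))


if __name__ == "__main__":
    etl()  # tasks, retries, and logging are plain decorators on plain Python functions
```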
@ryanshea5221 5 days ago
How to write fast Python: Write C
@dougmercer 5 days ago
Well... I write Python, and let PyPy compile itself down to C =]
@nathan22211 7 days ago
I feel like you could get similar performance using lupa + lua_importer, or nimporter/nython. Both Lua and Nim are similar in difficulty to Python, though I think Nim is somewhat like Rust when it comes to how you code in it.
@dougmercer 6 days ago
This is my first time hearing about either of those. Very interesting 🤔
@AlexanderHyll 7 days ago
I mean, all this is saying is that if Python wraps a low-level language, it gets low-level performance. Not that huge of a revelation. That's how Python's entire data science world runs.
@dougmercer 7 days ago
That, *and* users don't necessarily need to write their own low-level code to get that speedup
@Big_bangx 8 days ago
What about optimizing your C++ implementation to go faster instead?
@gawwad4073 8 days ago
Nice video. VERY good writing and editing. Smooth as hell, keep it up!
@dougmercer 8 days ago
Thanks =]
@tangerie1284 8 days ago
I wonder how fast an AOT compiler like Nuitka would make it.
@dougmercer 8 days ago
Yeah, I wonder that too. I'm also curious if anyone has a Cython solution...
@affrokilla 8 days ago
A 50x speedup between a for loop and a Polars DataFrame is really significant. Great video!
@dougmercer 8 days ago
Polars/DuckDB are crazy fast. Thanks for watching!
@user-uc6wo1lc7t 10 days ago
Well... your Python code is not that well "optimized" either =)
1. a and b are np.arrays, so you use numpy in your "pure Python" code (that's not fair). Moreover, switching to lists gives you a speedup (I was shocked too). But maybe you did do that... you didn't show us all of the code.
2. max of two elements is not faster than a plain if-else. And if you do use it, you need to pull dp[prev_i, j] and dp[i, prev_j] out into variables so you don't access the elements twice.
3. Your i-1 and j-1 can be computed in the corresponding loops and saved to variables. The same goes for len(a) + 1 and len(b) + 1, which can be computed once.
With all of these I could speed up the pure Python version by about 70% on my Ryzen 5 3600X, and the result was consistent across N = 3_000 and N = 10_000, so all my numpy tests used N = 3_000 to not waste too much time. The optimized Python code is therefore faster than the unoptimized numpy code by ~82.2%. If you apply optimizations 2) and 3) to the numpy version, the optimized Python code is still faster by ~77.5%. I even tried different logic for the numpy and Python versions where dp is a 1D array, but the Python version became slower, while the numpy version made some modest progress (the optimized numpy with the different logic is ~5% faster than the optimized numpy). Well, it was interesting. I had never experienced numpy code being slower than pure Python. Maybe it's because the operations are so fast that switching to numpy types makes it worse...
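For illustration, a rough sketch of the hoisting from points 2 and 3, using an LCS-style recurrence as a stand-in (this is not the code from the video):
```
# Illustrative only: an LCS-style stand-in showing the hoisting from points 2) and 3).
def lcs_length(a: list, b: list) -> int:
    n, m = len(a), len(b)                    # lengths computed once, not per iteration
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        prev_i = i - 1                       # hoisted out of the inner loop
        row, prev_row = dp[i], dp[prev_i]
        a_i = a[prev_i]
        for j in range(1, m + 1):
            prev_j = j - 1
            if a_i == b[prev_j]:
                row[j] = prev_row[prev_j] + 1
            else:
                up, left = prev_row[j], row[prev_j]   # pull elements into locals once
                row[j] = up if up > left else left    # plain if-else instead of max()
    return dp[n][m]
```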
@dougmercer 10 days ago
I didn't use numpy in the pure Python implementation... see the code at 1:59. That's why I mentioned in the numpy section that numpy was surprisingly so much slower. As for the other stuff... you're right that there might be other small tweaks you could make to the various implementations. Thanks for pointing them out.
@user-uc6wo1lc7t 10 days ago
@@dougmercer From the timestamp you gave I can't deduce whether a and b are lists or np.arrays, because they were created by numpy functions, as you showed at 2:10. But I'm pretty sure you DID convert a and b to lists; it could actually be seen when you used mypy and added type hints. I just forgot about it because I was fully invested in pure Python vs numpy. Sorry, I didn't mean to say your benchmarks are bad; they are actually pretty good. Your video is a life-changer for me. Previously, I used numpy without any hesitation, even in loops that don't leverage vectorization. And, strangely, numpy gave me a speedup. So I tried to code your example in disbelief. But, undoubtedly, your code is a good example that I was wrong. That leaves me wondering how I could have gotten a speedup in my previous code... Maybe I was using vectorization... I can't remember. Seems like I'm going to benchmark my old code and test it against a pure Python implementation. Recently I watched a guy who was benchmarking nested loops. He was trying to prove that making the outer loop the smaller one gives +30% performance, but he didn't even realize what exactly he was benchmarking; he fooled a lot of people and ignored me even though I gave him a mathematical proof of the results he got. And the conditions to get a 30+% boost are extremely vague. So I had lost my faith in this type of video. But yours... is really good. Thanks a lot!
@dougmercer 10 days ago
@@user-uc6wo1lc7t Ah, I see what you're saying. I believe I coerced them to lists, but I'm not 100% sure; I'll have to double-check later. And I definitely agree: I always reach for numpy, and it's surprising when it doesn't help! In this case, indexing into the arrays to do a lot of scalar operations is slower. We typically get the speedup when we can vectorize. Thanks for watching and for the thoughtful comments =]
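If anyone wants to see the effect for themselves, here's a tiny, made-up timing sketch (not from the video) comparing a plain list loop, scalar indexing into an ndarray, and one vectorized call:
```
import time

import numpy as np

x = np.random.rand(1_000_000)
xs = x.tolist()

t0 = time.perf_counter()
total = 0.0
for v in xs:                 # pure-Python loop over plain floats
    total += v
t1 = time.perf_counter()

total_np_loop = 0.0
for i in range(len(x)):      # scalar indexing into the ndarray boxes a numpy scalar each time
    total_np_loop += x[i]
t2 = time.perf_counter()

total_vec = x.sum()          # vectorized: the whole loop runs in C
t3 = time.perf_counter()

print(f"list loop:      {t1 - t0:.3f}s")
print(f"ndarray loop:   {t2 - t1:.3f}s")
print(f"vectorized sum: {t3 - t2:.3f}s")
```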
@UndyingEDM 10 days ago
The video editing is top-notch too!
@dougmercer 10 days ago
Thanks =]
@Slightlymisshapen 10 days ago
Beautiful presentation. I love your descriptions of libraries. The illustrative code breakdowns put context and concrete examples around what would otherwise be yet more abstract documentation. It helps to grasp the utility of the libraries you describe, not just in this video but in all of them. Fantastic work.
@dougmercer 10 days ago
Aw, thanks! Glad you enjoyed it, and thanks for your super nice comment =]
@silloo2072 10 days ago
Why no C++ optimization?
@dougmercer 10 days ago
I did compile the C++ with -O3, but this channel is mostly focused on Python devs, so the idea is: what would someone unfamiliar with C++ write? That said, restructuring it as a 1D array does achieve some speedup, but it's still only slightly faster than Numba/Taichi, so it doesn't really change the story.
@ogrp5777 11 days ago
What about PyPy?
@dougmercer 11 days ago
I covered PyPy in my latest video ("How fast can Python parse 1 billion rows of data")
@ogrp5777 11 days ago
@@dougmercer I just saw the video, that was brilliant! Thanks
@ali.moumneh 12 days ago
Your video quality is top-notch; I'm sure it will soon translate into views if you keep this up. Good luck.
@dougmercer 12 days ago
Thanks! I hope so too 🤞
@DepressedMusicEnjoyer 14 days ago
As much as I like the video, the "6 times slower isn't a problem" part kinda drives me crazy as a person doing low-level coding 😭 Like, yeah, you're right that for a lot of stuff it doesn't matter, but then imagine 5 vs 30 fps, and if it's nested within something else you may get exponentially slower. I have been trying to make a 480 MHz MCU's CPU do rendering, and it's difficult to get around 30 fps; if using Python meant being 6 times slower, I wouldn't be happy no matter how much easier my code would be to understand.
@dougmercer 14 days ago
That's fair. Some things are worth squeezing all the performance out of. It will be nice when PyPy supports sub-interpreters or no-GIL... it will eventually be possible to close the gap more.
@0MVR_0 14 days ago
I should not have watched this video, considering that I have been waiting two days for my implementation of Kendall's tau to finish.
@dougmercer 14 days ago
Oh no 💀 Numba + numpy, or PyPy, would probably help
@rverm1000 18 days ago
That's nice of you to point these libraries out.
@dougmercer 18 days ago
Thanks!
@bbajr 18 days ago
Is pandas slow?
@dougmercer 18 days ago
It would be for something like this! Using this pandas implementation github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py takes about 150s, whereas Polars takes 11-12s.
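Roughly, the Polars version looks something like this (a sketch assuming the 1BRC "station;temperature" file format and a recent Polars version, not the exact code from that repo):
```
import polars as pl

# Lazily scan the measurements file (one "station;temperature" pair per line)
# and aggregate per station; .collect() runs the optimized query.
result = (
    pl.scan_csv(
        "measurements.txt",
        separator=";",
        has_header=False,
        new_columns=["station", "temp"],
    )
    .group_by("station")
    .agg(
        pl.col("temp").min().alias("min"),
        pl.col("temp").mean().alias("mean"),
        pl.col("temp").max().alias("max"),
    )
    .sort("station")
    .collect()
)
print(result)
```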
@gencurrent 19 days ago
It's hard to watch this with all the whistles, blowers, hangers, glitter, and so on. Kindly, stop using those! An interesting video turns into a torture session!!
@dougmercer 18 days ago
Still trying to find my style/voice ¯\_(ツ)_/¯ Check out my two more recent videos; they have less intrusive editing.
@gencurrent 18 days ago
@@dougmercer Thank you Doug : )
@joaoguerreiro9403 19 days ago
Damn, this was an instant follow! Hope to see more computer science content 🙏🏼 Great video :)
@dougmercer 19 days ago
Thanks so much =]
@Jp-ue8xz 22 days ago
C++ -O3 flag: am I a joke to you?
@dougmercer 22 days ago
It was -O3 optimized: gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@RiadAhmed-ce6qo 23 days ago
Python uses third-party compilers as well to execute fast.
@MaxiveLegend 23 days ago
1. "Some of these solutions are gonna be as fast or even faster than my C++ implementation" just means your C++ is bad lol. The Python interpreter was built in C, so by definition a Python program can never be faster than, or even as fast as, C. C++ is SLIGHTLY slower than C, but Python would never fall in between them. 2. Making Python "faster" by transpiling it to C isn't making PYTHON faster, it's just converting it to C. It's basically like saying "I'm going to upgrade my Renault Twingo" and then getting a Bugatti Veyron.
@patfre 1 day ago
Yeah
@smithdoesstuff 23 days ago
Kinda bummed I wasn’t sub 10k
@dougmercer 23 days ago
Hah! I'm at 9,991 subs, so good news! Hopefully crossing the 10k mark soon =] 🤞
@CottidaeSEA 25 days ago
I shit on Python a lot for being slow, but honestly, 8-10 seconds to read 1 billion rows is sufficient in most scenarios.
@joseduarte9823 27 days ago
Depending on how large the total sum actually gets, using an incremental mean may yield better performance, since Python won't need to upgrade the number to a big int.
@dougmercer 26 days ago
Neat idea... it's worth a shot! Feel free to fork the repo and give it a try.
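For anyone curious, the idea is a running mean, roughly like this (a sketch of the suggestion, not code from the repo):
```
# Running mean: update the mean in place instead of accumulating one huge sum
count = 0
mean = 0.0
for temp in (25.3, 24.1, 26.7):   # imagine a billion readings streaming by
    count += 1
    mean += (temp - mean) / count
print(mean)  # same value as sum/count, without a large intermediate total
```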
@nikilragav 27 days ago
Are you allowed to use numpy or the GPU (torch, cupy, etc.)?
@dougmercer 27 days ago
You could use numpy, but I don't think it would help (the bottleneck is reading the data in). I did see someone use Dask + cuDF (CUDA), and that was very fast. However, it wasn't allowed in the challenge, because the evaluation system didn't have a GPU.
@nikilragav 27 days ago
@@dougmercer Ah. Reminds me of another challenge I saw where IO was the bottleneck. Even there, I'm wondering if writing the content to GPU memory and back is too slow.
@cottawalla 28 days ago
I couldn't get your opening "performance critical python" out of my head and so missed the entire rest of the video.
@dougmercer 28 days ago
¯\_(ツ)_/¯
@jmidski5753 29 days ago
I think it's 60x faster, not 100x. We don't have 100 seconds in a minute. Great video though!
@dougmercer 29 days ago
The clock visualization is a little confusing, but... it's 256 seconds vs 2.56 seconds, which is a 100x difference (256 / 2.56 = 100).
@BarafuAlbino 29 days ago
PyPy? Nuitka?
@dougmercer 29 days ago
I tackled PyPy in my latest video if you're interested ("How fast can Python parse 1 billion rows of data")
@user-pg9nf2vq8s 1 month ago
I would never use Python, but I like watching how people optimize the hell out of something.
@dougmercer 1 month ago
There's something Zen about it 🧘
@ThisRussellBrand 1 month ago
Beautifully done!
@dougmercer 1 month ago
Thanks Russell =]
@ThisRussellBrand 1 month ago
This is a beautiful explanation. Thank you for sharing it.
@dougmercer 1 month ago
Glad it was helpful!
@alexnolfi3730 1 month ago
Did you test out pandas to see how much slower it was than Polars?
@dougmercer 1 month ago
It's way slower. Using the pandas implementation here, github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py, takes about 150s, whereas the Polars implementation takes 11-12s.
@GRHmedia 1 month ago
Can't really say you are just relying on Python at that point. Well, I guess you can't say that to start with; CPython runs on a runtime that is programmed in C. If Python is so great, why don't they use it to create a compiler: a JIT made from Python that runs Python?
@dougmercer 1 month ago
My focus in this video was on libraries and approaches that let you either *write* Python or interoperate with Python very seamlessly. I don't really care that Taichi or Numba dips down to LLVM. As for compilers written in Python... PyPy's JIT is written in RPython and compiles to C ¯\_(ツ)_/¯
@mohak9102 1 month ago
How is this better than Airflow?
@dougmercer 1 month ago
"Better" is probably a matter of taste. I prefer it because it feels more like writing Python and less like writing a config file. There are several comparisons out there that dive into the differences. Here's a third-party link comparing them plus another approach (Argo): neptune.ai/blog/argo-vs-airflow-vs-prefect-differences
@mahdirostami7034 1 month ago
8:53 I can't believe writing a custom parser would gain any performance, considering the default one is probably implemented in C. I assume this is a case of PyPy optimizing while running, and I'm wondering if running the same script with CPython would result in worse performance.
@dougmercer 1 month ago
I expect that the custom parser would have worse performance in plain CPython, but I didn't test it
@rushcoc9605 1 month ago
😮😅 Can you tell me how you measured which parts of the code take how much time? Just by looking? How?
@dougmercer 1 month ago
1. A bit of intuition. 2. A bit of being totally wrong (removing the constants that indicated which column was min, max, count, or sum didn't speed up performance... I was trying too many things at once and accidentally bundled that change in with something else). 3. I used a lot of time.perf_counter() calls to measure the time certain operations took, and A/B tested them. Normally, in CPython, I would profile using something like pyinstrument or another similar line-level profiler.
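The perf_counter pattern was roughly this (a hypothetical helper for illustration, not the actual code; parse_a, parse_b, and chunk are made-up names):
```
import time


def time_it(fn, *args, repeats=5):
    """Run fn several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# A/B test two candidate implementations on the same input, e.g.:
# print(f"variant A: {time_it(parse_a, chunk):.3f}s")
# print(f"variant B: {time_it(parse_b, chunk):.3f}s")
```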
@maurolimaok 1 month ago
Nice channel. Hope it grows.
@dougmercer 1 month ago
Thanks Mauro! I hope so too🤞
@nangld 1 month ago
Instead of putting so much effort into optimizing Python, why not just use C/C++? Also, multiple CPUs won't help unless you have RAID 0.
@dougmercer 1 month ago
Multiple cores help even for C/C++.
@this-one 1 month ago
Would it count as Python if we write it as a module in C?
@dougmercer 1 month ago
I'm no philosopher, but this gives me Ship of Theseus vibes. So... maybe technically, but I don't feel good about it.