"Python Performance Matters" by Emery Berger (Strange Loop 2022)

77,769 views

Strange Loop Conference

1 year ago

It's 2022. Moore's Law and Dennard scaling have run out of steam, making it harder than ever to achieve high performance, especially in Python. This talk first explains in detail the unique challenges that Python poses to programmers. It then presents Scalene, a novel high-performance CPU, GPU and memory profiler for Python that does many things that past Python profilers do not and cannot do. Scalene runs orders of magnitude faster than other profilers while delivering more accurate and more actionable information that's especially valuable to Python programmers.
Emery Berger
Professor, University of Massachusetts Amherst
@emeryberger
Emery Berger is a Professor of Computer Sciences at the University of Massachusetts Amherst, the flagship campus of the UMass system. Professor Berger and his collaborators have built numerous widely adopted software systems including Hoard, a fast and scalable memory manager that accelerates multithreaded applications (on which the Mac OS X memory manager is based); DieHard/DieHarder, error-avoiding and secure memory managers that influenced Windows; and Coz, a "causal profiler" that ships with modern Linux distros. He is also the developer and maintainer of CSrankings.org. His honors include an NSF CAREER Award; Most Influential Paper Awards at OOPSLA, PLDI, and ASPLOS; five CACM Research Highlights; and Best Paper Awards at FAST, OOPSLA, and SOSP. He is an ACM Fellow. Professor Berger served six years as an elected member of the SIGPLAN Executive Committee and a decade as Associate Editor of TOPLAS; he was Program Chair for PLDI 2016 and co-Program Chair of ASPLOS 2021.

Comments: 62
@jackgenewtf (1 year ago)
Instead of saying "you are writing Python, but you are not writing Python," it might be better to phrase it as, "you are just using Python to orchestrate highly performant C programs."
@skellious (1 year ago)
yes I often explain it as being a conductor.
@miraculixxs (1 year ago)
To be precise, you might as well write "you are just using a language to orchestrate highly performant machine code". It's a fact of programming life and not specific to Python.
@MyAmazingUsername (1 year ago)
@@miraculixxs Salty python programmer detected. 😂👌
@diogoantunes5473 (1 year ago)
@@miraculixxs I would not frame compiled languages as orchestrating assembly code. You are creating the assembly code yourself (by cooperating with the compiler).
@erisonveshi8406 (1 year ago)
@@diogoantunes5473 The compiler is a Jerk! But the mofo is right all the time XD
@ehhhhhhhhhh (1 year ago)
Wow, cutting out the questions at the end made the Q&A feel so streamlined. I hope more lecture videos start doing that. Probably saved all viewers 5+ minutes! Emery gives some fantastic lectures, please invite him back in the future.
@markusklyver6277 (1 year ago)
Also he repeated the question, which was good.
@leonidkerchev4256 (1 year ago)
Wow! Big thank you to Prof. Berger and his team for the groundbreaking profiler, from someone who had to upgrade RAM to 64 GB for data science projects. Just tried it and it is a masterpiece. The visual representation of the profiler opens doors into code optimization for everybody.
@allanwind295 (1 year ago)
Informative, funny without being lame. The run-time graph was great motivation. It might be even more compelling if you worked through an actual problem and then wove in how Scalene addresses the gap in the current Python profiler landscape.
@MrKenkron (1 year ago)
I have a script with a native backend, a pure Python backend, and a NumPy backend that I tested with this profiler. Very interesting results. The NumPy implementation and the pure Python implementation both had around 31% native code, despite the NumPy implementation being around twice as fast. Scalene showed me a few good places to optimize, and I was able to cut the pure Python implementation down from 35 to 25 seconds. The C backend was still about 100x faster (0.2 seconds), with the output "Scalene: Program did not run for long enough to profile."
@mushchlowastaken (1 year ago)
Hey, the guy that made the randomizer and Coz! I like this guy, excited for the talk :)
@hugsun5918 (1 year ago)
I love Emery Berger. Such a charismatic guy, and he is always working on interesting projects.
@muray82 (1 year ago)
Small update: py-spy supports multiprocessing. Just use `--subprocesses` to catch child processes. I liked the same output much better when rendered by Pyroscope; its comparison function is really helpful.
@9e7exkbzvwpf7c (1 year ago)
Love that he mentioned Dennard scaling. My Parallel Programming prof introduced us to this back in 2016 and made the argument that it was actually the driver of the increasing need for parallelized programs.
@tissuepaper9962 (1 year ago)
"1 core too hot. 128 cores do trick." -Your professor
@GodOfMacro (1 year ago)
Violent Agreement haha, super nice tool, makes Python even more valuable. The ability to see what your code does is what I need most in a programming language.
@rcoder01 (1 year ago)
I was very surprised to see that the `np.array(range(10**7))` in the code example was never addressed. On my machine, removing the redundant `np.array` call from the `random` line only saved about 2.5 seconds while changing `np.array(range(10**7))` to `np.arange(10**7)` saved almost 8 seconds. Of course the first optimization saved memory, which the second didn’t really do.
@general_alexus2533 (1 year ago)
Do you know the reason for it and think it's too obvious, or do you want the answer? In case of the latter: `np.array(range(10**7))` creates a Python (slow) list via the Python range function (slow) that then needs to be cast (slow) into a NumPy C-optimised object. `np.arange(10**7)` passes the argument 10**7 directly into the NumPy C-optimised code and creates the array inside using a fast function.
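The two constructions discussed in this exchange can be compared with a small timing sketch. It assumes NumPy is installed; the function names `via_range` and `via_arange` and the reduced size `N = 10**6` are illustrative choices, not taken from the talk. The only claim made here is that the first form passes through Python-level integer objects while the second stays inside NumPy's compiled code.

```python
import timeit

import numpy as np

N = 10**6  # smaller than the talk's 10**7 so the comparison finishes quickly


def via_range(n=N):
    # Every element starts life as a Python int before NumPy stores it.
    return np.array(range(n))


def via_arange(n=N):
    # The array is filled directly by NumPy's compiled code.
    return np.arange(n)


if __name__ == "__main__":
    t_range = timeit.timeit(via_range, number=3)
    t_arange = timeit.timeit(via_arange, number=3)
    print(f"np.array(range(N)): {t_range:.3f} s for 3 runs")
    print(f"np.arange(N):       {t_arange:.3f} s for 3 runs")
```

Absolute timings depend on the machine and the NumPy version, so none are quoted here; both functions return arrays with the same contents.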
@robmckiernan3264 (1 year ago)
Incredibly well explained and engaging. Good stuff 👍
@Greenindragon (1 year ago)
I love Emery's talks, always so informative and well-structured
@Kattemageren (1 year ago)
Great talk and tool too! Will be trying it out for sure, thanks
@simonsemmler9804 (1 year ago)
Great talk and great profiler!
@Splatpope (1 year ago)
27:00 Ooh, is that why it's sometimes impossible to Ctrl+C a running Python program?
@juliusfucik4011 (1 year ago)
I am certainly going to try this. I still prefer C++, but Python has many benefits, especially the ease of getting to work cross-platform. I can write a bit of Python, including optimized modules, on a 64-bit Windows laptop and then move it to an Arm-based small-form-factor computer running Linux. That is certainly possible using C/C++, but there are more steps in compilation and some basic I/O issues that always need solving.
@carlosp.6784 (1 year ago)
You might want to try Julia!
@Snirokok (1 year ago)
Was very confused when the size of a C++ map was compared to a Python dictionary. Just checked, but `sizeof(std::unordered_map)` is 56.
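For the Python side of this comparison, a minimal sketch using the standard library's `sys.getsizeof` is shown below. Like `sizeof` on a C++ container, `sys.getsizeof` is shallow: it reports the container object itself and not the objects it refers to. The helper name `shallow_and_deep_size` is hypothetical, and its "deep" figure goes only one level down (direct keys and values), so it is an approximation rather than a full memory accounting.

```python
import sys


def shallow_and_deep_size(d):
    """Return (shallow, one_level_deep) sizes of a dict in bytes."""
    # Shallow: the dict object and its internal table only.
    shallow = sys.getsizeof(d)
    # One level deep: add the sizes of the key and value objects themselves.
    deep = shallow + sum(
        sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items()
    )
    return shallow, deep


if __name__ == "__main__":
    for label, d in [("empty", {}), ("one entry", {"key": 12345})]:
        shallow, deep = shallow_and_deep_size(d)
        print(f"{label}: shallow={shallow} bytes, one level deep={deep} bytes")
```

The exact byte counts vary between CPython versions and platforms, which is why no expected output is given.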
@Frozander (1 year ago)
I currently do not have access to my big boy PC that has all my big Python codebases, so I can't be 100% sure, but from testing my scripts and such it seems like a cool profiler. Much better than the ones I tried to use before.
@lobaorn (1 year ago)
I laughed when he used a Rotate to turn into a table, making a subtle reference to the "GoingNative 2013 C++ Seasoning" talk by Sean Parent: kzbin.info/www/bejne/jWPXiIKar8yLfqM Edit: Amazing talk as always. I have watched Emery's talks for years now, and he always delivers the goods!
@noli-timere-crede-tantum (1 year ago)
Hadn't heard of this guy. Sounds like I'll be binge watching all his stuff now - like I did when I came across Dave Beazley years ago.
@sortof3337 (1 year ago)
Yo. I use Scalene every day. It's dope. Nice to see Emery giving a talk. :D
@wstein389 (1 year ago)
“…or in the case of numpy, FORTRAN, at the end of the day, not a lie.” This quote at 16:30.
@matthiasschuster9505 (1 year ago)
How is that the case?
@michaelosmann7509 (1 year ago)
Seems interesting, but I couldn’t get it to work on Windows. Every time I run it, I get a different “Error in program being profiled”. I still get a report, but it’s pretty useless when the program fails after a second.
@Rareme530 (1 year ago)
How do I use this with FastAPI? The documentation is not clear about how to use it with web frameworks.
@tamasgal_com (1 year ago)
Great talk! But in my opinion Python does damage the high-performance ecosystem (period). While there is a small group of (let's call them) experts (I'd count myself as one) who have spent years learning language internals and ways to make Python fast (Cython, Numba, NumPy gymnastics, Dask, C/C++/Fortran wrappers and so on), the high-level APIs Python offers always come with limitations and are very complex and hard to maintain (lots of different technologies/languages), and newcomers will write code with terrible performance when they implement whatever they need and can't find. That's a fact: I do code reviews and, week after week, uncover code that runs 10000x slower than it could on clusters with thousands of CPUs. That's just insane. I have used Python for at least 10 years in scientific programming and have written tons of packages, but our students and PhD candidates struggle to squeeze even moderate performance out of Python and need a lot of training. I think it's a waste of time and resources, and it's just beating a dead horse. It's time to move forward and be open to alternatives that solve a lot of these issues, like Julia.
@matthiasschuster9505 (1 year ago)
Do you know the Python implementation of GraalVM?
@wafflescustard5374 (1 year ago)
This is exactly the same with any language that is not C++ or lower
@sjatkins (1 year ago)
He pretty much left out languages that are not at all close to the metal but have really good JITs and built-in optimization options, such as one of the oldest languages of all, Lisp.
@sheeplord4976 (1 year ago)
Lisp used to be considered a slower language, but that was when C was considered "high level". If you want maximum performance specifically, C or C++ or Rust are king, with Lisp still being slower in most metrics.
@Drqonic (1 year ago)
Why would he say the guy is lying when he claims to like the GIL? I've found it very helpful when working with databases that run with threading.
@PhysicsGamer (1 year ago)
I love that people are finally coming around to the realization that performance is, in fact, still important... and the best way to produce performant Python code is to not code in Python.
@pubdigitalix (1 year ago)
Totally agree. Now you have people programming embedded systems with MicroPython.
@jankucera8505 (1 year ago)
No. Internet throughput still makes Python performance irrelevant, Mr. Gamer. So do Python compilers.
@jakeisnt7377 (1 year ago)
RE: the intro: what about Lisp?
@KipIngram (6 months ago)
And Forth. You had FORTH in 1970, and it's really the most "metal" of any of them. Or, rather, it's *as* metal as assembly, and far more so than the other ones you listed.
@paulpayer (1 year ago)
Mid 80s computers: Conveniently forgot the Amiga ... whose chip set was used with Video Toaster which started a special effects revolution in television & film, before Commodore managed to bankrupt the brand.
@brandonlewis2599 (1 year ago)
Well, this was nothing I hadn't learned firsthand from 10 years of writing Python. TBH, not sure why things like Cython and PyPy aren't more popular. It's an easy way to get a huge performance increase without really doing much hand-optimization.
@randomnobody660 (1 year ago)
Admittedly, I'm not that informed about what Python is "generally" used for, but from my experience mostly working with ML or data processing stuff, you mostly use it to call libraries anyway, so it almost makes sense that existing profilers don't do a lot of things. Like, of course you don't care about C time, or GPU usage, or memory leaks; those you generally assume are handled by the people who wrote your libraries. And if you are writing libraries for Python, I'd assume you would be profiling your stuff in the language you are actually writing in, not in Python. Now, having typed that, I'm guessing that last assumption is wrong somehow?
@DajesOfficial (1 year ago)
In complex data processing pipelines you still need to know which commands take the most time to complete, so you can search for ways to replace them, their arguments, or even their order. Oftentimes you can increase performance by a few hundred percent this way.
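A rough way to see which stage of a pipeline dominates, before reaching for a full profiler, is to time each stage by hand. The sketch below uses only the standard library; the helper `timed`, the function `run_pipeline`, and the three stage names are hypothetical stand-ins for a real pipeline.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Accumulate wall-clock seconds spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = results.get(label, 0.0) + time.perf_counter() - start


def run_pipeline(n=200_000):
    """Toy three-stage pipeline; returns the result and per-stage timings."""
    timings = {}
    with timed("load", timings):
        data = list(range(n))
    with timed("transform", timings):
        squared = [x * x for x in data]
    with timed("reduce", timings):
        total = sum(squared)
    return total, timings


if __name__ == "__main__":
    _, timings = run_pipeline()
    # Print the stages from slowest to fastest.
    for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{label:10s} {seconds:.4f} s")
```

This only measures wall-clock time per stage; it says nothing about memory, GPU use, or the split between Python and native code.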
@hughmanwho (1 year ago)
Most important Python optimization tip: Don't use it, use something like Flogram instead
@bernadettetreual (1 year ago)
Looking at the steaming pile of (censored) that the Python interpreter is and the 15 million people who use it, I can't help but wonder why nobody seems to care to actually improve the interpreter. You could get a ton of mileage from actually creating a modern interpreter, like V8 for JavaScript. Instead, people say, "nah, just use Python to orchestrate FFI calls". 😂
@JuanBC (1 year ago)
It is really a very bad idea to do multiprocess profiling. You increase system noise; that's why not a single tool implements multiprocessing.
@allanwind295 (1 year ago)
What's the alternative?
@EmeryBerger (1 year ago)
Scalene is a statistical sampler, so it does not generally distort executions of programs being profiled. The way that Scalene profiles multiprocessing code does not interfere with the programs being executed, instead aggregating information from each process only when profiling ends.
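To make "statistical sampler" concrete, here is a toy timer-driven sampler built on the standard `signal` module. It is an illustration of the general idea only, not Scalene's implementation; the class `SamplingProfiler` and the function `busy` are hypothetical names, and the code assumes a Unix-like system (it relies on `SIGPROF` and `signal.setitimer`) and must run in the main thread.

```python
import collections
import signal
import time


class SamplingProfiler:
    """Toy sampler: on each CPU-time tick, record the line that was running."""

    def __init__(self, interval=0.005):
        self.interval = interval  # seconds of CPU time between samples
        self.samples = collections.Counter()

    def _on_tick(self, signum, frame):
        # CPython runs this handler in the main thread between bytecode
        # instructions; `frame` is the Python frame that was interrupted.
        if frame is not None:
            self.samples[(frame.f_code.co_name, frame.f_lineno)] += 1

    def __enter__(self):
        self._old = signal.signal(signal.SIGPROF, self._on_tick)
        # ITIMER_PROF counts CPU time used by the process and sends SIGPROF.
        signal.setitimer(signal.ITIMER_PROF, self.interval, self.interval)
        return self

    def __exit__(self, *exc_info):
        signal.setitimer(signal.ITIMER_PROF, 0)  # stop the timer first
        previous = self._old if self._old is not None else signal.SIG_DFL
        signal.signal(signal.SIGPROF, previous)
        return False


def busy(seconds=0.2):
    """Burn roughly `seconds` of CPU time in pure Python."""
    end = time.process_time() + seconds
    count = 0
    while time.process_time() < end:
        count += 1
    return count


if __name__ == "__main__":
    with SamplingProfiler() as prof:
        busy()
    for (func, line), hits in prof.samples.most_common(3):
        print(f"{func}:{line} sampled {hits} times")
```

Because the program is only interrupted at timer ticks, the overhead stays roughly proportional to the sampling rate rather than to the number of function calls, which is the property the reply above is pointing at.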