Profiling and optimizing your Python code | Python tricks

71,755 views

Sebastiaan Mathôt

Comments: 139

@senajoaop 5 years ago

Had to do a fast and superficial analysis in a very long code. This video made it possible, thanks a lot pal.

@sheikhakbar2067 4 years ago

This channel needs a couple of million subscribers... I always come back to it to learn those marvellous tips and tricks!

@Drahagoon 4 years ago

Awesome video! Well explained, with a simple, clear and typical hands-on example illustration. Great work.

@MsSuyash1995 5 years ago

I came here to get a glimpse of how the cProfile module works, but I'm leaving impressed with your final solution... I loved how you combined the zip() function after sorting the list... And, a great job in illustrating why profilers are an important tool in a programmer's armamentarium...

@mstr_rprochowicz 5 years ago

It helped me a lot in tracking expensive functions that were unnecessarily used 2 million times in a loop. Thanks for this useful tutorial!

@MrAmbarish710 4 years ago

Man your videos are really really helpful! Best explanation of cProfile, profiling and optimization in python. Please keep posting videos...

@legau2k 6 years ago

Awesome video. It was great watching you go step by step through the code optimization. Your solution for finding duplicates also was very clever and elegant. Worth a subscribe ^.^

@mervynwinn1852 5 years ago

i love the way you say "popping"

@eilonavizemer7755 3 years ago

Great video! here is another, perhaps easier, solution to make this code's complexity linear: 1. lowercase all the movies 2. convert your movie list into a set (sets in python avoid duplicates) 3. convert the set back into a list and return it.

@liorbm1 3 years ago

It will be nice to see perf difference between his final code to your idea..

@vermarajat2596 2 years ago

i think conversion from set to list will take more time.

@cosminturtureanu692 2 years ago

The goal is to find the duplicates, not to remove them
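
The difference between removing duplicates and finding them can be shown with a minimal sketch. The sample titles and both function names are made up for illustration; they are not taken from the video.

```python
def unique_titles(movies):
    """Lowercase every title and drop repeats (order is not preserved)."""
    return list({movie.lower() for movie in movies})


def duplicated_titles(movies):
    """Return each lowercased title that occurs more than once."""
    seen = set()
    duplicates = set()
    for movie in movies:
        title = movie.lower()
        if title in seen:
            duplicates.add(title)
        else:
            seen.add(title)
    return sorted(duplicates)


movies = ["Alien", "alien", "Heat", "Brazil", "HEAT"]  # hypothetical sample data
print(sorted(unique_titles(movies)))  # every title once
print(duplicated_titles(movies))      # only the titles that repeat
```

Converting to a set answers "which titles exist", while the second function answers "which titles repeat". Both make a single pass over the list.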

@balmittal1770 2 years ago

Nice code optimization, especially the last one.

@0versun0 7 years ago

More more more videos. Your videos are very helpful! Keep going

@abhishekpandey7096 6 years ago

Hey🌏🌏🏕️

@sailalmishra4860 5 years ago

Buddy this is Amazing, You should not be on such low subscriber count.. God bless

@simonbrecher878 5 years ago

Good video, but it is actually quadratic (polynomial), not exponential. Quadratic is n^2, polynomial is n^k, where k is a constant, and exponential is k^n. n is the length of the input. You would not be able to do anywhere near 5000 in an exponential problem.

@daviddvorak3278 3 years ago

The final solution is also not linear, but n*log(n), since python sort is not linear.
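
The complexity point can be illustrated with a sketch of the two approaches the comments describe: a loop inside a loop, and sorting followed by a neighbour comparison with zip(). This is a reconstruction from the comments, not the code shown in the video, and the function names and sample data are assumptions.

```python
def find_duplicates_nested(movies):
    """Compare every title with every later title: roughly n * n / 2 comparisons."""
    titles = [movie.lower() for movie in movies]
    duplicates = []
    for i, title in enumerate(titles):
        for other in titles[i + 1:]:
            if title == other and title not in duplicates:
                duplicates.append(title)
    return duplicates


def find_duplicates_sorted(movies):
    """Sort first (O(n log n)), then compare each title with its neighbour (O(n))."""
    titles = sorted(movie.lower() for movie in movies)
    return [a for a, b in zip(titles[:-1], titles[1:]) if a == b]


movies = ["Heat", "Alien", "heat", "Brazil", "alien"]  # hypothetical sample data
print(sorted(find_duplicates_nested(movies)))
print(find_duplicates_sorted(movies))
```

The nested version grows quadratically with the number of titles. In the second version the sort dominates, so the whole function is O(n log n) rather than linear.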

@AyushMandowara_xx7 4 years ago

This helped me optimize code by about 50-75% depending on the file contents being scanned. Earlier it was consuming about a minute on a large scan, while now it takes about 20 secs at most. The average run time dropped from 25 secs to 10 secs. All I did after analyzing was change my Pandas Series objects (generated from Google Spreadsheets) to tuples (lists would have the same effect, but my data never changes in a single run). Using cProfile I could see that the Pandas library was consuming loads of resources just to fetch values based on an index number. Thanks a ton!!

@prakharchaurasiya8107 4 years ago

Finally some optimization that is not too complex. Thanks.

@sm3801_smo 7 years ago

I initially subbed because of your Biological Psychology videos, but I didn't know you're into programming, very useful video!

7 years ago

+Samuel Muñoz thanks! Yes, the Bio Psy lectures are something new for me. Most of the videos are about Python and/or OpenSesame

@vinitkumar2923 2 years ago

Great video and explanation. Thanks for sharing this.

@IsmaelRDeMelo 2 years ago

You earned my follow at 15:33

@tonyradice4166 1 year ago

Outstanding presentation!!!

@AutoXplorerYT 2 years ago

Well explained! Thanks for this video....

@Julien-hg8jh 4 years ago

15:30 auto correction! Nice video BTW :D

@yildirimicen766 2 years ago

Hi Mr. Mathot, you are great, I love your Python sessions... :)

@marveltv5341 4 years ago

Careful... he is a hero 🙌

@maedehshahabi4744 2 years ago

Thank you sir for your clear explanation.

@hamol3d 3 years ago

Great Video! Thank you.

@razintailor 4 years ago

Great explanation. Lucid and fundamental. It is indeed helpful.

@ranelpadon8834 3 years ago

Good analysis and build up of improvements. Thanks!

@MultiRick15 2 years ago

Wow! Great explanation.

@marazDNG 2 years ago

Great video man!

@ke30_ 4 years ago

I love this so much

@farooqseeru948 6 years ago

Brilliant. Clear explanation.

@Grimlor 7 years ago

I've found this so useful! Thank you for this video. By analyzing my code and applying a little tweak, I've already managed to save 0.8 seconds of runtime. And I've only just started! :D

7 years ago

+Grimlor glad to hear it!

@DaanWaardenburg 4 years ago

Keep coming back here when my code starts running slow :P

@nopo_b3645 4 years ago

Yeah I can imagine :-) First time here. Coming back to remind yourself that errr... yeah ... why on earth does it take so long and frustrate me ... how am I ever going to find out wth this code is slow. Is it me, or is it some crazy circumstance that goes on in the libraries that I use? While waiting for your code to finish you can actually study the problems and get them fixed. But I do think not by hand, just by profilers

@15kasturi 1 year ago

I just subscribed to you after watching this video, very informative and nice goggles!

@danyalt8221 2 years ago

It Was Great! Thank You.

@hrithiksharma2047 4 years ago

Great tut bro! Thank you

@arjunkirpal9776 6 years ago

Thank you Sebastiaan! Would love more Asyncio videos!

@onlymusic2005 4 years ago

Real treasure... bunch of thanx

@mahesh_kok 5 years ago

This guy is crazy in coding , concepts and thinking...he brought down the execution time from 6 sec to .002 sec......this is insane ... tremendous work done bro...

@thecaveofthedead 6 years ago

Excellent tutorial. Thanks.

@migovas1483 5 years ago

This was great and clear, right to the point!!

@xanterx 4 years ago

Love your shades 🤘

@rgrapey 7 years ago

Clear and informative!

@shivan2418 4 years ago

In case anyone in the future reads this, I found that this method executes even faster than the method he ended up with:
    from collections import Counter
    def find_duplicate_words_counter(src='movies.txt'):
        return [movie for movie, count in Counter([movie.lower() for movie in read_movies(src)]).items() if count>1]

@Alister222222 3 years ago

Was going to post this as well - converting the list into a Counter (e.g. a special dict type from the collections module) and running a comprehension to get back everything that had a count above 1 does seem to be the cleanest way to get to the solution, and I am pleased it is also the fastest!

@droit19 2 years ago

@@Alister222222 - I tried this and was 0.13 seconds faster or 44% faster than the Zip method

@acho8387 4 years ago

very good video! thanks!

@pygemssoftware4254 2 years ago

Great work and explanation. I would like to email my eyes to you as token of my appreciation😃

@nutcrackeroverdrive 7 years ago

Thanx, Sebastiaan, very useful and helpful video.

@babuasian 6 years ago

Appreciate it. Really useful for most of the programs..

@ЕвгенийТитов-и9ю 4 years ago

This is super nice video, thank you sir

@botenbireu7875 7 years ago

Thank you a lot! very clear explanation!!!

@deividaspelakauskas9394 4 years ago

Underrated.

@haonanqiu4251 3 years ago

thanks a lot!

@jeremyalvaprathama4069 4 years ago

Awesome work! I just subscribed

@benedictcoltman1983 4 years ago

Superb! Thanks

@drewduncan5774 6 years ago

12:54 Quadratic, not exponential.

5 years ago

NO it's in O(n*ln(n)) because of the Sort()

@stephenaiesi6073 5 years ago

With Big O notation we are really only concerned with the term with the highest power. An algorithm on the order of O(3x² + 2x + 11) is usually reduced down to O(3x²). I've seen books drop the coefficient as well, but that has a fairly large impact on the accuracy of the expression in my opinion. So in terms of Big-O, an algorithm on the order of a quadratic equation is usually considered to be on the order of its highest term.
If you think about comparing two algorithms, one operating at O(3x² + 2x + 11) and one that runs at O(3x²), let's see how different they really are. Given the following equations:
f(x) = 3x² + 2x + 11
g(x) = 3x²
Let's see how they correspond given a single input (n=1):
f(1) = 16
g(1) = 3
The ratio between these two results is 5.33, and would go to show that quadratic and exponential are not swappable in this context.
Now let's scale it to 100 inputs, n=100:
f(100) = 30211
g(100) = 30000
Now they are operating at a ratio of 1.007. Not identical, but damn near close depending on the precision needed. In terms of making algorithms efficient with computers, 100 inputs is not considered much anyway.
Now let's scale it to 1,000,000 inputs:
f(1,000,000) = 3000002000011
g(1,000,000) = 3000000000000
Ratio of 1.00000066.
The difference in comparing these two with and without the extra terms is often negligible when comparing them to algorithms on the order of a different exponential power. Run the same experiment comparing f(x²) and g(x³), with and without extra quadratic terms, and you can see that dropping the lower terms, though not exact, is definitely enough to compare the efficiency of algorithms. So as the size of the inputs grows, particularly towards quantities where optimization is necessary, we are usually dealing with such vast amounts of data that including the lower terms of the quadratic formula in our assessment of an algorithm's efficiency does not necessarily provide extra insight.
PS: I'm fully aware this isn't the case in every domain, but it is for the most part how it is done and definitely applies to the kinds of problems in this video.

@_treed1 5 years ago

Lol these comments. It's a loop in a loop which is n * n so n^2 tadah

@__gavin__ 4 years ago

@@stephenaiesi6073 > I've seen books drop the coefficient as well but that has a fairly large impact on the accuracy of the expression in my opinion.
Big O notation has a formal mathematical definition. A function f(x) is said to be O(g(x)) if |f(x)| <= A|g(x)| for all x >= x_0, where A and x_0 are some constant values. Hence, when considering big O notation, it really doesn't matter if you drop the 3 or not. If f(x) = O(3x^2) then all we are saying is that there exists some x_0 such that for all x>=x_0, |f(x)|

@calebmunuru3598 4 years ago

Stephen Aiesi Thanks mate. This is a really good explanation
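
The ratios quoted in this thread can be recomputed with a few lines. This is only a check of the arithmetic for the two example functions f(x) = 3x² + 2x + 11 and g(x) = 3x² given above; the loop and variable names are illustrative.

```python
def f(n):
    """The full example cost function: 3n^2 + 2n + 11."""
    return 3 * n**2 + 2 * n + 11


def g(n):
    """The same function with the lower-order terms dropped: 3n^2."""
    return 3 * n**2


for n in (1, 100, 1_000_000):
    # The ratio approaches 1 as n grows, so the lower-order terms stop mattering.
    print(n, f(n), g(n), f(n) / g(n))
```

The constant factor 3 also cancels when two quadratic functions are compared this way, which is why the formal definition lets it be absorbed into the constant A.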

@kpespinosa 5 years ago

great explanation! cheers

@blanky_nap 7 years ago

Great video!

@TheFilipo2 6 years ago

Thank you, this was super helpful!

6 years ago

Good to hear!

@siddharthindora7182 4 years ago

Great Video...Thanks for explanation :)

@Jack-4242 4 years ago

Thank you, this helped me so much :)

@mohammedgt8102 2 years ago

Awesome video.

@AmrXcellent 4 years ago

Good video, but if I understand correctly, the final code change does not account for a movie title that is duplicated more than once. So the first two iterations of the code provide more functionality. All in all, nice video; I learned something new watching it, so thank you for that.

4 years ago

That's correct: triple duplicates are not caught with this method. And thank you!

@deadman1999 2 years ago

yes, I was thinking the same thing, the final code was so fast because it only checked its 1st neighbor, taking into account that there were only 1 duplicate.

@anumsheraz4625 4 years ago

is there any tool to identify how much memory is consumed by the code ?
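
One option for the memory question is the standard library's tracemalloc module, which tracks memory allocated by Python code. The sketch below is a minimal illustration and was not covered in the video; build_titles is a hypothetical workload.

```python
import tracemalloc


def build_titles(n):
    """Hypothetical workload: build a list of n short strings."""
    return [f"movie {i}" for i in range(n)]


tracemalloc.start()
titles = build_titles(100_000)
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```

tracemalloc only sees allocations made through Python's memory allocator, so memory used internally by some C extensions may not show up in these numbers.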

@АлексейТрофимов-ф5у 2 years ago

thanks!

@svalaboj 4 years ago

your video is very useful, thanks for the same.

@bunlonglay463 3 years ago

Hey, shouldn't your last solution, where you sort the movies list, be O(n log(n)) and not O(n) as you said? Sorting the movies list takes O(n log(n)) time. Also, when you use zip with slices of the movies list, copies of movies are created. This is also inefficient. Could someone maybe confirm what I said? Anyway, great video explaining the profiler.

@fuanka1724 6 years ago

Loved this. Optimization is really important to me. Thanks.

@dhananjaykansal8097 5 years ago

YOU ARE JUST AWESOMEEEEEE

@alvaromartin6301 5 years ago

Excellent content! New sub.

@sashkazayebashka 6 years ago

Great video. Thank you man!

@parietal100 7 years ago

Thank you Sebastian

@mariusnorheim 6 years ago

Hi Sebastian, I tried running the profile decorator, but 1) I'm using python 2.7 and 2) I'm running it in atom, not jupyter, so I get an error message. Would be awesome if you could post the code for python 2.7 as well in the file

@mariusnorheim 6 years ago

Actually got it to work. It seems that you'll have to encode the Unicode strings to byte strings, and use io.BytesIO, instead of io.StringIO.
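
For readers who do not have the file from the video, a profile decorator of the kind discussed here can be sketched with cProfile, pstats and io.StringIO. This is a common recipe written for Python 3 and is an assumption about the general shape, not a copy of the video's code; count_squares is a hypothetical function to profile.

```python
import cProfile
import functools
import io
import pstats


def profile(func):
    """Decorator that prints cProfile statistics for each call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            return func(*args, **kwargs)
        finally:
            profiler.disable()
            stream = io.StringIO()  # text buffer that collects the report
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
            print(stream.getvalue())

    return wrapper


@profile
def count_squares(n):
    """Hypothetical function to profile."""
    return sum(i * i for i in range(n))


result = count_squares(10_000)
```

Nothing in this sketch depends on a notebook, so it should run the same way from a plain script or an editor.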

@BullishBuddy 3 years ago

👍👍

@vyl6781 5 years ago

Saved my sanity.

@neelojp8460 7 years ago

thank you so much for your videos they are really very helpful! Do you have any own books about python ?

7 years ago

+post fix Thank you! No, I'm afraid that I do not have any Python books myself. But there are plenty of good free Python books out there, such as Byte of Python.

@neelojp8460 7 years ago

thank you for your answer, dank je wel :-).... you should write one about the tricks which you show us here... and here is the link for the Byte of Python for all others: www.gitbook.com/book/swaroopch/byte-of-python/details

@emasmach 5 years ago

Nice. Excellent.

@alishermatkurbanov9205 6 years ago

What if the list has more than 1 duplicate, e.g. [1, 3, 1, 4, 1, 4, 4, 5] -> sorted [1, 1, 1, 3, 4, 4, 4, 5] -> zipped, something like this: [(1, 1), (1, 1), (1, 3), (3, 4), (4, 4), (4, 4), (4, 5)], so 1 and 4 will be added to duplicates twice. Shouldn't duplicates be a list with unique items?

6 years ago

That's correct. Triplicates will end up twice in the list of duplicates, which may not be what you want. An easy trick to get around that would be to use a set comprehension (kzbin.info/www/bejne/q4W4h2WbhLOkibM), rather than a list comprehension. Because sets by definition consist of unique items.
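
The difference between the two comprehensions can be seen on the example list from the question. This is a small sketch; the variable names are illustrative.

```python
movies = sorted([1, 3, 1, 4, 1, 4, 4, 5])  # the example list from the question above

# List comprehension: an item that occurs three times is reported twice.
as_list = [a for a, b in zip(movies[:-1], movies[1:]) if a == b]

# Set comprehension: every duplicated item is reported once.
as_set = {a for a, b in zip(movies[:-1], movies[1:]) if a == b}

print(as_list)
print(as_set)
```

A set does not keep any order, so wrap the result in sorted() if the duplicates should come back in a predictable order.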

@luciano_remes 3 years ago

Your last solution runs in NlogN time complexity, but you could actually make it faster by just using a set of found movies. It would run in linear time and be way simpler:
    found = set()
    duplicates = []
    for movie in movies:
        if movie not in found:
            found.add(movie)
        else:
            duplicates.append(movie)

@fantasdeck 2 years ago

I like how you edited your video to hide the little typo you made. But, cool tool. Will be using...

@adityakushwaha3654 4 years ago

But how do you know which code will be more efficient wrt present code ?

@graycybermonk3068 5 years ago

You will kill me. Really Awesome.

@norwegiandud 3 years ago

Helpful video, thanks! Just one 🐛 with the 007-method (or weird feature). If there are movies that are represented more than two times they appear as duplicates in the duplicates list. E.G. 'the phantom of the opera' appears five times in the TXT file, and four times in the list of duplicates. Now if this is a 🐛 or feature ... depends on who you ask.

@chunceywei8284 7 years ago

Thank you

@vanglequy7844 4 years ago

13:30 Who else pause the video and challenge yourselves? But beware of the tendency to jump into redesign the solution before profiling.

@Excess-qn7qh 4 years ago

Does the @profile annotation only work with Jupyter?

@Jure1234567 3 years ago

Can I do it with wxwidgets classes and multithreading?

@nikithar3628 6 years ago

Awesome

@IsmaelRDeMelo 2 years ago

"Well, you can see that our code it's taking about 0.00023 to execute. But if you're not satisfied with that..." lmao

@ikramu5719 5 years ago

Thank you for that explanation. Neat solution with the zip and slices too! ps The link for the movies file is now out of date though.

@deepak1725 6 years ago

Very Very nice

@yildirimicen766 2 years ago

Hi Mr. Mathot, how about the following with "combinations" (you can even omit "movies.sort()"):
    from itertools import combinations
    # find duplicates in list of movies
    movies = ['abc','abc','xyz','ddddd','ddddd','star wars']
    print([m for m,n in combinations(movies, 2) if m==n])

@fcoignmo 6 years ago

Where did you get the "movies.txt" file (link)? Thank you for the video, great work.

6 years ago

My reply is a bit late, but I got this data from here: osf.io/r73y9/

@pushpendrasingh1819 6 years ago

Bro, it would give duplicate results if the file contains a movie more than twice. E.g. if movie.txt contains Hello Hello Hello, then your code will print hello at least twice.

@sailalmishra4860 5 years ago

Hey, hope you understand this is for demonstration purposes. It would be good to concentrate on the technique rather than the logic. You can probably go about refining the logic. A solution is to use a set and then extract elements with more than 1 occurrence.

@Memfis0 5 years ago

You can use this: `duplicates = [name for name, count in Counter(movies).items() if count > 1]` instead of that zip method. Also, this one doesn't require the list to be sorted. Remember to `from collections import Counter`.

@vaibhavjain1914 3 years ago

Bruh in this video you are teaching code optimization but looking at your choice of wearable I feel I am learning how to assassinate enemy but amazing video 😀

@7aygames35 3 years ago

The 22 people who disliked are those who were writing bad code and when it was pointed out to them, they just got angry

@naughtybuddha3942 4 years ago

Where is the movies.txt? Please provide it, thanks.

@ВикторДзеба 2 years ago

May you give us the movies.txt file please???

@DragonRazor9283 3 years ago

from 6 seconds to 0.007 seconds wow!

@xspager 6 years ago

Awesome explanation but when you removed the function you also changed the way you do the searching, you stopped looping over all the movies and used the "in" operator