I'm not exaggerating when I say this video changed my life. I went from a guy who did everything upstream in SQL and grudgingly used Pandas to a guy who uses Pandas for everything. The approach Matt demonstrates also translates generally to PySpark. I'm now considered the go-to guy for Pandas and PySpark code in my department. There's so much bad code around, often written by people with advanced degrees and MATLAB experience it seems. I could make a full time job out of cleaning up bad code. Dot chain FTW!
@mattharrison721 1 year ago
Thanks! Glad to help.
@amilkyboi 1 year ago
Heh, MATLAB and bad coding practices - the two are never far from one another it seems.
@AkashRana1111 2 years ago
This is gold! Matt did an amazing job showing best practices when using pandas and a lot of intuition about how pandas functions run under the hood.
@DavidDobr 2 years ago
90 minutes of pure gold. Thanks Matt!
@mattharrison721 2 years ago
Thanks David. 👍🙏 Make sure you check out my book, Effective Pandas, if you appreciated this.
@scottlucas232 2 years ago
AGREE COMPLETELY ! FANTASTIC PRESENTATION ! Learned more here than in past two years
@ninhluong5004 2 years ago
This is easily the best pandas guide I have ever watched so far.
@mattharrison721 1 year ago
Thank you!
@gregorywpower 1 year ago
I can’t wait for you to give another talk on polars!
@bendirval3612 2 years ago
This was a ridiculously useful video. I feel like I've watched a lot of python videos, but I think this might be the most practically useful for people who are not brand new to pandas--who use it all the time.
@annagora6409 1 year ago
Matt, big thank you for the chaining idea!
@johannes-euquerofalaralema4374 2 years ago
By far the best pandas video I have ever seen
@nickhodgskin 2 years ago
Really interesting talk, was doubtful about chaining at first but you have converted me :) . A very very informative talk, thanks
@mattharrison721 2 years ago
Thanks for coming around Nick. 😉 Hope you find these techniques useful to you.
@erginceyhan 2 years ago
Great presentation. As others said, pure gold. If there were a button called "pure gold", I would have clicked it. A simple like is not enough. It also changed my view of code organization. Thanks for sharing.
@firefoxmetzger9063 3 years ago
1:18:00 For the specific question being asked (finding duplicates in a primary key), there is a much simpler solution than the one Matt Harrison suggested: df[df.duplicated("primary_key", keep=False)]. The call df.duplicated("primary_key", keep=False) returns a boolean mask that is True for every row whose "primary_key" value is not unique, so indexing df with it selects all the rows that are duplicated. Matt solves the more general problem of "find all rows whose primary_key value occurs more than N times". A more concise (though perhaps less readable) solution to that would be something like df[df.primary_key.map(df.primary_key.value_counts()) > N]
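A minimal sketch of the two ideas in the comment above, assuming a toy frame with a `primary_key` column (the data and the names `dupes`, `counts`, and `frequent` are made up for illustration). The count lookup here uses `Series.map`, which does not depend on how `value_counts()` names its result or on `df` having a default index.

```python
import pandas as pd

# Toy data: key 1 is unique, key 2 appears twice, key 3 appears three times.
df = pd.DataFrame(
    {"primary_key": [1, 2, 2, 3, 3, 3], "value": list("abcdef")}
)

# keep=False marks every member of a duplicated group, not just the repeats.
dupes = df[df.duplicated("primary_key", keep=False)]

# General case: keep rows whose key occurs more than N times.
N = 2
counts = df["primary_key"].map(df["primary_key"].value_counts())
frequent = df[counts > N]
```

With this data, `dupes` holds the five rows for keys 2 and 3, and `frequent` holds only the three rows for key 3.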
@kernel2006 2 years ago
An alternative to your approach is to use .transform() with .groupby(). It acts effectively like a SQL window function that counts the rows per primary key, but its result is the same length as the original data (rather than being collapsed by aggregation). Something like:
num_dups = df.groupby('key')['key'].transform('size')  # has same index as df
df.loc[num_dups > N]
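The `.groupby()` plus `.transform()` approach from this reply can be sketched on a toy frame as follows. The column name `key` follows the comment; the data and the threshold are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "b", "c", "c", "c"], "value": range(6)}
)

N = 1

# transform("size") broadcasts each group's row count back onto every row,
# so the result is aligned with df's index instead of being one row per group.
num_dups = df.groupby("key")["key"].transform("size")

# Keep rows whose key occurs more than N times.
result = df.loc[num_dups > N]
```

Because `num_dups` shares its index with `df`, the boolean mask works even when `df` does not have a default integer index.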
@rephechaun 2 years ago
This is mind blowing... Thank you very much!
@aoihana1042 2 years ago
This tutorial had so many gems! Thanks Matt
@elidrissii 2 years ago
Here from your HN comment. Super informative.
@NearLWatson 2 years ago
I was looking for how to speed up my pandas operations, since I read that Python itself is faster than R and pandas should be faster than Python. I am happy I came here. Excellent tips that I am going to experiment with, and hopefully I'll achieve quicker run times. Excellent session nevertheless.
@whkoh7619 2 years ago
Thanks Matt, this was an incredible presentation. Came here from the Real Python podcast, just bought the book too!
@mattharrison721 1 year ago
Thanks for your support
@ioannisnikolaospappas6703 2 years ago
Thanks for the wonderful pandas insights, Matt and PyData!
@santchev1326 2 years ago
Really interesting, many thanks to Matt and PyData :)
@hazemmosaad3440 2 years ago
Really interesting and informative talk. Thanks
@FRANKWHITE1996 2 years ago
Thanks for sharing ❤
@bullbranch 2 years ago
Excellent Pandas best practices video. I was already a big user of chaining but for some reason hadn't used append much. This is a cleaner way to do things and I will be using it. My next notebook is going to be much easier to maintain and much easier to build. Thanks Matt!
@mattharrison721 1 year ago
Awesome. Thanks
@antecavlina8897 2 years ago
Just a tip: at 48:30, when commenting out line by line upwards, you could point the mouse at the desired line, press and hold (I think) ALT so the pointer switches to a thin crosshair, drag up or down across the lines, and then type #. It's like doing a block comment. I'm still looking for a way to do that without the mouse, but I'm not sure whether to use something like a vim extension, if there is one...
@samplaying4keeps 2 years ago
Thank you for this! This is super helpful. I learned so much!
@Davidkiania 2 years ago
I really love this session and it’s completely changed the way I process data going forward. Thanks a lot !
@grumpy_techo 2 years ago
Thanks for your 'rant', Matt. I have your recent books and still realised there is something I should be doing with my data. 👌
@mattharrison721 2 years ago
Thanks Tyrone! Good luck with your Pandas. 😉🐼
@dragangolic6515 1 year ago
Great video, I need this data set. Where can I find it?
@jongcheulkim7284 1 year ago
Thank you
@abimaeldominguez4126 2 years ago
I have a problem with aggregations: sometimes, if you aggregate two columns and one column has a cell with a NaN, .groupby will ignore it. I know you can keep those NaNs, but I would like to see a use case where it is a good idea to keep NaNs while using .groupby and one where it is not.
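For context on the behaviour described in this question, here is a minimal sketch of how `.groupby` treats NaN, assuming pandas 1.1 or later (where the `dropna` parameter exists). The frame and its `make` and `mpg` columns are made up for illustration. It shows the two separate effects: rows with a NaN group key are dropped by default, and NaN values inside a group are skipped by the aggregation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "make": ["Ford", "Ford", np.nan, "Audi"],
        "mpg": [20.0, 30.0, 25.0, np.nan],
    }
)

# Default (dropna=True): the row whose key is NaN is silently left out.
default = df.groupby("make")["mpg"].mean()

# dropna=False keeps NaN as its own group, so no rows go missing.
kept = df.groupby("make", dropna=False)["mpg"].mean()
```

One possible rule of thumb, offered as an opinion rather than something from the talk: keep the NaN group when a missing key is itself informative (for example, when totals must reconcile with the row count), and drop it when the missing keys are just noise.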
@tariqaziz1795 2 years ago
Sir, the apply method gave me an error about an unhashable Series. How do I fix that?
@mischaminnee 2 years ago
Awesome!
@pmiron 1 year ago
Can someone identify the font he uses in Jupyterlab ? :D
@JimmieChoi93 6 months ago
'Lato' I guess
@pmiron 6 months ago
@JimmieChoi93 I just tried and I don't think it is Lato.
@JimmieChoi93 6 months ago
@pmiron damn. Here's an idea, screenshot it to ChatGPT and ask
@pmiron 6 months ago
@JimmieChoi93 haha I actually did try with some screenshots. It recognizes that it is a notebook and a monospace font, but then suggests it might be the default JupyterLab font or Consolas, Menlo, etc. Also tried WhatTheFont and FontSquirrel with no luck.
@joecookieee 2 years ago
Ty for the video, Matt, this is awesome. Can you explain how you got those numbers at 57:30 (6_220 / 125)? Thank you!
@walkingintopeople 2 years ago
235.215 is the conversion constant between mpg and l/100 km: l/100 km = 235.215 / mpg. It's a constant the presenter looked up on a search engine ahead of time.
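The 235.215 figure can be reproduced from the exact definitions of the gallon and the mile. A minimal sketch, assuming US gallons (the thread does not say which gallon is meant); the constant and function names below are illustrative, not taken from the talk.

```python
# Exact definitions: 1 US gallon = 3.785411784 L, 1 mile = 1.609344 km.
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344

# l/100 km = (100 km * liters per gallon) / (km per mile * miles per gallon),
# so everything except mpg folds into a single constant.
MPG_TO_L_PER_100KM = 100 * LITERS_PER_US_GALLON / KM_PER_MILE


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert fuel economy in US mpg to consumption in l/100 km."""
    return MPG_TO_L_PER_100KM / mpg
```

Because the relationship is a reciprocal, the same constant converts in both directions: dividing 235.215 by l/100 km gives mpg back.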