Effective Pandas | Matt Harrison | PyData Salt Lake City Meetup

  70,645 views

PyData

A day ago

Comments: 49
@flipside5482 10 months ago
This man is a living data legend. Mass respect.
@MartyAckerman310 2 years ago
I'm not exaggerating when I say this video changed my life. I went from a guy who did everything upstream in SQL and grudgingly used Pandas to a guy who uses Pandas for everything. The approach Matt demonstrates also translates generally to PySpark. I'm now considered the go-to guy for Pandas and PySpark code in my department. There's so much bad code around, often written by people with advanced degrees and MATLAB experience it seems. I could make a full time job out of cleaning up bad code. Dot chain FTW!
@mattharrison721 1 year ago
Thanks! Glad to help.
@amilkyboi 1 year ago
Heh, MATLAB and bad coding practices - the two are never far from one another it seems.
@AkashRana1111 2 years ago
This is gold! Matt did an amazing job showing best practices when using pandas and a lot of intuition about how pandas functions run under the hood.
@DavidDobr 2 years ago
90 minutes of pure gold. Thanks Matt!
@mattharrison721 2 years ago
Thanks David. 👍🙏 Make sure you check out my book, Effective Pandas, if you appreciated this.
@scottlucas232 2 years ago
AGREE COMPLETELY! FANTASTIC PRESENTATION! Learned more here than in the past two years.
@ninhluong5004 2 years ago
This is easily the best pandas guide I have ever watched so far.
@mattharrison721 1 year ago
Thank you!
@gregorywpower 1 year ago
I can’t wait for you to give another talk on polars!
@bendirval3612 2 years ago
This was a ridiculously useful video. I feel like I've watched a lot of python videos, but I think this might be the most practically useful for people who are not brand new to pandas--who use it all the time.
@annagora6409 1 year ago
Matt, big thank you for the chaining idea!
@johannes-euquerofalaralema4374 2 years ago
By far the best pandas video I have ever seen
@nickhodgskin 2 years ago
Really interesting talk. I was doubtful about chaining at first, but you have converted me :) A very, very informative talk, thanks.
@mattharrison721 2 years ago
Thanks for coming around Nick. 😉 Hope you find these techniques useful to you.
@erginceyhan 2 years ago
Great presentation. As others said, pure gold. If there were a button called "pure gold", I would have clicked it. A simple like is not enough. It also changed my view of code organization. Thanks for sharing.
@firefoxmetzger9063 3 years ago
1:18:00 For the specific question being asked (find duplicates in a primary key) there is a much simpler solution than what Matt Harrison suggested: df.duplicated("primary_key", keep=False). Used as a boolean mask, it selects all rows with non-unique values in the "primary_key" column, i.e., all the rows that are duplicated. Matt solves the more general problem of "find all rows for which the element in primary_key occurs at least N times". A more concise (though perhaps less readable) solution to this would be something like: df[df.primary_key.value_counts()[df.primary_key].reset_index().primary_key > N]
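A minimal sketch of the two ideas in the comment above, using a small made-up DataFrame (the data, the `value` column, and the threshold `n` are illustrative assumptions, not from the talk). For the general case it swaps in `Series.map` over the `value_counts()` result rather than the `reset_index()` expression, because mapping keeps the original index; the `reset_index()` form may depend on the frame having a default integer index and on how the pandas version names the resulting columns.

```python
import pandas as pd

# Hypothetical data: keys 1 and 3 are repeated.
df = pd.DataFrame({
    "primary_key": [1, 2, 1, 3, 3, 3],
    "value": list("abcdef"),
})

# duplicated(..., keep=False) marks every row whose key is not unique,
# so the mask selects all members of each duplicate group.
dupes = df[df.duplicated("primary_key", keep=False)]

# General case: rows whose key occurs more than n times.
# map() looks up each row's key in the counts, preserving df's index.
n = 2
counts = df["primary_key"].map(df["primary_key"].value_counts())
frequent = df[counts > n]
```

Here `dupes` holds the rows for keys 1 and 3, and `frequent` holds only the rows for key 3.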
@kernel2006 2 years ago
An alternative to your approach is to use .transform() with .groupby(), to act effectively like a SQL window function that counts the primary keys, but whose result is the same length as the original data (rather than being collapsed due to aggregation). Something like:
num_dups = df.groupby('key')['key'].transform('size')  # has same index as df
df.loc[num_dups > N]
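The reply above can be tried end to end with a short sketch like the following. The DataFrame contents and the value of `N` are assumptions made up for illustration; only the `groupby(...).transform('size')` pattern comes from the comment.

```python
import pandas as pd

# Hypothetical data: "a" appears twice, "b" once, "c" three times.
df = pd.DataFrame({
    "key": ["a", "b", "a", "c", "c", "c"],
    "value": range(6),
})

N = 1  # keep rows whose key occurs more than N times

# transform("size") broadcasts each group's row count back onto every row,
# so num_dups has the same length and index as df (like a SQL window count).
num_dups = df.groupby("key")["key"].transform("size")

result = df.loc[num_dups > N]
```

Because `num_dups` shares the index of `df`, it can be used directly as a boolean mask without any re-alignment step.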
@rephechaun 2 years ago
This is mind blowing... Thank you very much!
@aoihana1042 2 years ago
This tutorial had so many gems! Thanks Matt
@elidrissii 2 years ago
Here from your HN comment. Super informative.
@NearLWatson 2 years ago
I was looking for ways to speed up my pandas operations, since I read that Python itself is faster than R and pandas should be faster than Python. I am happy I came here. Excellent tips that I am going to experiment with, and hopefully I will achieve a quicker output time. Excellent session nevertheless.
@whkoh7619 2 years ago
Thanks Matt, this was an incredible presentation. Came here from the Real Python podcast, just bought the book too!
@mattharrison721 1 year ago
Thanks for your support
@ioannisnikolaospappas6703 2 years ago
Thanks for the wonderful pandas insights, Matt and PyData!
@santchev1326 2 years ago
Really interesting, many thanks to Matt and PyData :)
@hazemmosaad3440 2 years ago
Really interesting and informative talk. Thanks
@FRANKWHITE1996 2 years ago
Thanks for sharing ❤
@bullbranch 2 years ago
Excellent Pandas best practices video. I was already a big user of chaining but for some reason hadn't used append much. This is a cleaner way to do things and I will be using it. My next notebook is going to be much easier to maintain and much easier to build. Thanks Matt!
@mattharrison721 1 year ago
Awesome. Thanks
@antecavlina8897 2 years ago
Just a tip: at 48:30, when commenting line by line upwards, you could point with the mouse at the desired line, then press and hold (I think) ALT; the pointer might switch to a thin-lined cross. Then drag the mouse pointer up or down across the lines and insert #. It's like doing a block comment... Still looking for a way to do that without the mouse, but not sure about using something like a vim extension, if there is one...
@samplaying4keeps 2 years ago
Thank you for this! This is super helpful. I learned so much!
@Davidkiania 2 years ago
I really love this session, and it has completely changed the way I process data going forward. Thanks a lot!
@grumpy_techo 2 years ago
Thanks for your 'rant', Matt. I have your recent books and still realised something that I should be doing with my data. 👌
@mattharrison721 2 years ago
Thanks Tyrone! Good luck with your Pandas. 😉🐼
@dragangolic6515 1 year ago
Great video, I need this data set. Where can I find it?
@jongcheulkim7284 1 year ago
Thank you
@abimaeldominguez4126 2 years ago
I have a problem with aggregations: sometimes, if you aggregate two columns and one column has a cell with a NaN, .groupby will ignore it. I know you can keep those NaNs, but I would like to see a use case where it is a good idea to keep NaNs while using .groupby and one where it is not.
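As a rough illustration of the behaviour described above: by default, `groupby` drops rows whose grouping key is NaN, and the `dropna=False` parameter (available in pandas 1.1 and later) keeps them as their own group. The `region` and `sales` columns below are made-up example data. One plausible rule of thumb, offered as an assumption rather than anything stated in the talk: keep the NaN group when group totals must reconcile with the overall total or when "unknown" is itself a meaningful category, and drop it when rows without a key are invalid for the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data where some rows have no region recorded.
df = pd.DataFrame({
    "region": ["east", "west", np.nan, "east", np.nan],
    "sales": [10, 20, 5, 30, 15],
})

# Default behaviour: rows with a NaN key are silently excluded.
dropped = df.groupby("region")["sales"].sum()

# dropna=False keeps NaN as its own group, so nothing is lost.
kept = df.groupby("region", dropna=False)["sales"].sum()

# Reconciliation check: only the dropna=False result matches the grand total.
total = df["sales"].sum()
```

Comparing `dropped.sum()` and `kept.sum()` against `total` is a quick way to notice that rows went missing during aggregation.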
@tariqaziz1795 2 years ago
Sir, the apply method gave me an error such as "unhashable Series". How do I fix that?
@mischaminnee 2 years ago
Awesome!
@pmiron 1 year ago
Can someone identify the font he uses in JupyterLab? :D
@JimmieChoi93 6 months ago
'Lato' I guess
@pmiron 6 months ago
@@JimmieChoi93 I just tried and I don't think it is Lato.
@JimmieChoi93 6 months ago
@@pmiron damn. Here's an idea, screenshot it to ChatGPT and ask
@pmiron 6 months ago
@@JimmieChoi93 haha I actually did try with some screenshots. It recognizes that it is a notebook and a monospace font, but then suggests it might be the default JupyterLab font or Consolas, Menlo, etc. Also tried WhatTheFont and FontSquirrel with no luck.
@joecookieee 2 years ago
ty for the video Matt, this is awesome! Can you explain how you got those numbers at 57:30 (6_220 / 125)? Thank you!
@walkingintopeople 2 years ago
235.215 is the conversion constant between mpg and l/100km (l/100km = 235.215 / mpg). It's a constant the presenter looked up on a search engine ahead of time.
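For readers wondering where 235.215 comes from, here is a small sketch that derives it from the standard definitions of the US gallon and the mile and applies it to a pandas Series. The function name `mpg_to_l_per_100km` and the sample values are hypothetical, chosen only for illustration; they are not taken from the talk's notebook.

```python
import pandas as pd

LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344

# (L/gal) / (km/mi) * 100 gives the constant, roughly 235.215.
MPG_TO_L_PER_100KM = 100 * LITERS_PER_US_GALLON / KM_PER_MILE


def mpg_to_l_per_100km(mpg: pd.Series) -> pd.Series:
    """Convert US miles per gallon to liters per 100 km (vectorized)."""
    # The relationship is inverse: higher mpg means fewer liters per 100 km.
    return MPG_TO_L_PER_100KM / mpg


# Hypothetical fuel-economy values in mpg.
mpg = pd.Series([20.0, 30.0, 40.0], name="mpg")
liters = mpg_to_l_per_100km(mpg)
```

Because the conversion is a division rather than a multiplication, applying it twice returns the original values, which makes for an easy sanity check.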