25 Nooby Pandas Coding Mistakes You Should NEVER make.

Views: 281,428

Rob Mulla

Comments: 513
@viewsfromthechris7810 2 years ago
I need to implement the chaining methods and using functions into what I do, much easier to use and read. Great video as always.
@robmulla 2 years ago
Totally. Just those two things alone are huge! Glad you enjoyed the video.
@akmalmir8531 2 years ago
00:18 #1. Writing to CSV with an unnecessary index
00:53 #2. Using column names that include spaces
01:25 #3. Filter datasets like a PRO with the query method
01:44 #4. Use the @ symbol in query strings to easily reach variables
02:07 #5. "inplace" could be removed in future versions; better to explicitly overwrite modifications
02:35 #6. Vectorization instead of iteration
03:01 #7. Vectorized methods are preferable to the apply method
03:30 #8. df.copy() method
04:08 #9. Chaining methods is better than creating many intermediate dataframes
04:28 #10. Properly set column dtypes
05:01 #11. Use Booleans instead of strings
05:25 #12. pandas plot method instead of importing matplotlib
05:45 #13. pandas str.upper() instead of apply, etc.
06:10 #14. Use a data pipeline once instead of repeating it many times
06:41 #15. Learn the proper way of renaming columns
06:59 #16. Learn the proper way of grouping values
07:31 #17. The proper way of complex grouping of values
08:01 #18. Percent change or difference can now be implemented with a built-in method
08:25 #19. Save time and space with large datasets using pickle, parquet, feather formats
08:58 #20. Conditional formatting in pandas (like in Microsoft Excel)
09:22 #21. Use suffixes when merging two dataframes
09:48 #22. Check that a merge succeeded with validation
10:13 #23. Wrap expressions so they are readable
10:33 #24. Categorical datatypes use less space
10:55 #25. Duplicated columns after concatenating, code snippet
@robmulla 2 years ago
Thanks for making this!
@akmalmir8531 2 years ago
@robmulla I wish I had commented better, as English is not my native language. Thank you for bringing us valuable tutorials that save us time and energy! I wish I helped and learned from you more.
@kongson14 2 years ago
egg bro
@PGhai 2 years ago
thanks, I like no 4
@bonumonu5534 2 years ago
This needs to be pinned
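For anyone following the list above, here is a minimal sketch of tips #6, #7 and #13 (vectorized operations and the `.str` accessor instead of `apply`). The DataFrame and its column names are made up for illustration and are not taken from the video.

```python
import pandas as pd

# Hypothetical data; "name" and "score" are illustrative column names.
df = pd.DataFrame({"name": ["alice", "bob", "carol"], "score": [1.0, 2.0, 3.0]})

# Row-by-row version: calls a Python lambda once per value.
upper_apply = df["name"].apply(lambda s: s.upper())

# Vectorized string accessor: same result without the per-row lambda.
upper_str = df["name"].str.upper()

# Vectorized arithmetic on the whole column instead of looping over rows.
df["score_squared"] = df["score"] ** 2
```

Both versions give the same values here; the difference is mostly readability and, on larger frames, speed.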
@vishnurj6207 1 year ago
Please keep doing this. No additional jargon, crisp, straight to the point explanations are what are required. No body needs a 10 hour tutorial. Thank you for this.
@robmulla 1 year ago
I'll try my best! I do like trying to cram a ton of information into a short format, but these videos take a while to create. I totally copied the format from mmcoding (check out the channel if you haven't already)
@jfjfjfjf-v3k 2 years ago
Matt Harrison's "Effective Pandas: Patterns for Data Manipulation" is one of the best resources I've read on idiomatic pandas.
@robmulla 2 years ago
I really need to get myself a copy! He knows his stuff for sure.
@MrEo89 2 years ago
He has a great video (series?) on effective pandas also!
@lord_voldemort44 9 months ago
ty i will look into this book
@jelmermulder7276 2 years ago
I thought I was pretty good in Pandas, but you gave me so many new things to improve. HUGE thank you!
@robmulla 2 years ago
Glad I could help! I'm constantly learning better ways to do things in pandas myself.
@olyaagapova989 2 years ago
I was thinking that I was pretty bad, but surprisingly I usually only make 2 mistakes from the video (which is a cool chance to improve). I just love such videos because not only they help to improve your skills, but also to be realistic about your expectations and ambitions. Thanks for the video, Rob!
@DeadLine171 2 years ago
I have been working 2 years now with pandas and I can strongly affirm that I have made like 70% of those bad practices, appreciate a lot your video!
@robmulla 2 years ago
Thanks for commenting. Honestly I still make many of them to this day.
@kaymaqsood8920 1 year ago
Rob, thank you for all the time and energy you have put in for us. Would appreciate an updated video on "Exploratory Data Analysis" may be expanding on your year old one. Thank you again!
@realninja357 1 year ago
One of the best videos I've seen on Pandas! So glad someone prominent enough is advocating for method chaining and pandas methods!
@realninja357 1 year ago
The 'Query' method in particular is relatively unknown. In conjunction with not using 'snake case' this leads to beginners being very inefficient at code due to not being able to use dot syntax. I am just at an intermediate level so I can relate to many of these mistakes. It goes as deep as university however. They do not teach clean, efficient code at all!
@robmulla 1 year ago
Glad you enjoyed it! I confess I don't use chaining nearly as much as I should.
@ladiesperfume 2 years ago
Wow dude! You are single handedly responsible for my data science growth. PLEASE keep making more of these videos I really appreciate it.
@robmulla 2 years ago
Wow! I love hearing feedback like this. I'll keep making videos if you all keep watching! :D
@emily2e2e 1 year ago
This is awesome, I’ve been wanting to know what are the better ways to write my code and why. Please continue to make these videos.
@robmulla 1 year ago
Wow! Thanks so much Emily. Really appreciate the feedback and super thanks!
@magisterumbrae 2 years ago
This may be one of my first times commenting on YouTube after years of usage. This video was INCREDIBLY USEFUL! There's a lot my previous team members did in scripts that is sometimes complicated to maintain, or to extend with new ones following the same logic. This covers exactly what they used and what the best option is to rewrite it and make it more understandable. Thank you so much for this godly information.
@robmulla 2 years ago
You're very welcome! I really appreciate the positive feedback. I’ll try to keep making helpful videos like this. Share with your friends in the meantime!
@NERGYStudios 1 year ago
Learned more about Pandas in this video than a whole many videos worth hours combined. Seriously, thank you.
@leaky3955 5 months ago
I had no experience with Pandas before joining a team where I need to work with it a lot. Have been learning as I go and it feels like the perfect time to see this video. I have enough time under my belt to have made or inherited code with many of these mistakes. With that context, I absorbed so much from what you shared. Thank you for helping me improve. I’m excited to refactor and apply what I learned!
@JustCrateIt 1 year ago
I can't believe how good this video is. I love your no-nonsense delivery; I don't have time at work to watch a 4-hour "intro" video. Keep it up!
@digitsphinx 1 year ago
oh wow the quality and clarity is worth subscribing! thank you !
@singsinghai1505 2 years ago
The pandas query function does not outperform the loc method. In fact, it is sometimes much slower when your query/data is so big. We industry users will utilize the loc method for quick EDA. Query might be useful when you have a scheduled cron
@robmulla 2 years ago
Yea. Query isn’t for speed of processing but speed of writing the code.
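To make the trade-off in this exchange concrete, here is a small sketch showing that `.loc` with a boolean mask and `.query` select the same rows; only the way the filter is written differs. The data is made up for illustration, and no timing claim is made here.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"year": [2019, 2020, 2021, 2022], "sales": [10, 25, 5, 40]})

# Boolean-mask filtering with .loc: explicit, repeats "df" for every condition.
via_loc = df.loc[(df["year"] > 2019) & (df["sales"] > 20)]

# The same filter as a query string: shorter to type and read.
via_query = df.query("year > 2019 and sales > 20")
```

Which one runs faster depends on the data size and the expression, so it is worth timing on your own data before picking one for a hot path.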
@alexanderreznik1700 2 years ago
I have used the Pandas lib for more than 2 years, but today I learned something new! Thank you, man!
@robmulla 2 years ago
Glad you learned something new! Share with anyone else you think might appreciate it!
@philwebb59 2 years ago
1:28 Before I discovered your videos, I'd never considered using the query method. The examples I've previously seen online made it look like a me-too add-on for seasoned SQL users. Using conditionals to mask off rows seemed just as easy and more pythonic. Also, at work, I typically filter with a script when I pull down the data, so by the time I get the data into pandas, I just need to tweak. But, you've shown me the light. Thanks!
@robmulla 2 years ago
I totally understand where you are coming from. It's important to keep in mind query can be slower, but for quick filtering it can be a really quick and clean way to filter data. It really depends on what I'm doing. Glad I showed you something new though!
@TravisGore-ep4yk 1 year ago
I can't believe I watched this whole video and only 2 of them were things I didn't know about! Thank you for sharing!
@omagodourado 2 years ago
This video made me realize i have still a long road ahead in Pandas. Thanks! Just subscribed ;D
@robmulla 2 years ago
Thanks for the sub! We all start somewhere, but you'll pick it up quickly in no time.
@FrocketGaming 2 years ago
This video rocked me. I've been using python for a few months and watching this video made me bust out my laptop so I could try all of these items out. Thank you for this.
@robmulla 2 years ago
So glad you found it helpful. Share with a friend!
@julsmanbr8152 1 year ago
Awesome stuff. I've been using pandas for over 4 years, but it never occurred to me to start using the query method instead of loc (despite me finding it tiresome to keep repeating "df" all over the place when using loc). I also appreciate the quick format. You see YouTubers taking too long to say nothing at all, so congrats on actually going through 25 tips in 10 minutes. You got yourself a sub!
@popnitro 6 months ago
I've had little to no formal training. These tips are amazing and concise. Thank you so much.
@DataCraftsman 2 years ago
I feel personally attacked. Thanks so much for releasing this. I knew my code was bad, but not THIS bad.
@robmulla 2 years ago
Haha. With coding we all are learning and getting better every day. Me included. Thanks for watching!
@alberttu8120 2 years ago
These are fantastic refactoring suggestions.
@ryantakers 2 years ago
I'm currently working on my first major pandas project and I reckon that I may have done around 15/25 of these 'mistakes'. Looks like I have some optimisation to do over the coming days!
@robmulla 2 years ago
We all have to start somewhere. I didn't learn many of these until I had been using pandas for years.
@SuperOMERH 11 months ago
This video is amazing, I am using pandas for a long time now and still learned so many new good practices thank you
@KenJee_ds 2 years ago
I didn't know about suffixes. Amazing!
@robmulla 2 years ago
Thanks Ken, glad you were able to learn something new! Love your videos.
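For readers who also had not seen `suffixes`, a minimal sketch of tips #21 and #22 from the video: naming overlapping columns on merge and validating the join. The frames and the `_2021`/`_2022` suffixes are assumptions chosen for illustration.

```python
import pandas as pd

# Two hypothetical tables that share a non-key column name ("score").
left = pd.DataFrame({"id": [1, 2, 3], "score": [10, 20, 30]})
right = pd.DataFrame({"id": [1, 2, 3], "score": [11, 21, 31]})

# suffixes replaces the default "_x"/"_y"; validate checks the key relationship.
merged = left.merge(right, on="id", suffixes=("_2021", "_2022"), validate="one_to_one")

# If the right-hand keys are not unique, validate raises instead of silently
# duplicating rows.
dupes = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    left.merge(dupes, on="id", validate="one_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True
```

The `validate` argument also accepts `"one_to_many"`, `"many_to_one"` and `"many_to_many"`, depending on what the join is supposed to look like.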
@thaynangamarano3340 1 year ago
I started to watch your videos recently, and from now on I'm doing the chaining and putting each function in "one row" to make the data cleaner, and also, the query method, so powerful and simple, I was used to replicate the dataframe with the column and value searched to filter my df. You are boosting my studies! Thanks for that!
@JakeStetter-wo6jr 10 months ago
Really enjoyed how fast this content came. I felt like it was a great speed to keep me engaged. I usually find these types of videos boring.
@dedoseis 1 year ago
Dear Rob, I'm a total beginner in Python and Pandas. From what I understand, the warning at 3:30 is not about making a copy of sliced data, but rather about not using the .loc method and using "direct assignment" for columns (or whatever it's called). I could be wrong, but this is what I've gathered from reading the documentation and encountering a similar warning in my code. Thanks for your valuable content. It has been a great help
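Both readings come up in practice, so here is a hedged sketch of the two usual ways to avoid the warning discussed above: take an explicit `.copy()` when you want an independent subset, or assign through `.loc` in one step when you want to change the original. The data and column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "b", "a"], "points": [3, 5, 7]})

# Option 1: an explicit copy. The subset is independent, so adding a column
# to it cannot be confused with modifying df.
team_a = df.loc[df["team"] == "a"].copy()
team_a["double"] = team_a["points"] * 2

# Option 2: modify the original directly, selecting rows and column in a
# single .loc call instead of chaining two indexing steps.
df.loc[df["team"] == "a", "points"] = 0
```

Note that `team_a` keeps the original points values because it was copied before the `.loc` assignment ran.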
@checher100 1 year ago
Awesome video! I work with Pandas for +3 years and learned a lot here! Thanks
@robmulla 1 year ago
Happy to hear it. Tell your friends!
@TimoTalksTech 2 years ago
found your channels few days ago and man you have some epic content . The noob mistakes here are the exact way most tutorials teach you..just wondering why the hell the non noob ways are not taught as they are easier and shorter and the syntax makes more sense... thank you for this video
@robmulla 2 years ago
Glad you like them! I’m trying to continue to make more stuff like this so keep watching!
@joomzb 2 years ago
Thanks, great tips! I've been using pandas for years, and I've only recently started using some of these (particularly query, and didn't know about the @ operator)
@robmulla 2 years ago
Glad it was helpful! The @ operator is really useful. You can also do stuff like min() or apply operations between columns within the query.
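A short sketch of what this reply describes: referencing a local variable with `@`, comparing two columns, and calling a method on a column inside the query string. The frame and the variable name `min_value` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"low": [1, 4, 6], "high": [5, 3, 9], "value": [2, 8, 7]})

min_value = 5  # ordinary Python variable, referenced in the query with @

# Local variable inside the query string.
above_min = df.query("value > @min_value")

# Comparison between two columns.
ordered = df.query("low < high")

# Method call on a column inside the expression.
above_mean = df.query("value > value.mean()")
```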
@hwlee03a 1 year ago
Oh god. I clicked on this video just to confirm that this is one more overly exaggerated self-confident dude trying to teach newbies of 2 weeks experience. After watching this, this is god damn life changing. As an engineer focusing on fluid dynamics and floater response, I use pandas daily basis. Out of 25, I didn’t know approximately 20. Every single person who has any plan to use pandas must watch this. Awesome!
@YassFuentes 2 years ago
Hey, Rob! Super video this one. I myself am Sr. DS working each day intensively with pandas, I will implement many of the tips you show! Thanks a million :)
@robmulla 2 years ago
Awesome to hear! I'm still learning new tricks with pandas every day.
@krmunoz2169 2 years ago
Dude I've worked with pandas for 7 years and learned some new tricks, thanks a lot!
@robmulla 2 years ago
Great to hear! You've been working with it longer than I have. Please share my channel with any friends you think might also learn from it.
@mschuer100 2 years ago
Rob, as always, fantastic video. I have to admit, i get caught on some of those mistakes so it is great to have you point out and make suggestions on how to correct them. Thanks for sharing. Much appreciated.
@robmulla 2 years ago
I fall into these a lot too! We can all get better, glad you found the video helpful.
@scottbrewer474 2 years ago
Found lots of favorite annoyances and learned a few new tricks! I'll add a shout-out to the ".pipe()" method to allow for wrapping all your transforms in a single statement when a single .method can't cover the required transform. An added bonus of "pipe()" - since it's using user defined functions to do the transforms, you can add decorators to automatically print out metadata on the resulting transform steps to get a quick insight into potential bugs.
@robmulla 2 years ago
Oh. Great one. I forgot to add pipe and assign in this video but wish I did.
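The `.pipe()` plus decorator idea from this thread could look roughly like the sketch below. The decorator name `log_shape` and the two transform functions are hypothetical helpers written for this example, not part of pandas.

```python
import functools

import pandas as pd


def log_shape(func):
    """Hypothetical decorator: print the frame's shape before and after a step."""
    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        print(f"{func.__name__}: {df.shape} -> {result.shape}")
        return result
    return wrapper


@log_shape
def drop_missing(df):
    # Returns a new frame without rows containing NaN.
    return df.dropna()


@log_shape
def add_total(df, cols):
    # Returns a new frame with a row-wise sum of the given columns.
    return df.assign(total=df[cols].sum(axis=1))


raw = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 5.0, 6.0]})

# Each step is a plain function, so the chain reads top to bottom.
clean = raw.pipe(drop_missing).pipe(add_total, cols=["a", "b"])
```

Because every step returns a new frame, `raw` is left untouched and the printed shapes make it easy to spot the step where rows went missing.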
@shivangagarwal8332 2 years ago
Excellent points! Learned new stuff that a lot of tutorials don't explicitly teach.
@robmulla 2 years ago
Glad it was helpful! Thanks for watching and please share with others.
@efi3825 4 months ago
Oh man, I am making so many of these mistakes. Honestly, this is a great checklist to improve my clean coding.
@stellathron1079 1 year ago
Thank you for creating such an amazing video on pandas. It has even been really helpful for me as a pandas newbie. Learnt a lot! 🎉
@robmulla 1 year ago
Love it!
@bendirval3612 2 years ago
Oi! There were several of those I didn't know. I wouldn't have thought I was a noob, but I guess we all have a bit of that in us. Thanks for the video!
@robmulla 2 years ago
Glad you learned something new. I find I’m always learning something new with python and data science. That’s why I love it so much.
@ClearVista 2 years ago
Learned tons with this. Short and succinct. New subscriber.
@robmulla 2 years ago
Thanks for subscribing!
@simonebenzi4189 2 months ago
This is more useful than many Python courses as a whole. Thanks!!
@flusyrom 2 years ago
Wow, very useful - a true "tour de force" for better Pandas code. THX for this !
@robmulla 2 years ago
Glad it was helpful! Please consider sharing it with anyone else you think would benefit from watching.
@Loys2020 2 years ago
I'm new to Pandas and all tips from this video are gold for me, thank you a lot!
@robmulla 2 years ago
Glad you learned something new. Welcome to the world of pandas!
@garyfritz4709 2 years ago
+1000. I’m brand new to Pandas and still trying to grok the idiom. This video is GOLD.
@spaceyfounder5040 2 years ago
Oh man, that guide is pro! Thanks, gonna apply all of that when refactoring my project!
@robmulla 2 years ago
Glad it helped! Tell a friend!
@narudh 2 years ago
some great tips here. i usually chain with \ and i didn't know a query method exists!! guess you learn everything new all the time!
@robmulla 2 years ago
Glad you learned something new! Cheers.
@CodeEmporium 2 years ago
Nice video! I have been using pandas for years and still run into these issues :)
@robmulla 2 years ago
Thanks! Glad you enjoyed the video. I really enjoy your videos too.
@artemissrijan473 6 months ago
This video is too damn good, I would love to find more videos like this.
@reneulloa2647 1 year ago
Rob, amazing video and intuitive. Happy to subscribe!
@piotrkulinski922 9 months ago
OMG! I had to rest after first 10. So huge dose of information. Thanks.
@protohale 2 years ago
I'm so guilty of number 8! Thank you for this!
@robmulla 2 years ago
I’ve made every one of these mistakes at some point so I know how you feel. Thanks for watching!
@nikhhiilreddi1371 2 years ago
Extremely underrated channel Extremely helpful
@robmulla 2 years ago
Thanks Nikhhilil!
@gregglind 2 years ago
Releasing a notebook showing all these tips would be a great benefit to the community. The `.style` trick at 9:18 is amazing.
@robmulla 2 years ago
If this video gets 100k views I’ll share the notebook cringe 😬!
@peterappel9154 1 year ago
@robmulla It currently has 241k views 😉
@haskellbear 10 months ago
I'm an experienced developer looking to get familiar with Pandas. I found this video very valuable.
@avirajankitjain256 2 years ago
Dude, Amazing video apparently clear the concept.
@robmulla 2 years ago
Glad you think so! Share with your friends!
@richjaxxon 2 years ago
Great video. Very helpful. Please keep making more like this
@robmulla 2 years ago
Appreciate that. I plan to!
@我想想-e5d 2 years ago
The part about avoiding spaces is so true! But wait a second, every time I run into spaces instead of underscores it's in other people's data, so I think what we actually need is a way to deal with the spaces (which is a painful journey).
@Saareem 2 years ago
Maybe rename all the columns with versions without a space. Like, you replace all the spaces with an underscore. df.rename can take dictionaries or even a mapper function so this is easy to do. Using a dictionary is preferable as you can just reverse map it, if you want to use the columns with spaces in them in the end.
@robmulla 2 years ago
Good point. In most cases it can be done with a list comprehension one-liner!
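The approach suggested in this thread can be sketched as follows: a list comprehension to normalize the names, plus a dictionary so the original names can be restored at the end. The column names are invented for illustration.

```python
import pandas as pd

# Hypothetical frame that arrived with spaces in its column names.
df = pd.DataFrame({"First Name": ["Ann"], "Total Sales": [100]})

original_columns = list(df.columns)

# One-liner: lowercase and replace spaces with underscores.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Reverse mapping, in case the original names are needed for the final output.
restore = dict(zip(df.columns, original_columns))
restored = df.rename(columns=restore)

# The same cleanup with a mapper function passed to rename.
cleaned_again = restored.rename(columns=lambda c: c.lower().replace(" ", "_"))
```

After the cleanup, dot access such as `df.total_sales` and plain names inside `query` strings work without backticks.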
@gabrielcosta4513 2 years ago
Great video! I also like the jazz bass behind you, I also play bass :)
@robmulla 2 years ago
Awesome! I’m more of a guitar player but I also enjoy playing bass.
@yordanadaskalova 1 year ago
Great overview. I also found that ChatGPT is most useful in explaining existing code rather than writing it. Same with writing.
@robmulla 1 year ago
Yes, but chatGPT can also be very confident when it gives you bad code or code that doesn't work so don't trust it blindly.
@obrien8228 1 year ago
@robmulla Chat GPT so arrogant lol
@anpham7108 2 years ago
Very useful video, thank you for making this !
@robmulla 2 years ago
Glad it was helpful! Share it with anyone you think might also benefit.
@GeeNee25 2 years ago
Super useful! Thanks a lot, mate!
@robmulla 2 years ago
Thanks for watching. Please share with someone you think might also like it.
@werneckpaiva 2 years ago
Very useful! Thank you for sharing in such an easy and agile way.
@robmulla 2 years ago
Hey! Glad you learned something. Appreciate the feedback!
@shreyaroraa2234 2 years ago
Great video for new users not knowing tips and tricks.. Wish you shared the code also to keep it handy for reference
@robmulla 2 years ago
Thanks for watching. I don’t think I kept the code unfortunately
@deepakramani05 2 years ago
Another awesome, useful video, Rob. Thank you.
@robmulla 2 years ago
Thanks for watching Deepak!
@matthewcaron6446 1 year ago
Great video. Lots of operations and procedures that are helpful for effective coding. Would be really helpful to have a cheat sheet linked for easy reference.
@siqueirapaty 2 years ago
Great video. Thank you for being so direct and giving us valuable tips ☺
@robmulla 1 year ago
Glad you liked it! Thanks for giving feedback. Share the video with anyone else you think might also like it.
@adrianmuresan7764 2 years ago
Thank you! The .diff method is a lifesaver when computing velocities. The advice on not using inplace is excellent i got into various troubles because of it but i thought that's what the "experienced guys" do.
@robmulla 2 years ago
Thanks for watching. inplace is very tricky. Diff method is really powerful, and there are parameters you can use within it depending on your use case.
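As a rough sketch of the velocity use case mentioned above: `diff` on position divided by `diff` on time gives a finite-difference velocity, and `periods` is one of the parameters Rob alludes to. The column names `t` and `x` and the numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical track: time in seconds, position in metres, uneven time steps.
track = pd.DataFrame({"t": [0.0, 1.0, 2.0, 4.0], "x": [1.0, 3.0, 7.0, 11.0]})

# Finite-difference velocity: change in position over change in time.
# The first row has no predecessor, so it is NaN.
track["velocity"] = track["x"].diff() / track["t"].diff()

# periods controls how many rows back the difference looks.
track["x_change_2_steps"] = track["x"].diff(periods=2)

# Relative change from the previous row.
track["x_pct_change"] = track["x"].pct_change()
```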
@SamusUy 2 years ago
Regarding the 'inplace' comment at 02:07, there's a very valid and very useful reason to prefer it, and that's memory usage. `df = df.reset_index()` or anything similar creates an entire copy of the dataset before replacing the original with it, and for extremely big data that is a problem: it may exceed the physical memory available and have the OS kill the script.
@robmulla 2 years ago
Interesting. I’ve heard this but then also thought it was debunked. I think the fact that the pandas core developers want to remove inplace gives good reason to try and avoid using it.
@SamusUy 2 years ago
@robmulla I guess it's more "functional style" to do it like they want but I recently had this problem with the memory when creating copies and I solved it by using 'inplace' (Python 3.7 and Pandas 1.3.5 if it matters)
@robmulla 2 years ago
@SamusUy good to know!
@justtohost3431 1 year ago
I've got to admit that I regularly make 65% of these newbie "mistakes". That's why I am especially thankful for your tips on how to optimize my coding structure! Thanks a lot for your input!
@robmulla 1 year ago
Glad it was helpful!🙌
@Singularitarian 1 year ago
Very illuminating video! I learned a lot quickly.
@robmulla 1 year ago
Thanks for the feedback Daniel!
@julians.2597 2 years ago
4:25 I'll add a pet peeve of mine: using chaining, but not placing each method call on a new line. One of the greatest benefits of method chaining is easier tracking of changes, since everything is moving linearly: rightwards, or downwards, with line breaks and parentheses. Having multiple method calls on some lines but not others breaks this one-directional thought process and makes it much harder to skim code.
@Mats-Hansen 2 years ago
He goes through this at 10:23.
@julians.2597 2 years ago
@Mats-Hansen and does it himself earlier 😉. This is just a cheeky comment
@robmulla 2 years ago
Haha. Thanks for putting me in my place. I was leading up to the later point? At least I can pretend that’s my excuse 😝
@julians.2597 2 years ago
@robmulla happens. Actually, I got pissed about this the other day when I tried "Black Formatter", because that only puts methods on new lines, not dot-notation attributes. E.g., calling df.T or df.columns would not result in a new line. Utterly annoying for my little OCD brain.
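The layout this thread argues for, one method call per line inside a single pair of parentheses and no backslashes, might look like the sketch below. The data and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical temperature readings.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome"],
        "year": [2020, 2021, 2020, 2021],
        "temp": [5.0, 6.0, 15.0, 17.0],
    }
)

# The outer parentheses let every step sit on its own line, so a diff or a
# quick skim shows exactly which step changed.
summary = (
    df
    .query("year >= 2021")
    .groupby("city", as_index=False)
    .agg(mean_temp=("temp", "mean"))
    .sort_values("mean_temp")
)
```

Commenting out a single line is enough to inspect the intermediate result at that point in the chain.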
@willykitheka7618 2 years ago
Hey Rob! You got me on that one right off the bat! I write a file to csv and when I load it back in, I get an 'unnamed' column and I wonder why....then I have to drop the column. 🤐Unnecessary work! Thanks a heap!
@robmulla 2 years ago
That's good to hear that you learned something new only a few seconds into the video :D - if you enjoyed it please share it on social or with any friends who might learn from it.
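The 'unnamed' column described above is easy to reproduce. This sketch writes to an in-memory buffer (an assumption made so the example needs no files on disk) and shows the round trip with and without `index=False`.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default to_csv writes the index as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_extra = pd.read_csv(with_index)

# index=False leaves it out, so the round trip returns the same columns.
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
reloaded_clean = pd.read_csv(no_index)
```

If a file was already written with the index, passing `index_col=0` to `pd.read_csv` is the other way to avoid the extra column.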
@karlduckett 1 year ago
This was great! Just what I needed :)
@sayantanghosh6619 1 year ago
I loved this to s be to my students. You did a great job in a short video!
@robmulla 1 year ago
Thank you so much! It's hard to make it short but is worth it in the end.
@nickolastradess 2 years ago
Have not, and will not make any of these mistakes because I’ve seen your “A Gentle Introduction to Pandas Guide” !!
@PMU004 2 years ago
Trueeeeee
@robmulla 2 years ago
Love it! Thanks nick.
@garyfritz4709 2 years ago
Where, please? I found your twitter feed, and lots of “gentle introductions” from other people, but not yours.
@robmulla 2 years ago
@garyfritz4709 here is the link kzbin.info/www/bejne/lXbFYaiqfreXodk
@garyfritz4709 2 years ago
@robmulla Aha. I was googling out on the web, and it didn't find THAT video in YT. Merci!
@trashantrathore4995 2 years ago
Great insights, thanks for these important tips
@robmulla 2 years ago
Glad you found them helpful. Share it somewhere on social you think people might learn from!
@gauravmalik3911 2 years ago
Great video as always. I will start exploring the query method more. Rob, can you please make a video on feature engineering, especially how to create new features using aggregation etc.? Thank you
@robmulla 2 years ago
Glad you enjoyed the video. Feature engineering would be a good topic for a future video. I'll add it to the list!
@santiagoperman3804 2 years ago
I do several of these and never imagined Pandas has styling. Time to rewrite and share with my peers.
@robmulla 2 years ago
My mind was blown when I found out about the styling and I use it a lot now. Please do share with others who you think might find this helpful.
@Lewstars 2 years ago
I really think this should be written up in a medium blog article. Would be awesome to refer to.
@robmulla 2 years ago
That’s a good idea. I really want to make blogs for all my videos but I don’t have the time. Maybe someday
@RomanShchurko 2 years ago
Great video! Wanted to add on #7, maybe someone will find it helpful: in case you need to apply some function to several values in a row, one of the fastest solutions is numpy.vectorize. Something like:
def divide(num, denom):
    if denom == 0:
        return 0
    else:
        return num / denom
So instead of doing
df["div"] = df.apply(lambda row: divide(row["value1"], row["value2"]), axis=1)
you go with
df["div"] = np.vectorize(divide)(df["value1"], df["value2"])
@robmulla 2 years ago
Great tip! np.vectorize can be really handy. I think your example could be vectorized without having to use it though.
@RomanShchurko 2 years ago
@robmulla yeah) just couldn't come up with anything else))
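To make the three options in this exchange comparable, here is a sketch that runs the row-wise `apply`, the `np.vectorize` wrapper, and a fully vectorized version using `np.where` side by side. The data is made up, the zero case returns `0.0` rather than `0`, and `otypes=[float]` is passed so the output dtype does not depend on the first value the function happens to return.

```python
import numpy as np
import pandas as pd


def divide(num, denom):
    # Scalar helper from the comment above, returning a float in both branches.
    if denom == 0:
        return 0.0
    return num / denom


df = pd.DataFrame({"value1": [10.0, 4.0, 3.0], "value2": [2.0, 0.0, 6.0]})

# 1) Row-wise apply: simplest to write, slowest on large frames.
df["div_apply"] = df.apply(lambda row: divide(row["value1"], row["value2"]), axis=1)

# 2) np.vectorize: still calls the Python function once per element, but
#    avoids building a Series object for every row.
df["div_np_vectorize"] = np.vectorize(divide, otypes=[float])(df["value1"], df["value2"])

# 3) Fully vectorized: divide whole columns, then replace the zero-denominator
#    rows (which come out as inf) with 0.0.
df["div_where"] = np.where(df["value2"] == 0, 0.0, df["value1"] / df["value2"])
```

All three columns hold the same values; which one is fast enough depends on the size of the frame.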
@julians.2597 2 years ago
9:13 another pet peeve, though this one is more important than the last one. Do not use backslashes. ever. well, not _never_, use them when writing a `with` statement with more than two context managers. But otherwise, don't. I'll quote the `Black` (the formatter) documentation: Backslashes and multiline strings are one of the two places in the Python grammar that break significant indentation. You never need backslashes, they are used to force the grammar to accept breaks that would otherwise be parse errors. That makes them confusing to look at and brittle to modify. This is why Black always gets rid of them.
@robmulla 2 years ago
Good point. Backslashes are an old habit I've been trying to stop using. We all are learning constantly!
@fizipcfx 2 years ago
This video is literally a gem
@robmulla 2 years ago
Glad you liked it Fizip. Hopefully you learned a thing or two you that will help you write better code!
@fizipcfx 2 years ago
@robmulla Thank you for your reply. I am thankful for your content.
@rafaelcaballeroroldan9582 2 years ago
Thanks for the video!! A small comment about number nine, creating multiple intermediate dataframes. I understand that this can be costly in terms of memory, but I also think it can be nice for debugging and understanding during the development phase. Moreover, using the same name 'df' over and over can be prone to errors if you have different operations in different cells and you are 'playing', skipping some of them to see the effect, because you don't know which 'df' you are actually taking as input.
@robmulla 2 years ago
Good point! It really depends on what you're doing and the time it takes to develop sometimes is more important than the code itself. However, once you are done debugging then changing it to using chaining methods is typically preferred.
1 year ago
At 6:23 (#14) you're returning the dataframe, but you're also modifying it in place. Having a return there gives the impression that the original dataframe isn't modified, specially if you also assign it to itself later. It ties back to #5.
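The concern raised above can be shown in a few lines. Both functions below are hypothetical stand-ins (not the code from the video): one mutates the frame it receives and returns it, the other builds a new frame with `assign` and leaves the input alone.

```python
import pandas as pd


def add_ratio_mutating(df):
    # Adds the column to the caller's object AND returns it, which can give
    # the impression that the input was left untouched.
    df["ratio"] = df["a"] / df["b"]
    return df


def add_ratio_pure(df):
    # assign returns a new DataFrame; the input is not modified.
    return df.assign(ratio=df["a"] / df["b"])


original = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 4.0]})

result = add_ratio_pure(original)       # original keeps its two columns

scratch = original.copy()
returned = add_ratio_mutating(scratch)  # scratch itself gains the column
```

With the non-mutating version, `df = add_ratio_pure(df)` behaves the way the explicit reassignment in tip #5 suggests.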
@aspeno5613 1 year ago
Thanks Rob! I just made my first Kaggle notebook and I think I made all 25 of these mistakes 😂
@myself4024
@myself4024 Жыл бұрын
*Introduction:* This video summarizes 25 common mistakes made by beginners learning pandas in Python.

*Data Cleaning and Manipulation:*
*Section 1 (00:00):* Avoid unnecessary elements in CSV files by excluding the index or setting an index column when reading.
*Section 2 (00:52):* Use clear and consistent column names. Replace spaces with underscores for readability and dot syntax access.
*Section 9 (05:01):* Represent True/False conditions with boolean values for clarity and efficiency. Avoid using text strings ("yes", "no").
*Section 11 (06:29):* Employ `fillna` for flexible missing value imputation (e.g., filling with mean, specific value).

*Efficient Data Transformations and Calculations:*
*Section 4 (01:50):* Leverage `@` symbol for variables in queries for cleaner syntax.
*Section 5 (03:15):* Prioritize vectorized functions over `.apply` for efficient calculations. Use `.apply` judiciously when vectorization is not feasible.
*Section 6 (03:44):* Avoid unnecessary intermediate DataFrames. Chain transformations instead to modify the same DataFrame for cleaner code and memory efficiency.
*Section 10 (05:31):* Utilize built-in methods:
* `df.plot` for quick data visualizations.
* `.str` for efficient column-wise string manipulations.
* Create reusable functions for common transformations.
*Section 12 (06:54):*
* Use `rename` dictionary for clear and efficient column renaming.
* Leverage `groupby` for flexible group-wise aggregations.
* Utilize built-in methods like `pct_change` and `diff` for calculations.

*Data Storage and Handling:*
*Section 8 (04:37):* Ensure proper data type assignment (e.g., datetime) for accurate operations and avoid errors.
*Section 13 (08:15):* Consider alternative file formats like parquet, feather, or pickle for large datasets. These offer better compression and performance compared to CSV.
*Section 15 (09:11):* Utilize `style` attribute for rich DataFrame formatting within pandas.

*Advanced Techniques and Best Practices:*
*Section 3 (01:21):* Utilize the `.query` method for advanced filtering with concise and readable syntax.
*Section 7 (03:44):* Understand DataFrame slicing and copying. Treat slices as read-only to avoid unintended modifications. Use `.copy()` for truly independent DataFrames.
*Section 14 (08:15):* Choose the file format based on size, use case, and compatibility. Consider parquet for queryability and compression, feather for efficient data exchange, and pickle for flexibility.
*Section 16 (10:27):*
* Break down chained method expressions for better readability.
* Employ categorical data types for efficient storage and operations.
* Prevent and identify duplicate columns using `df.columns.duplicated()`.

I hope this combined and formatted transcript proves helpful!
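A few of the points in this summary (Sections 1 to 5) can be sketched together in one place. This is only a minimal illustration and not code from the video: the DataFrame, its column names, and the `min_time` variable are made up for the example.

```python
import io

import pandas as pd

# Hypothetical data; column names use underscores instead of spaces (Section 2).
df = pd.DataFrame(
    {
        "runner": ["ann", "bob", "cat", "dan"],
        "race_time": [31.5, 28.0, 35.2, 29.9],
        "finished": [True, True, False, True],  # booleans, not "yes"/"no"
    }
)

# Section 1: write without the index so no "Unnamed: 0" column shows up
# when the file is read back. An in-memory buffer stands in for a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Sections 3 and 4: filter with .query and reference a Python variable via "@".
min_time = 29.0
slow_finishers = reloaded.query("race_time > @min_time and finished")

# Section 5: a vectorized calculation instead of .apply or a row loop.
reloaded["seconds_over_28"] = reloaded["race_time"] - 28.0
```

Here `slow_finishers` keeps only the rows where the runner finished and `race_time` exceeds the threshold stored in `min_time`.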
@AlexTheAnalyst
@AlexTheAnalyst 2 years ago
I was genuinely worried I was making noob mistakes in Pandas...
@robmulla
@robmulla 2 years ago
😂 Hey Alex! Now I'm dying to know... did you have any reason to be worried?
@MariaSaleem-gi4uj
@MariaSaleem-gi4uj 10 months ago
As a beginner, this video helped me learn some basic concepts about pandas. Thanks!
@seaslugs
@seaslugs 2 years ago
"I can see how this would be confusing for new users". Sir, I have been using pandas for 10 years and had no idea I was making these mistakes!
@robmulla
@robmulla 2 years ago
Whoa! Glad you could learn something. I’m sure there are a few things you could teach me!
@juan.o.p.
@juan.o.p. 2 years ago
This is really useful, thank you!
@robmulla
@robmulla 2 years ago
Glad you found it useful, Juan!
@tharun541
@tharun541 2 years ago
Hi, I love your videos!!! Can you please make a video on how to handle missing values and outliers?
@robmulla
@robmulla 2 years ago
Great suggestion! I did have a whole video on this topic on Abhishek Thakur's channel. Check it out here: kzbin.info/www/bejne/e4rchIGAip2kiJI
@b16ftw
@b16ftw 2 years ago
Lots of good info! Thank you!
@robmulla
@robmulla 2 years ago
Glad you learned from it!
@alexanderdiazquintana3313
@alexanderdiazquintana3313 3 months ago
8:17: Maybe this loop could be replaced by creating another column that holds the value from row i-1 (after_row): extract that column as a list, take [1:-1], append(0), insert the new list as row_after, and then do the percent calculation.
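One way to read this suggestion is as a shifted helper column. The sketch below is an interpretation, not the commenter's or the video's code: the `value` column and its numbers are hypothetical, and `shift(1)` is used in place of the list slicing described in the comment. It also shows the built-in `pct_change` and `diff` methods, which give the same result without the helper column.

```python
import pandas as pd

# Hypothetical series of values to compare row over row.
df = pd.DataFrame({"value": [100.0, 110.0, 99.0, 120.0]})

# Manual version of the idea: put the previous row's value in its own column.
df["previous_value"] = df["value"].shift(1)
df["pct_manual"] = (df["value"] - df["previous_value"]) / df["previous_value"]

# Built-in equivalents that avoid both the loop and the helper column.
df["pct_builtin"] = df["value"].pct_change()
df["difference"] = df["value"].diff()
```

The first row has no predecessor, so all three derived columns start with a missing value there.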
@smiley-wu1kn
@smiley-wu1kn 2 years ago
This is amazing! Thanks a lot.
@robmulla
@robmulla 2 years ago
Glad you like it!
@notmyname42
@notmyname42 2 years ago
I didn't know about .query or the parentheses for chaining. Awesome video! What is the \ in one of the chaining examples you showed?
@robmulla
@robmulla 2 years ago
Thanks! Glad it helped. The \ lets you split a single statement across multiple lines.
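For anyone comparing the two chaining styles mentioned here, a small sketch follows. The DataFrame and its column names are invented for the example; both versions compute the same totals.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"team": ["a", "a", "b", "b"], "points": [3, 5, 2, 8]})

# Style 1: a trailing backslash continues the statement on the next line.
# Nothing, not even a space or a comment, may follow the backslash.
totals_backslash = df \
    .groupby("team")["points"] \
    .sum() \
    .sort_values(ascending=False)

# Style 2: wrapping the expression in parentheses allows free line breaks
# and a comment on each step.
totals_parens = (
    df
    .groupby("team")["points"]     # group rows by team
    .sum()                         # total points per team
    .sort_values(ascending=False)  # highest total first
)
```

The parenthesized form is usually easier to edit, since a step can be commented out or reordered without touching line-continuation characters.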
@nishbhana
@nishbhana 2 years ago
Great video! I think it's important to add that pandas vectorization doesn't always mean the code will run faster. In particular, when working with string data types it can sometimes be slower (even if it looks cleaner).
@robmulla
@robmulla 2 years ago
Good point! I didn't know that was the case. Do you have an example where vectorization is slower? I'd love to give it a look. Sometimes it's worth giving up a little bit of speed for readability. query is slightly slower than .loc but I prefer the former.
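Claims like these are easy to check on your own data. The sketch below is one hedged way to do it, with made-up data and hypothetical function names: it first confirms that each pair of approaches returns the same result and then times them with `timeit`. No timings are quoted here because they depend on the pandas version, the data, and the machine.

```python
import timeit

import pandas as pd

# Hypothetical data; the size and column names are arbitrary.
df = pd.DataFrame(
    {
        "name": ["alice", "bob", "carol"] * 10_000,
        "score": list(range(30_000)),
    }
)


def upper_str_accessor():
    # Vectorized string method.
    return df["name"].str.upper()


def upper_comprehension():
    # Plain Python loop over the values.
    return pd.Series([s.upper() for s in df["name"]], index=df.index, name="name")


def filter_query():
    return df.query("score > 100")


def filter_loc():
    return df.loc[df["score"] > 100]


# Both pairs must produce the same result before timing means anything.
assert upper_str_accessor().tolist() == upper_comprehension().tolist()
assert filter_query().equals(filter_loc())

for func in (upper_str_accessor, upper_comprehension, filter_query, filter_loc):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f}s for 5 runs")
```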
@MagnusAnand
@MagnusAnand 2 years ago
@@robmulla A couple of months ago I found an article that showed an example where vectorization wasn't faster. If I find it, I'll post it here.
@DiegoRamirez-kv3fq
@DiegoRamirez-kv3fq 2 years ago
When you mention the slice warning: sometimes you don't care about the original DataFrame, so it doesn't matter if you modify it.
@robmulla
@robmulla 2 years ago
That’s true. But I don’t like seeing the warnings. And if you don’t need the rest of the data you can just overwrite it with the slice?
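The two options discussed in this exchange can be sketched as follows. The data and variable names are hypothetical, and whether a warning appears at all without `.copy()` depends on the pandas version and its copy-on-write settings, so treat this as an illustration of the pattern only.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp_c": [4, 19, 31]})

# Option 1: an explicit copy makes the subset independent, so adding a column
# to it cannot affect (or appear to affect) the original frame.
warm = df[df["temp_c"] > 10].copy()
warm["temp_f"] = warm["temp_c"] * 9 / 5 + 32

# Option 2: if the rest of the data is not needed, overwrite the name with
# the filtered result instead of keeping both around.
df_small = df.copy()  # stand-in for a frame you are free to replace
df_small = df_small[df_small["temp_c"] > 10].reset_index(drop=True)
```

After this runs, `df` still has only its two original columns, while `warm` carries the extra `temp_f` column.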
@jjhendriks8652
@jjhendriks8652 1 year ago
Merge validator! Excellent, thanks!
@robmulla
@robmulla 1 year ago
👍
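As a quick illustration of the merge validation mentioned above, here is a minimal sketch using the `validate` argument of `DataFrame.merge`. The two tables and their column names are invented for the example.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical tables; "customer_id" is assumed to be unique in both.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cat"]})
regions = pd.DataFrame(
    {"customer_id": [1, 2, 2], "region": ["N", "S", "E"]}  # id 2 is duplicated
)

try:
    merged = customers.merge(regions, on="customer_id", validate="one_to_one")
    merge_ok = True
except MergeError as err:
    # The duplicate key is reported instead of silently multiplying rows.
    merge_ok = False
    print(f"merge rejected: {err}")

# Declaring the actual relationship lets the same merge go through.
merged = customers.merge(
    regions, on="customer_id", how="left", validate="one_to_many"
)
```

With `validate="one_to_one"` the duplicated key raises an error, which is usually preferable to discovering extra rows later in the analysis.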