How to Clean Data Like a Pro: Pandas for Data Scientists and Analysts

1,403 views

TrentDoesMath

A day ago

In this video, we will explore data cleaning techniques in Python with Pandas specifically tailored for data scientists and analysts. Whether you are a beginner or an experienced professional, these techniques will help you streamline your data cleaning process and enhance the accuracy of your analysis.
📖CHAPTERS
00:00 Intro
00:39 Data Walkthrough
03:48 Dropping Data
07:19 Dropping Duplicates
09:30 Cleaning String Data
18:29 Imputing Numeric Data
26:29 Imputing Categorical Data
31:53 Key Principle in Data Cleaning
35:18 Outro and Thanks!
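The chapter list above names the main cleaning steps covered in the video. The snippet below is a rough sketch of what those steps can look like in pandas on a small made-up DataFrame. The column names, the data, and the choice of median imputation for numbers and mode imputation for categories are assumptions for illustration. They are not necessarily what the video does; the video's own dataset is in the GitHub repository under LINKS.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (assumption); not the dataset used in the video.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4, 5],
        "name": ["  alice ", "BOB", "BOB", "Carol", "dave", None],
        "city": ["New York", "new york ", "new york ", None, "Boston", "Boston"],
        "age": [34, np.nan, np.nan, 29, 41, 38],
        "internal_code": ["x1", "x2", "x2", "x3", "x4", "x5"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the raw data stays untouched.
    out = df.copy()

    # Dropping data: remove an unneeded column and rows missing a key field.
    out = out.drop(columns=["internal_code"])
    out = out.dropna(subset=["name"])

    # Dropping duplicates: keep the first copy of each fully identical row.
    out = out.drop_duplicates()

    # Cleaning string data: trim surrounding whitespace and normalize case.
    out["name"] = out["name"].str.strip().str.title()
    out["city"] = out["city"].str.strip().str.title()

    # Imputing numeric data: fill missing ages with the median age.
    out["age"] = out["age"].fillna(out["age"].median())

    # Imputing categorical data: fill missing cities with the most frequent city.
    out["city"] = out["city"].fillna(out["city"].mode().iloc[0])

    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Ordering is a design choice here: duplicates are removed before imputing, because repeated rows can otherwise pull the median or mode toward their own values.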
UP NEXT:
- More Advanced Data Cleaning: • Master Missing Data wi...
🔗LINKS
- Data on Github: github.com/trentpark8800/pyth...
💵AFFILIATE LINKS (HELP SUPPORT THE CHANNEL)
- O'Reilly Media (books, courses, and more): oreillymedia.pxf.io/python-fo...

Comments: 18
@ChukwuemekaAmblessedchinenye 15 days ago
Wow, you are the real GOAT! The best video so far. Please make more videos like this.
@Carlos-wv4zk 12 days ago
Dude, I cannot explain how helpful this was, man! Seriously, you literally allowed me to pick up any dataset I download and immediately gave me the practical guidelines to clean/analyze it. Thank you!!
@trentdoesmath 12 days ago
You're very welcome!😎
@israsuazo3345 29 days ago
This is the first video I've watched that actually shows the Python libraries in action. Thank you for this.
@trentdoesmath 28 days ago
You're very welcome! I'm excited to hear about what you will build with them 🙂
@dogsapparatus7504 4 days ago
Nice tutorial
@LivingG6170 29 days ago
Keep doing good work. Big help
@trentdoesmath 28 days ago
I appreciate the kind words 🙏 thanks for the support!
@trentdoesmath a month ago
What are some data cleaning techniques that you have used? 🤔
@totoarifiyanto8679 20 days ago
Just like Thor said: "Another"
@CaribouDataScience 14 days ago
You misspelled Tidyverse 😮
@trentdoesmath 14 days ago
🤣
@kikiboy2545 28 days ago
Hi! Thanks for this video. I wanted to know, as a data scientist/analyst, why did you choose to use Jupyter and a .ipynb cleaning file? Why not use PyCharm and a .py file, for example? Is that just a matter of personal preference? Sorry, I am new to Python. I'm proficient in Stata but trying to make the shift.
@trentdoesmath 28 days ago
Hi @kikiboy2545 🙂 thank you for your question. TL;DR: I chose to use Jupyter because it is easier for me to demo with and record the video with. To your point on creating a .py file, I would recommend this if you are creating cleaning logic that is going to be reused and shipped to 'production', as it is easier to test and maintain a straight Python script IMO. That being said, there is increasing support for notebooks as the preferred environment: for example, Snowflake, Databricks, Azure Synapse, and more all support reusable notebooks that contain all of your logic. I've worked in teams where notebooks are preferred for all data pipeline code because of how intuitive and approachable they are, but as I say, my personal preference is: use notebooks for exploration, and .py scripts for your production code 🙂 No need to apologize! I am glad to be part of your learning journey. Keep pushing, man! 😎
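A minimal sketch of the "notebooks for exploration, .py scripts for production" advice: cleaning steps written as small, plain functions that a notebook can import and a test runner can check automatically. The module name `cleaning.py` and the helpers `standardize_text` and `fill_with_median` are hypothetical. The tests follow pytest's `test_` naming convention, but they are also called directly so the file runs on its own.

```python
# cleaning.py (hypothetical module name): reusable cleaning logic kept out of the notebook.
import pandas as pd


def standardize_text(series: pd.Series) -> pd.Series:
    """Trim surrounding whitespace and convert text to title case."""
    return series.str.strip().str.title()


def fill_with_median(series: pd.Series) -> pd.Series:
    """Replace missing numeric values with the column median."""
    return series.fillna(series.median())


# In a real project these tests would usually live in a separate test file run by pytest.
def test_standardize_text() -> None:
    result = standardize_text(pd.Series(["  new york", "BOSTON "]))
    assert result.tolist() == ["New York", "Boston"]


def test_fill_with_median() -> None:
    result = fill_with_median(pd.Series([1.0, None, 3.0]))
    assert result.tolist() == [1.0, 2.0, 3.0]


if __name__ == "__main__":
    test_standardize_text()
    test_fill_with_median()
    print("all cleaning tests passed")
```

In a notebook, the same functions can then be used with `from cleaning import standardize_text`, so exploration and production share one tested implementation.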
@tmb8807 13 days ago
Cool, thanks. Is Polars making much of an impact in your world? I've used it a bit and I think I prefer the more explicit syntax - besides the potential for enormous performance gains it brings.
@trentdoesmath 13 days ago
Hi tmb8807 :) I have followed a couple of tutorials on Polars, but I haven't used it in a professional setting as of yet 🤔 I'll test it out more extensively. Any good tutorials you'd recommend? Typically, when I've worked on projects that needed high performance, I've used Apache Spark, but Polars could be a nice in-between for pandas and Spark? Thanks for the support!
@tmb8807 12 days ago
@trentdoesmath thanks for the reply. There are a few tutorials on YouTube; the one from Rob Mulla is what got me onto it. Because Polars can work with larger-than-memory data via the streaming API, I've seen it suggested that it could replace Spark on a single node for some jobs, although I've not done that first-hand! But it could potentially expand the 'in-between' area, as you say. The main reason I like it is that I find the syntax much more consistent and readable (and easier to write as a result). Your mileage may vary on that, though, especially if you're extremely comfortable with Pandas (it's a bit less "Pythonic", with more explicit methods for everything). Lazy evaluation and the query optimisation engine are a big selling point as well; they can greatly improve memory usage.
@trentdoesmath 12 days ago
Awesome! I'll check out the Rob Mulla stuff, thanks for the recommendation 👍 For sure! It actually reminds me a bit of Scala 🤔... Very 'to the point'. Not sure if you have tried out Dask before, but it's yet another performance option out there.