The A to Z Complete Guide to Data Preprocessing | Data Pre-processing in Python

Views: 737

Six Sigma Pro SMART

Days ago

Comments

@gumshuda24 a year ago

This is pure gold! Thanks for sharing these profound insights drawn from applied knowledge of the AI-ML industry.

@prosmartanalytics a year ago

Thank you! We are glad you liked it.

@janaosama6010 11 months ago

Is removing duplicates from the data done before or after handling the missing values?

@prosmartanalytics 11 months ago

Removing duplicates can be tricky. Ideally, we should remove duplicates only when each row in the dataset has a unique identifier and that identifier itself is duplicated. For example, two employees can't share the same employee ID, so on that basis we can remove duplicates or suggest corrections. However, two employees can have the same age, education, location, and salary; as long as they are two different employees, we don't want to remove either record. Once these points are checked, and if the duplicate records turn out to be data entry errors, we can remove them before handling missing values. Basically, this is data hygiene rather than data preprocessing proper. Hope it helps!
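
To make the identifier-based check concrete, here is a minimal pandas sketch. The table, the column names (`employee_id`, `age`, `salary`), and the values are all hypothetical and chosen only for illustration; whether a repeated row really is a data entry error still has to be confirmed against the source system.

```python
import pandas as pd

# Hypothetical employee table; column names and values are illustrative only.
df = pd.DataFrame({
    "employee_id": [101, 102, 102, 103, 104, 104],
    "age": [30, 41, 41, 30, 29, 35],
    "salary": [50000, 72000, 72000, 50000, 61000, 61000],
})

# Only rows that share an employee_id are duplicate candidates.
# Employees 101 and 103 have the same age and salary but different ids,
# so they are never flagged.
id_dupes = df[df.duplicated(subset="employee_id", keep=False)]

# Rows that repeat an earlier row in every column: assumed here to be
# data entry errors, so all but the first copy are dropped.
exact_repeat = df.duplicated(keep="first")
deduped = df[~exact_repeat]

# Same id but different attribute values: flag for correction, don't drop.
conflicts = deduped[deduped.duplicated(subset="employee_id", keep=False)]

print(id_dupes)
print(deduped)
print(conflicts)
```

Missing-value handling would then run on `deduped`, with the rows in `conflicts` reviewed separately.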

@younesgasmi8518 a year ago

When I have positive or negative infinity values, can I replace them with NaN and then convert them to normal values using a median or mean strategy?

@prosmartanalytics a year ago

Good question. First, we should find out why a value became infinite. For example, if we derived a ratio variable, the infinity may come from division by zero. Second, we should look at the other feature values in the rows where some features are infinite, and at how many such values and rows are present in the data. You may refer to our tutorial on outlier treatment for the choice of imputation technique.
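
The diagnostic steps above can be sketched in pandas as follows. The data and the column names (`revenue`, `cost`, `ratio`) are hypothetical. The final median fill is just the strategy raised in the question, shown as one option and not as a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'ratio' is a derived feature, so a zero in 'cost'
# produces +inf or -inf depending on the sign of 'revenue'.
df = pd.DataFrame({
    "revenue": [100.0, 250.0, -40.0, 90.0, 120.0],
    "cost": [50.0, 0.0, 0.0, 30.0, 60.0],
})
df["ratio"] = df["revenue"] / df["cost"]

# Step 1: how many infinite values does each numeric column contain?
numeric = df.select_dtypes(include="number")
inf_mask = np.isinf(numeric)
inf_counts = inf_mask.sum()

# Step 2: inspect the affected rows to see what the other features look like
# (here every affected row has cost == 0, which points to division by zero).
inf_rows = df[inf_mask.any(axis=1)]

# Step 3 (one option among several): turn infinities into NaN, then impute
# with the column median, which is computed with the NaN values skipped.
cleaned = df.replace([np.inf, -np.inf], np.nan)
cleaned["ratio"] = cleaned["ratio"].fillna(cleaned["ratio"].median())

print(inf_counts)
print(inf_rows)
print(cleaned)
```

If the inspection shows the infinities come from a zero denominator, fixing the derivation of the ratio itself may be preferable to imputing afterwards.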