This is pure gold! Thanks for sharing these profound insights drawn from applied knowledge in the AI-ML industry.
@prosmartanalytics 1 year ago
Thank you! We are glad you liked it.
@janaosama6010 11 months ago
Is removing duplicates from the data done before or after handling missing values?
@prosmartanalytics 11 months ago
Removing duplicates can be tricky. Ideally, we should remove duplicates only when each row in the dataset has a unique identifier and that identifier itself is duplicated. For example, two employees can't share the same employee ID, so a repeated ID lets us remove the duplicate or suggest a correction. However, two employees can have the same age, education, location, and salary; as long as they are two different employees, we don't want to remove either record. Once these points are checked and the duplicate records turn out to be data entry errors, we can remove them before handling missing values. Basically, this is data hygiene that comes even before preprocessing. Hope it helps!
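To make the identifier check concrete, here is a minimal pandas sketch. The table, the column names (`employee_id`, `age`, `education`, `salary`), and the values are all made up for illustration; it only shows one way the check might look, not a fixed recipe.

```python
import pandas as pd

# Hypothetical employee records; all names and values are illustrative assumptions.
df = pd.DataFrame({
    "employee_id": [101, 102, 102, 103, 104, 104],
    "age":         [30, 41, 41, 30, 35, 35],
    "education":   ["MSc", "BSc", "BSc", "MSc", "PhD", "PhD"],
    "salary":      [50000, 62000, 62000, 50000, 58000, 85000],
})

# Rows that repeat an earlier row in every column, identifier included:
# these are the likely data entry errors, so only these are dropped.
exact_copies = df.duplicated(keep="first")
deduped = df[~exact_copies]

# Same identifier but different values: these need a correction, not a blind delete.
conflicts = deduped[deduped.duplicated(subset="employee_id", keep=False)]

print(deduped)
print(conflicts)
```

Employees 101 and 103 have identical attributes but different IDs, so both stay. The repeated row for 102 is dropped, and the two conflicting rows for 104 are only flagged for review.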
@younesgasmi8518 1 year ago
When I have positive or negative infinity values, can I replace them with NaN and then convert them to normal values using a median or mean strategy?
@prosmartanalytics 1 year ago
Good question. First, we should find out why a value became infinity. For example, if we derived a ratio variable, could it be infinity because of division by zero? Second, what do the other feature values look like in the rows where some features are infinite, and how many such values and rows are present in the data? You may refer to our tutorial on outlier treatment for the choice of imputation techniques.
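As a rough sketch of those two checks, the pandas snippet below builds a ratio feature that becomes infinite through division by zero, inspects the affected rows, and only then replaces infinity with NaN and fills with the median. The columns (`revenue`, `units`) and values are assumptions for illustration, and median imputation is just the strategy mentioned in the question; whether it is appropriate depends on why the infinities occur.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "revenue": [100.0, 250.0, 80.0, 40.0, 120.0],
    "units":   [10.0, 0.0, 4.0, 0.0, 6.0],
})

# Derived ratio feature: rows with units == 0 become +inf.
df["revenue_per_unit"] = df["revenue"] / df["units"]

# Step 1: find where and how often infinity occurs, and look at those rows.
inf_mask = np.isinf(df["revenue_per_unit"])
n_inf = int(inf_mask.sum())
print(f"{n_inf} infinite values")
print(df[inf_mask])

# Step 2 (only if imputation makes sense for this feature):
# turn +/-inf into NaN, then fill with the median of the finite values.
cleaned = df["revenue_per_unit"].replace([np.inf, -np.inf], np.nan)
df["revenue_per_unit_imputed"] = cleaned.fillna(cleaned.median())
print(df)
```

Here the infinite rows are exactly the ones with zero units, which suggests the fix may belong in how the ratio is defined rather than in imputation.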