We have collected raw data using specific sensors, and we are currently in the preprocessing stage. At this stage, we are focusing on identifying and extracting outliers. The question is: what is the best approach to handle outliers? Should we use algorithms such as the Interquartile Range (IQR) method and others, or rely on the sensor specifications to define the minimum and maximum values it can record, considering any values outside this range as outliers?
@Bentley642 10 months ago
Great video, explained in a very intuitive way!
@elderresearch 10 months ago
Glad you enjoyed the video!
@FIBONACCIVEGA 21 days ago
Amazing video. Do you have a practical example in Python?
@muslimahmukbang417 10 months ago
How are you getting the numbers -0.05, 0.10, and so on?
@elderresearch 10 months ago
Thanks for your question! Here's what Jericho had to say about how he got those numbers:

Isolation forests make a large number of randomized attempts to separate the data, counting how many cuts each attempt needs to isolate each data point. The scores are calculated from that collection of counts for each record. Since this is not straightforward to show by hand, I used the scikit-learn Python package and the wine dataset to calculate the scores, limiting the wine dataset to the flavonoids and malic acid features. Then I took some example points from the outer edges and one from the middle of the real results and illustrated them as closely as possible in the whiteboard example.

Here are the links to the scikit-learn documentation and the wine dataset:
scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
archive.ics.uci.edu/dataset/109/wine
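For anyone who wants to try this, here is a minimal sketch of the kind of calculation described above. It is not Jericho's actual script: the reply does not say which settings were used, so `n_estimators=100` and `random_state=0` are assumptions, as is the use of the copy of the wine dataset bundled with scikit-learn (where the column is spelled `flavanoids`) and of `decision_function` as the score. Expect the exact numbers to differ from the whiteboard.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import IsolationForest

# Load the wine dataset bundled with scikit-learn and keep only two features.
wine = load_wine()
names = list(wine.feature_names)
cols = [names.index("flavanoids"), names.index("malic_acid")]
X = wine.data[:, cols]

# Assumed settings: 100 random trees and a fixed seed for repeatability.
forest = IsolationForest(n_estimators=100, random_state=0)
forest.fit(X)

# decision_function: negative values look like outliers, positive like inliers.
scores = forest.decision_function(X)
# predict: -1 for outliers, 1 for inliers (same sign rule as the scores).
labels = forest.predict(X)

# Show the three most outlier-like rows and the three most typical rows.
order = np.argsort(scores)
for i in list(order[:3]) + list(order[-3:]):
    print(f"flavanoids={X[i, 0]:.2f}  malic_acid={X[i, 1]:.2f}  "
          f"score={scores[i]:+.3f}  label={labels[i]}")
```

Points that take fewer random cuts to isolate end up with lower scores, which is why the points on the outer edges of the scatter come out negative and the ones in the middle come out positive.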