Isolation Forests: Identify Outliers in Data

Views: 20,998

Days ago

Comments: 7

@haiderqassim-bi8tk 7 days ago

We have collected raw data using specific sensors, and we are currently in the preprocessing stage. At this stage, we are focusing on identifying and extracting outliers. The question is: what is the best approach to handle outliers? Should we use algorithms such as the Interquartile Range (IQR) method and others, or rely on the sensor specifications to define the minimum and maximum values it can record, considering any values outside this range as outliers?
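The two options raised in this question can be put side by side in a short sketch. It is only an illustration, not a recommendation of either one: the readings, the limits `SPEC_MIN` and `SPEC_MAX`, and the helper names `iqr_outliers` and `out_of_spec` are all hypothetical, and the multiplier `k=1.5` is the conventional IQR default rather than anything taken from the video.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def out_of_spec(values, spec_min, spec_max):
    """Flag values the sensor could not physically have recorded."""
    return (values < spec_min) | (values > spec_max)

# Hypothetical temperature readings and datasheet limits (assumptions).
readings = np.array([20.1, 20.4, 19.8, 20.0, 35.0, 20.3, -60.0, 20.2])
SPEC_MIN, SPEC_MAX = -40.0, 85.0

iqr_mask = iqr_outliers(readings)
spec_mask = out_of_spec(readings, SPEC_MIN, SPEC_MAX)

print("IQR flags:      ", readings[iqr_mask])
print("Out-of-spec:    ", readings[spec_mask])
```

In this made-up sample the specification check flags only the reading the sensor could not have produced, while the IQR rule also flags a reading that is within range but unusual for this sample, so the two checks answer different questions.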

@Bentley642 10 months ago

Great video, explained in a very intuitive way!

@elderresearch 10 months ago

Glad you enjoyed the video!

@FIBONACCIVEGA 21 days ago

Amazing video. Do you have any practical examples in Python?

@muslimahmukbang417 10 months ago

How are you getting the numbers -0.05, 0.10, and so on?

@elderresearch 10 months ago

Thanks for your question! Here's what Jericho had to say about how he got those numbers:

Isolation forests use a large number of randomized attempts to separate the data and count how many cuts it takes in each attempt to separate each datapoint. From that collection of counts for each record, scores are calculated. Since this is not straightforward to show by hand, I used the scikit-learn Python package and the wine dataset to calculate the scores, limiting the wine dataset to the flavonoids and malic acid features. Then I took some example points from the outer edges and one from the middle of the real results and illustrated them as closely as possible in the whiteboard example.

Here are the links to the scikit-learn and Python resources:
scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
archive.ics.uci.edu/dataset/109/wine
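The workflow described in this reply can be approximated in a few lines. This is a minimal sketch, not Jericho's actual script: it assumes the copy of the wine dataset bundled with scikit-learn (where the column is spelled `flavanoids`), default `IsolationForest` settings, and an arbitrary `random_state`, so the exact scores will differ from the ones on the whiteboard.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import IsolationForest

# Load the wine dataset shipped with scikit-learn (no download needed).
wine = load_wine()
names = list(wine.feature_names)

# Keep only the two features mentioned in the reply.
cols = [names.index("flavanoids"), names.index("malic_acid")]
X = wine.data[:, cols]

# Settings here are assumptions; the reply does not state which were used.
forest = IsolationForest(n_estimators=100, random_state=0)
forest.fit(X)

# decision_function is centered so that negative values mean "outlier".
scores = forest.decision_function(X)
labels = forest.predict(X)  # -1 = outlier, 1 = inlier

# Show the five most anomalous rows: flavanoids, malic acid, score.
for i in scores.argsort()[:5]:
    print(X[i, 0], X[i, 1], round(float(scores[i]), 3))
```

Points that take few random cuts to isolate get negative scores, and points deep inside the cloud get positive ones, which is consistent with small values around zero such as -0.05 and 0.10.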