Anomaly detection using iforest

Views: 19,274

AI with Dr. Mo

A day ago

Comments: 48

@pedromoro561 3 years ago

It is hard to find such good explanations on Isolation Forest. Keep up the good work!

@tareqal-masri1782 2 years ago

Hi Dr. Esmalifalak, I'm a huge fan of all your videos. They've helped me get through university and start a career. Can you please upload more videos? Also, what data visualization tool do you use?

@saravanannatarajan6515 4 years ago

Thanks for the great tutorial. I can easily pick it as the best tutorial on this topic. Much appreciated. Please continue providing more videos.

@AIwithDrMo 4 years ago

Thanks Saravanan. I am glad that it helped you. Please post any topic that seems interesting to you here and I will consider it for the next video.

@prafulh5252 3 years ago

@@AIwithDrMo Please cover other algorithms for anomaly detection in a similar way

@rubenr.2470 3 years ago

Thanks for this video! It's not easy to find high-quality content like this! Keep it up!

@zaynyao3863 4 years ago

You solved a big problem for me, thank you

@AIwithDrMo 4 years ago

I am glad that helped you.

@satishvavilapalli24 2 years ago

Just amazing

@wakilkhan8875 3 years ago

Please make another video on anomaly detection with One-Class SVM for novelty detection

@soumikbasu1556 3 years ago

A very well-structured but simple way of explanation. Can we also have a look at measuring the efficacy of the model?

@AIwithDrMo 1 year ago

Thanks for the comment. Isolation Forest is an effective anomaly detection method that can handle high-dimensional data and has several advantages over other methods. Its efficacy depends on the specific characteristics of the data and the hyperparameters used. For example, the performance of the algorithm can be affected by the subsampling size, the number of trees in the forest, and the contamination threshold used to flag anomalies. (The splits themselves are chosen at random, so no distance metric is involved.)

@joshuasuasnabar6058 1 year ago

Thank you, professor. Just a question: is it possible to deal with categorical variables? Does the type of encoding matter (one-hot or label encoding)? Thank you in advance

@AIwithDrMo 1 year ago

Joshua, thanks for your comment. Yes, it is possible! You can use Extended Isolation Forest (EIF). Please take a look at this page for more info and a Python example: capable-timimus-00a.notion.site/Isolation-Forest-in-Categorical-Values-b5534c14548b4ba881199477939044c2
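
For readers who want to try the encoding side of the question directly, here is a minimal sketch. It does not use EIF and is not taken from the linked page. It one-hot encodes a categorical column with pandas and fits scikit-learn's classic `IsolationForest`. The column names (`amount`, `channel`) and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical mixed-type data: one numeric and one categorical column.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "amount": rng.normal(100, 10, size=300),
    "channel": rng.choice(["web", "store", "phone"], size=300),
})

# One-hot encode the categorical column so each category becomes a 0/1 feature.
X = pd.get_dummies(df, columns=["channel"], dtype=float)

# Classic Isolation Forest (not EIF) fitted on the encoded features.
model = IsolationForest(random_state=0).fit(X)
df["anomaly"] = model.predict(X) == -1  # True where the model flags the row
print(df["anomaly"].sum(), "rows flagged")
```

One-hot encoding avoids imposing an artificial order on the categories, which label encoding would do. Whether that matters in practice depends on the data.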

@rezamonadi4282 4 years ago

Great explanation...

@AIwithDrMo 4 years ago

Thanks Reza. I'm glad you liked it.

@aashi9781 4 years ago

Hello Dr. Mohammad, is the algorithm effective with real-time streaming data? I have sensor data from more than 100 sensors. Should I find the important variables before feeding the data into the model, or should I pass all the variables and let the algorithm decide by itself? Multicollinearity exists in the data.

@AIwithDrMo 4 years ago

Hi Aradhna, Isolation Forest is one of the faster anomaly detection algorithms, and people use it with large datasets such as financial datasets. For sensor data you don't have to process very high-frequency data. You may need to find the right sampling rate (for example, temperature usually does not change faster than every 10-20 sec, so sampling every second is not necessary). If your window is 1 minute, you should not have a noticeable problem in a regular application. I usually start with all of the data and then drop/minimize if I have to...

@tiger06t 4 years ago

Hi! Thanks for the great tutorial. But I have a question: is it possible for Isolation Forest to output different results? I have used Isolation Forest on my dataset, but the output is a bit different from the previous results every time (I haven't changed any parameter in the model, and the dataset I used is the same).

@AIwithDrMo 4 years ago

Thanks Johnson. Isolation Forest splits the data at random, so there is no guarantee of exactly the same results each time. But if you run it enough times and average the results, it should converge to one solution (with reasonable datasets, of course).
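
A minimal sketch of both points, assuming scikit-learn's `IsolationForest` and made-up synthetic data (neither is taken from the video). Fixing `random_state` makes repeated fits reproducible, and averaging scores over several seeds smooths the run-to-run variation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (an assumption for illustration): 200 typical points
# followed by 5 points placed far away from them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(8, 0.5, size=(5, 2))])

# Same seed, same data: the two fits give identical scores.
scores_a = IsolationForest(n_estimators=100, random_state=42).fit(X).score_samples(X)
scores_b = IsolationForest(n_estimators=100, random_state=42).fit(X).score_samples(X)
same_seed_identical = np.allclose(scores_a, scores_b)

# Different seeds give slightly different scores; averaging them stabilizes the result.
# score_samples returns lower values for more anomalous points.
avg_scores = np.mean(
    [IsolationForest(n_estimators=100, random_state=s).fit(X).score_samples(X)
     for s in range(10)],
    axis=0,
)
print(same_seed_identical, avg_scores[-5:].mean(), avg_scores[:200].mean())
```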

@tiger06t 4 years ago

@@AIwithDrMo Thank you! Dr. Mohammad

@hamzasmidi3445 4 years ago

Thank you Mohammad

@AIwithDrMo 4 years ago

I am glad that you liked it.

@VladimirOlteanu 4 years ago

Hello! Just a question. Is this algorithm a classic isolation forest or an extended isolation forest (I saw you named the object with the predictions eif)? Is there any way to implement an extended isolation forest? Basically, the difference between EIF and IF is that EIF takes a random intercept and slope and does the split based on that line. Thank you for the video!

@AIwithDrMo 4 years ago

Hi Vladimir, this is the classic isolation forest and, as you mentioned, EIF can also be used similarly.

@neginpirannanekaran1236 4 years ago

Great explanation. Thanks

@AIwithDrMo 4 years ago

Glad it was helpful!

@uvs8136 4 years ago

Thank you for the easy-to-understand tutorial. What if we don't know the contamination, and finding it is the goal? How do we start? Is it by using auto? How do you find the true outliers? It's like k-means, where we have to specify the number of clusters to begin with. What if we want to find out the clusters?

@AIwithDrMo 4 years ago

Hi Urmil, happy that it helped. For the contamination, we usually start with a small percentage and look at the results. This can be done through plotting (use PCA to plot data with more than three dimensions) or by printing individual (anomalous) observations and inspecting them. If we see that our model is not sensitive enough and skips anomalies, we increase the contamination percentage. Remember that this is unsupervised and you are not providing anomaly labels before training. You are only testing the results on a small portion of the data for which you know the labels (say you are a subject matter expert).
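
The "start small and inspect" loop could look like the sketch below. It assumes scikit-learn's `IsolationForest`. The synthetic data and the three contamination values are illustrative choices, not recommendations from the video.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (an assumption): 300 typical points plus 6 scattered far away.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(300, 2)),
               rng.uniform(6, 9, size=(6, 2))])

# Try a few small contamination values and count how many rows get flagged.
flagged_counts = {}
for contamination in (0.01, 0.02, 0.05):
    model = IsolationForest(contamination=contamination, random_state=0).fit(X)
    labels = model.predict(X)  # -1 = anomaly, 1 = normal
    flagged_counts[contamination] = int((labels == -1).sum())
    print(contamination, flagged_counts[contamination])

# Independently of any threshold, list the lowest-scoring rows for manual inspection.
scores = IsolationForest(random_state=0).fit(X).score_samples(X)
most_anomalous = np.argsort(scores)[:10]
print(X[most_anomalous])
```

Looking at the ranked scores first can make the choice of contamination less of a guess, because the threshold only decides where the ranked list is cut.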

@MrSanghan1990 3 years ago

Thx, I will apply it~~

@shahrzadamini140 3 years ago

Hi, thanks, I found it really helpful, but I have a question about the contamination parameter: how can we choose a suitable value for it?

@AIwithDrMo 3 years ago

Glad you liked it. Contamination should be tested for your application. You can start with small numbers (like 2%) and look at the results. If the algorithm flags things that look normal to you, decrease the contamination; otherwise keep increasing it... You will find something reasonable for the dataset you are working with.

@shahrzadamini140 3 years ago

@@AIwithDrMo Thanks a lot for your explanation.

@shahrzadamini5746 3 years ago

Hi, good job. I have a question: how can we resample by year?

@AIwithDrMo 3 years ago

I usually use 12 months resampling like "resample('12M')"
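
A small sketch of yearly resampling in pandas, using a made-up daily series. It uses the year-start alias `'YS'`, which gives calendar-year bins, instead of `'12M'`. That is a substitution on my part, since recent pandas releases may warn about the `'M'` alias.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series spanning three calendar years.
idx = pd.date_range("2019-01-01", "2021-12-31", freq="D")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# One bin per calendar year, labeled by the first day of that year.
yearly = series.resample("YS").mean()
print(yearly)
```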

@alhanoufalsuwailem3992 3 years ago

Thanks for the clarification! After applying iforest, how can I evaluate the results? Do you have a specific method for evaluating this type of unsupervised learning? I'd really appreciate that.

@AIwithDrMo 2 years ago

I usually prefer to have a small labeled dataset (from client etc.) and validate my results with those labels.
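
One way this validation might look, as a sketch only. The data is synthetic, the "expert-labeled" subset is simulated, and the metrics (precision, recall, ROC AUC from scikit-learn) are my choice and are not named in the reply.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Synthetic data (an assumption): 500 typical rows and 10 anomalous rows.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(500, 3)),
               rng.normal(6, 1, size=(10, 3))])
y_true = np.r_[np.zeros(500, dtype=int), np.ones(10, dtype=int)]  # 1 = anomaly

# The model is fitted without labels; they are used only for validation below.
model = IsolationForest(contamination=0.02, random_state=0).fit(X)

# Simulate a small labeled subset: 50 typical rows plus the 10 known anomalies.
labeled_idx = np.r_[rng.choice(500, size=50, replace=False), np.arange(500, 510)]
y_sub = y_true[labeled_idx]
pred_sub = (model.predict(X[labeled_idx]) == -1).astype(int)
anomaly_score = -model.score_samples(X[labeled_idx])  # higher = more anomalous

precision = precision_score(y_sub, pred_sub, zero_division=0)
recall = recall_score(y_sub, pred_sub, zero_division=0)
auc = roc_auc_score(y_sub, anomaly_score)
print(precision, recall, auc)
```

Precision and recall depend on the chosen contamination. ROC AUC is computed from the raw scores and does not depend on that threshold.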

@tenten7379 2 years ago

I have a question: this is an unsupervised model, right? Is there a way to make the model predict on user input?

@AIwithDrMo 1 year ago

This is an unsupervised anomaly detection method. It can be applied to user input data to detect anomalies or unusual patterns in user behavior over time. The basic idea is to use the algorithm to learn the normal patterns of user behavior from historical data, and then to use the model to identify any deviations from these patterns.
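
A minimal sketch of that workflow, assuming scikit-learn's `IsolationForest`. The two-feature "historical" data and the helper `check_input` are hypothetical names introduced only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical historical data with two features per observation.
rng = np.random.default_rng(3)
history = rng.normal(loc=[50.0, 5.0], scale=[5.0, 1.0], size=(1000, 2))

# Learn what "normal" looks like from the historical data only.
model = IsolationForest(contamination=0.05, random_state=0).fit(history)

def check_input(values):
    """Return True if a single new observation is flagged as anomalous."""
    row = np.asarray(values, dtype=float).reshape(1, -1)
    return bool(model.predict(row)[0] == -1)

typical_flagged = check_input([50.0, 5.0])     # close to the historical center
extreme_flagged = check_input([500.0, 50.0])   # far outside the historical range
print(typical_flagged, extreme_flagged)
```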