It is hard to find such good explanations on Isolation Forest. Keep up the good work!
@tareqal-masri1782 2 years ago
Hi Dr. Esmalifalak, I'm a huge fan of all your videos; they've helped me get through university and start a career. Could you please upload more videos? Also, what data visualization tool do you use?
@saravanannatarajan6515 4 years ago
Thanks for the great tutorial. I can easily pick it as the best tutorial on this topic. Much appreciated. Please continue providing more videos.
@AIwithDrMo 4 years ago
Thanks Saravanan. I am glad that it helped you. Please post any topic that seems interesting to you here and I will consider it for the next video.
@prafulh5252 3 years ago
@AIwithDrMo Please cover other anomaly detection algorithms in a similar way.
@rubenr.2470 3 years ago
Thanks for this video! It's not easy to find high-quality content like this! Keep it up!
@zaynyao3863 4 years ago
You solved a big problem for me, thank you.
@AIwithDrMo 4 years ago
I am glad that helped you.
@satishvavilapalli24 2 years ago
Just amazing
@wakilkhan8875 3 years ago
Please make another video on anomaly detection with One-class SVM for novelty detection.
@soumikbasu1556 3 years ago
A very well-structured yet simple explanation. Can we also have a look at measuring the efficacy of the model?
@AIwithDrMo 1 year ago
Thanks for the comment. Isolation Forest is an effective anomaly detection method that can handle high-dimensional data, and it is fast because it needs no distance computations. Its efficacy depends on the characteristics of the data and on the hyperparameters used, for example the subsample size, the number of trees in the forest, and the contamination threshold. Note that Isolation Forest does not use a distance metric to evaluate splits: each split picks a feature and a split value at random.
@joshuasuasnabar6058 1 year ago
Thank you, professor. Just a question: is it possible to deal with categorical variables? Does the type of encoding (one-hot or label encoding) matter? Thank you in advance.
@AIwithDrMo 1 year ago
Joshua, Thanks for your comment. Yes it is possible! You can use Extended Isolation Forest (EIF). Please take a look at this page for more info and a python example: capable-timimus-00a.notion.site/Isolation-Forest-in-Categorical-Values-b5534c14548b4ba881199477939044c2
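On the encoding question above, one common option is to one-hot encode categorical columns before fitting scikit-learn's IsolationForest, since label encoding imposes an arbitrary order on the categories. The sketch below is an assumption for illustration, not necessarily the approach on the linked page; the column names and data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical toy data: one numeric and one categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.normal(100.0, 10.0, size=200),
    "channel": rng.choice(["web", "store", "phone"], size=200),
})

# One-hot encode the categorical column so every feature is numeric.
X = pd.get_dummies(df, columns=["channel"], dtype=float)

# Fit the forest and label each row: 1 = inlier, -1 = outlier.
model = IsolationForest(n_estimators=100, random_state=0)
labels = model.fit_predict(X)

# Attach a boolean flag to the original rows for inspection.
df["anomaly"] = labels == -1
```

With many high-cardinality columns, one-hot encoding inflates the feature count, so it is worth checking whether the flagged rows still make sense.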
@rezamonadi4282 4 years ago
Great explanation...
@AIwithDrMo 4 years ago
Thanks Reza. I'm glad you liked it.
@aashi9781 4 years ago
Hello Dr. Mohammad, is the algorithm effective with real-time streaming data? I have data from more than 100 sensors. Should I find the important variables before feeding them into the model, or should I pass all the variables and let the algorithm decide by itself? Multicollinearity exists in the data.
@AIwithDrMo 4 years ago
Hi Aradhna, Isolation Forest is one of the faster anomaly detection algorithms, and people use it with large datasets such as financial data. For sensor data you don't have to process very high-frequency data; you may need to find the right sampling rate (for example, temperature usually does not change faster than every 10-20 seconds, so sampling every second is not necessary). If your window is 1 minute, you should not see a noticeable problem in a regular application. I usually start with all of the data and then drop or reduce it if I have to.
@tiger06t 4 years ago
Hi! Thanks for the great tutorial. I have a question: is it possible for Isolation Forest to output different results? I have used Isolation Forest on my dataset, but the results are a bit different from the previous ones every time (I haven't changed any parameter in the model, and the dataset is the same).
@AIwithDrMo 4 years ago
Thanks Johnson. Isolation Forest splits the dataset randomly, so there is no guarantee of getting exactly the same results each time. But if you run it enough times and average the results, it should converge to one solution (with reasonable datasets, of course).
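If repeatable output matters, fixing the random seed is one way to get it. The sketch below assumes scikit-learn's IsolationForest and uses synthetic data; the seed values are arbitrary.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))

# Same seed: the same random trees are built, so the scores match.
scores_a = IsolationForest(random_state=0).fit(X).score_samples(X)
scores_b = IsolationForest(random_state=0).fit(X).score_samples(X)
same = np.allclose(scores_a, scores_b)

# Different seed: different trees, so the scores generally differ slightly.
scores_c = IsolationForest(random_state=1).fit(X).score_samples(X)
max_gap = np.abs(scores_a - scores_c).max()
```

Increasing n_estimators is another way to reduce run-to-run variation, since the score is an average over more trees.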
@tiger06t 4 years ago
@AIwithDrMo Thank you, Dr. Mohammad!
@hamzasmidi3445 4 years ago
Thank you Mohammad
@AIwithDrMo 4 years ago
I am glad that you liked it.
@VladimirOlteanu 4 years ago
Hello! Just a question. Is this algorithm a classic Isolation Forest or an Extended Isolation Forest (I saw you named the object with the predictions eif)? Is there any way to implement an Extended Isolation Forest? Basically, the difference between EIF and IF is that EIF takes a random intercept and slope and splits along that line. Thank you for the video!
@AIwithDrMo 4 years ago
Hi Vladimir, this is the classic Isolation Forest and, as you mentioned, EIF can also be used similarly.
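To make the "random intercept and slope" idea concrete, here is a minimal sketch of a single EIF-style split in NumPy. It is not the implementation used in the video or in any particular EIF package; the function name extended_split is hypothetical.

```python
import numpy as np

def extended_split(X, rng):
    """Split X with one random hyperplane instead of an axis-aligned cut."""
    # Random slope: a normal vector drawn from a standard normal distribution.
    normal = rng.normal(size=X.shape[1])
    # Random intercept: a point drawn uniformly inside the data's bounding box.
    point = rng.uniform(X.min(axis=0), X.max(axis=0))
    # Points on one side of the hyperplane go left, the rest go right.
    goes_left = (X - point) @ normal <= 0
    return X[goes_left], X[~goes_left]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
left, right = extended_split(X, rng)
```

A classic Isolation Forest split is the special case where the normal vector is aligned with a single feature axis.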
@neginpirannanekaran1236 4 years ago
Great explanation. Thanks
@AIwithDrMo 4 years ago
Glad it was helpful!
@uvs8136 4 years ago
Thank you for the easy-to-understand tutorial. What if we don't know the contamination, and finding it is the goal? How do we start, is it by using auto? How do you find the true outliers? It's like k-means, where we have to specify the number of clusters to begin with, but what if the number of clusters is what we want to find out?
@AIwithDrMo 4 years ago
Hi Urmil, happy that it helped. For the contamination we usually start with a small percentage and look at the results. This can be done through plotting (use PCA to plot data with more than three dimensions) or by printing individual (anomalous) observations and inspecting them. If we see that our model is not sensitive enough and skips anomalies, we increase the contamination percentage. Remember that this is unsupervised and you are not providing anomaly labels before training. You are only testing the results on a small portion of the data for which you know the labels (say, because you are a subject matter expert).
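The inspection step described above could look roughly like this: project the data to two dimensions with PCA and list the rows the model flagged. This is a sketch on synthetic data, assuming scikit-learn; the shifted rows are a hypothetical stand-in for real anomalies.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic 6-dimensional data with a few rows shifted far away (hypothetical anomalies).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:5] += 8.0

# Start with a small contamination value, as suggested in the reply.
labels = IsolationForest(contamination=0.02, random_state=0).fit_predict(X)

# Project to 2D so that flagged and normal points can be plotted together.
coords = PCA(n_components=2).fit_transform(X)

# Row indices of flagged observations, for printing and manual inspection.
flagged = np.where(labels == -1)[0]
```

The coords array can be passed to any scatter plot, coloring the rows listed in flagged differently from the rest.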
@MrSanghan1990 3 years ago
Thx, I will apply it~~
@shahrzadamini140 3 years ago
Hi, thanks, I found it really helpful. I have a question about the contamination parameter: how can we choose a suitable value for it?
@AIwithDrMo 3 years ago
Glad you liked it. Contamination should be tested for your application. You can start with a small value (like 2%) and look at the results. If the algorithm flags things that are normal to you, decrease the value; if it misses anomalies, keep increasing it. You will find something reasonable for the dataset you are working with.
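A small sweep makes this tuning loop easier to follow. The sketch below assumes scikit-learn's IsolationForest and synthetic data; the candidate values are arbitrary starting points.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))

# Count how many rows are flagged at each candidate contamination value.
flagged_counts = {}
for contamination in (0.01, 0.02, 0.05):
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(X)
    flagged_counts[contamination] = int((labels == -1).sum())
```

Because contamination only moves the score threshold, a fixed random_state keeps the underlying scores identical, so the counts differ only because of the threshold.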
@shahrzadamini140 3 years ago
@AIwithDrMo Thanks a lot for your explanation.
@shahrzadamini5746 3 years ago
Hi, good job. I have a question: how can we resample by year?
@AIwithDrMo 3 years ago
I usually use 12-month resampling, like "resample('12M')".
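As an alternative to resample('12M'), a yearly aggregate can be computed by grouping on the calendar year of the index. This swaps in a different technique from the one in the reply and does not depend on the spelling of frequency aliases, which has changed in newer pandas releases. The series below is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering three full calendar years.
idx = pd.date_range("2018-01-01", "2020-12-31", freq="D")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Group by the calendar year of each timestamp and average within each year.
yearly_mean = s.groupby(s.index.year).mean()
```

The result is indexed by integer year rather than by timestamp, which is usually convenient for reporting.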
@alhanoufalsuwailem3992 3 years ago
Thanks for the clarification! After applying iforest, how can I evaluate the results? Do you have a specific method for evaluating this type of unsupervised learning? I'd really appreciate that.
@AIwithDrMo 2 years ago
I usually prefer to have a small labeled dataset (from client etc.) and validate my results with those labels.
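With a small labeled set, this validation can be done with standard classification metrics. The sketch below assumes scikit-learn and uses synthetic data with made-up labels (1 = anomaly); the contamination value is arbitrary.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Synthetic data: 480 normal rows and 20 hypothetical labeled anomalies.
rng = np.random.default_rng(0)
normal_rows = rng.normal(size=(480, 2))
anomalous_rows = rng.uniform(6.0, 8.0, size=(20, 2))
X = np.vstack([normal_rows, anomalous_rows])
y_true = np.r_[np.zeros(480, dtype=int), np.ones(20, dtype=int)]

model = IsolationForest(contamination=0.04, random_state=0).fit(X)

# Convert the model's -1/1 output to 1/0 so that 1 means anomaly.
y_pred = (model.predict(X) == -1).astype(int)

# score_samples is lower for anomalies, so negate it to get an anomaly score.
anomaly_score = -model.score_samples(X)

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auc = roc_auc_score(y_true, anomaly_score)
```

Precision and recall depend on the contamination threshold, while ROC AUC evaluates the ranking of scores independently of it.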
@tenten7379 2 years ago
I have a question: this is an unsupervised model, right? Is there a way to make the model predict on user input?
@AIwithDrMo 1 year ago
This is an unsupervised anomaly detection method. It can be applied to user input data to detect anomalies or unusual patterns in user behavior over time. The basic idea is to use the algorithm to learn the normal patterns of user behavior from historical data, and then to use the model to identify any deviations from these patterns.
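In practice this means fitting on historical data once and then calling predict or score_samples on each new input. The sketch below assumes scikit-learn's IsolationForest; the historical data and the two new inputs are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical historical behavior: two numeric features per observation.
rng = np.random.default_rng(0)
history = rng.normal(loc=50.0, scale=5.0, size=(500, 2))

# Learn what "normal" looks like from the historical data.
model = IsolationForest(random_state=0).fit(history)

# New user inputs: one close to the historical range, one far from it.
new_inputs = np.array([
    [51.0, 49.0],
    [120.0, -30.0],
])

# predict returns 1 for inliers and -1 for outliers.
predictions = model.predict(new_inputs)

# score_samples is lower for more anomalous inputs.
scores = model.score_samples(new_inputs)
```

The raw scores are often more useful than the hard labels, because the alerting threshold can then be chosen separately.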
@alwaaffa 2 years ago
Can you help me with the software part (Python coding) of my master's thesis?
@AIwithDrMo 2 years ago
Please fill out the following form for any specific questions, forms.gle/Jz4pkrNSGUqGhPug9