Isolation Forest for Outlier Detection within Python

27,200 views

Andy McDonald

Comments: 23
@dougclendening5896 23 days ago
I haven't found a single video that actually explains what lines 8, 9 and 10 do. Some videos talk about trees but are too generic and don't give real examples in the nodes. Videos like this one show the code but don't talk about how any of it relates to an actual tree or set of logic. How the heck are we getting there? Also, I don't think you showed an example row of data. Is all of the data numeric?
@smn7074 a year ago
Thanks for your great video. Exactly what I needed.
@mngreta 8 months ago
Can you please share the code? I took the time and tried to copy from the video but something is still wrong :(
@mwasimmit a year ago
For plotting in 2D: if I reduce the data to two dimensions using PCA and plot it together with the model's results, will that make a good summary plot?
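One way the PCA idea in this question could be sketched, assuming scikit-learn and synthetic data (the video's dataset is not available on this page, so the array `X` and all variable names here are hypothetical). The model is fitted on all features and PCA is used only to get 2D plotting coordinates:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the real data: 300 rows, 5 numeric features (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))

# Fit on the full feature set; labels are 1 (inlier) or -1 (outlier).
labels = IsolationForest(contamination=0.05, random_state=1).fit_predict(X)

# Project to 2D for plotting only; the model never sees the projection.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)

# Split the 2D coordinates by label so each group can be drawn in its own colour.
inlier_xy = coords[labels == 1]
outlier_xy = coords[labels == -1]

# Fraction of total variance kept by the two components.
explained = pca.explained_variance_ratio_.sum()
print(f"Variance explained by 2 components: {explained:.2f}")
```

The two arrays can then be passed to any scatter plot. If the explained variance is low, the 2D view may hide the directions that make a point anomalous, so the plot is best treated as a rough summary.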
@FxbxxxScxlxrxxnx a year ago
Got a question: I have created a model using Isolation Forest (IF) and fitted it on my training dataset, and now I want to apply it to my test dataset. I don't really understand how to picture this process of "fitting the IF model". When I set contamination to, say, 5%, the model calculates anomaly scores for all points in the training dataset and assigns the value -1 to the 5% "most anomaly-like" points, marking them as anomalies, right? Then, when I pass my test dataset to the model, does it just reuse the trees built from the training dataset to calculate anomaly scores for the test points, and compare them against the lowest anomaly score among those 5% "most anomaly-like" training points? And is any test point whose anomaly score exceeds that threshold then labelled as an anomaly?
@johnbaptistbypassinglife a year ago
Yes, that's correct! When you fit an Isolation Forest (IF) model to your training data, the model builds a number of trees and uses them to calculate an anomaly score for each data point in the training set. The data points with the highest anomaly scores are considered the "most anomaly-like" and are given a label of -1 to mark them as anomalies. When you apply the model to your test data, it uses the same trees and the same calculation to score each data point in the test set. Any test point whose anomaly score is higher than the lowest anomaly score among the "most anomaly-like" training points is also given a label of -1. Because test points are judged against the threshold learned from the training data, the model can flag test points that look unlike anything in the training set, and the share of flagged test points does not have to be 5%. I hope this helps to clarify the process of fitting and applying an IF model to your data! Let me know if you have any other questions.
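A minimal sketch of the fit-then-predict flow described in this reply, assuming scikit-learn's `IsolationForest` and synthetic data (the video's code and dataset are not reproduced on this page, so the arrays and variable names are illustrative). One caveat on wording: scikit-learn uses the opposite sign convention, so lower scores mean more anomalous, and `predict` returns -1 wherever `decision_function` is below 0.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): training set of ordinary points,
# test set of ordinary points plus two obviously distant ones.
rng = np.random.default_rng(42)
X_train = rng.normal(0, 1, size=(500, 2))
X_test = np.vstack([
    rng.normal(0, 1, size=(50, 2)),
    [[6.0, 6.0], [-7.0, 5.0]],
])

# Fitting builds the trees and sets a score threshold so that roughly
# 5% of the TRAINING points fall on the anomalous side of it.
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(X_train)
train_labels = model.predict(X_train)

# Test points are scored with the same trees and compared with the same
# threshold; nothing is re-fitted on the test data.
test_scores = model.decision_function(X_test)  # < 0 means anomalous
test_labels = model.predict(X_test)            # -1 anomaly, 1 normal

print("Share of training points flagged:", np.mean(train_labels == -1))
print("Share of test points flagged:", np.mean(test_labels == -1))
```

The share flagged in the test set depends on how the test points compare with the training threshold, so it can be well above or below the contamination value.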
@vitorribeirosa a year ago
Thanks, Andy!!! Great video!!!
@gourabguha3167 a year ago
Any chance we can get the GitHub link or the source code .ipynb file along with the dataset?
@faicornelius2601 a year ago
Thanks so much for your great videos.
@BabiryeShakira-g4s 12 days ago
Is there a way I can get this exact dataset?
@MonuSaraswati 2 months ago
Hi Andy, can you please share this dataset? I have not been able to find it online.
@redpantherofmadrid 7 months ago
Well explained, thanks a lot, and love the accent, it's a bonus :)
@pioner40 a year ago
Very good video. Do you share the notebook?
@user-eu5ri8cr1c a year ago
Hi, is there any Python library to create a visual family tree from an SQLite DB?
@rawabih4026 a year ago
Thank you from the bottom of my heart
@fastisslow6177 2 months ago
Nice explanation 👍
@pramishprakash a year ago
Great explanation Sir
@faicornelius2601 a year ago
Please Andy, after identifying the outliers, how do we remove them?
@AndyMcDonald42 a year ago
Removing outliers needs to be done with due consideration. The cause of a point being an outlier needs to be properly understood before the appropriate course of action is taken. I discuss multiple methods of dealing with outliers in my Medium article here: towardsdatascience.com/well-log-data-outlier-detection-with-machine-learning-a19cafc5ea37
@faicornelius2601 a year ago
@AndyMcDonald42 Thank you so much, Andy. I have just followed you on Towards Data Science. You are a great teacher.
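For the mechanical part of the question in this thread, here is a hedged sketch of one option only: dropping the rows an Isolation Forest flags. It assumes pandas and scikit-learn, and the DataFrame, the column names `feature_a`, `feature_b` and `anomaly`, and the planted outlier are all hypothetical. As the reply above stresses, whether dropping is the right action depends on why the points are outliers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical numeric data with one planted extreme row at index 0.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
df.loc[0, ["feature_a", "feature_b"]] = [8.0, -8.0]

# fit_predict returns 1 for inliers and -1 for outliers.
features = ["feature_a", "feature_b"]
model = IsolationForest(contamination=0.05, random_state=0)
df["anomaly"] = model.fit_predict(df[features])

# Keep the flagged rows in a separate frame for inspection
# instead of discarding them outright.
outliers = df[df["anomaly"] == -1]
inliers = df[df["anomaly"] == 1].drop(columns="anomaly")

print(f"Kept {len(inliers)} rows, set aside {len(outliers)} for review")
```

Keeping `outliers` as its own frame makes it easy to check each flagged row before deciding to remove, correct, or keep it.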
@danymerizalde1942 11 months ago
Where is the data?
@lashlarue7924 a year ago
🫡👏👏👏❤
@nikolanovakovic7591 7 months ago
Really struggling to understand this accent