Stunning, bro. Clear-cut explanation, not wasting a single minute. It's a gold mine of information and the best step-by-step project video I've seen.
@GregHogg 1 year ago
Thank you for the very kind words! Glad it was helpful 😀
@prathameshmore1402 3 years ago
Thank you for your amazing efforts! I don't have much experience building different models, so this video helped me a lot. Btw, I tried increasing max_depth to 6 in the random forest model, and it improved the model's performance more than I expected. Thanks again!
@GregHogg 3 years ago
Interesting! Yeah it's surprisingly easy to mess around with models. That's great about the max_depth! And you're very welcome :)
@somechad3682 1 year ago
One thing worth mentioning is the data wrangling part. It's often a good idea to check feature relevance and feature importance. Funnily enough, the transaction amount and time were not among the features with a substantial impact on whether the model judged a transaction fraudulent or not. Dropping them not only reduces bias in our data frame, it can also substantially increase the model's computation speed (mine got a 36% boost in speed while losing only 0.01 points in F1 score and 0.02 in precision). Another thing would be to write a function that fits the training and validation data on each of the models automatically; it would substantially help with the cleanliness and readability of the project. I would also consider hyperparameter tuning and pipelining everything together to make it a robust project. Still, great video and a great demonstration of how to check each model and measure its suitability for the problem at hand.
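The feature-importance check described in the comment above can be sketched with scikit-learn; the synthetic data, threshold, and variable names below are illustrative stand-ins, not the video's code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the fraud dataset: 8 features, only 3 informative.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=3,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances sum to 1; keep only features above a threshold.
importances = rf.feature_importances_
keep = importances > 0.05          # illustrative cutoff, tune per dataset
X_reduced = X[:, keep]

print(X.shape, '->', X_reduced.shape)
```

Refitting on `X_reduced` is where the speed gain mentioned above would come from; whether the F1/precision drop is acceptable depends on the dataset.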
@kimchi6284 10 months ago
Please, I have a project on this topic. Could you please help me? I don't know what to do.
@petarganev4256 3 years ago
Great video on classification. Good luck with the channel!
@GregHogg 3 years ago
Thanks so much Petar! I appreciate that 😊
@mellowftw 3 years ago
I'll be trying this soon, thanks Greg
@GregHogg 3 years ago
No problem Krish! 😊😊
@machinelearning3602 3 years ago
Hope to see more of this kind in the coming days!!
@GregHogg 3 years ago
With an account name of "Machine Learning" I would expect nothing less! 😂 And absolutely ☺️
@sivanujansivakumar5907 3 years ago
Thanks man. I'm going to try this one. It's really helpful. 🙏😍
@GregHogg 3 years ago
Enjoy! You're very welcome 😊
@motilalmeher7666 11 months ago
After training the model on the balanced population, please check the model's performance on the original population, the imbalanced one.
@Worldwidenigespam 2 months ago
13:24. Why is the same data that was used to fit the model also used to score it?
@aguspe532 2 years ago
Great video and explanation! Thanks!
@GregHogg 2 years ago
You're very welcome!
@garlicman2778 6 months ago
Really like your video! One thing though: when you downsample the data, shouldn't you still validate/test on the original ratio of data? In your case, you're basically assuming the testing data also has a 50/50 split, which in reality will never be the case.
@mahelvson 2 years ago
Great video. I was just wondering if taking a slice from the original dataset to use as a test set would be a more consistent way to evaluate the resampling procedure, because in production the model still has to deal with imbalanced data.
@Hash9211 11 months ago
Yes, I agree. I've tried a slice of the original data for the test set and the results look completely different.
@devjain7076 3 years ago
12:51 Shouldn't the shape of y_train be (240000, 1), since it consists of exactly one column?
@GregHogg 3 years ago
(240000,) and (240000,1) are very close to the same thing. I'm not sure if they both work or not
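For reference, the two shapes are distinct in NumPy, though scikit-learn and Keras generally accept the 1-D form for labels; a quick illustration (not the video's data):

```python
import numpy as np

y_train = np.zeros(240000)          # 1-D vector: shape (240000,)
y_col = y_train.reshape(-1, 1)      # 2-D column: shape (240000, 1)

print(y_train.shape, y_col.shape)
print(y_train.ndim, y_col.ndim)     # ndim 1 vs ndim 2
```

Selecting a single column from a pandas DataFrame with `df['Class'].to_numpy()` gives the 1-D form; some estimators warn on the (n, 1) shape and ask you to `.ravel()` it.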
@sushantpargaonkar5188 5 months ago
How do you balance the test set when you don't have labels in real life?
@saitejatangudu6320 3 years ago
Great video ❤❤ Looking forward to more videos like this.
@GregHogg 3 years ago
Thank you!! Absolutely 😊
@LinhGiaNguyen-x3m 2 months ago
Why do we have to scale "Time"? Is it necessary?
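Scaling isn't strictly required for tree-based models, but gradient-based ones (logistic regression, neural nets) converge poorly when one feature dwarfs the others. In this dataset "Time" is seconds since the first transaction, so it spans a far larger range than the PCA features; a sketch with illustrative values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# 'Time' runs up to ~172,800 seconds (two days), while the PCA
# features V1..V28 sit roughly in the tens.
time_col = np.linspace(0, 172_800, 1000).reshape(-1, 1)

scaled = StandardScaler().fit_transform(time_col)
print(scaled.mean().round(6), scaled.std().round(6))  # ~0 and ~1
```

After scaling, "Time" contributes on the same footing as the other features instead of dominating the gradients.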
@juanpabloherediacastello6212 2 months ago
Why not use a train/val/test split with scikit-learn's train_test_split and stratify?
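The stratified split the comment above asks about takes two calls of `train_test_split`; a minimal sketch with synthetic, illustratively imbalanced labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)   # ~5% positive class

# Carve out the test set first, then split the remainder into train/val;
# stratify preserves the class ratio in every split.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=1)

for name, part in [('train', y_train), ('val', y_val), ('test', y_test)]:
    print(name, len(part), round(part.mean(), 3))
```

This yields a 60/20/20 split where each part keeps roughly the same fraud rate as the whole dataset.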
@vinsanargeese4384 1 year ago
I just want to know whether it only gives the accuracy details, or actually detects whether a card transaction is fraudulent or not.
@emrecoban3895 1 year ago
Aren't we supposed to test on the original data instead of the balanced one?
@Mwme2000 1 year ago
Well, I have the same question. But every solution I saw for this dataset with a high F1 score did it like him, and after a lot of research I found that if you have highly imbalanced data like this, it is okay to test on the undersampled data. If you know anything else, please share it.
@amannagarkar 5 months ago
In the predict function, you're taking model as an input arg but returning predictions from shallow_nn. Is that correct, or should it be model.predict()? 28:31
@amannagarkar 5 months ago
Probably that’s why the values are exactly the same at 51:51
@joxa6119 9 months ago
What is your opinion on doing oversampling (SMOTE) on the minority class?
@GregHogg 9 months ago
Definitely a solid option.
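The standard implementation is `imblearn.over_sampling.SMOTE`, but the core idea, interpolating new minority points toward their nearest neighbours, fits in a few lines. This is a simplified sketch with toy data, not imblearn's algorithm in full:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)

# Minority-class points (toy stand-in for the fraud rows).
X_min = rng.normal(loc=3.0, size=(20, 4))

def smote_like(X, n_new, k=5, rng=rng):
    """Create synthetic points by interpolating between a minority point
    and one of its k nearest minority neighbours (SMOTE's core idea)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        j = idx[i][rng.integers(1, k + 1)]   # skip self at position 0
        lam = rng.random()
        samples.append(X[i] + lam * (X[j] - X[i]))
    return np.array(samples)

X_new = smote_like(X_min, n_new=30)
print(X_new.shape)
```

Unlike undersampling, this keeps all majority-class rows, at the cost of possibly amplifying noise in the minority class.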
@unlucky-777 1 year ago
Hey Greg, thank you for the video, but I have a question. At first we had a dataset with 280,000 rows and 30 columns, but towards the end of the video we reduced it to only 984 rows. Doesn't this make the model worse because it's trained on less data? Or was the real problem that we were getting bad results at first because we had so much not_fraud data compared to fraud?
@SakshiSnaps 3 years ago
Thanks Greg!! Is it okay to do projects by following tutorial videos? And when do we need to do them on our own?
@GregHogg 3 years ago
Absolutely! Go ahead. You can do it on your own when you feel like you've got the general hang of things, if that makes sense.
@ArtistrystoriesUnleashed45 1 year ago
Can I use the train_test_split function from sklearn to split the data into train and test sets?
@KeKuHauPiOx 1 year ago
I'm getting errors on the test/train/val run for the numpy part.
@arsheyajain7055 3 years ago
Awesome 👏🥳
@GregHogg 3 years ago
Thank you! 😊
@mubshali7489 3 years ago
Sweet. This is going into my GitHub!!
@GregHogg 3 years ago
I sure hope so!
@ottomaggio2725 2 years ago
Nice video. However, it's not completely clear to me how the undersampling relates to the overall problem. In the end, you have to provide the client (the bank) with a model capable of detecting fraud. Suppose we give them the model trained on the rebalanced dataset. Since fraud is imbalanced by nature, they will end up using a model trained on a balanced dataset on a test set that is actually imbalanced. Isn't this causing issues? Isn't the prediction biased toward fraud? Aren't we predicting way too many frauds?
@ottomaggio2725 2 years ago
To be more specific, I think you can try balancing the training set, but you cannot balance the test set because, in the real scenario, the new data to be predicted will always be imbalanced.
@luqmanhrizal 1 year ago
It's not practical to evaluate the model on a balanced evaluation/test set, since it ignores the real fraud representation. Data representation is sacred.
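The point made in the comments above is easy to demonstrate: on a realistically imbalanced test set, accuracy is nearly meaningless and per-class precision/recall are what matter. A toy illustration (made-up counts, not the video's results):

```python
import numpy as np
from sklearn.metrics import classification_report

# On a 99:1 test set, predicting "not fraud" for everything already
# scores 99% accuracy while catching zero frauds.
y_true = np.array([0] * 990 + [1] * 10)
y_pred = np.zeros(1000, dtype=int)   # a "model" that never flags fraud

acc = (y_true == y_pred).mean()
print(f'accuracy: {acc:.3f}')        # looks great, tells you nothing
print(classification_report(y_true, y_pred, zero_division=0))
```

The report shows recall of 0.0 on the fraud class, which is why evaluating on an artificially balanced set can hide a useless model.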
@srijanshovit844 2 years ago
That's amaaazzzing!!
@MatTheBene 2 years ago
Aren't you leaking targets if you normalize before splitting the data?
@GregHogg 2 years ago
If I am, it isn't really a big deal
@MatTheBene 2 years ago
@@GregHogg it isn't a big deal in most cases probably, but with time series data you are leaking future information that the model will not have during inference, such as changes in trend 📈 in future data points
@GregHogg 2 years ago
@@MatTheBene For time series it would be more concerning yes
@allaboardthegravytrain5987 10 months ago
thanks
@j_ckitchai 1 year ago
Hi, thank you a lot for making this video, I learned a lot from it. I have a question at 52:05: in the line print(rf.predict(x_val_b)), shouldn't that be rf_b.predict(x_val_b) instead? Same with GBC later on; it should use gbc_b.predict, right?
@jeremyklauber7535 7 months ago
I thought that as well. I'm not entirely sure why he hadn't changed those, since in the neural_net_predictions call he did switch to shallow_nn_b.
@83Dunes 5 months ago
I have the same question. The inference and final choice of model may differ with that change.
@ashwanirathi948 3 months ago
Awesome
@TernaryM01 12 days ago
You're just fooling yourself (and your boss) with the final report. Run the classification report on the original dataset, not the much smaller dataset that has been artificially balanced! To do this and then say it demonstrates the power of balancing the dataset is foolish. Also, your neural_net_predictions function is wrong (a nasty bug you missed). At 29:50 you realized that you should change "x_train" to "x", but still overlooked that "shallow_nn" should also be changed to "model"! Because of this, neural_net_predictions always gives the predictions of shallow_nn, no matter what model you pass in, which explains why the classification report of your second neural network is EXACTLY the same as the first one (at 51:46). (It's amazing that it didn't bother you at all, or at least give you pause and a suspicion that something might be wrong...)
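The fix described in the comment above is a one-word change: use the `model` argument instead of the captured global. Here is a self-contained sketch, with a purely illustrative DummyModel standing in for the Keras networks:

```python
import numpy as np

class DummyModel:
    """Stand-in with a Keras-style .predict returning probabilities."""
    def __init__(self, p):
        self.p = p
    def predict(self, x):
        return np.full((len(x), 1), self.p)

def neural_net_predictions(model, x):
    # Use the model that was passed in -- not a captured global like
    # `shallow_nn`, which silently made every report identical.
    return (model.predict(x).flatten() > 0.5).astype(int)

x = np.zeros((4, 2))
print(neural_net_predictions(DummyModel(0.9), x))  # -> [1 1 1 1]
print(neural_net_predictions(DummyModel(0.1), x))  # -> [0 0 0 0]
```

With the global version, both calls would return whatever `shallow_nn` predicts, which is exactly the symptom at 51:46.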