
Random Forest in Python - Machine Learning From Scratch 10 - Python Tutorial

18,016 views

Patrick Loeber


Comments: 26
@fedorlaputin9119 3 years ago
Your code looks like bagging, but in Random Forest we randomly choose a subset of features at each node of the decision tree. The tree-building process is randomized: at the stage of choosing the optimal feature to split on, the search is performed not over the entire set of features but over a random subset of size q. Note especially that a new random subset of size q is drawn every time another node needs to be split. This is the main difference between this approach and the random subspaces method, where a random subset of features is selected once, before the base algorithm is built. Am I wrong?
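
To illustrate the distinction, a minimal self-contained sketch; the values of n_features and q here are placeholders, not taken from the video's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, q = 10, 3

# Random subspaces: ONE feature subset drawn per tree, before training.
tree_feats = rng.choice(n_features, size=q, replace=False)

# Random Forest: a FRESH subset of size q drawn at every node split.
def node_feature_subset():
    return rng.choice(n_features, size=q, replace=False)

print(tree_feats)             # fixed for the whole tree
print(node_feature_subset())  # differs from split to split
print(node_feature_subset())
```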
@Tntpker 2 years ago
His previous decision tree implementation already has a random feature selection statement (the np.random.choice line), but yes, here he uses n_feats = None again, so plain bagging is performed. n_feats should be set to sqrt(n_features) in RF (or another value, but sqrt seems to work best empirically).
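
The one-line fix, as a hedged sketch; RandomForest and X_train here stand in for the video's class and whatever training matrix is in use:

```python
import numpy as np

# sqrt(n_features) per split is the usual default for classification;
# with n_feats=None the forest falls back to plain bagging.
clf = RandomForest(n_trees=20, n_feats=int(np.sqrt(X_train.shape[1])))
```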
@muratsahin1775 9 months ago
Again, great work, as in the others. I learned how to code ML algorithms from you. Thanks a lot. I added random feature and random row selection to your algorithm. Interested friends can try it:

```python
import numpy as np
from collections import Counter

from dt import DecisionTree


def bootstrap_sample(X, y):
    n_samples, n_columns = X.shape
    # random row selection (a tenth of the dataset, without replacement)
    n_samples_row = int(n_samples / 10)
    row_idxs = np.random.choice(n_samples, size=n_samples_row, replace=False)
    # random feature selection (sqrt of the number of features)
    n_samp_col = int(np.sqrt(n_columns))
    col_idxs = np.random.choice(n_columns, size=n_samp_col, replace=False)
    # creating the sub-dataset
    Xidx = X[row_idxs]
    newX = np.zeros([n_samples_row, n_samp_col])
    for i in range(n_samp_col):
        newX[:, i] = Xidx[:, col_idxs[i]]
    return newX, y[row_idxs], col_idxs


def most_common_label(y):
    counter = Counter(y)
    most_common = counter.most_common(1)[0][0]
    return most_common


class RandomForest:
    def __init__(self, n_trees=100, min_samples_split=2, max_depth=10, n_feats=None):
        self.n_trees = n_trees
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_feats = n_feats
        self.trees = []

    def fit(self, X, y):
        self.n_feats = X.shape[1] if not self.n_feats else min(self.n_feats, X.shape[1])
        self.trees = []
        self.rand_feats = []
        for _ in range(self.n_trees):
            X_sample, y_sample, rand_feat = bootstrap_sample(X, y)
            tree = DecisionTree(
                min_samples_split=self.min_samples_split,
                max_depth=self.max_depth,
                n_feats=self.n_feats,
            )
            tree.fit(X_sample, y_sample)
            self.trees.append(tree)
            self.rand_feats.append(rand_feat)

    def predict(self, X):
        # collect the predictions of every tree
        y_pred = []
        for j in range(self.n_trees):
            # select the feature columns this tree was trained on
            new_feats = self.rand_feats[j]
            len_feat = len(new_feats)
            new_X = np.zeros([len(X), len_feat])
            for i in range(len_feat):
                new_X[:, i] = X[:, new_feats[i]]
            # prediction made by each tree
            y_pred.append(self.trees[j].predict(new_X))
        # majority vote across trees, per sample
        y_pred = np.swapaxes(y_pred, 0, 1)
        y_pred_final = [most_common_label(tree_pred) for tree_pred in y_pred]
        return np.array(y_pred_final)
```
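
A quick way to try it out (a minimal sketch, assuming scikit-learn is available for the demo data and that the series' dt.py with DecisionTree is on the path):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=1234
)

clf = RandomForest(n_trees=20, max_depth=10)
clf.fit(X_train, y_train)
accuracy = (clf.predict(X_test) == y_test).mean()
print(f"Accuracy: {accuracy:.3f}")
```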
@airesearch0844 3 years ago
I thoroughly enjoyed learning (1) Decision Tree and (2) Random Forest from your videos. Thanks a lot. The decision tree program is sleek and modular, and easy to understand and remember. If you throw in some points from your lecture as comments, it will be a great learning tool.
@patloeber 3 years ago
Thanks :) Glad you enjoyed it!
@dheerajkumark2268 3 years ago
For regression, instead of most_common_label we can calculate the mean of the predictions, right?
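
Right, a minimal sketch of that change; tree_preds stands for the stacked per-tree predictions after the swapaxes call, with shape (n_samples, n_trees):

```python
import numpy as np

def aggregate_regression(tree_preds):
    # instead of a majority vote, average the per-tree predictions per sample
    return np.mean(tree_preds, axis=1)

# example: 3 samples, 4 trees
print(aggregate_regression(np.array([[1.0, 1.2, 0.8, 1.1],
                                     [2.0, 2.1, 1.9, 2.2],
                                     [0.5, 0.4, 0.6, 0.5]])))
```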
@shahbazkhalilli8593 1 year ago
What about optimizing for a lot of trees?
@abseenahabeeb7917 1 year ago
Hi, nice tutorial. What about plotting each tree in a notebook?
@jasonyam3282 4 years ago
Here I see the RandomForest class uses almost all of the __init__ parameters from DecisionTree. Can I use super() to get the init from the DecisionTree class instead of manually copying the parameters?
@patloeber 4 years ago
Hi. Good observation. But no, you cannot; there is no inheritance. DecisionTree is a completely independent class that you can run on its own, so it needs all the parameters. And RandomForest needs all the parameters so that it can pass them on to all the trees it creates.
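
A hedged sketch of the composition pattern described here: the forest stores the shared hyperparameters itself and forwards them to each tree. DecisionTree and bootstrap_sample are assumed to be the helpers from the series:

```python
class RandomForest:
    # Composition, not inheritance: the forest HAS trees, it IS not a tree.
    def __init__(self, n_trees=100, min_samples_split=2, max_depth=100, n_feats=None):
        self.n_trees = n_trees
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_feats = n_feats

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_trees):
            # forward the shared hyperparameters to every independent tree
            tree = DecisionTree(
                min_samples_split=self.min_samples_split,
                max_depth=self.max_depth,
                n_feats=self.n_feats,
            )
            X_sample, y_sample = bootstrap_sample(X, y)
            tree.fit(X_sample, y_sample)
            self.trees.append(tree)
```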
@fedorlaputin9119 3 years ago
What about making a playlist like this for deep learning? It would be very helpful!
@MdMainuddincse 4 years ago
You referred to your previous video for a better understanding of this one, but you did not link to it anywhere. This is not good. You should put the previous video link in the description or in a pinned comment.
@patloeber 4 years ago
Thanks for the tip, and sorry I forgot it. I put the links in the description: Decision Tree Part 1: kzbin.info/www/bejne/oIfLZoF3bqqFeqM Decision Tree Part 2: kzbin.info/www/bejne/eKLMaHh8e9uBhck
@prashantsharmastunning 4 years ago
Hi, np.swapaxes is giving this error: "numpy.AxisError: axis2: axis 1 is out of bounds for array of dimension 1"
@bassamal-kaaki3253 4 years ago
prashant sharma I am getting the same error!
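
One plausible cause, sketched below: np.swapaxes(a, 0, 1) needs a 2-D array, and if the per-tree prediction lists end up with unequal lengths, NumPy stacks them into a 1-D object array, so axis 1 no longer exists and this exact AxisError is raised. Printing the stacked shape before the swap makes the problem visible:

```python
import numpy as np

good = np.array([[0, 1, 1], [0, 1, 0]])   # (n_trees, n_samples): 2-D, swap works
print(np.swapaxes(good, 0, 1).shape)       # (3, 2)

ragged = np.array([[0, 1, 1], [0, 1]], dtype=object)  # unequal lengths -> 1-D object array
print(ragged.shape)                                    # (2,)
np.swapaxes(ragged, 0, 1)  # numpy.AxisError: axis 1 is out of bounds for array of dimension 1
```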
@charmilam920 3 years ago
Thank you for this, it's amazing!
@ZobeirRaisi 4 years ago
Thanks
@jacjacl324 4 years ago
Shouldn't it be np.random.choice(n_samples, size=subset_sample_value, replace=True) so that it chooses a subset of the data? BTW, you should make a paid course, I would surely purchase it. I have bought lots of Udemy courses, and yours is the best, and it's free lol. All those courses just give a slight overview (theory), then import scikit-learn: 1. import data, do feature scaling and the necessary data engineering, then import the model; 2. train_test_split; 3. fit; 4. predict; 5. accuracy, done lol. Yours is the best.
@howardsmith9032 4 years ago
Good question.
@patloeber 4 years ago
Indeed a good question, and yes, this can be the case. We could use an optional argument for the tree (max_samples) and then use it here. But I didn't want to make this example too complex, hence I just used all samples. And thanks for the tip. Maybe in the future I will make a paid course :) You can join my newsletter to get informed about such things...
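
A hedged sketch of that optional argument; the max_samples name follows scikit-learn's convention and is not from the video, while the helper otherwise mirrors the series' bootstrap_sample:

```python
import numpy as np

def bootstrap_sample(X, y, max_samples=None):
    n_samples = X.shape[0]
    # draw up to max_samples rows (default: the full dataset size) WITH
    # replacement, so each tree still sees a genuine bootstrap sample
    size = n_samples if max_samples is None else min(max_samples, n_samples)
    idxs = np.random.choice(n_samples, size=size, replace=True)
    return X[idxs], y[idxs]
```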
@jacjacl324 4 years ago
@patloeber I will, thank you for the videos again :)
@patloeber 4 years ago
@jacjacl324 Sure :)
@jasony7477 4 years ago
Can you please make one more AdaBoost video to complete this series?
@patloeber 4 years ago
Thank you for the suggestion :) I will definitely consider it for my future videos
@ray811030 4 years ago
@patloeber Hi, your ML from Scratch series is really great. I'm another one hoping for a video about GBDT.
@ray811030 4 years ago
Jiayou! (Keep it up!)