
Random Forest in Python - Machine Learning From Scratch 10 - Python Tutorial

18,016 views

Patrick Loeber


Comments: 26
@fedorlaputin9119 3 years ago
Your code looks like bagging, but in Random Forest we randomly choose a subset of features at each node of the decision tree. The tree-building process is randomized: at the stage of choosing the optimal feature to split on, the search is performed not over the entire set of features but over a random subset of size q. Note especially that a new random subset of size q is drawn every time another node needs to be split. This is the main difference between this approach and the random subspaces method, where a random subset of features is selected once, before the base algorithm is built. Am I wrong?
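
To illustrate the distinction, a minimal self-contained sketch; the values of n_features and q here are placeholders, not taken from the video's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, q = 10, 3

# Random subspaces: ONE feature subset drawn per tree, before training.
tree_feats = rng.choice(n_features, size=q, replace=False)

# Random Forest: a FRESH subset of size q drawn at every node split.
def node_feature_subset():
    return rng.choice(n_features, size=q, replace=False)

print(tree_feats)             # fixed for the whole tree
print(node_feature_subset())  # differs from split to split
print(node_feature_subset())
```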
@Tntpker 2 years ago
His previous decision tree implementation already has a random feature selection statement (the np.random.choice line), but yes, here he uses n_feats = None again, so plain bagging is performed. n_feats should be set to sqrt(n_features) in RF (or another value, but sqrt seems to work best empirically).
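
The one-line fix, as a hedged sketch; RandomForest and X_train here stand in for the video's class and whatever training matrix is in use:

```python
import numpy as np

# sqrt(n_features) per split is the usual default for classification;
# with n_feats=None the forest falls back to plain bagging.
clf = RandomForest(n_trees=20, n_feats=int(np.sqrt(X_train.shape[1])))
```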
@muratsahin1775 9 months ago
Again, great work, as in the others. I learned how to code ML algorithms from you. Thanks a lot. I added random feature and random row selection to your algorithm. Interested friends can try it:

```python
import numpy as np
from collections import Counter

from dt import DecisionTree


def bootstrap_sample(X, y):
    n_samples, n_columns = X.shape
    # random row selection (a tenth of the dataset, without replacement)
    n_samples_row = int(n_samples / 10)
    row_idxs = np.random.choice(n_samples, size=n_samples_row, replace=False)
    # random feature selection (sqrt of the number of features)
    n_samp_col = int(np.sqrt(n_columns))
    col_idxs = np.random.choice(n_columns, size=n_samp_col, replace=False)
    # creating the sub-dataset
    Xidx = X[row_idxs]
    newX = np.zeros([n_samples_row, n_samp_col])
    for i in range(n_samp_col):
        newX[:, i] = Xidx[:, col_idxs[i]]
    return newX, y[row_idxs], col_idxs


def most_common_label(y):
    counter = Counter(y)
    most_common = counter.most_common(1)[0][0]
    return most_common


class RandomForest:
    def __init__(self, n_trees=100, min_samples_split=2, max_depth=10, n_feats=None):
        self.n_trees = n_trees
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_feats = n_feats
        self.trees = []

    def fit(self, X, y):
        self.n_feats = X.shape[1] if not self.n_feats else min(self.n_feats, X.shape[1])
        self.trees = []
        self.rand_feats = []
        for _ in range(self.n_trees):
            X_sample, y_sample, rand_feat = bootstrap_sample(X, y)
            tree = DecisionTree(
                min_samples_split=self.min_samples_split,
                max_depth=self.max_depth,
                n_feats=self.n_feats,
            )
            tree.fit(X_sample, y_sample)
            self.trees.append(tree)
            self.rand_feats.append(rand_feat)

    def predict(self, X):
        # collect the predictions of every tree
        y_pred = []
        for j in range(self.n_trees):
            # select the feature columns this tree was trained on
            new_feats = self.rand_feats[j]
            len_feat = len(new_feats)
            new_X = np.zeros([len(X), len_feat])
            for i in range(len_feat):
                new_X[:, i] = X[:, new_feats[i]]
            # prediction made by each tree
            y_pred.append(self.trees[j].predict(new_X))
        # majority vote across trees, per sample
        y_pred = np.swapaxes(y_pred, 0, 1)
        y_pred_final = [most_common_label(tree_pred) for tree_pred in y_pred]
        return np.array(y_pred_final)
```
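
A quick way to try it out (a minimal sketch, assuming scikit-learn is available for the demo data and that the series' dt.py with DecisionTree is on the path):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=1234
)

clf = RandomForest(n_trees=20, max_depth=10)
clf.fit(X_train, y_train)
accuracy = (clf.predict(X_test) == y_test).mean()
print(f"Accuracy: {accuracy:.3f}")
```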
@airesearch0844 3 years ago
I thoroughly enjoyed learning (1) Decision Tree and (2) Random Forest from your videos. Thanks a lot. The decision tree program is sleek and modular, and easy to understand and remember. If you throw in some points from your lecture as comments, it will be a great learning tool.
@patloeber 3 years ago
Thanks :) Glad you enjoyed it!
@dheerajkumark2268 3 years ago
For regression, instead of most_common_label we can calculate the mean of the predictions, right?
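
Right, a minimal sketch of that change; tree_preds stands for the stacked per-tree predictions after the swapaxes call, with shape (n_samples, n_trees):

```python
import numpy as np

def aggregate_regression(tree_preds):
    # instead of a majority vote, average the per-tree predictions per sample
    return np.mean(tree_preds, axis=1)

# example: 3 samples, 4 trees
print(aggregate_regression(np.array([[1.0, 1.2, 0.8, 1.1],
                                     [2.0, 2.1, 1.9, 2.2],
                                     [0.5, 0.4, 0.6, 0.5]])))
```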
@shahbazkhalilli8593 1 year ago
What about optimizing for a lot of trees?
@abseenahabeeb7917 1 year ago
Hi, nice tutorial. What about plotting each tree in a notebook?
@jasonyam3282 4 years ago
Here I see the RandomForest class uses almost all of the __init__ parameters from DecisionTree. Can I use super() to get the init from the DecisionTree class instead of manually copying the parameters?
@patloeber 4 years ago
Hi. Good observation. But no, you cannot; there is no inheritance. DecisionTree is a completely independent class that you can run on its own, so it needs all the parameters. And RandomForest needs all the parameters so that it can pass them on to all the trees it creates.
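
A hedged sketch of the composition pattern described here: the forest stores the shared hyperparameters itself and forwards them to each tree. DecisionTree and bootstrap_sample are assumed to be the helpers from the series:

```python
class RandomForest:
    # Composition, not inheritance: the forest HAS trees, it IS not a tree.
    def __init__(self, n_trees=100, min_samples_split=2, max_depth=100, n_feats=None):
        self.n_trees = n_trees
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_feats = n_feats

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_trees):
            # forward the shared hyperparameters to every independent tree
            tree = DecisionTree(
                min_samples_split=self.min_samples_split,
                max_depth=self.max_depth,
                n_feats=self.n_feats,
            )
            X_sample, y_sample = bootstrap_sample(X, y)
            tree.fit(X_sample, y_sample)
            self.trees.append(tree)
```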
@fedorlaputin9119 3 years ago
What about making a playlist like this for deep learning? It would be very helpful!
@MdMainuddincse 4 years ago
You referred to your previous video for a better understanding of this one, but you did not link to it anywhere. This is not good. You should put the previous video link in the description or in a pinned comment.
@patloeber 4 years ago
Thanks for the tip, and sorry I forgot it. I put the links in the description: Decision Tree Part 1: kzbin.info/www/bejne/oIfLZoF3bqqFeqM Decision Tree Part 2: kzbin.info/www/bejne/eKLMaHh8e9uBhck
@prashantsharmastunning 4 years ago
Hi, np.swapaxes is giving this error: "numpy.AxisError: axis2: axis 1 is out of bounds for array of dimension 1"
@bassamal-kaaki3253 4 years ago
prashant sharma I am getting the same error!
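
One plausible cause, sketched below: np.swapaxes(a, 0, 1) needs a 2-D array, and if the per-tree prediction lists end up with unequal lengths, NumPy stacks them into a 1-D object array, so axis 1 no longer exists and this exact AxisError is raised. Printing the stacked shape before the swap makes the problem visible:

```python
import numpy as np

good = np.array([[0, 1, 1], [0, 1, 0]])   # (n_trees, n_samples): 2-D, swap works
print(np.swapaxes(good, 0, 1).shape)       # (3, 2)

ragged = np.array([[0, 1, 1], [0, 1]], dtype=object)  # unequal lengths -> 1-D object array
print(ragged.shape)                                    # (2,)
np.swapaxes(ragged, 0, 1)  # numpy.AxisError: axis 1 is out of bounds for array of dimension 1
```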
@charmilam920 3 years ago
Thank you for this, it's amazing!
@ZobeirRaisi 4 years ago
Thanks
@jacjacl324 4 years ago
Shouldn't it be np.random.choice(n_samples, size=subset_sample_value, replace=True) so that it chooses a subset of the data? BTW, you should make a paid course, I would surely purchase it. I have bought lots of Udemy courses, and yours is the best, and it's free lol. All those courses just give a slight overview (theory), then import scikit-learn: 1. import data, do feature scaling and the necessary data engineering, then import the model; 2. train_test_split; 3. fit; 4. predict; 5. accuracy, done lol. Yours is the best.
@howardsmith9032 4 years ago
Good question.
@patloeber 4 years ago
Indeed a good question, and yes, this can be the case. We could use an optional argument for the tree (max_samples) and then use it here. But I didn't want to make this example too complex, hence I just used all samples. And thanks for the tip. Maybe in the future I will make a paid course :) You can join my newsletter to get informed about such things...
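
A hedged sketch of that optional argument; the max_samples name follows scikit-learn's convention and is not from the video, while the helper otherwise mirrors the series' bootstrap_sample:

```python
import numpy as np

def bootstrap_sample(X, y, max_samples=None):
    n_samples = X.shape[0]
    # draw up to max_samples rows (default: the full dataset size) WITH
    # replacement, so each tree still sees a genuine bootstrap sample
    size = n_samples if max_samples is None else min(max_samples, n_samples)
    idxs = np.random.choice(n_samples, size=size, replace=True)
    return X[idxs], y[idxs]
```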
@jacjacl324 4 years ago
@patloeber I will, thank you for the videos again :)
@patloeber 4 years ago
@jacjacl324 Sure :)
@jasony7477 4 years ago
Can you please make one more AdaBoost video to complete this series?
@patloeber 4 years ago
Thank you for the suggestion :) I will definitely consider it for my future videos
@ray811030 4 years ago
@patloeber Hi, your ML from Scratch series is really great. I'm another one hoping for a video about GBDT.
@ray811030 4 years ago
Jiayou! (Keep it up!)