Naive Bayes in Python - Machine Learning From Scratch 05 - Python Tutorial

  63,673 views

Patrick Loeber

Comments: 119
@patloeber 4 years ago
A small fix must be applied to the fit method if the class labels do not start at 0: use for idx, c in enumerate(self._classes) instead of for c in self._classes, and index self._mean, self._var, and self._priors with idx.
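A minimal sketch of the fix described in this comment, assuming a Gaussian Naive Bayes fit step along the lines of the one in the video. The function name fit_gaussian_nb and the sample data are illustrative and not the author's exact code. The point is that idx (the position in the sorted class list) indexes the arrays, while c (the label value) only selects rows.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Hypothetical helper: per-class mean, variance and prior."""
    n_samples, n_features = X.shape
    classes = np.unique(y)
    n_classes = len(classes)
    mean = np.zeros((n_classes, n_features), dtype=np.float64)
    var = np.zeros((n_classes, n_features), dtype=np.float64)
    priors = np.zeros(n_classes, dtype=np.float64)
    # idx runs 0..n_classes-1 even when the labels are, say, 3 and 7
    for idx, c in enumerate(classes):
        X_c = X[y == c]                      # rows belonging to class c
        mean[idx, :] = X_c.mean(axis=0)
        var[idx, :] = X_c.var(axis=0)
        priors[idx] = X_c.shape[0] / float(n_samples)
    return classes, mean, var, priors

# Labels that do not start at 0 would break indexing with c directly.
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]])
y = np.array([3, 3, 7, 7])
classes, mean, var, priors = fit_gaussian_nb(X, y)
```

With labels 3 and 7, indexing a two-row array with c would go out of bounds, whereas idx stays within 0 and 1.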
@AliHussain-kb3ew 4 years ago
How do I solve this problem? What should I do? I am using:
    for idx, c in enumerate(self._classes):
        X_c = X[y==c]
        self._mean[idx, :] = X_c.mean(axis=0)
        self._var[idx, :] = X_c.var(axis=0)
        self._priors[idx] = X_c.shape[0] / float(n_samples)
and I get: boolean index did not match indexed array along dimension 1; dimension is 5 but corresponding boolean dimension is 1
@alitaangel8650 4 years ago
@AliHussain-kb3ew The code above works fine for me; maybe something is wrong with your input data?
@Dhanush-zj7mf 3 years ago
I was stuck for 2 days and even posted a question on Stack Overflow. I should have read the comments first.
@robinsonnadar5457 3 years ago
@AliHussain-kb3ew I am stuck with the same error too :(
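One possible cause of the "boolean index did not match" error reported above, offered as an assumption since the commenters' data is not shown: y has shape (n_samples, 1), for example after a reshape or when taken from a single-column table, instead of the flat shape (n_samples,). The sketch below reproduces the error with made-up data and shows that flattening y with ravel() avoids it.

```python
import numpy as np

X = np.arange(20, dtype=np.float64).reshape(4, 5)   # 4 samples, 5 features
y_column = np.array([[0], [1], [0], [1]])            # shape (4, 1)

# A (4, 1) mask is compared against both axes of X, so axis 1 (size 5)
# does not match the mask's axis 1 (size 1) and NumPy raises IndexError.
try:
    X[y_column == 0]
    raised = False
except IndexError:
    raised = True

y_flat = y_column.ravel()        # shape (4,)
X_c = X[y_flat == 0]             # selects rows 0 and 2
```

If y comes from a pandas DataFrame, checking y.shape before calling fit is a quick way to see whether this applies.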
@umarmughal5922 2 years ago
@Python Engineer Could you please explain how to apply Laplace smoothing to this?
@mattgoodman2687 4 years ago
Thank you for this. I had no clue how to conceptually grasp Naive Bayes, but after watching your video I understand it very well.
@patloeber 4 years ago
I’m glad it is helpful :)
@kougamishinya6566 2 years ago
I love the way you explain what each line is doing and relate it back to the formulae. That's super helpful, thank you!
@tkaczoro 6 months ago
It looks like, for the same reason you removed P(X) from the formula for y, you can also remove the prior term P(y). You will get the same result when calculating the accuracy.
@vanshikajain8353 3 years ago
In the second function, predict, there is a misplaced x under the for loop in the class conditional. It should be replaced by c, otherwise you get a ValueError.
@chandank5266 1 year ago
Yeah! I actually got confused at that point, but now it's clear. Thanks for confirming :)
@akshaygoel2184 2 years ago
Amazing implementation! Small question/point: for the PDF, shouldn't the var in the numerator be squared, i.e. (2 * var**2)?
@BlackHeart-AI 1 year ago
f(x) = (1 / (σ * sqrt(2π))) * e^(-((x-μ)^2) / (2σ^2)). In statistics, σ (the Greek letter sigma) represents the standard deviation of a population. The standard deviation is a measure of the spread or dispersion of a set of data around its mean. It is closely related to the variance, which is equal to the square of the standard deviation and is denoted by σ^2. So σ^2 is simply the variance.
@Fresh290PL 2 years ago
Great video, thanks! Just one thing: how can we avoid the zero-frequency problem in this implementation?
@matthewcallinankeenan2034 3 years ago
@PythonEngineer I'm using this on a large dataset with 8 columns and ~16000 rows. It's saying "IndexError: index 10000 is out of bounds for axis 0 with size 210". Do you know how I can fix this?
@matthewcallinankeenan2034 3 years ago
What do we change about this program if the class isn't just True/False, e.g. if self._classes isn't just [0, 1]?
@patloeber 3 years ago
It works for multiple classes, but you have to change the for loop like this: for idx, c in enumerate(self._classes): I have already updated my GitHub repo with this fix.
@andreaq.y1770 4 years ago
Very good tutorial! I hope you will post more algorithm implementations.
@patloeber 4 years ago
Thank you! Yes, more videos are coming soon :)
@posadzd7343 3 years ago
Good video, I learnt a lot. Could you please implement a Bayes classifier based on Parzen window density estimation?
@abhisheksuryavanshi979 1 year ago
No init function inside the NaiveBayes class?
@heidycespedes9220 1 year ago
Awesome explanation! It helped me to understand the concept and work on my project. Thanks a lot!
@Lanipops 4 years ago
I tried to run this but I keep getting this error:
~/anaconda3/envs/XXXXXX6/aima-python-master/naivebayes.py in fit(self, X, y)
     15 for c in self._classes:
     16     X_c = X[y==c]
---> 17     self._mean[c, :] = X_c.mean(axis=0)
     18     self._var[c, :] = X_c.var(axis=0)
     19     self._priors[c] = X_c.shape[0] / float(n_samples)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
@omkarpatil4386 4 years ago
Make your labels binary, or encode the labels as integers.
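A small sketch of the label encoding suggested in this reply, assuming the IndexError comes from labels that are not integers (strings or floats) being used as array indices. The label values are made up. np.unique with return_inverse=True returns the sorted distinct labels together with an integer code for each sample.

```python
import numpy as np

# Hypothetical string labels that cannot be used as array indices.
y_raw = np.array(["benign", "malignant", "benign", "benign"])

# classes: sorted distinct labels; y_encoded: position of each label in classes
classes, y_encoded = np.unique(y_raw, return_inverse=True)

# The original labels can be recovered from the integer codes.
y_decoded = classes[y_encoded]
```

The encoded labels start at 0 and are contiguous, so they are safe to use as row indices into the mean, variance, and prior arrays.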
@ramazanburakguler5842 1 year ago
In terms of regularization, what can be done?
@ozysjahputera7669 2 years ago
The pdf implemented here is only for the univariate Gaussian, correct? A multivariate one would involve the inverse and the determinant of the covariance matrix. Never mind: you assume all features are independent of each other.
@jossyrayonieram5231 2 years ago
Hi. What do you mean by "classes" here? You mention classes "0" and "1", but I'm still not sure what you mean or why they are called "classes".
@changsinlee4634 3 years ago
A great tutorial and implementation. Just one correction on the implementation: _pdf is implemented differently from the formula. It should be:
    numerator = np.exp(- (x-mean)**2 / (2 * var**2))
    denominator = np.sqrt(2 * np.pi * var**2)
The implemented code is missing the squared part:
    numerator = np.exp(- (x-mean)**2 / (2 * var))
    denominator = np.sqrt(2 * np.pi * var)
@patloeber 3 years ago
Thanks for the feedback, but you are wrong; you may have confused standard deviation and variance. In most formulas (and in this video) it is written with the squared standard deviation, which is equal to the variance (so no square when using the variance directly) :)
@changsinlee4634 3 years ago
@patloeber Thanks for the quick reply. Ah, yes, I see it. In that case, it should be std**2. You get different values depending on whether you use var or std**2. I was comparing the results with those of a standard library (from scipy.stats import norm), and that's when I discovered the differences.
@patloeber 3 years ago
@changsinlee4634 Oh, this is interesting. Thanks for noticing this! I would expect std**2 and var to be exactly the same except for rounding errors.
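A quick numerical check of the point discussed in this thread, using made-up values. The function gaussian_pdf mirrors the numerator and denominator lines quoted above. The sketch shows that the variance form and the std**2 form agree, and that a different value appears when a standard deviation is passed where a variance is expected. Whether that explains the commenter's mismatch is an assumption; note that scipy.stats.norm takes the standard deviation as its scale argument.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    # Same form as the snippet in the comments: var is sigma squared.
    numerator = np.exp(-(x - mean) ** 2 / (2 * var))
    denominator = np.sqrt(2 * np.pi * var)
    return numerator / denominator

x, mean, std = 1.5, 1.0, 2.0
var = std ** 2

p_var = gaussian_pdf(x, mean, var)
# Textbook form written with the standard deviation sigma.
p_std = np.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * np.sqrt(2 * np.pi))
# Mistake: passing the standard deviation where the variance is expected.
p_wrong = gaussian_pdf(x, mean, std)
```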
@MuhammadAli-pf4ww 2 years ago
Can anyone explain what X_c = X[c==y] is doing? I'm a little confused.
@amauryribeiro1860 4 years ago
Just... thank you for your help! ^^
@patloeber 4 years ago
You are welcome!
@dinarakhaydarova4898 2 years ago
Exactly what I needed! Thank you bunches.
@OnlineGreg 2 years ago
Hey, thanks a lot for this series. One question: why do you often put an underscore _ in front of a function or a variable?
@derilraju2106 2 years ago
It's a common convention for marking private methods, which are not meant to be called from the main function.
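To illustrate the underscore convention mentioned in this reply, here is a small sketch with a made-up class. The class and method names are hypothetical. The leading underscore is only a naming convention: Python does not prevent outside code from calling the method.

```python
class Classifier:
    def predict(self, values):
        # Public method: intended to be called by users of the class.
        return [self._predict(v) for v in values]

    def _predict(self, v):
        # Leading underscore: signals an internal helper.
        # This is a convention only; the method is still reachable.
        return 1 if v > 0 else 0

model = Classifier()
labels = model.predict([-2, 3, 0])
direct = model._predict(5)   # works, but goes against the convention
```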
@shehanjanidu2334 3 years ago
I was using my own CSV file as my dataset, but it gives: ufunc 'subtract' did not contain a loop with signature matching types (dtype('
@samii8104 2 years ago
I'm trying to run the algorithm on a dataset where the first half of y_train is 0 and the second half is 1. When I try to predict for the first half of y_train, I get a division-by-zero error. Would using Laplace smoothing in the code help?
@abhisheksuryavanshi979 1 year ago
Can anyone please tell me why we are adding the prior and class_conditional variables?
@srikaramanaganti1285 3 years ago
Can you model the class-conditional probability using a multinomial distribution?
@anjaliacharya9506 4 years ago
I tried to implement this on the WBCD dataset but I am getting a UFuncTypeError in the line "numerator = np.exp(- (x-mean)**2 / (2 * var))". Could you help me with this?
@anjaliacharya9506 4 years ago
I have used a label encoder to change the 'diagnosis' target column to integer type, but the error persists in the same line I mentioned. UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('
@jonn6897 4 years ago
I have the same error with another dataset. Looking forward to any help!
@anjaliacharya9506 4 years ago
@jonn6897 I tried converting all feature columns (everything except the target) to a NumPy array for the probability calculation, and then it works. In my case it is the WBCD dataset.
    y = wbcd_data.diagnosis
    X = wbcd_data.drop('diagnosis', axis=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    # convert all feature columns (everything except the target) to numpy arrays to calculate probabilities
    X_train = np.array(X_train)
    X_test = np.array(X_test)
@patloeber 4 years ago
Try casting your X to dtype=np.float64 before calling fit(), and yes, of course it must be a NumPy array.
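A minimal sketch of the cast suggested in this reply, assuming the UFuncTypeError arises because the feature array holds strings (as can happen after reading a file) and not floats. The data is made up. Subtracting a float array from a string array fails, while the same values cast to np.float64 work.

```python
import numpy as np

# Hypothetical feature matrix that was read in as text.
X_str = np.array([["1.5", "2.0"], ["3.0", "4.5"]])
mean = np.array([1.0, 1.0])

# Subtraction has no implementation for string arrays, so NumPy raises
# a TypeError subclass (reported as UFuncTypeError in the comments).
try:
    X_str - mean
    raised = False
except TypeError:
    raised = True

X = X_str.astype(np.float64)   # numeric array, safe to pass to fit()
diff = X - mean
```

If the cast itself fails, the column likely contains non-numeric entries that need cleaning first.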
@robertrey7002 2 years ago
Hey man, that was a great tutorial! I would just like to ask: is there a way to know when you should use the Naive Bayes classifier?
@no_guarantees 2 years ago
The simplest application would be a binary classifier (0/1 or no/yes), such as spam classification. To build your intuition, you could experiment with NB wherever you would typically use logistic regression.
@user-tp7ry2sf4l 3 years ago
Thank you so much, friend, very helpful.
@patloeber 3 years ago
Glad you like it!
@BlueSkyGoldSun 1 year ago
Is there any book you recommend for learning ML in native Python?
@FoodieTechVoyager 3 years ago
Hi, I am new to machine learning. It would be very helpful if you could provide the dataset too, or share a tutorial on how to create it.
@patloeber 3 years ago
Thanks for the suggestion.
@tanziahkhanam6451 3 years ago
I got very low accuracy on my own dataset, only 0.3. What is the reason? I also got a warning: RuntimeWarning: divide by zero encountered in true_divide numerator = np.exp(- (x - mean) ** 2 / (2 * var))
@bong-techie 2 years ago
How did you fix it? I'm facing this problem now, please help.
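The divide-by-zero warning above can occur when a feature has zero variance within a class, so that (2 * var) is 0. One possible remedy, shown here as a sketch and not as the video's code, is to add a small constant to the variance. The eps parameter and its default of 1e-9 are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var, eps=1e-9):
    # eps is a hypothetical smoothing constant that keeps var above zero.
    var = var + eps
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Made-up class subset whose second feature is constant (variance 0).
X_c = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
mean = X_c.mean(axis=0)
var = X_c.var(axis=0)

p = gaussian_pdf(np.array([2.0, 5.0]), mean, var)
```

A density of exactly 0 can still occur when x lies far from the mean of a near-constant feature, so the later log step may also need guarding.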
@_Shrivi_ 4 years ago
Hi, very good explanation. Can I use this code to train on data for sentiment analysis as well?
@patloeber 4 years ago
Yes.
@kidspast7294 2 years ago
Great tutorial, thanks!
@AliHaider-hg7lj 4 years ago
How can we train the model on our own data? I mean, if we have a CSV file, how can we use it with this model?
@patloeber 4 years ago
Load the data with pandas, or manually with open(filename), and convert each line to your x and y vectors. Then create training and testing data and train your model.
@patloeber 4 years ago
I'm actually planning to release a short video in the next 1-2 days on how to load your own datasets from CSV.
@AliHaider-hg7lj 4 years ago
@patloeber Perfect, and thanks :)
@T4l0nITA 4 years ago
    data = pandas.read_csv("file_name.csv")
    X = data.iloc[samples, features].values
    y = data.iloc[samples, y_column].values
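As a self-contained variant of the loading step discussed in this thread, the sketch below uses NumPy's loadtxt in place of pandas and reads from an in-memory string so that it runs without a file. The column layout (all numeric, label in the last column, one header row) is an assumption about the data.

```python
import io
import numpy as np

# Stand-in for a CSV file on disk; the contents are made up.
csv_text = "f1,f2,label\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"

# skiprows=1 skips the header line; every remaining value must be numeric.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = data[:, :-1].astype(np.float64)   # all columns except the last
y = data[:, -1].astype(int)           # last column as integer labels
```

For a real file, the io.StringIO object would be replaced by the file path.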
@joydeepkr.devnath193 4 years ago
Hi, great video btw. One question: at 4:43 you define P(x_i|y) as the Gaussian formula, but the Gaussian pdf is a density, so to get probabilities we need to integrate. Do we approximate this integral as the area of a rectangle with height = pdf and width = some delta? Since the Bayes formula contains a ratio of probabilities, the delta in the numerator cancels the delta in the denominator, and that is why the delta term is not included in the formula. Is this how you are doing it?
@patloeber 4 years ago
This is a very good question! I hope this helps: stats.stackexchange.com/questions/26624/pdfs-and-probability-in-naive-bayes-classification
@joydeepkr.devnath193 3 years ago
@patloeber Yes, this link was helpful. Thanks!
@patloeber 3 years ago
@joydeepkr.devnath193 Sure :)
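The cancellation argument in this thread can be written out as follows. This is a sketch of the standard reasoning, with f denoting the class-conditional density, δ a small interval width, and n the number of features; these symbols are introduced here for illustration.

```latex
% Probability of a small interval around x_i, approximated by a rectangle
P(x_i \le X_i \le x_i + \delta \mid y) \approx f(x_i \mid y)\,\delta

% The factor \delta^n is the same for every class, so it does not change the argmax
\hat{y}
  = \arg\max_{y} \; P(y) \prod_{i=1}^{n} f(x_i \mid y)\,\delta
  = \arg\max_{y} \; \delta^{n}\, P(y) \prod_{i=1}^{n} f(x_i \mid y)
  = \arg\max_{y} \; P(y) \prod_{i=1}^{n} f(x_i \mid y)
```

So the density values can be used directly for classification, even though they are not probabilities themselves.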
@bryanchambers1964 3 years ago
Hey there, I like your videos and you explain well, but I am confused about something. There is a step in your code where you have:
    for c in self.classes:
        X_c = X[c==y]
I understand the first line (for c in self.classes:), but I have no idea why you have X_c = X[c==y]. If my c values are, for example, [1, 4, 8], then X_c = X[1==1] just gives me X with an extra dimension. For example, if X is a 3x4 matrix, X_c is now the same matrix except that it has dimension 1x3x4. Am I just dumb or overthinking this detail?
@patloeber 3 years ago
Note that y is an array as well, not just a number, and the length of y has to be the same as the first dimension of X! So X[y==1] gives you all rows of X where y is 1. Please also note that my code has a slight bug. It should be this (compare with my code on GitHub):
    for idx, c in enumerate(self._classes):
        X_c = X[y==c]
        self._mean[idx, :] = X_c.mean(axis=0)
@bryanchambers1964 3 years ago
@patloeber Thanks, yeah, I kind of realized this after a while. So this will extract the rows of X that have the class y=1. Makes sense.
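The difference discussed in this thread can be seen in a short example with made-up data. Comparing the array y with a label gives one boolean per sample, which selects rows of X. Comparing two plain numbers gives a single True, which only adds a leading axis.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([1, 4, 1, 8])          # one label per row of X

mask = (y == 1)                     # elementwise: one boolean per sample
X_c = X[mask]                       # rows of X whose label is 1

# Indexing with a single Python bool keeps all rows and adds an axis,
# which is the 1x3x4 effect described in the question.
extra_axis = X[1 == 1]
```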
@godwingeorgethekkanath 3 years ago
Great tutorial😍 It was useful for me.
@patloeber 3 years ago
Thanks, glad you like it!
@prithviamin6847 4 years ago
Hi, I'm getting this error: UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('
@patloeber 4 years ago
Try converting your data to np.float64. And check that all your data is valid; you probably have NaN for some data points.
@AliHussain-kb3ew 4 years ago
Hi, I am facing the same problem. If you have corrected the code, please tell me what to do.
@AliHussain-kb3ew 4 years ago
Hi
@T4l0nITA 4 years ago
Really good explanation.
@patloeber 4 years ago
Thanks!
@prateekarora4549 3 years ago
Very good tutorial!
@kritamdangol5349 4 years ago
I got this error when running the code. Please help me find a solution.
    line 54, in <module>
        predicted_values=(model.predict(Features_test))
    line 20, in predict
        y_pred=[self._predict(x) for x in X]
    , in <listcomp>
        y_pred=[self._predict(x) for x in X]
    line 29, in _predict
    line 40, in _pdf
        numerator=np.exp(-(x-mean)**2/(2*var))
    numpy.core._exceptions.UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('
@patloeber 4 years ago
Probably your datatype or the shape of your vector is not correct. Try casting to np.float32.
@kritamdangol5349 4 years ago
@patloeber Thank you!
@debatradas9268 2 years ago
Thank you.
@seyeeet8063 4 years ago
So NB does not have any update rule like gradient descent?
@patloeber 4 years ago
No, you just have to precalculate the priors, mean, and var, and then apply the formula using Bayes' theorem.
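A sketch of the prediction step this reply describes, assuming the per-class priors, means, and variances were computed beforehand. The function name predict_one and the numbers are illustrative. The posterior is a product of the prior and the per-feature densities, and taking the log turns that product into a sum, which is why the log prior and the summed log densities are added.

```python
import numpy as np

def predict_one(x, classes, mean, var, priors):
    """Hypothetical helper: most likely class for one sample x."""
    posteriors = []
    for idx in range(len(classes)):
        log_prior = np.log(priors[idx])
        # Gaussian density of each feature under class idx
        pdf = np.exp(-(x - mean[idx]) ** 2 / (2 * var[idx])) / np.sqrt(2 * np.pi * var[idx])
        # log(prod of densities) = sum of log densities
        log_likelihood = np.sum(np.log(pdf))
        posteriors.append(log_prior + log_likelihood)
    return classes[np.argmax(posteriors)]

# Made-up parameters for two well-separated classes.
classes = np.array([0, 1])
mean = np.array([[0.0, 0.0], [5.0, 5.0]])
var = np.array([[1.0, 1.0], [1.0, 1.0]])
priors = np.array([0.5, 0.5])

label_a = predict_one(np.array([0.2, -0.1]), classes, mean, var, priors)
label_b = predict_one(np.array([4.8, 5.3]), classes, mean, var, priors)
```

No iterative update is involved: the parameters are computed once from the training data and reused for every prediction.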
@viperz301 4 years ago
Hi! What do you mean by the self that you pass into every function? Is it the data frame?
@patloeber 4 years ago
This is an essential concept of object-oriented programming and of using classes in Python. self represents the instance of the class. Through self we can access the attributes and methods of that instance, and it binds the attributes to the given arguments.
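A small illustration of what self refers to, using a made-up Counter class. Each instance keeps its own attributes, and calling a method on an instance passes that instance as self automatically.

```python
class Counter:
    def __init__(self):
        # self is the instance being created; count belongs to it alone.
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

a = Counter()
b = Counter()
a.increment()
a.increment()
b.increment()

# b.increment() is shorthand for Counter.increment(b): b is passed as self.
via_class = Counter.increment(b)
```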
@jossyrayonieram5231 2 years ago
@patloeber Out of all the things Python does for you automatically, they stopped with "self". >_
@nafesafirdous3670 4 years ago
If I have my own dataset, which is not among the sklearn datasets, how can I do classification? Please help!
@patloeber 4 years ago
You need to load the dataset (probably from a CSV file) and set up your X and y NumPy arrays.
@nafesafirdous3670 4 years ago
@patloeber Helpful, thanks.
@amitupadhyay6511 3 years ago
What if the values in the _pdf matrix are inf?
@patloeber 3 years ago
Then you have a problem ;) Yeah, you should add some error checking and maybe clip the allowed range in the calculation.
@boooringlearning 3 years ago
Great video!
@patloeber 3 years ago
Thanks!
@nobody2937 2 years ago
Also, make sure var is NOT 0...
@marcosraphael3390 4 years ago
Is this a classifier for unlabeled data?
@patloeber 4 years ago
No, it is supervised learning.
@madsmith1352 11 months ago
Gauss... rhymes with house.
@AliHussain-kb3ew 4 years ago
How do I use this code in Python with Anaconda?
@patloeber 4 years ago
I have a tutorial on Anaconda setup.
@Lanipops 4 years ago
The Naive Bayes file needs to allow a 2D array.
@patloeber 4 years ago
Try casting y to int before fitting the data: y = y.astype(int)
@tsotnegams 4 years ago
In the pdf method you wrote (2*var); it should be (2*var**2) because of the squared variance in the formula. Great tutorial otherwise.
@patloeber 4 years ago
No. The formula shows the squared standard deviation, which is equal to the variance (lowercase sigma is always used in statistics for the standard deviation). I probably should have pointed this out better. Thanks for watching :)
@tsotnegams 4 years ago
@patloeber You are right, thanks for the reply.
@patloeber 4 years ago
No problem :) You can always reach out when you have questions or find other errors.
@AliHussain-kb3ew 4 years ago
I tried to run this code in Anaconda on another iris dataset, but I ran into a problem.
@patloeber 4 years ago
Which problem?
@redhwanalgabri7281 3 years ago
('Naive Bayes classification accuracy', 0)
@reellezahl 2 years ago
You need either a better microphone or better sound settings. Your volume levels keep crashing and it's very grating on the ear.
@ragaistanto6722 4 years ago
Thank you. For everyone else, I also have a video tutorial on coding Naive Bayes in Python 3. Feel free to check it out in case it suits you. kzbin.info/www/bejne/o2Grh3ecmpWeb5I