Tutorial 33- P Value,T test, Correlation Implementation with Python- Hypothesis Testing

Views: 173,522

Comments: 145

@srinathyasoda5545 3 years ago

Great video, kudos! But we should never say "we are accepting the null hypothesis"; we should say "we are failing to reject the null hypothesis", as there is always a possibility of error in our sample data.

@shreyaskulkarni7612 3 years ago

It's really true

@ammar46 2 years ago

Height and age don't follow a Poisson distribution; they follow a normal distribution.

@zijieliu6654 2 years ago

This is the best explanation of the t-test on YouTube, no doubt!!!

@nilupulperera 4 years ago

Thank you, Krish, for the introduction to statistical tools in Python. Only now have I realized how to do comprehensive statistical analysis without depending on Microsoft Excel (and add-on software apps), which have limited capabilities in the data science field.

@manikantasai721 5 years ago

Very near to...100k sir congrats sir!

@ivancarrillo1889 4 years ago

Thanks for the explanation. By the way this is the second video from you I watched. I understood you much more clearly this time I guess because of the microphone (non native English speaker). Please keep it in mind.

@mallikharjunv6805 3 years ago

Thank you so much Krish ..excellent.

@Ghodkeshubham6cool 2 years ago

Very great learnings✌🏻

@shrikantagrawal6642 4 years ago

Summary:

One categorical variable: One sample t-test
Two categorical variables: Chi square test
One continuous variable: T test
Two or more continuous variables: Correlation and then T-test
One continuous and one categorical which has two categories: T test
One continuous and one or more categorical which has more than two categories: ANOVA test
Two variables and you want to compute if their means are different: Two sample T test
One variable and we have created one more variable based on the first variable by adding some proportion to it on a time basis: Paired sample T test

@Krish - Please suggest

@cinemascope8847 4 years ago

Shrikant Agrawal please answer this Krish as it makes sense for all of us

@narensingh6728 1 year ago

Thanks bro

@SameerAli-nm8xn 1 year ago

You are great Sir, that's a lot.

@mommysboy8015 2 years ago

Great teaching skill

@pavanaramu721 3 years ago

The teaching in the previous video was excellent. The present video needs to include chalkboard activity for better understanding...

@sumittagadiya3497 4 years ago

very good explanation sir, thanks a lot

@venkivtz9961 4 years ago

Hi Krish, your explanation and example are excellent. But a small correction to the conclusions at the end of the test: we have to state the conclusion with respect to the alternate hypothesis. We should never say that "we are accepting the null hypothesis".

@sandipansarkar9211 4 years ago

Watched the video for the second time and practiced in a Jupyter notebook. Thanks

@harishkumar-zx6vg 4 years ago

Why did he use np.random.seed(6)?

@harshdewangan1951 3 years ago

@@harishkumar-zx6vg np.random.seed(n) is used to make the random numbers reproducible, i.e., we will get the same set of numbers whenever the code is executed.
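A minimal sketch of what the seed does, assuming NumPy's legacy global random state (the one `np.random.seed` controls, and which `scipy.stats` draws use by default). The helper name `draw_sample` and the age range are made up for illustration.

```python
import numpy as np

def draw_sample(seed, size=5):
    """Reseed NumPy's global generator, then draw `size` integers in [18, 60)."""
    np.random.seed(seed)
    return np.random.randint(18, 60, size=size)

first = draw_sample(6)
second = draw_sample(6)  # same seed, so the same numbers come out again
other = draw_sample(7)   # a different seed gives a different sequence

print(first, second, other)
```

Without the seed call, each run of the notebook would produce a different sample and therefore a different p-value.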

@skumarr53 5 years ago

Thank you so much, sir, for the effort you are putting to educate us all. I want to make a career transition in Artificial Intelligence in computer vision and the NLP processing field. My question is do I need to be familiar with the ML concepts like feature engineering etc and ML algorithms or is it enough if I focus only on Deep Learning. I don't see much overlap between those two but both are treated as part of Data Science in the industrial setup.

@adiflorense1477 1 year ago

1:49 Krish, I have a question. Which t-test do we use to see the difference between machine learning models?

@soujanyabagam2034 3 years ago

Sir, what is the difference between the mean that you are passing as an argument to the Poisson distribution and the mean you are calculating subsequently?

@001Debjeet 5 years ago

btw congo on 100k silver incoming

@sandipansarkar9211 4 years ago

Awesome video, Krish, but don't forget to practice in a Jupyter notebook. Thanks

@batulkhan9772 4 years ago

Hey, at 7:07 you're saying to reject the null, but the p_value is more than 0.05.

@swarajkumarsahoo4736 4 years ago

yeah, and the output shown is "We are accepting null Hypothesis"

@questforprogramming 4 years ago

We fail to reject the null hypothesis, because the p-value is > 5% (74% > 5%).

@questforprogramming 4 years ago

@@swarajkumarsahoo4736 He didn't run that cell at all in this video; it was run previously. So that output is wrong, I guess, and what he said is also wrong.

@BlueSkyGoldSun 2 years ago

Yes, me too. I am confused; did he make a mistake?
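The decision rule debated in this thread can be written down as a small sketch. The function name `decide` is hypothetical, and alpha = 0.05 is simply the threshold used in the video; the wording follows the "fail to reject" convention raised in the comments.

```python
def decide(p_value, alpha=0.05):
    """Return the conventional wording for a significance-test decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.74))  # the p-value discussed in this thread
print(decide(0.01))  # an illustrative small p-value
```

With p = 0.74 the function lands on "fail to reject", which matches the replies above rather than what was said aloud at 7:07.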

@06madhav 5 years ago

Bhai, incredibly clear video. One doubt- how to go ahead with the hypothesis testing looking at the dataset? Means, how to decide whether any sort of such tests are required to be done on the dataset?

@mishuchugh1777 3 years ago

@madhav srimohan, did you get this?

@divyanshuaswal1843 1 year ago

@@mishuchugh1777 did you get this?

@wealth_developer_researcher 3 years ago

Sir, I have a doubt. On timeline 7:03 you said we reject null hypothesis in this case. But p_value > 0.05 and output is we are accepting null hypothesis. Please correct me if i am wrong

@kaifahmed316 2 years ago

Same here

@BlueSkyGoldSun 2 years ago

I am also confused

@sh__-- 1 year ago

Please correct guys Accepting the Alternative hypothesis and rejecting the null hypothesis is the correct answer. Mistakes will happen sometimes😊 I am also a learner..👍

@louerleseigneur4532 3 years ago

Thanks Krish

@rambaldotra2221 3 years ago

Thanks a lot Sir

@Himanshusingh-ep1hc 3 years ago

At 10:35 the p-value is 1.1390, which is greater than 0.05, but it still printed "rejecting null hypothesis"?

@Abhishek-st4mu 3 years ago

Same here. At 10:00 I am confused by that statement: how can 0.05 be greater than 1.139?

@sumitmaiti2218 4 years ago

Great explanation sir... It clears the understanding of the concepts... I have just one doubt: How are we selecting which statement to be the Null Hypothesis and which one for the Alternate Hypothesis? Because based on that and the p value, we would come to the conclusion.... Thanks :)

@samerrkhann 4 years ago

Usually the null hypothesis states that there's no difference between two groups. For example, you draw a sample from a population and want to check whether there is any difference between the mean of the sample and the mean of the population. Your null hypothesis will be that the two means are no different. Similarly, when comparing two groups, if you want to check whether their means are the same, your null hypothesis will be that there is no difference between the two. One last example: let's say you flip a coin 5 times and get heads 5 times. Your null hypothesis will be that your coin is no different from a normal coin. Hope this helps :)

@snehalpophale6287 2 years ago

Thank you so much!

@pratikshagwalwanshi8676 4 years ago

When we already established poisson mean (mu) as 30 in classA_ages=stats.poisson.rvs(loc=18,mu=30,size=60) Then why do we get different value for classA_ages.mean()?

@mahenderboda1339 3 years ago

I also have the same doubt. Did you get any answer?

@madhavilathamandaleeka5953 3 years ago

Me too... ☹️ And how do we choose those mu values? Please, can anyone clear my doubt?

@pratikshagwalwanshi8676 3 years ago

Nope, I didn't get it yet. If someone gets this doubt cleared, please tell.

@sashpatra88 4 years ago

Hi Krish, Can you please share the ANOVA Implementation with Python video as I couldn't find it in your list?

@madhureddy5328 5 years ago

Why do we do a P, T, or ANOVA test? Once we come to a conclusion, what do we do with the dataset?

@gopichand8874 3 years ago

Have you got the answer ?

@Arasu89 2 years ago

Hi Krish, after rejecting or failing to reject the null hypothesis, what are the next steps we take with the data? Do we remove the features?

@anveshpoloju7331 5 years ago

Hi Krish, for beginners can you please suggest 'order' of preparing for DATA SCIENCE... For example 1st statistics 2nd python 3rd ML 4th DL.... Or simultaneously. Where to start exactly is confusion for many people.... Thank you

@megirija1897 4 years ago

Please upload a video on the implementation of ANOVA and the chi-square test...

@somtonnamah5734 3 years ago

Please, I would like to know if the distribution of the data groups matters when checking correlation.

@sumitsaurav1710 4 years ago

Hi Sir The value of mu selected as 30 in "classB_ages=stats.poisson.rvs(loc=18,mu=30,size=60)" is mean of what? and how does it differ from classB_ages.mean()

@kiranchowdary8100 4 years ago

Yes, same doubt. I think mu is the Poisson parameter.

@ankitgadwe2200 4 years ago

@@kiranchowdary8100 You are right. It is some parameter. You can check it here: docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.poisson.html
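A sketch that may help with the recurring "why is the mean not 30?" question, reusing the call quoted in the comments. In `scipy.stats.poisson`, `mu` is the Poisson rate and `loc` shifts every draw, so the expected value is `loc + mu` (here 48), and a finite sample of 60 will land near that value rather than exactly on it. The seed is only there for reproducibility.

```python
import numpy as np
from scipy import stats

np.random.seed(6)  # any fixed integer works; 6 is the seed mentioned in the comments

# Poisson(mu=30) counts shifted right by loc=18, as in the video's call
classA_ages = stats.poisson.rvs(loc=18, mu=30, size=60)

theoretical_mean = stats.poisson.mean(mu=30, loc=18)  # loc + mu
theoretical_var = stats.poisson.var(mu=30, loc=18)    # mu; shifting leaves the variance unchanged
sample_mean = classA_ages.mean()                      # near the theoretical mean, not equal to it

print(theoretical_mean, theoretical_var, sample_mean)
```

This also explains why the sample mean and sample variance differ from each other: the shift by `loc` moves the mean but not the variance.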

@architagarwal7379 2 years ago

Is bootstrapping enough to implement all of these (t-test, chi-square test, etc.)?

@payalbhattad8048 10 months ago

Hey Krish, it was a great explanation. I have 1 doubt though, I am trying to use it on my dataset where I am finding if there is a significant difference between gender based on salary. But I am getting p-value as nan even if there is no null values in the dataset. Whereas if I am performing the same on SPSS it gives a p-value of 0.43. any suggestions?
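One common way to get a `nan` p-value from scipy is a missing value hiding in one of the groups; this may not be the cause here, since the commenter reports no nulls (another possibility worth checking is a group with fewer than two rows after filtering). The sketch below uses made-up salary figures and only shows how the `nan_policy` argument of `ttest_ind` behaves.

```python
import numpy as np
from scipy import stats

# Hypothetical salaries (in thousands) for two groups; one value is missing
group_a = np.array([52.0, 48.5, 61.0, np.nan, 55.2, 49.9])
group_b = np.array([50.1, 47.0, 58.3, 53.4, 51.8, 46.5])

# Default nan_policy='propagate': a single NaN makes the whole result NaN
_, p_default = stats.ttest_ind(group_a, group_b)

# nan_policy='omit' drops missing values before computing the test
_, p_omit = stats.ttest_ind(group_a, group_b, nan_policy='omit')

print(p_default, p_omit)
```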

@soujanyabagam2034 3 years ago

Sir, are you using the Poisson distribution to create a fake data set of ages since we don't have a real data set?

@veereshbk4394 4 years ago

99k subscribers were there while making this video, now it is 262k subscribers, means 262k-99k people are on the way to become data scientist during covid pandemic. 2021 will have less demand for data scientists as supply is increasing! This is just sample testing

@shahidraza7965 1 year ago

Can someone tell me why the calculated mean differs from mu in the one-sample t-test of class_ages?

@DataInsights2001 3 years ago

Nice! Test control analysis is good for if promotion programs apply to test group and test whether there is any difference? Also nice to know that which direction to test, whether it is two tailed, left tailed, or right tailed? Also need to consider Type 1 error and type 2 errors?

@neelroy3 2 years ago

which statistical test can be used to find difference between two groups' percentage values?

@prashu25925 1 year ago

can we perform hypothesis testing on multiple column data?

@itzmekallam7277 2 years ago

How can 1.13 < 0.05 at 10:10? Is it a mistake or just Krish Naik logic?

@NishatJillani 1 year ago

At the end it shows a power of -13 (e-13), which you are actually missing. So how could a value with a power of -13 be greater than 0.05?

@gunjanchaturvedi-m1b 1 year ago

Let's suppose that 5 years ago the average cost per person at a cafe was 300. Has it changed now? (Perform hypothesis testing to conclude.) How do we solve this, and which test do we need to perform here?

@kusumakatamneni3404 5 years ago

Hi Krish, performing a t-test between two populations is meaningful, but how can we do it on one population? How can we get a significant difference on one population?

@i_black_hawk 5 years ago

If you need to test the effect of a drug on a sample of a population, then the same sample needs to be measured before and after the drug dose. This is done to reduce bias. In this type of stats modelling we use the paired sample t-test.

@i_black_hawk 5 years ago

Another example you can take is the effect of workouts on a person's weight. You need the same sample of people whose weights were recorded before the workout.

@kusumakatamneni3404 5 years ago

@@i_black_hawk thank you so much

@nishabhatt5268 1 year ago

Thanks for sharing this video it is very helpful, can you please advise on how to integrate python script for p value in power bi? Many thanks

@ankita684 4 years ago

Thank you for this video, Krish. In the two-sample t-test, in the 'ttest_ind' function, you have set equal_var to False. The code reads: _,p_value=stats.ttest_ind(a=classA_height,b=ClassB_ages,equal_var=False). Shouldn't equal_var be True, as the t-test assumes by default that the populations have identical variances? Could you please check once? Thanks
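A short sketch of what the `equal_var` flag changes, on made-up data (the group names, sizes, and seed are assumptions for illustration). With `equal_var=True`, which is scipy's default, `ttest_ind` runs Student's t-test with a pooled variance; with `equal_var=False` it runs Welch's t-test, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Made-up samples with clearly different spreads and sizes
group_a = rng.normal(loc=50, scale=2, size=40)
group_b = rng.normal(loc=51, scale=10, size=15)

# Student's t-test: pools the two variances (equal_var=True is the default)
t_pooled, p_pooled = stats.ttest_ind(group_a, group_b, equal_var=True)

# Welch's t-test: no equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(p_pooled, p_welch)
```

So passing `equal_var=False` is a deliberate and often safer choice when the two groups may have different variances; it is not an error in the call.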

@munishasharma5265 5 years ago

Hi Krish, I hope you are doing well. Awesome explanation. However, I have one question: what do we do after analysing these p-values?

@piyushsaurav5791 4 years ago

Hypothesis testing is used by analysts to make inferences about the population. These tests are done to answer business questions, e.g. A/B testing (is version A better than version B, etc.).

@adityamahimkar6138 3 years ago

I'm not an expert, but I think that on a dataset we can make a lot of statistical changes and train a model on the data, but using such tests we can first look for needed corrections in the data before training, thus saving time and computation power. Do correct me if I'm wrong, it's just a null hypothesis 😅 :)

@prasadrajmane4696 2 years ago

Thank u sir

@hvchetan1 4 years ago

How do we get to know whether the test is a one-tailed or a two-tailed test? How does Python interpret whether it's one-tailed or two-tailed, as we are not specifying that?

@sunilpatil1923 5 years ago

Hello, thanks for the detailed explanations. Why is the p-value threshold only 5%? Why can't it be 10% or 8% or any other value? Please clarify.

@krishnaik06 5 years ago

It is decided beforehand, and yes, it may change... this value is decided by domain expertise.

@Andynath100 4 years ago

Look up statistical hypothesis testing (inferential stats). It depends on the significance level (alpha) chosen for the test. In stats this threshold is commonly taken as .05 (5%), .01 (1%), or .001 (.1%); in simpler terms, it is how unlikely the sample must be, assuming the null is true, before we reject the null. The t-test calculates a p-value, and if it is less than alpha (.05 or 5% in this case) we reject the null, because the probability of getting a sample at least this extreme if the null is true is very small (the magnitude of the p-value). Or in simpler terms, it's very unlikely to get this sample by chance. If you want a better explanation, please follow the Udacity Intro to Inferential Statistics course; it's free.

@abhinavjain5561 3 years ago

Sir, in the second example of the one-sample t-test, we take the mean of classA_ages as 30, but in the next step it comes out as 46. What about that?

@shashikantrrathod3617 4 years ago

Hello Krish, what are the limitations of linear statistical tests? Why do we choose a non-linear classifier over a linear classifier?

@rsinh3792 3 years ago

Sir reviewer has asked me this question I don't know how to address it, can you please guide me "Use some statistical significant test such as T-test or ANOVA to prove you validate the proposed diagnostic model on patients and quality improvements of your method". I have two datasets. Dataset 1 was used to train the model and dataset 2 was used to validate the trained model. I have trained the ML model deployed it and Validated it on new data and presented the results. Actually, I have understood the question. Shall I apply the statistical test between the performance metrics of trained model results and validation results? Please help me, sir.

@aksaini9063 3 years ago

Sir, if two independent features are highly positively or highly negatively correlated, what is the best solution? Is it right to drop one of the features?

@itplacementprep 3 years ago

Summary : A One sample t-test tests the mean of a sample group against a known population mean. Two sample t test, An Independent Samples t-test compares the means for two groups. A Paired sample t-test compares means from the same group at different times (say, one year apart).
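The three tests in this summary map onto three `scipy.stats` functions. Below is a minimal sketch on made-up data; the variable names, sample sizes, and the hypothesised mean of 70 are assumptions for illustration, not values from the video.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed

# Made-up measurements
before = rng.normal(70, 5, 30)               # one group, first measurement
after = before - rng.normal(1.5, 1.0, 30)    # the same group, measured again later
other_group = rng.normal(72, 5, 35)          # an unrelated second group

# One-sample: is the mean of `before` different from a known value of 70?
one_sample = stats.ttest_1samp(before, popmean=70)

# Independent two-sample: do the two unrelated groups have different means?
two_sample = stats.ttest_ind(before, other_group)

# Paired: did the same group change between the two measurements?
paired = stats.ttest_rel(before, after)

print(one_sample.pvalue, two_sample.pvalue, paired.pvalue)
```

Note that `ttest_rel` needs both arrays to have the same length and order, since each pair of values belongs to the same subject.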

@pramitbanerjee4381 3 years ago

Why are classA_ages.mean() and classA_ages.var() not equal, and what is the role of mu if it is not equal to classA_ages.mean()?

@santiagorey1382 4 years ago

very good video but please tell me, if the two features are statistically different, p_value> 0.05, then does that mean we should discard or keep that feature?

@autismblessingindisguise 4 years ago

Could not see the next video on chi-square test implementation using Python. Please upload it soon. Thanks

@tannurohela6192 2 years ago

Great explanation sir, but I didn't get why are we using np.random.seed() ? Can anyone please help with the seed thing.

@meenadalvi9743 3 years ago

At 5:43 the p-value is greater than 0.05, so in that case we should accept the null hypothesis. Correct me if I am wrong.

@sohinibanerjee9617 3 years ago

In the second one-sample t-test, how is a p-value of 1.13 less than 0.05? Can someone please explain?

@NishatJillani 1 year ago

At the end it shows a power of -13 (e-13), which you are actually missing. So how could a value with a power of -13 be greater than 0.05?

@zionramdinthara8403 4 years ago

Hi Krish, can I compare two different populations using a t-test? I want to compare the height of plants over time with controlled and uncontrolled temperature. I actually have different datasets for both. Please help

@dipk.mishra 4 years ago

Sir, how do you decide that the null is "there is no difference"? Is there any logic behind it?

@hemantsharma7986 4 years ago

Are the one-sample t-test and the one-tailed t-test the same?

@mohinimarathe8769 3 years ago

GOD OF STATS :)

@dharamjeetsingh2936 4 years ago

Krish, I have 3 years of experience as a business resilience analyst, but we only use Excel, not Python, SQL, or Tableau. Do you think I have an advantage for becoming a DS?

@sohinimitra5131 4 years ago

In the first example of ages, you passed the expected NULL hypothesis value as 30 [ttest(ages_sample,30)]. Shouldn't it be 0? [ttest(ages_sample,0)]. Since The NULL hypothesis states the difference between mean of population and sample is 0. Why is the population mean passed there? Also, in many scenarios we will not have access to the population mean too.
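On the question of passing 30 rather than 0: the second argument of `scipy.stats.ttest_1samp` is the hypothesised population mean itself, so "the difference is 0" and "the mean is 30" are the same null hypothesis. A small sketch on a made-up sample (the data and seed are assumptions) shows the two formulations giving the same result.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)        # arbitrary seed
ages_sample = rng.normal(31, 6, 25)   # made-up sample of ages

# H0: the population mean is 30
t_direct, p_direct = stats.ttest_1samp(ages_sample, 30)

# Equivalent H0: the mean of (age - 30) is 0
t_shifted, p_shifted = stats.ttest_1samp(ages_sample - 30, 0)

print(t_direct, t_shifted, p_direct, p_shifted)
```

When no population mean is known, the value passed is simply the claim being tested, for example a historical figure or a target value.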

@lancelotdsouza4705 2 years ago

Please explain this on a real dataset

@c.dharmeshwaran3470 4 years ago

In the two-sample t-test, were the samples/groups selected from the same population or from 2 different populations?

@soumyaranjansahu4262 3 years ago

Hi Krish, in this school age problem you have taken a sample size of 60, which is more than 30. Hence, shouldn't you calculate the p-value based on the z-distribution rather than the t-distribution?

@amansinghrathore8308 3 years ago

The t-test can be applied to any size (even n>30 also).

@manitachakraborty2348 3 years ago

Can you please solve this t-test problem without Python?

@arpitjaiswal5972 3 years ago

Why is the t-test used? Because no information is given for the population SD, so the Z-test can't be used. If the population SD were given, you would use the Z-test. The t-distribution is a standard normal divided by the square root of a chi-square over its degrees of freedom. Check the formula and you will be able to find the relation.

@anupamjamatia 1 year ago

Hi, your tutorial is great, but I have a doubt regarding statistical significance in this scenario: I train on data in language Lang0 and generate a model. Afterward, using the Lang0 model, I test on other languages (Lang1, Lang2, ... Lang5) with different algorithms such as AlgoA, AlgoB, and AlgoC, and get the accuracy. In that case, is it possible to do a statistical significance test? No cross-validation is done while training. Say I have:

Lang    Algo1  Algo2  Algo3  Algo4  Algo5
Lang1   80     32     95     93     96.67
Lang2   88     11     98     97     92.51
Lang3   49     12     76     80     72.75
Lang4   81     2      95     94     77.7
Lang5   81     43     95     96     94.95

@adidbaker7607 3 years ago

Hey guys, I've got a doubt. In the first one-sample t-test he said he is rejecting the null hypothesis when the p-value is 0.740, which is higher than 0.05, so isn't he supposed to accept the null hypothesis?

@richasharma7968 1 year ago

I have the same doubt. Can anyone explain it?

@NishatJillani 1 year ago

At the end it shows a power of -13 (e-13), which you are actually missing. So how could a value with a power of -13 be greater than 0.05?

@pratikbambulkar8981 3 years ago

But why do we use hypothesis testing for ML?

@sunitam1025 2 years ago

Sir, can you provide a PDF of this?

@OnkarSingh-rg5jp 3 years ago

Sir, in what case do we divide the p-value by 2?

@siyabongamyeza5315 2 years ago

The p-value is divided by 2 when you want a one-sided test (either less than or greater than) but the function reports a two-sided p-value. A two-sided test, which applies when the question uses the word "difference", uses the reported p-value as-is; there it is alpha that is split, alpha/2 in each tail.
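A sketch of how one-sided and two-sided p-values relate in scipy, on a made-up sample. It assumes SciPy 1.6 or newer, where `ttest_1samp` accepts the `alternative` argument; by default the function reports a two-sided p-value.

```python
import numpy as np
from scipy import stats

# Made-up sample whose mean (33) is above the hypothesised value of 30
sample = np.array([31.0, 35.0, 29.0, 38.0, 33.0, 36.0, 30.0, 32.0])

# Default: two-sided test (H1: mean != 30)
t_stat, p_two_sided = stats.ttest_1samp(sample, 30)

# One-sided test (H1: mean > 30)
_, p_greater = stats.ttest_1samp(sample, 30, alternative='greater')

print(t_stat, p_two_sided, p_greater)
```

Because the t-statistic here is positive, the one-sided p-value equals half of the two-sided one. If the statistic pointed in the opposite direction from the alternative, halving would give the wrong answer, which is why passing `alternative` is safer than dividing by hand.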

@lopamudrachandra2493 3 years ago

Thank you so much for your video. Your channel is really helpful for students who cannot afford online courses. I would like to know: if I join your 59/month membership, will it help me learn data science better overall?

@ankeshsingh2576 4 years ago

If you execute the function ttest_1samp(), the p_value keeps changing after every execution, varying around 0.05. How can we fix it?

@shreyasaxena5169 4 years ago

If you execute random.choice again, it will resample and the mean will change accordingly. For the same sample, the p-value cannot vary.

@questforprogramming 4 years ago

Fix a number in random state.

@utsavroy5346 4 years ago

What if I reverse the assumptions? I mean, if H0 becomes H1 and vice versa. In that case, how do we move ahead?

@akashprabhakar6353 4 years ago

Yes, you can, but you need to ensure that the null hypothesis statement is chosen in such a way that you can conduct the experiment based on it. For example, your observation is that you got 10 heads in 10 coin tosses. Now you want to check whether the coin is biased or not. If you take H0 (null hypothesis): the coin is biased, then the problem is how you will find the p-value or conduct the experiment, because the coin can be biased with any probability. Suppose instead you take H0: the coin is unbiased. Then the probability of getting 10 heads in 10 tosses is (0.5)^10, as the probability of getting heads is 0.5 for a single toss when the coin is unbiased. Now you will get the value 0.00097.
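The arithmetic in the coin example can be checked with a few lines of plain Python. The two-sided variant is an added assumption, counting "10 tails" as equally extreme as "10 heads".

```python
n_tosses = 10
p_heads = 0.5  # probability of heads under H0: the coin is unbiased

# Probability of seeing 10 heads in 10 tosses if H0 is true
p_ten_heads = p_heads ** n_tosses

# Two-sided version: 10 heads or 10 tails are equally extreme outcomes
p_two_sided = 2 * p_ten_heads

print(p_ten_heads, p_two_sided)
```

Either value is far below 0.05, so under this setup the null hypothesis of an unbiased coin would be rejected.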

@devmani100 4 years ago

Since you are dealing with a sample and not the population, the relationship you get from the sample may be due to random chance. The idea behind the null hypothesis is that the relationship you are observing in the variables is due to randomness. So, my null hypothesis is always of the form "There is no relationship between the selected variables." This is what I have derived from all the sources from StatsLand :D. Please correct me if I am wrong.

@pratikbhansali4086 4 years ago

What are we even achieving by doing a one-sample t-test?

@thousandsunny100 4 years ago

ttest, p_value = ttest_1samp(covid, 30) TypeError: 'module' object is not callable
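This error means the name being called is bound to a module rather than a function. The import line is not shown in the comment, so the cause is a guess; one way it can happen is importing a module under the name `ttest_1samp`. A sketch of the difference, using only the real `scipy.stats` names:

```python
import scipy.stats                    # a module: calling it raises TypeError
from scipy.stats import ttest_1samp  # a function: this is the thing to call

# callable() tells the two apart without triggering the error
print(callable(scipy.stats), callable(ttest_1samp))

# Made-up data standing in for the commenter's `covid` variable
result = ttest_1samp([29, 31, 33, 28, 35], 30)
print(result.pvalue)
```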

@ppriyesh30 4 years ago

Sir, this is unfair... I am just trying to build up the concepts of data science, and I find that some terms are used which are totally new. Sometimes you import preprocessing, sometimes model_selection, sometimes metrics, and now math, the Poisson distribution, and scipy stats. Please let us know when to choose what. Thanks.

@ppriyesh30 4 years ago

Especially, please help us with the scikit-learn library; the rest, I guess, is not that important.

@ManishKumar-qs1fm 5 years ago

Please explain correlation in detail, because I am confused about it.

@adeyinkaAdedejiNaMe 1 year ago

Educative video, but you calculated the population mean to be 30.4375, so why did you assume the population mean to be 30?

@shrikantdeshmukh7951 3 years ago

It's the Poisson distribution, not the "poison" distribution.

@BlueSkyGoldSun 2 years ago

Fix the mistake. How come 1.13 is less than 0.05?

@karanbisht6359 2 years ago

It was 1.12 * 10^-something
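The confusion in this thread comes from scientific notation: Python prints very small floats with an exponent, so a printed value that starts with 1.139 can still be far below 0.05. A tiny sketch, assuming the value on screen was 1.139e-13 as several commenters report:

```python
p_value = 1.139e-13  # Python's notation for 1.139 x 10^-13

# The leading digits are 1.139, but the e-13 suffix makes the number tiny
is_significant = p_value < 0.05
print(is_significant)

# Fixed-point formatting makes the size obvious
print(f"{p_value:.16f}")
```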

@SoumyaDasgupta 3 years ago

Krish likes WWE. My Man

@aws6143 3 years ago

If one has a brain, it should be one like this; after all, even Pappu is alive.

@vaibhavberiwal 4 years ago

Watch Khan Academy videos for a more intuitive and in-depth understanding of the concepts :)

@rsinh3792 3 years ago

Sir reviewer has asked me this question I don't know how to address it, can you please guide me "Use some statistical significant test such as T-test or ANOVA to prove you validate the proposed diagnostic model on patients and quality improvements of you method" I have trained the ML model deployed it and Validated it on new data and presented the results. Actually, I have understood the question. Shall I apply the statistical test between the performance metrics of trained model results and validation results? Please help me, sir

@001Debjeet 5 years ago

1st

@manikantasai721 5 years ago

i too ...will do the same once upon a time .