Quantile-Quantile Plots (QQ plots), Clearly Explained!!!

510,263 views

StatQuest with Josh Starmer

6 years ago

Quantile-Quantile (QQ) plots are used to determine if data can be approximated by a statistical distribution. For example, you might collect some data and wonder if it is normally distributed. A QQ plot will help you answer that question. You can also use QQ plots to compare two different datasets that you collected to determine if their distributions are comparable. This video shows you how to do both things.
NOTE: The data in this video are measures of gene expression. If "gene expression" doesn't mean anything to you, just imagine that the data represents how tall a bunch of people are, or how much they weigh. Then consider the y-axis to be the height or weight of the people, and the x-axis just represents all of the data you collected on a single day. In this case, all of the data were collected on the same day, so they form a single column.
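The basic construction can be sketched in a few lines of Python. This is a minimal illustration, not the code used to make the video. It assumes Python 3.8+ for statistics.NormalDist. It also assumes the i/(n + 1) rule for deciding which standard-normal quantile goes with each sorted data point; other conventions exist. Each returned pair is one dot on the QQ plot: x is the normal quantile and y is the observed value.

```python
from statistics import NormalDist


def qq_points(data):
    """Pair each sorted observation with a standard-normal quantile.

    Assumption: the i-th smallest of n values is matched with the normal
    value that has i/(n + 1) of the area to its left. This is one common
    plotting-position convention, not necessarily the video's exact choice.
    """
    values = sorted(data)
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf(i / (n + 1)), y)
            for i, y in enumerate(values, start=1)]


if __name__ == "__main__":
    # Hypothetical heights in cm, just to have something to plot.
    heights = [172, 165, 180, 158, 175, 169, 177, 162, 171, 168]
    for x, y in qq_points(heights):
        print(f"normal quantile {x:+.2f} -> data value {y}")
```

If the dots fall roughly on a straight line, a normal distribution is a reasonable approximation for the data.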
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
Corrections:
4:35 The Uniform Distribution has one extra quantile
5:30 I should have said that Quartiles divide the data into 4 parts.
#statquest #quantile #qqplot

Comments: 468
@statquest 2 years ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@maindepth8830 2 years ago
That intro alone made me forget my hate for statistics and instantly fall in love with it
@statquest 2 years ago
Hooray!
@timonveurink6335 4 years ago
Haven't seen the video yet, but that intro earned you a subscription
@angiemycine6509 4 years ago
It made me think that the whole video was going to be a song lol. Very interesting nonetheless
@marioadiez 4 years ago
I'm not subscribed for the plots, but for the music!
@setsu2221 4 years ago
That intro hit me hard xD
@kittyxing 3 years ago
Thanks sooooooo much! This is the only video I found that explained the details of generating a QQ plot and also made the concept so clear and easy to understand!
@statquest 3 years ago
Thank you very much! :)
@robertopizziol7459 4 years ago
I was waiting for the "BAAM" the whole video and got just a couple of great "HOORAY!"s. Thank you for the awesome channel Josh!
@statquest 4 years ago
You made me laugh. :)
@kevon217 2 years ago
Couldn't have asked for a clearer explanation, thanks!
@statquest 2 years ago
Glad to help!
@aashishshrivastav9531 6 years ago
🤔🤔🤔🤔🤔 Well, I thought that the Q-Q plot was difficult, but thanks to you I get it now. Thanks and keep it up!!!
@alisalehi4980 6 years ago
I really appreciate your very easy explanation. I ran into so many difficult and rough terms that I could not even understand what they meant.
@pradiptithakur3655 4 years ago
Awesome video. Explained so clearly. Really helped me a lot!
@statquest 4 years ago
Hooray! :)
@Clarin3t1 2 years ago
You had my like at the beginning with the jingle. Thanks for explaining this so well!!
@statquest 2 years ago
Glad you liked it!
@robertb-l5422 5 years ago
Very well explained, thanks so much
@dominicj7977 5 years ago
Can you do a video on normality tests like Shapiro-Wilk and Anderson-Darling? If not anytime soon, can you share a link to some good materials?
@user-ii5ch8nw6s 6 years ago
It's so clear! Thanks a lot for your video.
@hebaebrahem7893 5 years ago
Your videos are cool and concise, thank you.
@robertocannella1881 2 years ago
Thanks for all the videos! Great music BTW. Also I'm looking forward to rockin' my new SQ hoodie!
@statquest 2 years ago
TRIPLE BAM! Thank you for your support!
@sirisudweeks9334 5 years ago
Very nicely explained. It was a tricky concept until this video! Thanks!
@statquest 5 years ago
Hooray! I'm glad the video helped. :)
@kusocm 4 years ago
Best intro song, it can be used as a 'mnemonic' for what QQ plots are used for =)
@josevaldes7493 2 years ago
Triple BAMM! Seriously man, your channel is pure art. Thanks
@statquest 2 years ago
Thank you!
@tymothylim6550 4 years ago
Thank you for the video! It was short and easy to understand :)
@statquest 4 years ago
Thanks! :)
@guillemvia6813 5 years ago
Awesomely explained! Good job!
@statquest 5 years ago
Thank you! :)
@ThuyPham-yu7cw 4 years ago
Wow, now I can clearly understand it! Thanks a lot!
@statquest 4 years ago
Hooray!!! :)
@matavalamuttej841 3 years ago
You made it very clear man!!! Great job
@statquest 3 years ago
Glad to hear that!
@gianlucalepiscopia3123 2 years ago
This is very, very cool. I'm more likely to learn on YouTube than in a classroom. Thanks!
@statquest 2 years ago
Glad it was helpful!
@navatagames 2 years ago
Nice video. Explained everything in just under 7 mins. Awesome. 😄👍👍
@statquest 2 years ago
bam!
@navatagames 2 years ago
@statquest bam indeed. 😁
@piotrszocik7775 4 years ago
Great explanation, have a nice day :)
@padraiggluck5633 3 years ago
Really excellent presentation, Josh. ⭐️
@statquest 3 years ago
Thank you! :)
@km2052 6 years ago
Thanks, awesome, this is useful for measuring gene expression effects
@MeWatchingYouTubeVideos 1 year ago
How helpful! Thanks a lot for your amazing videos
@statquest 1 year ago
Thanks!
@joaovasconcelos5360 2 years ago
Your videos are awesome, thank you so much!
@statquest 2 years ago
Glad you like them!
@ehsans2135 3 years ago
So clear, so good, so nice, thank you, Josh
@statquest 3 years ago
Thanks!
@urjaswitayadav3188 6 years ago
Thanks for the great explanation as always! So a QQ plot is just a way to plot and visualize the similarity of two distributions? Are there any other scenarios where it can be used? Thanks!!
@TheAbhimait 4 years ago
QQ plots are mostly used to check tail behavior. Density plots and cumulative plots are the best way to check distribution symmetry.
@jorenmaes498 20 days ago
I just noticed that when you said "please subscribe" at the end of the video, the subscribe button lit up :)
@statquest 20 days ago
bam! :)
@heplaysguitar1090 3 years ago
Explained like a pro. Triple BAM!!!
@statquest 3 years ago
Thank you! :)
@thechickendiet 3 years ago
Very clear with great examples!
@statquest 3 years ago
Thanks!
@aj_actuarial_ca 1 year ago
Thanks a lot for the wonderful explanation!
@statquest 1 year ago
Thank you!
@response2u 2 years ago
Legendary explanation! Fantastic!
@statquest 2 years ago
Thank you!
@julesd3115 2 years ago
Awesome video - thank you SO much for saving my sanity.
@statquest 2 years ago
Thanks!
@alanpdrv 2 years ago
Thanks for this! Finally I understand
@statquest 2 years ago
Hooray!
@wanhope3660 6 years ago
Sweet, it's not that difficult to grasp anymore! Thanks
@Shred427 9 months ago
Such an awesome video, thanks!
@statquest 9 months ago
Glad you liked it!
@Danielbassist13 1 year ago
Phenomenal explanation and really cool intro music man!
@statquest 1 year ago
Thanks!
@joerich10 6 years ago
Is there a statistical test we can do to determine how far the dots are allowed to deviate, rather than just eyeballing it? Or is eyeballing good enough? I.e., a stat test that could say 'the chance of these 2 distributions being the same is less than X%'?
@statquest 6 years ago
The "K-S Test" is what you want. However, it is very strict and tends to reject the null too easily. It's one of the few statistical tests where a large p-value (suggesting no difference) is more convincing than a small one. en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
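For reference, the statistic behind the two-sample K-S test is simple to compute. It is the largest vertical distance between the two empirical cumulative distribution functions. The sketch below uses plain Python and a hypothetical function name, ks_statistic. It computes only that distance D and does not compute a p-value. If SciPy is available, scipy.stats.ks_2samp returns both D and a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest vertical gap between the two empirical cumulative
    distribution functions (ECDFs). Turning D into a p-value needs the
    KS null distribution, which this sketch does not compute.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so checking every observed value finds the largest gap.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / n  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / m  # fraction of b that is <= x
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Identical samples give D = 0. Samples that do not overlap at all give D = 1.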
@kissapeles 3 days ago
@statquest How were the lines drawn? Least squares? Maybe doing R^2 calculations can provide an idea? Still trying to grasp my statistics a bit better :( :)
@biancafeitoza4030 3 days ago
Thank you for your help! Greetings from Brazil.
@statquest 3 days ago
Thank you very much! :)
@sarrae100 2 years ago
How beautiful and simple that explanation is 🥳
@statquest 2 years ago
Thank you!
@richardbarton9076 5 years ago
This was super helpful!
@statquest 5 years ago
Thank you! :)
@user-or7ji5hv8y 4 years ago
Great video. A video on the intuition for why a Q-Q plot works might be interesting.
@statquest 4 years ago
I'll keep that in mind.
@arneoosterlinck7590 4 years ago
Great explanation, thanks!
@statquest 4 years ago
Thank you! :)
@vineetkaur1667 2 months ago
Very well explained!
@statquest 2 months ago
thank you!
@asmaulhosnanisha4657 3 years ago
I could have had better grades if I had faculty like you... thank you Josh!!
@statquest 3 years ago
Thanks!
@bingxinyan8103 2 years ago
Helpful and easy to understand.
@statquest 2 years ago
Thanks!
@user-id1rf6gt6h 4 years ago
Helped a lot! Thank you :D
@statquest 4 years ago
Hooray! :)
@hanhan2360 4 years ago
QBUS?
@geogeo14000 2 years ago
And again, thank you for another amazing video! A little question: most of the points have to fall on the straight line for the data to be considered normally distributed, and at 4:15 you said that is not the case. Although the intersection points are really close to the line, it does not matter, because most of the points have to be strictly ON the line, right? The fact that other intersection points are close to or far from the line does not give any relevant information?
@statquest 2 years ago
I'm not sure I understand your question. For more details on how to interpret QQ-plots, see: stats.stackexchange.com/questions/101274/how-to-interpret-a-qq-plot
@geogeo14000 2 years ago
@statquest OK, thank you!
@schiu867 2 years ago
It helps a lot. Thanks!
@statquest 2 years ago
Glad it helped!
@alecvan7143 4 years ago
Best intro by far so far
@statquest 4 years ago
Hooray!!!! :)
@12copablo 3 years ago
Hoorray! Thx for the video :)
@statquest 3 years ago
BAM! :)
@tallwaters9708 6 years ago
Thanks for the videos. If you're still looking for ideas, how about K-L divergence?
@jameswhitaker4357 8 months ago
Not gonna lie, stats is my super weak spot. You've helped me a lot in my Data Models course and in interpreting my results. +1
@statquest 8 months ago
Happy to help!
@jameswhitaker4357 8 months ago
@statquest Thank you! I'm just kicking myself for not taking more stats courses at this point!
@statquest 8 months ago
@jameswhitaker4357 My stats courses were all pretty terrible, so you never really know what you're going to get. I had to teach myself statistics, and these videos are how I taught myself.
@jameswhitaker4357 8 months ago
@statquest That's what I'm going through right now! I've been using your videos and the "Intro to Statistical Learning with Applications in R" textbook, which has helped a lot. I think when I saw terms like "heteroscedasticity" or the crazy formulas I would get scared and put off studying, until I took a course that required knowing it LOL. And luckily most of these statistical tests and concepts are now pretty easy to perform in programming. Cheers!
@statquest 8 months ago
@jameswhitaker4357 I actually wrote a little about heteroscedasticity. Maybe I should record it.
@Shuffellove 5 years ago
I love StatQuest!
@statquest 5 years ago
Hooray! :)
@vlakrunn 1 year ago
You simply saved my life
@statquest 1 year ago
Bam!
@user-hv9wx5kd9u 8 months ago
Best Explanation ever!!! 🎉🎉🎉
@statquest 8 months ago
Thanks!
@user-wx4vf5gj2f 4 years ago
Thanks for saving my life
@statquest 4 years ago
Hooray! :)
@Atomflinga 5 years ago
What's the approach for determining which distribution best fits the data? Would the R-squared of the data against the straight line be a suitable measure of how well the distribution describes the data?
@statquest 5 years ago
This is a good question, and, to be honest, I'm not sure what the answer is. I like your idea, but it may oversimplify the problem. For example, you could get a high R-squared value but still have some really obvious problems if you looked at the plot.
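One way to try the R-squared idea from the question is sketched below. It fits a least-squares line through the normal QQ-plot points and reports R-squared. The square root of this R-squared is what is often called the probability plot correlation coefficient. The sketch rests on several assumptions: Python 3.8+, the i/(n + 1) plotting position, and a hypothetical function name, qq_r_squared. As the reply says, a high value does not replace looking at the plot.

```python
from statistics import NormalDist, fmean


def qq_r_squared(data):
    """Least-squares line through normal QQ points, plus its R-squared.

    Assumption: the i-th smallest of n values is paired with the
    standard-normal quantile at i/(n + 1).
    Returns (slope, intercept, r_squared).
    """
    ys = sorted(data)
    n = len(ys)
    xs = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    skewed = [1, 1, 1, 1, 2, 2, 3, 5, 10, 50]  # hypothetical right-skewed data
    print(qq_r_squared(skewed))
```

For exact normal quantiles, the points lie on a perfect line. R-squared is then 1, and the slope and intercept recover the standard deviation and mean. A strongly right-skewed list gives a much lower value.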
@alexgimeno170 4 years ago
Understand it now - thank you!
@statquest 4 years ago
Hooray! :)
@averkij 4 years ago
Thank you, Josh.
@statquest 4 years ago
Thanks! :)
@ananyaagarwal7108 2 years ago
Hi Josh, amazing video :) I just want to understand the intuition behind how QQ plots work. Is it because the quantiles of every normal distribution are just scaled-up values of the standard normal distribution's quantiles, and that is why we expect a straight line?
@statquest 2 years ago
Pretty much
@ananyaagarwal7108 2 years ago
@statquest Thanks for the response ;) I would really appreciate it if you could make something on this or share some content that explains the intuition behind QQ plots.
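Here is a more concrete version of "pretty much". Take a normal distribution with mean mu and standard deviation sigma. Its p-quantile is x_p = mu + sigma * z_p, where z_p is the p-quantile of the standard normal. So the quantiles are scaled by sigma and shifted by mu. Plotted against z_p, that is a straight line with slope sigma and intercept mu. This is also why the slope of a QQ plot in the original units is usually not 1. Below is a quick check with Python's statistics.NormalDist; the 170 cm / 8 cm numbers are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_pairs(mu, sigma, probs):
    """Return (standard-normal quantile, N(mu, sigma) quantile) for each p."""
    standard = NormalDist()                  # mean 0, sd 1
    target = NormalDist(mu=mu, sigma=sigma)  # the distribution being plotted
    return [(standard.inv_cdf(p), target.inv_cdf(p)) for p in probs]


if __name__ == "__main__":
    # Hypothetical example: heights ~ N(170 cm, 8 cm).
    for z, x in normal_quantile_pairs(170, 8, [0.05, 0.25, 0.5, 0.75, 0.95]):
        # Each x equals 170 + 8 * z, so the points sit on the line
        # y = 8x + 170: perfectly straight, but with slope 8 rather than 1.
        print(f"z={z:+.3f}  x={x:.2f}  170 + 8*z = {170 + 8 * z:.2f}")
```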
@alifia276 3 years ago
Thank you! Awesome explanation
@statquest 3 years ago
Thank you! :)
@bharathkumar5870 3 years ago
I have a doubt... why use this method instead of just plotting the points and seeing if they form a bell curve? Correct me.
@statquest 3 years ago
@bharathkumar5870 I'm not sure I understand your question. Are you asking, "why don't we just create a histogram with the data and see if the histogram looks like a normal distribution"? If so, histograms can be very tricky in terms of selecting the correct bin size. In contrast, with a q-q plot we don't have to worry about optimizing a bin size or anything else.
@bharathkumar5870 3 years ago
@statquest Thank you, sir... you cleared my doubt. Different bins give different distributions 😀
@lashlarue7924 1 year ago
It's party time with Josh Starmer and his quantiles! 😆🤘 Party on, Wayne!
@statquest 1 year ago
:)
@maple49027 3 years ago
Thanks for this wonderful explanation! I'm curious whether we can tell anything from the slope of a QQ plot. Does the slope always equal 1 when the data follow a certain distribution?
@statquest 3 years ago
As long as the x and y-axes are normalized to quantiles, the slope should be 1 if the data follow a certain distribution. However, usually the x and y-axes are in the original units, which means that the actual slope isn't super important. What is important is that the points form a straight line.
@maple49027 3 years ago
@statquest Thanks!!
@FlopMeister71 5 years ago
Hi, I understand how the quantile points are plotted for observed vs theoretical distributions. What I don't understand is what determines the slope of the straight line. While this is fairly intuitive for a normal distribution, for, say, the Weibull distribution I am unclear how the slope of the straight line is used to determine whether the observed vs theoretical quantiles are a good fit for a given distribution. Any ideas?
@ngocnguyen9517 2 years ago
I came here for the same question and left with no answer LOL
@mikii2755 6 years ago
This video was quite helpful
@BossCock17 5 years ago
You crushed it, bro, thanks
@statquest 5 years ago
You're welcome!!!
@81-jdowlwp 5 years ago
@statquest Quick question about 2:04 in your video: if we have 15 data points and we divide the dataset into 15 quantiles, then shouldn't the smallest quantile be 0.06666, so around 0.07? In your video you say it is 0.7, which would mean that 70% of all data is covered by just one data point. Thank you for your video :)
@wenweipeng7056 5 years ago
Do I need to worry about the exact size or probability when dividing the distribution? Or do I just need to make sure the sizes are equal?
@statquest 5 years ago
Just equal sizes.
@brayanmurillo4427 1 year ago
Thanks for the explanation. Can you clarify this, please? If we have 15 quantiles, then I thought you should plot 14 red lines on the normal distribution, and the 15th line should be at +infinity. And a little question: is the straight line generated by linear regression?
@statquest 1 year ago
Plotting a line at infinity would be hard to do, and you can fit the line with regression.
@apoostle 2 years ago
Thanks! It helps.
@statquest 1 year ago
Wow!!! Thank you very much for your support! BAM! :)
@snehasampath3486 1 year ago
Hello, the video was amazing and I was able to get an idea of QQ plots. I do have a question, though. How do we draw the normal distribution and the uniform distribution? Is it just random?
@statquest 1 year ago
The normal and uniform distributions are well defined by equations. So we just plug numbers into them to get the values out.
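For example, the height of the normal curve comes from f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)). A uniform distribution on [a, b] is flat at height 1 / (b - a). Below is a minimal Python sketch of plugging numbers in; the function names are hypothetical. The normal quantile function has no elementary closed form, so software computes it numerically. Python's NormalDist.inv_cdf is one such routine.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x, computed directly from its equation."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def uniform_pdf(x, a=0.0, b=1.0):
    """Height of the uniform curve: flat at 1/(b - a) on [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_quantile(p, a=0.0, b=1.0):
    """Value with a fraction p of the uniform distribution's area to its left."""
    return a + p * (b - a)


if __name__ == "__main__":
    # The hand-written formula agrees with the standard library's version.
    print(normal_pdf(0.7), NormalDist().pdf(0.7))
    print(uniform_pdf(0.5, 0, 2), uniform_quantile(0.25, 0, 2))
```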
@robderon 3 years ago
Precious help, thank you
@statquest 3 years ago
Thanks!
@MasterMan2015 5 years ago
Step 3 is not very clear. How do you put the lines on the normal distribution? Where do you start putting the lines? And what about the distance between each pair of lines?
@statquest 5 years ago
The quantiles for the normal distribution divide it so that the area under the curve between two lines is equal for all of the divisions. Since the normal distribution isn't as tall on the edges, there is more space between lines there than in the middle, where the distribution is tall. Thus, the spacing between lines makes the area under the curve between the middle two lines the same as the area under the curve between lines on the edges.
@MasterMan2015 5 years ago
Thanks! That is easy to see in the case of the uniform distribution. What about the starting point? I think you started at -1.5 arbitrarily, but I could start from -2 or -1 or ...
@statquest 5 years ago
The starting point is defined by the need for each unit between lines to have exactly the same area under the curve. To understand what this means, imagine you had to divide a normal distribution with a single line so that 50% of the area under the curve was on the left side of the line and 50% of the area under the curve was on the right side of the line. Where would you draw that line? Well, there is only one choice - right down the middle of the normal curve. If you drew it anywhere else, there would be more area under the curve on either the left side or the right side. Now imagine you had to divide the area under the curve into 4 equal amounts. Again, there is only one option - you put a line in the middle, then another line so that the area under the curve on the left side is divided in half, and then a third line so that the area under the curve on the right side is divided in half. Any other locations for those lines will result in the areas under the curve not being equal to each other. Thus, in this example, we have no choice about where to put the lines - they have to be put in the one configuration that makes the area under the curve between every pair of lines equal.
@MasterMan2015 5 years ago
Perfect! Got it!
@statquest 5 years ago
Hooray!!! :)
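The equal-area argument can be checked numerically. The sketch below assumes Python 3.8+ and uses hypothetical helper names. It asks for the cut points that split the standard normal into k equal-area slices, then measures each slice's area with the cumulative distribution function. With k = 2 there is exactly one cut, at 0. With k = 4, the three cuts are the only positions that give four slices of area 0.25.

```python
from statistics import NormalDist


def equal_area_cuts(k, dist=NormalDist()):
    """The k - 1 cut points that split dist into k slices of equal area."""
    return [dist.inv_cdf(j / k) for j in range(1, k)]


def slice_areas(cuts, dist=NormalDist()):
    """Area of each slice, treating the outermost slices as running to -inf/+inf."""
    edges = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    print(equal_area_cuts(2))                        # one cut, right down the middle
    four = equal_area_cuts(4)
    print([round(c, 3) for c in four])               # [-0.674, 0.0, 0.674]
    print([round(a, 3) for a in slice_areas(four)])  # [0.25, 0.25, 0.25, 0.25]
```

With 15 slices, the gaps between neighbouring cuts are widest in the tails and narrowest in the middle, matching the description in the replies.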
@nutellaturtle2931 5 years ago
How would you tell skewness by looking at the QQ plot? For example, if some of the data points fell below the straight line (representative of a normal distribution), does that indicate positive or negative skew? Cheers :)
@nutellaturtle2931 5 years ago
Ah, is it correct to say that, roughly, if the majority of points were consistently above the line from the beginning, you'd say the data are positively skewed, and conversely, if most points lay below the line from the beginning, the data are negatively skewed?
@statquest 5 years ago
Here's a link that explains how to interpret qq-plots: stats.stackexchange.com/questions/101274/how-to-interpret-a-qq-plot
@outerplanetexplorer1711 5 years ago
A little confused by the scales for the axes of a Q-Q plot (I think I'm overthinking the fitting of the straight line). What if my sample contains values strictly between 100 and 200 (normally distributed) and I want to check this distribution against the normal distribution? My interpretation of the video is that a Q-Q plot should return values that fit perfectly on y = x. However, because of the scale of my data, surely this isn't true. Rather, can we say that they will fit some line y = x + C (an arbitrary constant)? Not even sure if I'm right about this haha... Is the latter the correct interpretation? And in a hands-on context, do we run a regression on a Q-Q plot to confirm this sort of thing?
@statquest 5 years ago
You are correct that the line will not always be y = x; it depends on the values for the different distributions. So y = ax + c is a more "general" line that the data should be on. The important thing is that the data are on a line, any line. If so, then you can use quantile normalization (or z-scale normalization, depending on the distribution) to compare the two samples. For example, I could compare two normal distributions with very different means, one with mean = 4 and the other with mean = 400 (here's the R code): data1
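The R code in the reply above is cut off after "data1", so it is not reproduced here. As a rough stand-in, here is a Python sketch of the same idea under stated assumptions. Both samples are normal with standard deviation 1; this is an assumption, because the original values are unknown. The means are 4 and 400, and each sample has 1,000 values. A two-sample QQ plot pairs the sorted values. The fitted line is not y = x, but the points still fall on a straight line.

```python
import random
from statistics import fmean


def two_sample_qq(sample_x, sample_y):
    """Pair the sorted values of two equal-sized samples (two-sample QQ points)."""
    if len(sample_x) != len(sample_y):
        raise ValueError("this simple sketch assumes equal sample sizes")
    return list(zip(sorted(sample_x), sorted(sample_y)))


def fit_line(points):
    """Ordinary least-squares slope a and intercept c for y = a*x + c."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


rng = random.Random(42)  # fixed seed so the sketch is reproducible
data1 = [rng.gauss(4, 1) for _ in range(1000)]    # assumed: mean 4, sd 1
data2 = [rng.gauss(400, 1) for _ in range(1000)]  # assumed: mean 400, sd 1
slope, intercept = fit_line(two_sample_qq(data1, data2))
# Expect a slope near 1 and an intercept near 400 - 4 = 396: a straight
# line, just not y = x.
print(f"y = {slope:.3f} * x + {intercept:.2f}")
```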
@outerplanetexplorer1711 5 years ago
Thank you! And as others have said, you're truly a phenomenal teacher!
@jalbertomendivil 2 years ago
I know it may sound dumb, but I only got it when I understood that the theoretical quantiles were the quantiles of the standard normal distribution, i.e., Z-values.
@statquest 2 years ago
bam! :)
@anaswahid8520 4 years ago
Sir, I have been facing a problem with the ggplot2 package in R lately. Could you please help?
@gayathrikurada3315 3 years ago
Hi Josh, can we use percentiles in place of quantiles to make a QQ plot? If so, with percentiles we can only have up to a hundred of them no matter how big our data are, so how can we get a definitive answer about whether or not the 2 datasets have similar distributions, as mentioned in the video at 6:30?
@statquest 3 years ago
The terms "quantiles" and "percentiles" are often used interchangeably, and in this case you can swap out quantiles for percentiles. And you can have as many percentiles as you want - however, the largest percentile is always 100. For example, you could have the 0.5 percentile, or the 1.23 percentile.
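Here is a small illustration of fractional percentiles. It assumes Python 3.8+ and uses a hypothetical helper name, normal_percentile. The 0.5th and 1.23rd percentiles of the standard normal are simply the points with 0.5% and 1.23% of the area to their left. For a dataset, statistics.quantiles accepts any number of parts. Asking for 1,000 parts yields 999 cut points, one every 0.1 percentile.

```python
from statistics import NormalDist, quantiles


def normal_percentile(pct):
    """Value of the standard normal with pct percent of the area to its left."""
    return NormalDist().inv_cdf(pct / 100)


if __name__ == "__main__":
    for pct in (0.5, 1.23, 50, 99.5):
        print(f"{pct}th percentile: {normal_percentile(pct):+.3f}")
    # With data, nothing limits us to 100 cut points: n=1000 gives 999 of them.
    cuts = quantiles(range(1, 10001), n=1000)
    print(len(cuts))
```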
@gayathrikurada3315 3 years ago
@statquest Thanks Josh.
@urjaswitayadav3188 6 years ago
I have another question: does the shape of a QQ plot also carry some information? For example, differences at the beginning or at the end, or an overall shift of the values.
@urjaswitayadav3188 6 years ago
Thank you!
@Dekike2 5 years ago
Hi!!! Great video!!!! It was very helpful for understanding Q-Q plots!!!! But just one question: how do you calculate the quantiles for your dataset?? I mean, the first observation of your dataset is 0.6, but I don't understand why, since the first observation has 0 observations on one side of it. Shouldn't the quantile be 0? In the video where you explain how to calculate quantiles, you explained that the quantile for each observation is calculated by dividing the number of observations below that value by the total number of observations... So, for the first point... 0/15 = 0. Why 0.6??
@statquest 5 years ago
I think I see the confusion here. The x and y-axes on the QQ-plot (on the right side) are labeled "Normal Quantiles" and "Data Quantiles". This is a little misleading - what we are plotting are the values at each quantile, not the quantile name itself. So if the first quantile is called "quantile 0", but it represents -1.5 in the normal distribution and 0.6 in the data, then we draw a dot at (-1.5, 0.6) to represent the first quantile. Does that make sense?
@Dekike2 5 years ago
@statquest Perfectly. I understood this after watching some more videos. I would suggest clarifying this if you make a new version!! As I already told you, congratulations on your videos and, of course, on your quick reply!! You explain really well, and the videos are perfect (easy to follow and to understand). I'm doing my Ph.D., and people like you are really helpful. Thanks a lot.
@Fan-fb4tz 2 years ago
@statquest Thank you very much for all your videos! They help me a lot. Just a follow-up question on this: how do we decide where the smallest quantile value in the theoretical distribution starts? As you mentioned, the "quantile 0" value in the sample distribution is 0.6, but how can it represent -1.5 in the normal distribution? My confusion is that the normal distribution doesn't technically have a "quantile 0" value, because it extends to infinity in both tails.
@statquest 2 years ago
@Fan-fb4tz On the left side, the first quantile is defined for the first point of 15 data points, meaning that 1/15 of the data is equal to or less than that point. Thus, we find the corresponding point on the normal curve such that 1/15th of the area under the curve is to the left of it.
@yangyu5525 5 months ago
@statquest Strictly speaking, the 15 lines (15 data points) divide the whole dataset into 16 equal groups or parts, so the corresponding normal distribution should be divided into 16 bins so that every bin has the same probability of 1/16, right?
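The 15-versus-16 question comes down to the plotting position: the fraction of the area assigned to the i-th of n sorted points. The sketch below compares two conventions for n = 15.

- Using i/n sends the last point to +infinity, because NormalDist.inv_cdf rejects p = 1. This is the same problem raised earlier in the comments.
- Using i/(n + 1) cuts the curve into 16 equal parts and gives finite values.

After rounding, the x-axis values quoted in another comment (-1.5, -1.2, -0.89) match i/(n + 1) rather than i/n. That suggests, but does not confirm, that the video used this convention. Other software makes other choices. For example, R's ppoints(), which qqnorm() uses, takes (i - a)/(n + 1 - 2a), with a = 3/8 for n <= 10 and a = 1/2 otherwise.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, sd 1
n = 15              # number of sorted data points, as in the video

# Convention 1: the i-th point gets i/n of the area to its left.
# For i = n this needs inv_cdf(1.0), i.e. +infinity, and NormalDist.inv_cdf
# raises StatisticsError for p >= 1, so only i = 1..n-1 are computed here.
by_n = [std.inv_cdf(i / n) for i in range(1, n)]

# Convention 2: i/(n + 1), so the n points cut the curve into n + 1 equal
# parts and every one of them gets a finite value.
by_n_plus_1 = [std.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

print([round(z, 2) for z in by_n[:3]])         # [-1.5, -1.11, -0.84]
print([round(z, 2) for z in by_n_plus_1[:3]])  # [-1.53, -1.15, -0.89]
```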
@pankajverma3842 3 years ago
What a nice lecture!
@statquest 3 years ago
Thanks! :)
@sumayyakamal8857 2 years ago
THANK YOU!!!!!!
@statquest 2 years ago
Thanks!
@dioic13 5 years ago
Nice lecture, but how do you identify boundary values, like -1.5, for the normal distribution?
@statquest 5 years ago
This is explained at the very start of the video at 0:38. We have 15 data points, so our data have 15 quantiles. We then divide the normal distribution into 15 quantiles. Each quantile should have an equal probability. Thus, with the normal distribution, the quantiles on the edge are relatively far apart, to compensate for the relatively low probability of observing a value out there. In the middle of the curve, the quantiles are close together since there is a higher probability of observing a value there. Since each quantile has to have the same probability, there is only one way to configure the 15 lines that we draw. If that last part doesn't make sense, then just imagine we only had one quantile, so we needed to divide the normal distribution into two equal parts. Where would we put the line? Well, there's no choice involved here because there is only one location for that line - right in the middle. Similarly, when we have to divide the normal distribution into 15 equal parts, there isn't a choice about where to put the lines; there is only one option.
@tawkameyu 4 years ago
This just saved me. To the person who made this: you're the best
@statquest 4 years ago
Thank you! :)
@user-bz8nm6eb6g 2 years ago
Thank you!!!
@statquest 2 years ago
bam!
@MusafirHoonYaro 1 year ago
Mr. Starmer: I am trying to understand the values 0.6, 1.1, 1.9, etc. that you have for the points on the y-axis. Are these the "raw" or "observed" data, or are these numbers derived from some calculation? And these points have corresponding values on the x-axis (-1.5, -1.2, -0.89, etc.). I am totally lost as to how these values were derived. I am trying to understand this video in the context of linear regression, where I have seen "sample quantiles" plotted on the y-axis against "theoretical quantiles" on the x-axis, but it is not clear how these values have been derived. I apologize if my question doesn't make sense; I am not sure how to word it, purely because of my ignorance of the topic. Thank you for any direction you may give.
@statquest 1 year ago
Those are raw measurement values. The corresponding x-axis values come from asking a computer program (like 'R', or even Excel) to give us the quantiles.
@praveenchalampalem4038 4 years ago
You are really Awesome!!!
@statquest 4 years ago
Thanks! :)
@bin4ry_d3struct0r 1 year ago
I always wondered how statisticians choose a distribution to fit the data to when eyeballing it is insufficient. Now I know the answer: QQ-plots. Thank you for this!
@statquest 1 year ago
bam! :)
@Azazello1482 6 months ago
Hi, thanks for these videos! I'm confused by the logic of what happens at 1:20, when you split the data using 15 lines that occur at the points themselves. Wouldn't the intuitive thing be to cut the _space_ so each point resides in its own piece, rather than to use the data (i.e., the demarcations) as the cut points? In other words, you need only 14 lines to slice this data space into portions that each contain a single point. When this is then repeated on the standard normal distribution curve, wouldn't we then want to slice that curve into n equal pieces using n-1 cuts (not make n cuts resulting in n+1 equal pieces)? Edit to add one more question: I feel like I mostly get it, but I'm fuzzy on the intuition that underpins why this all works. Is it enough for the line to be straight, or must it also be at a 45-degree angle? I see how we're graphically connecting the data to the standard normal distribution, but I'm not certain what the intuitive connection is. It feels like it has something to do with calculus: that we're trying to compare the rates of change between quantiles for both our data and the data in a known distribution to ensure that they are constant and in lockstep.
@statquest 6 months ago
We put the lines on the points because those are the known values we are working with. And the line only needs to be straight - the angle doesn't matter because it's a function of the scale that the data are on.
@raghavgaur8901 4 years ago
Hi Josh, I wanted to know how you chose the 4 data points from the original set by looking at the other set of data points, which contains only 4 data points.
@statquest 4 years ago
Can you tell me what part of the video (time and seconds) you are talking about?
@fkhan4504 6 years ago
Crystal clear explanation
@statquest 6 years ago
Thanks! :)
@bobo0612 3 years ago
Thank you
@statquest 3 years ago
Thanks!
@Yambaization 4 years ago
5:30 I am confused... I thought that quartiles are three (not four) values, which divide the dataset into four groups with equal numbers of data points. In your example you say that four data points are quartiles? 🤔
@statquest 4 years ago
Oops. That's a mistake. Quartiles divide the data into 4 parts.
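To go with the correction: quartiles are the 3 cut points that divide sorted data into 4 parts. In Python 3.8+, statistics.quantiles with n=4 returns exactly those 3 values. The data below are hypothetical. The default 'exclusive' method is only one of several interpolation rules in use.

```python
from bisect import bisect_right
from statistics import quantiles


def count_per_part(data, cuts):
    """How many values fall into each part delimited by the cut points."""
    ordered = sorted(data)
    bounds = [0] + [bisect_right(ordered, c) for c in cuts] + [len(ordered)]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


data = list(range(1, 13))               # hypothetical: 12 evenly spread values
quartiles = quantiles(data, n=4)        # 4 parts need only 3 cut points
print(quartiles)                        # [3.25, 6.5, 9.75]
print(count_per_part(data, quartiles))  # [3, 3, 3, 3]
```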
@Cozmaus 8 months ago
Actuary studies is something else bro
@statquest 8 months ago
Noted!
@pratapseshachalam2859 5 years ago
Very nice video. What's the difference between the normal and uniform distributions? I thought they were the same
@statquest 5 years ago
Normal Distribution: en.wikipedia.org/wiki/Normal_distribution Uniform Distribution: en.wikipedia.org/wiki/Uniform_distribution_(continuous)
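The Wikipedia links cover the definitions. One difference that matters directly for QQ plots is how the equal-probability cut points are spaced. The sketch below assumes Python 3.8+ and uses hypothetical helper names. It splits a uniform distribution on [0, 1] and a standard normal into 8 equal-probability parts. The uniform cuts are evenly spaced. The normal cuts are far apart in the tails and bunched together in the middle.

```python
from statistics import NormalDist


def cut_points(quantile_fn, k):
    """The k - 1 values that split a distribution into k equal-probability parts."""
    return [quantile_fn(j / k) for j in range(1, k)]


def gaps(cuts):
    """Distances between neighbouring cut points."""
    return [b - a for a, b in zip(cuts, cuts[1:])]


def uniform01_quantile(p):
    # For the uniform distribution on [0, 1], the p-quantile is p itself.
    return p


uniform_cuts = cut_points(uniform01_quantile, 8)
normal_cuts = cut_points(NormalDist().inv_cdf, 8)

print("uniform gaps:", [round(g, 3) for g in gaps(uniform_cuts)])  # all 0.125
print("normal gaps: ", [round(g, 3) for g in gaps(normal_cuts)])   # widest in the tails
```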
@adhiyamaanpon4168 4 years ago
Hey Josh, one doubt... I read an article about checking whether a dataset changes over periodic intervals (similar to a stock market dataset, except that here we don't know whether it changes or not). To find out, they split the data into parts: the previous week's data is treated as test data, and the data from the week before that is treated as training data. They then plot a QQ plot of these 2 datasets to see whether they are similar; if they are, we can conclude the dataset hasn't changed over time. My doubt is that until now I have only seen QQ plots for a single variable, so how can this concept be extended to an entire dataset??
@statquest 4 years ago
I guess you could do it separately for each variable in the dataset.
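One way to read "separately for each variable" is sketched below. For every column present in both periods, it computes matching quantiles (deciles by default) and pairs them as two-sample QQ points. The data layout, a dict mapping each column name to a list of values, is an assumption, and so is the function name. This approach checks each variable's distribution on its own. It says nothing about whether the relationships between variables changed.

```python
from statistics import quantiles


def per_variable_qq(older, newer, n_parts=10):
    """Two-sample QQ points, column by column, for columns present in both periods.

    older / newer: hypothetical dicts mapping a column name to a list of values
    (e.g. the week before last and last week). Matching the same cut points
    (deciles by default) lets the two periods have different numbers of rows.
    """
    result = {}
    for column in sorted(older.keys() & newer.keys()):
        q_old = quantiles(older[column], n=n_parts)
        q_new = quantiles(newer[column], n=n_parts)
        result[column] = list(zip(q_old, q_new))
    return result


if __name__ == "__main__":
    week_before = {"price": [10, 11, 12, 13, 14, 15], "volume": [5, 7, 6, 8, 9, 7]}
    last_week = {"price": [10, 12, 12, 14, 15, 16, 17], "volume": [5, 6, 8, 8, 9]}
    for name, points in per_variable_qq(week_before, last_week, n_parts=4).items():
        print(name, [(round(x, 2), round(y, 2)) for x, y in points])
```

Each column's points can then be plotted or checked for a straight line, just like a single-variable QQ plot.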