The Kolmogorov-Smirnov test - are stock returns normally distributed? (Excel) (SUB)

Views: 10,910

Comments: 52

@NEDLeducation 4 years ago

You can find the spreadsheets for this video and some additional materials here: drive.google.com/drive/folders/1sP40IW0p0w5IETCgo464uhDFfdyR6rh7 Please consider supporting NEDL on Patreon: www.patreon.com/NEDLeducation

@samm8457 4 years ago

Hello Sir, can you please explain in detail how you got the 1% for the critical value, and where the 1.517 critical value comes from? Secondly, if I have a 5% significance level for S&P 500 returns, where can I put this 5% in the Excel sheet? -Keenly waiting for your kind reply!

@NEDLeducation 4 years ago

Hi Liza, thanks for the question. There is a functional relationship between the critical value and the significance level. It has been tabulated, so as with any statistical test, you can just refer to the critical value table. The function itself is Critical value = sqrt(-ln(a/2)*1/2), where a is your significance level (1%, 5%, etc.). Therefore, if your observed value of the Kolmogorov-Smirnov statistic exceeds this threshold for 5%, as in your example, you should reject the null in favour of the alternative (in that case, that the return distribution is non-normal). In this video, I mostly refer to p-value computation, which is much more standard for academic research. Here, you calculate the p-value from the observed Kolmogorov-Smirnov statistic value and check whether your p-value is higher or lower than your significance level. Both procedures are conceptually equivalent. Hope it helps.
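The critical-value formula quoted in the reply above can be evaluated directly. The following is a minimal Python sketch, assuming the formula applies to the scaled statistic sqrt(n) * supremum; the function name `ks_critical_value` is illustrative and does not come from the spreadsheet.

```python
import math


def ks_critical_value(alpha: float) -> float:
    """Critical value from the formula quoted above: sqrt(-ln(alpha/2) * 1/2).

    Assumption: alpha is the significance level (e.g. 0.05 for 5%).
    """
    return math.sqrt(-0.5 * math.log(alpha / 2))


# Tabulate the critical value for a few common significance levels.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: critical value = {ks_critical_value(alpha):.3f}")
```

Under this formula the 5% threshold is about 1.358 and the 1% threshold about 1.628, which is close to the 1.63 figure another commenter cites below.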

@limingbluetooths 2 years ago

and all critical values are less than 1.0, while yours is 1.517. Based on the above formula (n>40), the critical value should be 1.63/sqrt(1258)=0.0459. The conclusion would be different. Please explain, thanks!

@hifigecko 10 days ago

Thanks for the video. But if you use the parameters estimated from the sample, is Donsker's theorem still valid in this case?

@olivertwist8996 3 years ago

For anyone wondering how to make the chart: column C is the X-axis, columns E and F are the Y-axis. Select them and choose the "Scatter with smooth lines" chart type. A special thanks to NEDL for all his work.

@neerajpradiplahoti7019 9 months ago

Hi, can you tell me how I would calculate the p-value using the KS distance for a uniform distribution?

@learning_with_irving4266 10 months ago

Why is the empirical distribution needed? To see if it's normally distributed?

@drek273 1 year ago

What if monthly data showed no statistical significance but daily data did show significance?

@nihartripathy146 4 years ago

Hello sir, thank you for the video, it's really very helpful. I have a question: could you kindly explain how you plotted the graph?

@NEDLeducation 4 years ago

Hi Nihar, thanks very much for your comment! This is a usual Excel chart, where you plot both the theoretical and the empirical distribution functions against ordered data as two simple line graphs. So if you select columns C, E, and F on the spreadsheet it should work. The graph is by no means essential for the test itself, but it can be a useful visualisation for goodness of fit and supremum (supremum is just the maximal distance between two graphs). Hope it helps and thanks again!
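The construction described above (ordered data, an empirical distribution function, a fitted theoretical one, and the supremum between them) can be sketched outside Excel as well. This is a hedged Python illustration using only the standard library; the synthetic `returns` series is a stand-in for real data, and the two-sided distance check (comparing the fitted CDF against both i/n and (i-1)/n) is an assumption that may differ from the spreadsheet's exact layout.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Synthetic stand-in for daily returns (hypothetical data, not from the video).
returns = [random.gauss(0.0005, 0.01) for _ in range(1000)]

ordered = sorted(returns)                       # the ordered data (x-axis of the chart)
n = len(ordered)
fitted = NormalDist(mean(ordered), stdev(ordered))

empirical = [(i + 1) / n for i in range(n)]     # empirical distribution function
theoretical = [fitted.cdf(x) for x in ordered]  # fitted normal distribution function

# Supremum: the largest vertical gap between the two curves, checked just
# after and just before each jump of the empirical step function.
supremum = max(
    max((i + 1) / n - t, t - i / n)
    for i, t in enumerate(theoretical)
)

print(f"supremum = {supremum:.4f}, sqrt(n) * supremum = {supremum * n ** 0.5:.4f}")
```

Plotting `empirical` and `theoretical` against `ordered` would reproduce the kind of chart discussed in this thread.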

@nihartripathy146 4 years ago

@NEDLeducation Thank you very much, I truly appreciate it.

@georgyandreev7469 3 years ago

Thank you very much for the Russian subtitles!

@tomp4925 2 years ago

Can the KS test be used for categorical data? Specifically, whether the data series conforms to Benford's Law.

@NEDLeducation 2 years ago

Hi and thanks for the question! Yes, it can, I have got a video showcasing this: kzbin.info/www/bejne/jXnIgKV_iL6KeqM. In practice, Kuiper test is also frequently used in the same context when testing for Benford's law violations. I have got a general video on Kuiper's test as well: kzbin.info/www/bejne/aniuoZ6Of7Carq8

@alialjanabi9157 4 years ago

Thank you very much.

@elenaaccetturo6066 2 years ago

Hi, thanks for the precise video. I was wondering why many videos calculate the critical value by dividing it by the square root of n and then compare it not with the KS statistic, as in your video, but with the supremum. What is the difference?

@NEDLeducation 2 years ago

Hi Elena, and glad you liked the video! As for your question, both approaches are equivalent: it is like testing for significance using either the critical values of the statistic or the p-values.
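The equivalence mentioned above can be checked numerically. This is a small Python sketch, assuming the critical-value formula sqrt(-ln(a/2)*1/2) quoted earlier in the thread and the sample size of 1258 cited by another commenter; the observed `supremum` is a hypothetical value chosen for illustration only.

```python
import math

n = 1258          # sample size quoted in the thread
supremum = 0.05   # hypothetical observed distance, for illustration only
alpha = 0.01      # significance level

# Critical value for the scaled statistic sqrt(n) * supremum.
c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))

# Approach 1: scale the supremum up and compare with the critical value.
reject_scaled_statistic = supremum * math.sqrt(n) > c_alpha

# Approach 2: scale the critical value down and compare with the raw supremum.
reject_raw_supremum = supremum > c_alpha / math.sqrt(n)

print(reject_scaled_statistic, reject_raw_supremum)
```

Both comparisons divide or multiply the same inequality by sqrt(n), so they always lead to the same decision.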

@alisagvozdeva8617 2 years ago

I'm curious, what if we have two or more identical values in the data sample? Does it change ranking? Do these identical values need to be marked as having the same rank?

@NEDLeducation 2 years ago

Hi Alisa, and thanks for the question! Generally, this is not an issue for stock return modelling, as it is improbable you would have two identical returns for daily data. If you model high-frequency returns or returns for very illiquid assets where there are lots of zero observations, you could simply remove zero observations before fitting distributions. Overall, the problem of identical values can be avoided by using the "first" method, where they are assigned different ranks. In Excel, using the SMALL function can achieve this very easily when constructing an empirical distribution function.
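The "first" ranking method mentioned above can be illustrated with a short Python sketch. The data values are hypothetical; the point is only that sorting the sample and numbering positions 1..n gives tied observations consecutive, distinct ranks, much as the k-th smallest value does for successive k.

```python
# Hypothetical returns containing a tie (two zero observations).
data = [0.01, 0.0, -0.02, 0.0, 0.03]

ordered = sorted(data)
n = len(ordered)

# "First" method: ranks follow position in the sorted list, so the two
# zeros receive different ranks instead of sharing one.
first_ranks = list(range(1, n + 1))

# Empirical distribution function evaluated at each ordered observation.
ecdf_points = [(x, rank / n) for x, rank in zip(ordered, first_ranks)]

for x, p in ecdf_points:
    print(f"x = {x:+.2f}  rank-based ECDF = {p:.1f}")
```

The tied values simply produce two steps at the same x position; the larger of the two is the true height of the empirical distribution function at that point.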

@dpamazon2104 4 years ago

Hi, thanks for the video. Your videos are precise and crisp. Well done! Just wondering if you can share the Excel sheets used in the lesson, so that they can be an aid in understanding the concept better, please. And let me know if you offer training in Risk Management, please. Thanks.

@NEDLeducation 4 years ago

Hi and many thanks for your feedback! Just drop me an email on s.shanaev@northumbria.ac.uk and I will send you the spreadsheet. As for personal training, I am not offering anything like that at the moment.

@hanst7218 4 years ago

Finally

@zishiwu7757 4 years ago

Thank you for making this video. I am a Computer Science student, not a finance student, but I found this really helpful. I was reading a research paper on a tool called Data Diff by Sutton et al. 2018 and they said they used the Kolmogorov-Smirnov test to determine how similar two datasets were to each other. This is a really useful test for machine learning applications where you need to compare the quality of a new dataset against your old dataset.
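Comparing two datasets, as described above, uses the two-sample form of the test, where both distribution functions are empirical. The following Python sketch is an assumption-laden illustration and not the method from the cited paper: the function name `two_sample_ks_statistic` and the toy samples are hypothetical, and only the distance (not a p-value) is computed.

```python
from bisect import bisect_right


def two_sample_ks_statistic(a, b):
    """Largest gap between the empirical distribution functions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)

    def ecdf(sorted_sample, x):
        # Share of observations less than or equal to x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    # The gap can only change at observed values, so checking those suffices.
    return max(
        abs(ecdf(a_sorted, x) - ecdf(b_sorted, x))
        for x in a_sorted + b_sorted
    )


old_batch = [0.1, 0.4, 0.5, 0.7, 0.9]   # hypothetical "old" dataset
new_batch = [0.2, 0.4, 0.6, 0.8, 1.5]   # hypothetical "new" dataset
print(two_sample_ks_statistic(old_batch, new_batch))
```

A distance near 0 suggests the two samples look alike, while a distance near 1 means they barely overlap.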

@NEDLeducation 4 years ago

Hi Zishi and many thanks for your feedback! We are absolutely excited to know our videos are helpful for Computer Science students as well :) I have read through the Sutton et al. paper, and it is an excellent and non-trivial application of the KS test. Thanks for the heads up!

@joeaoun6321 3 years ago

Another very clear and helpful video. Thanks for the great work that you are doing. Perhaps you address this in another video, but if S&P 500 returns are not normally distributed, does this mean that all the modern portfolio theory about minimum variance is not on a strong foundation when determining optimal asset allocation?

@NEDLeducation 3 years ago

Hi Joe, and glad you liked the video! Actually, I have got a whole series of videos on modelling stock returns with advanced (non-normal) distribution functions, and for S&P 500 in particular, the Johnson SU distribution seems to work best: kzbin.info/www/bejne/iqaqoJaGmcipgNU. As for portfolio management implications, you are correct. As a simple first-order solution, a utility function adjusted for skewness and kurtosis can be used to generate optimal portfolios, I address that in this video: kzbin.info/www/bejne/qZzQin-dbNueack. Do check these out if you are interested! Hope it helps!

@marinakholomjeva7776 4 years ago

Hello, could you please tell me how you build those charts?

@NEDLeducation 4 years ago

Hi Marina, and many thanks for the question! The charts are built using simple Excel tools, nothing fancy there. Just drop me an email to s.shanaev@northumbria.ac.uk, and I will be able to send you the spreadsheet :)

@HugoGobatoLanguageCoach 1 year ago

Thank you very much for the video! I was wondering if you could provide a source for the formula used to calculate the p-value (p = exp(-sup^2*n)) since I could not find any source for it on the internet. Kind Regards, Hugo

@NEDLeducation 1 year ago

Hi Hugo, and thanks for the excellent question! This can be directly retrieved from the formula for Kolmogorov-Smirnov critical test statistics by inverting the function.

@HugoGobatoLanguageCoach 1 year ago

@NEDLeducation Thank you very much for the information! Thanks to your videos I have started my own research project as an undergraduate student. If I get to publish it, I can send it to you!

@robertbond 2 years ago

Critical Value - cell K3: Should it not be 10% instead of 1%? Very interested in your answer...

@NEDLeducation 2 years ago

Hi Robert, and thanks for the question! It depends on the significance level you are after. If you are looking at 90% rather than 99%, then yes, feel free to change this figure.

@thomasjaeger648 9 months ago

@NEDLeducation I had the same question. The issue is that in the video the professor says p = e^(-supremum^2*n). This is not correct. Instead, p = e^(-supremum^2*n*2).

@MG-yt4om 2 years ago

Hi, would you explain how the critical value at 1% is calculated? I'm not able to replicate your 1.517 critical value.
alpha = 1-(99/100)
left-tailed test: (-∞, Q(α)) = -2.3263478740408408
right-tailed test: (Q(1 - α), ∞) = 2.3263478740408408
two-tailed test: (-∞, Q(α/2)) ∪ (Q(1 - α/2), ∞) = -2.5758293035489 ∪ 2.5758293035489
where Q is the inverse of the cumulative distribution function of the normal distribution.

@NEDLeducation 2 years ago

Hi, and thanks for the question! The p-value of the Kolmogorov-Smirnov test can be calculated as p = e^(-supremum^2*n). From here, you can invert the function and construct critical values. This is also why the critical statistic is the supremum times the square root of the sample size.

@carminebevilacqua8508 2 years ago

@NEDLeducation How can I invert the function in order to find the critical value? What is the Excel function to do that? I tried several functions but I am still not able to replicate the 1.517 value. Thank you so much, I really appreciate what you do.

@thomasjaeger648 11 months ago

Isn't the value 1.517 the 10% critical value (not 1%)? I believe the math here is c(0.1)=SQRT(-LN(0.1))=1.517. FYI, the 1% critical value would be c(0.01)=SQRT(-LN(0.01))=2.145966. EDIT: never mind, the critical value is for 1%. The issue was that the professor says p = e^(-supremum^2*n). This is not correct; instead, p = e^(-supremum^2*n*2). The fact that ln(0.1) = ln(0.01)/2 makes this an easy miss.
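The coincidence pointed out above is easy to verify numerically. The Python sketch below evaluates both readings of the formula, with and without the factor of 2 in the exponent; the variable names are illustrative, and the code only reproduces the arithmetic in this comment without settling which expression the spreadsheet actually uses.

```python
import math

# Reading 1: p = exp(-c^2), i.e. no factor of 2, evaluated at p = 10%.
c_10pct_without_factor = math.sqrt(-math.log(0.10))

# Reading 2: p = exp(-2 * c^2), i.e. with the factor of 2, evaluated at p = 1%.
c_1pct_with_factor = math.sqrt(-math.log(0.01) / 2)

# For contrast: no factor of 2, evaluated at p = 1%.
c_1pct_without_factor = math.sqrt(-math.log(0.01))

print(f"{c_10pct_without_factor:.6f}")
print(f"{c_1pct_with_factor:.6f}")
print(f"{c_1pct_without_factor:.6f}")
```

The first two values agree because ln(0.01) = 2 * ln(0.1), which is why 1.517 can be read either as a 10% threshold without the factor or as a 1% threshold with it.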

@thomasjaeger648 9 months ago

@carminebevilacqua8508 You can use Python to get the 1.517 value, e.g.:
    from scipy.stats import ksone
    import numpy as np
    print(ksone.ppf(1-0.01, 1256000)*np.sqrt(1256000))

@faizaahmed6488 2 years ago

Sir, is it a uniformity test? If the uniformity test is rejected, then it is somehow a sign of psychological barriers in the stock market. Please clarify this concept. Thank you so much.

@NEDLeducation 2 years ago

Hi again Faiza, and thanks for the question! This is a test that can be very generally used to check whether an empirical distribution (real-world data) conforms with a theoretical distribution function. You can apply it to the last digits of stock prices and see whether they are consistent with the uniform distribution. As this is a discrete distribution (the last digit can be an integer from 0 to 9), this can also be tested using a Chi-squared test (kzbin.info/www/bejne/bJK0kIyfoZ6FbZY), but a Kolmogorov-Smirnov test is also applicable. Hope this helps!
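The last-digit check described above can be sketched for both tests. In this Python illustration the `last_digits` list is hypothetical data (not real prices), and only the two test statistics are computed; turning them into p-values would require the relevant reference distributions, which are left out here.

```python
from collections import Counter

# Hypothetical last digits of 20 stock prices (illustrative data only).
last_digits = [0, 5, 0, 3, 9, 0, 5, 7, 2, 0, 5, 1, 8, 0, 4, 6, 5, 0, 9, 5]
n = len(last_digits)
counts = Counter(last_digits)

# Chi-squared statistic against the discrete uniform distribution on 0..9.
expected = n / 10
chi_squared = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Kolmogorov-Smirnov distance: largest gap between the empirical and the
# uniform cumulative distribution functions, evaluated at each digit.
cumulative = 0
supremum = 0.0
for d in range(10):
    cumulative += counts.get(d, 0)
    supremum = max(supremum, abs(cumulative / n - (d + 1) / 10))

print(f"chi-squared = {chi_squared:.2f}, KS supremum = {supremum:.2f}")
```

In this made-up sample the digits 0 and 5 are over-represented, which is the kind of clustering both statistics are designed to pick up.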

@faizaahmed6488 2 years ago

@NEDLeducation Thank you for answering me, sir. I have one more question: if I want to check the uniformity of a stock, which test is best?

@NEDLeducation 2 years ago

@faizaahmed6488 Hi Faiza, the video on price clustering detection and psychological barriers in stock prices is out now, check it out if you are interested: kzbin.info/www/bejne/onWcZoJnZpKbZ9k

@axelacuna5244 7 months ago

You are the best channel I have seen! Combining straight-to-the-point explanations with Excel manipulation! Great job!

@abirhossain3437 4 years ago

Hello, how did you calculate the critical value? How is the critical value going to change for different datasets?

@NEDLeducation 4 years ago

Hi Abir, thanks for your question. A critical value is a value of the statistic that gives a specified p-value. If the statistic exceeds a critical value, the null hypothesis can be rejected at the respective confidence level. For the Kolmogorov-Smirnov test, the relationship between the supremum and the p-value is:
p = exp(-sup^2*n)
Taking the log of both sides:
log(p) = -sup^2*n
sup = sqrt(-log(p)/n)
This function will give you critical values of the supremum for different datasets and different confidence levels (a confidence level of 95% would be a p-value of 5%, for example). As can be seen, critical values are lower when your dataset is large. That is typical for most statistical tests: it is easier to reject the null with reasonable confidence when your sample is large. Hope this helps.
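The inversion above can be turned into a small function to see how the critical supremum shrinks as the sample grows. This Python sketch follows the formula exactly as written in this reply; the optional `factor` argument is an added assumption that allows trying the variant with a 2 in the exponent that another commenter in this thread argues for.

```python
import math


def critical_supremum(p: float, n: int, factor: float = 1.0) -> float:
    """Invert p = exp(-factor * sup^2 * n) for sup.

    factor=1.0 reproduces the formula in the reply above; factor=2.0 is the
    variant suggested elsewhere in the thread (an assumption, not confirmed).
    """
    return math.sqrt(-math.log(p) / (factor * n))


# Critical supremum at a 5% p-value for several sample sizes.
for n in (100, 1258, 10000):
    print(f"n = {n:>5}: critical supremum = {critical_supremum(0.05, n):.4f}")
```

Larger samples give smaller thresholds, which matches the remark that the null is easier to reject when the dataset is large.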

@thomasjaeger648 9 months ago

@NEDLeducation lim n-> inf p = e^(-supremum^2*n*2)

@RustuYucel 4 years ago

Perfect!

@sarindam74 4 years ago

You are great. You made education accessible. This may be a silly question: how did you get that CDF graph prefitted?

@NEDLeducation 4 years ago

Hi Arindam, and many thanks for such kind words! As for your question - it was just a pre-constructed chart that was populated as I inputted the data in the arrays (I just did not want to spend extra time in the video plotting and formatting the graphs). You can check the link in the pinned comment; all spreadsheets are available through our Google Drive.