To learn more about Lightning: lightning.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@marchanselthomas6 ай бұрын
The explanation is so clean. I was clapping for him from my room. How can someone be so good at their job!
@statquest6 ай бұрын
Thank you! :)
@nuridaw95864 ай бұрын
I clapped too, twice! :)
@jwilliams8210 Жыл бұрын
You are EXCEPTIONALLY good at CLEARLY describing complex topics!!! Thank you!
@statquest Жыл бұрын
Thank you very much! :)
@mattgenaro Жыл бұрын
Such a simple, yet, a beautiful and powerful concept of similarity. Thanks, StatQuest!
@statquest Жыл бұрын
bam!
@nossonweissman Жыл бұрын
You literally make it so easy!! I can't help but smile 😊😊😊❤️❤️❤️ By far one of my favorite KZbin channels!
@statquest Жыл бұрын
Thank you so much! :)
@torley Жыл бұрын
QUADRUPLE BAM!!! Thanks for such fun yet pragmatic explainers.
@statquest Жыл бұрын
Thank you!
@usamsersultanov689 Жыл бұрын
I think and hope that this video is a preamble for more complex NLP topics such as word embeddings, etc. Many thanks for all of your efforts!
@statquest Жыл бұрын
Yes it is! :)
@xanderortega43598 ай бұрын
Cosine Similarity is used as an evaluation tool on word2vec
@tysontakayushi83947 ай бұрын
I usually hate when people say that a video explains something well, because usually this is not the case. But, haha, amazing job! Well done, really nicely explained. It's gamification, the way I understand it!
@statquest7 ай бұрын
Thanks!
@jasonlough66409 ай бұрын
Dude these are so good. I have to watch them several times, and then I try to write some code to reinforce the concept. Your videos are absolutely amazing.
@statquest9 ай бұрын
Thank you!
@DrKnowsMore4 ай бұрын
Like most things, it is relatively straightforward when you remove the jargon
@statquest4 ай бұрын
bam! :)
@RonnieDenzel3 ай бұрын
Never understood something in such a slow but efficient pace,thanks💯
@statquest3 ай бұрын
Thanks!
@virenpai939511 ай бұрын
My Love for learning Data Science and Statistics has increased multi-folds because of you. Thank you Josh!!🙂
@statquest11 ай бұрын
bam! :)
@dukeduke19109 ай бұрын
This guy is seriously funny. I thought I was the only person who ever watched gymkata (like 50 times, especially the part in the town where everyone was crazy). This video def explains cosine sim clearly. Thk u!
@statquest9 ай бұрын
BAM! :)
@MOROCCANFREEMIND11 ай бұрын
The quality of your explanation is more than triple bam!!😂
@statquest11 ай бұрын
Thanks!
@KarthikNaga329 Жыл бұрын
This is another great video, Josh! question: @3:51 you talk about having 3 Hellos and that still results in a 45 degree angle with Hello World. However, comparing Hello to Hello World seems to be a diff angle from comparing Hello to Hello World World. Is there an intuition as to why this is the case? That is adding as many Hellos to Hello keeps the angle the same, but adding more Worlds to Hello World seems to change the Cosine Similarity.
@statquest Жыл бұрын
Two answers: 1) Just plot the points on a 2-dimensional graph for the two pairs of phrases and you'll see that the angles are different. 2) The key difference is that "hello hello hello" only contains the word "hello". If we had included "world", then the angles would be different. Again, you can plot the points to see the differences.
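The reply above can be checked numerically instead of by plotting. The following is a minimal sketch, assuming each phrase is represented by its word counts in the order (hello, world); the helper names `cosine_similarity` and `angle_degrees` are illustrative and do not come from the video.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angle_degrees(a, b):
    """Angle between a and b, clamping the cosine against rounding error."""
    cosine = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cosine))

# Word counts in the (assumed) order: (hello, world)
hello = [1, 0]
hello_x3 = [3, 0]            # "hello hello hello"
hello_world = [1, 1]
hello_world_world = [1, 2]

print(angle_degrees(hello, hello_world))        # about 45 degrees
print(angle_degrees(hello_x3, hello_world))     # about 45 degrees
print(angle_degrees(hello, hello_world_world))  # about 63.4 degrees
```

Repeating "hello" only stretches the vector along the same axis, so its direction and the angle stay the same. Adding a second "world" changes the ratio of the two counts, which rotates the vector and changes the angle.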
@insaiyancvk5 ай бұрын
Wonderfully explained, Josh! You've earned a subscriber!
@statquest5 ай бұрын
Thank you!
@gunnerstone1203 ай бұрын
Great video. It would've been worth noting that magnitude of the feature space does matter in certain cases and doesn't in others. Your example of [Hello Hello Hello], caught my eye. In that example, the magnitude of that feature didn't matter because its direction didn't change. However the difference between [Hello World!] and [Hello Hello Hello World!] does have an impact on the angle.
@statquest3 ай бұрын
Good point!
@nedlin793425 күн бұрын
Thanks!
@statquest24 күн бұрын
TRIPLE BAM!!! Thank you for supporting StatQuest!
@Neeeemg3 ай бұрын
U nailed it enjoyed it hello ,BAM and best teacher ever 😂😂😂
@statquest3 ай бұрын
Thank you!
@olucasharp Жыл бұрын
It all seems so easy when you speak about such complicated things! Huge talent! And so funny ⚡⚡⚡
@statquest Жыл бұрын
Thank you!
@abdulrafay242011 ай бұрын
What a great way of explaining!! Love it ❤
@statquest11 ай бұрын
Thanks!
@lynnekilgore60113 ай бұрын
Terrific video, thanks Josh! I learned the basics of Linear Algebra before, but the explanations were never this clear (or fun).
@statquest3 ай бұрын
Thank you!
@kforay42 Жыл бұрын
Your videos are such a lifesaver! Could you do one on the difference between PCA and ICA?
@statquest Жыл бұрын
I'll keep that in mind.
@jitendrakumar-k4d7k10 ай бұрын
you are the King Josh 👏👏👏👏 wonderful job!!!
@statquest10 ай бұрын
Thank you! 😃
@magicfox94 Жыл бұрын
Excellent explanation! I hope it is the first of an NLP series of videos!
@statquest Жыл бұрын
I hope to do word embeddings soon.
@AmineBELALIA Жыл бұрын
this video needs more views it is awesome
@statquest Жыл бұрын
Thank you! :)
@zeinebkefeya19466 күн бұрын
The Best!!! thank you for your efforts
@statquest5 күн бұрын
Glad you enjoyed it!
@Artinm89.25 ай бұрын
you are insane at explaining clearly, btw you sing really well😂
@statquest5 ай бұрын
Thanks! 😃
@davidmurphy563 Жыл бұрын
Could you cover discrete cosine/Fourier transforms, pretty please? I'd love to know how to break signals up into their component frequencies. If you haven't already!
@statquest Жыл бұрын
I'll keep that in mind.
@Ankara_pharao Жыл бұрын
Have you seen the 3blue1brown video on this topic? Not sure if it is about the discrete FT.
@azzahamed206311 ай бұрын
This is an AMAZING explanation !!
@statquest11 ай бұрын
Thank you!
@bladongarland86357 ай бұрын
Hilarious, easy to understand, and entertaining. Bravo!
@statquest7 ай бұрын
Glad you enjoyed it!
@bjornnorenjobb Жыл бұрын
Awesome video! I had no idea what Cosine Similarity was, but you explained super clearly
@statquest Жыл бұрын
Thanks!
@suzhenkang Жыл бұрын
pretty good .
@exoticcoder5365 Жыл бұрын
I must watch Gymkata ! Thanks for the recommendation ! And excellent explanation of the topic !
@statquest Жыл бұрын
bam! :)
@chris-graham Жыл бұрын
"in contrast, this last sentence is from someone who does not like troll 2" - I was expecting a BOOOO after that lol
@statquest Жыл бұрын
Ha! That would have been great.
@RaynerGS Жыл бұрын
I love you!!!! Salute from Brazil.
@statquest Жыл бұрын
Thank you very much! :)
@AreyHawUstad6 ай бұрын
Holy shit did I land on a gold mine. Love the explanation (minus the intro, sorry Josh). Thanks a bunch!
@statquest6 ай бұрын
Thanks!
@willw4096 Жыл бұрын
Great video! My notes: 3:52 4:23
@statquest Жыл бұрын
bam!
@tarangrathod255Ай бұрын
You're a very good teacher!
@statquestАй бұрын
Thank you! 😃
@anuj5576 Жыл бұрын
Super simplistic explanation! Thanks for your effort.
@statquest Жыл бұрын
Thanks!
@infraia6 ай бұрын
Excellent explanation!
@statquest6 ай бұрын
Thanks!
@Ogunbiyi_Ibrahim8 ай бұрын
I came here as I need to learn something in NLP. Thank you, I understood it clearly.
@statquest8 ай бұрын
BAM! :)
@AU-hs6zw Жыл бұрын
You deliver the moment I need it. Thanks
@statquest Жыл бұрын
BAM! :)
@limebro8833 Жыл бұрын
This video saved me, I cannot thank you enough.
@statquest Жыл бұрын
Bam! :)
@muhammadazeemmohsin5666 Жыл бұрын
What an amazing explanation. Thanks for the video.
@statquest Жыл бұрын
Thanks!
@abrahammahanaim3859 Жыл бұрын
Hey josh thanks for the video nice explanation.
@statquest Жыл бұрын
You bet!
@artmiss-x8o4 ай бұрын
You really are the best !
@statquest4 ай бұрын
Thank you!
@suaridebbarma12557 ай бұрын
this video was absolutely a BAM!!
@statquest7 ай бұрын
Thanks!
@kavita8925 Жыл бұрын
Your Explanation is great
@statquest Жыл бұрын
Thanks!
@bachdx2812 Жыл бұрын
thanks a lot. this kind of videos are super helpful for me !!!
@statquest Жыл бұрын
Thanks! :)
@성이름-g3q Жыл бұрын
wow thank you!!! I didn't know how to calculate it, but after watching this, I've become a mathematician!!
@statquest Жыл бұрын
bam!
@theedspage Жыл бұрын
Hello! Hello! Hello! Thank you for introducing me to this topic! Subscribed.
@statquest Жыл бұрын
Awesome! Thank you!
@ymperformance Жыл бұрын
Great video and great explanation! Thanks.
@statquest Жыл бұрын
Glad it was helpful!
@murilopalomosebilla2999 Жыл бұрын
Hello!! Nice video!
@statquest Жыл бұрын
Thank you!
@raphaelbonillo21927 ай бұрын
You democratize mathematics! They should teach it this way in schools.
@statquest7 ай бұрын
Thank you very much!
@RichardGreco Жыл бұрын
Great video. Very interesting. I hope to see you apply this to more examples.
@statquest Жыл бұрын
We'll see it used in CatBoost for sure.
@yang90513 ай бұрын
super clear, thank you dude
@statquest3 ай бұрын
Thanks!
@longgg972Ай бұрын
just beautiful.
@statquestАй бұрын
Thank you!
@Ghulinzer Жыл бұрын
Great video! I've seen, though, that many articles treat cosine similarity as the same thing as Pearson's correlation, since the two produce the same value when X and Y both have a mean of 0. This is not true in general, since they measure different things. Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space and returns a similarity score, as explained in the video, while Pearson's correlation measures the linear relationship between 2 variables.
@statquest Жыл бұрын
Correct!
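The relationship described in this exchange can be sketched in a few lines. This is an illustration with made-up numbers, and the helper names `pearson` and `center` are assumptions for the example: Pearson's correlation matches the cosine similarity of the mean-centered data, while the cosine similarity of the raw data changes if a constant is added to one variable.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson's correlation computed from deviations around each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def center(values):
    """Subtract the mean so the values average to 0."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Made-up data for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]
x_shifted = [v + 100.0 for v in x]  # same shape, different mean

print(pearson(x, y), cosine_similarity(center(x), center(y)))  # equal
print(pearson(x, y), pearson(x_shifted, y))                    # shift has no effect
print(cosine_similarity(x, y), cosine_similarity(x_shifted, y))  # shift changes it
```

So the two measures coincide only once the data are centered; on raw data, the cosine similarity also reflects where the vectors sit relative to the origin.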
@chrisguiney Жыл бұрын
This video also does a good job highlighting how closely cosine similarity and the dot product are related. Unless I'm mistaken, that equation can be written dot(a, b) / (magnitude(a) * magnitude(b)), where magnitude(x) = sqrt(dot(x, x)).
@statquest Жыл бұрын
yep
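The formula in the comment above translates almost directly into code. A minimal sketch follows, using the commenter's names `dot` and `magnitude`; the `normalize` helper is an addition for illustration. It also shows that once both vectors are scaled to length 1, the plain dot product equals the cosine similarity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def magnitude(x):
    # Length of a vector, written in terms of the dot product
    return math.sqrt(dot(x, x))

def cosine_similarity(a, b):
    return dot(a, b) / (magnitude(a) * magnitude(b))

def normalize(x):
    """Scale x to length 1 (illustrative helper, not from the video)."""
    m = magnitude(x)
    return [v / m for v in x]

a = [3.0, 4.0]
b = [4.0, 3.0]

sim = cosine_similarity(a, b)                # 24 / (5 * 5)
sim_unit = dot(normalize(a), normalize(b))   # same value from unit vectors
print(sim, sim_unit)
```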
@dataanalyticswithmichael8931 Жыл бұрын
superb ! Thank you for the explanation
@statquest Жыл бұрын
Thanks!
@millennialm1money5008 ай бұрын
Great video 🎉
@statquest8 ай бұрын
Thank you 😁!
@marceloamado6223Ай бұрын
This video was the goat!
@statquestАй бұрын
Thanks!
@smegala3815 Жыл бұрын
Very useful 👍
@statquest Жыл бұрын
Thank you! :)
@Francescoct Жыл бұрын
Great video! Have you made one for the Word Embeddings?
@statquest Жыл бұрын
Coming soon!
@AxDhan Жыл бұрын
I'm a native spanish speaker, and it surprised me when it started speaking spanish, it will reach more people, but they will miss your motivating silly songs xD
@statquest Жыл бұрын
Thanks! Yeah - I'm not sure what to do about the silly songs. :)
@pouryajafarzadeh5610 Жыл бұрын
Cosine similarity is a good method for comparing the embedding vectors, especially for face recognition.
@statquest Жыл бұрын
Nice!
@gsp_admirador Жыл бұрын
nice easy explanation
@statquest Жыл бұрын
Thanks!
@mystmuffin3600 Жыл бұрын
Cool! (in StatQuest voice)
@statquest Жыл бұрын
bam! :)
@ericvaish4 ай бұрын
How am I able to understand this topic? Wasn't this supposed to be difficult? 😭 Seriously Great Explanation Josh.
@statquest4 ай бұрын
Thank you!
@CristianoGarcia10 Жыл бұрын
Excellent and clear video! I wonder why NLP applications use more often cosine distance rather than other metrics, such as euclidean distance. Is there a clear reason for that? Thanks in advance
@statquest Жыл бұрын
I'm not certain, but one factor might be how easy it is to compute (people often omit the denominator, making the calculation even easier), and it might be nice that the cosine similarity is always between -1 and 1 (between 0 and 1 for count data) and doesn't need to be normalized.
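One property that is easy to verify, and that may be part of the answer, is that cosine similarity ignores how long a text is while Euclidean distance does not. The sketch below uses made-up word counts over a hypothetical three-word vocabulary; it is an illustration of that property, not a claim about why any particular NLP system chose cosine similarity.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up word counts over a hypothetical 3-word vocabulary
short_text = [2, 1, 0]
same_text_twice = [4, 2, 0]   # identical word proportions, twice as long
different_text = [2, 0, 1]    # same length, one word swapped

# Cosine similarity treats the repeated text as identical in direction.
print(cosine_similarity(short_text, same_text_twice))
print(cosine_similarity(short_text, different_text))

# Euclidean distance (math.dist, Python 3.8+) says the repeated text
# is farther away than the text with a different word.
print(math.dist(short_text, same_text_twice))
print(math.dist(short_text, different_text))
```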
@fazelamirvahedi9911 Жыл бұрын
Thank you for making all of these informative, simple and precise videos. I wondered what happens if two phrases deliver the same meaning but have different orders of words, for instance: A) I like Gymkata. B) I really like Gymkata. In this case doesn't the extra adverb "really" in the second sentence disturb the phrase matrix? And one more question, if the three phrases have the same length and two of them have the same meaning but have used different words, like: A) I like Gymkata. B) I love Gymkata. C) I like volleyball. In this case, would the cosine similarity between A and B be more than A and C?
@statquest Жыл бұрын
In this video, we're simply counting the number of words that are the same in different phrases, however, you can use other metrics to calculate the cosine similarity, and that is often the case. For example, we could calculate "word embeddings" for each word in each phrase and calculate the cosine similarity using the word embedding values and that would allow phrases with similar meanings to have larger similarities. To learn more about word embeddings, see: kzbin.info/www/bejne/rJq9o4Kkf8ifj5I
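The word-count approach described in this reply can be tried on the phrases from the question. This is a minimal sketch: the tokenization (lowercase, split on spaces) and the helper `count_vector` are assumptions for illustration, and no word embeddings are involved.

```python
import math
from collections import Counter

def count_vector(phrase, vocabulary):
    """Count how often each vocabulary word appears in the phrase."""
    counts = Counter(phrase.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

phrases = {
    "A": "I like Gymkata",
    "B": "I love Gymkata",
    "C": "I like volleyball",
    "D": "I really like Gymkata",
}

# Shared vocabulary built from every word in every phrase
vocabulary = sorted({w for p in phrases.values() for w in p.lower().split()})
vectors = {name: count_vector(p, vocabulary) for name, p in phrases.items()}

sim_ab = cosine_similarity(vectors["A"], vectors["B"])
sim_ac = cosine_similarity(vectors["A"], vectors["C"])
sim_ad = cosine_similarity(vectors["A"], vectors["D"])
print(sim_ab, sim_ac, sim_ad)
```

With plain counts, A is exactly as similar to B as it is to C, because each pair shares two of three words; the counts know nothing about "like" and "love" meaning similar things. The extra word "really" lowers the similarity between A and D a little but does not break the comparison, and word order has no effect at all.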
@edmiltonpeixeira3221 Жыл бұрын
Congratulations on the content. Excellent explanation, like none I have found in any other video.
@statquest Жыл бұрын
Thank you very much! :)
@sciab36749 ай бұрын
thanks a lot. easy to understand
@statquest9 ай бұрын
Thanks!
@Sohy365b2 ай бұрын
Perfection! BAM
@statquest2 ай бұрын
Thank you!
@banibratamanna54468 ай бұрын
the generalized equation of cosine similarity comes from the dot product of 2 vectors in multidimension.....by the way big fan of yours❤
@statquest8 ай бұрын
scaled to be between -1 and 1. :)
@jonathanramos66909 ай бұрын
Amazing!!
@statquest9 ай бұрын
Thanks!
@gokulsubramanian3875Ай бұрын
BAM!!!!!! SUBSCRIBED FROM CHENNAI
@statquestАй бұрын
Thank you!
@cartulinito Жыл бұрын
Great video as we are used to.
@statquest Жыл бұрын
Thank you! :)
@lifeisbeautifu110 ай бұрын
Thank you!
@statquest10 ай бұрын
Thanks!
@nidhi_singh94949 ай бұрын
Hey... so the cosine depends only on the angle, not on the lengths. In the case where three Hellos were shown, how can the two sentences be distinguished if the similarity is the same for both?
@statquest9 ай бұрын
What time point, minutes and seconds, are you asking about?
@debatradas1597 Жыл бұрын
Thank you so much
@statquest Жыл бұрын
You're most welcome!
@SalahMusicOfficial Жыл бұрын
Hi Josh, I'm trying to understand why cosine similarity may be the best metric to find semantically similar texts (using pretrained embeddings). It sounds like the two vectors only have to be directionally similar for the cosine similarity to be high. What about using something like Euclidean or Manhattan distance? Would a distance metric be better for seeing if two texts are semantically similar?
@statquest Жыл бұрын
That's a good question and, to be honest, I don't know the answer. I do know, however, that most neural networks - when they use "attention" (like in transformers, which are used for ChatGPT) - just use the numerator of the cosine similarity as the "similarity metric". In other words, they just compute the dot-product. Maybe they do this because it's super fast, and the speed outweighs the benefits of using another, more sophisticated method. Also, it's worth noting that this is a similarity metric and not a distance. In other words, as the value goes up, things are "more similar" (the angle is smaller). In contrast, the Euclidean and Manhattan distances are...distances. That is, as the value goes up, the things are further away and considered "less similar" Lastly, cool music on your channel! You've got a dynamite voice.
@SalahMusicOfficial Жыл бұрын
@@statquest thank you! let me know if you need another voice in any of your intro jingles 😁
@statquest Жыл бұрын
@@SalahMusicOfficial bam!
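On the similarity-versus-distance point raised in this thread, there is one standard identity worth seeing in numbers: for vectors scaled to length 1, the squared Euclidean distance equals 2 minus 2 times the cosine similarity. The sketch below uses made-up vectors standing in for embeddings; it illustrates the identity only and makes no claim about how any specific model is implemented.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to length 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

# Made-up vectors standing in for two embeddings
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

cosine = dot(a, b)  # for unit vectors, the dot product is the cosine similarity
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))

print(cosine)
print(squared_distance, 2.0 - 2.0 * cosine)  # the two values agree
```

So once vectors are normalized, ranking by highest cosine similarity and ranking by smallest Euclidean distance give the same order; the two approaches differ only when vector length carries information.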
@ZOBAER496 Жыл бұрын
Can you please tell about some applications of cosine similarity like where is it used in which type of problems?
@statquest Жыл бұрын
I talk about that at the start of the video, but you can also use it whenever you want to compare two rows of data. For example, CatBoost uses it to compare predicted values for a bunch of data to their actual values.
@luizcarlosazevedo9558 Жыл бұрын
Hey, great video as always!! Is the cosine similarity good for regression problems in which the targets are pretty close to zero? Im trying to implement some accuracy metrics for a transformer model
@statquest Жыл бұрын
Hmm... I bet it would work (if you had a row of predictions and a row of known values).
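One caveat is worth checking before using cosine similarity as a regression metric: it compares direction only, so predictions that are uniformly too large or too small can still score a perfect 1. The following sketch uses made-up values; the mean squared error helper is included purely for contrast.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up targets close to zero, and predictions that are all twice too large
actual = [0.01, 0.02, 0.03]
predicted = [2.0 * value for value in actual]

print(cosine_similarity(actual, predicted))   # effectively 1.0
print(mean_squared_error(actual, predicted))  # greater than 0
```

If the scale of the predictions matters, cosine similarity is probably best paired with an error measure that is sensitive to magnitude.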
@jainanshu2000 Жыл бұрын
Great video! One question: how is this different from the regular string comparison we use in various programming languages?
@statquest Жыл бұрын
I'm not sure I understand your question. My understanding of string comparison in programming languages is that it just compares the bits to make sure they are equal and the result is a boolean True/False type thing.
@WorldwidenigespamАй бұрын
The cosine formula shows Ai times Bi. When working through the example, you add them. I know you only get the correct answer when you add them, so I thought it had been written incorrectly. However, searching the internet for the formula returns the same as that in the video. What am I missing?
@statquestАй бұрын
What time point, minutes and seconds, in the video are you asking about? It is possible you are missing the "summation" symbol, represented by the upper case Sigma (a greek character).
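The role of the summation symbol is easiest to see step by step: each pair Ai and Bi is multiplied first, and the products are then added together. A small sketch with made-up counts (not the numbers from the video):

```python
import math

# Made-up word counts for two phrases
a = [1, 1, 1]
b = [0, 1, 2]

# Step 1: multiply each pair (Ai times Bi)
products = [x * y for x, y in zip(a, b)]

# Step 2: the summation symbol means "add up all of those products"
numerator = sum(products)

# The denominator sums the squared values of each phrase separately
denominator = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))

similarity = numerator / denominator
print(products, numerator, similarity)
```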
@MrJ17J Жыл бұрын
Super interesting ! Do you have examples of how those are implemented in practice ?
@statquest Жыл бұрын
I talk about that at the start of the video, but it's also used by CatBoost to compare the predicted values for a bunch of samples to their actual values.
@Levy957 Жыл бұрын
you are amazing
@statquest Жыл бұрын
Thanks!
@Shehab-Codes Жыл бұрын
Thank you so much! I had no idea what cosine similarity is and you illustrated it easily, appreciate it. By the way, how can cosine similarity result in a negative number?
@statquest Жыл бұрын
The cosine similarity can be calculated for any 2 sets of numbers, and that can result in a negative value.
@Mrnafuturo Жыл бұрын
Does the cosine similarity equation end up being a vector normalization of the projection of one vector onto the other one?
@statquest Жыл бұрын
I believe that is correct.
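The projection reading can be verified with a small example. In this sketch, the scalar projection of a onto b (how far a reaches along b's direction) divided by the length of a gives the same number as the usual cosine formula; the vectors and the name `scalar_projection` are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def magnitude(v):
    return math.sqrt(dot(v, v))

a = [3.0, 4.0]
b = [1.0, 0.0]

# How far a extends along the direction of b
scalar_projection = dot(a, b) / magnitude(b)

# Dividing by the length of a normalizes the projection
cosine_from_projection = scalar_projection / magnitude(a)

# The usual cosine similarity formula, for comparison
cosine_direct = dot(a, b) / (magnitude(a) * magnitude(b))

print(scalar_projection, cosine_from_projection, cosine_direct)
```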
@samrasoli Жыл бұрын
useful, thanks
@statquest Жыл бұрын
Thanks!
@sushi666 Жыл бұрын
Can you please do Spherical K Means with Cosine Similarity as the distance metric?
@statquest Жыл бұрын
I'll keep that in mind.
@notjustanyuser2 ай бұрын
How can someone be so good at something! Thank you. I have bought a copy of your book "statquest_illustrated_guide_to_machine_learning" because I wanted to convey my gratitude. I am yet to go through the book (Just bought it!) but I am sure it would be awesome.
@statquest2 ай бұрын
Thank you very much!!! I really appreciate your support! :)
@notjustanyuser2 ай бұрын
@@statquest can you please do an episode on NMF
@statquest2 ай бұрын
@@notjustanyuser I'll keep that in mind, but it will probably be a long time before I can get to it.
@001kebede Жыл бұрын
how can we relate this with correlation between two continuous random variables?
Somewhere it says Cosine Similarity is a number between -1 and +1 but in other places it is said to be between 0 & 1. What is the truth?
@statquest4 ай бұрын
The cosine similarity can be between -1 and 1. If all the input data are positive (like they are in a bunch of the examples in this video, since we are just using count data, and count data is positive) then you'll be restricted to values between 0 and 1, but the data don't always have to be positive.
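The two ranges mentioned in this answer can be seen with a few made-up vectors: count data cannot be negative, so the similarity stays between 0 and 1, while vectors with negative entries can reach all the way down to -1.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Count data (never negative): the result lands between 0 and 1
counts_sim = cosine_similarity([2, 0, 1], [1, 3, 0])

# No words in common: the result is exactly 0
no_overlap_sim = cosine_similarity([1, 0], [0, 1])

# Signed data pointing in opposite directions: the result is -1
opposite_sim = cosine_similarity([1.0, -2.0], [-1.0, 2.0])

print(counts_sim, no_overlap_sim, opposite_sim)
```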
@shintaardani63329 ай бұрын
I am conducting sentiment analysis research and found that some data has a Cosine Similarity of 0. Are there any methods to make the Cosine Similarity not equal to 0?
@statquest9 ай бұрын
you could pad each phrase with something, so all phrases have at least one thing in common.
@shintaardani63329 ай бұрын
@@statquest Thank you so much😁
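The padding idea in this thread can be sketched as follows, assuming word-count vectors and a single hypothetical pad token appended to every phrase. Note that this shifts every similarity, not just the zeros, so whether it is appropriate depends on the analysis.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pad(vector):
    """Append a count of 1 for a hypothetical shared pad token."""
    return vector + [1]

# Two phrases with no words in common (made-up counts)
phrase_a = [1, 1, 0, 0]
phrase_b = [0, 0, 1, 1]

before = cosine_similarity(phrase_a, phrase_b)            # 0.0
after = cosine_similarity(pad(phrase_a), pad(phrase_b))   # 1/3

print(before, after)
```

A phrase compared with itself still scores 1 after padding, but all other pairs move toward each other, so padded and unpadded scores should not be mixed.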
@eddiesec Жыл бұрын
I still don't understand how that works for embeddings, though. Each embedding dimension should loosely represent a grammatical property of the words, so how can one word that is farther out than another in a single dimension (as in your Hello Hello Hello example) be considered identical?
@statquest Жыл бұрын
I'll do a video on embeddings soon.
@miltonborges7356 Жыл бұрын
Amazing
@statquest Жыл бұрын
Thanks!
@PromitiDasgupta-mz7uc Жыл бұрын
can i use cosine similarity for building a similarity matrix between two different brain regions?
@statquest Жыл бұрын
Probably.
@rajashreechakraborty74711 ай бұрын
Can you please help me with this? This is my data: A: cosine: 0.58, z-score: 372; B: cosine: 0.63, z-score: 370. How can I find the p-value/significance of the 0.05 change in the cosine similarities?