UMAP: Mathematical Details (clearly explained!!!)

38,365 views

StatQuest with Josh Starmer

1 day ago

Comments: 82
@statquest 2 years ago
To learn more about Lightning: github.com/PyTorchLightning/pytorch-lightning To learn more about Grid: www.grid.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@mohamedarebi7419 2 years ago
Can't wait to get all the nitty gritty details :)
@statquest 2 years ago
BAM! :)
@hansenmarc 2 years ago
I have to agree with you. Spectral embedding does sound very cool!
@statquest 2 years ago
bam!
@henriquefantinatti4601 2 years ago
Please make videos about time series. It would be amazing to see you talk about it. Love the videos!
@statquest 2 years ago
I'll keep that in mind.
@nourelislam8565 2 years ago
Amazing! We actually need a video explaining the different normalization methods used in scRNA-seq analysis, especially SCTransform. Appreciate your support, thanks!
@statquest 2 years ago
Thank you!
@dataanalyticswithmichael8931 2 years ago
Finally! I like mathematical things, and this video popped up in my recommendations.
@statquest 2 years ago
Hooray!
@mohamedamr8081 2 years ago
Hey Josh, thanks for the videos and the great content! I was wondering if you can make a video about causal inference, that would be great (for me lol), thanks.
@statquest 2 years ago
I'll keep that in mind.
@utkarshtrehan9128 2 years ago
Level: God level! Could we have a video on factor analysis?
@statquest 2 years ago
I'll keep that in mind! :)
@venkateshmunagala205 2 years ago
Great video. Thanks, Josh!
@statquest 2 years ago
Thanks!
@mariapelaez3758 1 year ago
Hi Josh, I saw both of your UMAP videos, but I have a doubt about how you made the low-dimensional graph. I know you mention spectral embedding, and my understanding is that you calculate the Laplacian of the graph, get its eigenvalues and eigenvectors, and then the coordinates in the new dimension are the values of the eigenvector for the lowest eigenvalue (ignoring zero eigenvalues). But when I try that on your data, I can't reproduce the values you showed, so I wanted to know if you did something different. Also, for more than 1 dimension I would use more than 1 eigenvector, right? For the 2D case, the x-y coordinates would be the values of the eigenvectors for the two lowest (nonzero) eigenvalues. Thanks!
@statquest 1 year ago
To be honest, I just drew the low-dimensional graph in a way that I thought would best highlight how UMAP works, rather than stay faithful to how spectral embedding would have projected the points. In other words, I completely ignored spectral embedding when I drew the low-dimensional graph and only took pedagogical aspects into consideration. I'm sorry if this caused confusion. :(
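For readers curious what spectral embedding would actually produce, here is a minimal sketch of the procedure described in the question above (this is not the code behind the video, which, as the reply notes, was drawn by hand): build the unnormalized graph Laplacian L = D - A from a symmetric similarity matrix A, then use the eigenvectors for the smallest nonzero eigenvalues as the low-dimensional coordinates. The 4-point similarity matrix is made up for illustration.

```python
import numpy as np

def spectral_embedding(adjacency, n_components=2):
    """Embed a graph via eigenvectors of the unnormalized Laplacian L = D - A."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # Skip the trivial constant eigenvector(s) with (near-)zero eigenvalue
    first = np.argmax(eigenvalues > 1e-8)  # index of first nonzero eigenvalue
    return eigenvectors[:, first:first + n_components]

# Tiny made-up similarity graph: two tight pairs (0,1) and (2,3)
# joined by weak cross-links
A = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
coords = spectral_embedding(A, n_components=1)
print(coords.ravel())  # the two tight pairs land on opposite sides of 0
```

For a 2D embedding you would take `n_components=2`, i.e. the eigenvectors of the two smallest nonzero eigenvalues, exactly as the question suggests.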
@AHMADKELIX 1 year ago
Hi Josh, thanks for your explanation!
@statquest 1 year ago
My pleasure!
@cahbe6108 2 years ago
Thank you for the great video! One question: What happens if there are multiple closest neighbors with the same distance? Then there will be multiple similarity scores = 1. Then changing sigma might not help to get close to log2(num_neighbors) for the sum of similarities.
@statquest 2 years ago
I'm not sure what the technical details are exactly, but I would guess it simply finds the value for sigma that gets the sum closest to the ideal value. It doesn't have to be exact.
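That guess matches how this is usually implemented: a binary search on sigma that simply stops when the sum of similarity scores is as close as it can get to log2(num_neighbors). With tied nearest neighbors, every tied point scores 1 regardless of sigma, so the search just converges to the closest achievable sum. A minimal sketch with made-up distances (not the actual umap-learn code, which adds extra tolerances):

```python
import numpy as np

def find_sigma(distances, n_neighbors, n_iter=64):
    """Binary-search sigma so the sum of similarity scores is as close as
    possible to log2(n_neighbors). distances must be the sorted distances
    from one point to its n_neighbors nearest neighbors."""
    target = np.log2(n_neighbors)
    rho = distances[0]                # distance to the nearest neighbor
    lo, hi = 1e-12, 1e6
    for _ in range(n_iter):
        sigma = (lo + hi) / 2.0
        score_sum = np.sum(np.exp(-(distances - rho) / sigma))
        if score_sum > target:        # scores too big -> shrink sigma
            hi = sigma
        else:                         # scores too small -> grow sigma
            lo = sigma
    return sigma

d = np.array([0.5, 1.2, 2.0, 3.1])   # made-up sorted neighbor distances
sigma = find_sigma(d, n_neighbors=4)
scores = np.exp(-(d - d[0]) / sigma)
print(sigma, scores.sum())           # the sum lands very close to log2(4) = 2
```

The sum of scores increases monotonically with sigma (from 1 toward n_neighbors), so the bisection always homes in on the best achievable value even when the target cannot be hit exactly.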
@amalnasir9940 2 years ago
Hi Josh, Spectral clustering is an interesting topic. I hope you cover it one day!
@statquest 2 years ago
I'll keep that in mind! :)
@QwakeRunner 2 years ago
Hi Josh! Would you mind showing us how this can be done in Python? I would be very happy to buy a template script from you and support your great work!
@statquest 2 years ago
That's a good idea and I'll keep that in mind.
@anitat9727 2 years ago
@statquest I'd love this too
@AyushSharma-tg5co 2 years ago
Sir, I'm aiming for the data analyst domain, so please tell me which of your playlists I should follow to cover the stats part.
@statquest 2 years ago
You can just start at the top of this page and work your way down: statquest.org/video-index/
@ass_im_aermel 2 years ago
Hi Josh! Is it possible for you to make some videos on time-series topics like serial correlation or the Box-Jenkins method? And thank you for all the videos you've made over the last few years. They are awesome :-)
@statquest 2 years ago
I'll keep those topics in mind.
@ieserbes 1 year ago
Hello Josh, amazing explanation. Thank you so much. Where does 2.1 (the low-dimensional distance between a and b) come from? I am a bit lost there.
@statquest 1 year ago
If you look at the number line that the points are on, you'll see that point 'b' is at about 1.8 and point 'a' is at about 3.9. Now we just do the math: 3.9 - 1.8 = 2.1. Bam.
@ieserbes 1 year ago
@statquest Hooray 😄 Thank you, Josh.
@grinps 2 years ago
waiting for the BAM!
@statquest 2 years ago
BAM! :)
@xavisolersanchis7145 1 year ago
Hi, thank you very much! Just one question: in your case you could compute the initial distances between the data points as Euclidean distances because you are only working with two features. How are they computed when you have many more features? Do you always start with Euclidean distances?
@statquest 1 year ago
The Euclidean distance works for more than 2 features, en.wikipedia.org/wiki/Euclidean_distance so there's no problem adding more features. That said, if you wanted to use a different distance metric, it would probably be OK.
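To make that concrete, the formula just gains one squared-difference term per extra feature. A tiny sketch with made-up 4-feature points:

```python
import numpy as np

# Euclidean distance generalizes to any number of features:
# d(a, b) = sqrt(sum_i (a_i - b_i)^2)
a = np.array([1.0, 2.0, 3.0, 4.0])   # a made-up point with 4 features
b = np.array([5.0, 6.0, 7.0, 8.0])

dist = np.sqrt(np.sum((a - b) ** 2))
same = np.linalg.norm(a - b)          # identical result via the norm helper
print(dist)                           # sqrt(4 * 16) = 8.0
```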
@xavisolersanchis7145 9 months ago
@statquest Thank you very much! I also realized there is a little error in the video, in the part where you say the result seems strange to you. To compute the symmetrical score they don't take what you call the "similarity scores". Instead, they iterate over the nearest neighbors y of each point x and compute a distance as the maximum of 0 and d(x, y) minus the distance from x to its nearest neighbor, all divided by the previously learned sigma. Then they take exp(-dist) to get a similarity score between x and y (stored as that element's membership probability in the fuzzy set). Once you have the whole fuzzy set, you apply the t-conorm you mention over these similarity scores to get the symmetrical score. I hope this is helpful, and I also hope I'm not wrong hehe. Thank you very much!
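The score described in this comment matches the form used in the UMAP paper: exp(-max(0, d - rho) / sigma), where rho is the distance to the nearest neighbor. A small sketch with hypothetical values for rho and sigma (in practice they come from the nearest-neighbor search and the sigma search):

```python
import numpy as np

def similarity(d_xy, rho_x, sigma_x):
    """Similarity from x to neighbor y, as described above:
    exp(-max(0, d(x, y) - rho_x) / sigma_x), where rho_x is the
    distance from x to its nearest neighbor."""
    return np.exp(-max(0.0, d_xy - rho_x) / sigma_x)

rho, sigma = 0.5, 1.0                # hypothetical, for illustration only
print(similarity(0.5, rho, sigma))   # the nearest neighbor scores exactly 1.0
print(similarity(2.0, rho, sigma))   # farther neighbors decay, exp(-1.5) ≈ 0.223
```

Note the subtraction of rho: thanks to max(0, ...), each point's nearest neighbor always gets similarity 1, no matter how far away it is in absolute terms.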
@Pedritox0953 2 years ago
Great video!
@statquest 2 years ago
Thanks!
@alexmiller3260 2 years ago
Hey, Josh! Can you make video(s) about likelihood and MLE for an unknown distribution, when it can't be easily approximated, or can't be approximated at all? Everyone talks about well-known distributions, but there's almost nothing about working with something unknown. Of course, it would be even better if you made a playlist covering unknown distributions, but a couple of videos would also be OK.
@statquest 2 years ago
I'll talk about this topic when we cover Bayesian statistics. That said, even when the distribution is unknown, the central limit theorem ( kzbin.info/www/bejne/j3LPe3Z7ea1lq7s ) results in a known (gaussian) distribution. And that means it doesn't matter what the original distribution is.
@ouryly1541 11 months ago
Amazing!! Thank you
@statquest 11 months ago
Thank you too!
@conlele350 2 years ago
Hi Josh, could you please take some time to explain Markov decision processes, their applications in ML, and how to code them in R? Thanks!
@statquest 2 years ago
I'll keep those topics in mind.
@yeah6732 2 years ago
Thank you!!
@statquest 2 years ago
bam! :)
@3Mus-cat-tears 1 year ago
Hi Josh! Thank you for the info! It's really helpful. What should I do if I have zeros or NAs in my dataset? I couldn't find anything on imputation before UMAP on Google :(
@statquest 1 year ago
There might not be a UMAP-specific imputation method, so if you just search for imputation methods in general, you might find something that works.
@lukesimpson1507 2 years ago
Hi Josh, I was wondering how you feel about me using some stills from your channel to explain these types of plots prior to displaying them? This would be done in an educational setting, and I would credit the channel and provide a link, if that is okay?
@statquest 2 years ago
As long as you provide the link, it's fine with me.
@lukesimpson1507 2 years ago
@statquest Thank you! Keep up the great videos!
@ChocolateMilkCultLeader 2 years ago
The day you do Hilbert Curves will be a lot of BAMMMMMSSSSSSS
@statquest 2 years ago
I'll keep that in mind! :)
@muriloaraujosouza462 2 months ago
In the "making the scores symmetrical" part, the operation is done even if the scores are already symmetrical. For example, the score from A->B was the same as the score from B->A, and the score from A->C was the same as the score from C->A. Any ideas why it is necessary to perform this operation on already-symmetrical scores?
@statquest 2 months ago
Probably not, and it is possible that efficient implementations avoid the extra math.
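One detail worth noting: the t-conorm UMAP's paper uses for symmetrization is a + b - a*b, and this is not a no-op even when the two directed scores are already equal; it only leaves a score unchanged when one of them is 0 (or both are 1). A quick sketch:

```python
def fuzzy_union(p_ab, p_ba):
    """UMAP's symmetrization (probabilistic t-conorm): a + b - a*b."""
    return p_ab + p_ba - p_ab * p_ba

# Even when the two directed scores already agree, the union changes them:
print(f"{fuzzy_union(0.6, 0.6):.2f}")  # 0.6 + 0.6 - 0.36 = 0.84, not 0.6
# It leaves a score unchanged when the other direction is 0:
print(fuzzy_union(0.6, 0.0))           # 0.6
```

So applying the operation uniformly to every pair is not just an implementation convenience; symmetric pairs still come out of it with a different (higher) score.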
@miguelcampos867 2 years ago
Would be great talking about normalizing flows
@statquest 2 years ago
I'll keep that in mind.
@miguelcampos867 2 years ago
@statquest Thanks. In fact, that's the main reason why I am watching all your videos on likelihoods, the Gaussian distribution, etc. jajaja (they are great btw)
@harryhamjaya 1 year ago
Hello, could you cover the spectral embedding topic? Also, I have a question regarding UMAP: how is it possible that UMAP is faster than t-SNE? In which part does it beat t-SNE? Logically, t-SNE moves more points at a time compared to UMAP(?). Correct me if I'm wrong, and BTW we are hoping for new updates ❤
@statquest 1 year ago
I talk about why UMAP is faster than t-SNE in my other UMAP video here: kzbin.info/www/bejne/m3-TqHmwd6ZnicU
@markmalkowski3695 2 years ago
Excellent! (Mr. Burns style)
@statquest 2 years ago
Thank you!
@meichendong3434 2 years ago
Love it!
@statquest 2 years ago
Thank you!
@veenakumar2384 2 years ago
How do I know which is the best curve when plotting similarity scores (y-axis) against the distances to the clustered points (x-axis)?
@statquest 2 years ago
When the sum of the similarity scores = log2(num_neighbors).
@buffetCodes 2 years ago
BAM!
@statquest 2 years ago
:)
@naman4067 2 years ago
Nice
@statquest 2 years ago
:)
@annaluizavicente 2 years ago
Hi!! Could you please make a video about PLSDA? Thanks!
@statquest 2 years ago
I'll keep that in mind.
@annaluizavicente 2 years ago
@statquest Thanks! You are the best
@ridwanwase7444 2 years ago
Hey, do you know Pranab K. Sen? He is Bengali and I'm Bengali too! I'm a statistics undergrad.
@statquest 2 years ago
Nice!
@THEMATT222 2 years ago
Noice 👍
@statquest 2 years ago
:)
@THEMATT222 2 years ago
@statquest :D