To learn more about Lightning: github.com/PyTorchLightning/pytorch-lightning
To learn more about Grid: www.grid.ai/
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@EthanSalter32 жыл бұрын
This is such perfect timing, I'm supposed to learn and perform a UMAP reduction tomorrow. Thank you!
@statquest2 жыл бұрын
BAM! :)
@Dominus_Ryder2 жыл бұрын
You should buy a couple of songs to really show your appreciation!
@evatosco-herrera89782 жыл бұрын
I just found this channel. I'm currently doing my PhD in Bioinformatics and this is helping me immensely to save a lot of time and to learn new methods faster and better (I have a graphical brain so :/) Thank you so much for this!!
@statquest2 жыл бұрын
Good luck with your PhD! :)
@codewithbrogs38096 ай бұрын
After three days of coming back to this video, I think I finally got it... Thanks Josh. When I'm in a place to support, I will
@statquest6 ай бұрын
Bam!
@codewithbrogs38096 ай бұрын
DOUBLE BAM
@terezamiklosova1042 жыл бұрын
I really appreciated the UMAP vs t-SNE part. Thanks for the video! Really helpful when one tries to get the main idea behind all the math :)
@statquest2 жыл бұрын
Thank you very much! :)
@smallnon-codingrnabioinfor3792 Жыл бұрын
I totally agree! The part starting at 16:10 is worth going back to! Thanks a lot for this great and simple explanation!
@JulietNovember92 жыл бұрын
New StatQuest always gets me amped. High yield, low drag material!!!
@statquest2 жыл бұрын
Awesome!!!
@abramcadabros17552 жыл бұрын
Wowie, I can finally learn what UMAP stands for and how it reduces dimensionality AFTER I analysed my scRNA-seq data with its help!
@statquest2 жыл бұрын
BAM!
@offswitcher31592 жыл бұрын
Great video, thank you! You have been with me since my first semester and I am so happy to see a video by you on a topic that is relevant to me.
@statquest2 жыл бұрын
Awesome!
@VCC13162 жыл бұрын
I'd love to see a cross-over episode between StatQuest and Casually Explained. Big bada-bam.
@statquest2 жыл бұрын
:)
@rajanalexander49496 ай бұрын
Great video; especially liked the echo on the full exposition of 'UMAP' 😂
@statquest6 ай бұрын
:)
@danli18632 жыл бұрын
I must say this channel is amazing! I must say this channel is amazing! I must say this channel is amazing! Important things 3 times. :)
@statquest2 жыл бұрын
TRIPLE BAM! :)
@whitelady10632 жыл бұрын
Best comment section on YouTube. Also now I get why people at the office won't stop praising you. BAM!
@statquest2 жыл бұрын
Thank you! :)
@AmandaEstevamCarvalho5 ай бұрын
He explains it as if I were clueless. That's the only way I could understand it, thank you!
@statquest5 ай бұрын
Thank you very much! :)
@saberkazeminasab6142 Жыл бұрын
Thanks so much for the great presentation!
@statquest Жыл бұрын
Glad you enjoyed it!
@shubhamtalks97182 жыл бұрын
Yayy. I was waiting for it.
@statquest2 жыл бұрын
bam!
@emiyake7 ай бұрын
A PaCMAP dimension-reduction explanation video would be very appreciated!
@statquest7 ай бұрын
I'll keep that in mind.
@Littlemu22y11 күн бұрын
your videos are fantastic
@agentgunnso4 ай бұрын
Thank you so much!!! Love the sound effects and the jokes
@statquest4 ай бұрын
Glad you like them!
@abdoualgerian5396 Жыл бұрын
With this amazing way of explaining, please consider doing a Deep TDA quest, starting with the paraparapepapara funny thing instead of the songs
@statquest Жыл бұрын
Noted
@siphosakhemkhwanazi60424 ай бұрын
The intro made me subscribe 😂😂
@statquest4 ай бұрын
bam! :)
@veronicacastaneda62742 жыл бұрын
Hey! I love your videos! Can you do one on Weighted correlation network analysis? I share your videos with my friends and we want to learn about it :)
@statquest2 жыл бұрын
I'll keep that in mind.
@meenak72211 ай бұрын
Thank you very much!
@statquest11 ай бұрын
You're welcome!
@RelaxingSerbian2 жыл бұрын
Your little intros are so silly and charming! ^_^
@statquest2 жыл бұрын
Thank you!
@THEMATT2222 жыл бұрын
New video!!!! Very Noice 👍
@statquest2 жыл бұрын
BAM!!!
@sumangare1804 Жыл бұрын
Thank you for the explanation! If possible, could you do a video on the HDBSCAN algorithm?
@statquest Жыл бұрын
I'll keep that in mind.
@MegaNightdude2 жыл бұрын
Great content. As always!
@statquest2 жыл бұрын
Thank you!
@kiranchowdary81002 жыл бұрын
ROCKINGGGG!!!! As always.
@statquest2 жыл бұрын
Thanks!
@nbent46077 ай бұрын
Thank you!!
@statquest7 ай бұрын
You're welcome!
@floopybits80372 жыл бұрын
Thank you so much for this video
@statquest2 жыл бұрын
Most welcome 😊!
@paulclarke45482 жыл бұрын
Great video! Thank you!! Do you have any plans to clearly explain Generative Topographic Mapping (GTM)? I'd love that!
@statquest2 жыл бұрын
Not right now, but I'll keep it in mind.
@Pedritox0953 Жыл бұрын
Great video!
@statquest Жыл бұрын
Thanks!
@spenmop3 ай бұрын
Your videos are awesome! They make things so much clearer! But I have a couple of questions: How do you handle the situation where a point has many identical points (i.e., high-dim distance = 0)? How do you calculate sigma_i? For example, if k = 10 but 7-8 of the neighbours are duplicates with Dij = 0, then sigma_i is undefined. Do I de-duplicate the data first and then add it back in at the end? And symmetrizing: Wij' = Wji' = Wij + Wji - Wij x Wji, yes? But aren't Wij and Wji only calculated for neighbours of i and j? What happens if Wij exists but Wji does not? Do I add i as another neighbour of j? (But then j would have more than k neighbours.) I'm so confused.
@statquest3 ай бұрын
To be honest, I would just try UMAP out and see what it does. It could treat duplicate points as a single point or do something else.
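On the symmetrization question above, one common reading (an assumption on my part, not something stated in the video) is that a missing direction simply counts as a weight of 0, so no extra neighbors ever need to be added. A minimal NumPy sketch with made-up toy weights:

```python
import numpy as np

# Directed neighbor weights: W[i, j] > 0 only when j is one of i's
# k nearest neighbors; every other entry is 0. (Toy numbers.)
W = np.array([
    [0.0, 1.0, 0.6],
    [1.0, 0.0, 0.0],   # W[1, 2] = 0: point 2 is not a neighbor of point 1
    [0.0, 0.7, 0.0],   # ...but point 1 IS a neighbor of point 2
])

# Fuzzy-union symmetrization: W'_ij = W_ij + W_ji - W_ij * W_ji.
# Treating the missing direction as 0 gives W'_ij = W_ji in that case.
W_sym = W + W.T - W * W.T

print(W_sym[1, 2], W_sym[2, 1])   # both 0.7
```

With this convention the "Wij exists but Wji does not" case resolves itself: the symmetrized weight is just the one direction that exists, and neither point gains additional neighbors.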
@junaidbutt30002 жыл бұрын
Hey Josh, great work as always! This StatQuest came at a great time for me because I've been looking into UMAP myself. I had a few questions; apologies if they're covered in the mathematical details video:
1. Is there an additional constraint on the curve used to compute the high-dimensional similarity scores to make the scores what they are? In the example where you computed the distance of points B and C relative to A, you had 1.0 and 0.6. This is because the scores must sum to 1.6. But why not 1.3 and 0.3, or 1.59 and 0.01? Is there an additional consideration which locks them to be 1.0 and 0.6?
2. Will there be an explanation about spectral embedding? This may be outside the scope of the video, but I thought I'd ask!
3. Could you please check my understanding of what is happening when we move point D closer to point E? The discussion starts at 14:48 in the video. As I understand it, moving D closer to E (which we want) also moves D closer to C (which we don't want). So we compute a tradeoff and find that the cost of moving D closer to C is lower than the benefit of moving D closer to E; therefore we move D toward E. Is this correct? If so, is there an equation or rule that quantifies this so we can determine the exact distance to move D toward E?
I suspect most of these are covered in the mathematical details follow-up video, but I thought I'd ask just in case they aren't.
@statquest2 жыл бұрын
1) You'll see the answer to this in the follow-up video. However, to give you a head start: the similarity score for the closest point is always 1, and this limits what the score for the second point can be (since we only have 2 points as neighbors). 2) Unfortunately I'm not going to dive into spectral embedding (not yet, at least!) 3) Your understanding is correct, and you'll see the equation that makes this work in the follow-up video (which will be available very soon!)
@ashfaqueazad38972 жыл бұрын
It would be great if you did some videos on sparse data, if you get the time. Would love it. Thanks.
@statquest2 жыл бұрын
I'll keep that in mind.
@cytfvvytfvyggvryd2 жыл бұрын
Thank you for your terrific video! If you have time, could you make a video about densMAP? Again, I appreciate your wonderful work! Thank you!
@statquest2 жыл бұрын
I'll keep that in mind.
@Friedrich7132 жыл бұрын
Great quest, Josh! First time I noticed the fuzzy parts on the circles and arrows. What tool are you using to make the slides? Looks damn fine!
@statquest2 жыл бұрын
Thanks! I draw everything in keynote.
@LazzaroMan2 жыл бұрын
Love you
@statquest2 жыл бұрын
Thank you!
@AU-hs6zw2 жыл бұрын
Thanks!
@statquest2 жыл бұрын
bam! :)
@lamourpaspourmoi Жыл бұрын
Thank you! Could you do one with self organizing maps?
@statquest Жыл бұрын
I'll keep that in mind.
@rajankandel8354Ай бұрын
12:44 Why does UMAP decide to move point e farther from b? Is it because the similarity score is zero?
@statquestАй бұрын
At 12:44 we move 'b' further from 'e' because they were in different clusters in the high dimensional space.
@pranilpatil41092 ай бұрын
But how can we separate those clusters? We need cluster centroids for that.
@statquest2 ай бұрын
UMAP isn't a clustering method, it's a dimension reduction method. If you want to find clusters, try DBSCAN: kzbin.info/www/bejne/iHW9hpeIiKmCpc0
@mericknal8752 Жыл бұрын
echoing UMAP part is amazing 😂
@statquest Жыл бұрын
Thanks! :)
@hiankun2 жыл бұрын
The big picture is ❤️ 😃
@statquest2 жыл бұрын
You got it! BAM! :)
@indolizacja98292 ай бұрын
Have you considered comparing UMAP and Concordex? :)
@statquest2 ай бұрын
Not yet.
@ranjit94272 жыл бұрын
Can you make some videos on recommender systems??
@4wanys2 жыл бұрын
complete list for recommender systems kzbin.info/aero/PLsugXK9b1w1nlDH0rbxIufJLeC3MsbRaa
@statquest2 жыл бұрын
I hope to soon!
@joejohnoptimus5 ай бұрын
How does UMAP identify these initial clusters to begin with?
@statquest5 ай бұрын
You specify the number of neighbors. I talk about this at various times, but 17:18 would be a good review.
@김광우-w8m2 жыл бұрын
I have a question. After moving d closer to e, do we still consider moving d to c? Or, would c be moved to d? The direction in the video confuses me.
@statquest2 жыл бұрын
When we move 'd', we consider both 'e' and 'c' at the same time. In this case, moving 'd' closer to 'e' and closer to 'c' will increase the neighbor score for 'e' a lot but only increase the score for 'c' a little, so we will move 'd'. For details, see: kzbin.info/www/bejne/oKXLZZ57q69mhpo
@ammararazzaq1322 жыл бұрын
As PCA requires correlation between features to find new principal components, does the UMAP approach require correlation between features to project data onto a lower-dimensional space?
@statquest2 жыл бұрын
no
@ammararazzaq1322 жыл бұрын
@@statquest So we can still see clusters even when data is not correlated?
@statquest2 жыл бұрын
@@ammararazzaq132 That I don't know. All I know is that UMAP does not assume correlations.
@ammararazzaq1322 жыл бұрын
@@statquest Okay thankyou. I will look into it a bit more.
@Chattepliee2 жыл бұрын
I've read that UMAP is better at preserving inter-cluster distance information relative to tSNE, what do you think? Is it reasonable to infer relationships between clusters on a UMAP graph? I try to avoid doing so with tSNE.
@statquest2 жыл бұрын
To be honest, it probably depends on how you configure the n_neighbors parameter. However, to get a better sense of the differences (and similarities) between UMAP and t-SNE, see the follow up video: kzbin.info/www/bejne/oKXLZZ57q69mhpo
@samggfr2 жыл бұрын
Concerning distance information, initialization and parameters are important. Read "The art of using t-SNE for single-cell transcriptomics" pubmed.ncbi.nlm.nih.gov/31780648/ and "Initialization is critical for preserving global data structure in both t-SNE and UMAP" dkobak.github.io/pdfs/kobak2021initialization.pdf
@ali-om4uv2 жыл бұрын
How does UMAP know which high-dimensional data point belongs to which cluster?
@statquest2 жыл бұрын
The similarity scores.
@gama31812 жыл бұрын
Hi-dimentional BAAAMM!
@statquest2 жыл бұрын
I love it! BAM! :)
@juanete69 Жыл бұрын
But how do you "decide" that a cluster is a distant cluster? PS: I guess you consider a point as a distant point if it's not among the k neighbors.
@statquest Жыл бұрын
correct
@juanete69 Жыл бұрын
@@statquest But do you keep "adding" new points to the cluster if they are within the k neighbors of the next point, and so on? Or in order to define the cluster you only consider the k neighbors of the first point?
@statquest Жыл бұрын
@@juanete69 We start with a single point. If it has k neighbors, we call it a cluster and add the neighbors to the cluster. Then, for each neighbor that has k neighbors, we add those neighbors, and repeat until the cluster is surrounded by points that have fewer than k neighbors.
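The grow-the-cluster description above can be sketched as a breadth-first walk over a k-nearest-neighbor graph. This is a hypothetical illustration of the idea, not UMAP's actual code; the graph and helper name are made up:

```python
from collections import deque

def expand_cluster(start, neighbors, k):
    """Grow a cluster from `start`: keep adding neighbors of points
    that themselves have at least k neighbors; points with fewer
    than k neighbors join the cluster but stop the expansion."""
    cluster = {start}
    frontier = deque([start])
    while frontier:
        p = frontier.popleft()
        if len(neighbors[p]) < k:
            continue  # boundary point: included, but not expanded past
        for q in neighbors[p]:
            if q not in cluster:
                cluster.add(q)
                frontier.append(q)
    return cluster

# Two separate groups in a toy neighbor graph:
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
print(expand_cluster(0, graph, k=2))   # {0, 1, 2} -- never reaches 3 or 4
```

Points 3 and 4 have only one neighbor each, so a cluster started at 0 never grows across to them, matching the "surrounded by points that have fewer than k neighbors" stopping rule.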
@TheEbbemonster2 жыл бұрын
Seems very convoluted compared to K-means or hclust.
@statquest2 жыл бұрын
UMAP uses a weighted clustering method, so that points that are closer together in high-dimensional space will get higher priority to be put close together in the low dimensional space.
@AHMADKELIX Жыл бұрын
Permission to learn, sir
@statquest Жыл бұрын
:)
@sapito1692 жыл бұрын
I think he will sing the whole video XD
@statquest2 жыл бұрын
:)
@connorfrankston5548 Жыл бұрын
Thanks, I appreciate the information. However, I think your videos would be easier to watch with a reduction of the "bam" dimension.
@statquest Жыл бұрын
Noted!
@dummybro4992 жыл бұрын
Don't say bam....!! It irritates
@statquest2 жыл бұрын
noted
@maburwanemokoena71172 ай бұрын
@@dummybro499 Double Bam!!!
@ghazalehgolmohammadnezhadk5307Ай бұрын
@@statquest I like it though
@aiexplainai22 жыл бұрын
I can't express how much this channel has helped me - so clearly explained!!
@statquest2 жыл бұрын
Thank you very much! :)
@markmalkowski36952 жыл бұрын
This is awesome, thanks for explaining UMAP so well, and clearly explaining when to use! Love the topics you’re covering
@statquest2 жыл бұрын
Thank you!
@MinsangKim-n1z2 ай бұрын
Hello Josh, thank you so much for the amazing video! I have a question about the mapping consistency of UMAP. In the video, UMAP keeps its mapping consistent (meaning the mapping does not change over the iterations) when we map the projected points onto the low-dimensional plane based on the high-dimensional similarity scores, unlike t-SNE. My question is: that doesn't necessarily mean the final visualization result will be consistent every time, right? Because there is randomized sampling, I don't think the final result would be consistent. I tried it using the umap-learn lib and the result was indeed inconsistent. I'm not sure I explained my question well, but please feel free to tell me if there are any ambiguous points. Thank you and have a nice day :)
@statquest2 ай бұрын
The only way to get the exact same graph every time is to set the random seed right before you use UMAP. Although it has less randomness than t-SNE, it still has some randomness.
@davidhodson6680 Жыл бұрын
Adding a comment for the cheery ukulele song at the start, I like it.
@statquest Жыл бұрын
Thank you! :)
@brucewayne67442 жыл бұрын
Amazing video!! Hope there is a statquest on ICA coming soon :)
@statquest2 жыл бұрын
One day...
@grace6228j2 жыл бұрын
Thanks for your amazing video! I am a little bit confused: it seems that UMAP is able to do clustering (based on the similarity scores) and dimensionality-reduction visualization at the same time, so why do researchers usually only use UMAP for visualization?
@statquest2 жыл бұрын
That's a great question. I guess the big difference between UMAP and a clustering algorithm is that usually a clustering algorithm gives you a metric to determine how good or bad the clustering is. For example, with k-means clustering, we can compare the total variation in the data for each value for 'k'. In contrast, I'm not sure we can do that with UMAP.
@kennethm.49982 жыл бұрын
Dude... Dude... You have a gift for explaining stats. Superb.
@statquest2 жыл бұрын
Thank you!
@akashkewar2 жыл бұрын
Not sure if I can hold my breath long enough before the video starts. Amazing work, @StatQuest!!
@statquest2 жыл бұрын
Thanks!!
@dexterdev2 жыл бұрын
I was waiting for this. thank you. best dimensionally reduced visual explanation out there.
@statquest2 жыл бұрын
Thank you very much! :)
@gergerger532 жыл бұрын
Great video (as always). You might want to calm it down with the BAMs though. It used to be quirky and fun but having them literally every minute or two is a bit much and forced. Your video creation skills are seriously awesome. I wish I had even half your skills at making these concepts accessible for the YT audience. 👏
@statquest2 жыл бұрын
Noted
@dataanalyticswithmichael89312 жыл бұрын
Nice explanation, I want to use this as a reference for my projects
@statquest2 жыл бұрын
Bam! :)
@cssensei6102 жыл бұрын
can you cover Locality Sensitive Hashing, and do a clustering implementation in PySpark
@statquest2 жыл бұрын
I'll keep that in mind.
@user-hg4jk2q4 ай бұрын
This will help me greatly for my MS project.
@statquest4 ай бұрын
Good luck!
@rajankandel8354Ай бұрын
13:27 How do you derive the t-distribution fit?
@statquestАй бұрын
That question, and other details, are answered in the "details" video: kzbin.info/www/bejne/oKXLZZ57q69mhpo
@prashantsharma-sr5dl7 ай бұрын
How did the low-dimensional plot come about right after the similarity scores?
@statquest6 ай бұрын
At 4:14 I talk about how the main idea is that we start with an initial (somewhat random) low dimensional plot that we then optimize based on the high dimensional similarity scores.
@alexlee35116 ай бұрын
Is the complicated dataset you're referring to one that cannot be explained by one or two PCs?
@statquest6 ай бұрын
yep
@samuelivannoya2672 жыл бұрын
You are amazing!! Thanks!!!
@statquest2 жыл бұрын
Thank you!
@franziskakaeppler56022 ай бұрын
Thank you for this great video. I have a question about 8:21: Why are the similarity scores 1.0 and 0.6? Could they just as well be, e.g., 0.9 and 0.7?
@statquest2 ай бұрын
I'm sorry for the confusion. There's an important detail that I should have included in this video, and not just the follow up that shows the mathematical details ( kzbin.info/www/bejne/oKXLZZ57q69mhpo ): the nearest point always has a similarity score of 1.
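To make the "closest point always scores 1" rule concrete, here is a rough sketch of the high-dimensional score UMAP uses, w = exp(-(d - rho) / sigma), where rho is the distance to the nearest neighbor. The distances and sigma below are made up for illustration (in real UMAP, sigma is tuned per point):

```python
import numpy as np

def hi_dim_scores(dists, sigma):
    """UMAP-style similarity scores for one point's neighbors (sketch).
    dists: distances to the k nearest neighbors, closest first.
    rho is the distance to the closest neighbor, so that neighbor
    always gets exp(0) = 1 regardless of sigma."""
    dists = np.asarray(dists, dtype=float)
    rho = dists[0]
    return np.exp(-(dists - rho) / sigma)

print(hi_dim_scores([1.0, 2.5], sigma=3.0))   # [1.0, ~0.61]
```

With these made-up numbers the second score lands near 0.6, like the example in the video; sigma controls how quickly the scores for the remaining neighbors fall off, which is why 0.9 and 0.7 can't both occur (the first score is pinned at 1).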
@franziskakaeppler56022 ай бұрын
Thank you:)
@andreamanfron31992 жыл бұрын
i just love you
@statquest2 жыл бұрын
Thanks!
@TJ-hs1qm2 жыл бұрын
auto-like 👍
@statquest2 жыл бұрын
bam!
@jatin19952 жыл бұрын
Perfect!
@statquest2 жыл бұрын
Thank you!
@wlyang8787 Жыл бұрын
Hi Josh, would you please make a video about DiffusionMap? Thank you very much!
@statquest Жыл бұрын
I'll keep that in mind.
@AkashKumar-qe5jk2 жыл бұрын
Great video!!! One query: What characteristics of the features/dataset would we be analyzing when we choose a smaller number of neighbors? Same question for larger values.
@statquest2 жыл бұрын
The number of nearest neighbors we use does not affect how the features are used. The features are all used equally no matter what.
@leamon90242 жыл бұрын
Hello sir, would you cover a dimension reduction technique which uses hierarchical or k-means clustering if possible? Thanks in advance.
@statquest2 жыл бұрын
I'll keep that in mind.
@flc4eva2 жыл бұрын
I might have missed this, but how does UMAP initialize the low-dimensional graph? Is it randomized, as in t-SNE?
@statquest2 жыл бұрын
This is answered at 16:43
@Dominus_Ryder2 жыл бұрын
StatQuest please do a UMAP tutorial in R next!
@statquest2 жыл бұрын
I'll keep that in mind. However, I'm doing the mathematical details next.
@92marjoh2 жыл бұрын
Hey Josh, your videos have made my learning curve exponential and I truly appreciate the videos you make! I wonder, have you ever considered making a video about Bayesian target encoding (and other smart categorical encoders)?