Gene set enrichment analysis in R
1:29:32
Introduction to the R tidyverse
1:34:13
Comments
@fgfanta 6 days ago
I got the explanations here that I didn't get from my university teachers, thank you! At 1:09:01, I think you can do the same as the "borderline tacky code" with this one-liner: gene_set_list <- split(H$ensembl_gene, H$gs_name)
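A minimal sketch of what that one-liner does, assuming H is a long-format gene set table with one row per gene and the columns gs_name and ensembl_gene. The table below is a made-up stand-in with invented set names and gene IDs, not the data from the video.

```r
# Toy stand-in for the gene set table H (set names and gene IDs are hypothetical)
H <- data.frame(
  gs_name      = c("SET_A", "SET_A", "SET_B"),
  ensembl_gene = c("ENSG_fake_1", "ENSG_fake_2", "ENSG_fake_3"),
  stringsAsFactors = FALSE
)

# split() groups the gene IDs by set name and returns a named list,
# with one character vector of genes per gene set
gene_set_list <- split(H$ensembl_gene, H$gs_name)

str(gene_set_list)
```

A named list of character vectors like this is the usual input shape for enrichment functions that take gene sets as a list.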
@user-il4jz8mu6o 3 months ago
My genes, such as CD45 and SOX10, are not in the hg19 PBMC data! What should I do?
@jinlingli9728 5 months ago
Hi @Kim Dill-McFarland, thanks for this helpful and insightful intro to R tidyverse. Could you please provide the data for dat$E in your presentation for our practice?
@kdillmcfarland 5 months ago
All data and code are available in the GitHub repository linked in the video description.
@joeyoviedo5202 11 months ago
Hi, great video and clarification of the types of enrichment analyses. I have a question: what is the best way to create a ranked list of genes for 3 treatment and 3 control samples in one data frame using just normalized read counts? I want to rank all genes, not just DEGs, and then do enrichment analysis. Thank you!
@jessehines4044 1 year ago
I'm new to this and I'm wondering why you need to see how much your significant genes overlap with a larger or other gene set. Is that to elucidate which transcriptional regulation network controls the significant genes, and/or to discover other genes similar to the genes of interest?
@smrutimishra9804 1 year ago
How can I download the gene sets directly in RStudio?
@kcal12 1 year ago
As a guy who got recommended this video on youtube for some reason, who has no experience in molecular biology, nor anything related to biology or scientific expert fields, let me give you my 2 cents : you have many errors here and there for which you will need to re-analyze from the beginning. I will not pin point what is wrong nor why, just pointing out that in my expert point of view, this whole video was based on false data. Try again BUCKO :D:D:D:D:D:D
@kcal12 1 year ago
PS: here is where parts of my analysis come from : kzbin.info/www/bejne/roTbgpZ3opiDe5o
@naveedkhan-fi6ux 1 year ago
Can we do the gene set enrichment analysis for rice using the same code and databases?
@CorruptedSon 1 year ago
Thank you! I spent a while trying to figure out how to do pathway analysis in R, and most guides expect you to already have some sort of GO or KEGG library to refer to, without going into specifics of how these libraries work and what to do when they do not work. This step-by-step guide was enough to get me from DEG lists to proper pathway analysis, and I even understood what I was doing in each step and why! I am working with rat sequencing data, and some of my columns were very different from the example data you had here, but after checking specific points a few times I managed to filter and re-format all the necessary information from my data.
@yt.abhibhav 1 year ago
Thank you! I was just wondering which paper to cite when performing the hypergeometric "simple" enrichment?
@Stop-and-listen 1 year ago
I really enjoyed your presentation. I learned quite a bit. Thank you!
@DDosAndDonts 2 years ago
Thanks for the tips, Dr. Kim. I have a question: why does && not work in a dplyr context? AFAIK, only & works.
@kdillmcfarland 2 years ago
&& only evaluates the first element of each side, whereas & returns a vector with one result per element. In the context of dplyr, you want the whole vector; otherwise something like filter would get a single TRUE/FALSE based on the first row instead of one value per row.
For example, with a single &, c(-1,1) > 0 & c(2,-2) > 0 gives FALSE FALSE. This is because it evaluates along the vectors, looking at index 1 of both vectors, then index 2. So the first statement is -1 > 0 & 2 > 0 ==> FALSE, because the first part fails (-1 is not greater than 0). The second statement is 1 > 0 & -2 > 0 ==> FALSE, because the second part fails (-2 is not greater than 0).
In contrast, c(-1,1) > 0 && c(2,-2) > 0 gives just FALSE, because it only looks at index 1 (the first statement) and ignores everything else.
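A small base R sketch of the same distinction, using made-up vectors chosen so that the element-wise result is not all FALSE. It uses base subsetting rather than dplyr::filter to stay dependency-free. Note that since R 4.3.0, calling && with an operand longer than one element raises an error instead of silently using the first element, so the sketch only applies && to single values.

```r
# Hypothetical toy data
x <- c(1, -1, 3)
y <- c(2, 2, -2)

# & is element-wise: one TRUE/FALSE per position
keep <- x > 0 & y > 0
keep

# Row filtering needs that full-length logical vector
df <- data.frame(x = x, y = y)
kept_rows <- df[keep, ]
kept_rows

# && is for single conditions (e.g. inside if), so index explicitly
first_only <- (x[1] > 0) && (y[1] > 0)
first_only
```

Here only the first row passes both conditions, which is what a filter on x > 0 & y > 0 would keep.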
@mocabeentrill 2 years ago
Had to read your paper first 😅
@mocabeentrill 2 years ago
Hi Kim. @6:10 I see WGCNA installation? I'm busy with that analysis and I'm tripping up. Are you doing that analysis as well? Will you make an R tutorial for WGCNA?
@kdillmcfarland 2 years ago
It's on the list in the next couple of months!
@stefanodidonato1284 2 years ago
Best channel ever, thanks for continuing to post.
@jajaja20703 2 years ago
Very clear explanation, thanks for this amazing content! Would you have any additional bio-inf analysis tutorials?
@kdillmcfarland 2 years ago
Thanks! I have other R workshop videos at kzbin.info/aero/PL_Oo8UFoIb007lGeg78awOu44Ido35zsY. Materials for those, and for other workshops that don't have videos, are at github.com/BIGslu/workshops and github.com/hawn-lab/workshops_UW_Seattle
@vaibhavsunkaria7291 2 years ago
Hi, how can we do GSEA for DNA methylation genes? I have beta values for the samples and a logFC cutoff for the same. Thank you.
@azure-hawk 2 years ago
Great video! I learned about msigdbr and the tidyr::separate function. I just want to mention a few things.
1. The GSEA ranking metric doesn't have to be fold-change. I use the gene-wise average moderated t-statistic from limma or the signed -log10-transformed p-value. There are a ton of ranking metrics to choose from. Both of these are very similar, and we can compare their density plots to get an idea of how they would alter the GSEA results.
2. Over-representation analysis is not great as a follow-up to differential analysis because of the arbitrary significance threshold that you mentioned and the fact that there may be duplicates at the gene level. Also, we lose information about the direction of change, since ORA only tells us which sets are more present in the significant group than we would expect by chance. However, it is great when genes uniquely map to discrete clusters, so it is good as a follow-up to WGCNA or K-means clustering.
3. The figures you use to introduce GSEA show the phenotype permutation approach, but most R implementations (including fgsea) use the gene permutation approach, which is much faster but has a slightly different interpretation.
4. For ORA, it may be useful to plot the ratio of the number of significant genes in the gene sets to the total number of significant genes along the x-axis and change the bars to points scaled according to the -log10(adjusted p-value). Gene sets that include all significant genes (ratio of 1) may be interesting to look at, even if their adjusted p-values are hovering near 0.05.
5. The fora function in fgsea can be used for ORA as well. Personally, I find it easier than dealing with the bulkier clusterProfiler results objects.
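As a rough illustration of point 1, the signed -log10 p-value can be computed in base R as below. This is only a sketch: the table is invented, and the column names logFC and P.Value are assumed here because they mirror limma's topTable output, not because they come from the video's data.

```r
# Hypothetical differential expression results (toy values)
de <- data.frame(
  gene    = c("geneA", "geneB", "geneC", "geneD"),
  logFC   = c(2.0, -1.5, 0.3, -0.1),
  P.Value = c(0.001, 0.01, 0.5, 0.9),
  stringsAsFactors = FALSE
)

# Signed -log10(p): magnitude comes from significance,
# sign comes from the direction of change
signed_logp <- sign(de$logFC) * -log10(de$P.Value)

# Named numeric vector sorted from most up- to most down-regulated
ranks <- sort(setNames(signed_logp, de$gene), decreasing = TRUE)
ranks
```

Genes with strong evidence land at the two ends of the ranking and weak-evidence genes sit near zero, whatever their fold change.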
@pragatigupta8999 2 years ago
Hello, how can we add gene count and p-value to the same histogram using the clusterProfiler package in R?
@kdillmcfarland 2 years ago
Do you mean the FDR-by-count histograms around 35 min? You can add the total number of genes (count) to the top of each bar in a histogram with stat_bin(geom="text", aes(label=..count..)). To plot the p-value, I would make a new plot with x=Pval instead of x=FDR.
@tonkatsuburger3531 2 years ago
Thank you so much this was so helpful!
@hamidnikbakht1295 2 years ago
Thank you for the very clear explanations. One question is that for the purpose of GSEA (either simple or gsea), what type of normalization of the counts should one use? Or does it even matter? If so, how would it be different between the two methods? Thank you!
@kdillmcfarland 2 years ago
For RNAseq GSEA, we use fold changes calculated from TMM-normalized log2 counts per million (see the limma package tutorial) or estimates output by whatever linear model we ran. In essence, whatever data normalization needs to be done for stats should also be done before calculating fold changes for GSEA. For simple enrichment, it's similar: treat the data in whatever way is best for the statistical tests, then find the significant genes from those tests and input those gene lists into enrichment.
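The last step of simple enrichment, testing a list of significant genes against one gene set, is commonly done with a hypergeometric test. The base R sketch below uses invented gene names and sizes, and is an illustration under those assumptions rather than the exact procedure from the video.

```r
# Hypothetical inputs: all tested genes, one gene set, and the significant genes
universe  <- paste0("gene", 1:20)
gene_set  <- paste0("gene", 1:5)
sig_genes <- c("gene1", "gene2", "gene3", "gene10")

N <- length(universe)                         # genes tested
K <- length(intersect(gene_set, universe))    # gene set members among tested genes
n <- length(sig_genes)                        # significant genes
k <- length(intersect(sig_genes, gene_set))   # significant genes in the set

# P(overlap >= k) when drawing n genes at random from the universe;
# phyper(q, ...) with lower.tail = FALSE gives P(X > q), hence k - 1
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
p_over
```

With many gene sets, the resulting p-values would then be adjusted for multiple testing, for example with p.adjust(p_values, method = "BH").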