Survival analysis with TCGA data in R | Create Kaplan-Meier Curves

16,107 views

Bioinformagician


In this video I talk about the concept of survival analysis, the questions it helps answer, and the data we need to perform it. I also discuss important concepts like censoring and how it is performed, and explain how to interpret Kaplan-Meier curves. Lastly, I demonstrate how to perform survival analysis in R using the survival and survminer packages.
I hope you find this video helpful! Leave your thoughts in the comment section below!
Link to Code:
github.com/kpatel427/KZbinT...
How to download data from GDC portal?
• Download data from GDC...
How to convert gene IDs to symbols?
• 3 ways to convert Ense...
Chapters:
0:00 Intro
0:35 Intuition behind survival analysis
2:21 Why do we perform survival analysis?
3:57 What is Censoring and why is it important?
6:14 What is considered an event?
6:35 Methods for survival analysis
8:03 How to read a Kaplan-Meier curve?
10:31 Question to answer using survival analysis
10:53 3 things required for survival analysis
12:08 Download clinical data from GDC portal
15:57 Getting status information and censoring data
17:31 Set up an “overall survival” (i.e. time) for each patient in the cohort
19:01 For event/strata information for each patient, fetch gene expression data from GDC portal
19:33 Build query using GDCquery()
22:45 Download data using GDCdownload()
23:14 Extract counts using GDCprepare()
25:07 Perform variance stabilizing transformation (vst) on counts before further analysis
27:38 Wrangle the data to keep the relevant fields and get them into the right shape
33:11 Approaches to divide cohort into 2 groups based on expression
34:41 Bifurcating patients into low and high TP53 expression groups
34:57 Define strata for each patient
38:41 Compute a survival curve using survfit() and create a Kaplan-Meier curve using ggsurvplot()
41:30 survfit() vs survdiff()
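The steps from 15:57 onward can be sketched on a toy data frame. This is a minimal sketch under stated assumptions, not the code from the video: the column names `vital_status`, `days_to_death`, and `days_to_last_follow_up` follow GDC clinical field names, while the values, the `tp53_expr` column, and the median cutoff are made up for illustration. Only the survival package is used; ggsurvplot() from survminer could then draw the fitted curves.

```r
library(survival)

# Toy cohort (made-up values, for illustration only)
clin <- data.frame(
  vital_status = c("Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive"),
  days_to_death = c(300, NA, 1200, NA, 450, NA, 900, NA),
  days_to_last_follow_up = c(NA, 2000, NA, 1500, NA, 800, NA, 2500),
  tp53_expr = c(12.1, 8.3, 11.4, 7.9, 12.8, 9.0, 8.7, 10.9)  # hypothetical vst values
)

# Status: TRUE = death observed (event), FALSE = censored
clin$deceased <- clin$vital_status != "Alive"

# Overall survival: days to death for events, days to last follow-up otherwise
clin$overall_survival <- ifelse(clin$deceased,
                                clin$days_to_death,
                                clin$days_to_last_follow_up)

# Strata: split the cohort at the median expression (one of several possible cutoffs)
clin$strata <- ifelse(clin$tp53_expr >= median(clin$tp53_expr), "HIGH", "LOW")

# survfit() estimates the Kaplan-Meier curves per stratum
fit <- survfit(Surv(overall_survival, deceased) ~ strata, data = clin)
print(fit)

# survdiff() runs the log-rank test comparing the strata
lr <- survdiff(Surv(overall_survival, deceased) ~ strata, data = clin)
print(lr)
```

survfit() returns the curves to plot, whereas survdiff() only tests whether the curves differ, which is the distinction the last chapter draws.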
You can show your support and encouragement by buying me a coffee:
www.buymeacoffee.com/bioinfor...
To get in touch:
Website: bioinformagician.org/
Github: github.com/kpatel427
Email: khushbu_p@hotmail.com
#bioinformagician #bioinformatics #survival #survminer #survivalanalysis #kaplanmeier #tcga #gdcportal #tcgaportal #nci #cran #bioconductor #funcotator #variantcalling #variants #gatk #vcf #gvcf #haplotype #alleles #geneticvariants #mutations #gff3 #gff #gtf #sam #bam #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs

Comments: 33
@shivanirai3626 7 days ago
Best channel for any bioinformatician ❤❤
@preeti97rox 1 year ago
As someone who doesn't have a degree in Bioinformatics I am truly able to appreciate these things. Never stop making these videos!!
@jordanfredette5090 1 year ago
This is literally exactly the resource I was looking for several months ago. Glad to finally have it now. It's so nice to have example code and clear explanation.
@MsZhang666 1 year ago
I'm going to do survival analysis tomorrow, and I found you uploaded this video. It's so, so helpful! You're my goddess 😍😘
@codewithme_1988 1 year ago
Hi, I appreciate your work. Thanks for making these videos
@amitrupani9898 1 year ago
Thank you very much for this very informative tutorial. Very helpful indeed.
@PsycheSnacks657 1 year ago
You are the best! Thanks
@MsZhang666 1 year ago
I can't agree more
@ezra47986 6 days ago
Thank you for your video! I just have a question: why did you extract the unstranded counts rather than any other count type?
@prakrithi.p7033 10 months ago
Thank you so much for your amazing content. I just wanted to know how I could extract the TCGA counts for some non-coding regions specified in a bed file. Suggestions would be really helpful. Thanks!
@user-mv7uw3dh5d 1 year ago
Thanks so much. This video is really useful. Also, how can we prepare data to combine different factors to draw a forest plot or to construct risk models? Could you please share similar R code for that? Thanks again!
@BilalAhmad-gb7ui 1 year ago
Could you please make a video on the integration of ChIP-seq and RNA-seq data?
@Bioinformagician 1 year ago
I definitely plan to! Please stay tuned :)
@BilalAhmad-gb7ui 1 year ago
@Bioinformagician Thank you! I appreciate that.
@madushanfernando6495 7 months ago
Thank you very much for the excellent presentation. I am relatively new to TCGA-based R analysis. I was wondering if I can apply the same process to plot survival curves for a particular mutation using SNV data, such as the effect of BRCA1 mutation on the overall survival of ovarian cancer patients. Are there any significant changes that I need to make in the workflow to achieve this?
@stefanodidonato1284 8 months ago
If you ever write a book, let me know, because I'd pay 2000 euros to get it, hands down!
@skim4901 10 months ago
Thank you for this very helpful video. If I want to know the correlation (Pearson r) between some genes in TCGA breast cancer, do I have to use fpkm_unstrand? Could you make a video about this? Again, I really appreciate your effort!!
@AyrodsGamgam 1 year ago
Thanks. Could you please do a tutorial on combining machine learning in R with TCGA, cBioPortal, GDAC, or other sources? Thank you.
@reflections86 1 year ago
Greetings Miss Khushbu! Again a powerful video, and it was really comprehensive. I have one question and would appreciate your guidance on it. Suppose we perform survival analysis on RNA-seq data from TCGA, and the expression matrix has 20K genes and 200 patients. After survival analysis I found 30 genes that show a significant survival difference, so I want to go further and fit a multivariate Cox regression on these 30 genes. My confusion is which expression matrix to use in the multivariate Cox model. Should we reduce the initial expression matrix to only the 30 genes as variables (columns) and 200 patients (rows), or should we use the original expression matrix (20K genes and 200 patients) and put only the 30 genes in the Cox equation: coxph(Surv(time, event) ~ gene1 + gene2 + gene3 + ... + gene30, data)? I would highly appreciate your comment on that. Thanks and keep doing the great work.
@Bioinformagician 1 year ago
I don't recommend reducing the matrix to 30 genes. You should use the entire dataset and provide the 30 genes in the Cox equation. Also, check for multicollinearity among the 30 genes, as correlations between genes can cause instability in model estimates. If collinearity is found, you should use feature selection methods to include the most relevant and independent predictors in the model.
@reflections86 1 year ago
@Bioinformagician Many thanks. Highly appreciate your reply.
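The exchange above can be illustrated with a short sketch on simulated data. The gene names, sample size, and effect size are invented for illustration and are not TCGA values. Because coxph() uses only the variables named in the formula, passing the full data frame or a reduced one gives the same fit; cor() is used here as a quick, informal screen for collinearity among the predictors.

```r
library(survival)

set.seed(1)
n <- 200

# Simulated stand-in for an expression table: 5 hypothetical genes, 3 used in the model
expr <- as.data.frame(matrix(rnorm(n * 5), nrow = n,
                             dimnames = list(NULL, paste0("gene", 1:5))))
expr$time  <- rexp(n, rate = 0.001 * exp(0.5 * expr$gene1))  # gene1 raises the hazard
expr$event <- rbinom(n, 1, 0.7)                              # roughly 30% censored

# Multivariate Cox model fitted on the full data frame
fit_full <- coxph(Surv(time, event) ~ gene1 + gene2 + gene3, data = expr)

# Same formula on a data frame reduced to the columns the model needs
reduced <- expr[, c("time", "event", "gene1", "gene2", "gene3")]
fit_reduced <- coxph(Surv(time, event) ~ gene1 + gene2 + gene3, data = reduced)

# Extra, unused columns do not change the estimates
print(all.equal(coef(fit_full), coef(fit_reduced)))

# Pairwise correlations among the predictors as a first collinearity check
print(round(cor(expr[, c("gene1", "gene2", "gene3")]), 2))
```

With real expression data, strongly correlated genes would show up as large off-diagonal entries in the correlation matrix, which is the situation the reply warns about.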
@ShubhamMaurya-ws5ly 1 year ago
Can you please make a video on the top colleges for MSc bioinformatics in India?
@saeedjaanz 1 year ago
Have you ever heard of or done MFA and mixOmics DIABLO analysis on TCGA data?
@user-yf4pn8bw9c 1 year ago
How do we change the number of days up to which follow-up is shown? Say, instead of 8000 days, I want the data only up to 4000 days.
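One possible way to restrict follow-up, sketched here on made-up numbers, is to censor every patient at the chosen cutoff before fitting. The column names `time` and `status` and the values are assumptions for illustration; this is not taken from the video.

```r
library(survival)

# Toy data (made-up): time in days, status 1 = event, 0 = censored
d <- data.frame(time   = c(500, 3000, 4500, 6000, 7500, 1200),
                status = c(1,   0,    1,    1,    0,    1))

cutoff <- 4000

# Patients followed past the cutoff are treated as censored at the cutoff
d$status_cut <- ifelse(d$time > cutoff, 0, d$status)
d$time_cut   <- pmin(d$time, cutoff)

# Kaplan-Meier fit on the truncated follow-up
fit <- survfit(Surv(time_cut, status_cut) ~ 1, data = d)
print(summary(fit))
```

If only the plotting window needs to change, ggsurvplot() in survminer also accepts an `xlim` argument, for example `xlim = c(0, 4000)`, which leaves the fitted model untouched.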
@raresciencesimple5626 1 year ago
risk.table is showing the following error: Error: 'yaml_body' is not an exported object from 'namespace:xfun'. Can you please help?
@mugomuiruri2313 7 months ago
Good
@shreyasharma8063 1 year ago
Hello ma'am, I am getting pvalue = 47.07. The results are not significant. How can I solve this? What could be the reason?
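A p-value always lies between 0 and 1, so a value such as 47.07 is more likely the chi-square statistic of the log-rank test than the p-value itself. That reading is an assumption, since the comment does not show the output. The sketch below uses the `lung` dataset that ships with the survival package, not TCGA data, to show how the two numbers relate.

```r
library(survival)

# Log-rank test on the built-in lung dataset, comparing the two sex groups
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Chi-square statistic: any non-negative number, larger means stronger evidence
chisq <- lr$chisq

# Degrees of freedom = number of groups - 1
dof <- length(lr$n) - 1

# p-value: upper-tail probability of the chi-square distribution
p_value <- pchisq(chisq, df = dof, lower.tail = FALSE)

print(c(chisq = chisq, dof = dof, p_value = p_value))
```

A large chi-square statistic corresponds to a small p-value, so a statistic near 47 on 1 degree of freedom would indicate a highly significant difference between the groups.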
@dwitiroy2700 1 year ago
Hello didi .. I need to talk to you .. can you pls send ur contact details .. it's about my current project .. i have some questions based on bioinformatics
@arpitmathur2933 11 months ago
Dividing into groups is not good practice. Regression should be used. I did my whole thesis on this debate.
@divyaagrawal6740 1 year ago
Why do we usually choose "unstranded" data for analysis? @bioinformagician @khushbu, please answer this query.
@Bioinformagician 1 year ago
I chose unstranded data for demonstration purposes. If your data is generated using a stranded protocol, you should choose stranded or reverse stranded accordingly.
@divyaagrawal6740 1 year ago
@Bioinformagician Thank you
@saeedjaanz 1 year ago
@Bioinformagician I had the same question as @Divya, and I got my answer.