Extract data from a Kaplan-Meier plot

10,240 views

Dominic Magirr

I describe how to extract (approximate) patient-level data from a Kaplan-Meier plot using the algorithm of:
Guyot, P., Ades, A., Ouwens, M.J. et al. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol 12, 9 (2012). doi.org/10.118...
(link to R code in paper)
and using WebPlotDigitizer:
Author: Ankit Rohatgi
Title: WebPlotDigitizer
Website: automeris.io/W...
Version: 4.2
Date: April, 2019
E-Mail: ankitrohatgi@hotmail.com
Location: San Francisco, California, USA
See also
healthdatacoun...
The Kaplan-Meier plot I used as an example came from the supplementary material of:
Gandara, D.R., Paul, S.M., Kowanetz, M. et al. Blood-based tumor mutational burden as a predictor of clinical benefit in non-small-cell lung cancer patients treated with atezolizumab. Nat Med 24, 1441-1448 (2018). doi.org/10.103...
I also mentioned the Stata version:
Wei, Y., & Royston, P. (2017). Reconstructing Time-to-event Data from Published Kaplan-Meier Curves. The Stata Journal, 17(4), 786-802. doi.org/10.117...

Comments: 55
@madankundu6035 2 years ago
Very helpful. I am trying to figure out this data extraction, and you have saved my day. Well done, keep posting helpful videos for fellow statisticians
@matinhewing1 4 years ago
Many thanks for this excellent tutorial. It really helped.
@alpr1864 4 years ago
You really are the G.O.A.T.! Would you please make another video on how to get the hazard ratio from the Kaplan-Meier curve?
@lydiahanna4016 2 years ago
AGREE, excellent video!! Yes please, another video on how to get the HR from the re-created KM curve would be great!
@Fyall35 8 months ago
Thank you very much for this video. This was massively helpful
@duongngoccongkhanh4711 4 years ago
Thank you very much! It is so helpful!
@fawaztariq3585 2 years ago
Thanks, this was so helpful
@CanadianGeneralContractor 2 years ago
Fawaz, did you get this error? "Error in paste(path, KMdatafile, sep = "") : object 'KMdatafile' not found"
@mastermindstudynew 3 months ago
What if the paper doesn't provide the numbers at risk? Thank you.
@drsuhailsayed 2 years ago
Hi Dominic, thanks for posting this useful video. It would be very useful if you could also show us how to calculate the hazard ratio. Also, if the KM curve does not have the "numbers at risk", how can we do the calculation in R?
@khairuhazwan9859 2 years ago
Hi. You can try to estimate the number at risk using the method explained by Tierney et al. 2007 (doi:10.1186/1745-6215-8-16). Then use that information to help you reconstruct the IPD. Good luck!
@ahmedabdelaziz9602 1 year ago
Thank you so much. I have a question: how should I deal with duplicate values (which value should I delete)?
@adoi 3 years ago
AMAZING! Thanks a lot!
@adoi 3 years ago
I tried the following, but the upper doesn't work. lower = x))) upper
@adoi 3 years ago
Oh sorry, I figured it out. My data was not sorted by t.S.
@brunolarvol7951 4 years ago
Super helpful. Thanks.
@alessandrodifederico9870 1 year ago
Hi, another question: when extracting data, the number of events sometimes does not match the value I put in tot.events at the beginning of the code. Does anyone know why this can happen? Is the code correct in that case? Thank you!
@Adam-vg7pu 10 months ago
I don't know how this could work for you guys. I tried it with my KM curve and it's not working.
@alessandrodifederico9870 1 year ago
Hi, what is the best way to place data points on the KM curves at the time of data extraction so that the estimated number of censored patients at each time point is as accurate as possible (mirroring the censoring in the original KM plot as closely as possible)? Thank you!
@FMW110 1 year ago
Hey, please answer my question. I did (almost) everything as you said. I am not getting errors per se: the code runs without error, but I am not getting any results at all. I've tried downloading IPDfromKM and other packages, but I'm still not getting the IPD as you did at the end. Please tell me what to do. I'm still a newbie, so I might be missing something.
@ibrahimalfayoumi4328 11 months ago
Hello Dominic! So, i am getting this error message "Error: unexpected ':' in "lower
@rositsakoleva-kolarova3899 3 years ago
Thank you for this very useful tutorial! I can’t seem to get the lower and the upper right, although I’m exactly following the code. The lower returns Inf, while upper returns -Inf. Has anyone else run into the same issue? As a result in my IPD survival time is NA, and the second column with events/censored is only 0.
@cheihung5579 3 years ago
Hi, I can't get upper or lowers either. E.g. lower= x)))? Any help would be great!
@julianra3256 2 years ago
I had the same issue. First I checked in R that my CSV file looks like:
0.2871,1.0
0.2930,0.9906
...
rather than (this happens when you open the CSV file in Excel first):
0,2871;1.0
0,2930;0,9906
...
Then try to minimise the time span you represent in the risk table. So in the tutorial he just used: t.risk
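A minimal sketch of the CSV check described in the comment above, assuming the digitised curve is exported as two unnamed columns (time, survival). The inline strings and the object names `good_csv`, `excel_csv`, `good` and `fixed` are hypothetical stand-ins for a real file. Base R's `read.csv` expects `.` decimals with `,` separators, while `read.csv2` expects `,` decimals with `;` separators, which is what a comma-decimal locale tends to produce after a round trip through Excel.

```r
# Two hypothetical exports of the same digitised points (time, survival).
good_csv  <- "0.2871,1.0\n0.2930,0.9906\n"   # "." decimal, "," separator
excel_csv <- "0,2871;1,0\n0,2930;0,9906\n"   # "," decimal, ";" separator

# read.csv assumes "." decimals and "," separators.
good <- read.csv(text = good_csv, header = FALSE, col.names = c("t.S", "S"))

# read.csv2 assumes "," decimals and ";" separators.
fixed <- read.csv2(text = excel_csv, header = FALSE, col.names = c("t.S", "S"))

# Both columns must be numeric; a character column signals a format mismatch.
str(good)
stopifnot(is.numeric(good$t.S), is.numeric(good$S))
stopifnot(isTRUE(all.equal(good, fixed)))
```

If `str()` shows a character column, the file was probably read with the wrong decimal or separator convention.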
@davevanderkruijssen2330 3 years ago
Using the same graph from Gandara et al. and following the steps explained in the video, I get the following error "Error in rep(t.S[j], d[j]) : invalid 'times' argument" when I arrive at the following part of the script: ### Now form IPD ### #Initialise vectors t.IPD
@johnuserful 1 year ago
I think this might mean there is a conflict between your extracted data and the numbers at risk, i.e. the proportion of patients who had the event (implied by the survival times) is higher, by a margin, than what would be expected from the numbers-at-risk table. Try extracting the curve again carefully?
@shathaomar1516 3 years ago
Hi, does that mean the only way to reconstruct this data is with an R package? Can't I do the calculation in Excel?
@debalinamukhopadhyay6677 4 years ago
I don't have number-at-risk data along with my table. How should I get it then?
@dominicmagirr5571 4 years ago
Hi. Sorry, I don't think there's much you can do without the numbers at risk.
@khairuhazwan9859 4 years ago
Hi. You can try to estimate the no. at risk using the method explained by Tierney et al. 2007 (doi:10.1186/1745-6215-8-16). Then use that information to help you reconstruct the IPD.
@khairuhazwan9859 4 years ago
Hi, thank you for sharing this. I have one problem with your approach. As soon as I run the algorithm in Rmd, I always get this message: "Error in paste(path, KMdatafile, sep = "") : object 'KMdatafile' not found". I followed exactly what you did, but your analysis runs without the error message. What might have caused this?
@eulat4367 4 years ago
Great tutorial! I have the same issue as Khairu.
@eulat4367 4 years ago
Ah I see where I missed a step: he added "IPD" as the last line. This will run IPD.
@CanadianGeneralContractor 2 years ago
@khairu, were you able to resolve this problem? Dominic, you got the error but your syntax still ran; mine stopped at this error.
@julianra3256 2 years ago
At the beginning he got rid of the Function inputs, because he used different ones. If you put your own path in (e.g. path
@khairuhazwan9859 2 years ago
@CanadianGeneralContractor I did
@eroseangeles7263 3 years ago
Hi, how do we extrapolate the survival curve beyond its current time or duration?
@dominicmagirr5571 3 years ago
Hi. That's a very big topic. Maybe this paper could help you: journals.sagepub.com/doi/full/10.1177/0272989X16639900
@eroseangeles7263 3 years ago
Thank you so much for replying ☺️
@DocAdamCardio 3 years ago
Hi there, every time I run the "read digitize" part, it comes out as an error. How do I go about this? I've spent nights trying to figure this out. Many thanks!
@dominicmagirr5571 3 years ago
Hi. I'm not sure, I'm afraid. Are you referring to the section I delete (around 6:30)? Perhaps you could provide a few more details.
@vangelis9911 3 years ago
@dominicmagirr5571 I have the same problem here. A message comes out: "Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'deboer.csv': No such file or directory > t.S S
@vangelis9911 3 years ago
Got it, wrong directory! Thanks for the awesome video. By the way, some tutorials on meta-analysis would be nice too. You're doing a great job, thanks again.
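The "cannot open file ... No such file or directory" message above usually means the file name is being resolved against a different working directory. A minimal sketch of an early check, where `check_input` is a hypothetical helper (not part of the published script) and a temporary file stands in for the digitised export:

```r
# Hypothetical helper: fail early with a clearer message than read.csv() gives.
check_input <- function(f) {
  if (!file.exists(f)) {
    stop("Cannot find '", f, "' in ", getwd(), call. = FALSE)
  }
  invisible(normalizePath(f))
}

# The directory that relative file names are resolved against.
getwd()

# Demonstration with a temporary file standing in for the digitised export.
tmp <- tempfile(fileext = ".csv")
writeLines(c("0,1", "1.5,0.9"), tmp)
digitized <- read.csv(check_input(tmp), header = FALSE, col.names = c("t.S", "S"))
```

Either move the CSV into the directory printed by `getwd()`, change the directory with `setwd()`, or pass the full path to the file.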
@thrtst 4 years ago
Hi Dominic, I am struggling with an error "Error: Can't subset columns that don't exist. x Location 9 doesn't exist. ℹ There are only 1 column." when running this part of the code: if (n.int > 1){ #Time intervals 1,...,(n.int-1) for (i in 1:(n.int-1)){ #First approximation of no. censored on interval i n.censor[i]n.risk[i+1])||((n.hat[lower[i+1]]0))){ if (n.censor[i]
@dominicmagirr5571 4 years ago
Hi Paul. I'm not sure, but I suspect it could be that there is an interval in the t.risk vector where there are no corresponding data points. I've seen this cause problems in the past, and the solution is to either collect more data points, or shorten t.risk.
@thrtst 4 years ago
@dominicmagirr5571 Thanks Dominic. I seem to have resolved the issue by reloading the downloaded CSV with read.csv instead of read_csv. However, I am getting several errors when I run the entire code. How do I resolve these?
1. Error in while ((n.hat[lower[i + 1]] > n.risk[i + 1]) || ((n.hat[lower[i + : missing value where TRUE/FALSE needed
2. Error in if (n.censor[n.int] 0) { : missing value where TRUE/FALSE needed
4. Error in lower[n.int]:upper[n.int] : NA/NaN argument
5. Error in rep(t.S[j], d[j]) : invalid 'times' argument
Kind regards
Paul
@dominicmagirr5571 4 years ago
Sorry, I don't know beyond making sure that t.risk and n.risk are aligned, and that there are no intervals in t.risk with no data points in t.S. I'm afraid I didn't write the bulk of the algorithm, so I'm not familiar with all the details. Good luck!
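The advice above (sorted data, and no `t.risk` interval without digitised points) can be checked before running the algorithm. This is a sketch under assumptions: the example numbers are made up, and it assumes interval i covers `[t.risk[i], t.risk[i+1])` with the last interval open-ended, which may differ from how a given version of the script defines `lower` and `upper`. In R, `min(which(...))` over an empty set returns `Inf` and `max(which(...))` returns `-Inf`, which would be consistent with the `Inf`/`-Inf` symptoms reported in the comments.

```r
# Hypothetical digitised points and numbers-at-risk times.
t.S    <- c(0, 1.2, 2.5, 3.1, 7.4, 8.0)        # digitised times
S      <- c(1, 0.95, 0.90, 0.85, 0.60, 0.55)   # digitised survival
t.risk <- c(0, 3, 6, 9)                        # times under the at-risk table

# 1. Sort by time and require a non-increasing survival curve.
ord <- order(t.S)
t.S <- t.S[ord]
S   <- S[ord]
stopifnot(all(diff(S) <= 0))

# 2. Count digitised points in each interval [t.risk[i], t.risk[i+1]);
#    the last interval is treated as open-ended (an assumption).
breaks <- c(t.risk, Inf)
n.points <- vapply(seq_along(t.risk), function(i) {
  sum(t.S >= breaks[i] & t.S < breaks[i + 1])
}, integer(1))

# 3. Report intervals with no points; these need more points or a shorter t.risk.
empty <- which(n.points == 0)
if (length(empty) > 0) {
  message("No digitised points in interval(s) starting at t = ",
          paste(t.risk[empty], collapse = ", "),
          "; add points or shorten t.risk.")
}
```

In this made-up example the interval starting at t = 9 is empty, so either `t.risk` (and the matching `n.risk`) should be shortened or more points digitised beyond t = 9.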
@CanadianGeneralContractor 2 years ago
Hey Th_rtst, did you figure out how to resolve this issue?
@ghazalstaity5269 3 years ago
Do you mind sharing what version of R you used for this tutorial? I keep getting error messages although I am copying and pasting the algorithm exactly from the paper and following the changes you made in the video. I'm using RStudio on R 4.1.0. Thank you in advance.
@dominicmagirr5571 3 years ago
I think it was 3.6.1. Sorry to hear it didn't work for you, I haven't tried it yet on 4.1.0.
@elmobabiiex 3 years ago
Thanks very much for the video, it was really helpful. One issue I am having is at the end, where you combine the two plots into one. I can see both datasets loaded in my global environment, but the code you provide at the end is not working and no plot is produced. Curiously, there is no error, but no combined plot is produced either.
@elmobabiiex 3 years ago
Update: solved. The two arms of the data have to be given different arm indicator numbers when running the algorithm; otherwise combining the plots won't work with the code example above.
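A minimal sketch of the fix described above, not the code from the video. The data frames `IPD.control` and `IPD.treat`, their values, and the column names `time`, `event` and `arm` are hypothetical; the `survival` package calls (`Surv`, `survfit`, `coxph`) are standard. Since several comments ask about the hazard ratio, the last lines show one way it could be estimated from the stacked data.

```r
library(survival)

# Hypothetical reconstructed IPD for two arms: columns time, event, arm.
IPD.control <- data.frame(time = c(2, 4, 5, 7, 9),
                          event = c(1, 1, 0, 1, 0), arm = 0)
IPD.treat   <- data.frame(time = c(3, 6, 8, 10, 12),
                          event = c(1, 0, 1, 0, 0), arm = 1)

# The arm indicator must differ between the two data sets; otherwise the
# stacked data collapse into a single group and only one curve is drawn.
IPD <- rbind(IPD.control, IPD.treat)
stopifnot(length(unique(IPD$arm)) == 2)

# One Kaplan-Meier curve per arm on the same plot.
fit <- survfit(Surv(time, event) ~ arm, data = IPD)
plot(fit, col = c("black", "red"), xlab = "Time", ylab = "Survival probability")
legend("topright", legend = c("arm 0", "arm 1"), col = c("black", "red"), lty = 1)

# Approximate hazard ratio (arm 1 vs arm 0) from the reconstructed data.
cox <- coxph(Surv(time, event) ~ arm, data = IPD)
exp(coef(cox))
```

Any hazard ratio obtained this way is based on approximate, reconstructed data, so it may differ from the value reported in the original publication.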