What proportion of uncensored vs. censored data does a dataset need for a survival model to be accurate? For example, if a dataset is heavily skewed towards censored data points, will such models still be accurate? In general, how do we measure the accuracy of survival models?
@WilnerSumagingsing 2 months ago
For survival models, there is no fixed proportion that guarantees reliable estimates. A figure of 60-70% uncensored data is sometimes quoted as a rule of thumb, but what matters more is the absolute number of observed events relative to the number of predictors, and whether censoring is unrelated to the outcome. A dataset heavily skewed towards censored data leaves few events to learn from, which can make survival estimates unstable (and biased if the censoring is informative) and encourage overfitting to the few uncensored observations, ultimately compromising the model's accuracy. To measure the accuracy of survival models, several methods can be employed. The Concordance Index (C-Index) assesses the model's ability to rank survival times; it ranges from 0 to 1, where 0.5 indicates no better than random ranking and 1.0 indicates perfect discrimination. The Log-Rank Test compares survival distributions between groups (for example, predicted high-risk vs. low-risk groups) and tests whether they differ significantly. The Brier Score measures the mean squared difference between predicted survival probabilities and the observed status at a given time, usually weighted to account for censoring, with lower scores indicating better performance. Calibration plots visually compare predicted probabilities to observed outcomes, with well-calibrated models lying close to the 45-degree line. Additionally, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) allow models fitted to the same data to be compared on fit and complexity, where lower values suggest better models. Finally, cross-validation can be used to assess model performance across different data subsets, providing a more robust evaluation. In conclusion, having enough uncensored observations alongside the censored ones is crucial for effective survival models, and using these metrics together gives a fuller picture of their accuracy and reliability.
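To make the C-Index concrete, here is a minimal pure-Python sketch of Harrell's version for right-censored data. The function name, the toy data, and the convention that a higher risk score means an earlier expected event are all assumptions for illustration, and tied times are simply skipped, which library implementations handle more carefully.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not a library API).

    times:       observed time for each subject (event or censoring time)
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: model output; ASSUMED higher = earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranked the earlier event as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs (too few observed events)")
    return concordant / comparable


# Hypothetical toy data: 5 subjects, 2 of them censored.
times = [2, 4, 6, 8, 10]
events = [1, 0, 1, 1, 0]
risk_scores = [0.9, 0.3, 0.4, 0.7, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.3f}")  # 6 of 7 comparable pairs are concordant
```

Note how censoring enters: a pair only counts when the shorter time is an observed event. With heavy censoring, few pairs are comparable, so the C-Index rests on little information, which is one reason the number of observed events matters more than the raw percentage.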