Three-day course

Statistical methods for risk prediction and prognostic models

Date: 21 June – 23 June 2022
Location: Online
Cost:
Industry - £649
University staff member (external to Keele) - £549
Post-graduate student (external to Keele) - £449

This online course provides a thorough grounding in statistical methods for developing and validating prognostic models in clinical research. The course is delivered over three days and focuses on model development (day one), internal validation (day two), and external validation and novel topics (day three). Our focus is on multivariable models for individualised prediction of future outcomes (prognosis), although many of the concepts described also apply to models for predicting existing disease (diagnosis).

Please email Lucinda Archer, Lecturer in Biostatistics, for any course enquiries.

The course is aimed at individuals who want to learn how to develop and validate risk prediction and prognostic models, specifically for binary or time-to-event clinical outcomes (though continuous outcomes are also covered). We recommend that participants have a background in statistics. An understanding of key statistical principles and measures (such as effect estimates, confidence intervals and p-values), and the ability to apply and interpret regression models, are essential.

The course will be run online over three days using a combination of recorded lecture videos, computer practical exercises in Stata and R for participants to work through, and live question and answer sessions following each lecture/session. There will also be opportunities to meet with the faculty to ask specific questions related to personal research queries and problems.

Computer practicals in either R or Stata are included on all three days (two per day), and participants can choose whether to focus on logistic regression examples (for binary outcomes) or Cox / flexible parametric survival examples (for time-to-event outcomes), to tailor the practicals to their own purpose. All code is already written, so participants can focus more on their understanding of methods and interpretation of results.

Day one begins with an overview of the rationale and phases of prediction model research. It then outlines model specification, focusing on logistic regression for binary outcomes and Cox regression or flexible parametric survival models for time-to-event outcomes. Model development topics are then covered, including identifying candidate predictors, handling missing data, modelling continuous predictors using fractional polynomials or restricted cubic splines for non-linear functions, and variable selection procedures.
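To give a flavour of the day-one material, the sketch below fits a logistic regression containing a simple non-linear (fractional-polynomial-style) term. The course practicals themselves use pre-written Stata and R code; this Python/NumPy version, on invented simulated data with made-up coefficient values, is only an illustration of the underlying idea.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (IRLS).
    X must include an intercept column; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: beta += (X' W X)^{-1} X' (y - p)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(30, 80, n)
x = (age - 55) / 10                      # centred and scaled predictor
# Invented truth: risk is non-linear in x (quadratic on the log-odds scale)
lp_true = -1.0 + 0.5 * x + 0.3 * x**2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lp_true)))

# Fractional-polynomial-style design with powers (1, 2): intercept, x, x^2
X = np.column_stack([np.ones(n), x, x**2])
beta = fit_logistic(X, y)
print(np.round(beta, 2))                 # should be close to (-1.0, 0.5, 0.3)
```

In practice the fractional polynomial powers (or spline knot positions) would be chosen systematically, for example by comparing model fit across the standard set of candidate powers, which is exactly the kind of procedure covered on day one.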

Day two focuses on how models are overfitted to the data from which they were developed, and thus often do not generalise to other datasets. Internal validation strategies are outlined to identify and adjust for overfitting. In particular, cross-validation and bootstrapping are covered to estimate the optimism and shrink the model coefficients accordingly; related approaches such as the LASSO and elastic net are also discussed. Statistical measures of model performance are introduced for discrimination (such as the C-statistic and D-statistic) and calibration (calibration-in-the-large, calibration slope, and calibration plots/curves). With all this knowledge, we then discuss sample size considerations for model development and validation, and new software to implement the sample size calculations.
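The bootstrap internal validation described above can be sketched in a few lines: each bootstrap model is scored on its own resample (apparent performance) and back on the original data (test performance), the average gap estimates the optimism in the C-statistic, and the average calibration slope of the bootstrap models in the original data gives a uniform shrinkage factor. This Python/NumPy sketch on invented data is illustrative only, not the course's Stata/R material.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic fit; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

def c_statistic(lp, y):
    """C-statistic (AUC): proportion of event/non-event pairs in which the
    event has the higher linear predictor (ties score 0.5)."""
    diff = lp[y == 1][:, None] - lp[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Invented development data: two real predictors, two pure-noise predictors
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta_true = np.array([-0.5, 0.8, 0.4, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

beta = fit_logistic(X, y)
apparent_c = c_statistic(X @ beta, y)

optimism, slopes = [], []
for _ in range(200):
    idx = rng.integers(0, n, n)          # bootstrap resample
    b = fit_logistic(X[idx], y[idx])
    # Optimism = (apparent performance in the resample)
    #          - (test performance back in the original data)
    optimism.append(c_statistic(X[idx] @ b, y[idx]) - c_statistic(X @ b, y))
    # Calibration slope of the resampled model in the original data;
    # slopes below 1 signal overfitting
    lp = X @ b
    slopes.append(fit_logistic(np.column_stack([np.ones(n), lp]), y)[1])

corrected_c = apparent_c - np.mean(optimism)   # optimism-corrected C
shrinkage = np.mean(slopes)                    # uniform shrinkage factor
print(round(apparent_c, 3), round(corrected_c, 3), round(shrinkage, 3))
```

Multiplying the developed model's coefficients by the shrinkage factor (and re-estimating the intercept) is the simple uniform-shrinkage adjustment that the day-two sessions place alongside penalised alternatives such as the LASSO and elastic net.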

Day three focuses on the need for model performance to be evaluated in new data to assess its generalisability, namely external validation. A framework for different types of external validation studies is provided, and the potential importance of model updating strategies (such as re-calibration techniques) is considered. Novel topics are then introduced, including the use of pseudo-values to allow calibration curves in a survival model setting; the development and validation of models using large datasets (e.g. from e-health records) or multiple studies; the use of meta-analysis methods for summarising the performance of models across multiple studies or clusters; the role of net benefit and decision curve analysis to understand the potential role of a model for clinical decision making; and practical guidance about different ways in which prediction and prognostic models can be presented.
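The net benefit calculation behind decision curve analysis is straightforward to sketch: at a chosen risk threshold t, net benefit weighs true positives against false positives using the odds t/(1-t), and the model is compared with the treat-all and treat-none strategies. As before, this is a hedged Python/NumPy illustration on invented data, not the course's Stata/R practical code.

```python
import numpy as np

def net_benefit(p_hat, y, t):
    """Net benefit of treating everyone with predicted risk >= threshold t:
    TP/n - FP/n * t/(1-t), i.e. false positives are down-weighted by the
    odds of the threshold probability."""
    treat = p_hat >= t
    n = len(y)
    tp = np.sum(treat & (y == 1)) / n
    fp = np.sum(treat & (y == 0)) / n
    return tp - fp * t / (1.0 - t)

# Invented example: outcomes generated from a known model, with the model's
# own (perfectly calibrated) risks used as predictions for illustration
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
p_hat = 1.0 / (1.0 + np.exp(-(-1.5 + 1.2 * x)))
y = rng.binomial(1, p_hat)

prev = y.mean()
for t in (0.1, 0.2, 0.3):
    nb_model = net_benefit(p_hat, y, t)
    nb_all = prev - (1.0 - prev) * t / (1.0 - t)   # "treat everyone" strategy
    print(f"threshold {t:.1f}: model {nb_model:.3f}, "
          f"treat-all {nb_all:.3f}, treat-none 0.000")
```

Plotting net benefit across a range of clinically sensible thresholds, for the model and for the treat-all and treat-none strategies, gives the decision curve discussed on day three.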

Ideally, participants should undertake the course live (9 am to 5 pm UK time), but all course material (e.g. lecture videos, computer practicals) will be made available 5 days in advance and for 2 weeks afterwards, to provide plenty of time and flexibility for participants to work through the material in their own time.