Modeling causes of death: an integrated approach using CODEm
Published January 6, 2012, in Population Health Metrics
Global health policymakers, advocates, and planners need to know the current magnitude of health problems, and the trends in those problems, to best help populations in need. The study “Modeling causes of death: an integrated approach using CODEm” proposes five general principles for cause of death model development, validation, and reporting. It also details an analytical tool, the Cause of Death Ensemble model (CODEm), that explores a large number of possible models to estimate trends in causes of death.
The five principles are:
- Identify all the available data: Most cause of death data are found through national sources or the World Health Organization, and these sources can be supplemented by subnational studies on select causes or age groups from published literature.
- Maximize the comparability and quality of the dataset: To ensure all data are comparable and of high quality, researchers need to map across the various revisions of the International Classification of Diseases, reclassify deaths assigned “garbage codes” (codes that are not true causes of death), and correct for the completeness of death registration in vital registration systems that do not capture all deaths.
- Develop a diverse set of plausible models: While good modeling practice should cast a wide net in terms of proposed models, those chosen need to respect known biological or behavioral relationships (e.g., models for lung cancer should consider tobacco consumption). Hundreds or thousands of individual models need to be tested, ranging from simple linear covariate models to sophisticated spatial-temporal models. Then, these individual models are combined to produce robust ensemble models.
- Assess the predictive validity of each plausible individual model and of ensemble models: When data are sparse or missing, out-of-sample predictive validity is the most robust measure of how well a model predicts. It is tested by fitting a model with some of the data withheld, and then checking how well the model predicts the withheld data.
- Choose the model or ensemble model with the best performance in the out-of-sample predictive validity tests: Choosing the best model requires balancing different performance attributes.
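The last three principles can be illustrated with a minimal Python sketch. It is not the CODEm implementation: the data are synthetic, the two candidate models (`mean_only` and `linear`) are deliberately simple stand-ins for the many models the study describes, and the inverse-RMSE ensemble weighting is an assumption chosen for brevity. All function and variable names are hypothetical.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training outcomes."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Simple linear covariate model fitted by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def rmse(predict, xs, ys):
    """Root mean squared error of a prediction function on (xs, ys)."""
    return (sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)) ** 0.5


def out_of_sample_scores(xs, ys, fitters, holdout_frac=0.3, seed=0):
    """Fit each model with part of the data withheld, then score it on that part."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    k = int(len(idx) * holdout_frac)
    held, kept = idx[:k], idx[k:]
    train_x, train_y = [xs[i] for i in kept], [ys[i] for i in kept]
    test_x, test_y = [xs[i] for i in held], [ys[i] for i in held]
    models = {name: fit(train_x, train_y) for name, fit in fitters.items()}
    scores = {name: rmse(m, test_x, test_y) for name, m in models.items()}
    return models, scores


def build_ensemble(models, scores):
    """Combine models with weights proportional to 1 / out-of-sample RMSE.

    This weighting scheme is an assumption made for illustration only.
    """
    inverse = {name: 1.0 / s for name, s in scores.items()}
    total = sum(inverse.values())
    weights = {name: v / total for name, v in inverse.items()}
    return (lambda x: sum(weights[n] * models[n](x) for n in models)), weights


# Synthetic data: a hypothetical covariate driving a log death rate, plus noise.
data_rng = random.Random(42)
covariate = [data_rng.uniform(0, 10) for _ in range(200)]
log_rate = [1.0 + 0.5 * x + data_rng.gauss(0, 0.2) for x in covariate]

fitters = {"mean_only": fit_mean, "linear": fit_linear}
models, scores = out_of_sample_scores(covariate, log_rate, fitters)
best = min(scores, key=scores.get)  # lowest out-of-sample RMSE wins
ensemble_predict, weights = build_ensemble(models, scores)

print("out-of-sample RMSE:", scores)
print("best individual model:", best)
print("ensemble weights:", weights)
```

A fair comparison between the ensemble and the individual models would need a second withheld set, because the weights here are derived from the same withheld data used to score the component models. The sketch also ranks on RMSE alone, whereas the fifth principle calls for balancing several performance attributes.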
Foreman KJ, Lozano R, Lopez AD, Murray CJL. Modeling causes of death: an integrated approach using CODEm. Population Health Metrics. 2012; 10:1.