Choosing the best method for verbal autopsy (VA) requires appropriate metrics for assessing each method’s performance. Researchers from IHME and the University of Queensland undertook a study to determine which metrics those should be.

The study, "Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies," shows that two metrics are best suited to testing verbal autopsy methods: chance-corrected concordance, which assesses the accuracy of individual cause of death assignment, and cause-specific mortality fraction accuracy, which assesses how well a method estimates the fractions of all deaths due to each cause in the population.

Research objective

Because there are many different methods of performing VA, including physician review and computer-automated methods, there needs to be a way to evaluate the performance of each method. The metrics currently used to assess VA methods, such as sensitivity, specificity, and cause-specific mortality fraction error, do not provide a robust basis for comparing methods. This study was designed to identify metrics that can compare the performance of all types of VA methods, and so help identify the best methods for estimating causes of death in a population.

Research findings

The authors show that VA methods need to be evaluated across a set of test datasets with widely varying cause-specific mortality fraction composition. They propose two metrics for assessing the performance of a given VA method. For individual cause of death assignment, they recommend the average chance-corrected concordance. For performance across a population, they recommend cause-specific mortality fraction accuracy. How well a VA method estimates the mortality fraction for a particular cause can be determined by examining the relationship between the cause-specific mortality fraction the method estimates and the true cause-specific mortality fraction in the population.
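The two recommended metrics can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes the definitions given in the cited paper, where chance-corrected concordance for a cause is its sensitivity rescaled so that random assignment among N causes scores zero, and cause-specific mortality fraction (CSMF) accuracy is one minus the summed absolute CSMF error divided by its maximum possible value. The function names and the list-of-labels input format are hypothetical choices made for this example.

```python
from collections import Counter


def chance_corrected_concordance(true_causes, predicted_causes, causes):
    """Return (average CCC, per-cause CCC) for individual cause assignment.

    Assumed definition: CCC_j = (sensitivity_j - 1/N) / (1 - 1/N),
    where N is the number of causes in the cause list.
    """
    n = len(causes)
    if n < 2:
        raise ValueError("at least two causes are required")
    per_cause = {}
    for cause in causes:
        # Positions of deaths whose true cause is this cause.
        idx = [i for i, t in enumerate(true_causes) if t == cause]
        if not idx:
            continue  # cause absent from this test set; skip it
        sensitivity = sum(predicted_causes[i] == cause for i in idx) / len(idx)
        per_cause[cause] = (sensitivity - 1 / n) / (1 - 1 / n)
    average = sum(per_cause.values()) / len(per_cause)
    return average, per_cause


def csmf(cause_labels, causes):
    """Fraction of all deaths attributed to each cause."""
    counts = Counter(cause_labels)
    total = len(cause_labels)
    return {c: counts[c] / total for c in causes}


def csmf_accuracy(true_causes, predicted_causes, causes):
    """Assumed definition: 1 - sum|true_j - pred_j| / (2 * (1 - min_j true_j))."""
    if len(causes) < 2:
        raise ValueError("at least two causes are required")
    true_f = csmf(true_causes, causes)
    pred_f = csmf(predicted_causes, causes)
    total_abs_error = sum(abs(true_f[c] - pred_f[c]) for c in causes)
    return 1 - total_abs_error / (2 * (1 - min(true_f.values())))
```

Under these definitions, a method that assigns every death correctly scores 1 on both metrics, while a method that assigns every death to a single cause scores an average chance-corrected concordance of 0 on a test set containing every cause in the list.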

Analytical approach

The authors performed simple simulations of populations to demonstrate that most metrics used in VA validation studies are sensitive to the cause-specific mortality fraction composition of the dataset used to test a VA method. They also conducted simulations to show that the composition of the test dataset must be varied: when methods are judged on only one test dataset, an inferior VA method can actually produce better results than a superior one.
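The effect of test dataset composition can be illustrated with a toy example. The sketch below is not the authors' simulation: the two methods, the three causes, and the death counts are all hypothetical, chosen only to show how simple concordance (the share of deaths assigned correctly) can rank an obviously poor method above a better one on a single skewed test set.

```python
CAUSES = ["a", "b", "c"]  # hypothetical cause list


def build_test_set(counts):
    """Expand {cause: number of deaths} into a list of true causes."""
    return [cause for cause, n in counts.items() for _ in range(n)]


def always_first_cause(true_causes, causes):
    """Hypothetical inferior method: assigns every death to the first cause."""
    return [causes[0]] * len(true_causes)


def partly_correct(true_causes, causes):
    """Hypothetical better method: for each cause, 3 of every 5 deaths are
    assigned correctly and the other 2 go to the next cause in the list."""
    seen = {c: 0 for c in causes}
    predicted = []
    for t in true_causes:
        k = seen[t]
        seen[t] += 1
        if k % 5 < 3:
            predicted.append(t)
        else:
            predicted.append(causes[(causes.index(t) + 1) % len(causes)])
    return predicted


def concordance(true_causes, predicted_causes):
    """Share of deaths whose predicted cause matches the true cause."""
    hits = sum(t == p for t, p in zip(true_causes, predicted_causes))
    return hits / len(true_causes)


# Two test sets with very different cause compositions.
skewed = build_test_set({"a": 90, "b": 5, "c": 5})
uniform = build_test_set({"a": 30, "b": 30, "c": 30})

for name, test_set in [("skewed", skewed), ("uniform", uniform)]:
    naive = concordance(test_set, always_first_cause(test_set, CAUSES))
    better = concordance(test_set, partly_correct(test_set, CAUSES))
    print(f"{name}: always-first={naive:.2f}, partly-correct={better:.2f}")
```

On the skewed test set the always-first method scores 0.90 against 0.60 for the partly correct method, while on the uniform test set the ranking reverses (about 0.33 against 0.60). Judging on one composition alone would therefore favor the method that carries no information about individual deaths.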

Policy implications

The metrics examined in this study are crucial for objective assessment of the range of VA methods currently in use. Using the most robust metrics available will help researchers identify the methods that most accurately determine causes of death in populations without vital registration systems. These metrics provide a common basis for comparing different VA methods across different studies and populations. Standardized metrics will also facilitate the development of new methods by showing clearly whether a new method improves performance in assigning causes of death through VA.

Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD. Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Population Health Metrics. 2011; 9:28.