Determining causes of death: How we reclassify miscoded deaths

Published July 23, 2018

Photo Credit: Utah Death Certificate Index, 1905-1966

As it turns out, knowing what someone died of can be complicated. We often talk and think about death as a singular event. We say, “he died of cancer” or “she died of old age.” In reality, a series of domino effects are often occurring inside the body that lead to someone’s death. 

In places with functioning vital registration systems, a death certificate is filled out every time someone dies, in an attempt to capture this entire causal chain. This death certificate includes demographic information about the deceased individual, the chain of events that led to the death, and the underlying cause that directly resulted in the death of that individual.  

For example, in the death certificate below, the immediate cause of death was acute renal failure. But when we look back at the chain of events, we see that this individual developed diabetes 15 years ago, which led to hyperosmolar nonketotic coma, a fancy term for a dramatic rise in blood sugar levels, which then led to the renal failure. The underlying cause of death for this individual was actually diabetes

Credit: CDC, Instructions for Completing the Cause-of-Death Section of the Death Certificate, 2014.

This last piece – the underlying cause of death – is what GBD researchers use to produce mortality statistics. Understanding the underlying cause of death in a population allows public health officials to develop interventions that target the root cause.    

But picking out the underlying cause of death can be difficult. Many countries in the world do not have functioning vital registration systems, and even for those countries who do, there are often high levels of misclassification and vague reporting of causes of death on death certificates. 

For example, in some countries, when people die at home, the ambulance driver fills out the death certificate when they come to pick up the body, which can lead to high levels of inaccuracy in vital registration data. Having physicians fill out death certificates is ideal since it raises the quality of the data. 

Making sense of the data

For the GBD, this is where “garbage codes” come in. A garbage code refers to anything that is marked as a cause of death on a death certificate that cannot officially kill you. These official causes of death are listed in the International Classification of Diseases (ICD). The ICD is a standardized system for the coding and classification of cause-of-death data that was adopted in 1893 precisely to help standardize causes of death around the world. It has since evolved to include 14,000 diagnoses in its present-day form, the tenth edition (ICD-10). 

Back pain is an excellent example in this case. Back pain is potentially a sign or symptom of an underlying cause of death, but it cannot kill you in its own right. However, it routinely shows up on death certificates, and so does acne, anxiety, stomach pain, or a headache.

Other times, death certificates list an “intermediate” cause of death, such as heart failure. Heart failure can be the result of many different diseases, such as diabetes, ischemic heart disease, or hypertensive heart disease. It is therefore a mode of dying rather than a cause of dying. For example, Person X and Person Y might both have died from heart failure, but Person X had coronary heart disease that led to their heart failure, while Person Y had atrial fibrillation (irregular heartbeat) that led to their heart failure. If only heart failure is listed on the death certificates, we have no concrete idea of what caused these individuals’ deaths. 

GBD researchers track the percent of deaths with garbage coding by country and by year (for more information, see Table 1 in this GBD 2016 paper). In 2014, for example, the percent of deaths with garbage coding ranged from a low of 2% to a high of 52%. Sources with more than 50% garbage codes are dropped entirely from our cause-of-death analysis due to doubts about the overall quality of the data source.

At IHME, we handle garbage codes by re-assigning them to the most probable underlying cause of death. These probable causes of death are identified based on the literature, expert opinion, ICD rules, and knowledge about the disease. The redistribution itself is carried out through statistical models and algorithms for each age-sex-location-year group. For example, deaths coded as an unspecified road traffic accident are distributed to pedestrian, bicyclist, and vehicle driver deaths. This redistribution of garbage codes can have a big effect on what we say people are dying from.

This graph shows an example of how one garbage code – unspecified road traffic accidents – is proportionally reassigned across specific causes of death through garbage code redistribution. The Y axis shows the number of deaths, and the x axis shows the cause.

Case study: "unintentional unspecified poisoning”

One example of the impact of reclassification comes from looking at deaths recorded with ICD codes X40-X44, which are "unintentional unspecified poisoning.” In GBD 2015, researchers noticed that more than 97% of these poisonings by substance or drug occurred in ages 15–65, making it unlikely that these were cases of accidental ingestion of substances.  

Using multiple-cause-of-death data from the US, Australia, and Brazil, researchers at IHME analyzed death records where the principal cause of death was listed as unintentional unspecified poisoning but another cause in the chain of deaths that was noted on the death certificate, such as drug or alcohol use, was also listed. They calculated the fraction of these unspecified poisoning deaths that overlapped with alcohol use disorders, unintentional poisonings, and drug abuse disorders, and proportionally redistributed these mis-assigned unintentional poisoning deaths.

As it turns out, many of these unintentional unspecified poisoning deaths were redistributed to opioid drug overdoses. 

Redistribution is also done for a variety of other garbage codes, including ill-defined external causes of injury, unspecified stroke, heart failure, and hypertension. 

Finalizing the list

At the end of this redistribution process, our cause-of-death data should more accurately portray the actual ways that people are dying across the world as estimated by GBD methods. A more accurate understanding of what people are dying from allows us to better target planning of health interventions and programs to improve quality of health. 

Now, onto less morbid things . . . our next post is about what we do after we’ve evaluated and cleaned the data. 


This post is part of the IHME Foundations series, which discusses some of the core aspects of IHME’s work while exploring along the way everything from how you manage over 50 databases with more than 39 billion rows (and what that even means) to how you help governments in Central America evaluate the impact of their health programs. Join us for the whole series here, or on Medium.