Objective of this article:
- To show the difference between the “real numbers” and the reported numbers
- To highlight why we should be doing randomized testing
- To remind you that the gap between the two doesn’t make a difference in how we manage this global pandemic
There are a lot of numbers and charts out there. A lot! Some of them look scary. Some of them give us optimism. Some of the charts just plain look cool. But what’s the point of all these numbers? To inform us and drive us to take the right actions.
Sneak peek of the findings:
- Unless we test everybody, or start doing randomized testing on the general population, we are never going to know the “real numbers”. Given the scarcity of tests currently available, our only option is really randomized testing.
- The claim that “our numbers are higher because we’re testing more” isn’t really true. A given region can do a better job than expected at testing the population, but that population (the community) can do a worse job than expected in terms of spreading the virus.
- The mortality rate is the only “real number”. Anyone with coronavirus symptoms who is in critical condition is very likely to get properly tested. The problem with this hard number is that it tells us the rate of the virus in the community two or more weeks ago … not what it is right now.
- It doesn’t matter anyway. What matters is having fewer cases each day in every region. Each of us must do everything we can to reduce the transmission of the virus in the community.
Chart 1: The relationship between confirmed cases per million and deaths per million

One of the most real numbers we have is “deaths”, and as shown there is a correlation between confirmed cases per million and deaths per million. But the context is relevant too … British Columbia has had a larger proportion of outbreaks in long-term care homes (19 around Vancouver), which drives its mortality rate up. A 1% mortality rate is considered to be “average” for regions that have fared well. Ontario, Alberta, Saskatchewan and Manitoba have mortality rates of 1.1% to 1.5%. The mortality rate for British Columbia is 2.4% as of the most recent data.
Confirmed Cases versus “Real” cases
In the previous article we showed the number of cases per million, and highlighted how Canada and the US are at a crossroads between a worst case and a best case scenario.
Chart 2: Confirmed cases per million since the date of the 100th confirmed case

A problem with confirmed cases is that they only represent the people who were actually tested. What about all of the coronavirus cases that are out there in the community that aren’t tested yet?
The impact of different testing approaches
I’m going to show you three examples of how different testing approaches can skew the numbers. Imagine we had a hypothetical village of 100 people with a similar pattern to coronavirus (30% with no symptoms, 55% with mild to moderate symptoms and the rest with severe symptoms), and imagine that we (somehow) knew that 10% were infected. The village would look like the following visual.
Graphic 1: 100 residents, 10% “true” infection rate

I’m showing this twice … on the left side (organized) the villagers are grouped so that they are easy to count. The right side is what it’s probably like in the community (randomly spread).
This is a trivially small village and maybe we tested all 100 residents. But, what would the situation look like if we didn’t test everybody? We would test some of the individuals who had symptoms and were close to people with symptoms.
Graphic 2: 100 residents, unknown infection rate with testing, 25% testing coverage

Again, the left side (organized) shows the 100 people in the village, nicely organized so that we can count them. You’ll see that there are now diamond shapes which represent the 25 villagers that we tested. The circles are the ones that have not been tested (but we secretly know who has the virus and who doesn’t).
In this demonstrative example there was just 1 confirmed case out of the 25 residents who were tested, which means the infection rate would be 4%. The infection rates (of those who are tested) that are reported in Canada range between 2% and 6%.
The right side (randomly spread) shows how we might test the villagers who happened to be close to the one positive test we identified.
The above testing would conclude that there is 1 confirmed case for 100 residents, so a 1 in 100 rate. But we secretly know that the true rate in the community is 1 in 10. We are underestimating by 90% in this scenario.
Graphic 3: 100 residents, 25% testing coverage focused on those with symptoms

In this demonstrative example all 7 of the cases with symptoms were tested. As a result there were 7 confirmed cases out of 25 tested, which means the infection rate would be 28%, which is about 5 times higher than what we’re seeing in the data. The above testing would conclude that there are 7 confirmed cases for 100 residents, so a 7% rate in the community. But we still secretly know that the true rate in the community is 10%. We are still underestimating by 30% in this scenario.
Graphic 4: 100 residents, randomized testing

In this final demonstrative example, we’ve randomly selected 2 out of every 10 people and tested them, so out of 100 people in the community we tested 20. Two of the tests came back positive, for a sample-based infection rate of 10%. Because we did a random selection, we can scale the numbers up to the full population of the village: rather than calculating 2 over 100 residents, we calculate 2 over 20 samples = 10%, and then project that there must be about 8 more cases in the community that have not been tested.
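To see why the random selection is what makes the scaling-up legitimate, here is a small simulation sketch of the same village. It is not part of the original analysis: the function name `estimate_cases`, the seed and the number of trials are arbitrary choices of mine. Any single sample of 20 will bounce around (it may contain 0, 1, 2, 3 or more positives), but averaged over many repeated draws the scaled-up estimate lands close to the true 10 cases.

```python
import random


def estimate_cases(village, sample_size, rng):
    """Test a random sample and scale the positives up to the whole village."""
    sample = rng.sample(village, sample_size)  # random selection, no repeats
    positives = sum(sample)                    # 1 = infected, 0 = not infected
    return positives * len(village) / sample_size


# Hypothetical village from the example: 100 residents, 10 truly infected
village = [1] * 10 + [0] * 90

rng = random.Random(2020)  # fixed seed so the run is repeatable
trials = 10_000
estimates = [estimate_cases(village, 20, rng) for _ in range(trials)]
mean_estimate = sum(estimates) / trials

print(f"Average estimated cases over {trials} random samples: {mean_estimate:.2f}")
print(f"Smallest and largest single estimate: {min(estimates)}, {max(estimates)}")
```

The spread between the smallest and largest single estimate is a reminder that a sample of 20 is tiny. A real randomized test would need a larger sample to narrow that range, but the estimate is not systematically too low the way targeted testing is.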
Take away message: We can’t know the real numbers without randomized testing
If we want to have the “real numbers” we have two options:
Option 1: Test everybody, which is impossible given the scarcity of tests
Option 2: Design a randomized test of the general population
There must be a relationship between testing and confirmed cases!
There isn’t great data out there on the number of tests that are performed, but recently Canada has started reporting the number of tests in each province. As of April 1, 2020 the number of tests per 1,000 population ranges from 3.5 to 11.1, and higher for the remote provinces with smaller populations. South Korea is at about 7.9 tests per 1,000 population, and they are considered to have tested in a substantial manner.
Chart 3: Tests per 1,000 population by province in Canada

If there were a relationship between the number of tests per population and the confirmed cases per population, then we should be able to see a correlation in the following chart.
Chart 4: Relationship between tests per population and confirmed cases per population

Looking at the chart, it doesn’t appear to show a correlation between these two factors. But some of the outliers to the right are very small populations which might be distracting us from the bigger story.
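One way to check a pattern like this numerically, rather than by eye, is a Pearson correlation coefficient. The sketch below is only an illustration of the calculation: the function `pearson_r` is my own helper, and the two lists are made-up placeholder values, not the provincial data behind the charts.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list is constant (no spread)
    return covariance / sqrt(spread_x * spread_y)


# HYPOTHETICAL placeholder values for illustration only (not real data)
tests_per_1000 = [2.0, 4.0, 6.0, 8.0, 10.0]
cases_per_million = [100.0, 300.0, 150.0, 400.0, 200.0]

r = pearson_r(tests_per_1000, cases_per_million)
print(f"Pearson r = {r:.2f}")
```

A value near 1 or -1 would indicate a strong straight-line relationship and a value near 0 a weak one. With only a handful of regions, a single small province far out to the right can move the coefficient a lot.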
Chart 5: Relationship between tests per population and confirmed cases per population, by size of population

This helps us see what we should be focusing on. If we zoom into the range of 0 to 12 tests per 1,000 population we see the start of what looks like a correlation.
Chart 6: Relationship between tests per population and confirmed cases per population, by size of population

But, it’s definitely not the type of correlation that you would see in a statistics textbook. Why is that? There are two factors at play here:
Factor 1: How a given region approaches testing. Do they test randomly? Do they test above average or below average amounts per capita? Do they save their tests for specific situations? Do they avoid spending tests on individuals where the result wouldn’t make a difference in the management?
Factor 2: The ability of a region to contain the spread. How seriously do they approach containment? Do they follow the guidelines? Did they act early?
Chart 7: Relationship between tests per population and confirmed cases per population, by size of population, highlighted

The above chart shows a blue region that is subjectively chosen as the “normal” relationship of tests per population and confirmed cases per population. Ideally this blue region would be based on comparable data from around the world, but such a dataset does not yet exist, so we do what we can with our small handful of data points.
Specific provinces have been highlighted. Quebec stands out as having a much higher number of confirmed cases per population, in comparison to the rest of the provinces. It would be difficult to explain the high numbers as the result of much higher testing per population. Alberta stands out as having a remarkably high number of tests per population, while still showing a relatively low number of confirmed cases per population. The same could be said for Manitoba.
The only “real number” is deaths
The hard truth of the numbers is in the mortality rate. We’ve seen that the general mortality rate in the regions that have managed this for a longer period is approximately 1%. We’ve also seen situations like Spain, Italy and Hubei where the demand on the healthcare system far surpassed its capacity, resulting in mortality rates of 5% to 10%.
Chart 8: Relationship between confirmed cases per population, and deaths per population

As shown in this chart there is a relationship between the confirmed cases per million population and the deaths per million population. We can see that even though Quebec had higher than expected confirmed cases per population in Chart 7, it sits in the “normal” range just below the 1% mortality rate. There are some smaller provinces such as New Brunswick and Nova Scotia that are so early in the process that they have not reported any deaths yet. Ontario has a 1.5% mortality rate as of this data, whereas Manitoba, Saskatchewan and Alberta have a 1.1% to 1.2% mortality rate. British Columbia is showing a much higher mortality rate at 2.4% … this may just be reflective of how the virus impacts long-term care homes, with elderly residents. There is a clear relationship between age and mortality.
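For readers who want to reproduce these percentages, here is a minimal sketch, assuming the mortality rate quoted here means deaths divided by confirmed cases. The helper names and the region's counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def per_million(count, population):
    """Scale a raw count to a rate per one million residents."""
    return count / population * 1_000_000


def mortality_rate(deaths, confirmed):
    """Deaths as a share of confirmed cases (assumed definition)."""
    return deaths / confirmed


# HYPOTHETICAL region, numbers invented for illustration
population = 5_000_000
confirmed = 1_000
deaths = 12

cases_pm = per_million(confirmed, population)
deaths_pm = per_million(deaths, population)

rate_from_counts = mortality_rate(deaths, confirmed)
rate_from_per_million = deaths_pm / cases_pm  # population cancels out

print(f"Cases per million:  {cases_pm:.1f}")
print(f"Deaths per million: {deaths_pm:.1f}")
print(f"Mortality rate:     {rate_from_counts:.1%}")
```

Because the population cancels out, dividing deaths per million by cases per million gives the same rate as dividing the raw counts. On a chart of deaths per million against cases per million, a fixed mortality rate such as 1% is therefore a straight line through the origin, and a region's position relative to that line shows whether it is above or below that rate.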
The “real numbers” don’t change our actions
Data is not reality. It is a representation of reality. The point of data is to guide us to the right decisions and the right actions. Even if we don’t have perfect data we can hopefully all agree that:
- It’s a good thing if any of our numbers (confirmed cases, deaths) reduce from one day to the next.
- If we’re “doing well” then we should all do what we can to keep it that way … which means following the social distancing guidelines.
- If we’re not doing well in a given region, then each of us needs to do everything to help bend the curve, including following the guidelines to the letter.
I’m inclined to agree with the following statement:
“If it were possible to wave a magic wand and make all Americans freeze in place for 14 days while sitting six feet apart, epidemiologists say, the whole epidemic would sputter to a halt.”
Main take away messages
These are the main things that I learned from carrying out this analysis:
- There is a relationship between tests and confirmed cases … if you do more testing, you do get more confirmed cases … but it’s not as simple as that. Cases can be higher than expected, but have a “normal” mortality rate. Cases can be lower than expected even if there are high levels of testing provided that the region follows the recommended social distancing guidelines.
- We should really consider doing randomized testing of the general population if we want to get at the real numbers. A little bit of testing with some basic statistical design would go a long way.
- Mortality is a “hard number”, but even this metric must be taken in context. Virus outbreaks in long-term care homes are likely to result in above-expected mortality rates.
- With a bit of concerted effort over a short period of time the growth rates can be reduced. This situation doesn’t get any easier for any of us if we drag it out.
Data sources:
Much of the data that we need is publicly available on GitHub. The following link includes daily data of confirmed cases for each country over the past several weeks:
https://github.com/datasets/covid-19/blob/master/data/time-series-19-covid-combined.csv
Please keep in mind that each day it shows the numbers that were reported across the world as of the previous day.
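If you want to work with this file yourself, the sketch below shows one way to total the latest confirmed cases by country using only the Python standard library. Several things here are assumptions on my part rather than facts from the dataset's documentation: the raw-file URL (derived from the link above, and the branch or path may have changed), the column names `Date`, `Country/Region` and `Confirmed`, and the date format. The `SAMPLE` rows are invented so that the code can run without a network connection.

```python
import csv
import io
import urllib.request

# ASSUMPTION: raw-file counterpart of the GitHub link above; may have moved
RAW_URL = (
    "https://raw.githubusercontent.com/datasets/covid-19/"
    "master/data/time-series-19-covid-combined.csv"
)


def download_csv(url=RAW_URL):
    """Fetch the CSV as text (needs network access; not called below)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def latest_confirmed_by_country(csv_text):
    """Sum confirmed cases per country for the most recent date in the file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ASSUMPTION: dates are in YYYY-MM-DD form, which sorts correctly as text
    latest_date = max(row["Date"] for row in rows)
    totals = {}
    for row in rows:
        if row["Date"] != latest_date:
            continue
        country = row["Country/Region"]
        confirmed = int(float(row["Confirmed"] or 0))  # tolerate "12.0" or blank
        totals[country] = totals.get(country, 0) + confirmed
    return latest_date, totals


# INVENTED sample rows using the assumed column layout
SAMPLE = """Date,Country/Region,Province/State,Confirmed,Recovered,Deaths
2020-03-30,Canada,Province A,8,0,0
2020-03-31,Canada,Province A,10,0,0
2020-03-31,Canada,Province B,20,0,1
2020-03-31,Examplestan,,5,0,0
"""

latest_date, totals = latest_confirmed_by_country(SAMPLE)
print(latest_date, totals)
```

To run it against the real file, replace `SAMPLE` with `download_csv()` and check the header row first, since the column names may differ from what is assumed here.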
In addition, the Epidemiological Summary from the government of Canada has provided much of the data related to testing: