Coronavirus – data mining

WARNING. I am neither a medical doctor nor an epidemiologist. The analysis I am sharing here is only for curious data geeks. Please follow the advice of your national authorities and health system.
NOTE. This post was updated on 15/3.

The data | The data repository for the 2019 Novel Coronavirus Visual Dashboard, operated by the Johns Hopkins University Center for Systems Science and Engineering, is available on GitHub. To adjust mortality rates by local demographics, I have downloaded population pyramid data. There are several estimates for age-dependent mortality. I was able to find only the following pre-print for mortality in Hubei compared to the rest of China. The dataset analysed was small; let me know if you find something better.

The software | In my spare time, I prepared a bit of Matlab code that imports the JH data and does just two simple things: it compares trends between different countries and it compares age-adjusted mortalities. The code is available on GitHub. Keep in mind, sorry to repeat, this is just for the curious geeks.
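For readers who do not use Matlab, here is a minimal Python sketch of the import step. It is not the code from the repository. It assumes the JH time series files use a wide layout with four metadata columns (`Province/State`, `Country/Region`, `Lat`, `Long`) followed by one column of cumulative counts per date. The sample rows, the numbers and the function name `load_country_series` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample in the assumed wide layout (numbers are illustrative only).
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20
,Italy,43.0,12.0,100,150,200
Hubei,China,30.9,112.2,500,520,530
Beijing,China,40.1,116.4,10,12,13
"""


def load_country_series(text, n_meta_cols=4):
    """Sum the cumulative counts of all rows that share a Country/Region."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    dates = header[n_meta_cols:]  # every column after the metadata is a date
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        country = row[1]
        for i, value in enumerate(row[n_meta_cols:]):
            totals[country][i] += int(value)
    return dates, dict(totals)


dates, series = load_country_series(SAMPLE_CSV)
print(dates)
print(series["China"])
```

To look at a single province such as Hubei rather than a whole country, one would filter on the first column instead of summing over the second.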


Here I summarize a few countries I checked. As I am Anglo-Italian, and with Italy and the UK having adopted very different strategies to fight the coronavirus, I developed this code to compare trends between the UK, Italy and Hubei. It is interesting that these three territories have similar population sizes but, until now, have experienced the epidemic in different ways. Hubei was caught off guard because it is the origin of the epidemic. Italy, together with South Korea and Iran, was caught off guard because they thought the coronavirus was somehow under control. The UK might have made fewer mistakes so far and controlled the spread of the virus better, and it has decided not to further contain the epidemic, against WHO advice. Let’s see.

I synchronized the curves at the point where each reaches 400 confirmed cases. By chance, this is about the number of cases at which the Hubei data begin, the day when Hubei went into lockdown, and a similar number to when the UK decided not to contain the virus.
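The synchronization can be sketched in a few lines of Python. This is an illustration of the idea, not the Matlab code itself: each cumulative series is cut so that day 0 is the first day with at least 400 confirmed cases. The function name `align_at_threshold` and the example numbers are hypothetical.

```python
def align_at_threshold(series, threshold=400):
    """Return the series starting from the first day it reaches `threshold`.

    Day 0 of the returned list is the first day with at least `threshold`
    cumulative confirmed cases; an empty list means it was never reached.
    """
    for day, value in enumerate(series):
        if value >= threshold:
            return series[day:]
    return []


# Made-up cumulative confirmed cases for two hypothetical territories.
confirmed = {
    "A": [50, 120, 390, 450, 700, 1100],
    "B": [400, 650, 900],
}

aligned = {name: align_at_threshold(values) for name, values in confirmed.items()}
for name, values in aligned.items():
    print(name, values)
```

After alignment the curves share a common "days since 400 cases" axis, so they can be overlaid even though the outbreaks started on different calendar dates.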

Italy, just before and just after this point, introduced first a local lockdown and then a national lockdown. Italy and Hubei seem to be on a similar trajectory of confirmed cases. For the UK it is too early to say. We should keep in mind that confirmed cases depend on the testing methodology. Hubei’s and Italy’s health systems were overwhelmed, so it is possible that at a certain point they struggled to test the general population. The UK has decided to stop screening the general population. Therefore, reported deaths might be a more realistic measure than confirmed cases. At the time of writing, the JH dataset is one day behind, but we know that the UK is now in line with the other curves, and Italy is overshooting Hubei’s trajectory. Mortality rates are heavily affected by the reporting of confirmed cases. We will know the actual mortality rates only after epidemiologists are able to do their statistical work retrospectively. More on this at the end of this post.

What about other countries? South Korea is interesting because they did not go into lockdown even though they also had a major outbreak. They were able to contain it by tracking those infected.

Assuming that Korea counted all covid-related deaths, their strategy was rewarded with successful containment and fewer deaths than other regions. Spain seems to be the EU country that will struggle next, let’s see…

Unfortunately, it seems that Spain is on the same trajectory as Italy and Hubei. But remember, Hubei succeeded in containing the outbreak, which gives hope. This and the experience in Korea are why WHO is still recommending that countries attempt to contain the virus.

The same is true for France.

What about Germany?

For confirmed cases, Germany looks to be on a similar trajectory. However, unless I made a mistake, the mortality rate seems very low. There are reports in the news that Germany counts as covid-related deaths only those patients who did not have other important related pathologies.

This, of course, would completely bias the curves presented here, but the situation in Germany might not be different from other countries. We’ll understand this in the future. Now a few comments on mortality rates. Initially, many of us were puzzled by the differences in mortality between countries. Several factors influence these statistics: i) confirmed cases are underestimated in different ways in different countries because of testing capacity or policy; ii) covid-related deaths seem to be counted similarly in many countries, except for Germany; iii) different countries have different demographics; and iv) when a health system is strained, mortality might increase and confirmed cases might decrease at the same time. All this considered, I just thought to give a reference for demographic adjustments.

I used mortality figures for Hubei and for the rest of China to estimate the worst-case and best-case scenarios, corresponding to an overwhelmed and a coping health system respectively. The red and blue curves are these values adjusted for the demographic differences of each country.
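One simple way to make such a demographic adjustment is to weight age-specific mortality by the share of the population in each age band. The Python sketch below shows that calculation; it is not necessarily how the Matlab code does it. It assumes, as a strong simplification, that infection is equally likely at every age. The age bands, the mortality values and the two populations are invented for illustration and are not taken from the pre-print or from any population pyramid.

```python
def age_adjusted_rate(population_by_age, fatality_by_age):
    """Population-weighted average of age-specific mortality.

    Assumes every age band is infected in proportion to its population share.
    """
    total = sum(population_by_age.values())
    return sum(
        population_by_age[band] / total * fatality_by_age[band]
        for band in fatality_by_age
    )


# Hypothetical age-specific mortality (fraction of confirmed cases who die).
fatality = {"0-39": 0.002, "40-69": 0.02, "70+": 0.10}

# Hypothetical population pyramids, in millions of people per age band.
younger_country = {"0-39": 60, "40-69": 30, "70+": 10}
older_country = {"0-39": 40, "40-69": 40, "70+": 20}

print(age_adjusted_rate(younger_country, fatality))
print(age_adjusted_rate(older_country, fatality))
```

With the same age-specific figures, the country with the older population gets the higher expected mortality, which is the effect the adjusted red and blue curves are meant to capture.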

It seems, then, that the high mortality in Italy is just demographics. Note that these are cumulative statistics and, therefore, even if the situation improves massively as in Hubei, the mortality remains high because it was high historically. Thus, so far it looks like only in Hubei and Italy did the outbreaks reach the point of fully overwhelming the health systems. However, check the drift of the Italian curve: that is what might (hopefully not) happen in other countries that are on a similar trajectory.
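The point about cumulative statistics can be illustrated with a small Python sketch. It compares the cumulative ratio of deaths to confirmed cases with the same ratio computed only on each day's new deaths and new cases. All numbers and function names are made up, and neither ratio is a true mortality rate, since deaths lag behind confirmations.

```python
def cumulative_ratio(deaths, confirmed):
    """Cumulative deaths divided by cumulative confirmed cases, day by day."""
    return [d / c if c else 0.0 for d, c in zip(deaths, confirmed)]


def daily_ratio(deaths, confirmed):
    """New deaths divided by new confirmed cases for each day after the first."""
    ratios = []
    for i in range(1, len(confirmed)):
        new_cases = confirmed[i] - confirmed[i - 1]
        new_deaths = deaths[i] - deaths[i - 1]
        ratios.append(new_deaths / new_cases if new_cases else 0.0)
    return ratios


# Made-up cumulative counts: an early bad phase followed by an improvement.
confirmed = [100, 200, 400, 800, 1600]
deaths = [5, 10, 20, 25, 30]

print(cumulative_ratio(deaths, confirmed))
print(daily_ratio(deaths, confirmed))
```

In this toy series the day-by-day ratio drops quickly once the situation improves, while the cumulative ratio falls much more slowly because it still carries the early deaths.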

Keep in mind, I am no expert. I think, however, that there are two possibilities that explain this, and they probably coexist. First, when ICUs are overwhelmed, we rescue fewer people. Second, when a country is overwhelmed, there might also be less testing. So, there are plenty of limitations in this data (the mortality rate data are not great, I am no expert, and several factors might explain the trends)… but at least there is some pattern that might indicate what is happening.

To conclude. Every country can still do what Hubei did. Not my work, but WHO’s. We need to protect the most vulnerable while we wait for the vaccines and drugs that WILL come. Take care, find ways to stay positive, and help the people around you!

Author: Alessandro

Please visit my website to know more about me and my research

2 thoughts on “Coronavirus – data mining”

  1. Hi Ale, great visualizations. Spain and Germany curves in the last graph seem copy paste error though 🙂
