Coronavirus – data mining (v2)

WARNING. I am neither a medical doctor nor an epidemiologist. The analysis I am sharing here is only for the curious data geeks out there. Please follow the advice of your national authorities and health system.

NOTE: For a more comprehensive blog post, you might be interested in Tomas Pueyo’s website. A good discussion about mortality rates can be found on the CEBM website.

The trends

I have run a comparison between selected countries. I use Hubei (the Chinese province where Wuhan is located, in red) and Italy (in green) as references. The top plot in each figure shows confirmed cases and the middle plot shows covid-related deaths. Check the appendix for a discussion of the bottom plots (mortality rates). The Johns Hopkins (JH) data starts with Hubei on the day that the province went into lockdown. I will often comment on the comparison between the UK and Italy as I am a British-Italian dual national and I started to follow the data to understand how to adapt to the situation.

Day 1 in these plots is relative to the start of Hubei tracking, which coincides with the day of Wuhan lockdown (red vertical line). The vertical green lines represent two key moments of the Italian response. The first is the local lockdown of towns where outbreaks in Northern Italy occurred, the second is the national lockdown. The vertical blue lines are key moments of the British response. The first one marks the day when the PM announced abandoning the containment strategy to pursue ‘herd immunity’, the second line marks the day when a UK nation-wide lockdown was implemented.

The comparison between Hubei, Italy and the UK is interesting as these regions have similar total populations (~60M). Imported cases were identified at a similar time in both the UK and Italy, but the two countries then followed quite different trajectories. Until day 35, the UK seemed able to track and isolate cases, after which local outbreaks are evident from the start of a steep exponential growth (linear on this log scale). Italy, instead, experienced a sudden (apparent) outbreak at day 30. The difference between Italy and the UK at this stage is that Italy shut flights from China (contrary to WHO indications) and kept testing only people with a declared travel history from China. This allowed the coronavirus to spread undetected, particularly in hospitals that did not trigger emergency procedures in time. It seems likely that the coronavirus was imported by people returning from China through Germany. It also seems likely that the spread was boosted by contacts between Germans and Italians, between strongly productive regions, completely bypassing the Italian monitoring strategy, which was too focused on China. From the trend of detected cases in Italy, one could extrapolate that the local epidemic started immediately and undetected as soon as the first imported cases were identified. The good British monitoring of the epidemic gave the UK two weeks of breathing time to prepare for the epidemic.
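To make the "exponential growth is linear on a log scale" remark concrete, here is a minimal sketch that fits a straight line to the logarithm of a case series and converts the slope into a doubling time. The function name `doubling_time` and the series are assumptions for illustration; the numbers are synthetic, not JH data.

```python
import math

def doubling_time(counts):
    """Fit a least-squares line to log(count) versus day index
    and return the implied doubling time in days."""
    days = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    # Slope of the regression line = growth rate per day on the log scale
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
        / sum((x - mean_x) ** 2 for x in days)
    )
    return math.log(2) / slope

# Synthetic series (an assumption): 100 cases doubling every 3 days
synthetic = [100 * 2 ** (d / 3) for d in range(10)]
print(round(doubling_time(synthetic), 2))
```

On real data the fit is only meaningful over a window where the curve actually looks straight on the log plot; a change of slope is the interesting signal.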

In my previous post, I mentioned the puzzle of Germany exhibiting low mortality and some reports about Germany counting covid-related deaths differently from other countries. Differences in counting are probably real, but I am now convinced that putting too much emphasis on this point is inadvisable. My impression is that there is a lot of talk about this that spreads fake news, aimed at constructing a conspiracy theory about some countries hiding covid cases to protect their economies. Personally, I wish someone would clarify how deaths are counted, but I am inclined to interpret the data, at this point in time, as more realistic than some people might consider. Probably, Germany simply did a better job than the UK and Italy and delayed the local epidemic by three weeks compared to the latter.

We can browse the plots of different western European countries and notice similar trends, with different delays caused by the probability of importing cases and the capability of each country to track and contain imported cases. Amongst the countries I checked, Spain is the outlier, with a steep rise of covid-related cases that has been broadly described in the media and not yet fully explained.

In South East Asia (I checked China, Singapore, South Korea and Japan), the trends are very different from Europe. We can argue that different political and societal systems permitted a better response. In general, we can state that they responded fast and proactively. You might notice second waves of infections. Japan, Singapore and South Korea did not enforce national lockdowns but are containing the epidemic with careful tracking and strong mitigation measures. In a way, this is what the UK wanted to do. However, the UK, in my opinion, acted too late: by the time they announced the change from containment to mitigation plans, it was too late to adopt a strategy balanced between public health and the economy (the South East Asian model) and too late to fully contain the disease with low casualties. Therefore, western European countries will likely follow the Chinese trend: full containment followed, in due time, by the South East Asian response, unless a cure or vaccine is ready early.

Singapore is the outlier, however. We should consider that Singapore has only about 10% of the inhabitants of Italy or Hubei (Japan has twice as many, South Korea a similar number); it is, in fact, a densely populated city-state. The death rate is very low… I wonder how the situation will evolve and how the data is collected. Back to Europe, we should keep an eye on Sweden as, if I understood correctly, the Swedish government did what the UK wanted to do, i.e. mild mitigation measures.

Let’s conclude this overview with the USA. We should keep in mind that in any country, the statistics are the sum of multiple local outbreaks. The USA has >300M inhabitants spread over 50 states. For the time being, I just check the overall trends. And the trends are simply alarming. Of course, I neglected many countries, but you can freely use/adapt my Matlab code on GitHub to draw your own conclusions.

Next… let’s synchronize the curves.

Contrary to my previous blog post, I am now synchronizing the curves on the death statistics, as they are more realistic, while confirmed covid cases heavily depend on the capability of health systems to run tests across the non-hospitalized population. I used an arbitrary number of deaths, 40, sufficiently large to provide a robust synchronization but sufficiently low to precede government containment actions that would, of course, change the shape of the curves. Most South East Asian countries are not shown as they are containing the epidemic at the moment. With this data, you can now appreciate that the epidemic across European countries is very similar, with the caveat that any type of synchronization of data is wrong, certainly if not done with proper models and a study of local outbreaks. However, the general conclusions might not change.
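The synchronization described here amounts to shifting each curve so that day 0 is the first day cumulative deaths reach the threshold. Below is a minimal sketch of that idea; it is not the Matlab code from the repository, and the function name `days_since_threshold`, the region names and all the numbers are made up for illustration.

```python
def days_since_threshold(cumulative_deaths, threshold=40):
    """Return the part of the series starting on the first day the
    cumulative count reaches the threshold (empty if it never does)."""
    for day, value in enumerate(cumulative_deaths):
        if value >= threshold:
            return cumulative_deaths[day:]
    return []

# Synthetic cumulative death counts (assumptions, not real data)
series = {
    "region_a": [0, 2, 10, 45, 90, 180],
    "region_b": [0, 0, 0, 1, 5, 41, 80],
    "region_c": [0, 1, 3],  # never reaches the threshold, so it is dropped
}

aligned = {name: days_since_threshold(values) for name, values in series.items()}
for name, values in aligned.items():
    print(name, values)
```

Regions that have not reached the threshold yield an empty list, which is one simple way to leave them out of the synchronized plot.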

Rolling back to the UK, Italy and Hubei, you can see that the British government announced the abandonment of the containment phase exactly when (in relative time) Hubei did the opposite. After initial errors, Italy acted fast but had to play a chasing game with the outbreak. Its strategy was not working, and Italy had to enforce a nation-wide lockdown that started about 10 days later than Hubei’s. A nation-wide lockdown might not have been necessary, but the lack of compliance of the general population, some doctors insisting that coronavirus was no worse than the flu, and some political parties trying to score points rather than supporting a common strategy eventually required a national lockdown that is now finally having the desired effects. We can still be concerned, however, about possible second waves of the epidemic, even in the short term, because of waves of internal migration from North to South, with people running away from the epicentre of the Italian outbreak, although the current lockdown might be able to quench new outbreaks. We can just wait and see for now. However, that delay and initial lack of cohesion will result in several thousand preventable deaths.

In the UK, the first stage of the epidemic is still unfolding. It is interesting to observe that about a week after the British government decided not to take action, in order to reach ‘herd immunity’, the increase in deaths deviated from the original trend, slowing down. On the ground, we have noticed how a large majority of people, seeing what was happening in Italy, took action independently of government advice. Sports organizations cancelled events, universities started to close, people increasingly worked from home and several families started to withdraw children from schools. In effect, the British population started social distancing measures ahead of the Government, which eventually triggered a national lockdown at approximately the same (relative) time as Italy.

One issue that has been publicized in Italy but not in the UK is what I referred to as internal migration. The reason is simple. Paradoxically, in both countries this happened because people did not follow advice. In Italy, the Government asked people not to travel from the afflicted regions. People did not follow the advice and, when the government was preparing the regional lockdown, an opposition party leaked the measure to the press. People panicked and travelled to their holiday homes, and university students went back to their families, many travelling from North to South. In the UK, the Government did not ask universities to close, but universities did close and students were invited to go back to their families. This advice was sensible as we do not want to have students trapped in student accommodation. However, this happened without control and without a good sense of the status of the local epidemics. Therefore, while not broadly reported by the media because of how it happened, this large movement of people might contribute to the future dynamics of the UK epidemic, and possibly of those in other countries too.

It is just my opinion, not a scientific fact for now, that the UK wasted an incredible amount of time. While the NHS and the Government might have done everything right initially, all those efforts have been squandered. What I argue here is that no strategy is a good strategy in these circumstances. Coronavirus will kill a lot of people and cause damage to the economy. Adapting a strategy is also important: new information might require a new strategy. However, contrary to the story depicted by the Government, there was no change in the science about the epidemic. A U-turn in policy can be costly as it might result in none of the positive outcomes desired with either one strategy or the other.

The only hope is that those countries that were able to delay the onset of local epidemics compared to Italy and Iran, either by chance (low exposure) or by good management, were able to scale up contingency plans and the availability of specialist ICU beds to keep the final number of deaths as low as possible. Why this was not arranged at the onset of the epidemic is something we will have to analyse in the future.

Soon, we will speak a lot about the USA. China has four times the population of the USA. China was able to contain the disease (for now) and to avoid widespread diffusion across the whole country. Outbreak after outbreak, the trends from the USA are accelerating and already overshooting both Hubei and China overall, with no sign of the epidemic slowing down.

Concluding remarks

In the next sections, I describe the methods I used and I provide a discussion about mortality rates. However, I wish to conclude my post here by stating that this might be the last time I post graphs, as by now there is so much data around, much of it from people with a background in epidemiology. However critical I might be of certain political decisions, I would like to be very clear on the following. The individual risk to people is comparatively low. The large majority of the population has more to fear from the socioeconomic repercussions of the pandemic than from the disease itself. The socioeconomic impacts of nationwide lockdowns will affect people’s health. Mental health will deteriorate. Health systems will be weakened and therefore more people will suffer even if not infected. Governments all over the world are trying to guess what is best to do. My advice is to follow guidance from your national medical and governmental authorities. It might seem I am contradicting myself. I am not. I simply acknowledge there is no good solution to the problem. However, governments should speak the truth and clarify why they take certain decisions, at least in democracies. Governments should also work together, not against each other. Some of us feared a big war was brewing after the 2008 financial crisis, but we were not expecting a pandemic. Now that we have the pandemic, I hope we do not get both, and that this situation brings all peoples of all nations together. At the moment there are both good and bad signs. In Italy, we have a saying: “La speranza è l’ultima a morire”. Hope is the last to die. And we hope, we hope for more rainbows at the windows and fewer clouds on the horizon.

Take care, my friends.

The data | The data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering is available on GitHub. To adjust mortality rates by local demographics, I have downloaded the population pyramid data from www.populationpyramid.net. There are several estimates for age-dependent mortality. I was able to find only the following pre-print for mortality in Hubei compared to the rest of China. The dataset analysed was small; let me know if you find something better.

The software | In my spare time, I prepared a bit of Matlab code that can import the JH data and does just two simple things: compare trends between different countries and compare age-adjusted mortalities. The code is available on GitHub. Keep in mind, sorry to repeat, this is just for the curious geeks.

Appendix – Mortality rates.

Let’s now discuss, briefly, mortality rates. This is a very complicated issue, not just for non-experts like me. We will have better estimates only much later. At the moment, there is no evidence for the existence of multiple strains of SARS-CoV-2 exhibiting different aggressiveness. There are of course different strains, as viruses constantly mutate, but the idea that some countries are more affected than others because of different strains seems, at the moment, to be just a way to justify their own shortcomings. Of course, we will understand this later.

Let’s consider Washington state (not shown here), which showed very high mortality. This was the result of very little testing of the general population and a spike in deaths of elderly people arising from outbreaks in retirement homes. New York is at the opposite end of the scale as it was a more ‘standard’ outbreak. We noticed mortality rates going from very high to very low and bouncing back. These are all artefacts of sampling.

I present here again some graphs comparing demographics. First, the age-adjusted mortality. The blue bar for China at ~1% is the mortality rate in China outside Hubei. This, in my opinion, is a good measure and is backed up by epidemiological studies. The WHO is setting this number at ~3.5% because at the moment they are just dividing confirmed deaths by confirmed cases, which likely overestimates the true rate.
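To see why dividing confirmed deaths by confirmed cases can overestimate mortality, consider this small sketch. It assumes, purely for illustration, that only a fraction of infections is ever confirmed; the function names, the `detection_fraction` parameter and the numbers are hypothetical and not taken from any dataset.

```python
def naive_fatality_rate(deaths, confirmed_cases):
    """Apparent rate: confirmed deaths divided by confirmed cases."""
    return deaths / confirmed_cases

def adjusted_fatality_rate(deaths, confirmed_cases, detection_fraction):
    """Rate after scaling confirmed cases up to estimated infections.
    detection_fraction is an assumed share of infections that get confirmed."""
    estimated_infections = confirmed_cases / detection_fraction
    return deaths / estimated_infections

# Made-up numbers: 35 deaths among 1000 confirmed cases
naive = naive_fatality_rate(35, 1000)
# If only one infection in four were confirmed (an assumption)
adjusted = adjusted_fatality_rate(35, 1000, detection_fraction=0.25)
print(f"naive: {naive:.2%}, adjusted: {adjusted:.3%}")
```

The sketch ignores the opposite bias, namely that some confirmed patients have not yet recovered or died when the ratio is computed, so it shows one effect only.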

The blue bars for other countries are adjusted to the different demographics of the individual regions. Italy has an older population compared to most other countries and, therefore, higher mortality is to be expected. Some reports also suggest that men are more likely to die than women. Therefore, in the middle graph, I compare the demographics of Italy, the UK and Germany to China. The red bars in the first graph are the mortality rates inferred from Hubei, thus in a situation where the health system is overwhelmed. As you can see, in Italy we can expect apparent mortality rates of almost 10%. Indeed, at the moment this value has been exceeded. However, the keyword here is ‘apparent’.
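The demographic adjustment can be thought of as a weighted average: age-specific mortality rates weighted by each country's population pyramid. The sketch below shows that idea under stated assumptions; the age bands, rates and population shares are invented for illustration and are neither the pre-print's estimates nor real pyramid data, and the actual Matlab code may proceed differently.

```python
def age_adjusted_rate(age_specific_rates, population_by_age):
    """Weight each age band's mortality rate by that band's share
    of the population and return the overall expected rate."""
    total_population = sum(population_by_age.values())
    weighted_deaths = sum(
        age_specific_rates[band] * population_by_age[band]
        for band in population_by_age
    )
    return weighted_deaths / total_population

# Invented age-specific rates (assumptions, not published estimates)
rates = {"0-39": 0.002, "40-69": 0.01, "70+": 0.08}

# Invented population shares for a younger and an older country
younger_country = {"0-39": 60, "40-69": 30, "70+": 10}
older_country = {"0-39": 40, "40-69": 40, "70+": 20}

younger_rate = age_adjusted_rate(rates, younger_country)
older_rate = age_adjusted_rate(rates, older_country)
print(f"younger: {younger_rate:.2%}, older: {older_rate:.2%}")
```

With the same age-specific rates, the older population ends up with the higher overall rate, which is the effect attributed to Italy's demographics in the text.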

Therefore, while I might still discuss COVID in this blog, I will probably stop speaking about mortality rates. Eventually, these numbers could be very low. Perhaps we might have ‘just’ 0.5% of the population at risk of death, maybe 1% in countries with older demographics. However, keep in mind that, as the WHO has always said, the issue is the overwhelming of the health systems. The UK, Hubei and Italy each have around 60M inhabitants. This means that a ‘do-nothing’ policy would result in 300-600k deaths in each of them (40M world-wide). Or half of that if some sort of herd immunity were to protect us. To put this in perspective, ~500-600k people die in the UK every year (60M world-wide).
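The back-of-the-envelope numbers above can be checked in a few lines. The 60M population and the 0.5% and 1% rates come from the text; the world population of roughly 7.8 billion is an assumption added here, and the function name `expected_deaths` is made up for this sketch.

```python
def expected_deaths(population, deaths_per_thousand):
    """Population times a rate given in deaths per thousand people
    (integer arithmetic avoids floating-point rounding)."""
    return population * deaths_per_thousand // 1000

regional_population = 60_000_000    # UK, Hubei or Italy, per the text
world_population = 7_800_000_000    # assumption: approximate 2020 figure

low = expected_deaths(regional_population, 5)     # 0.5%
high = expected_deaths(regional_population, 10)   # 1%
world_low = expected_deaths(world_population, 5)  # 0.5% world-wide

print(f"per region: {low:,} to {high:,}")
print(f"world-wide at 0.5%: {world_low:,}")
print(f"per region, halved: {low // 2:,} to {high // 2:,}")
```

This reproduces the 300-600k range per region, and the world-wide figure at 0.5% comes out close to the 40M quoted in the text.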

Coronavirus – data mining

WARNING. I am neither a medical doctor nor an epidemiologist. The analysis I am sharing here is only for the curious data geeks out there. Please follow the advice of your national authorities and health system.
NOTE. This post was updated on 15/3.

Trends

I summarize a few countries I checked. As I am Anglo-Italian, and with Italy and the UK having adopted very different strategies to fight coronavirus, I developed this code to check trends between the UK, Italy and Hubei. It is interesting that these three territories have similar population sizes but, until now, have experienced the epidemic in different ways. Hubei was caught off-guard because it is the origin of the epidemic. Italy, together with South Korea and Iran, was caught off-guard because they thought the coronavirus was somehow under control. The UK might have made fewer mistakes so far and controlled the spread of the virus better, and it has decided not to further contain the epidemic, against WHO advice. Let’s see.

I synchronized the curves on the day when confirmed cases reached 400. By chance, this is about the number of cases on the first day for which we have data from Hubei, which is the day Hubei went into lockdown, and it is similar to the number of cases when the UK decided not to contain the virus.

Italy introduced first a local lockdown and then a national lockdown, just before and just after this point. Italy and Hubei seem to be on a similar trajectory of confirmed cases. For the UK, it is too early to say. We should keep in mind that confirmed cases depend on the methodologies of testing. Hubei’s and Italy’s health systems got overwhelmed, therefore it is possible that at a certain point they struggled to test the general population. The UK has decided to stop screening the general population. Therefore, the reported deaths might be more realistic as numbers. At the time of writing, the JH dataset is one day behind, but we know that the UK is now in line with the other curves, and Italy is overshooting Hubei’s trajectory. Mortality rates are heavily affected by the reporting of confirmed cases. We will know the actual mortality rates only after epidemiologists are able to do their statistical work retrospectively. More on this at the end of this post.

What about other countries? South Korea is interesting as they did not go into lockdown even though they had a major outbreak. They were able to contain it by tracking those infected.

Assuming that Korea counted all covid-related deaths, their strategy was rewarded with successful containment and fewer deaths than other regions. Spain seems to be the EU country that will struggle next, let’s see…

Unfortunately, it seems that Spain is on the same trajectory as Italy and Hubei. But remember, Hubei succeeded in containing the outbreak, which gives hope. This and the experience in Korea are why the WHO is still recommending attempts to contain the virus.

The same is true for France.

What about Germany?

For confirmed cases, Germany looks to be on a similar trajectory. However, unless I made a mistake, the mortality rate seems very low. There are reports in the news that Germany counts as covid-related deaths only those patients who did not have other important related pathologies.

This, of course, would completely bias the curves we presented, but the situation in Germany might not be different from other countries. We’ll understand this in the future. Now a few comments on mortality rates. Initially, many of us were puzzled by the differences in mortality between countries. There are several factors that influence these statistics: i) confirmed cases are underestimated in different ways in different countries because of testing capacity or policy; ii) covid-related deaths seem to be counted similarly in many countries, except for Germany; iii) different countries have different demographics; and iv) when a health system is strained, mortality might increase and confirmed cases might decrease at the same time. All this considered, I just thought to give a reference for demographic adjustments.

I used mortality figures for Hubei and the rest of China to estimate the worst-case and best-case scenarios, for an overwhelmed and a coping health system respectively. The red and blue curves are these values adjusted by the demographic differences of each country.

It seems, then, that the high mortality in Italy is just demographics. Pay attention to the fact that these are cumulative statistics and, therefore, even if the situation improves massively as in Hubei, the mortality remains high because historically it was high. So far, it looks like only in Hubei and Italy did the outbreaks reach the point of fully overwhelming the health systems. However, check the drift of the Italian curve: that is what might (hopefully not) happen in other countries that are on a similar trajectory.

Keep in mind, I am no expert. I think, however, that there are two possibilities that explain this, and probably they coexist. First, when ICUs are overwhelmed, we rescue fewer people. Second, when a country is overwhelmed, there might also be less testing. So, plenty of limitations in this data (mortality rate data are not great, I am no expert, and several factors might explain the trends)… but at least there is some pattern that might indicate what is happening.

To conclude. Every country can still do what Hubei did. Not my work, but WHO’s. We need to protect the most vulnerable waiting for the vaccines and drugs that WILL come. Take care and find ways to keep positive and help people around you!

Signor Tenente (a song against mafia)

My holidays are spent with my nose in papers and my hands on the computer keyboard, working on a quinquennial report. But I am back with my family in Italy, specifically in Sanremo, city of flowers, city of music, as it used to be the largest flower market and an important flower production center, and it hosts the most followed music festival in Italy. It is therefore not that surprising to walk in the streets and hear music in the festive periods and in summer. Today, I took a break from work and went with my family to the main piazza of the town, where a group was singing various songs that competed in the Sanremo Festival in the past.

The time came for “Signor Tenente” by Giorgio Faletti (1994), a song that was acclaimed by the critics and came second in the competition. A song that is musically flat, with simple lyrics, spoken rather than sung. A song that I had forgotten, but that is linked to an event that I will never forget and that changed me and many others in Italy, even very far from where it happened.

In 1992, the prosecutor Giovanni Falcone was killed together with his wife Francesca Morvillo and three police officers in his security detail, Rocco Dicillo, Antonio Montinaro and Vito Schifani, when ‘Cosa Nostra’ blew up a segment of a motorway to kill its most feared enemy. Two months later, his friend and colleague Paolo Borsellino was killed with five police officers, Agostino Catalano, Walter Cosina, Emanuela Loi, Vincenzo Li Muli and Claudio Traina, by a car bomb while visiting his mother. Sanremo is a sea away from Sicily, but in that tragic year we all felt Sicilian, enraged at organized crime and close to the prosecutors, judges and police forces, left alone by a political system that was about to be decimated by corruption scandals and that was in disarray.

“Signor Tenente” narrates that period from the point of view of the police (specifically Carabinieri) who, poorly paid and often in danger, do their duty while bombs kill.

These events might be difficult to understand outside Italy, or perhaps by the generation after mine. However, I wish to share with you, my friends, the feeling of pride I felt when, after the rendition of “Signor Tenente” finished, the square burst into heartfelt applause, the warmest of the evening.

This is just a reminder that, in any country, most people are honest and good. There is a time to criticize any authority, but there is also a time to simply thank the police forces, the prosecutors, the justice system, and the people who, in Italy and anywhere in the world, fight injustice at great personal danger.