Coronavirus – data mining (v2)

WARNING. I am not a medical doctor nor an epidemiologist. The analysis I am sharing here is only for curious data geeks. Please follow the advice of your national authorities and health system.

NOTE: For a more comprehensive blog post, you might be interested in Tomas Pueyo’s website. A good discussion about mortality rates can be found at the CEBM website.

The trends

I have run a comparison between selected countries. I use Hubei (the Chinese province where Wuhan is located, in red) and Italy (in green) as references. The top plot in each figure shows confirmed cases and the middle plot shows covid-related deaths. Check the appendix for a discussion on the bottom plots (mortality rates). The JH data starts with Hubei on the day that the province went into lockdown. I will often comment on the comparison between the UK and Italy as I am a British-Italian dual-national and I started to follow the data to understand how to adapt to the situation.

Day 1 in these plots is relative to the start of Hubei tracking, which coincides with the day of Wuhan lockdown (red vertical line). The vertical green lines represent two key moments of the Italian response. The first is the local lockdown of towns where outbreaks in Northern Italy occurred, the second is the national lockdown. The vertical blue lines are key moments of the British response. The first one marks the day when the PM announced abandoning the containment strategy to pursue ‘herd immunity’, the second line marks the day when a UK nation-wide lockdown was implemented.

The comparison between Hubei, Italy and the UK is interesting as these are regions with similar total populations (~60M). Imported cases were identified at a similar time in both the UK and Italy, but the two countries then followed quite different trajectories. Until day 35, the UK seems capable of tracking and isolating cases, after which local outbreaks become evident from the start of a steep exponential (linear on this log scale) growth. Italy, instead, experienced a sudden (apparent) outbreak at day 30. The difference between Italy and the UK at this stage is that Italy shut flights from China (contrary to WHO indications) and kept testing only people with a declared travel history from China. This allowed the coronavirus to spread undetected, particularly in hospitals that did not trigger emergency procedures in time. It seems likely that the coronavirus was imported from Germany by people returning from China through Germany. However, it also seems likely that the spread was boosted by contacts between Germans and Italians in strongly productive regions, completely bypassing the Italian monitoring strategy, which was too focused on China. From the trends of detected cases in Italy, one could extrapolate that the local epidemic started immediately, and undetected, as soon as the first imported cases were identified. Good British monitoring of the epidemic gave the UK two weeks of breathing space to prepare for the epidemic.

In my previous post, I mentioned the puzzle of Germany exhibiting low mortality and some reports that Germany counts covid-related deaths differently from other countries. Differences in counting are probably real, but I am now convinced that too much emphasis on this point is inadvisable. My impression is that much of the talk around it spreads misinformation aimed at constructing a conspiracy theory that some countries hide covid cases to protect their economies. Personally, I wish someone would clarify how deaths are counted, but at this point in time I am inclined to treat the data as more realistic than some people assume. Probably, Germany simply did a better job than the UK and Italy and delayed its local epidemic by about three weeks compared to them.

We can browse the plots of different western European countries and notice similar trends, with different delays caused by the probability of importing cases and the capability of each country to track and contain imported cases. Amongst the countries I checked, Spain is the outlier, with a steep rise of covid-related cases that has been broadly described in the media and not yet fully explained.

In South East Asia (I checked China, Singapore, South Korea and Japan), the trends are very different from Europe. We can argue that different political and societal systems permitted a better response. In general, we can state that they responded fast and proactively. You might notice second waves of infections. Japan, Singapore and South Korea did not enforce national lockdowns but are containing the epidemics with careful tracking and strong mitigation measures. This is, in essence, what the UK wanted to do. However, the UK, in my opinion, acted too late: by the time it announced the change from containment to mitigation, it was too late both to adopt a balanced strategy between public health and the economy (the South East Asian model) and to fully contain the disease with low casualties. Therefore, western European countries will likely follow the Chinese trend, full containment followed – in due time – by the South East Asian response, unless a cure or vaccine is ready early.

Singapore is the outlier, however. We should consider that Singapore has only about 10% of the inhabitants of Italy or Hubei (Japan about twice as many, South Korea a similar number); it is, in fact, a high-density city-state. The death rate is very low… I wonder how the situation will evolve and how the data is collected. Back to Europe, we should keep an eye on Sweden as – if I understood correctly – the Swedish government did what the UK wanted to do, i.e. mild mitigation measures.

Let’s conclude this overview with the USA. We should keep in mind that in any country, the statistics are the sum of multiple local outbreaks. The USA has >300M inhabitants split across 50 states. For the time being, I just check the overall trends. And the trends are simply alarming. Of course, I neglected many countries, but you can freely use/adapt my Matlab code on GitHub to draw your own conclusions.

Next…. let’s synchronize the curves.

Contrary to my previous blog post, I am now synchronizing the curves on death statistics, as they are more realistic; confirmed covid cases depend heavily on the capability of health systems to run tests across the non-hospitalized population. I used an arbitrary number of deaths, 40, sufficiently large to provide a robust synchronization but sufficiently low to precede the containment actions of governments, which would, of course, change the shape of the curves. Most South East Asian countries are not shown as they are containing the epidemics at the moment. With this data, you can now appreciate that the epidemic is very similar across European countries, with the caveat that any type of synchronization of data is wrong, certainly if not done with proper models and a study of local outbreaks. However, the general conclusions might not change.
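To make the idea concrete, here is a minimal Python sketch of this kind of synchronization (my actual analysis is in the Matlab code on GitHub; this is just an illustration): shift each country’s cumulative death curve so that day 0 is the first day the chosen threshold is exceeded.

```python
import pandas as pd

def synchronize(deaths: pd.DataFrame, threshold: int = 40) -> dict:
    """deaths: one column per country, one row per calendar day (cumulative counts).
    Returns each country's series re-indexed in days since the threshold was passed."""
    aligned = {}
    for country in deaths.columns:
        series = deaths[country]
        over = series[series >= threshold]
        if over.empty:
            continue  # this country has not yet passed the threshold
        shifted = series.loc[over.index[0]:].reset_index(drop=True)
        shifted.index.name = "days_since_threshold"
        aligned[country] = shifted
    return aligned
```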

Rolling back to the UK, Italy and Hubei, you can see that the British government announced the abandonment of the containment phase exactly when Hubei did the opposite (in relative time). After initial errors, Italy acted fast but had to play a chasing game with the outbreak. Their strategy was not working and Italy had to enforce a nation-wide lockdown that started about 10 days later than Hubei's. A nation-wide lockdown might not have been necessary, but the lack of compliance of the general population, some doctors insisting that coronavirus was no worse than the flu, and some political parties trying to score points rather than supporting a common strategy eventually required a national lockdown that is now finally having the desired effects. We can still be concerned, however, about possible second waves of the epidemic, also in the short term, because of waves of internal migration from North to South, people running away from the epicentre of the Italian outbreak, although the current lockdown might be able to quench new outbreaks. We can just wait and see for now. However, that delay and initial lack of cohesion will result in several thousand preventable deaths.

In the UK, the first stage of the epidemic is still unfolding. It is interesting to observe that about a week after the British government decided not to take action, aiming instead for ‘herd immunity’, the increase in deaths deviated from the original trend, slowing down. On the ground, we noticed how a large majority of people, seeing what was happening in Italy, took action independently of government advice. Sports organizations cancelled events, universities started to close, people increasingly worked from home and several families started to withdraw children from schools. In effect, the British population started social distancing measures ahead of the Government, which eventually triggered a national lockdown at approximately the same (relative) time as Italy.

One issue that has been publicized in Italy but not in the UK is what I referred to as internal migration. The reason is simple. Paradoxically, in both countries this happened because people did not follow advice. In Italy, the Government asked people not to travel from the afflicted regions. People did not follow the advice and, when the government was preparing the regional lockdown, an opposition party leaked the measure to the press. People panicked and travelled to their holiday homes, and university students went back to their families, many travelling from North to South. In the UK, the Government did not ask universities to close, but universities did close and students were invited to go back to their families. This advice was sensible, as we do not want students trapped in student accommodation. However, this happened without control and without a good sense of the status of the local epidemics. Therefore, while not broadly reported by the media because of how it happened, this large movement of people might contribute to the future dynamics of the UK epidemic, and possibly of other countries as well.

It is just my opinion, not a scientific fact for now, that the UK wasted an incredible amount of time. While the NHS and the Government might have done everything right initially, all those efforts were squandered. What I argue here is that no strategy is a good strategy in these circumstances. Coronavirus will kill a lot of people and cause damage to the economy. Adapting a strategy is also important; new information might require a new strategy. However, contrary to the story depicted by the Government, there was no change in the science of the epidemic. A U-turn in policy can be costly, as it might result in none of the positive outcomes desired with either one strategy or the other.

The only hope is that those countries that were able to delay the onset of local epidemics compared to Italy and Iran, either by chance (low exposure) or by good management, were able to scale up contingency plans and the availability of specialist ICU beds to keep the final numbers of deaths as low as possible. Why this was not arranged at the onset of the epidemic is something we will have to analyse in the future.

Soon, we will speak a lot about the USA. China has four times the population of the USA. China was able to contain the disease (for now) and to avoid widespread diffusion across all of China. Outbreak after outbreak, the trends from the USA are accelerating and already overshooting both Hubei and China overall, with no sign of the epidemic slowing down.

Concluding remarks

In the next sections, I describe the methods I used and I provide a discussion about mortality rates. However, I wish to conclude my post here stating that this might be the last time I post graphs, as by now there is so much data around, including from people with a background in epidemiology. However critical I might be of certain political decisions, I would like to be very clear on the following. The individual risk to people is comparatively low. The large majority of the population has more to fear from the socioeconomic repercussions of the pandemic than from the disease in itself. The socioeconomic impacts of nationwide lockdowns will impact people’s health. Mental health will deteriorate. Health systems will be weakened and therefore more people will suffer even if not infected. Governments all over the world are trying to guess what is best to do. My advice is to follow guidance from your national medical and governmental authorities. It might seem I am contradicting myself. I am not. I simply acknowledge there is no good solution to the problem. However, governments should speak the truth and clarify why they take certain decisions, at least in democracies. Governments should also work together, not against each other. Some of us feared a big war was brewing after the 2008 financial crisis, but we were not expecting a pandemic. Now that we have got the pandemic, I hope we do not get both, and that this situation brings all peoples of all nations together. At the moment there are both good and bad signs. In Italy, we have a saying: “La speranza e’ l’ultima a morire”. Hope is the last to die. And we hope, we hope for more rainbows at the windows and fewer clouds on the horizon.

Take care, my friends.

The data | The data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering is available on GitHub.  To adjust mortality rates by local demographics, I have downloaded the population pyramid data from www.populationpyramid.net. There are several estimates for age-dependent mortality. I was able to find only the following pre-print for mortality in Hubei compared to the rest of China. The dataset analysed was small, let me know if you find something better.

The software | In my spare time, I prepared a bit of Matlab code that can import the JH data and does just two simple things: compare trends between different countries and compare age-adjusted mortalities. The code is available on GitHub. Keep in mind, sorry to repeat, this is just for the curious geeks.
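For readers who prefer Python, a minimal sketch of the import step might look like the following; note that the raw-file path and column names are my reading of the current layout of the JH repository and may change over time.

```python
import pandas as pd

URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
       "csse_covid_19_data/csse_covid_19_time_series/"
       "time_series_covid19_deaths_global.csv")

def load_deaths_by_country() -> pd.DataFrame:
    raw = pd.read_csv(URL)
    # Drop the coordinates, sum provinces into one row per country,
    # and transpose so that rows are dates and columns are countries.
    by_country = (raw.drop(columns=["Lat", "Long"])
                     .groupby("Country/Region").sum(numeric_only=True).T)
    by_country.index = pd.to_datetime(by_country.index)
    return by_country
```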

Appendix – Mortality rates.

Let’s now discuss, briefly, mortality rates. This is a very complicated issue, not just for non-experts like me. We will have better estimates only much later. At the moment, there is no evidence for the existence of multiple strains of SARS-CoV-2 exhibiting different aggressiveness. There are of course different strains, as viruses constantly mutate, but the idea that some countries are more affected than others because of different strains seems – at the moment – to be just a way to justify their own shortcomings. Of course, we will understand this later.

Let’s consider Washington state (not shown here), which showed a very high apparent mortality. This was the result of very little testing of the general population and a spike of deaths of elderly people arising from outbreaks in retirement homes. New York is at the opposite end of the scale, as it was a more ‘standard’ outbreak. We noticed mortality rates going from very high to very low and bouncing back. These are all artefacts of sampling.

I re-propose here some graphs comparing demographics. First, the age-adjusted mortality. The blue bar in China at ~1% is the mortality rate in China outside Hubei. This, in my opinion, is a good measure, backed up by epidemiological studies. The WHO is setting this number at ~3.5% because, at the moment, they are simply dividing confirmed deaths by confirmed cases, a number likely to be an overestimate.

The blue bars for the other countries are adjusted to the demographics of the individual regions. Italy has an older population compared to most other countries and, therefore, higher mortality is to be expected. There are also reports suggesting that men are more likely to die than women. Therefore, in the middle graph, I compare the demographics of Italy, the UK and Germany to China. The red bars in the first graph are the mortality rates inferred from Hubei, thus in a situation where the health system is overwhelmed. As you can see, in Italy we can expect apparent mortality rates of almost 10%. Indeed, at the moment this value has been exceeded. However, here the keyword is ‘apparent’.
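For the curious, the adjustment itself is just a weighted average: age-specific mortality rates are weighted by the fraction of the population in each age band. A toy Python sketch (the rates and pyramids below are illustrative placeholders, not the values from the pre-print) might look like this:

```python
import numpy as np

# Age bands: 0-9, 10-19, ..., 70-79, 80+ (rates in %, illustrative only)
age_specific_mortality = np.array([0.0, 0.2, 0.2, 0.2, 0.4, 1.3, 3.6, 8.0, 14.8]) / 100

def adjusted_mortality(population_by_age: np.ndarray) -> float:
    """population_by_age: head counts (or fractions) per age band, same bands as above.
    Returns the expected overall mortality rate if all age groups were infected equally."""
    weights = population_by_age / population_by_age.sum()
    return float(np.sum(weights * age_specific_mortality))

# Made-up pyramids (fractions per band) for a 'younger' and an 'older' country.
younger = np.array([12, 12, 13, 14, 13, 12, 10, 8, 6], dtype=float)
older   = np.array([8, 9, 10, 12, 13, 14, 14, 12, 8], dtype=float)
print(adjusted_mortality(younger), adjusted_mortality(older))  # the older pyramid gives a higher rate
```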

Therefore, while I might still discuss COVID in this blog, I will probably stop speaking about mortality rates. Eventually, these numbers could be very low. Perhaps we might have ‘just’ 0.5% of the population at risk of death, maybe 1% in countries with older demographics. However, keep in mind that, as the WHO has always said, the issue is the overwhelming of the health systems. The UK, Hubei and Italy have around 60M inhabitants each. This means that a ‘do-nothing’ policy would result in 300-600k deaths in each country (40M world-wide), or half of that if some sort of herd immunity protected us. To put this in perspective, ~500-600k people die in the UK every year (60M world-wide).

Coronavirus – data mining

WARNING. I am not a medical doctor nor an epidemiologist. The analysis I am sharing here is only for curious data geeks. Please follow the advice of your national authorities and health system.
NOTE. This post was updated on 15/3.

The data | The data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering is available on GitHub.  To adjust mortality rates by local demographics, I have downloaded the population pyramid data from www.populationpyramid.net. There are several estimates for age-dependent mortality. I was able to find only the following pre-print for mortality in Hubei compared to the rest of China. The dataset analysed was small, let me know if you find something better.

The software | In my spare time, I prepared a bit of Matlab code that can import the JH data and does just two simple things: compare trends between different countries and compare age-adjusted mortalities. The code is available on GitHub. Keep in mind, sorry to repeat, this is just for the curious geeks.

Trends

I summarize a few countries I checked. As I am Anglo-Italian, and as Italy and the UK have adopted very different strategies to fight coronavirus, I developed this code to check trends between the UK, Italy and Hubei. It is interesting that these three territories have similar population sizes but, until now, have experienced the epidemic in different ways. Hubei was caught off-guard because it is the origin of the epidemic. Italy, together with South Korea and Iran, was caught off-guard because they thought the coronavirus was somehow under control. The UK might have made fewer mistakes so far and controlled the spread of the virus better, and it has decided not to further contain the epidemic, against WHO advice. Let’s see.

I synched the curves to a number of confirmed cases equal to 400. By chance, this is about the number of cases on the first day we have data from Hubei, the day when Hubei went into lockdown, and a similar number to when the UK decided not to contain the virus.

Italy, just before and just after that point, introduced first a local lockdown and then a national lockdown. Italy and Hubei seem to be on a similar trajectory of confirmed cases. For the UK it is too early to say. We should keep in mind that confirmed cases depend on the methodologies of testing. Hubei’s and Italy’s health systems got overwhelmed; therefore it is possible that, at a certain point, they struggled to test the general population. The UK has decided to stop screening the general population. Therefore, the reported deaths might be more realistic as numbers. At the time of writing, the JH dataset is one day behind, but we know that the UK is now in line with the other curves, and Italy is overshooting Hubei’s trajectory. Mortality rates are heavily affected by the reporting of confirmed cases. We will know the actual mortality rates only after epidemiologists are able to do their statistical work retrospectively. More on this at the end of this post.

What about other countries? South Korea is interesting as they did not go into lockdown even though they had a major outbreak. They were able to contain it by tracking those infected.

Assuming that Korea counted all covid-related deaths, their strategy was rewarded with successful containment and fewer deaths than other regions. Spain seems to be the EU country that will struggle next, let’s see…

Unfortunately, it seems that Spain is on the same trajectory as Italy and Hubei. But remember, Hubei succeeded in containing the outbreak, which gives hope. This, together with the experience in Korea, is why the WHO is still recommending attempting to contain the virus.

The same is true for France.

What about Germany?

For confirmed cases, Germany looks to be on a similar trajectory. However, unless I made a mistake, the mortality rate seems very low. There are reports in the news that Germany counts as covid-related deaths only those patients who did not have other important related pathologies.

This, of course, would completely bias the curves we presented, but the situation in Germany might not be different from other countries. We’ll understand this in the future. Now a few comments on mortality rates. Initially, many of us were puzzled by the differences in mortality between countries. There are several factors that influence these statistics: i) confirmed cases are underestimated in different ways in different countries because of testing capacity or policy; ii) covid-related deaths seem to be counted similarly in many countries, except for Germany; iii) different countries have different demographics; and iv) when a health system is strained, mortality might increase and confirmed cases might decrease. All this considered, I just thought to give a reference for demographic adjustments.

I used mortality figures from Hubei and the rest of China to estimate the worst- and best-case scenarios, for an overwhelmed and a coping health system respectively. The red and blue curves are these values adjusted by the demographic differences of each country.

It then seems that the high mortality in Italy is largely demographics. Keep in mind that these are cumulative statistics and, therefore, even if the situation improves massively, as in Hubei, the mortality remains high because historically it was high. Thus, so far it looks like only in Hubei and Italy did the outbreaks reach the point of fully overwhelming the health systems. However, check the drift of the Italian curve: that is what might (hopefully not) happen in other countries that are on a similar trajectory.

Keep in mind, I am no expert. I think, however, that there are two possibilities that explain this, and probably they coexist. First, when ICUs are overwhelmed, we rescue fewer people. Second, when a country is overwhelmed, there might also be less testing. So, there are plenty of limitations in this data (mortality rate data are not great, I am no expert, and several factors might explain the trends)… but at least there is some pattern that might indicate what is happening.

To conclude: every country can still do what Hubei did. This is not my claim, but the WHO’s. We need to protect the most vulnerable while waiting for the vaccines and drugs that WILL come. Take care and find ways to keep positive and help people around you!

Publishing: a business transaction

Until not so long ago, desk-rejections (the editor’s decision not to proceed with peer-review of a submitted manuscript), or even rejections of a manuscript after peer-review with very little substance behind the decision, could get me angry, at least in private. These emotions can motivate us to do better but, most of the time – if we try too hard to get published in very selective journals – they can take a toll.

After speaking to several editors, I tried to focus on the fact that most of us (editors, authors, referees – sometimes the same people wearing different hats) are good and well-motivated people. That did not work. The sense of unfairness outranks that thought.

I tried to not care, and that did not work either. Until…

***
I believe that the large majority of scientists and editors do their job also out of a clear vocation, to advance human knowledge for the benefit of society. For this reason, we often invest a lot more in our jobs than we should, emotionally and time-wise. This is why it might be difficult to have a detached view of what publishing is nowadays. Let’s make an effort together to watch the problem as a scientific one: analyse it, reducing its complexity to its components and mechanisms.

***
If you have a donkey and you want to sell the donkey, you go to the market. You might first go to a trader who pays very well as they have good contacts with wealthy farmers. However, they may not like your donkey even if you dropped the price. If they do not like your donkey, why do you want to sell your donkey to them? Then you go to a different trader, they like your donkey and you settle for a fair price. But if your donkey is very old, you might come back to your farm with your old donkey. Perhaps you need the money and you get frustrated, maybe even angry, but what is the point? Business is business and the trader is simply doing their job.

Wait… what? D-D-D-Donkeys?

***
When you submit a paper to a journal, you try to initiate a business transaction. The editor is an expert trader, highly invested in their business and committed to keeping their operations, legitimately, financially sustainable and profitable. The author trades in two commodities, their manuscript and their reputation, and – additionally – pays a sum of money for the service. In return, the editor provides two commodities, their readership and their reputation, and – additionally – provides editorial services. I will perhaps elaborate in the future on the traded commodities and services, but for now I keep this post to the bare essentials.

The editor-trader first judges the quality of the product you want to trade in. They are entitled to act at their discretion, applying their in-depth knowledge of their business to assess whether they are about to initiate a potentially good deal. Can your donkey carry weight? Er, I mean, can your paper attract many citations and media coverage? If they do not want to do business with you, it is not a matter of fairness, not even of science, and certainly nothing personal. It is the author’s responsibility to make their business pitch, and it is the editor’s responsibility not to lose good assets or invest in bad ones.

***
If I had read what I have just written ten years ago, I would have recoiled in disgust. I therefore expect many scientists to be horrified by what I have written, and perhaps editors to be offended. I hope this is not the case, but if it happens, please let me clarify one point.

We (authors and editors) do what we do to advance human knowledge for the benefit of society. Boiling everything down to a mere business transaction perhaps feels bad. However, let’s keep in mind that scientific publishing is a business. Whether it has to be or not is the subject of a different post, and of an analysis of the nature of the commodities and services we trade.

For now, I just wished to share with you the trick I use to cope with the stress of rejections, particularly desk-rejections. That part of our job is just a business transaction. This thought helps me a bit more than anything else I tried before.

Lost in translation (dogma and science)

Once in a while I hear or read about dogmas as if they were models. I came to realize that some people might not be aware of what a dogma is and, before the (mis)use of this word spreads even further, I hope you will agree to bring it back to its original meaning.

The Oxford dictionaries define dogma as “A belief or set of beliefs held by a group or organization that others are expected to accept without argument”. Other dictionaries report similar definitions, but the Merriam-Webster also includes the slightly softer “Something held as an established opinion, especially a definite authoritative tenet”. Many dictionaries also refer to religious doctrines. Therefore, dogma can’t be used as a synonym of model or hypothesis, particularly in science. Of course, most people still use the word dogma correctly even in science, to refer to a model that has become established fact despite no, weak, or even erroneously interpreted evidence for it.

I suspect that most of the damage was caused by Francis Crick when he introduced the “Central Dogma” of molecular biology. Let’s be clear, I do not want to be pedantic and I care very little about semantics, but the correct use of the words dogma, hypothesis, model and theory is rather important in science. There are instances when these four words might be interchanged, but we should – I hope – all agree that dogma is to be used only with a negative connotation (in science).

I assume you know what the central dogma is but, if you do not, the Wikipedia page is good enough to get an understanding. In lectures during the late 50s, Francis Crick stated that “Once information has got into a protein it can’t get out again” and named this statement “The Central Dogma”. Apparently the name was a bit of a joke, as appears evident from the famous document stored by the Wellcome Library. The initial paragraph was entitled “The Doctrine of the Triad”, a clear reference to DNA, RNA and proteins with a rather obvious analogy to the Christian doctrine of the Trinity.

I must admit I did not read Crick’s autobiography, but it is well known that there, he writes that “I called this idea the central dogma, for two reasons, I suspect. I had already used the obvious word hypothesis in the sequence hypothesis, and in addition I wanted to suggest that this new assumption was more central and more powerful.” and “As it turned out, the use of the word dogma caused almost more trouble than it was worth. Many years later Jacques Monod pointed out to me that I did not appear to understand the correct use of the word dogma, which is a belief that cannot be doubted. I did apprehend this in a vague sort of way but since I thought that all religious beliefs were without foundation, I used the word the way I myself thought about it, not as most of the world does, and simply applied it to a grand hypothesis that, however plausible, had little direct experimental support.”

Then, I asked a friend who lived through those times whether the word dogma was perhaps used slightly differently in the past, and I got this brilliant response: “A dogma in science is a fanatic intrusion into rational thought. When a big name in science makes a joke, accolades of small names taking it seriously are sure to follow… A model becoming a dogma is ready for the bin. Never had a dogma crossing my path.”

Well, nowadays I do see dogmas crossing my path but never mind, that is a different story. For the young students who might read about the “central dogma” in textbooks and then adopt the term “dogma” as equivalent to model or hypothesis, just two suggestions.

First, a scientist should always be skeptical and doubt everything. It is unavoidable that sets of established facts, sometimes even wrong ones, become generally accepted in a discipline and crystallize into a real dogma that no one challenges. However, it is our duty to challenge any interpretation, any model, whenever it conflicts with evidence.

Second, let’s reserve the word dogma (in science) to critically identify established beliefs with insufficient or contradictory experimental evidence, or perhaps for jokes…

Ironically, the “Central Dogma” was a very good hypothesis.

It is yellow, the two proteins must interact!

In fluorescence microscopy, colocalization is the spatial correlation between two different fluorescent labels. Often, we tag two proteins in a cell with distinct fluorescent labels, and we look at whether and where the staining localizes. When there is a “significant overlap” between the two signals we say that the two molecules “colocalize” and we might use this observation as possible evidence for a “functional association”. We might argue that measuring colocalization is one of the simplest quantifications we can do in microscopy. Yet, many horror stories surround colocalization measurements. This post is not a review of how to do colocalization, but a brief casual discussion about a few common controversies that is – as I often do – aimed at junior scientists.

[Figure: colocalization slide]
This is a slide I often use in a presentation to introduce FRET but useful to understand colocalization. You can see the average size of a globular protein, fused to a fluorescent protein compared to the typical resolution of diffraction-limited and super-resolving fluorescence microscopy. When the signals from two molecules are within the same pixel, these two molecules can be really far apart from each other. However, the spatial correlation of distinct labelling can inform us about possible functional associations.

***

“I am imaging GFP, but the image is blue, can you help me?”. Well, this is not a question related to colocalization but it illustrates a fundamental issue. In truth, cell biology is such an inherently multidisciplinary science that – in most cases – a researcher might require the use of tens of different techniques on a weekly basis. It is thus not surprising that many researchers (I dare say most) will be experts in some of the techniques they use but not in all of them. Microscopy is particularly tricky. To be a true expert, you need to handle a wealth of physical, engineering and mathematical knowledge alongside experimental techniques that might span chemistry, cell culture and genetic engineering. However, the wonderful commercial systems we have available permit us to get a pretty picture of a cell with just the click of a button. Here is the tricky bit: you want to study a cell, you get a picture of a cell. One is led to confuse the quantity one intends to measure with the information one is actually gathering, and with its representation. This is true for any analytical technique but, as ‘seeing is believing’, imaging might misrepresent scientific truth in very convincing ways. Hence, while there is no doubt that, upon reflection, the non-expert user would have understood why the picture on the screen was ‘blue’, the initial temptation was to believe the picture.

Question what you set out to measure, what the assay you have set up is actually measuring and what the representation is showing. Trivial? Not really. It is an exercise we explicitly do in my lab when we have difficulties interpreting data.

***

“It is yellow, they colocalize, right?”. Weeeeeeeeellll… maybe, maybe not. Most of you will be familiar with this case. Often researchers acquire two images of the same sample, the pictures of two fluorescent labels; one is then represented in green and the other in red. With an overlay of the red and green channels, pixels that are bright in both colours will appear yellow. I would not say that this approach is inherently flawed but we can certainly state that it is misused most of the time and, therefore, I try to discourage its use. One issue is that colour-blindness, not as rare as people think, renders this representation impractical for many colleagues (so much for my colour highlights!), but even people with perfect vision will see colours with lower contrast than grey-scale representations, and green more so than red. In the end, to ‘see yellow’ it is almost unavoidable to boost the brightness of the underlying two colours to make the colocalization signal visible. This can be done either during the acquisition of the image, often saturating the signal (bad: saturated pixels carry very little and often misleading information), or during post-processing (not necessarily bad, if declared and properly done). Either way, by the time you are doing this, your goal of being quantitative has probably been missed. The truth is that a lot of biological work is non-quantitative, but faux-quantitative representations or statistics are demanded by the broader community even when unnecessary. Let’s consider one example with one of the stains being tubulin and the other a protein of interest (PoI). Let’s assume the PoI localizes at nicely distinguishable microtubules in a few independent experiments. Once the specificity of the stain is confirmed, the PoI can be considered localized at the microtubules (within the limitations of the assay performed) without the need for statistics or overlays. Unfortunately, it is not very rare to see papers, even after peer-review, showing a diffuse staining of at least one PoI, perhaps a more localised staining of the second PoI, and a ‘yellow’ signal emerging from an overlay that is considered colocalization, instead of what it is: just noise. Another common issue is localization in vesicles. Again, any cytoplasmic PoI would appear to colocalize with most organelles and structures within the cytoplasm with diffraction-limited techniques. Sometimes punctate stainings might partially overlap with known, properly marked vesicles, let’s say lysosomes, but not all of them. Then the issue is to prove that, at least, the overlap is not random and, therefore, statistics in the form of correlation coefficients are necessary.
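For reference, the simplest of these statistics is Pearson’s correlation between the two channels within a region of interest. A minimal Python sketch is below; thresholding, background subtraction and significance testing (e.g. Costes’ randomization) are deliberately left out, and dedicated tools should be preferred for real analyses.

```python
from typing import Optional
import numpy as np

def pearson_colocalization(ch1: np.ndarray, ch2: np.ndarray,
                           mask: Optional[np.ndarray] = None) -> float:
    """ch1, ch2: images of the two labels (same shape). mask: boolean region of interest."""
    if mask is None:
        mask = np.ones(ch1.shape, dtype=bool)
    a = ch1[mask].astype(float)
    b = ch2[mask].astype(float)
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```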

***

“The two proteins do not colocalise, two molecules cannot occupy the same volume.” Really!? Well, from a quantum mechanics standpoint… No, do not worry, I am not going there. I have received that criticism during peer-review in the past and, until recently, I thought it was a one-off case. However, I have recently realised that I was not the only person reading that statement. I am really uncertain why a colleague would feel the need to make such an obvious statement, except for that condescending one-third of the community. I should clarify that, to my knowledge, no one implies physical impossibilities with the term colocalization. That statement is perfectly ok in a casual discussion or to make a point when teaching beginners the basics. Some of us might also enjoy discussing definitions, philosophical aspects related to science, or controversial (real or perceived) aspects of techniques, but better at a conference or in front of a beer rather than during peer-review. The issue here is that, while it is reasonable to criticise certain sloppy and not too uncommon colocalization studies, in general colocalization can be informative when properly done.

***

“So, is measuring colocalization useful?” Homework. Replace ‘colocalization’ with your preferred technique. Done? Now try to make the same positive effort for colocalization. Every technique is useful when used properly.

You might have noticed I marked some words in my introduction: colocalize, significant overlap and functional association. It is important we understand what we mean by those words. Colocalization means co-occurrence at the same structure, a non-trivial correlation between the localization of two molecules of interest, within the limits defined by the resolution of the instrumentation. The “significant overlap” should really be replaced by “non-trivial correlation”. Non-trivial, as diffuse stainings, unspecific stainings and saturated images can very easily result in meaningless colocalization of the signals but not of the molecules of interest. Correlation, as the concept of overlap might be improper in certain assays, for instance in some studies based on super-resolution microscopy. Even after we have done everything properly, we still cannot say that if protein A and protein B colocalize they interact (see slide). However, we can use colocalization to disprove the direct interaction of two proteins (if they are not in the same place, they do not interact) and we can use high-quality colocalization data to suggest a possible functional association that might not be a direct interaction, and that should then be proven with additional functional assays.

Then, my friends, do make good use of colocalization as one of the many tools you have in your laboratory toolbox, but beware: just because it is simple to acquire two colourful pretty pictures, there are many common errors that people make when acquiring, analysing and interpreting colocalization data.

 

P.S.: if I cited your question or statement, please do not take it personally. As I have written, not everyone can be an expert in everything and the discussion between experts and non-experts is very useful, hence the real-life, anonymous examples.

Is the average between a cat and a dog a real animal?

[Image: dog. Credit: Pixabay License, free for commercial use, no attribution required.]

Is it a cat? Is it a dog? Is the average between a cat and a dog a real thing, perhaps a caog or a doat?

Not all science should be based on single-cell detection, and there are plenty of cases where single-cell measurements are superfluous. However, too often we fail to appreciate the huge mistakes we can make in biology when we forget the assumptions we make when using population measurements.

But which assumptions do we really make?

Often implicitly, when doing population measurements (e.g., Western blots, sequencing, proteomics, etc…) we assume that the populations of cells we measure are homogeneous and synchronous. Or at least we assume that these differences are unimportant and that they can be averaged out. In the best cases, we try to enforce a degree of synchronicity and homogeneity experimentally. In reality, one of the most important assumptions we implicitly make is that the system we analyse is an ergodic system. In physics and statistics, an ergodic system is a system that, given a sufficiently long time, explores all its possible states. It is also a system where – if sufficiently sampled – all its states are explored and, consequently, averages over time on a single cell and averages over a population at a given time are the same. However, there are limits to this assumption in biology. The obvious example is the cell cycle. There is significant literature about ergodicity and the cell cycle [e.g., 1, 2, 3] and how this principle can be exploited, but…

The lottery for cell division makes you grow faster.

There is a particular phenomenon that we encountered while working on this project [4] that fascinated me for its simplicity and its consequences. How can cells increase their fitness (i.e. their growth rate)? One obvious answer is by dividing faster. Another, at first glance less obvious, answer is by exhibiting a heterogeneous cell cycle length. Let’s consider a population of cells that divides every 24 hours. Over one week, these cells will grow to 128 times the original population size. Now, let’s consider cells that divide on average every 24 hours but exhibit random variation in cell cycle length, with a standard deviation of 4 hours and a normal distribution. Cells with a 20-hour or a 28-hour cell cycle are equally probable. However, in one week, cells with a 28-hour cell cycle will grow 64-fold and cells with a 20-hour cell cycle will grow about 340-fold. On average, these cells will grow ~200-fold, that is, much faster than cells dividing precisely every 24 hours (128-fold). This is true for any pair drawn at equal distances from the two sides of the average; these pairs are equiprobable, thus cells dividing at a given average cell cycle length grow faster with increasing heterogeneity. Let’s remember that this can occur not just in the presence of genetic differences, but even just through stochastic variation, where the progeny of one cell will not keep the same cell cycle length but will keep randomly changing according to an underlying distribution. This is a phenomenon that has been observed experimentally, for instance in yeast [5], with single-cell measurements, but it occurs in any cellular system as described in [1] and our own work [4]. Population measurements might conceal these very important phenotypic or mechanistic differences.
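A quick numerical check of this argument, in a short Python sketch with illustrative numbers (a truncated normal distribution of cycle lengths), reproduces the figures above:

```python
import numpy as np

def fold(cycle):
    """Fold-growth over one week (168 h) for a given cell cycle length in hours."""
    return 2.0 ** (168.0 / cycle)

print(fold(24.0))                       # 128
print(fold(20.0), fold(28.0))           # ~338 and 64
print(0.5 * (fold(20.0) + fold(28.0)))  # ~201, i.e. faster than 128

# The same holds for a whole normal distribution of cycle lengths (sd = 4 h):
rng = np.random.default_rng(0)
cycles = rng.normal(24.0, 4.0, size=100_000)
cycles = cycles[cycles > 1.0]           # discard non-physical values
print(fold(cycles).mean())              # > 128, since 2**(T/x) is convex in x (Jensen's inequality)
```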

A mixture of two normal distributions is not another normal distribution.

The beauty of the normal distribution is that it is such a ‘well behaved’ distribution and, at the same time, it represents many physical and biological phenomena. If the population we are characterizing is an equal mixture of two normal distributions, its mean is the average of the two means and, if the two components have the same mean, its variance is the average of the two variances. These basic and useful mathematical relationships can also be rather misleading. In fact, while these statements are mathematically correct, two populations of cells that ‘behave rather differently’, for instance in response to a drug, cannot be averaged. For instance, one cell population might be killed by a given concentration of a drug. Another population might be resistant. By detecting 50% cell death, we could assume – incorrectly – that by dosing at higher concentrations we could kill more cells.

The plot shown below illustrates this basic principle. The blue and red distributions, averaged together, exhibit the same mean and variance as the yellow distribution, but they represent very different systems. If the blue distribution represents the sizes of cats and the red distribution the sizes of dogs, the yellow distribution does not represent the size distribution of any real animal. In other words, the average phenotype is not a real phenotype and, in the best-case scenario, when there is a dominant population, it represents the most frequent (the mode) phenotype. In all other cases, where the homogeneity of the phenotype is not checked, the average phenotype might be simply wrong.

[Figure: Gaussian distributions]

This is a very simple illustration of a problem we frequently encounter in biology: trusting our population measurements (averages and standard deviations over experimental repeats) without being sure of the distributions underlying our measurements. In the figure above, the purple distribution has an average that is the correct average of the blue and red distributions, but the purple distribution represents the statistical error of the assay and it is unrelated to the scatter of the biological phenomenon we are measuring. Sometimes we cannot do anything to address this problem experimentally because of the limitations of technologies, but it is very important – at least – to be aware of these issues.

Just for the most curious, I should clarify that for two Gaussian distributions with relative weights A and B, we can define a mixing parameter p=A/(A+B). The average of the mixed population will be simply μP = p*μA + (1-p)*μB, i.e. for p=0.5 it is the average of the two means. The apparent variance is σP^2 = p*σA^2 + (1-p)*σB^2 + p(1-p)*(μA-μB)^2, i.e. σP^2 is the weighted average of the variances plus the squared separation of the two means weighted by the product p(1-p) of the mixing fractions.
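A quick numerical sanity check of these formulas, with arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(1)
p, muA, muB, sdA, sdB = 0.5, 10.0, 20.0, 2.0, 3.0   # e.g. 'cats' and 'dogs'

n = 1_000_000
pooled = np.concatenate([rng.normal(muA, sdA, int(p * n)),
                         rng.normal(muB, sdB, n - int(p * n))])

mu_P  = p * muA + (1 - p) * muB
var_P = p * sdA**2 + (1 - p) * sdB**2 + p * (1 - p) * (muA - muB)**2

print(pooled.mean(), mu_P)    # both ~15
print(pooled.var(),  var_P)   # both ~31.5
```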

Collective behaviour of cells is not an average behaviour, quite the contrary.

When discussing these issues, I am often confronted with the statement that we ultimately do not care about the behaviour of individual cells but about the collective behaviour of groups of cells. There are two important implications to discuss. First of all, when arguing for the importance of single-cell measurements, we do not argue for the importance of studying individual cells in isolation. Quite the contrary, we should measure individual cells in model systems as close as possible to the physiological state. However, many assays are incompatible with the study of cell behaviour within humans and we resort to a number of model systems: individual cells separated from each other, 2D and 3D cultures, ex vivo and in vivo assays. The two arguments (single-cell measurements versus measurements in more physiological model systems of tissues or organisms) are not the same.

Second, collective behaviours are not ‘average behaviours’. There are great examples in the literature, but I would suggest simply visiting the websites of two laboratories that I personally admire. They nicely and visually illustrate this point: John Albeck’s laboratory at UC Davis and Kazuhiro Aoki’s laboratory at NIBB. Collective behaviours emerge from the interaction of cells in space and time, as illustrated by waves of signalling or metabolic activities caused by cell-to-cell communication in response to stimuli. The complex behaviours that interacting cells exhibit, even just in 2D cultures, can be understood when single cells and their biochemistry are visualized individually. Once again, phenotypes or their mechanisms might be concealed or misinterpreted by population or snapshot measurements.

This is, of course, not always the case. However, my advice is to keep at least in mind the assumptions we make when we perform an ensemble or a snapshot measurement and, whenever possible, to check that they are valid.

At FLIM impact (Episode I)

What has been the impact of fluorescence lifetime imaging microscopy to science and to the biomedical community in particular? Is FLIM a niche technique, one of those techniques that always promise but never deliver?

The top 10 most cited papers

Excluding reviews, the list of the top 10 most cited papers, albeit representing a very narrow window on the impact that FLIM had on the broader community, is rather instructive. Do consider that we are missing all those papers where FLIM was used but not cited in the title or abstract. Most of the top 10 is made of applications to cell biochemistry, demonstrating the potential and the impact that fluorescence lifetime has. FLIM helped to understand how signalling works in living cells and animals, helped to identify drugs and to study disease. Some of the top cited papers are more technical, such as Digman’s paper on the phasor transform or Becker’s paper on TCSPC, widely cited because of their influence on contemporary FLIM techniques from the perspective of data analysis and technology. Other papers date back to the initial years of FLIM, with applications to biochemistry. Overall, from this list, we understand (if more evidence was needed) that FLIM had a deep impact on the understanding of cell biochemistry albeit, historically, FLIM has been limited to the specialist laboratory.

I would like to highlight also another small observation, perhaps interesting just for the specialists, and not visible from other bibliometric analyses. Tom Jovin and a group of scientists trained by him (e.g., Dorus Gadella and Philippe Bastiaens) left a significant footprint in the field, directly driving biomedically relevant applications while pushing, at the same time, technological or methodological developments. Many others are linked to this ‘school’ directly or indirectly, scientists who use/develop a microscope to do biochemistry.

  1. Mapping temperature within cells using fluorescent polymers by Okabe and colleagues (2012) from Uchiyama’s laboratory and published in Nature Communications, where FLIM was used to map temperature within cells using fluorescent polymers as temperature sensors. (442)
  2. Phasor analysis by Michelle Digman and colleagues, from the laboratory of Enrico Gratton (2008) published by Biophysical Journal. The phasor-based analysis, in different flavours, has become quite popular nowadays. (406)
  3. An in vivo FLIM-based analysis of calcium dynamics in astrocytes by Kuchibhotla and colleagues from Bacskai’s laboratory (2009) published in Science. (353)
  4. The study of Calmodulin-dependent kinase II activity in dendritic spines by Lee and colleagues from Yasuda’s laboratory (2009) published in Nature. (351)
  5. One of the first FLIM papers by Lakowicz, published in 1992 in PNAS, where they applied the methodology, yet to be fully established, to the study of free and bound NADH. (339)
  6. One of the first biochemical applications of FLIM, where Gadella and Jovin applied the new tools to the study of EGFR oligomerization (1995), published in the Journal of Cell Biology. (323)
  7. A 2004 paper, where Becker and colleagues present the TCSPC instrumentation that would become a commercial success, published in Microscopy Research and Technique. (321)
  8. The application of FLIM and molecular rotors to study the viscosity of the cellular environment by Marina Kuimova and colleagues, from the laboratory of Klaus Suhling, published in JACS in 2008. (319)
  9. The development of a drug interfering with the interaction between KRAS and PDEdelta, published by Zimmermann and colleagues with the laboratory of Philippe Bastiaens, in Nature in 2013. (291)
  10. The interaction between PKC and integrin shown by Ng and colleagues from Parker’s laboratory in 1999 in the EMBO Journal. (277)

Methodology

Tool: Web of Science

Search term: “FLIM” and “fluorescence lifetime imaging microscopy”

Filter: Article

Note: FLiM is a component of the flagellar motor and it shows up in the searches. I could not eliminate this ‘false positive’, but my assumption is that it does not change the discussion.

Citations (in parenthesis) as in April 2019.

Any bibliometric analysis is very limited in scope, certainly one as narrow as this search. This is just a blog post, a single observation meant to trigger a discussion for those curious about the topic.

 

Snap opinion on deep-learning for super-resolution and denoising

I am personally conflicted on this topic. I have recently started to work on machine learning and deep-learning specifically. Therefore, I am keen to explore the usefulness of these technologies, and I hope they will remove bottlenecks from our assays.

My knowledge about CNNs is rather limited, even more so for super-resolution and denoising applications. My first opinion was not very positive. After all, if you do not trust a fellow scientist guessing objects from noisy or undersampled data, why should you trust a piece of software? That also appeared to be the response of many colleagues.

After the machine learning session at FoM, I partially changed my opinion, and I am posting this brief – very naïve – opinion after a thread of messages by colleagues that I read on Twitter. Conceptually, I always thought of machine learning as ‘guessing’ the image, but I suddenly realised that CNNs are perhaps learning a prior or a set of possible priors.

I have mentioned in a previous post the work by Toraldo di Francia on resolving power and information, often cited by Alberto Diaspro in talks. Di Francia, in his paper, states: “The degrees of freedom of an image formed by any real instrument are only a finite number, while those of the object are an infinite number. Several different objects may correspond to the same image. It is shown that in the case of coherent illumination a large class of objects corresponding to a given image can be found very easily. Two-point resolution is impossible unless the observer has a priori an infinite amount of information about the object.”

Are CNNs for image restoration and denoising learning the prior? If so, issues about possible artefacts might not be put aside, but they could at least be handled a bit better conceptually, at least by me. The problem would then shift to understanding which priors a network is learning and how robust these sets are to typical variations of biological samples.

Great talks today at FoM. Eventually, we will need tools to assess the likelihood that an image represents the ground truth, and some simple visual representation that explains what a CNN is doing to a specific restored image, to ensure good practice. Nothing too different from other techniques, but I feel it is better to deal with these issues earlier rather than later in order to build confidence in the community.

Related twitter thread: https://twitter.com/RetoPaul/status/1118435878270132225?s=19

What is life?

Preamble

In this essay, I describe reflections on biological systems and the nature of life. If you do not know it, the title is an obvious reference to the famous essay by Schrödinger that motivated many physicists to create the branch of science that is biophysics. At its current stage, these words are not written with the intent to be precise or complete, but to guide my own thoughts in the understanding – however superficial – of the general principles, as opposed to specific molecular mechanisms, that drive biological processes and that are more likely to help us understand human physiology and pathological states. I will often express trivial observations from which, perhaps, less trivial considerations may be built.

From disorder to self

A living organism is an active chemical system, one that is constituted by an identifiable ensemble of molecules that manifest cooperative behaviour. For life to be observed, it has to be identifiable. As trivial as this observation is, identity is a founding character of the chemical systems we call life.

Life, as we know it, is based on the basic chemical unit that we call ‘cell’. The boundary of the cell is defined by lipids, amphipathic molecules made of a portion that likes water and another that does not. A basic characteristic of amphipathic molecules is their capability for spontaneous self-assembly. The polar, water-liking heads of lipids will try to contact water molecules, whilst the water-repelling tails will associate with each other, trying to exclude water molecules, like oil in a glass of water.

Local reduction of disorder (or entropy, as we often call it) is a feature of living systems. A reduction of entropy can typically occur only as a consequence of irreversible chemical reactions that convert energy to order the local environment, like a person burning calories to tidy up their home and, in the process, generating waste that is dumped into the external environment. However, self-assembly is a spontaneous and reversible process that increases local order at no entropic cost to the environment. The best example of this is colloidal suspensions. If large spheres are mixed with smaller spheres, the large spheres will start to organize into ordered structures. In fact, around each large sphere there is a volume of solvent that is inaccessible to the small spheres (because of steric hindrance). When large spheres pack close to each other, these excluded-volume shells overlap and volume is freed for the small spheres; the entropy of the mixture of large and small spheres is therefore higher when there is a degree of organization in the large spheres. Another example of such a process is the spontaneous ordering of polymers. Even without amphipathic properties, there are molecules, such as polymers, that can interfere with the internal bonding of water (water molecules like each other) and are therefore driven into ordered structures. The net effect of this process, driven by so-called entropic forces, is to maximize the entropy overall, primarily by increasing the entropy of the solvent, i.e. by maximizing the number of states available to water molecules; at the same time, polymers or other macromolecules are driven into ordered, low-entropy structures.
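As a toy illustration of this excluded-volume argument – my own sketch with made-up numbers, in the spirit of the Asakura–Oosawa picture rather than taken from any of the references below – one can estimate the solvent volume returned to the small spheres when two large spheres approach each other:

import math

# A minimal sketch (hypothetical parameters throughout). Around each large
# sphere of radius R, in a bath of small spheres of radius r, there is an
# excluded shell of radius R + r that the centres of the small spheres cannot
# enter. When two large spheres approach, these shells overlap and the
# overlapping volume is returned to the small spheres; in the dilute limit the
# entropy gain is roughly n_small * k_B * V_overlap.

def depletion_overlap_volume(R, r, d):
    """Overlap volume of the two excluded shells (radius R + r) when the
    centres of the large spheres are a distance d apart."""
    Re = R + r
    if d >= 2.0 * Re:
        return 0.0  # shells do not overlap, no free volume is gained
    # standard lens (sphere-sphere overlap) volume for two equal spheres
    return (4.0 * math.pi / 3.0) * Re**3 * (1.0 - 3.0 * d / (4.0 * Re) + d**3 / (16.0 * Re**3))

k_B = 1.380649e-23   # J/K
n_small = 1e22       # small spheres per m^3 (made up, roughly 4% volume fraction)
V = depletion_overlap_volume(R=100e-9, r=10e-9, d=205e-9)  # nearly touching colloids
print(f"freed volume: {V:.2e} m^3, entropic gain ~ {n_small * k_B * V:.2e} J/K")

The numbers mean nothing in themselves; the point is only that clustering of the large spheres increases the volume, and hence the entropy, available to the small ones.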

Therefore, given the right conditions of solvent and molecules, a spontaneous process of self-organization will drive the compartmentalization of chemical systems. It is very important to stress that one of the first fundamental steps that initiate life, i.e. establishing an identity through a boundary between the self and the environment, is a thermodynamically favoured process that occurs at no entropic cost to the environment. This step results in the maximization of the overall entropy but, counter-intuitively at first impression, generates order and defines a self at the same time.

Such compartmentalized systems give rise to special environments where chemical reactions can occur at high efficiency. Furthermore, the constant supply of energy provided by the sun can readily start catalysis in these compartmentalized systems and drive them far from equilibrium. Of the many compartmentalized systems that can naturally and spontaneously occur, however, only those that are stable in time will be able to evolve into today’s ‘living systems’.

From transient to stable

What we call a living cell requires the preservation of its own identity for long enough to give birth to life as we recognize it: a living organism must exhibit mechanisms that ensure its own integrity over time, the integrity of its boundary and the integrity of an active internal chemical system. Both require the existence of favourable conditions, such as suitable operational windows of temperature, pressure, pH, etc. All systems incapable of maintaining integrity within the range of conditions that may occur over time extinguish themselves as soon as environmental factors change even slightly. It appears, therefore, that the maintenance of integrity over time necessitates that a primordial biochemical system exhibits some sort of “thermodynamic resilience”, i.e. that its chemistry can operate and its identity be maintained in the face of environmental challenges. We can easily postulate that any primordial biochemical system was simple in nature and manifested comparatively little resilience to environmental changes. Reversible processes of self-organization would be constantly counteracted by more energetic stochastic events, inducing a relentless cycle of creation and destruction of such thermodynamic systems. Natural selection forged life from the very beginning, enabling only those systems capable of increased “thermodynamic resilience” to survive.

There is experimental evidence that simple self-assembling systems such as lipid vesicles can grow by uptake of other lipids from the environment and spontaneously undergo fission into smaller vesicles. The propensity of the early proto-cell for fusion and fission events may have represented both a challenge and an opportunity in the evolution towards an early living system. Fission and fusion can be seen as a challenge to thermodynamic resilience, as the identity and composition of a proto-cell are extinguished when fusion and fission occur unregulated. At the same time, simple mechanisms that would minimize fusion and regulate fission, for instance specific compositions of proto-cellular membranes, would lead, under the empowering thrust of natural selection, to the emergence of a ‘replicator phenotype’. This replicator phenotype could be thermodynamically favoured in specific environmental conditions and, hypothetically, better supported by a simple internal chemistry favouring a stable process of fission.

During this phase, inheritance of characters could be of only one kind: structural inheritance. Amphipathic chains of a specific composition may favour self-assembly with other chains of the same type and stochastically divide into multiple “daughter” entities; constituted by the same elements, and differing ever so slightly by chance, these daughters would still favour the same self-assembly process while randomly accommodating variation in the composition of the boundary and of the inner content. At the same time, irreversible reactions stabilizing this process may emerge, increasing the efficiency with which energy is used to maintain structural integrity.

Once the cycle of relentless creation and destruction is replaced by a cycle of relentless replication, natural selection will favour the optimal “thermodynamic resilience” for a given environment.

From random to self-governed

In the face of environmental changes, a primordial system, in order to survive into a life form, will require its active chemistry to adapt, supporting different chemistries over different operational windows. Gains in thermodynamic resilience may occur by: i) stabilizing thermodynamic variables (e.g., temperature, pressure, volume) within optimal windows (homeostasis), ii) migrating to a different environment (taxis), or iii) adapting the internal chemistry to the different conditions (through evolution or, on shorter time scales, through allostasis).

Homeostasis is defined as the capability of a system to maintain certain parameters nearly constant. For instance, the human body is kept at around 37°C, where cell biochemistry operates optimally. Homeostasis is the incarnation of thermodynamic resilience, here ensured by an active process of self-government, another founding property of life. There are many different terms describing this property of biological systems (e.g., homeorhesis and allostasis), but homeostasis is the one encompassing all of them. For instance, allostasis is the process by which a system maintains its homeostasis through change. Let’s consider a simple biochemical system whose internal chemistry depends on pH. In order to be thermodynamically resilient, this system will require the capability to buffer pH, either through its chemical composition or by utilizing regulated proton pumps that ensure a stable pH. However, when these mechanisms are insufficient to guarantee pH stability, a system can shut down those internal machineries that generate variations of pH as a by-product. A partial loss of efficiency (read: fitness) in such a system will nevertheless guarantee the maintenance of other fundamental reactions and, therefore, overall fitness.
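To make this pH example a little more tangible, here is a deliberately naive toy simulation – entirely my own construction, with made-up parameters, not a model of any real cell – of a compartment whose metabolism acidifies it, a proton pump providing negative feedback, and a shutdown rule that trades efficiency for integrity when the pump can no longer cope:

# Toy negative-feedback model of pH homeostasis (hypothetical parameters).
TARGET_PH = 7.0
PUMP_GAIN = 0.4    # fraction of the pH deviation corrected per step
ACID_LOAD = 0.05   # pH drop per step caused by active metabolism
SHUTDOWN_AT = 0.5  # deviation at which the acid-producing machinery stops
RESUME_AT = 0.1    # deviation below which it is switched back on

def simulate(steps=20, shock=1.0, shock_time=10):
    ph, metabolism_on, history = TARGET_PH, True, []
    for t in range(steps):
        if metabolism_on:
            ph -= ACID_LOAD                 # acid as a by-product of metabolism
        if t == shock_time:
            ph -= shock                     # a one-off environmental insult
        ph += PUMP_GAIN * (TARGET_PH - ph)  # proton pump: negative feedback
        deviation = abs(TARGET_PH - ph)
        if deviation > SHUTDOWN_AT:
            metabolism_on = False           # allostasis: lose efficiency, keep integrity
        elif deviation < RESUME_AT:
            metabolism_on = True
        history.append((t, round(ph, 3), metabolism_on))
    return history

for t, ph, on in simulate():
    print(t, ph, "metabolism on" if on else "metabolism off")

The point is only the qualitative behaviour: feedback keeps the variable within a window, and when a perturbation exceeds what the pump can buffer, shutting down part of the chemistry preserves the rest.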

Another possibility is to engage in taxis, the active movement towards a more permissive environment, for instance searching for those conditions of nutrients, temperature, pH, etcetera in which a system operates optimally. Taxis is another property shared by all animate beings. Whether plants through growth or other organisms through migration, all organisms are capable of sensing the environment and triggering movements to seek the optimal environment where their metabolism operates most efficiently. For instance, a plant will adapt its growth to seek sunlight, bacteria sense gradients of chemicals to find food, and higher organisms migrate to places where food and water are abundant and environmental conditions are suitable for them.

Homeostasis and taxis are manifestations of self-government that can be established only once a biochemical system acquires the capability to process information from its internal and external environments and to execute specific actions that ensure thermodynamic resilience.

Conclusive remarks

I am no expert in any of the topics I discussed here. However, I have the impression that the thermodynamic aspects of life are often far too emphasized. The question is not whether life contradicts the second law of thermodynamics (it does not); the question is how much the second law can teach us about living beings. Often, a connection with entropy and its evolution towards higher values is seen as a necessary link to justify the spontaneous occurrence of life. Even if this were true, does it matter, if a much simpler justification is available in the process of natural selection as a fundamental law of Nature? And even if natural selection could be justified with a thermodynamic description, would this help us to understand life, or to resolve the many afflictions that living beings are cursed with?

I firmly believe that research into the thermodynamics of self-assembling systems and the role of entropic forces in biology is essential to a better understanding of life. However, the question of why life has evolved and whether this conflicts with thermodynamics has already been addressed. It sometimes seems there is a conflict between the laws of physics and biological mechanisms but, of course, there is none. Life is a complex phenomenon, the emergent property of a highly compartmentalized ensemble of chemically active molecules that abide (again, obviously) by the basic laws of physics, but whose description as a system may not be properly captured by thermodynamics alone. This is exactly the purpose for which systems biology, a branch of biophysics, was born.

The most elemental aspects of life are its identity as an active biochemical system (i.e., its compartmentalized metabolism), its capability to maintain integrity in the face of environmental pressures (i.e., its homeostasis and its capability to replicate), and its autonomy (i.e., the capability to process information and make decisions).

For those of us working on a disease such as cancer, it is thus unsurprising that cancer is intimately linked to deregulation of all of these fundamental characteristics of life.

References

I’ve written this essay over different periods, reading literature beyond the scope of my own research. Therefore, I cannot reference my text properly, but the material listed below is what I have read and may be of interest to you.

Davies et al. (2013) “Self-organization and entropy reduction in a living cell” Biosystems

Spanner (1953) “Biological systems and the principle of minimum entropy production” Nature

Prigogine (1971) “Biological order, structure and instabilities” Quarterly Reviews of Biophysics

England (2015) “Dissipative adaptation in driven self-assembly” Nature Nanotechnology

Frenkel (2014) “Order through entropy” Nature Materials

Schrödinger (1944) “What is life?”

Yodh et al. (2001) “Entropically driven self-assembly and interaction in suspension” Phil. Trans. R. Soc. Lond. A

Bray (1990) “Intracellular Signalling as a Parallel Distributed Process” J Theor Biol

Bray (1995) “Protein molecules as computational elements in living cells” Nature

Bray (2003) “Molecular Networks: The Top-Down View” Science

McEwen and Wingfield (2003) “The concept of allostasis in biology and biomedicine” Hormones and Behavior

Ray and Phoha “Homeostasis and Homeorhesis: Sustaining Order and Normalcy in Human-engineered Complex Systems”

Sterling “Principles of allostasis: optimal design, predictive regulation, pathophysiology and rational therapeutics” in “Allostasis, Homeostasis, and the Costs of Adaptation” by J. Schulkin

Berclaz et al. (2001) “Growth and Transformation of Vesicles Studied by Ferritin Labeling and Cryotransmission Electron Microscopy” J Phys Chem B

Markvoort et al. (2007) “Lipid-Based Mechanisms for Vesicle Fission” J Phys Chem B

Mostafavi et al. (2016) “Entropic forces drive self-organization and membrane fusion by SNARE proteins” PNAS

Stachowiak et al. (2013) “A cost–benefit analysis of the physical mechanisms of membrane curvature” Nature Cell Biology

Although I never found the time to read it, the following book seems to cover exactly the topics I discussed. Having browsed through its pages now and then, it is likely I have been influenced by it:
Radu Popa “Between necessity and probability: searching for the definition and origin of life”

 

Which is the best model system for biomedical research? None, all model systems are wrong.

Which is the best model system for biomedical research? None, all model systems are wrong, but before I explain myself, let me tell you a story. One day I attended a retreat of the Molecular Physiology of the Brain Centre in Goettingen, and I genuinely had fun. Two things will remain in my memory.

First, Prof. Tom Jovin – one of the top scientists in the area I was working in – asked one of his most senior associates to check whether the model I was using could provide representative results. I was studying the interaction of alpha-synuclein, a small protein involved in Parkinson’s disease, with another protein, Tau, also involved in neurodegeneration. Using molecular simulations, they demonstrated that the dynamic folding of alpha-synuclein is radically altered when alpha-synuclein is fused to a bulky fluorescent protein, a label I needed to quantify protein-protein interactions. As a PhD student, I was proud to have earned the scrutiny of Jovin’s group, in a discussion based on reciprocal respect and motivated by the pursuit of scientific truth. I aimed to compare differences between different mutants of alpha-synuclein and tau. I was using cell lines just as a test tube in which to examine these differences, obtaining the significant advantage of testing them in a living cell, but with the disadvantage of requiring a bulky label.

Second, there were several talks that day. I believe we started by discussing NMR experiments on aqueous solutions of alpha-synuclein aimed at studying its structure, then moved on to work carried out in cell culture, fruit flies and mice, up to experiments done with primates. Once people noticed the progression, each scientist started to joke about the limitations of the model used by the previous colleague. To tell the truth, I do not remember if those were light-hearted comments or harsh criticisms, but I remember I came out of the meeting having had fun following the science and the debate, but also with a sense of uneasiness. Is there really a ‘best model system’ in biomedical research, or can all systems be informative?

Let’s take another step back, away from this question.

Sometimes I like to say that ‘all biological model systems are wrong’ just for the fun of seeing the distorted face of my interlocutor, probably caused by a wave of instinctive and unexpressed disdain or rage, before I explain myself. I suppose I grew (unconsciously and unwittingly) fond of this sentence, which repeats a similar provocation expressed about mathematical models.

A mathematical model is ‘always wrong’ as it can never capture all the complex features of reality. Models are based on a few parameters that aid in reproducing and understanding a phenomenon, providing predictions that are as accurate as possible. Models always lack some granularity in the description of reality and, therefore, they are always wrong. This idea is the exact opposite of what I was taught during my undergraduate studies as a physicist: as long as a model has some predictive power, the model is correct, although some models are better than others at predicting a phenomenon. As it happens, these two contradictory statements mean the same thing. All models are wrong, as they will always be incomplete, but at the same time correct, as they permit us to predict – with varying degrees of confidence – the phenomena they represent. The most compelling example is the progression of models of relativity from Galileo Galilei and Isaac Newton to Albert Einstein, all great models that served humanity greatly.

Is this true also for biological model systems? Personally, I do not see why there should be any difference.

Let’s take the fruit fly as an example. When a scientist is actually interested in understanding how a fruit fly works, a specific strain of fruit fly, grown and examined in specific conditions, becomes the model system for all fruit flies. This is the closest that a biological model system can be to the system we intend to study (a fruit fly for fruit flies, C. elegans for nematodes, a C57BL/6 mouse strain for mice). An experimenter will be able to identify general principles, for instance that certain genes or classes of genes are necessary during development for morphological or functional features like the wings and the capability to fly, the eyes and vision, the colour and shape of structures, etcetera. Scientists will also be able to investigate specific mechanisms, for instance that a specific protein-protein interaction mediates the processing of information from receptors, eventually resulting in the capability of the fly to find food. Researchers will then be capable of generalising their specific observations of a laboratory strain to the genetically similar but wild fly (I’ll discuss this further when commenting on mouse models), then to all fruit flies, and perhaps to other living beings.

Of course, for most scientists fruit flies are not model systems of fruit flies or other insects. Because of the genetic tools available, the fruit fly has been a fantastic model system to understand genetics and to explore the role of specific genes, gene interaction mechanisms, gene regulation during development and the role of genes in development. As pea plants permitted Mendel to formulate the first basic observations that led to the foundation of genetics, the fruit fly expanded our knowledge of genetics (and much more), permitting us to understand better how genes work in humans. Are peas and flies the right models for human genetics, or to study human physiology and disease? There is no right or wrong; peas and flies were good models for human genetics insofar as they provided sufficient predictive power about humans, after which more accurate models could and will always become available.

A decade after my meeting on alpha-synuclein, I am often confronted with this type of question. Sometimes, it is caused by a daunting sense of impending doom upon realising I have invested years of work in studying the ‘wrong’ model, simply because of choices that were taken at a different time or because of the (always) limited resources I had or have. Other times, I am confronted by the different views of colleagues, more often anonymous referees, depicting a model system as inadequate.

Let me briefly describe a few actual examples before a few closing remarks.

One of the earliest critical comments I heard on model systems after I moved into cancer research was during a lab meeting in the Venkitaraman lab. Although I do not recall the details, a colleague must have presented work on DT40 cells, a chicken lymphoblast cell line. Once again, I do not remember the tone of the conversation, but I do recall the comment that was shot at the speaker: ‘are we trying to cure chicken cancer?’ The DT40 cell line is made of floating, immature avian white blood cells, which are certainly not the right model for the human solid tumours we try to understand. However, DT40 cells exhibit a high capacity for gene recombination, permitting their genetic background to be modified very efficiently and, therefore, DT40 cells have been successfully used to study the roles of several genes, including BRCA2 in the laboratory where I work. Here, a well-resourced lab can carry out experiments spanning in vitro assays, DT40 and human cell lines, mouse models, and the study of human clinical samples.

In vitro experiments, where individual constituents of a biochemical reaction or a molecular machinery are reconstituted and studied, represent the simplest of model systems (not necessarily the most straightforward experiments, though). For instance, scientists can see kinesin molecules walking on microtubules. Among other functions, kinesins deliver cargos to and from peripheral regions of the cell when or where diffusion would be an inefficient process for delivering specific cellular constituents. A kinesin molecule has two ‘legs’ that sequentially interact with microtubules, the (cyto)skeleton of the cell, which kinesin uses like a motorway. Kinesin utilizes ATP, one of the molecules used by nature to store energy, to propel itself forward. Without in vitro experiments, the molecular understanding of molecular motors would have been unlikely. Work in cell lines was necessary to fine-tune our understanding of the system, but I do not think any colleague would feel the need for an unlikely/impossible in vivo human experiment attempting to falsify the model of the kinesin motor. Does kinesin walk differently over synthetic microtubules on a coverslip than in its cellular context? I am no expert in this area, but I assume that while there would be substantial differences, the molecular principles of the kinesin stride, in this case, are safe. Cell culture work refines and improves these models, and cell culture experiments done within a three-dimensional sample, where tissue-like forces are appropriately set, will provide an even better picture of kinesin, but the basic in vitro work is and has been essential.
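Just to illustrate how reduced a useful in vitro picture can be, here is an extremely simplified cartoon of a processive motor – my own toy sketch with order-of-magnitude numbers (about 8 nm per step, roughly one ATP per step, a strong forward bias), not a model taken from the literature:

import random

STEP_NM = 8.0      # approximate step size of kinesin-1 along a microtubule
P_FORWARD = 0.99   # assumed forward bias provided by ATP hydrolysis

def walk(n_atp=1000, seed=0):
    """Hand-over-hand cartoon: one attempted 8 nm step per ATP hydrolysed."""
    rng = random.Random(seed)
    position = 0.0
    for _ in range(n_atp):
        if rng.random() < P_FORWARD:
            position += STEP_NM   # the trailing head swings to the next binding site
        else:
            position -= STEP_NM   # occasional back-step
    return position

print(f"distance travelled after 1000 ATP molecules: {walk():.0f} nm")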

Here I have touched on the topic of three-dimensional cultures. Organisms are three-dimensional, but the vast majority of experiments are performed with cell lines growing on surfaces. There is no doubt that 2D biology differs from 3D biology, as the topological and mechanical properties of 2D or 3D structures will definitely alter the biological processes we study. Recently, a colleague of mine told me of a very peculiar comment they received during the refereeing of one of their manuscripts. In a very brief report, an anonymous colleague stated that “cancer occurs in vivo and not in a petri dish”, concluding that, lacking in vivo relevance, that research was not worth publishing. How does one rebut such a true statement? Perhaps with a better understanding of what a model and a model system are? Two-dimensional culturing methods have provided us with a wealth of information on how life works, and cell biology would not be the great discipline it is today without these model systems.

In the early nineteen-hundreds, Theodor Boveri inferred essential aspects of the process of oncogenesis by studying, as a zoologist, cell division in the fertilised egg of the sea urchin. His experimental observations were as crucial as they were distant from the ‘right model system for cancer’, the sea urchin egg being a one-dimensional culture system of an organism only remotely related to humans. During the first half of the last century, after studying oxygen consumption in fertilized sea urchin eggs, Otto Warburg revealed metabolic changes in cancer by placing tumours into a petri dish and analysing their metabolic action, what we would call today an organotypic culture. However, Otto Warburg later published a paper (‘On the origin of cancer cells’) commenting on work performed on cell lines: ‘What was formerly only qualitative has now become quantitative. What was formerly only probable has now become certain. The era in which fermentation of the cancer cells or its importance could be disputed is over, and no one today can doubt we understand the origin of cancer cells if we know how their large fermentation originates, or, to express it more fully, if we know how the damaged respiration and the excessive fermentation of the cancer cells originate’. To the best of my knowledge, Warburg’s work was controversial at the time, as much as Boveri’s, but only because of the hypotheses they brought forward, not because of their model systems. Although the fine details of their discoveries may be more or less accurate (or popular) when judged against state-of-the-art observations, Boveri’s and Warburg’s contributions to the understanding of the origin of cancer are invaluable and were largely based on very simple, very wrong, and yet so very correct model systems.

On a much more personal and less grandiose note, colleagues and I were recently criticised for using HeLa cells in our studies. HeLa cells were controversially (though not at the time) derived from the cervical cancer of a non-consenting patient. HeLa cells have been growing in culture since 1951; they are hypertriploid, i.e. HeLa cells carry ~80 chromosomes rather than the normal set of 46, and ~25 of these are abnormal chromosomes. The genome of HeLa cells is otherwise considered stable. In a timeline article in Nature Reviews Cancer, John Masters describes ‘the good, the bad and the ugly’ of HeLa cells and states: ‘Our knowledge of every fundamental process that occurs in human cells – whether normal or abnormal – has depended to a large extent on using HeLa and other cell lines as a model system. Much of what we know today, and much of what we do tomorrow, depends on the supply of HeLa and other cell lines.’

However, HeLa cells have shown significant adaptability to different culturing conditions and, therefore, HeLa cell lines may behave differently in different laboratories; eventually, though, this does not depend on the cell line but on careful experimental practice, which is true for all model systems. More worrying is the issue of cross-contamination, which again is true for any work and, incidentally, is more likely to affect work done on cell lines other than HeLa (contaminated by HeLa cells) than HeLa cells themselves. Not only have we understood so much of how HeLa cells work (not a very interesting topic in itself), but we were able to port much of this knowledge to other model systems and to humans as well. Are HeLa cells the right model system for human physiology? Certainly not. Are HeLa cells the right model system to study the molecular machineries acting in human cells? No, to the extent that all model systems are wrong, but yes, as plenty of models derived from HeLa cells had predictive power; and for plenty of those that did not, the failure may have been caused by a lack of good scientific practice in general rather than by HeLa cells themselves. John Masters, in his review, cites the opinion of Stan Gartler, who revealed cross-contamination issues as early as the sixties: ‘If the investigator’s requirement was for any human cell line, whether or not it was HeLa or another cell line does not seem important. However, in those cases in which the investigator has assumed a specific tissue origin of the cell line, the work is of dubious value’. Of course, it is of critical importance to accurately report the material used and to interpret data within the assumptions and limitations inherent in a specific model, but there is no reason to stigmatize HeLa cells; instead we should stigmatize poor practice in science, so widespread in the laboratory and the peer-review process, sometimes due to ignorance or self-interest or, more often, because of the limited resources (time and money) and the high pressure scientists have to work with.

But let’s move on from two-dimensional culturing systems and arrive at the opposite end, mouse model systems. Once again, mouse models are very precious and have provided invaluable understanding of how life works and of how human physiology and pathology work. Are mouse models the right model for human physiology and disease? Well, you have my opinion by now. No, they are the wrong model, but yes, they are the correct model. It has been reported that observations carried out in mice often cannot be reproduced in humans and, not too infrequently, not even across laboratories. However, it may be argued that most of this lack of predictive power and reproducibility is again a matter of scientific practice. Mice are not infrequently of a genetic background different from the one reported, laboratory conditions are too different from ‘real life conditions’ so that laboratory-imposed diet, physical and social activities will influence the outcome of the experiments, and the statistics used often lack the required rigour.

Again, the anonymous referee’s comment is sometimes revealing. I have heard of colleagues being asked to attempt the falsification of their hypotheses through experiments done with transgenic mice. So often that, when we submit a paper, we jokingly predict we will be asked to do some animal experimentation. Often, though not always, this may be a very considerate request, as testing the physiological relevance of observations done in vitro, in cell culture, or in insects is certainly of fundamental importance. However, animal experimentation is performed within specific ethical guidelines, and we scientists are asked to minimise the number of animals we use for research. Therefore, the choice to perform experiments in animals should be taken only when these experiments are necessary (and there are plenty of such cases). This choice should not be biased by the perception that some models are the perfect models for human physiology or disease while others are imperfect models (and there are plenty of such cases).

Then, which system is the best model system? The answer is obvious to anyone: it depends on the question. Every model system provides important information on a phenomenon within the limitations of that specific model. Or, in other words, every model system is ‘wrong’ because it is not the real thing. Contrary to physics, where we often study the very objects we want to understand, in biomedical research we cannot experiment on human beings, and we have to resort to model systems. It is always important to remember the assumptions and limitations of the specific one we choose.

Let me finish with another consideration, as I have mentioned the peer-reviewing process rather frequently in this essay. We scientists agree with each other a lot, but we equally disagree, and often passionately. Personally, I cannot understand why we may disagree on the fact that all model systems are valuable (or, alternatively, incorrect), or why colleagues will frequently ask for experiments to be repeated in yet another model system of the referee’s choice. Science is based on the process of falsification through experimental observation. This process cannot (and should not) be performed by a single scientist, group or consortium. Not only do individuals rarely have the resources to perform experiments with a multitude of models and techniques; even if they do, experiments have to be performed by different experimenters, with different models and methods, in different places, in any case. All models are ‘wrong’, all techniques are limited, all experiments are somehow biased, but collectively they can inform us on the general principles and molecular mechanisms of human physiology and pathological states. Therefore, when arguing about the superiority of one model system compared to another, let’s be passionate about it, let’s disagree, but if you are still arguing, let’s keep in mind that you are using the ‘wrong’ model system, like anyone else.

 

Otto Warburg “On the Origin of Cancer Cells” Science, Vol. 123, No. 3191 (1956)

John Masters “HeLa cells 50 years on: the good, the bad and the ugly” Nature Reviews Cancer, Vol. 2, 315-319 (2002)