Is the average between a cat and a dog a real animal?

[Image: a dog (Pixabay License; free for commercial use, no attribution required).]

Is it a cat? Is it a dog? Is the average between a cat and a dog a real thing, perhaps a caog or a doat?

Not all science needs to be based on single-cell detection, and there are plenty of cases where single-cell measurements are superfluous. However, too often we fail to appreciate the huge mistakes we can make in biology when we forget the assumptions we make when using population measurements.

But which assumptions do we really make?

Often implicitly, when doing population measurements (e.g., Western blots, sequencing, proteomics, etc.) we assume that the populations of cells we measure are homogeneous and synchronous, or at least that any differences are unimportant and can be averaged out. In the best cases, we try to enforce a degree of synchronicity and homogeneity experimentally. In reality, one of the most important assumptions we implicitly make is that the system we analyse is ergodic. In physics and statistics, an ergodic system is one that, given sufficiently long time, explores all its possible states. Consequently, in an ergodic system that is sufficiently sampled, averages over time for a single cell and averages over a population at a given time are the same. However, there are limits to this assumption in biology. The obvious example is the cell cycle. There is a significant literature on ergodicity and the cell cycle [e.g., 1, 2, 3] and how this principle can be exploited, but…

The lottery for cell division makes you grow faster.

There is a particular phenomenon we encountered while working on this project [4] that fascinated me for its simplicity and its consequences. How can cells increase their fitness (i.e., their growth rate)? One obvious answer is by dividing faster. Another, at first glance less obvious, answer is by exhibiting a heterogeneous cell cycle length. Let's consider a population of cells that divides every 24 hours. Over one week, these cells will reach 128 times the original population size. Now, let's consider cells that divide on average every 24 hours but exhibit random variation in cell cycle length, normally distributed with a standard deviation of 4 hours. Cells with a 20-hour or a 28-hour cell cycle are equally probable. However, in one week, cells with a 28-hour cell cycle will grow 64-fold while cells with a 20-hour cell cycle will grow about 340-fold. On average, these cells grow ~200-fold, much faster than cells dividing precisely every 24 hours (128-fold). This is true for any pair drawn at equal distance from the two sides of the average; since these pairs are equiprobable, cells dividing at a given average cell cycle length grow faster with increasing heterogeneity. Let's remember that this can occur not just in the presence of genetic differences but even for purely stochastic variation, where the progeny of a cell does not keep the same cell cycle length but keeps changing randomly according to an underlying distribution. This phenomenon has been observed experimentally with single-cell measurements, for instance in yeast [5], but it occurs in any cellular system, as described in [1] and in our own work [4]. Population measurements might conceal these very important phenotypic or mechanistic differences.
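The arithmetic above can be checked in a few lines. Because the fold-expansion 2^(T/τ) is a convex function of the cycle length τ, every symmetric, equiprobable pair of cycle lengths averages to a larger expansion than the regular cycle gives. A minimal sketch (numbers are the illustrative ones from the text):

```python
T = 168.0  # one week, in hours
base = 2 ** (T / 24)  # regular 24 h cycle: 128-fold expansion

# Any symmetric, equiprobable pair of cycle lengths (24 - d, 24 + d) hours
# averages to more than 128-fold, because 2**(T/tau) is convex in tau.
for d in (1, 2, 4, 6):
    slow = 2 ** (T / (24 + d))
    fast = 2 ** (T / (24 - d))
    print(d, round((slow + fast) / 2, 1))  # always > 128; d=4 gives ~200.9
```

The wider the spread d, the larger the advantage, which is the heterogeneity effect described above.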

A mixture of two normal distributions is not another normal distribution.

The beauty of the normal distribution is that it is such a 'well-behaved' distribution and, at the same time, it represents many physical and biological phenomena. If a population we are characterizing is a mixture of two normally distributed subpopulations, the mean of the mixture is the weighted average of the two means; if the two subpopulations have the same mean, the variance of the mixture is the weighted average of the two variances. These basic and useful mathematical relationships can also be rather misleading. While these statements are mathematically correct, two populations of cells that behave rather differently, for instance in response to a drug, cannot be meaningfully averaged. One cell population might be killed at a given concentration of a drug; another might be resistant. By detecting 50% cell death, we could assume – incorrectly – that dosing at higher concentrations would kill more cells.

The plot shown below illustrates this basic principle. The blue and red distributions, averaged together, exhibit the same mean and variance as the yellow distribution, but they represent very different systems. If the blue distribution represents the sizes of cats and the red distribution the sizes of dogs, the yellow distribution does not represent the size distribution of any real animal. In other words, the average phenotype is not a real phenotype and, in the best-case scenario, when there is a dominant population, it represents the most frequent phenotype (the mode). In all other cases, where the homogeneity of the phenotype is not checked, the average phenotype might simply be wrong.

[Figure: Gaussian distributions illustrating a mixture of two populations.]

This is a very simple illustration of a problem we frequently encounter in biology: trusting our population measurements (averages and standard deviations over experimental repeats) without being sure of the distributions underlying them. In the figure above, the purple distribution has an average that is the correct average of the blue and red distributions, but it represents the statistical error of the assay and is unrelated to the scatter of the biological phenomenon we are measuring. Sometimes we cannot do anything to address this problem experimentally because of technological limitations, but it is very important – at least – to be aware of these issues.

Just for the most curious, I should clarify that for two Gaussian distributions mixed with relative weights A and B, we can define a mixing parameter p = A/(A+B). The mean of the mixed population is simply μP = p·μA + (1−p)·μB, i.e., for p = 0.5 it is the average of the two means. The apparent variance is σP² = p·σA² + (1−p)·σB² + p(1−p)·(μA−μB)², i.e., σP² is the weighted average of the variances plus the squared separation of the two means weighted by the product of the mixing fractions of the two populations.
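These mixture formulas are easy to verify numerically. Below is a quick Monte Carlo check; the weight p and the 'cat'/'dog' means and standard deviations are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
p, muA, sdA, muB, sdB = 0.5, 30.0, 3.0, 60.0, 8.0  # arbitrary example values

# Closed-form mixture moments from the formulas above
mu_mix = p * muA + (1 - p) * muB                                        # 45.0
var_mix = p * sdA**2 + (1 - p) * sdB**2 + p * (1 - p) * (muA - muB)**2  # 261.5

# Monte Carlo: pool samples from the two populations with mixing weight p
n = 1_000_000
nA = rng.binomial(n, p)
pooled = np.concatenate([rng.normal(muA, sdA, nA),
                         rng.normal(muB, sdB, n - nA)])
print(pooled.mean(), pooled.var())  # ~45.0 and ~261.5, matching the formulas
```

Note that although the pooled mean and variance match the formulas, a histogram of `pooled` is clearly bimodal: no single Gaussian with these moments describes the data.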

Collective behaviour of cells is not an average behaviour, quite the contrary.

When discussing these issues, I am often confronted with the statement that we ultimately do not care about the behaviour of individual cells but about the collective behaviour of groups of cells. There are two important implications to discuss. First of all, when arguing for the importance of single-cell measurements, we do not argue for studying individual cells in isolation. Quite the contrary: we should measure individual cells in model systems as close as possible to the physiological state. However, many assays are incompatible with the study of cell behaviour within humans, and we resort to a number of model systems: individual cells separated from each other, 2D and 3D cultures, ex vivo and in vivo assays. The two arguments (single-cell measurements versus measurements in more physiological models of tissues or organisms) are not the same.

Second, collective behaviours are not 'average behaviours'. There are great examples in the literature, but I would advise simply visiting the websites of two laboratories that I personally admire and that nicely and visually illustrate this point: John Albeck's laboratory at UC Davis and Kazuhiro Aoki's laboratory at NIBB. Collective behaviours emerge from the interaction of cells in space and time, as illustrated by waves of signalling or metabolic activity caused by cell-to-cell communication in response to stimuli. The complex behaviours that interacting cells exhibit, even just in 2D cultures, can be understood when single cells and their biochemistry are visualized individually. Once again, phenotypes or their mechanisms might be concealed or misinterpreted by population or snapshot measurements.

This is, of course, not always the case. However, my advice is at least to keep in mind the assumptions we make when we perform an ensemble or a snapshot measurement and, whenever possible, to check that they are valid.

Snap opinion on deep-learning for super-resolution and denoising

I am personally conflicted on this topic. I have recently started to work on machine learning and deep-learning specifically. Therefore, I am keen to explore the usefulness of these technologies, and I hope they will remove bottlenecks from our assays.

My knowledge of CNNs is rather limited, even more so for super-resolution and denoising applications. My first opinion was not very positive: after all, if you do not trust a fellow scientist guessing objects from noisy or undersampled data, why should you trust a piece of software? That appeared to be the response of many colleagues as well.

After the machine learning session at FoM, I partially changed my opinion, and I am posting this brief – very naïve – opinion after a thread of messages by colleagues I read on Twitter. Conceptually, I had always thought of machine learning as 'guessing' the image, but I suddenly realised that CNNs are perhaps learning a prior, or a set of possible priors.

I have mentioned in a previous post about the work by Toraldo di Francia on resolving power and information, often cited by Alberto Diaspro in talks. Di Francia, in his paper, states “The degrees of freedom of an image formed by any real instrument are only a finite number, while those of the object are an infinite number. Several different objects may correspond to the same image. It is shown that in the case of coherent illumination a large class of objects corresponding to a given image can be found very easily. Two-point resolution is impossible unless the observer has a priori an infinite amount of information about the object.”

Are CNNs for image restoration and denoising learning the prior? If so, concerns about possible artefacts might not be put aside, but at least I could handle them a bit better conceptually. The problem would then shift to understanding which priors a network is learning and how robust these are to typical variations of biological samples.

Great talks today at FoM. Eventually, we will need tools to assess the likelihood that an image represents the ground truth, and some simple visual representation that explains what a CNN is doing to a specific restored image, to ensure good practice. Nothing too different from other techniques, but I feel it is better to deal with these issues earlier rather than later in order to build confidence in the community.

Related twitter thread: https://twitter.com/RetoPaul/status/1118435878270132225?s=19

Volume rendering: is this localization-based super-resolution?

Project outcome published in Biophysical Journal in 2010.

  • Esposito A*, Choimet JB, Skepper JN, Mauritz JMA, Lew VL, Kaminski CF, Tiffert T, “Quantitative imaging of human red blood cells infected with Plasmodium falciparum“, Biophys. J., 99(3):953-960

Most papers have an untold backstory that we cannot reveal in print, so as to focus on the main message and the most relevant discoveries. This one has a little backstory I wish to share. Volumetric imaging of red blood cells is not the most difficult thing I have ever done. However, accurate morphological and volumetric imaging of red blood cells infected by Plasmodium falciparum, the causative pathogen of malaria, caused me a few headaches. Let's forget the time spent waiting for the cultures to grow at the right speed to deliver bugs at the right stage of development, undecided whether to sleep before or after the experiment, and always getting the decision wrong. Let's not speak, for now, about the optimization of the sample preparation that, by trial and error, led to other interesting observations. Here, we focus on the very simple concept of accurate volume rendering.

In one way or another, volume rendering and volume estimation require some sort of thresholding of the data, so as to discriminate the object from the background. As imaging conditions change even slightly from experiment to experiment, setting this threshold might confound the final outcome. When you also deal with a sample that undergoes major morphological transitions, a simple problem soon becomes one on which I spent a lot of time to identify a solution. As it happens, one perhaps does not find the best, the most elegant or even the simplest solution, but the solution that one can find with one's own skills and tools. Mine was a brute-force approach to isosurface volume rendering: the surface was iteratively deformed by locally refitting a random sample of vertices so as to respect a specific model for the transition from object to background. This method permitted us to preserve high-resolution morphological descriptions, with high accuracy and reproducibility in volume rendering.

This work was carried out while many of my colleagues were focusing on super-resolution, i.e., maximizing the spatial resolution in optical microscopy. It was then simple to notice that fitting a surface onto volumetric data delivers volume estimates at higher precision than the optical resolution of a microscope should permit. Indeed, whenever you have a model for an object – in my case the boundary of a red blood cell, in single-molecule super-resolution methods the point spread function of an emitter – it is possible to fit this model with a precision that is not (fully) constrained by diffraction but, under the right conditions, only by the signal-to-noise ratio, the analytical tools and the adequacy of the model for the object.
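The principle is easy to demonstrate with the textbook single-molecule case: fitting a Gaussian model to a noisy, diffraction-limited spot localizes its centre far more precisely than the width of the spot itself. The sketch below is purely illustrative (all numbers are hypothetical, and this is not the surface-fitting pipeline used in the paper):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

def spot(x, x0, amp, sigma, bg):
    """Gaussian approximation of a diffraction-limited spot on a background."""
    return amp * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2)) + bg

x = np.arange(-1000.0, 1001.0, 100.0)  # detector positions, nm
true_x0 = 37.0                          # hypothetical true position, nm

estimates = []
for _ in range(200):
    counts = rng.poisson(spot(x, true_x0, 500, 250, 10))  # photon shot noise
    popt, _ = curve_fit(spot, x, counts, p0=[0.0, counts.max(), 200.0, 1.0])
    estimates.append(popt[0])

# The scatter of the fitted centres is only a few nm, although the spot
# itself is ~250 nm wide: the model (the prior) buys the precision.
print(np.std(estimates))
```

The achievable precision scales roughly with the spot width divided by the square root of the number of detected photons, which is why signal-to-noise, rather than diffraction, sets the limit here.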

In this Biophysical Journal paper we focused on the biological application and, together with other published work, on the modelling of the homeostasis of infected red blood cells. Also to avoid criticisms from referees, probably legitimate ones, I decided not to mention the concept of super-resolution. As my research focus is on biochemical resolution and its use in understanding cellular decisions in cancer, I will not pursue this work any further, but I thought I would write down this little story.

While writing this brief story, I recalled my friend Alberto Diaspro often citing Toraldo di Francia on resolving power and information. I believe my work was far from being a breakthrough from an optical standpoint, but I wished to use it as a reminder of a fundamental issue that often gets forgotten in biomedical applications. The resolution at which we can observe a phenomenon, irrespective of the tools used, depends both on the qualities of the instrument and on the quality of the prior information we can use to interpret the data. Once technology permitted imaging single emitters in fluorescence microscopy, the prior of point-like sources could be used to analyse images so as to reveal the full information content of an image, carried by its photons.

In an experiment, information content is the most precious thing. Irrespective of the methodologies used, our protocols are designed to maximize signal-to-noise ratios and, thus, to maximize information content, precision and resolution. However, as trivial as these statements are, in the biomedical sciences we often do not follow through with the process of maximizing information content. Significant information can be provided by our a priori constraints and models. Moreover, a thorough understanding of the information theory related to a specific assay can provide levels of precision and resolution beyond what we at first assume possible. However, priors and information theory are far too often neglected. This happens out of necessity, as most people do not have training in and understanding of both the biological and the physical processes, and even those who might have to invest their limited resources carefully. I wish that in the future there will be more collaborative work between life scientists, physicists and mathematicians, aimed at better understanding how to extract maximum information from experiments in the biomedical areas.

So… was our volumetric imaging super-resolution? I am not sure I care to answer, but I wished to provoke some thoughts and make you think a little about the relevance of information theory in biomedical research.

Photon partitioning theorem and biochemical resolving power

Project outcome published in PLoS ONE in 2013.

  • Esposito A*, Popleteeva M, Venkitaraman AR, “Maximizing the biochemical resolving power in fluorescence microscopy”, PLOS ONE, 8(10):e77392

After my 2007 theoretical work on photon-economy and acquisition throughput, I occasionally worked on a more general framework attempting to falsify my hypothesis that multi-channel or multi-parametric imaging techniques can deliver better results than other simpler techniques.

My proposal to develop instrumentation to achieve spectrally and polarization resolved lifetime imaging (later defined as HDIM) was met with scepticism by many. The recurrent question was: if you struggle to do a double exponential fit with the small photon budget we have available in biological applications, how could you possibly dilute these photons over several channels and analyse them with more complex algorithms?

Here, there are a few fundamental misunderstandings. First, the analysis should not be carried out on each 'detection channel' independently; rather, the entire dataset should be used to exploit all the information at once. Second, the use of dispersive optics rather than filters permits the acquisition of a higher number of useful photons. Third, limitations in current technologies (e.g., speed or photon-collection efficiency) should not be an obstacle to the development of these techniques, because these are not conceptual flaws but simply technological obstacles that can be removed.

Although I have a lot of (unpublished) work describing the performance of multi-channel systems, I achieved a breakthrough only when I understood I had to focus my efforts on describing the general properties of the Fisher information content of fluorescence detection, rather than the Fisher information in a specific experiment. Fisher information is the information an experiment provides about an unknown we wish to estimate. Its inverse is the smallest variance attainable in that experiment, what is called the Cramér-Rao limit. In other words, by maximizing Fisher information, we maximize the precision of our experiments.
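To make the Cramér-Rao idea concrete with the simplest possible example (not taken from the paper): for n independent Poisson photon counts with mean λ, the Fisher information is n/λ, so no unbiased estimator can have a variance below λ/n, and the sample mean attains exactly this bound.

```python
import numpy as np

rng = np.random.default_rng(3)

lam, n = 100.0, 50   # hypothetical photon count rate and number of observations
crlb = lam / n       # Cramér-Rao bound = 1 / Fisher information = 2.0

# Monte Carlo: the variance of the sample mean of n Poisson counts
# attains the bound
est = rng.poisson(lam, size=(10_000, n)).mean(axis=1)
print(est.var(), crlb)  # both ~2.0
```

Maximizing Fisher information in an assay is therefore equivalent to pushing this floor on the estimator variance as low as physics allows.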

Photon-partitioning theorem

The second breakthrough was the understanding that the best description of precision in biophysical imaging techniques was possible only by defining the concept of biochemical resolving power, a generalization of the resolving power of a spectrograph to any measured photophysical parameter, and then applying it to biochemistry. The biochemical resolving power is proportional to the square root of the photon efficiency of a microscopy technique and of the number of detected photons. Maximization of Fisher information leads to the maximization of photon efficiency and, therefore, to net improvements in biochemical resolving power. This definition complements the definition of spatial resolution in microscopy and allows us to define when two objects are spatially and/or biochemically distinct. It is worth mentioning that this is equivalent to stating that two objects are spatially and photophysically distinct, but we use the photophysics of fluorophores to do biochemistry, hence my nomenclature. I see possible implications for other techniques, including super-resolution, and perhaps this will be the subject of future work.

The third breakthrough was the use of numerical computation of Fisher information rather than analytical solutions of equations, which are not always available. This process is very common in engineering but not in our field. Therefore, we can now optimize the properties of any detection scheme in order to attain the highest performance.
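As a flavour of what numerical Fisher information looks like in practice, here is a toy example, much simpler than anything in the paper and with hypothetical numbers: the per-photon Fisher information about a fluorescence lifetime τ when photons are histogrammed into k time bins, obtained by finite differences of the bin probabilities. The photon economy F = σ_τ·√N/τ, with σ_τ at the Cramér-Rao bound, then quantifies how close a detection scheme comes to the ideal F = 1.

```python
import numpy as np

def bin_probs(tau, window, k):
    """Probability of a photon from a mono-exponential decay with lifetime tau
    falling into each of k equal time bins within the detection window."""
    edges = np.linspace(0.0, window, k + 1)
    cdf = 1.0 - np.exp(-edges / tau)
    return np.diff(cdf) / cdf[-1]

def fisher_info(tau, window, k, h=1e-5):
    """Per-photon Fisher information about tau, via central differences
    of the bin probabilities: I = sum_i (dp_i/dtau)^2 / p_i."""
    p = bin_probs(tau, window, k)
    dp = (bin_probs(tau + h, window, k) - bin_probs(tau - h, window, k)) / (2 * h)
    return np.sum(dp ** 2 / p)

tau = 2.5  # ns, hypothetical lifetime; detection window of 10 lifetimes
for k in (2, 4, 16, 64):
    F = 1.0 / (tau * np.sqrt(fisher_info(tau, 10 * tau, k)))
    print(k, round(F, 3))  # F decreases towards the ideal value of 1
```

Refining the time binning monotonically increases the Fisher information, so F falls towards 1; the same numerical machinery can rank arbitrary detection schemes without any closed-form derivation.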

This is a very specialist piece of work and I assume not many people will be interested in it, although the implications of this piece of theory for everyone's experiments are significant. I believe this is my most elegant theoretical work, but I guess that is a matter of opinion. The paper itself had to be expanded well beyond what I wished to publish during the refereeing process, and it now includes examples, software, etc. I think the theoretical introduction and the mathematical demonstrations are the best part, and the description of the numerical optimization of Fisher information the most useful.

NOTE: there are two typographical errors in the published manuscript, within the definitions of photon economy and separability. These are described in a comment on PLOS ONE.