**Is it a cat? Is it a dog? Is the average between a cat and a dog a real thing, perhaps a caog or a doat?**

Not all science should be based on single-cell detection, and there are plenty of cases where single-cell measurements are superfluous. However, too often we fail to appreciate the huge mistakes we can make in biology when we forget the assumptions underlying population measurements.

**But which assumptions do we really make?**

Often implicitly, when performing population measurements (*e.g.*, Western blots, sequencing, proteomics, etc.), we assume that the populations of cells we measure are homogeneous and synchronous, or at least that their differences are unimportant and can be averaged out. In the best cases, we try to enforce a degree of synchrony and homogeneity experimentally. In reality, one of the most important assumptions we implicitly make is that the system we analyse is an **ergodic system**. In physics and statistics, an ergodic system is one that, given sufficient time, explores all its accessible states. Consequently, if it is sampled sufficiently, averages over time on a single cell and averages over a population at a given time are the same. However, there are limits to this assumption in biology. The obvious example is the cell cycle. There is a significant literature on ergodicity and the cell cycle [e.g., 1, 2, 3] and on how this principle can be exploited, but…
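To make the definition concrete, here is a toy simulation, not taken from the cited works: hypothetical cells flip between two states (0 and 1) with a fixed probability per time step. When switching is allowed, the time average of one cell and the snapshot average of the population agree (ergodic); when cells are locked in their initial state, they do not. The function names and parameters (`simulate_cells`, `switch_prob`) are illustrative assumptions.

```python
import random


def simulate_cells(n_cells, n_steps, switch_prob, seed=0):
    """Simulate hypothetical cells flipping between state 0 and state 1.

    At every time step each cell switches state with probability switch_prob.
    Returns one state trajectory (a list of 0/1 values) per cell.
    """
    rng = random.Random(seed)
    states = [i % 2 for i in range(n_cells)]  # start with half the cells in each state
    trajectories = [[s] for s in states]
    for _ in range(n_steps - 1):
        for i in range(n_cells):
            if rng.random() < switch_prob:
                states[i] = 1 - states[i]
            trajectories[i].append(states[i])
    return trajectories


def time_average(trajectory):
    """Fraction of time a single cell spends in state 1."""
    return sum(trajectory) / len(trajectory)


def ensemble_average(trajectories, t):
    """Fraction of the population in state 1 at time step t (a snapshot)."""
    return sum(traj[t] for traj in trajectories) / len(trajectories)


erg = simulate_cells(n_cells=300, n_steps=2000, switch_prob=0.1)  # cells can switch
non = simulate_cells(n_cells=300, n_steps=2000, switch_prob=0.0)  # cells are locked
for name, trajs in (("switching", erg), ("locked", non)):
    print(f"{name}: single-cell time average = {time_average(trajs[0]):.2f}, "
          f"population snapshot = {ensemble_average(trajs, -1):.2f}")
```

In the locked case the snapshot still reports 0.5, a value that no individual cell ever exhibits over time.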

**The lottery for cell division makes you grow faster.**

There is a particular phenomenon that we encountered while working on this project [4] that fascinated me for its simplicity and its consequences. How can cells increase their fitness (*i.e.*, their growth rate)? One obvious answer is by dividing faster. Another answer, less obvious at first glance, is by exhibiting a heterogeneous cell cycle length. Let’s consider a population of cells that divides every 24 hours. Over one week (168 hours, or seven divisions), the population will grow 128-fold (2^7). Now, let’s consider cells that divide on average every 24 hours but whose cell cycle length varies randomly, following a normal distribution with a standard deviation of 4 hours. Cells with a 20-hour or a 28-hour cell cycle are equally likely to occur. However, in one week, cells with a 28-hour cell cycle will grow 64-fold (2^6), whereas cells with a 20-hour cell cycle will grow about 340-fold (2^8.4). On average, these cells will grow ~200-fold, much more than cells dividing precisely every 24 hours (128-fold). The same holds for any pair of cell cycle lengths symmetric about the average, because the fold change 2^(168/T) is a convex function of the cell cycle length T: the faster partner gains more than the slower one loses. Since such pairs are equally probable, cells with a given average cell cycle length grow faster as heterogeneity increases. Note that this can occur not only in the presence of genetic differences but also with purely stochastic variation, where the progeny of a cell do not keep its cell cycle length but keep changing randomly according to an underlying distribution. This phenomenon has been observed experimentally with single-cell measurements, for instance in yeast [5], but it occurs in any cellular system, as described in [1] and in our own work [4]. Population measurements might conceal these very important phenotypic or mechanistic differences.
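The arithmetic above can be checked with a minimal sketch under two stated assumptions. First, growth is treated as continuous, so a lineage dividing every T hours grows by 2^(168/T) in a week (this is how the 20-hour case gets a fractional 8.4 divisions). Second, for the non-heritable case the sketch swaps in the Euler–Lotka equation, 2·E[exp(−λT)] = 1, for a population in which every newborn cell draws an independent, normally distributed cycle length; this standard age-structured model is an assumption here, not the method of [4]. The function names are hypothetical.

```python
import math

WEEK_H = 168.0  # one week, in hours


def fold_change(cycle_h, hours=WEEK_H):
    """Fold change of a lineage dividing every cycle_h hours (continuous approximation)."""
    return 2.0 ** (hours / cycle_h)


def euler_lotka_rate(mean_h, sd_h):
    """Growth rate (per hour) when each newborn draws an independent cycle length.

    Solves 2 * E[exp(-lam * T)] = 1 for T ~ Normal(mean_h, sd_h**2), which reduces to
    lam * mean_h - lam**2 * sd_h**2 / 2 = ln 2; the smaller root is the physical one.
    (The normal model allows negative cycle lengths, negligible for these values.)
    """
    if sd_h == 0:
        return math.log(2) / mean_h
    disc = mean_h ** 2 - 2 * sd_h ** 2 * math.log(2)
    return (mean_h - math.sqrt(disc)) / sd_h ** 2


# Heritable case: each lineage keeps its own cycle length.
pair_mean = (fold_change(20.0) + fold_change(28.0)) / 2
print(f"24 h: {fold_change(24.0):.0f}x, 20 h: {fold_change(20.0):.0f}x, "
      f"28 h: {fold_change(28.0):.0f}x, 20/28 h pair mean: {pair_mean:.0f}x")

# Non-heritable case: cycle lengths are redrawn at every division.
for sd in (0.0, 4.0, 8.0):
    lam = euler_lotka_rate(24.0, sd)
    print(f"independent draws, sd = {sd:.0f} h: {math.exp(lam * WEEK_H):.0f}x per week")
```

Under these assumptions, the heritable pair reproduces the ~200-fold figure, while independent draws with a 4-hour standard deviation give a smaller but still positive gain (roughly 134-fold instead of 128-fold) that increases with the standard deviation.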

**A mixture of two normal distributions is not another normal distribution.**

The beauty of the normal distribution is that it is such a ‘well-behaved’ distribution and, at the same time, it describes many physical and biological phenomena. If a population we are characterizing is a mixture of two normally distributed subpopulations, its mean is the weighted average of the two means. If the two subpopulations have the same mean, the variance of the mixture is the weighted average of their variances. These basic and useful mathematical relationships can also be rather misleading. While these statements are mathematically correct, two populations of cells that ‘behave rather differently’, for instance in response to a drug, cannot be meaningfully averaged. For example, one cell population might be killed at a given concentration of a drug, while another might be resistant. If the two are present in equal proportions, a population assay detects 50% cell death, and we could incorrectly assume that dosing at higher concentrations would kill more cells.
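The drug example can be sketched with a simple dose–response model. The Hill-curve form and every number below (a 50:50 mixture, EC50 values of 1 µM for sensitive cells and 10^6 µM for resistant ones) are hypothetical choices for illustration, not measured values. The point is that the mixture plateaus at ~50% kill, whereas a single-population reading of "50% kill at 100 µM" predicts that a tenfold higher dose would kill ~90% of the cells.

```python
def hill_kill(dose, ec50, hill=1.0):
    """Fraction of cells killed at a given dose (Hill curve with 100% maximal kill)."""
    return dose ** hill / (dose ** hill + ec50 ** hill)


def mixed_kill(dose, frac_sensitive, ec50_sensitive, ec50_resistant):
    """Population-level kill fraction for a mixture of two subpopulations."""
    return (frac_sensitive * hill_kill(dose, ec50_sensitive)
            + (1 - frac_sensitive) * hill_kill(dose, ec50_resistant))


# Hypothetical mixture: 50% sensitive (EC50 = 1 uM), 50% resistant (EC50 = 1e6 uM).
for dose in (1, 10, 100, 1000):
    print(f"{dose:>5} uM -> {mixed_kill(dose, 0.5, 1.0, 1e6):.0%} killed")

# Naive reading: treat the 50% kill at 100 uM as the EC50 of one homogeneous population.
naive = hill_kill(1000, ec50=100)
actual = mixed_kill(1000, 0.5, 1.0, 1e6)
print(f"at 1000 uM: naive prediction {naive:.0%}, mixture {actual:.0%}")
```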

The plot shown below illustrates this basic principle. The blue and red distributions, averaged together, have the same mean and variance as the yellow distribution, but they represent very different systems. If the blue distribution represents the sizes of cats and the red distribution the sizes of dogs, the yellow distribution does not represent the size distribution of any real animal. In other words, the average phenotype is not a real phenotype; in the best-case scenario, when one subpopulation dominates, it approximates the most frequent phenotype (the mode). In all other cases, unless the homogeneity of the phenotype has been checked, the average phenotype might simply be wrong.
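A numerical version of the cat/dog argument, using invented body masses (cats ~ N(4, 1) kg, dogs ~ N(20, 4²) kg, mixed 50:50; these numbers are assumptions, not data). The sketch builds the single Gaussian with the same mean and variance as the mixture (the "caog") using the standard moment formulas for a two-component mixture, then compares how many animals each description places near the average. Python's `statistics.NormalDist` provides the normal CDFs.

```python
from statistics import NormalDist

# Hypothetical body masses in kg (illustrative numbers only).
cats = NormalDist(mu=4.0, sigma=1.0)
dogs = NormalDist(mu=20.0, sigma=4.0)
p = 0.5  # fraction of cats in the mixed population

# Mean and variance of the two-component mixture.
mix_mean = p * cats.mean + (1 - p) * dogs.mean
mix_var = (p * cats.variance + (1 - p) * dogs.variance
           + p * (1 - p) * (cats.mean - dogs.mean) ** 2)
caog = NormalDist(mix_mean, mix_var ** 0.5)  # one Gaussian with the same mean and variance


def fraction_in(cdf, lo, hi):
    """Probability mass between lo and hi for a given CDF."""
    return cdf(hi) - cdf(lo)


lo, hi = mix_mean - 1.0, mix_mean + 1.0
real = p * fraction_in(cats.cdf, lo, hi) + (1 - p) * fraction_in(dogs.cdf, lo, hi)
assumed = fraction_in(caog.cdf, lo, hi)
print(f"mixture mean {mix_mean:.1f} kg, SD {caog.stdev:.1f} kg")
print(f"animals within 1 kg of the mean: real mixture {real:.1%}, single Gaussian {assumed:.1%}")
```

With these numbers, the matched Gaussian places several times more animals around the 12 kg average than the actual mixture does.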

This is a very simple illustration of a problem we frequently encounter in biology: trusting our population measurements (averages and standard deviations over experimental repeats) without being sure of the distributions underlying them. In the figure above, the purple distribution is centred on the correct average of the blue and red distributions, but its width reflects the statistical error of the assay and is unrelated to the biological scatter of the phenomenon we are measuring. Sometimes we cannot address this problem experimentally because of technological limitations, but it is very important at least to be aware of these issues.
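The difference between the purple distribution and the biological scatter can be sketched by simulating a hypothetical bulk assay: each repeat averages 10,000 cells drawn from the same invented cat/dog-like mixture and adds some instrument noise (`noise_sd`, an assumed value). The spread over repeats reflects only sampling and instrument error, while the cell-to-cell spread is an order of magnitude larger.

```python
import random
from statistics import fmean, stdev

rng = random.Random(1)


def sample_cell(p_a=0.5):
    """Draw one value from a hypothetical two-population mixture (illustrative numbers)."""
    if rng.random() < p_a:
        return rng.gauss(4.0, 1.0)
    return rng.gauss(20.0, 4.0)


def bulk_measurement(n_cells, noise_sd=0.5):
    """A population assay: the average over n_cells plus hypothetical instrument noise."""
    return fmean(sample_cell() for _ in range(n_cells)) + rng.gauss(0.0, noise_sd)


repeats = [bulk_measurement(n_cells=10_000) for _ in range(6)]
cells = [sample_cell() for _ in range(10_000)]
print(f"bulk assay: {fmean(repeats):.2f} +/- {stdev(repeats):.2f} (SD over 6 repeats)")
print(f"single cells: mean {fmean(cells):.2f}, SD {stdev(cells):.2f}")
```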

Just for the most curious: for two Gaussian distributions with relative weights A and B, we can define a mixing parameter $p = A/(A+B)$. The mean of the mixed population is simply $\mu_P = p\,\mu_A + (1-p)\,\mu_B$, *i.e.*, for $p = 0.5$ it is the average of the two means. The apparent variance is $\sigma_P^2 = p\,\sigma_A^2 + (1-p)\,\sigma_B^2 + p(1-p)\,(\mu_A - \mu_B)^2$, *i.e.*, $\sigma_P^2$ is the weighted average of the two variances plus the squared separation of the two means, weighted by the product of the two mixing weights, $p(1-p)$.
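The two mixture formulas can be checked numerically with a short Monte Carlo sketch: draw many samples from a two-component Gaussian mixture (weights and parameters chosen arbitrarily for illustration) and compare the sample mean and variance with the closed forms. The last check also shows that, for equal means, the mixture variance is the weighted average of the variances (2.5 for variances 1 and 4 at p = 0.5), not their sum.

```python
import random


def mixture_moments(p, mu_a, sd_a, mu_b, sd_b):
    """Closed-form mean and variance of a two-Gaussian mixture with weight p on A."""
    mean_p = p * mu_a + (1 - p) * mu_b
    var_p = p * sd_a ** 2 + (1 - p) * sd_b ** 2 + p * (1 - p) * (mu_a - mu_b) ** 2
    return mean_p, var_p


def sample_mixture(n, p, mu_a, sd_a, mu_b, sd_b, seed=0):
    """Draw n samples: component A with probability p, otherwise component B."""
    rng = random.Random(seed)
    return [rng.gauss(mu_a, sd_a) if rng.random() < p else rng.gauss(mu_b, sd_b)
            for _ in range(n)]


params = dict(p=0.3, mu_a=4.0, sd_a=1.0, mu_b=20.0, sd_b=4.0)  # arbitrary example values
xs = sample_mixture(100_000, **params)
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)  # population variance of the samples
print("closed form:", mixture_moments(**params))
print(f"sampled:     ({m:.3f}, {v:.3f})")
print("equal means, variances 1 and 4, p = 0.5:", mixture_moments(0.5, 0.0, 1.0, 0.0, 2.0))
```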

**Collective behaviour of cells is not an average behaviour, quite the contrary.**

When discussing these issues, I am often confronted with the statement that ultimately we do not care about the behaviour of individual cells but about the collective behaviour of groups of cells. Two points are worth discussing. First, arguing for the importance of single-cell measurements is not the same as arguing for the study of individual cells in isolation. Quite the contrary, we should measure individual cells in model systems as close as possible to the physiological state. However, many assays are incompatible with the study of cell behaviour within humans, and we resort to a number of model systems: individual cells separated from each other, 2D and 3D cultures, *ex vivo* and *in vivo* assays. The two arguments (single-cell measurements versus measurements in more physiological models of tissues or organisms) are distinct.

Second, collective behaviours are not ‘average behaviours’. There are great examples in the literature, but I would suggest simply visiting the websites of two laboratories that I personally admire, John Albeck’s laboratory at UC Davis and Kazuhiro Aoki’s laboratory at NIBB, which illustrate this point nicely and visually. Collective behaviours emerge from the interaction of cells in space and time, as illustrated by waves of signalling or metabolic activity caused by cell-to-cell communication in response to stimuli. The complex behaviours that interacting cells exhibit, even in 2D cultures, become understandable when single cells and their biochemistry are visualized individually. Once again, phenotypes or their mechanisms might be concealed or misinterpreted by population or snapshot measurements.

**This is, of course, not always the case. However, my advice is at least to keep in mind the assumptions we make when we perform an ensemble or a snapshot measurement and, whenever possible, to check that they are valid.**