Statistics tell us that the probability of throwing a 2 with a regular die is one in six, something we can easily verify by throwing the die repeatedly to see whether our prediction matches reality.

We can also predict that rolling one die ten times would give us the same result, on average, as rolling ten dice just once. Again, we can verify our predictions using a real-world experiment.
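As a rough stand-in for the physical experiment, here is a minimal simulation sketch. The function names, the seed and the number of rolls are my own choices for illustration, and a pseudo-random generator is only an approximation of a real die.

```python
import random


def roll_one_die_many_times(n_rolls, rng):
    """Roll a single fair six-sided die n_rolls times."""
    return [rng.randint(1, 6) for _ in range(n_rolls)]


def roll_many_dice_once(n_dice, rng):
    """Roll n_dice fair six-sided dice once each."""
    return [rng.randint(1, 6) for _ in range(n_dice)]


def share_of(face, rolls):
    """Fraction of rolls that landed on the given face."""
    return rolls.count(face) / len(rolls)


rng = random.Random(0)  # fixed seed (an assumption) so reruns give the same numbers
N = 60_000              # arbitrary; large enough for the shares to settle down

one_die_repeated = roll_one_die_many_times(N, rng)
many_dice_once = roll_many_dice_once(N, rng)

share_of_twos_repeated = share_of(2, one_die_repeated)
share_of_twos_at_once = share_of(2, many_dice_once)

print("one die, many throws:", round(share_of_twos_repeated, 4))
print("many dice, one throw:", round(share_of_twos_at_once, 4))
print("predicted:           ", round(1 / 6, 4))
```

Notice that the two rolling functions are identical line for line. That is the dice assumption in code form: one fair die thrown repeatedly and many fair dice thrown once are treated as interchangeable.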

Statistics can also help us know if dice are not identical. If we take 10 dice and split them into two groups of five and then treat each group differently in some way, we can perform a physical experiment to test our assumption that the treatment had an effect compared to the control. Statistics can tell us how many times we need to throw the dice in order to reject the null hypothesis – that the treatment had no effect.
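To make the "how many throws" idea concrete, here is a sketch using a common normal-approximation formula for comparing two proportions. The function name, the 5% significance level, the 80% power and the example probabilities are all assumptions picked for illustration; a real trial would choose them deliberately.

```python
from math import ceil
from statistics import NormalDist


def throws_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate throws needed per group to tell p_control from p_treated.

    Normal-approximation formula for a two-sided comparison of two
    proportions. p_control and p_treated must differ.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_treated - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


fair_chance_of_two = 1 / 6

# Hypothetical treated dice that show a 2 a quarter of the time, or half the time.
small_shift = throws_per_group(fair_chance_of_two, 0.25)
large_shift = throws_per_group(fair_chance_of_two, 0.50)

print("throws per group, small effect:", small_shift)
print("throws per group, large effect:", large_shift)
```

The smaller the effect of the treatment, the more throws are needed before the difference can be separated from chance.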

While statistics allow us to measure probability using inanimate objects like dice or playing cards, I would like to propose that their use in human or biological experiments is more problematic.

Imagine we want to know if a certain diet causes weight loss. As we know, if we put two people on the same diet, one loses weight while the other becomes depressed and neurotic. Enter the randomised controlled trial.

In our imaginary trial, Pete and Dave end up in our control group and, for the purposes of this experiment, we will presume their placebo diet had no effect on their weight. Fred and George are in our treatment arm and we measure their weight on Sunday night at 8 pm and again one week later. Let’s say the most Fred and George could gain or lose was 5 kg each way, and we will measure their change to the closest kg. After one week, Fred lost 2 kg and George gained 2 kg. Our experiment would conclude that the diet had no effect, as the average result was the same as the placebo group, and we therefore fail to reject the null hypothesis.

If Fred and George were dice, statistical theory would assume that the next time they did the experiment, each would have an equal chance of every result, from -5 to +5.

But Fred and George are human beings. They are not identical.

What if we did the experiment again, keeping the initial randomisation and got the same result? What if they have physiologic differences that make them respond differently to that diet? Those factors may not be known.

What if we ran the same experiment again, just like we do to test the outcome of dice? In fact, let’s give them a week off (they come back to their baseline weight) and start again. The next week Fred loses 1 kg and George gains 1 kg. We keep doing this experiment for 50 repetitions. What we might find is that every time we run the experiment, Fred loses weight and George gains weight, in approximately equal proportions.

The conclusion, based on normal statistical analysis, would be that the diet has no effect on weight loss, but this is not a true result, because the diet worked for Fred and didn’t work for George. Does the fact that their individual results were negated by averaging invalidate the value of the diet for Fred?
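The thought experiment can be sketched in a few lines. Everything numeric here is invented for illustration: I assume Fred's typical response is -2 kg and George's is +2 kg, with up to 1 kg of week-to-week noise, measured to the closest kg. No real data is involved.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed (an assumption) so reruns match
REPETITIONS = 50


def weekly_change(typical_change_kg, rng):
    """One week's measured change: a fixed personal response plus small noise."""
    return typical_change_kg + rng.choice([-1, 0, 1])


# Hypothetical built-in responses to the same diet.
fred = [weekly_change(-2, rng) for _ in range(REPETITIONS)]
george = [weekly_change(+2, rng) for _ in range(REPETITIONS)]

fred_mean = mean(fred)
george_mean = mean(george)
group_mean = mean(fred + george)

print("Fred, average change:  ", round(fred_mean, 2), "kg")
print("George, average change:", round(george_mean, 2), "kg")
print("Treatment arm average: ", round(group_mean, 2), "kg")
```

Under these assumptions Fred loses weight in every single repetition and George gains in every single repetition, yet the average for the treatment arm sits close to zero.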

Of course, the very basis of randomised controlled trials is that any confounding factors that could affect the result will be equally distributed between the treatment and control groups. Supposedly, the only variable is then the intervention. Nevertheless, each group’s results are averaged and the statistics measure the area under the bell curve of each group to see if there is a statistically significant difference in the means and standard deviations. But the statistics that measure this difference are based on the dice assumption: that each time the die is rolled, there is an equal and random chance of rolling each number. While this assumption fits the politically correct narrative that all people are equal, is it true?

Whether running the same randomised controlled trial on the same people multiple times with the same randomisation would give you the same result has never been tested. What if the outcome of every intervention is predetermined for every individual based on their genetics, physiology and environment? It would be as if, when using 10 dice, each was weighted to favour a different number. The statistics we currently use don’t account for that possibility. Additionally, what if the average result actually suited nobody?
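Here is a sketch of the weighted-dice picture. To let each die favour a different face I use six dice rather than ten, and the strength of the weighting (the favoured face is five times as likely as any other) is an arbitrary assumption of mine.

```python
import random
from collections import Counter

rng = random.Random(2)  # fixed seed (an assumption) so reruns match
FACES = [1, 2, 3, 4, 5, 6]
ROLLS_PER_DIE = 6000
FAVOURED_WEIGHT = 5     # hypothetical: favoured face is 5x as likely as each other face


def roll_loaded_die(favoured_face, n_rolls, rng):
    """Roll a die weighted towards favoured_face n_rolls times."""
    weights = [FAVOURED_WEIGHT if face == favoured_face else 1 for face in FACES]
    return rng.choices(FACES, weights=weights, k=n_rolls)


# One loaded die per face, each favouring a different number.
rolls_by_die = {face: roll_loaded_die(face, ROLLS_PER_DIE, rng) for face in FACES}

# Pool every roll together, as if all the dice were interchangeable.
pooled = [roll for rolls in rolls_by_die.values() for roll in rolls]
pooled_share = {face: count / len(pooled)
                for face, count in sorted(Counter(pooled).items())}

# How often each die lands on its own favoured face.
own_face_share = {face: rolls.count(face) / len(rolls)
                  for face, rolls in rolls_by_die.items()}

print("pooled share per face:  ", {f: round(s, 3) for f, s in pooled_share.items()})
print("each die, favoured face:", {f: round(s, 3) for f, s in own_face_share.items()})
```

In this setup the pooled rolls look much like those of fair dice, with every face turning up about one time in six, while each individual die lands on its favoured face about half the time.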

In his recent book “The End of Average”, Todd Rose tells the story of the US Air Force’s attempt in the 1950s to build a cockpit that would suit the average pilot. Using detailed measurements of 4,063 pilots, Gilbert S. Daniels crunched the numbers to find out how many of those pilots were “average”, even allowing for a 30% variation. His answer? None of them. Zilch.

If we can’t get something as tangible as physical dimensions right using an average, how are the results of average responses to a treatment supposed to give us anything meaningful in terms of treatment success?

What if results cannot be generalised? What if results are unique to an individual and can only be determined by trial and error? In fact, that’s what we would expect if statistics were an unreliable guide to outcomes, and it is what actually happens in the real world. When a doctor gives you a drug and you react badly or it doesn’t work, you go back and get a different drug: trial and error.

If science is ever to give us predictable outcomes in medicine, it has to stop treating people as if they are dice, and it has to stop applying average outcomes to everyone. I understand the motivations of institutions that benefit from dealing only with averages, but for clinicians, if patient-centered care is ever to mean anything beyond giving patients choices they don’t really understand, science must account for physiologic diversity. If randomised controlled trials result in a trial and error approach, is their utility over-estimated?