Occasionality or Probability - What is the Right Term?

The concept of probability comes from mathematics, where it has a rigorous definition. It is a measure of the distribution of outcomes of truly independent trials. Wikipedia describes it as follows:

When dealing with experiments that are random and well-defined in a purely theoretical setting (like tossing a fair coin), probabilities can be numerically described by the statistical number of outcomes considered favorable divided by the total number of all outcomes (tossing a fair coin twice will yield head-head with probability 1/4, because the four outcomes head-head, head-tails, tails-head and tails-tails are equally likely to occur).
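The classical definition in the quote can be checked by brute-force enumeration. The following is a minimal Python sketch. It assumes a fair coin, so all four two-toss sequences are equally likely.

```python
from fractions import Fraction
from itertools import product

# Enumerate every outcome of tossing a fair coin twice.
# "H" = heads, "T" = tails; all four sequences are assumed equally likely.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT

# Favorable outcomes: both tosses come up heads.
favorable = [o for o in outcomes if o == ("H", "H")]

# Classical probability = favorable outcomes / total outcomes.
p_head_head = Fraction(len(favorable), len(outcomes))
print(p_head_head)  # 1/4
```

The whole calculation rests on the phrase "equally likely". Enumeration only works because the experiment is well-defined enough to list every outcome and weight each one the same.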

Do you see the emphasis on ‘well-defined’ experiments and ‘truly independent’ trials? Consider coin tossing, for example. Based on our understanding of the physical laws governing a coin’s motion, we can say that the motion is well-defined and that each outcome is truly independent. The concept of probability is used extensively in theoretical physics (e.g. the wave function as a ‘probability amplitude’ in quantum mechanics, or the distribution of gas atoms, or of atoms on vibrating springs, in statistical physics to derive the concept of temperature). However, physicists have always had to back up their assumption of true independence with experimental confirmation. The probabilistic theory of quantum mechanics troubled even heavyweights like Einstein, but nobody has managed to come up with an experiment that proves it wrong.
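A simulation can sketch what ‘experimental confirmation’ of the independence assumption looks for. If the two tosses really are fair and independent, the observed frequency of head-head should settle near 1/4 as the number of trials grows. This is only an illustration. The simulation builds independence in by construction, because it uses two separate draws from Python’s `random.Random`. It therefore shows what the assumption predicts. It does not test a real coin.

```python
import random

def head_head_frequency(n_trials: int, seed: int = 0) -> float:
    """Fraction of trials in which two simulated fair-coin tosses
    both come up heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        first = rng.random() < 0.5   # True = heads
        second = rng.random() < 0.5  # drawn independently of `first`
        if first and second:
            hits += 1
    return hits / n_trials

# The frequency should drift toward 0.25 as the number of trials grows.
for n in (100, 10_000, 200_000):
    print(n, head_head_frequency(n))
```

With a physical experiment, a persistent gap between the observed frequency and 1/4 would suggest that the coin is biased or that the tosses are not independent.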

When it comes to genetics, we are increasingly seeing the term ‘probability’ used to describe experiments that are far from well-defined. This is not only confusing, but also harmful when applied in a medical context. Professor Ken Weiss wrote an excellent blog post arguing that the term ‘probability’ in genetics should be replaced with ‘occasionality’, so that people understand that they are dealing with a different beast altogether.

Occasionality: a more appropriate alternative concept–where there’s no oh!

When many factors contribute to some measured event, and these are either not all known or measured, or in non-repeatable combinations, or not all always present, so that each instance of the event is due to a unique context-dependent combination, we can say that it is an occasional result. In the usual parlance, the event occasionally happens and each time the conditions may or may not be similar. This is occasionality rather than probability, and there may not be any ‘o-value’ that we can assign to the event.

This is, in fact, what we see. Of course, regular processes occur all around us, and our event will involve some regular processes, just not in a way to which probability values can be assigned. That is, the occasionality of an event is not an invocation of mystic or immaterial causation. The word merely indicates that instances of the event are individually unique to an extent not appropriately measured, or not measured with knowable accuracy or approximation, by probabilistic statistical (or tractable deterministic) approaches. The assumption that the observations reflect an underlying repeated or repeatable process is inaccurate to an extent as to undermine the idea of estimation and prediction upon which statistical and probabilistic concepts are based. The extent of that inaccuracy is itself unknown or even unknowable.

There are clearly genetic causal events that are identifiable and, while imperfect because of measurement noise and other unmeasured factors, sufficiently repeatable for practical understanding in the usual way and even treated with standard probability concepts. Some variants in the CFTR gene and cystic fibrosis fall into that category. Enough is known of the function of the gene and hence of the causal mechanism of the known allele that screening or interventions need not take into account other contextual factors that may contribute to pathogenesis but in dismissible minor ways. But this seems to be the exception rather than the rule. Based on present knowledge, I would suggest that that rule is occasionality.

There is another problem that he does not mention. We discussed it in A Sequel to Heng Li’s Mysterious New Program, but it is worth repeating here.

In current bioinformatics, theoretical researchers are increasingly resorting to being ‘tool providers’, while experimentalists use those ‘benchmarked tools’ to analyze experimental data. That creates a distance between those who develop ‘software tools’ (note: not theoretical models) and those who interpret the data. If a tool generates a ‘p-value’, that p-value becomes scientific truth. Moreover, the science becomes diluted, because tools leave little room for discussion of fundamental principles. Try arguing with your computer program and see who wins :)
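The following example shows how a tool can hide its assumptions. It is a minimal sketch, not code from any particular bioinformatics package. It implements an exact one-sided binomial test from scratch with `math.comb` (Python 3.8+). The counts passed to it are hypothetical. The function returns a number for any inputs. Whether that number means anything depends on an independence assumption that the code never checks.

```python
from math import comb

def binomial_p_value(k: int, n: int, p: float) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p).

    Only meaningful if the n observations are independent and share the
    same probability p. Nothing in the arithmetic below verifies that.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: 15 carriers of a variant observed among 40 cases,
# against an assumed background carrier frequency of 0.2.
print(binomial_p_value(15, 40, 0.2))
```

If the 40 cases are related individuals, or if the ‘background frequency’ depends on context that differs from case to case, the printed value still looks just as authoritative.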
