gms | German Medical Science

GMS Zeitschrift für Hebammenwissenschaft

Deutsche Gesellschaft für Hebammenwissenschaft e.V. (DGHWi)

ISSN 2366-5076

A new method supporting decision-making in case of unclear scientific evidence: Test implementation and simulation

Research article


  • Christine Loytved (corresponding author) - Institute of Midwifery, Health Department, School of Health Professions, Winterthur, Switzerland
  • Rebecca Erdin - Institute of Midwifery, Health Department, School of Health Professions, Winterthur, Switzerland

GMS Z Hebammenwiss 2018;5:Doc01

doi: 10.3205/zhwi000011, urn:nbn:de:0183-zhwi0000111

This is the English version of the article.
The German version can be found at:

Received: September 20, 2017
Accepted: April 16, 2018
Published: December 28, 2018

© 2018 Loytved et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at


Background: Many health workers are aware of the results of reviews such as those of the Cochrane Collaboration. At the current stage of research, some of these reviews show no advantage for either of two promising interventions. For such cases, Beck-Bornholdt and Dubben propose a modified never-change-a-winning-team algorithm. Similar algorithms are used to assign participants to study groups or to adjust the design of a study in progress.

Research question: Is the method proposed by Beck-Bornholdt and Dubben in 2003 helpful for the daily work of midwives when they have to choose between two interventions with similar evidence of success?

Methodology: The application of the algorithm by health workers is simulated. The algorithm draws on all experience gathered so far with both interventions to decide which intervention the next person receives.

Results: Simulations were carried out for various scenarios with different probabilities of success for the two interventions. In all scenarios, the average success rate exceeds the mean of the two interventions' success rates from the second person treated onwards.

Conclusions: The results can serve as a basis for discussing the applicability of the suggested method. If the evidence is unclear, the algorithm can support health workers in deciding between two possible treatments, with a positive effect. The specific conditions of the setting in question (clientele, how the treatment is carried out) are taken into account in each case.

Keywords: algorithm, intervention, midwifery research


In midwifery, evidence-based practice has increasingly come to the fore. Midwives themselves insist that its three pillars (scientific evidence, the midwife’s experience and the client’s preferences) be taken into account, and this is also expected by the other parties involved, whether other healthcare professionals or clients. On some topics, however, such as heartburn during pregnancy, the scientific evidence does not clearly favour a single intervention, which can consist of either action or non-action. For some topic areas, evidence-based guidelines such as those published by the National Institute for Health and Care Excellence (NICE) or systematic reviews such as those by the Cochrane Collaboration make recommendations supporting different interventions in a specific situation. This applies particularly where, based on the current state of research, none of the interventions tested shows an apparent advantage. In the case of heartburn during pregnancy, for instance, it remains unclear whether acupuncture alleviates the condition or whether clients should be advised to adjust their eating habits [8]. Especially for issues that involve less invasive measures rather than a major intervention such as administering oxytocin to induce labour, the state of research is also unlikely to improve in the near future. In cases like this, where reviews or guidelines conclude that either method could lead to a successful outcome, midwives need a tool to support their day-to-day decisions between two interventions. Given that the available studies only support decision-making to a limited extent in these instances, it is advisable for midwives to base their decisions on internal evidence, that is, on their own practical experience to date.
In other words, the successes and failures experienced in connection with an intervention should be effectively incorporated into future decision-making.

Beck-Bornholdt and Dubben [2] propose a potential solution in the form of a modified never-change-a-winning-team algorithm. With this algorithm, the intervention for the next client presenting with a relevant diagnosis is chosen according to a specific formula that, in each case, draws on all experience gathered so far with both intervention options. This idea, also referred to as the “play-the-winner rule”, draws on the works of Bayes [1] and Wold [11], although similar approaches had been described previously [12]. Bayes’ theorem has been proposed for the following applications:

  • to create a rule identifying the interim results that would make it ethically necessary to stop a randomised controlled trial,
  • to determine the appropriate drug dosage for a randomised controlled trial and
  • to interpret evidence from trials conducted to date [10].

Similar methods have been discussed for assigning trial participants to groups [6], for adjusting the design of a trial in progress [4] and for evaluations in the healthcare sector [10]. Beck-Bornholdt and Dubben [2] use Bayes’ theorem, drawing on the successes and failures observed to date, to help individuals decide between two interventions considered similarly promising. To date, this method has not been used as a decision-making tool in the day-to-day practice of healthcare professionals. Applying the algorithm of the modified “play-the-winner rule” should be as user-friendly as possible. One plausible option would be an app in which the user only has to enter the interventions used and whether or not they were successful. The algorithm should not be applied rigidly, however, and it must in no way restrict the client’s right to self-determination. It should serve only as an instrument that helps the midwife build up a pool of experience more effectively and rapidly in some areas. This experience would otherwise still be acquired, but not systematically processed.

Objective and research question

If systematic reviews and evidence-based guidelines lead to the conclusion that two interventions are equally promising, midwives (and other healthcare professionals) should have access to a decision-making tool based on current scientific knowledge so that they can provide their clients with the best possible care. This should not infringe the client’s right to self-determination, however, and the client’s individual circumstances should also be taken into account. If, after weighing up all the relevant factors, no clear-cut decision can be made in favour of one intervention and against the other, the midwife should receive support. The purpose of the tool described here is to help the midwife decide which recommendation to make in the process of informed shared decision-making. If her recommendation is implemented, the midwife acquires further experience, which in turn helps her to make future recommendations.

This study therefore seeks to address the following question: Is the method proposed by Beck-Bornholdt and Dubben [2] a helpful decision-making tool for the day-to-day work of midwives when they are faced with a choice between two interventions shown in the literature as having similar evidence of success?


To facilitate the decision as to which intervention should be used, the modified “play-the-winner rule” algorithm is applied as follows:

  1. For the first client, one of the two interventions is chosen at random (e.g. by tossing a coin) as the starting intervention. This is termed IS.
  2. As long as IS is successful, it is repeated for the next client.
  3. When IS fails for the first time, the midwife switches to the other intervention. This is termed IA.
  4. IA is then repeated for each subsequent client as long as it remains successful.
  5. When IA fails for the first time, the probability of selection (α) is calculated for each of the two interventions (termed αS and αA) for every subsequent client.

Equation 1


Equation 2

where w is the number of switches between the two methods and the success rate of the individual interventions is calculated using the following formula:

Equation 3a Equation 3b

Using a random value z between 0 and 1 and one of the two probabilities αS and αA, the intervention for the current client is then determined. We outline the approach based on αS, but αA could equally be used, because αA = 1 − αS (equivalently, αS = 1 − αA) always holds:

if z ≤ αS → select IS for the next client

if z > αS → select IA for the next client

In words: the algorithm first gathers initial experience with both interventions. As soon as information is available on the probability of success of each method, this growing body of information is incorporated into each subsequent decision. The more successful an intervention appears to be compared with the other (based on the experience acquired), the more frequently it is selected, and this preference is reinforced as experience with both methods (the number of switches) grows. However, because a random value is used to select the intervention, the intervention that has so far been less successful is also occasionally given another chance. This is important because the intervention that is actually more successful may, by chance, appear less successful during the initial experiences. As both interventions continue to be applied, the success rates observed over the time series converge on the true success rates.
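The five steps and the selection rule can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Equations 1 to 3b are not reproduced in this version of the text, so the hypothetical function `hypothetical_alpha_s` uses a stand-in, αS = pS^w / (pS^w + pA^w), chosen only because it behaves as the text describes: the apparently more successful intervention is favoured, this preference strengthens as w grows, and αS + αA = 1. As a further assumption, the success rates pS and pA are Laplace-smoothed, (successes + 1) / (applications + 2), so that neither selection probability reaches exactly 0 and the less successful intervention keeps an occasional chance. w is read as the number of times consecutive clients received different interventions. IS and IA appear as `i_s` and `i_a`; all function names are illustrative.

```python
import math
import random


def smoothed_rate(history, intervention):
    # ASSUMPTION: Laplace-smoothed success rate (successes + 1) / (applications + 2);
    # Equations 3a/3b in the article may define the rate differently.
    outcomes = [success for used, success in history if used == intervention]
    return (sum(outcomes) + 1) / (len(outcomes) + 2)


def count_switches(history):
    # Our reading of w: how often consecutive clients received different interventions.
    return sum(1 for prev, cur in zip(history, history[1:]) if prev[0] != cur[0])


def hypothetical_alpha_s(p_s, p_a, w):
    # HYPOTHETICAL stand-in for Equations 1 and 2: p_s**w / (p_s**w + p_a**w),
    # evaluated as a logistic function of the log-ratio to avoid overflow/underflow.
    x = w * (math.log(p_s) - math.log(p_a))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def choose_next(history, interventions=("quark", "cabbage"), rng=random):
    """Pick the intervention for the next client.

    history: list of (intervention, success) pairs in the order clients were treated.
    """
    if not history:
        return rng.choice(interventions)            # step 1: coin toss defines I_S
    i_s = history[0][0]
    i_a = interventions[1] if i_s == interventions[0] else interventions[0]
    failures = sum(1 for _, success in history if not success)
    if failures == 0:
        return i_s                                  # step 2: repeat I_S while it succeeds
    if failures == 1:
        return i_a                                  # steps 3-4: switch to I_A, repeat while it succeeds
    # step 5: probability of selecting I_S, then compare with a random value z in [0, 1)
    alpha_s = hypothetical_alpha_s(
        smoothed_rate(history, i_s), smoothed_rate(history, i_a), count_switches(history)
    )
    z = rng.random()
    return i_s if z <= alpha_s else i_a
```

A caller would append `(chosen_intervention, success)` to `history` after each client. Steps 1 to 4 are deterministic apart from the initial coin toss; from step 5 onwards each call draws a fresh z.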

For the application of this algorithm to be useful, the following conditions must be met.

  • It is unclear which of the two interventions should be favoured in the given situation; there is no scientific evidence that one intervention might be more suitable than the other.
  • In the midwife’s assessment, all clients have the same starting situation, meaning that either of the selected interventions would be suitable for them. They generally present with the same predefined symptoms.
  • The success/failure of the intervention must be clearly identifiable and always determined according to the same criteria, predefined by the midwife. Here, subjective factors both from the side of the client and the midwife play a role. However, these are part of the methodology since the question we are seeking to address is: Which is the most promising intervention for my daily work?
  • The clients come one after the other and the success/failure of the intervention for the preceding client is already known by the time the next client is treated.

Obviously the two interventions to be compared can also comprise intervention versus non-intervention (for instance, in the case of heartburn during pregnancy: acupuncture versus no treatment), or the same intervention with different dosages (for example, different advice on coffee consumption during pregnancy [5]).

For the simulation in the present study, we use the decision between applying quark (a form of curd cheese) and cabbage in the case of excessive initial breast engorgement during the lactation period as an example scenario. We assume that the diagnosis is correct. According to the relevant guideline [3], the consensus is:

‘Based on many years of experience from midwifery practice, the conclusion is that cold compresses in the form of cold packs, cabbage leaves or quark as well as deep tissue massage can be applied for symptomatic treatment.’

A Cochrane Review [7] also refers to different treatments for breast engorgement, including different cabbage leaf applications: trials looking at cabbage leaves showed no difference between room-temperature and chilled cabbage leaves, between chilled cabbage leaves and gel packs, or between cabbage cream and an inactive cream. However, all forms of treatment provided some relief. No trial looked at the application of quark.

To be able to simulate how the algorithm works, we assume that the true success rates of quark and cabbage are known and arbitrarily define quark as the more successful of the two methods. For our simulation example, success is defined as complete elimination of redness and swelling within 24 hours of treatment. We tested six different scenarios, each with a more successful (quark) and a less successful (cabbage) intervention. We selected success rates at different levels (in the lower, mid and upper percentage ranges) (see Table 1 [Tab. 1]), and at each level the difference between the success rates of the two interventions was set once at 5 percentage points and once at 17 percentage points. The success rates used have nothing to do with the true success rates of quark and cabbage in excessive initial breast engorgement during the lactation period, which, to our knowledge, are not yet known; they were chosen purely to illustrate the algorithm.

For each of the six simulation scenarios with the predetermined success rates, 10,000 series of 300 consecutive clients each were simulated. For each client, treatment success was determined by a random value (“a roll of the dice”), with the probability of success equal to the predetermined success rate of the intervention selected.

All simulations and analyses in this study were conducted using the statistical software R [9].
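The simulation design can be sketched as follows. This is not the authors' R code but a hedged Python illustration. Because Equations 1 to 3b are not reproduced in this version of the text, it uses a hypothetical stand-in for the selection probability, αS = pS^w / (pS^w + pA^w), with Laplace-smoothed success rates (successes + 1) / (applications + 2) and w read as the number of changes of intervention between consecutive clients. It keeps running counts instead of the full history so that 10,000 series of 300 clients remain feasible. Only the mid-range scenario (0.57 for quark, 0.40 for cabbage) is taken from the text; `simulate_series` and `mean_success_by_client` are illustrative names.

```python
import math
import random


def hypothetical_alpha_s(p_s, p_a, w):
    # HYPOTHETICAL stand-in for Equations 1 and 2: p_s**w / (p_s**w + p_a**w),
    # evaluated as a logistic function of the log-ratio to avoid overflow/underflow.
    x = w * (math.log(p_s) - math.log(p_a))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simulate_series(true_rates, n_clients, rng):
    """Simulate one series of consecutive clients; return 1 (success) or 0 (failure) per client."""
    i_s, i_a = rng.sample(list(true_rates), 2)        # step 1: random starting intervention I_S
    successes = {i_s: 0, i_a: 0}
    applications = {i_s: 0, i_a: 0}
    failures = switches = 0
    previous = None
    outcomes = []
    for _ in range(n_clients):
        if failures == 0:
            choice = i_s                               # step 2: repeat I_S while it succeeds
        elif failures == 1:
            choice = i_a                               # steps 3-4: I_A until its first failure
        else:
            # step 5, with ASSUMED Laplace-smoothed rates and the hypothetical alpha
            p_s = (successes[i_s] + 1) / (applications[i_s] + 2)
            p_a = (successes[i_a] + 1) / (applications[i_a] + 2)
            alpha_s = hypothetical_alpha_s(p_s, p_a, switches)
            choice = i_s if rng.random() <= alpha_s else i_a
        if previous is not None and choice != previous:
            switches += 1                              # w: changes of intervention between clients
        previous = choice
        success = rng.random() < true_rates[choice]    # "roll of the dice" with the scenario's rate
        applications[choice] += 1
        successes[choice] += success
        failures += not success
        outcomes.append(int(success))
    return outcomes


def mean_success_by_client(true_rates, n_clients=300, n_sims=10_000, seed=2018):
    """Mean success rate at each client position across n_sims independent series."""
    rng = random.Random(seed)
    totals = [0] * n_clients
    for _ in range(n_sims):
        for k, outcome in enumerate(simulate_series(true_rates, n_clients, rng)):
            totals[k] += outcome
    return [total / n_sims for total in totals]
```

Calling `mean_success_by_client({"quark": 0.57, "cabbage": 0.40})` returns 300 mean success rates, one per client position, analogous to the dots in Figure 3; in pure Python this may take several seconds. Because the stand-in formula may differ from the original equations, the values should not be expected to reproduce the published figures exactly.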


In the next two sections, we describe the results of the simulation for the client series using the algorithm. In the first section, we show three examples of simulated series and, in the second section, the mean success rates across all 10,000 simulations for the six different scenarios are presented and compared.

Examples from the simulation series

Figure 1 [Fig. 1] shows an example of a simulation using the algorithm for the first 100 clients. It is one of the 10,000 simulations calculated for the scenario with the mid-range success rates and the greater difference between the success rates of the two interventions, where the success rate for quark is set at 57 percent and for cabbage at 40 percent.

In this first example, quark is selected at random (step 1 of the algorithm, see Methodology section) as the starting intervention. For the first hypothetical client, the quark compress resulted in a success, for the second, it did not and thus, for the third client, a switch was made to the cabbage treatment (step 3 of the algorithm, see Methodology section). Similarly, the cabbage treatment was initially successful but then, for the fourth hypothetical client, it failed. Therefore, from the fifth client on, each successive decision was made based on the respective current selection probability α for both the quark and cabbage interventions and a random value (step 5 of the algorithm, see Methodology section). Over the first four applications, cabbage was only successful once, whereas quark demonstrated two successful outcomes. Quark was therefore subsequently selected much more frequently than cabbage. But, over the course of the series, cabbage also had two more turns owing to the random value mechanism.

The simulated series – much like real applications of an algorithm like this – are different every time due to the very nature of chance. To show this diversity, in Figure 2 [Fig. 2], we selected another two series from the same simulation scenario as in Figure 1 [Fig. 1] (hypothetical success rates of 0.57 for quark and of 0.40 for cabbage) from among the 10,000 simulations. In both series, cabbage was selected as the starting intervention and also in both series the intervention was initially successful and then unsuccessful. The subsequent quark treatment failed in both cases which meant that, in both series, it was the turn of the cabbage treatment again, which then also failed in both cases. Despite the fact that the starting point is the same in these two series, starting with the fifth client, they differ considerably. In the upper simulation series in Figure 2 [Fig. 2], by chance, cabbage initially demonstrates many successes and is thus the intervention predominantly selected until around the 60th client. Here we can see the importance of the random value mechanism, as this means that, despite the supposed dominance of cabbage, quark is still occasionally selected. Thus, sooner or later (in this example after around 60 applications), the true dominance of quark comes to light and, in the long term, the algorithm settles on quark, irrespective of these initial experiences. In the simulation displayed in the lower diagram in Figure 2 [Fig. 2], however, precisely the opposite occurs, and, by chance, the cabbage intervention frequently fails during the early applications and thus, after being used for just a few clients, is then only rarely selected.

Mean success rates in the simulation scenarios

The aim of the examples presented in the previous section was to illustrate the various individual applications of the algorithm and show the possible different series of applications of the same scenario (same true success rates). In order to be able to draw any conclusions on the benefits of the algorithm, in this section we will now no longer examine the individual simulations but rather the mean success rates across 10,000 simulations of the same scenario.

Figure 3 [Fig. 3] shows the mean success rate of the algorithm for the scenario already used in the previous section, in which the success rates of both interventions are in the mid-range. The green horizontal lines indicate the hypothetical success rates assumed for the two interventions: 0.57 for quark and 0.40 for cabbage. The blue line at 0.485 corresponds to the mean success rate if half of the clients were treated with one intervention and the other half with the other, for example by alternating between quark and cabbage. In this scenario, the maximum achievable mean success rate is 0.57, the success rate for quark. The dots in Figure 3 [Fig. 3] show the success rates actually observed for the first to the 300th client, averaged across the 10,000 simulations. For the first client, this mean success rate lies, by definition, on the blue line (at 0.485), since quark was selected as the starting intervention in half of the cases and cabbage in the other half. From the second client onwards, the average success rate already exceeds the mean of the two interventions' success rates. The mean success rates observed in this scenario are dispersed along a steadily ascending curve which, in the long term, converges on the optimum value of 0.57. The increase is initially steep but then levels off. The dispersion is due to random fluctuations, because we “only” carry out 10,000 simulations; the more simulations are conducted, the smaller the dispersion.

In other words: already starting with the second client, the algorithm can profitably use the experience acquired with the first client. Moreover, with each subsequent client, the algorithm continues to learn, meaning that the chance of success continues to increase for each subsequent client.
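The two reference values in Figure 3 [Fig. 3] can be checked by hand. The blue line is the expected success rate when each intervention is used for half of the clients. Each dot is the mean of 10,000 independent success/failure outcomes at one client position, so its random dispersion can be gauged from the standard error of such a mean, which is largest at p = 0.5 because p(1 − p) ≤ 1/4:

```latex
\bar{p} = \tfrac{1}{2}\,(0.57 + 0.40) = 0.485,
\qquad
\mathrm{SE} = \sqrt{\frac{p\,(1-p)}{10\,000}} \le \sqrt{\frac{1/4}{10\,000}} = 0.005
```

Quadrupling the number of simulations would halve this standard error.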

As in Figure 3 [Fig. 3], Figure 4 [Fig. 4] shows the mean success rates, here for all six simulation scenarios (see Table 1 [Tab. 1]), each based on 10,000 simulations and a series of 300 clients.

The two diagrams at the top of Figure 4 [Fig. 4] show the two scenarios with low success rates for both interventions, the diagrams in the centre show the scenarios with mid-range success rates and the diagrams at the bottom show the scenarios with high success rates. The diagrams on the left show the three scenarios with the larger difference between the success rates of the two interventions (17 percentage points) and the diagrams on the right the scenarios with the smaller difference (5 percentage points). The learning effect of the algorithm, which starts with the second client, and the steady increase in the mean success rate towards the optimum value can be observed in all scenarios. A comparison of the left and right diagrams in Figure 4 [Fig. 4] shows that the dispersion is considerably larger in the scenarios with only a small difference between the success rates of the two interventions. Particularly in the scenario with mid-range success rates, the dispersion is so large that the underlying curve is barely recognisable. The comparison also shows that the algorithm learns much more rapidly in the scenarios with the larger differences in success rates than in those with the smaller differences. This also makes intuitive sense: if, in reality, quark were far more successful than cabbage, this would be detected more quickly than if quark were only slightly more successful.

A comparison of the scenarios with the low, mid-range and high success rates of the two interventions in Figure 4 [Fig. 4] shows that the learning effect is stronger in the scenarios with low and high success rates than in those with mid-range success rates. This is because, when the difference is fixed in percentage points, as in our scenarios, the relative difference is smallest in the mid percentage range. With low success rates, the difference is large relative to the successes, and with high success rates, relative to the failures. These comparatively large relative differences in the lower and upper percentage ranges enable the algorithm to learn more rapidly which of the two interventions is the more successful.
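The relative-difference argument can be made concrete with a small calculation. For a fixed gap of 17 percentage points, the sketch compares how many times more successes the better intervention yields (ratio of success rates) with how many times more failures the worse one yields (ratio of failure rates). Only the mid-range pair (0.57 and 0.40) comes from the text; the low and high pairs are hypothetical illustrations, not the values listed in Table 1 [Tab. 1], and `relative_differences` is an illustrative name.

```python
def relative_differences(p_better, p_worse):
    """Return (ratio of success rates, ratio of failure rates) for two success probabilities."""
    success_ratio = p_better / p_worse              # how many times more successes the better one yields
    failure_ratio = (1 - p_worse) / (1 - p_better)  # how many times more failures the worse one yields
    return success_ratio, failure_ratio


# Fixed gap of 17 percentage points. Only "mid" is from the text;
# "low" and "high" are HYPOTHETICAL pairs, not the Table 1 values.
pairs = {"low": (0.27, 0.10), "mid": (0.57, 0.40), "high": (0.90, 0.73)}
for level, (p_better, p_worse) in pairs.items():
    s, f = relative_differences(p_better, p_worse)
    print(f"{level:>4}: success ratio {s:.2f}, failure ratio {f:.2f}")
```

For the hypothetical low pair the success ratio is 2.7 and for the hypothetical high pair the failure ratio is 2.7, whereas for the mid-range pair both ratios stay below 1.5, the smallest relative difference of the three.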


The answer to the question as to whether there is a helpful method to assist midwives in deciding between two equally promising interventions in their day-to-day work is thus affirmative.

There will always be areas where reviews and guidelines are (as yet) unable to show a clear advantage for one intervention over another and therefore recommend both equally. In situations like this, probability-based methods such as the algorithm presented here can provide midwives with a useful decision-making tool. The specific conditions of each individual setting (clientele, how the treatment is carried out) are always taken into account. Shared decision-making is not compromised by the use of such a tool but rather enriched by it, as the tool enables the midwife to capture her experience in numbers. Large differences between the alternative interventions are not to be expected, as these would already have become apparent in a meta-analysis and been incorporated into the scientific evidence. Depending on the number of cases, the experience gained from them and how strongly one intervention dominates, the women seeking advice will be offered predominantly the dominant intervention somewhat earlier or later. As described, the algorithm already favours the more successful intervention more frequently from the second application onwards. A midwife would therefore not need as many cases in her own day-to-day practice as in the simulations shown. One of the advantages of this algorithm is that it suggests an intervention at every decision by continuously incorporating all available experience. However, we cannot specify how many cases the algorithm would need, for a given intervention and target group, to establish which of the two interventions should always be recommended. As already outlined, the series of decisions for one or the other intervention and the series of successes and failures will differ with each application.
This is partly due to chance, since different clients with different needs are treated each time, and partly due to the way different midwives carry out the treatment. These aspects could be seen as limitations of the method. The approach described is also not suitable for all issues arising in midwifery, because the algorithm requires that the success or failure of an intervention can be clearly determined relatively quickly, before the next client with similar treatment needs presents. On the positive side, the approach is also suitable for treatments with only low success rates.


In line with more general efforts to work in an evidence-based manner, a method like this could be a useful complement for practising midwives to the existing guidelines, reviews and study findings in cases where meta-analyses describe different intervention options as effective. The method helps midwives to identify which alternative is more effective in their day-to-day practice. A midwife will never be able to say with absolute certainty that intervention A (which, in some cases, may also mean skilled non-intervention) is fundamentally more effective than intervention B, but her course of action is always based on her existing knowledge and experience to date. The approach could also be used by entire midwifery teams. It would establish a decision-making basis in midwifery practice that should not be seen as static but rather be continuously adapted to the recommendations of current guidelines and reviews. For midwives and other healthcare professionals who would like to capture their experience in a structured manner and systematically put it into practice, this could be a suitable tool. The method has, however, not yet been tested in practice. Readers’ reactions to this article will therefore be received with particular interest.


Editor’s note

At the suggestion of both reviewers, I wrote to Mr Dubben requesting a comment on the application of the method proposed by him and Mr Beck-Bornholdt. Unfortunately, however, given the short time available, this did not prove possible.


We would like to thank Carla Welch, qualified translator, for assisting with the English translation of the manuscript.

Competing interests

The authors declare that they have no competing interests.


Bayes T, Price R. An Essay towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F.R.S. Communicated by Mr. Price, in a Letter to John Canton, A.M.F.R.S. Philosophical Transactions. 1763;53:370–418. DOI: 10.1098/rstl.1763.0053
Beck-Bornholdt HP, Dubben HH. Der Schein der Weisen: Irrtümer und Fehlurteile im täglichen Denken. 7th ed. Reinbek: Rowohlt Taschenbuch Verlag; 2003. German.
Deutsche Gesellschaft für Gynaekologie und Geburtshilfe (DGGG). S3 Leitlinie: Therapie entzündlicher Brusterkrankungen in der Stillzeit; 2013 [accessed Jun 2018]. German.
Huskins WC, Fowler VG, Evans S. Adaptive Designs for Clinical Trials: Application to Healthcare Epidemiology Research. Clin Infect Dis. 2018;66(7):1140–6. DOI: 10.1093/cid/cix907
Jahanfar S, Jaafar SH. Effects of restricted caffeine intake by mother on fetal, neonatal and pregnancy outcomes. Cochrane Database Syst Rev. 2015;(6):CD006965. DOI: 10.1002/14651858.CD006965.pub4
Liang Y, Carriere KC. Stratified and randomized play-the-winner rule. Stat Methods Med Res. 2008;17(6):581–93. DOI: 10.1177/0962280207081606
Mangesi L, Zakarija-Grkovic I. Treatments for breast engorgement during lactation. Cochrane Database Syst Rev. 2016;(6):CD006946. DOI: 10.1002/14651858.CD006946.pub3
Phupong V, Hanprasertpong T. Interventions for heartburn in pregnancy. Cochrane Database Syst Rev. 2015;(9):CD011379. DOI: 10.1002/14651858.CD011379.pub2
R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2014 [accessed Jun 2018].
Spiegelhalter DJ. Incorporating Bayesian Ideas into Health-Care Evaluation. Statistical Science. 2004;19(1):156–74.
Wold HOA. A study in the analysis of stationary time series. Stockholm: Almqvist & Wiksell; 1938.
Zelen M. Play the Winner Rule and the Controlled Clinical Trial. Journal of the American Statistical Association. 1969;64(325):131–46. DOI: 10.1080/01621459.1969.10500959