The Science of Medicine

By Byron J. Hoogwerf, MD, Cleveland Clinic

Reviewed/Revised Aug 2021 | Modified Sep 2022

Doctors have been treating people for many thousands of years. The earliest written description of medical treatment is from ancient Egypt and is over 3,500 years old. Even before that, healers and shamans were likely providing herbal and other remedies to the ill and injured. A few remedies, such as those used for some simple fractures and minor injuries, were effective. However, until recently, many medical treatments did not work and some were actually harmful.

There were many reasons why doctors recommended ineffective (and sometimes harmful) treatments and why people accepted them:

Typically, there were no effective alternative treatments.
Doctors and sick people often prefer doing something to doing nothing.
People are comforted by turning problems over to an authority figure.
Doctors often provide much-needed support and reassurance.

Most importantly, however, doctors could not tell which treatments worked.

Treatment and recovery: Cause and effect?

If one event comes immediately before another, people naturally assume the first is the cause of the second. For example, if a person pushes an unmarked button on a wall and a nearby elevator door opens, the person naturally assumes that the button controls the elevator. The ability to make such connections between events is a key part of human intelligence and is responsible for much of our understanding of the world. However, people often see causal connections where none exist. That is why athletes might continue to wear the "lucky" socks they had on when they won a big game, or a student might insist on using the same "lucky" pencil to take exams.

This way of thinking is also why some ineffective medical treatments were thought to work. For example, if an ill person’s fever broke after the doctor drained a pint of blood or the shaman chanted a certain spell, then people naturally assumed those actions must have been what caused the fever to break. To the person desperately seeking relief, getting better was all the proof necessary. Unfortunately, the apparent cause-and-effect relationships observed in early medicine were rarely correct, but belief in them was enough to perpetuate centuries of ineffective remedies. How could this have happened?

People get better spontaneously. Unlike “sick” inanimate objects (such as a broken axe or a torn shirt), which remain damaged until repaired by someone, sick people often get well on their own (or despite their doctor’s care) if the body heals itself or the disease runs its course. Colds are gone in a week, migraine headaches typically last a day or two, and food poisoning symptoms may stop in 12 hours. Many people even recover from life-threatening disorders, such as a heart attack or pneumonia, without treatment. Symptoms of chronic diseases (such as asthma or sickle cell disease) come and go. Thus, many treatments may seem to be effective if given enough time, and any treatment given near the time of spontaneous recovery may seem dramatically effective.

The placebo effect may be responsible. Belief in the power of treatment is often enough to make people feel better. Although belief cannot cause an underlying disorder, such as a broken bone or diabetes, to disappear, people who believe they are receiving a strong, effective treatment very often feel better. Pain, nausea, weakness, and many other symptoms can diminish even if a pill contains no active ingredients and can be of no possible benefit, such as a "sugar pill" (termed a placebo). What counts is the belief.

An ineffective (or even harmful) treatment prescribed by a confident doctor to a trusting, hopeful person often results in remarkable improvement of symptoms. This improvement is termed the placebo effect. Thus, people might experience an actual (not simply perceived) benefit from a treatment that has had no obvious effect on the disease itself. Current research suggests there is a biologic basis for the placebo effect in some disorders, even though that effect is not targeting the actual disease.

Why does it matter? Some people argue that the only important thing is whether a treatment makes people feel better. In this view, it does not matter whether the treatment actually “works,” that is, affects the underlying disease. This argument may be reasonable when the symptom is the problem, such as in many day-to-day aches and pains, or in illnesses such as colds, which generally go away on their own. In such cases, doctors do sometimes prescribe treatments that have little effect on the disease and, instead, may at least in part relieve symptoms due to the placebo effect. However, in any dangerous or potentially serious disorder, or when the treatment itself may cause side effects, it is important for doctors to prescribe only a treatment that really does work. A treatment's potential benefits must be balanced against its potential harms. For example, drugs with many side effects may be worth taking for people with life-threatening diseases, such as cancer. Some cancer drugs may cause serious damage, such as to the kidneys or heart, but these risks are often acceptable because the alternative to the drugs (the effects of untreated cancer) is likely worse than the drugs' side effects.

How Doctors Try to Learn What Works

Because some doctors realized long ago that people can get better on their own, they naturally tried to compare how different people with the same disease fared with or without treatment. However, until the middle of the 19th century, it was very difficult to make this comparison. Diseases were so poorly understood that it was difficult to tell when two or more people, even with similar symptoms, had the same disease.

Doctors using a given term were often talking about different diseases entirely. For example, in the 18th and 19th centuries, the diagnosis of “dropsy” was given to people whose legs were swollen. We now know that swelling can result from heart failure, kidney failure, or severe liver disease—quite different diseases that do not respond to the same treatments. Similarly, numerous people who had fever and who were also vomiting were diagnosed with “bilious fever.” We now know that many different diseases cause fever and vomiting, such as typhoid, malaria, appendicitis, and hepatitis.

Only when accurate, scientifically based diagnoses became common around the beginning of the 20th century could doctors begin to effectively evaluate treatments. However, they still had to determine how to best evaluate a treatment.

Sample size

First of all, doctors realized they had to look at more than one sick person's response to treatment. One or two people getting better (or sicker) might be a coincidence. Good results in many people are less likely to be due to coincidence. The larger the number of people treated (sample size), the more likely any observed benefit or side effect is real.
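
One rough way to see why sample size matters is to work out how often a given share of recoveries would turn up by chance alone. The sketch below assumes, purely for illustration, that 60% of people recover without any treatment and that each person's recovery is independent of the others. The function name and all of the numbers are hypothetical and do not come from any study.

```python
from math import comb

def chance_of_at_least(successes: int, n: int, p: float) -> float:
    """Probability of at least `successes` recoveries among n people when
    each person recovers independently with probability p (binomial tail)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )

# Assumed spontaneous recovery rate (illustrative only, not real data)
BASE_RATE = 0.6

# The same 80% recovery share, seen in a small and a large group
small = chance_of_at_least(4, 5, BASE_RATE)      # 4 of 5 people recover
large = chance_of_at_least(80, 100, BASE_RATE)   # 80 of 100 people recover

print(f"4 of 5 by chance:    {small:.3f}")
print(f"80 of 100 by chance: {large:.6f}")
```

Under these assumptions, 4 recoveries out of 5 would happen by chance about a third of the time, whereas 80 out of 100 would happen by chance far less than 1% of the time.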

Control groups

Even if doctors find a good response to a new treatment in a large group of people, they still do not know whether the same number of people (or more) would have gotten well on their own or done even better with a different treatment. Thus, doctors typically compare results between a group of people who receive a study treatment (treatment group) and another group (control group) who receive

An older treatment
Dummy treatment (a placebo, such as a sugar pill)
No treatment at all

Studies that involve a control group are called controlled studies.

Time frame

At first, doctors simply gave all their patients with a certain illness a new treatment and then compared their results to a control group of people treated at an earlier time (either by the same or different doctors). The previously treated people are considered a historical control group. For example, if doctors found that 80% of their patients survived malaria after receiving a new treatment, whereas previously only 60% had survived, then they might conclude that this new treatment was more effective.

A problem with making comparisons to results from an earlier time is that advances in general medical care in the time between the old and the new treatments may be responsible for any improvement in outcome. For example, it is not appropriate to compare the results of people treated in 2021 with those treated in 1971. In one example, peptic ulcer disease was originally treated with a milk and cream diet or surgery, then with drugs that block acid, and more recently with antibiotics. Comparisons of treatments used over time need to consider the changes in understanding the disease process.

Prospective studies can help avoid the problems with historical control groups. In prospective studies, doctors try to create treatment groups and control groups at the same time and observe the results of the treatment as they unfold. Relevant characteristics of the people in the treatment and control groups should be similar. For example, if the outcome being studied is death resulting from cancer or heart disease, the ages of the people in each group should be similar because these diseases are more common in older people.

Comparing apples to apples

The biggest concern with all types of medical studies, including historical studies, is that similar groups of people should be compared.

In the first example of a historical control, if the group of people who received the new treatment (treatment group) for malaria was made up of mostly young people who had mild disease, and the previously treated (control) group was made up of older people who had severe disease, it might well be that people in the treatment group fared better simply because they were younger and healthier. Thus, a new treatment could falsely appear to work better.
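
The arithmetic behind this kind of false result can be sketched with made-up numbers. The example below assumes that survival depends only on whether a person is young with mild disease or older with severe disease, and that the new treatment has no effect at all. The survival percentages and group sizes are hypothetical, chosen only to show the mechanism.

```python
def crude_survival_percent(groups):
    """Overall survival (%) for a study group made up of several subgroups.

    `groups` is a list of (number_of_people, survival_percent) pairs.
    """
    total_people = sum(n for n, _ in groups)
    weighted_total = sum(n * pct for n, pct in groups)
    return weighted_total / total_people

# Hypothetical survival rates that are the SAME with or without the new treatment
YOUNG_MILD = 90   # percent surviving among young people with mild disease
OLD_SEVERE = 50   # percent surviving among older people with severe disease

# The two groups differ only in their mix of people, not in the treatment's effect
treatment_group = [(75, YOUNG_MILD), (25, OLD_SEVERE)]
control_group = [(25, YOUNG_MILD), (75, OLD_SEVERE)]

treatment_survival = crude_survival_percent(treatment_group)
control_survival = crude_survival_percent(control_group)

print(treatment_survival, control_survival)  # prints 80.0 60.0
```

Even though the treatment does nothing in this sketch, overall survival looks like 80% versus 60% simply because the treatment group contains more young people with mild disease.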

Many other factors besides age and severity of illness also must be taken into account, such as

The overall health of people being studied (people with chronic diseases such as diabetes or kidney failure tend to fare worse than healthier people)
The specific doctor and hospital providing care (some may be more skilled and have better facilities than others)
The percentages of men and women who make up the study groups (men and women may respond differently to treatment)
Whether the study included a diverse population (treatments need to be safe and work well in people who have different characteristics, such as different ethnicities, geographic locations, or socioeconomic status, because a treatment may work more effectively in some of those groups than in others)

Doctors have tried many different methods to ensure that the groups being compared are as similar as possible, but there are two main approaches:

Case-control studies: Precisely pairing people who receive the new treatment (cases) with those who do not (controls) based on as many factors as possible (age, gender, health, and so forth) and using statistical techniques to help ensure comparability among the groups
Randomized trials: Randomly assigning people to each of the study groups before beginning the study

Case-control studies seem sensible. For example, if a doctor is studying a new treatment for high blood pressure (hypertension), and one person in the treatment group is 42 years old and has diabetes, then the doctor would try to ensure the placement of a 40-some-year-old person with hypertension and diabetes in the control group. However, there are so many differences among people, including differences that the doctor does not even think of, that it is nearly impossible to intentionally create an exact match for each person in a study.

Randomized trials reduce the risk of differences between groups affecting the study results using a completely different approach. The best way to ensure a match between groups is to take advantage of the laws of probability and randomly assign (typically with the aid of a computer program) people who have the same disease to different groups. Comparability of the groups is more likely if the groups are matched using known variables such as age, gender, and the presence of other diseases. However, one uniquely important advantage of randomization is that any factors that affect the study result but are unknown (and thus cannot be matched among groups) are likely to be randomly distributed among the participants and groups. The larger the size of each group, the greater the odds are that people in each group will have similar characteristics.
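
Random assignment can be sketched in a few lines. The example below is a minimal illustration, not a description of how any real trial software works: the participants, the "hidden_trait" field, and the function names are all hypothetical. The hidden trait stands in for a factor that affects the outcome but that the researchers never measured.

```python
import random

def randomize(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def share_with_trait(group):
    """Fraction of a group that has the unmeasured trait."""
    return sum(person["hidden_trait"] for person in group) / len(group)

# Hypothetical participants; roughly 30% carry a trait nobody measured
maker = random.Random(0)
people = [{"id": i, "hidden_trait": maker.random() < 0.3} for i in range(2000)]

treatment, control = randomize(people, seed=1)

print(f"treatment group: {share_with_trait(treatment):.3f}")
print(f"control group:   {share_with_trait(control):.3f}")
```

With groups this large, the two shares should come out close to each other even though the assignment never looked at the trait. With much smaller groups, the shares can differ noticeably by chance.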

Prospective, randomized studies are the best way to make sure that a treatment or test is being compared between equivalent groups.

Eliminating other factors

Once doctors have created equivalent groups, they must make sure that the only difference they allow is the study treatment itself. That way, doctors can be sure that any difference in outcome is due to the treatment and not to some other factor, such as the quality or frequency of follow-up care.

The placebo effect is another important factor. People who know they are receiving an actual, new treatment rather than no treatment (or an older, presumably less effective treatment) often expect to feel better. Some people, on the other hand, may expect to experience more side effects from a new, experimental treatment. In either case, these expectations can exaggerate the effects of treatment, causing it to seem more effective or to have more complications than it really does.

Blinding, also called masking, is a technique used to reduce the problems of the placebo effect. There are two general types of blinding: single and double.

Single blinding is when the people in a study do not know whether they are receiving a new treatment. That is, they are “blinded” to this information. Blinding is usually accomplished by giving people in the control group an identical-appearing substance, usually a placebo (something with no medical effect). In single-blinded studies, the study personnel know the treatment assignment, but the participants do not.
Double blinding is when both the participants in a study and the study personnel do not know which study participants are receiving a new treatment or a placebo. Because the doctor or nurse might accidentally let a person know what treatment they are receiving, and thus "unblind" the person, it is better if all involved health care practitioners remain unaware of what is being given. Another reason for double blinding is that the placebo effect can affect even the doctor, who may unconsciously think a person receiving treatment is doing better than a person receiving no treatment, even if both are faring exactly the same. Double blinding usually requires that a person separate from the study, such as a pharmacist, prepare identical-appearing substances that are labeled only by a special number code. The number code is broken only after the study is completed.
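
The number-code idea can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any real trial system: the function name, the six-digit codes, and the equal split between treatment and placebo are all hypothetical.

```python
import random

def make_blinded_kits(n_participants, seed=None):
    """Return (labels, key) for a two-group blinded study.

    labels: the code numbers printed on identical-looking containers,
            which is all that study personnel and participants see.
    key:    a mapping from code number to actual contents, held by someone
            outside the study (such as a pharmacist) until the study ends.
    """
    if n_participants % 2:
        raise ValueError("this sketch assumes an even number of participants")
    rng = random.Random(seed)
    contents = ["treatment", "placebo"] * (n_participants // 2)
    rng.shuffle(contents)
    # Unique six-digit codes that carry no information about the contents
    codes = rng.sample(range(100_000, 1_000_000), n_participants)
    key = dict(zip(codes, contents))
    labels = sorted(key)  # sorted so the order reveals nothing about contents
    return labels, key

labels, key = make_blinded_kits(10, seed=42)

# During the study, only the labels are visible
print(labels)

# After the study is completed, the code is broken using the key
unblinded = [(code, key[code]) for code in labels]
```

Keeping the key separate from the labels is the point of the design: nobody handling the containers can tell from a code number alone what is inside.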

Not all medical studies can be double-blinded. For example, surgeons studying two different surgical procedures obviously know which procedure they are performing (although the people undergoing the procedures can be kept unaware). In such cases, doctors make sure that the people evaluating the outcome of treatment are blinded as to what has been done so they cannot unconsciously bias the results.

When an effective treatment for a disease already exists, it may be unethical to give the control group only a placebo. In those situations, treatments can still often be evaluated using other study designs, as in the following examples:

To determine whether a new treatment adds to the effectiveness of a standard treatment, a study can compare results using the standard treatment plus either the new investigational treatment or a placebo.
To compare a new treatment known to be effective with the standard treatment, a study can compare results using the new treatment with those using the standard treatment. If necessary to maintain blinding, placebos can be added to both treatment groups.

In each of the approaches, the substances for each of the treatments must appear identical to the participants and, if a double-blinded study, to the study personnel. If the treatment group receives a red, bitter liquid, then the control group should also receive a red, bitter liquid. If the treatment group receives a clear solution given by injection, then the control group should receive a similar injection.

Choosing a clinical trial design

The best clinical trials incorporate all of the above elements. That is, they are

Prospective, meaning treatment and control groups are enrolled in a study before it begins and they are followed over time
Randomized, meaning people in the trial are randomly assigned to the treatment group or the control group
Placebo controlled, meaning that some people in the trial receive a placebo (an inactive treatment)
Double blinded, meaning neither the people in the trial nor those conducting the trial know who is receiving treatment and who is receiving a placebo

This design allows for the clearest determination of the effectiveness of a treatment. However, in some situations, this trial design may not be possible. For example, with very rare diseases, it is often hard to find enough people for a randomized trial. In those situations, retrospective case-control studies may be conducted.

Diversity

For the trial results to be applicable to the real world, trial participants should represent the entire population that has the disease under investigation, across all applicable ages, genders, races, ethnicities, socioeconomic statuses, and lifestyles. A more precise comparison of apples to apples is often made easier by restricting study participants to particular groups. However, the clinical trials whose results are most applicable to the entire population recruit a diverse pool of participants. In the United States, for example, racial and ethnic minorities make up almost 40% of the population. A study that lacks such diversity could miss some important factors. For some drugs, the race and genetic background of a person may influence the effectiveness of that drug. For example, a deficiency of the enzyme G6PD is more common in men of African, Asian, or Mediterranean descent, and certain drugs can trigger hemolytic anemia in people with G6PD deficiency. By including people from diverse backgrounds, clinical trials can show whether treatments are safe and work well for people from different groups. Still, factors such as socioeconomic status, literacy level, access to transportation, and proximity to the study site can make it difficult to recruit a sufficiently diverse population.
