statistics Archives - The Psychology Lounge ™

I’ve written before about common mistakes in interpreting medical research in my blog post How to Read Media Coverage of Scientific Research: Sorting out the Stupid Science from Smart Science. I recently read a very interesting post by Gary Schwitzer about the most common mistakes that journalists make when reporting health and medical findings.

The three mistakes that he discusses:

1. Absolute versus relative risk/benefit data

“Many stories use relative risk reduction or benefit estimates without providing the absolute data. So, in other words, a drug is said to reduce the risk of hip fracture by 50% (relative risk reduction), without ever explaining that it’s a reduction from 2 fractures in 100 untreated women down to 1 fracture in 100 treated women. Yes, that’s 50%, but in order to understand the true scope of the potential benefit, people need to know that it’s only a 1% absolute risk reduction (and that all the other 99 who didn’t benefit still had to pay and still ran the risk of side effects).

2. Association does not equal causation

A second key observation is that journalists often fail to explain the inherent limitations in observational studies – especially that they cannot establish cause and effect. They can point to a strong statistical association but they can’t prove that A causes B, or that if you do A you’ll be protected from B. But over and over we see news stories suggesting causal links. They use active verbs in inaccurately suggesting established benefits.

3. How we discuss screening tests

The third recurring problem I see in health news stories involves screening tests. … “Screening,” I believe, should only be used to refer to looking for problems in people who don’t have signs or symptoms or a family history. So it’s like going into Yankee Stadium filled with 50,000 people about whom you know very little and looking for disease in all of them. … I have heard women with breast cancer argue, for example, that mammograms saved their lives because they were found to have cancer just as their mothers did. I think that using “screening” in this context distorts the discussion because such a woman was obviously at higher risk because of her family history. She’s not just one of the 50,000 in the general population in the stadium. There were special reasons to look more closely in her. There may not be reasons to look more closely in the 49,999 others.”

Let’s discuss each of these in a little more depth. The first mistake, in which statistically significant findings are not put into clinical perspective, is probably the most common. Let me explain. Suppose we are looking at a drug that prevents a rare illness. The base rate of this illness, which we will call Catachexia, is 4 in 10,000 people. The drug reduces this to 1 in 10,000 people, a 75% decrease. Sounds good, right?

Not so fast. Let me add a few facts to this hypothetical case. Let’s imagine that the drug costs $10,000 a year and also has some bad side effects. To reduce the incidence from four people to one person in 10,000, all 10,000 people must be treated, because we cannot know in advance who will get sick. That means treating 9,996 people who would never have developed this rare but serious illness, at a cost of $99,960,000 a year! Plus those 9,996 people would be unnecessarily exposed to side effects.

So which headline sounds better to you?

New Drug Prevents 75% of Catachexia Cases!

New Drug Lowers the Incidence of Catachexia by Three Cases per 10,000, at a Cost of Almost $100 Million

The first headline reports the relative risk reduction without cost data; the second reports the absolute risk reduction along with the costs. The second headline is the only one that should be reported, but unfortunately the first is much more typical in science and medical reporting.
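To make the arithmetic behind the two headlines concrete, here is a minimal Python sketch. The function name risk_summary and its dictionary keys are my own illustration, not a standard API; the numbers are the hypothetical Catachexia figures above, and the cost is per year of treatment.

```python
from fractions import Fraction

def risk_summary(untreated_risk, treated_risk, cost_per_person):
    """Summarize a treatment effect in relative and absolute terms."""
    arr = untreated_risk - treated_risk   # absolute risk reduction
    rrr = arr / untreated_risk            # relative risk reduction
    nnt = 1 / arr                         # people treated per case prevented
    return {
        "relative_risk_reduction": rrr,
        "absolute_risk_reduction": arr,
        "number_needed_to_treat": nnt,
        "cost_per_case_prevented": nnt * cost_per_person,
    }

# Hypothetical Catachexia drug: 4 in 10,000 untreated, 1 in 10,000 treated,
# $10,000 per person per year. Fractions keep the arithmetic exact.
catachexia = risk_summary(Fraction(4, 10_000), Fraction(1, 10_000), 10_000)
for name, value in catachexia.items():
    print(f"{name}: {float(value):,.4f}")
```

The same function applied to Schwitzer’s hip fracture example (2 in 100 down to 1 in 100) gives a 50% relative reduction, a 1 percentage point absolute reduction, and 100 women treated for each fracture prevented.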

The second error, treating association or correlation as causation, is terribly common as well. The best example of this is all of the studies looking at the health effects of coffee. Almost every week we get a different study that claims either a health benefit of coffee or a negative health impact of coffee. These findings are typically reported in the active voice with causal verbs, such as “Drinking coffee makes you smarter.”

So which headline sounds better to you?

Drinking Coffee Makes You Smarter

Smarter People Drink More Coffee

Scientists Find a Relatively Weak Association between Intelligence Levels and Coffee Consumption

Of course the first headline is the one that will get reported. The second headline is equally inaccurate, because it simply assumes the opposite causal direction. Only the third headline accurately reports the findings.

The theoretical problem with any correlational study of two variables is that we never know, nor can we ever know, what third variables (confounders) might be correlated with both. Let me give you a classic example. There is a high correlation between the consumption of ice cream in Iowa and the death rate in Mumbai, India. This sounds pretty disturbing. Maybe those people in Iowa should stop eating ice cream. But of course the confounding variable here is summer heat. When the temperature goes up in Iowa, people eat more ice cream. And when the temperature goes up in India, people are more likely to die.
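Here is a rough simulation of the ice cream example, as a sketch only. Every number in it is invented, and it assumes a single shared "summer heat" variable drives both ice cream sales and deaths, which never influence each other. The helper names pearson_r and partial_r are my own.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after statistically removing zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated days: heat drives both outcomes; they never touch each other.
heat = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream = [2.0 * h + rng.gauss(0, 5) for h in heat]
deaths = [1.5 * h + rng.gauss(0, 5) for h in heat]

raw = pearson_r(ice_cream, deaths)
adjusted = partial_r(ice_cream, deaths, heat)
print(f"ice cream vs. deaths: r = {raw:.2f}")
print(f"same, holding heat constant: r = {adjusted:.2f}")
```

In this made-up world the raw correlation is strong, yet it largely vanishes once heat is held constant. In a real study the catch is that you can only adjust for the confounders you thought to measure.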

The only way that one could actually verify that a correlational finding reflects cause and effect would be to do a follow-up study where you randomly assign people to either consume or not consume the substance or food that you wish to test. The problem with this is that you would have to get coffee drinkers to agree not to drink coffee and non-coffee drinkers to agree to drink coffee, for example, which might be very difficult. But if you can do this with coffee, chocolate, broccoli, exercise, etc., then at least you could demonstrate a real causal effect. (I’ve oversimplified some of the complexity of randomized controlled studies, but my point stands.)
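A small simulation can show why random assignment matters. This is a sketch with invented numbers: it assumes a hidden trait that raises both test scores and the odds of choosing coffee, while coffee itself does nothing at all. The names trait, chooses_coffee, assigned_coffee, and group_gap are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 10_000

# Hidden trait that raises test scores AND the odds of choosing coffee.
trait = [rng.gauss(0, 1) for _ in range(n)]
# In this simulated world coffee has no effect on the score at all.
scores = [100 + 10 * t + rng.gauss(0, 5) for t in trait]

# Observational study: people decide for themselves whether to drink coffee.
chooses_coffee = [t + rng.gauss(0, 1) > 0 for t in trait]
# Randomized study: a coin flip decides, regardless of the trait.
assigned_coffee = [rng.random() < 0.5 for _ in range(n)]

def group_gap(in_group, values):
    """Mean of the coffee group minus mean of the non-coffee group."""
    drinkers = [v for g, v in zip(in_group, values) if g]
    others = [v for g, v in zip(in_group, values) if not g]
    return mean(drinkers) - mean(others)

observational_gap = group_gap(chooses_coffee, scores)
randomized_gap = group_gap(assigned_coffee, scores)
print(f"self-selected coffee drinkers score {observational_gap:+.1f} points")
print(f"randomly assigned coffee drinkers score {randomized_gap:+.1f} points")
```

The self-selected drinkers appear to score much higher even though coffee does nothing here, because the trait is unevenly split between the groups. The coin flip spreads the trait evenly, so the spurious gap shrinks toward zero.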

The final distortion, which involves confusion about screening tests, is also very common and, unfortunately, incredibly complex. The main point that Schwitzer is trying to make here, though, is simple: screening tests are only those tests that are applied to a general population that is not at high risk for a given illness. Evaluating the usefulness of screening tests must be done in the context of a low-risk population, because that is how most screening tests are used. Most people don’t get colon cancer, breast cancer, or prostate cancer, even over 50. If you use a screening test only with high-risk individuals, then it’s not really a screening test.

There is a whole other issue with reporting on screening tests that I’m only going to mention briefly because it’s so complicated and so controversial: many screening tests may do as much harm as good. Recently there has been a lot of discussion of screening for cancer, especially prostate and breast cancer. The dilemma with screening tests is that once you find cancer you are almost always obligated to treat it because of medical malpractice issues and psychological issues (“Get that cancer out of me!”). The problem with this automatic treatment is that current screening doesn’t distinguish between fast-growing dangerous tumors and very slow-growing indolent tumors. Thus we may apply treatments that have considerable side effects, or even mortality, to tumors that would never harm the person.

Another problem is that screening often misses the onset of fast-growing dangerous tumors because they begin to grow between the screening tests. The bottom line is that screening for breast cancer and prostate cancer may have relatively little impact on the only statistic that counts – the cancer death rate. If we had screening tests that could distinguish between relatively harmless tumors and dangerous tumors then screening might be more helpful, but that is not where we are yet.

One more headline test. Which headline do you prefer?

Screening for Prostate Cancer Leads to Detection and Cure of Prostate Cancer

Screening for Prostate Cancer Leads to Impotence and Incontinence in Many Men Who Would Never Die from Prostate Cancer

The first headline is the one that will get reported even though the second headline is scientifically more accurate.

I suggest that every time you see a health or medicine headline, you rewrite it in a more accurate way after you read the article. Remember to use absolute differences rather than relative differences, to report association instead of causation, and to add in the side effects and costs of any suggested treatment or screening test. This will give you practice in reading health and medical research accurately.

Also remember the most important rule: one small study does not mean anything. It’s actually quite humorous how the media will seize upon a study, even though the study was based on 20 people and hasn’t been replicated by anybody. They also typically fail to put the results of one study into the context of other studies of the same thing. A great example is eggs and type 2 diabetes. The same researcher, Luc Djousse, published a study in 2008 (link) that showed a strong relationship between the consumption of eggs and the occurrence of type 2 diabetes, but then in 2010 published another study finding absolutely no correlation whatsoever. Always be very skeptical, and most often you will be right.
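To see why a single 20-person study deserves so little weight, here is a simulation sketch. It assumes two measurements that are truly unrelated, labeled generically as exposure and outcome, and runs 2,000 imaginary studies of 20 people each. The data are pure random noise, not drawn from any real study.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(2)  # fixed seed so the run is repeatable
study_size = 20
n_studies = 2000

results = []
for _ in range(n_studies):
    # Two unrelated measurements: the true correlation is exactly zero.
    exposure = [rng.gauss(0, 1) for _ in range(study_size)]
    outcome = [rng.gauss(0, 1) for _ in range(study_size)]
    results.append(pearson_r(exposure, outcome))

share_sizeable = sum(abs(r) > 0.3 for r in results) / n_studies
print(f"smallest r: {min(results):.2f}, largest r: {max(results):.2f}")
print(f"share of studies with |r| above 0.3: {share_sizeable:.0%}")
```

Even with nothing real going on, roughly one in five of these tiny studies turns up a correlation stronger than 0.3 in one direction or the other. Any one of them could become a headline, and a repeat study could easily point the other way.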

I’m off to go make a nice vegetarian omelette…

I read a lot. One of my favorite online magazines is Slate.com. It is a wide-ranging online mag that covers politics, news, the arts, business, and science. I was reading the other night and noticed an article by the writer Will Saletan that was looking at some scientific research on “Gaydar”. Gaydar is the supposed ability to discern whether a person is homosexual simply by looking at them.

In the original article, Saletan quoted research by Nicholas Rule, Nalini Ambady, Reginald Adams Jr., and Neil Macrae at Tufts University. The researchers took personal ad photos from gay and straight men, and then had college students look at them to rate whether they were straight or gay. For some reason the researchers chose to report their data as correlation coefficients (r values). The highest r value was 0.31, which in the original version of the article Saletan incorrectly stated was the equivalent of an accuracy rate of 65%. I’m not sure where he got the 65% number, but I immediately recognized that this was a mistake. An r value, when squared, represents the proportion of the variance being explained. So squaring an r value of 0.31 means that a little under 10% of the variance has been explained (0.31² ≈ 0.096). That means that more than 90% of the variance in the dependent variable is still unexplained.
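For anyone who wants to check the squaring step, here it is as a few lines of Python. The only input is the r value of 0.31 reported above; the variable names are mine.

```python
r = 0.31  # highest correlation reported in the study

variance_explained = r ** 2          # r squared
variance_unexplained = 1 - variance_explained

print(f"explained:   {variance_explained:.1%}")
print(f"unexplained: {variance_unexplained:.1%}")
```

This prints 9.6% explained and 90.4% unexplained, so the ratings account for less than a tenth of the variance.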

In the original article Saletan had called these experiments “impressive”. Given the tiny bit of variance explained by even the strongest of the experiments, I would call them less than impressive. And given the subject of the experiment, I would actually call them “oppressive”. This is a great example of taking extremely weak scientific findings and spinning them into something approaching meaningfulness. There are so many alternate explanations for why tiny findings could have happened that do not require any assumption of accurate “gaydar”.

I wrote a comment on the article explaining the mistake. To the credit of Saletan (and Slate magazine), they noticed and read my comment on the inaccurate reporting of statistical findings, and after an e-mail correspondence with me regarding the accurate interpretation of the statistics, posted a revised version of the article. That’s honest and impressive. It also shows that it’s worth writing comments on online articles, and that writers read the comments.

I still think the original research doesn’t merit even the corrected coverage that Slate gave it, but at least the science is accurately reported. Of course, the biggest flaw in the research was that the researchers only looked at photos of men who were openly gay, while the article is really about whether you can tell if a man is secretly gay. So the bottom line is that even if the researchers had done better research, it still wouldn’t answer the original question of the article.

I should add that I question the use of science to pursue questions that tread dangerously close to prejudice and stereotyping. But we live in a free country, and scientists have every right to do research on any topic they choose. I’m just not sure that the National Science Foundation should be funding such research. In any case, I was glad to be able to correct misinterpretations of the statistical results of the study.

Notes:

The original version of the article is in Google’s cache, here, at least for now. (Google updated the page, so now it’s the same as the corrected page.)

The corrected version of the article is here.

The research that the article is based on is here.

The Psychology Lounge ™

by Dr. Andrew Gottlieb (650) 324-2666

Tag Archives: statistics

How Reporters Screw up Health and Medical Reporting (and How You Can Catch Them Doing So)

Bad Science, Reported Badly, and Then Corrected Thanks to Your Intrepid Blogger!