And yet, if we "look under the hood" we will find the true nature of how statistics work.
Recently The Wall Street Journal published an article regarding the health benefits of
exercise (Landro, January 5, 2010). Most people who know me or who have read much on my
website know that I'm a strong supporter of exercise, so in choosing this article to review
and dissect, I am choosing one that has the same agenda I do: increase people's awareness
of the benefits of exercise. Therefore, there should be less bias on my part in critiquing
such an article since I support the underlying premise. In addition, I chose The Wall Street
Journal because of its reputation, whereas an article from a less respected source could easily
be dismissed as "atypical."
Even though I am using an article on exercise as my example of the problems with statistics,
we could take any article on any topic and find the same problems. Once you have a better
understanding of the use of statistics try applying the concepts learned here to the global
warming controversy or to the efficacy of medications. You might discover some very
Why are the underlying numbers important? Most articles, such as the one I'm quoting, report the
percentage of risk or the percentage of improvement but do not indicate the underlying numbers
that were used. As a result, the conclusions are based upon meaningless numbers.
For instance, the article mentioned above indicates that "studies show that exercise can
lower the risk of colon cancer by over 60%." On the surface this statement is most probably
accurate. In addition, it sounds pretty impressive. Who wouldn't want to reduce a
potentially fatal disease by over 60%? However, what does that number really mean?
Without the baseline number indicating actual risk, the claim of 60% reduction is meaningless.
For example, if the chances of having colon cancer (the following numbers are made up) were
1 out of 10 or 10%, and if our risk of colon cancer is reduced 60% by exercise, we now have
a 4% chance of colon cancer. That may be considered a significant enough reduction that a
person is willing to exercise routinely for the health benefit. However, the actual annual
incidence in the U.S. (I'm rounding all numbers so they are easier to understand) of
colorectal cancer is .05% or 1 out of 2000 if we use a population figure of 300 million.
So, if we consider that exercise reduces the "risk" of colon cancer, which is .05%, by 60%, we can
determine the actual change by multiplying the risk (.05%) by the reduction (60%). The result, .03%,
is the amount by which the risk drops, so the risk for exercisers is .02% or 1 out of 5000. What this
means is that if you are an exerciser the chance of colon cancer would be 1 out of 5000 instead of 1 out of 2000.
For some people, such as myself, this would be significant enough to change their behavior,
for others it would not. It depends upon what the change means to the individual. For
instance, if these numbers were instead reported for non-chocolate eaters (in other words,
I would have to give up chocolate) rather than exercisers, I would probably consider the
reduction in my quality of life to not be worth the reduction of risk. My point is that we
need to know the underlying numbers to make an informed decision instead of relying on someone
else to tell us what the numbers mean and what we "should" do.
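For readers who like to check the arithmetic, here is a minimal sketch in Python using the rounded figures above. The function names are illustrative labels, not anything taken from the article's sources. Multiplying the baseline risk by the reported reduction gives the size of the drop, and subtracting that drop from the baseline gives the risk that remains.

```python
def risk_after_reduction(baseline_risk, relative_reduction):
    """Return (absolute_drop, remaining_risk).

    Both arguments are fractions: 0.0005 means 0.05%, 0.60 means 60%.
    """
    absolute_drop = baseline_risk * relative_reduction
    remaining_risk = baseline_risk - absolute_drop
    return absolute_drop, remaining_risk


def one_out_of(risk):
    """Express a risk as '1 out of N', rounded to the nearest whole person."""
    return round(1 / risk)


# First the made-up 10% baseline, then the rounded 0.05% annual incidence.
examples = [("made-up 10% baseline", 0.10), ("rounded 0.05% baseline", 0.0005)]
for label, baseline in examples:
    drop, remaining = risk_after_reduction(baseline, 0.60)
    print(f"{label}: drop {drop:.4%}, remaining {remaining:.4%}, "
          f"about 1 out of {one_out_of(remaining)}")
```

The same 60% claim produces a six-point drop in the made-up case and a drop of only three hundredths of a point in the rounded real-world case, which is why the baseline matters.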
Certainly, the purpose of this article is not to give people another excuse not to exercise.
Therefore, let's use a different example that has different underlying numbers. The Wall
Street Journal (January 5, 2010) indicates that exercise "reduces the incidence of diabetes
by approximately 50%." You may think, "That's not even as good as the 60% reduction for
colon cancer and she just showed that my chances of colon cancer are not high." But wait
a minute! We need to look at the underlying numbers for diabetes and we will see an
entirely different picture.
The annual incidence of diabetes in the U.S. is 1 out of 340 or .29%; that is, 798,000 people
develop Type 2 diabetes each year. Therefore, to determine the difference regarding how
exercise impacts the development of diabetes, we again (as we did with colon cancer)
multiply the risk (.29%) by the reduction (50%) and find that the risk for exercisers drops by
half, to roughly .15% or 1 out of 680 (using a U.S. population figure of 275,000,000 as the source did).
Now this may be much more significant for many people. It certainly was for me. I was at
risk for developing diabetes 15 years ago because I already had an insulin-resistant disorder
and a family history of diabetes. However, I didn't like the idea of giving up sweets,
especially chocolate, for the rest of my life so I decided to lose weight and exercise.
As a result, my blood sugars, blood pressure, and cholesterol are all perfect and I can
still eat chocolate (within reason). Having information regarding the effect of exercise
on the development of diabetes provided me with a method to gain greater control over my health.
If this hasn't convinced you of both the importance of the underlying numbers for
understanding statistics as well as the importance of exercise, read the section below
under definition of incidence regarding hypertension.
Failure to clearly define terms can lead to inaccuracies in the reporting of statistics.
Frequently, by the time statistics are reported to the public, the writer is not the
statistician who compiled the statistics but someone who has little or no
training in the interpretation of statistics or understanding of mathematics. Therefore,
although statisticians may define terms precisely, the terms used in many articles are
imprecise or the same terms are used inaccurately. For instance, to understand the basics
of statistics one must understand the difference between prevalence and incidence of diseases.
Prevalence is the number of cases in a population during a particular time period, typically the
lifetime. Although prevalence can be any specified time period, most popular media use
the concept of "lifetime prevalence" although they frequently only state "prevalence" and
we must assume they are referring to lifetime prevalence.
However, it does make a difference what number they are reporting. For example,
the one-year prevalence of anxiety disorders in the adult population, meaning the
number of cases that are present in a typical year, is 18% (Kessler, Chiu, et al,
2005) whereas the lifetime prevalence for anxiety disorders in adults is almost
29% (Kessler, Berglund, et al, 2005).
So, if I were trying to make a point to a college class about how common anxiety
disorders are I could say, "Look around you. One out of every five people you see
in this room has an anxiety disorder" because I'm using the concept of one-year
prevalence and since few anxiety disorders are resolved in less than a year, I'm
referring to how many people currently present in the room have an anxiety disorder.
However, I could also make a similar, but different, point if I were to say to a
client with an anxiety disorder, "You are not unusual. Over the course of their
lifetime, almost 1 in 3 adults will suffer with an anxiety disorder." These comments
are based on different statistics: one-year prevalence versus lifetime prevalence.
Incidence is the number of new cases that occur in a population during
a specified period which is typically reported as annual incidence. Incidence is usually
reported as a rate which is the number of people who developed the disease during the period
divided by the population. So, if we are looking at the incidence of diabetes reported above
we can obtain the U.S. rate of .29% by dividing 798,000 (number of annual cases) by 275,000,000
(the population of the U.S. at the time the statistics were obtained). Incidence statistics
tend to be used to indicate the "risk" of developing a disease.
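The rate calculation described above can be sketched in a few lines of Python. This is only an illustration with the two figures quoted in the text; the function name `incidence_rate` is a hypothetical label.

```python
def incidence_rate(new_cases, population):
    """Annual incidence as a fraction: new cases in the year divided by the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population


# Figures quoted in the text: 798,000 new cases in a population of 275,000,000.
rate = incidence_rate(798_000, 275_000_000)
print(f"annual incidence: {rate:.2%}")
print(f"about 1 out of {round(1 / rate)}")
```

The unrounded result comes out slightly different from the "1 out of 340" used earlier, which was rounded to keep the numbers easy to follow.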
A great deal of confusion occurs between the terms of "prevalence" and "incidence" and,
unfortunately, many writers of publicly disseminated information use these terms
interchangeably. For example, when I searched the web for the incidence of hypertension
I found the following statement on a site called "Up to Date Online" providing medical
information to the public:
"NHANES data from 1999-2000 and United States Census bureau information demonstrated a
29 to 31 percent incidence (italics mine) of hypertension in the 18 year and older
population of the United States (www.utdol.com)."
Knowing the correct definition of "incidence," I would translate this statement to mean
that 29 to 31 percent of the population become new cases of hypertension every year. You can see how
ridiculous that is because it would mean the entire population of the U.S. would have
high blood pressure within 4 years! I don't know the quality of the site from which I
obtained this data; however, I do know that they are using the term "incidence"
inappropriately, which causes me to question other data they provide. This common
error of confusing the two terms also tends to create further inaccuracies in the reporting
of statistics. If I weren't a savvy reader I could easily pass on their incidence data
to other unsuspecting readers, contributing to the confusion among the public.
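A quick sanity check of this kind can be sketched in Python. It deliberately uses a crude reading of the misused figure and assumes that the same share of the total population becomes a new case every year and that nobody recovers. The function name is a hypothetical label.

```python
import math


def years_until_everyone_affected(annual_incidence):
    """Whole years until cumulative new cases cover the entire population.

    Assumes a constant share of the total population becomes a new case
    each year and that no case ever resolves (a crude simplification).
    """
    if not 0 < annual_incidence <= 1:
        raise ValueError("annual_incidence must be a fraction between 0 and 1")
    return math.ceil(1 / annual_incidence)


# The quoted "29 to 31 percent incidence," read literally as new cases per year.
for share in (0.29, 0.30, 0.31):
    print(f"{share:.0%} per year: everyone affected within "
          f"{years_until_everyone_affected(share)} years")
```

When a reported "incidence" implies that a whole country develops a chronic condition in a handful of years, the figure is almost certainly a prevalence.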
The Wall Street Journal reported that exercise reduces the incidence of high blood
pressure by 40%. Even though articles published on websites used the term "incidence,"
I had trouble finding true incidence figures because most information about high blood
pressure focuses upon prevalence statistics. This is due to the difference between a
chronic disorder such as hypertension and an acute disorder such as the flu. Annual
incidence is a more important statistic for the flu because the prevalence statistic
would not be meaningful as people only have the flu for a short period of time.
However, the Wall Street Journal article reported the risk or incidence of developing
hypertension for exercisers and non-exercisers so I needed to use the same type of
statistic in order to examine the meaning of the percentages. The annual incidence
is 1 out of 31 (extrapolating from Canadian statistics: Tu, 2008) or just over 3%
of the adult population. Again, multiplying risk (3%) by reduction (40%) gives a drop of a little
more than one percentage point, leaving an incidence of approximately 2% for exercisers, which
would be equivalent to about 1 out of 50.
Now that might be considered a substantial difference by most people's standards. In
addition to that, since hypertension is a chronic disorder, if everyone in the U.S.
exercised, the number of people with hypertension would be reduced over a 5-year
period by 14 million people!
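The five-year figure can be roughly reproduced with a short Python sketch. The adult population of 220 million used here is an assumption chosen for illustration and is not a figure from the article or its sources. The sketch also treats the incidence and the population as constant from year to year, which is a simplification.

```python
def cases_avoided(population, annual_incidence, relative_reduction, years):
    """Rough count of new cases avoided over several years.

    Holds the population and the incidence constant across years,
    which ignores people who already have the condition.
    """
    avoided_per_year = population * annual_incidence * relative_reduction
    return avoided_per_year * years


ASSUMED_ADULT_POPULATION = 220_000_000  # assumption, not a figure from the article
ANNUAL_INCIDENCE = 1 / 31               # "1 out of 31" from the text
RELATIVE_REDUCTION = 0.40               # the 40% reported by the newspaper

avoided = cases_avoided(ASSUMED_ADULT_POPULATION, ANNUAL_INCIDENCE,
                        RELATIVE_REDUCTION, years=5)
print(f"roughly {avoided / 1_000_000:.0f} million fewer cases over 5 years")
```

Under these assumptions the result lands close to the 14 million stated above. A different population figure would shift it proportionally.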
Sometimes agendas are beneficent and sometimes they are self-serving, but agendas always exist.
Therefore, to fully evaluate the statistics, the agenda of the reporter needs to be considered.
Usually the primary source of data, the scientific article, is fairly free from bias as it
typically needs to provide all the underlying numbers and to be clear and precise with the
interpretation of these numbers to be eligible for publishing in a scientific journal.
However, the further away from the primary source and the more times the results have been
paraphrased, the more potential for inaccuracies exists.
So, understanding the agenda of the reporter can help with interpreting the data. However,
certain assumptions must be made about agendas. For instance, I assume that the agenda for
the journalist who wrote The Wall Street Journal article described above is
to provide an interesting, accurate article that will inform the public about an important
topic. However, what is the agenda of her source for the statistics she provided? One
source mentioned in the sidebar to the article (from which the percentage statistics
discussed above were drawn) is the American College of Sports Medicine (ACSM). A search
on the internet shows the ACSM to be a professional society providing basic and applied
exercise science conferences, meetings and workshops. I certainly do not want to denigrate
an organization I know little about, but is it possible that their agenda is to convince
others of the importance of exercise so as to obtain paid registrants for their conferences
and workshops? If that were the case, they may be more likely to present data in a
positively skewed format such as using percentages instead of providing the underlying
numbers. Another possibility is that they did provide the underlying numbers
but the newspaper didn't use them because extreme numbers are more likely to sell newspapers.
I'm not saying that I know this is the case in this situation, but it is part of what we need
to consider when determining others' agendas.
To determine agendas we want to speculate as to what benefit the individual or the
organization obtains from providing the information. This allows us to determine why
we might find certain inaccuracies in the information provided. For instance, a statement
in the sidebar of The Wall Street Journal article states that, according to the ACSM, exercise
can "decrease depression as effectively as Prozac or behavioral therapy." As a behavioral
therapist myself, I am puzzled by this statement because I know that behavioral therapy is
about changing behaviors which would include the behavior of exercise. So, of course
exercise is as effective as behavioral therapy at decreasing depression because exercise
is one of the tools of behavioral therapy. What I don't know is what study this statement
was derived from: was it research examining the difference between medication, therapist-aided
behavioral therapy, and exercise alone? In which case, I would also want to know how the
subjects were motivated to exercise: were they self-motivated or did the researcher ask them
to exercise? If they were asked to exercise, how was that condition of the research
different from the therapist-aided behavioral therapy? I could continue with this line
of questioning but I just want to give you an idea of some of the questions that can be
considered when evaluating data that is provided to the public. For all I know, the source
of this statement may have an anti-therapy bias, and that agenda may bias the overall presentation
of the statistics.
Although there are many other issues to be considered when evaluating the statistics that
are presented by the national media, I hope that I have provided you with an understanding
of some of the most critical issues to evaluate as you read or hear statistics that are
presented. In another article I intend to use the concepts presented here to evaluate the
efficacy of medications and how to determine whether the benefits outweigh the side effects.
* "Figures often beguile me, particularly when I have the arranging of them myself; in which
case the remark attributed to Disraeli would often apply with justice and force: 'There are three
kinds of lies: lies, damned lies and statistics' (Benjamin Disraeli)."
- Mark Twain's Own Autobiography: The Chapters from the North American Review
Kessler R.C., Berglund P., Demler O., Jin R., Merikangas K.R., Walters E.E. (2005).
Lifetime Prevalence and Age-of-Onset Distributions of DSM-IV Disorders in the National
Comorbidity Survey Replication. Archives of General Psychiatry, 62, 593-602.
Kessler R.C., Chiu W.T., Demler O., Walters E.E. (2005). Prevalence, severity, and comorbidity
of twelve-month DSM-IV disorders in the National Comorbidity Survey Replication (NCS-R).
Archives of General Psychiatry, 62, 617-27.
Landro, L. (2010, Jan. 5). The Hidden Benefits of Exercise. The Wall Street Journal.
New York: Dow Jones & Company.
Tu, K., Chen, Z., & Lipscombe, L.L. for the Canadian Hypertension Education Program
Outcomes Research Taskforce (2008). Prevalence and incidence of hypertension from 1995
to 2005: a population-based study. Canadian Medical Association Journal, 178, 1429-1435.
Copyright © 2010 by Excel At Life, LLC
Permission to reprint this article for non-commercial use is granted if it includes this entire copyright
and an active link.
Not a day goes by when I don't throw down the morning newspaper complaining about the use of
statistics in an article. In our world the media liberally sprinkles statistics throughout
articles and television programs to support a point of view. The problem, however, is that
statistics are frequently misleading if not outright inaccurate. Without a clear understanding
of the nature of statistics and the definitions of statistical terms, the public believes the
statistic-supported statements as if they are fact. In addition, without understanding the
agenda of the journalist or analyst using the statistics, the public accepts these "facts"