Lies, Damned Lies, and Statistics

	Lies, Damned Lies, and Statistics by Monica A. Frank, Ph.D.

"Sometimes agendas are beneficent and sometimes they are self-serving, but agendas always exist. Therefore, to fully evaluate the statistics, the agenda of the reporter needs to be considered."

Not a day goes by when I don't throw down the morning newspaper complaining about the use of statistics in an article. In our world the media liberally sprinkles statistics throughout articles and television programs to support a point of view. The problem, however, is that statistics are frequently misleading if not outright inaccurate. Without a clear understanding of the nature of statistics and the definitions of statistical terms, the public believes the statistic-supported statements as if they are fact. In addition, without understanding the agenda of the journalist or analyst using the statistics, the public accepts these "facts" uncritically.

And yet, if we "look under the hood" we will find the true nature of how statistics work. Recently The Wall Street Journal published an article regarding the health benefits of exercise (Landro, January 5, 2010). Most people who know me or who have read much on my website know that I'm a strong supporter of exercise, so in choosing this article to review and dissect, I am choosing one that has the same agenda I do: increase people's awareness of the benefits of exercise. Therefore, there should be less bias on my part in critiquing such an article since I support the underlying premise. In addition, I chose The Wall Street Journal because of its reputation whereas an article from a less respected source could easily be dismissed as "atypical."

Even though I am using an article on exercise as my example of the problems with statistics, we could take any article on any topic and find the same problems. Once you have a better understanding of the use of statistics try applying the concepts learned here to the global warming controversy or to the efficacy of medications. You might discover some very interesting "facts."

Why are the underlying numbers important? Most articles, such as the one I'm quoting, report the percentage of risk or the percentage of improvement but do not give the underlying numbers that were used. As a result, the reader is left to draw conclusions from percentages that are meaningless on their own.

For instance, the article mentioned above indicates that "studies show that exercise can lower the risk of colon cancer by over 60%." On the surface this statement is most probably accurate. In addition, it sounds pretty impressive. Who wouldn't want to reduce the risk of a potentially fatal disease by over 60%? However, what does that number really mean? Without the baseline number indicating actual risk, the claim of a 60% reduction is meaningless. For example, if the chances of having colon cancer (the following numbers are made up) were 1 out of 10 or 10%, and if our risk of colon cancer is reduced 60% by exercise, the risk drops by 6 percentage points and we now have a 4% chance of colon cancer. That may be considered a significant enough reduction that a person is willing to exercise routinely for the health benefit. However, the actual annual incidence of colorectal cancer in the U.S. (I'm rounding all numbers so they are easier to understand) is .05% or 1 out of 2000 if we use a population figure of 300 million. So, to determine the actual change, we multiply the risk (.05%) by the reduction (60%) to find that the risk drops by .03 percentage points, which leaves a risk for exercisers of .02% or 1 out of 5000. What this means is that if you are an exerciser the chance of colon cancer would be 1 out of 5000 instead of 1 out of 2000.
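The arithmetic behind a relative reduction can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the inputs are the made-up 10% baseline and the rounded .05% annual incidence discussed above. What it shows is that a relative reduction scales the baseline risk, so the risk that remains is the baseline multiplied by (1 minus the reduction).

```python
def risk_after_reduction(baseline_risk, relative_reduction):
    """Risk remaining after a relative reduction (both given as fractions)."""
    return baseline_risk * (1 - relative_reduction)


def one_in(risk):
    """Express a risk as '1 out of N', rounded to the nearest whole N."""
    return round(1 / risk)


# Made-up baseline of 10% with a 60% relative reduction.
made_up = risk_after_reduction(0.10, 0.60)

# Rounded annual incidence of .05% (1 out of 2000) with the same reduction.
actual = risk_after_reduction(0.0005, 0.60)

print(f"made-up example: {made_up:.1%} remaining risk")
print(f"rounded incidence: 1 out of {one_in(0.0005)} becomes 1 out of {one_in(actual)}")
```

The same 60% figure produces a very different absolute change depending on the baseline it is applied to, which is why the baseline has to be reported alongside the percentage.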

For some people, such as myself, this would be significant enough to change their behavior; for others it would not. It depends upon what the change means to the individual. For instance, if these numbers were instead reported for non-chocolate eaters (in other words, I would have to give up chocolate) rather than exercisers, I would probably decide that the reduction in my quality of life was not worth the reduction of risk. My point is that we need to know the underlying numbers to make an informed decision instead of relying on someone else to tell us what the numbers mean and what we "should" do.

Certainly, the purpose of this article is not to give people another excuse not to exercise. Therefore, let's use a different example that has different underlying numbers. The Wall Street Journal (January 5, 2010) indicates that exercise "reduces the incidence of diabetes by approximately 50%." You may think, "That's not even as good as the 60% reduction for colon cancer and she just showed that my chances of colon cancer are not high." But wait a minute! We need to look at the underlying numbers for diabetes and we will see an entirely different picture.

The annual incidence of diabetes in the U.S. is 1 out of 340 or .29%: 798,000 people develop Type 2 diabetes each year (using a U.S. population figure of 275,000,000 as the source did). Therefore, to determine how exercise affects the development of diabetes, we again (as we did with colon cancer) multiply the risk (.29%) by the reduction (50%) and find that the risk for exercisers is about .15% or roughly 1 out of 690. Now this may be much more significant for many people. It certainly was for me. I was at risk for developing diabetes 15 years ago because I already had an insulin-resistant disorder and a family history of diabetes. However, I didn't like the idea of giving up sweets, especially chocolate, for the rest of my life so I decided to lose weight and exercise. As a result, my blood sugars, blood pressure, and cholesterol are all perfect and I can still eat chocolate (within reason). Having information regarding the effect of exercise on the development of diabetes provided me with a method to gain greater control over my health.
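One way to see why the smaller percentage can matter more is to put both conditions on the same absolute scale. The sketch below is an assumption-laden illustration, not the article's own method: it uses the rounded incidence figures quoted above, and the group size of 100,000 people is a hypothetical chosen only to make the output easy to read.

```python
# Rounded annual incidence figures and relative reductions quoted in the article.
conditions = {
    "colon cancer": {"incidence": 0.0005, "relative_reduction": 0.60},
    "diabetes": {"incidence": 0.0029, "relative_reduction": 0.50},
}


def absolute_reduction(incidence, relative_reduction):
    """Drop in annual risk, as a fraction of the population."""
    return incidence * relative_reduction


def avoided_per_100k(incidence, relative_reduction):
    """New cases avoided per year in a hypothetical group of 100,000 people."""
    return absolute_reduction(incidence, relative_reduction) * 100_000


for name, c in conditions.items():
    avoided = avoided_per_100k(c["incidence"], c["relative_reduction"])
    print(f"{name}: about {avoided:.0f} fewer new cases per 100,000 people per year")
```

Under these rounded figures the 50% reduction in diabetes corresponds to several times more avoided cases than the 60% reduction in colon cancer, simply because the baseline risk is higher.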

If this hasn't convinced you of both the importance of the underlying numbers for understanding statistics and the importance of exercise, read the discussion of hypertension below, under the definition of incidence.

Definitions

Failure to clearly define terms can lead to inaccuracies in the reporting of statistics. Frequently, by the time statistics are reported to the public, the writer is not the statistician who compiled them but someone who has little or no training in the interpretation of statistics or understanding of mathematics. Therefore, although statisticians may define terms precisely, the terms used in many articles are imprecise or are used inaccurately. For instance, to understand the basics of statistics one must understand the difference between prevalence and incidence of diseases.

Prevalence

Prevalence is the number of cases in a population during a particular time period, typically the lifetime. Prevalence can refer to any specified time period, but most popular media mean "lifetime prevalence"; they frequently state only "prevalence," and we must assume they are referring to lifetime prevalence.

However, it does make a difference which number they are reporting. For example, the one-year prevalence of anxiety disorders in the adult population, meaning the number of cases that are present in a typical year, is 18% (Kessler, Chiu, et al., 2005) whereas the lifetime prevalence of anxiety disorders in adults is almost 29% (Kessler, Berglund, et al., 2005).

So, if I were trying to make a point to a college class about how common anxiety disorders are, I could say, "Look around you. One out of every five people you see in this room has an anxiety disorder." Here I'm using the concept of one-year prevalence, and since few anxiety disorders are resolved in less than a year, I'm referring to how many people currently present in the room have an anxiety disorder. However, I could also make a similar, but different, point if I were to say to a client with an anxiety disorder, "You are not unusual. Over the course of their lifetime, almost 1 in 3 adults will suffer with an anxiety disorder." These comments are based on different statistics: one-year prevalence versus lifetime prevalence.

Incidence

Incidence is the number of new cases that occur in a population during a specified period, which is typically a year (annual incidence). Incidence is usually reported as a rate, which is the number of people who developed the disease during the period divided by the population. So, if we are looking at the incidence of diabetes reported above, we can obtain the U.S. rate of .29% by dividing 798,000 (the number of new cases per year) by 275,000,000 (the population of the U.S. at the time the statistics were obtained). Incidence statistics tend to be used to indicate the "risk" of developing a disease.
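The incidence rate described here is a single division, sketched below with the diabetes figures from the text. The function name is hypothetical. Note that the unrounded result comes out a little above 1 out of 340, because the article rounds its figures.

```python
def incidence_rate(new_cases, population):
    """Annual incidence as a fraction: new cases divided by population."""
    return new_cases / population


# Diabetes figures quoted in the article.
rate = incidence_rate(798_000, 275_000_000)

print(f"annual incidence: {rate:.2%}")
print(f"roughly 1 out of {round(1 / rate)}")
```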

A great deal of confusion occurs between the terms "prevalence" and "incidence" and, unfortunately, many writers of publicly disseminated information use these terms interchangeably. For example, when I searched the web for the incidence of hypertension I found the following statement on a site called "Up to Date Online," which provides medical information to the public:

"NHANES data from 1999-2000 and United States Census bureau information demonstrated a 29 to 31 percent incidence (italics mine) of hypertension in the 18 year and older population of the United States (www.utdol.com)."

Knowing the correct definition of "incidence," I would translate this statement to mean that 29 to 31 percent of the population develops a new case of hypertension every year. You can see how ridiculous that is because it would mean the entire population of the U.S. would have high blood pressure within 4 years! I don't know the quality of the site from which I obtained this data. However, I do know that they are using the term "incidence" inappropriately, which causes me to question other data they provide. This common error of confusing the two terms also tends to create further inaccuracies in the reporting of statistics. If I weren't a savvy reader I could easily pass on their incidence data to other unsuspecting readers, contributing to the confusion among the public.
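This kind of sanity check is easy to sketch. The snippet below takes the claimed "incidence" at face value and, as a deliberate simplification, adds the same share of the population as new cases every year until the whole population is covered. The function name is hypothetical.

```python
import math


def years_until_everyone_affected(annual_incidence):
    """Years of new cases needed to cover the whole population,
    naively adding the same share of the population each year."""
    return math.ceil(1 / annual_incidence)


# The quoted range of 29 to 31 percent, read literally as annual incidence.
for claimed in (0.29, 0.31):
    years = years_until_everyone_affected(claimed)
    print(f"{claimed:.0%} new cases per year: everyone affected within {years} years")
```

A figure that saturates the population this quickly cannot be an annual rate of new cases for a chronic condition, which is a strong hint that the number is really a prevalence.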

The Wall Street Journal reported that exercise reduces the incidence of high blood pressure by 40%. Even though articles published on websites used the term "incidence," I had trouble finding true incidence figures because most information about high blood pressure focuses upon prevalence statistics. This is due to the difference between a chronic disorder such as hypertension and an acute disorder such as the flu. Annual incidence is the more important statistic for the flu: because people only have the flu for a short period of time, a prevalence statistic would not be meaningful.

However, the Wall Street Journal article reported the risk or incidence of developing hypertension for exercisers and non-exercisers, so I needed to use the same type of statistic in order to examine the meaning of the percentages. The annual incidence is 1 out of 31 (extrapolating from Canadian statistics: Tu, 2008) or over 3% of the adult population. Again, multiplying the risk (3%) by the reduction (40%) shows that the risk drops by a little over 1 percentage point, which leaves an incidence of approximately 2% for exercisers, equivalent to about 1 out of 50. Now that might be considered a substantial difference by most people's standards. In addition to that, since hypertension is a chronic disorder, if everyone in the U.S. exercised, the number of people with hypertension would be reduced over a 5-year period by 14 million people!
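The article does not say how the 5-year figure was derived. As a rough sketch under stated assumptions, one calculation that lands in the same range multiplies the adult population by the annual incidence, the relative reduction, and the number of years. The adult population of 220 million is an assumption introduced here for illustration, not a number from the article, and the sketch ignores population change and deaths.

```python
ADULT_POPULATION = 220_000_000  # assumption for illustration, not from the article
ANNUAL_INCIDENCE = 1 / 31       # incidence figure quoted in the text
RELATIVE_REDUCTION = 0.40       # reduction attributed to exercise
YEARS = 5


def cases_avoided(population, incidence, reduction, years):
    """New cases avoided if the whole population had the reduced risk."""
    return population * incidence * reduction * years


avoided = cases_avoided(ADULT_POPULATION, ANNUAL_INCIDENCE, RELATIVE_REDUCTION, YEARS)
print(f"about {avoided / 1e6:.0f} million fewer cases over {YEARS} years")
```

Because hypertension is chronic, avoided cases accumulate from year to year, which is why a difference of roughly one percentage point in annual incidence adds up to millions of people.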

Agendas

Sometimes agendas are beneficent and sometimes they are self-serving, but agendas always exist. Therefore, to fully evaluate the statistics, the agenda of the reporter needs to be considered. Usually the primary source of data, the scientific article, is fairly free from bias because, to be eligible for publication in a scientific journal, it typically needs to provide all the underlying numbers and to interpret them clearly and precisely. However, the further away from the primary source and the more times the results have been paraphrased, the more potential for inaccuracies exists.

So, understanding the agenda of the reporter can help with interpreting the data. However, certain assumptions must be made about agendas. For instance, I assume that the agenda of the journalist who wrote The Wall Street Journal article described above is to provide an interesting, accurate article that will inform the public about an important topic. However, what is the agenda of her source for the statistics she provided? One source mentioned in the sidebar to the article (from which the percentage statistics discussed above were drawn) is the American College of Sports Medicine (ACSM). A search on the internet shows the ACSM to be a professional society providing basic and applied exercise science conferences, meetings and workshops. I certainly do not want to denigrate an organization I know little about, but is it possible that their agenda is to convince others of the importance of exercise so as to obtain paid registrants for their conferences and workshops? If that were the case, they may be more likely to present data in a positively skewed format such as using percentages instead of providing the underlying numbers. Another possibility is that they did provide the underlying numbers but the newspaper didn't use them because extreme numbers are more likely to sell newspapers. I'm not saying that I know this is the case in this situation, but it is part of what we need to consider when determining others' agendas.

To determine agendas we want to speculate as to what benefit the individual or the organization obtains from providing the information. This allows us to determine why we might find certain inaccuracies in the information provided. For instance, a statement in the sidebar of The Wall Street Journal article states that according to the ACSM exercise can "decrease depression as effectively as Prozac or behavioral therapy." As a behavioral therapist myself, I am puzzled by this statement because I know that behavioral therapy is about changing behaviors, which would include the behavior of exercise. So, of course exercise is as effective as behavioral therapy at decreasing depression because exercise is one of the tools of behavioral therapy. What I don't know is what study this statement was derived from: was it research examining the difference between medication, therapist-aided behavioral therapy, and exercise alone? In that case, I would also want to know how the subjects were motivated to exercise: were they self-motivated or did the researcher ask them to exercise? If they were asked to exercise, how was that condition of the research different from the therapist-aided behavioral therapy? I could continue with this line of questioning but I just want to give you an idea of some of the questions that can be considered when evaluating data that is provided to the public. For all I know, the source of this statement may have an anti-therapy agenda, which may bias the overall presentation of the statistics.

Although there are many other issues to be considered when evaluating the statistics presented by the national media, I hope that I have provided you with an understanding of some of the most critical issues to evaluate as you read or hear them. In another article I intend to use the concepts presented here to evaluate the efficacy of medications and how to determine whether the benefits outweigh the side effects.

* "Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: 'There are three kinds of lies: lies, damned lies and statistics (Benjamin Disraeli).'"
- Mark Twain's Own Autobiography: The Chapters from the North American Review

Kessler R.C., Berglund P., Demler O., Jin R., Merikangas K.R., Walters E.E. (2005). Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Archives of General Psychiatry, 62, 593-602.

Kessler R.C., Chiu W.T., Demler O., Walters E.E. (2005). Prevalence, severity, and comorbidity of twelve-month DSM-IV disorders in the National Comorbidity Survey Replication (NCS-R). Archives of General Psychiatry, 62, 617-27.

Landro, L. (2010, Jan. 5). The Hidden Benefits of Exercise. The Wall Street Journal. New York: Dow Jones & Company.

Tu, K., Chen, Z., & Lipscombe, L.L. for the Canadian Hypertension Education Program Outcomes Research Taskforce (2008). Prevalence and incidence of hypertension from 1995 to 2005: a population-based study. Canadian Medical Association Journal, 178, 1429-1435.
