
Protecting Yourself in the Age of Information: Simpson’s Paradox

A growing problem in today’s digital age is the propensity for false or misleading information to become mixed with the legitimate, muddying the proverbial waters and making it difficult to navigate the sea of information without getting some of the contaminated muck on yourself. Whether it’s Trump’s constant tirades about “fake news” or the latest online article about the “New Cancer Drug [which] Kills 100% of Cancer Cells!”, it becomes difficult to separate the fact from the fiction, the advertisement from the news, the informative from the fluff. One of the most common sources of confusion is statistics, which can be used to mislead readers.

 

Consider Simpson’s Paradox, named after statistician Edward Simpson, which illustrates how surface-level data can fool you by not revealing what lies beneath. The classic example of Simpson’s Paradox involves a case from 1973 when UC Berkeley was sued for gender discrimination against women based on admissions figures:

 

| | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
|---|---|---|---|---|
| Total | 8442 | 44% | 4321 | 35% |

 

The data shows that men are significantly more likely to be accepted into UC Berkeley than women. Why is this misleading? Let’s look at the data for admission rates in the six largest departments at UC Berkeley:

 

 

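| Department | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
|---|---|---|---|---|
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 373 | 6% | 341 | 7% |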

 

Notice something funny? In four of the six departments (A, B, D and F), women are actually more likely than men to be accepted. Then why do the totals show a higher proportion of men being admitted?

 

Direct your attention to the row for Department A and compare the number of applicants for men and women. Even though women are admitted to Department A at a rate 20 percentage points higher than men, only 108 women applied compared with 825 men, so the raw number of women admitted is far lower. The same pattern holds for Department B.

 

Compare that to Department C, where far more women than men are applying, but only 34% of them are being admitted. It turns out women tended to apply to highly competitive departments with low rates of admission, while men tended to gravitate toward less competitive departments with high rates of admission. That difference in where people applied, rather than lower admission rates for women within each department, explains why the surface-level data suggests gender discrimination.
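
To make the weighting effect concrete, here is a minimal Python sketch that pools the six departments from the table. It assumes the number admitted can be approximated as applicants × the rounded admission rate, since the table only gives percentages; the names `departments` and `pooled_rate` are made up for illustration.

```python
# Figures copied from the six-department table:
# department -> (men applicants, men rate, women applicants, women rate)
departments = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (373, 0.06, 341, 0.07),
}


def pooled_rate(applicants_and_rates):
    """Applicant-weighted average admission rate across departments."""
    pairs = list(applicants_and_rates)
    # Assumption: admitted is approximated as applicants * rounded rate.
    admitted = sum(n * rate for n, rate in pairs)
    total = sum(n for n, _ in pairs)
    return admitted / total


men_rate = pooled_rate((m, mr) for m, mr, _, _ in departments.values())
women_rate = pooled_rate((w, wr) for _, _, w, wr in departments.values())

# Departments in which women have the higher admission rate.
women_ahead = [d for d, (_, mr, _, wr) in departments.items() if wr > mr]

print(f"Departments where women have the higher rate: {women_ahead}")
print(f"Pooled rate, men:   {men_rate:.1%}")
print(f"Pooled rate, women: {women_rate:.1%}")
```

Under that approximation, the pooled rate for these six departments comes out at roughly 45% for men and 30% for women, even though women have the higher rate in four of them. These pooled figures cover only the six departments listed, so they will not match the university-wide totals exactly.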

 

 

Other examples are in sports statistics, such as in hockey. Consider the following hypothetical goalie save% for two goalies, Swiss Cheese and Mr. Sieve, across two years:

 

| Goalie | 2017 | 2018 | 2017 and 2018 |
|---|---|---|---|
| Swiss Cheese | 456/487 (0.936) | 2003/2301 (0.870) | 2459/2788 (0.882) |
| Mr. Sieve | 2116/2312 (0.915) | 1544/1789 (0.863) | 3660/4101 (0.892) |

 

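Despite Swiss Cheese having the better save% in both 2017 and 2018, his overall save% for the two years is lower than that of Mr. Sieve. If you look at the number of shots each goalie faced in each year, you can begin to understand why: Swiss Cheese faced the vast majority of his shots in 2018, when both goalies had a lower save%, while Mr. Sieve faced more of his shots in 2017.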
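
The same check can be run on the goalie numbers. The short Python sketch below recomputes each season's save% and the combined figure from the saves and shots in the hypothetical table, and prints what share of each goalie's shots fell in each year; the helper names `save_pct` and `combined` are invented for this example.

```python
# (saves, shots) per season, copied from the hypothetical table
goalies = {
    "Swiss Cheese": {"2017": (456, 487), "2018": (2003, 2301)},
    "Mr. Sieve": {"2017": (2116, 2312), "2018": (1544, 1789)},
}


def save_pct(saves, shots):
    """Fraction of shots stopped."""
    return saves / shots


def combined(seasons):
    """Pool saves and shots across all seasons for one goalie."""
    total_saves = sum(saves for saves, _ in seasons.values())
    total_shots = sum(shots for _, shots in seasons.values())
    return total_saves, total_shots


for name, seasons in goalies.items():
    total_saves, total_shots = combined(seasons)
    for year, (saves, shots) in seasons.items():
        # Share of this goalie's two-year workload that fell in this season.
        share = shots / total_shots
        print(f"{name} {year}: save% {save_pct(saves, shots):.3f}, "
              f"{share:.1%} of his shots")
    print(f"{name} combined: {total_saves}/{total_shots} "
          f"= {save_pct(total_saves, total_shots):.3f}")
```

The combined save% is a shot-weighted average of the two seasons, so the season in which a goalie faced more shots dominates his overall number.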

 

So what can we take away from understanding Simpson’s Paradox? Simply this: be careful about taking every statistic you see at face value because, as with anything, there’s usually something hiding under the hood.
