...

One has to be very careful with raw data

Even accurate data may easily yield wrong conclusions

We will consider several examples today

A pharmaceutical company promotes a new treatment for acne:

Treatment | Mild Acne: Cured | Mild Acne: Not Cured | Severe Acne: Cured | Severe Acne: Not Cured |
---|---|---|---|---|
Old Treatment | 2 | 8 | 40 | 40 |
New Treatment | 30 | 60 | 12 | 8 |

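The new treatment looks better in both groups: 30/90 = 33.3% versus 2/10 = 20% cured for mild acne, and 60% versus 50% for severe acne.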

But overall, 90 patients received the old treatment and 42 were cured: a 42/90 = 46.7% success rate

110 patients received the new treatment and 42 were cured: a 42/110 = 38.2% success rate

The artificial subdivision into "mild" and "severe" cases made it possible to present the data so that the new treatment looks better, while overall it is not!
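The reversal can be checked directly from the counts in the table. The following is only a minimal sketch: the dictionary layout and the helper names `cure_rate` and `pooled_rate` are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# (cured, not cured) counts copied from the acne table.
# The nested-dict layout is an assumption made for this sketch.
data = {
    "old": {"mild": (2, 8), "severe": (40, 40)},
    "new": {"mild": (30, 60), "severe": (12, 8)},
}

def cure_rate(cured, not_cured):
    """Fraction of patients cured within a single group."""
    return Fraction(cured, cured + not_cured)

def pooled_rate(groups):
    """Cure rate after merging all severity groups of one treatment."""
    cured = sum(c for c, _ in groups.values())
    total = sum(c + n for c, n in groups.values())
    return Fraction(cured, total)

for treatment, groups in data.items():
    per_group = {g: round(float(cure_rate(*counts)), 3) for g, counts in groups.items()}
    print(treatment, per_group, "pooled:", round(float(pooled_rate(groups)), 3))
```

The new treatment wins in each severity group but loses after pooling, because most of its patients had mild acne, where cure rates are low for both treatments.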

Basketball example

Player | First Half: Baskets | First Half: Attempts | First Half: Percent | Second Half: Baskets | Second Half: Attempts | Second Half: Percent |
---|---|---|---|---|---|---|
Kevin | 4 | 10 | 40% | 3 | 4 | 75% |
Kobe | 1 | 4 | 25% | 7 | 10 | 70% |

Kevin has the better percentage in each half, yet over the whole game Kobe comes out ahead:

Player | Baskets | Attempts | Percent |
---|---|---|---|
Kevin | 7 | 14 | 50% |
Kobe | 8 | 14 | 57% |
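One way to see why the whole-game ranking can differ from the per-half ranking: the whole-game percentage is an average of the two half percentages weighted by the number of attempts in each half, not a plain average. Using only the numbers from the tables:

\[
\text{Kevin: } \frac{10 \times 0.40 + 4 \times 0.75}{10 + 4} = \frac{4 + 3}{14} = 50\%,
\qquad
\text{Kobe: } \frac{4 \times 0.25 + 10 \times 0.70}{4 + 10} = \frac{1 + 7}{14} \approx 57\%
\]

Kevin took most of his shots in his weaker half, while Kobe took most of his in his stronger half.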

In 85% of cases, a mammogram correctly identifies a tumor as benign (not cancer) or malignant (cancer)

How threatening can a positive mammogram be?

Out of 10,000 tumors, 100 are malignant, and 85 of them will be correctly identified by mammograms.

Out of these 10,000 tumors, 9900 are benign, and \( 0.15 \times 9900 = 1485 \) of them will be misidentified by mammogram as cancer.

Thus out of 1485 + 85 = 1570 positive mammograms, only 85 are actually cancer: 85/1570 = 0.054, or 5.4% of cases.

Not as threatening as it looks!

Assumption made: only 1% of tumors (100 out of 10,000) are malignant. What if 20%?

Now out of 10,000 tumors, 2,000 are malignant, and 1,700 of them will be correctly identified by mammograms.

Out of these 10,000 tumors, 8,000 are benign, and \( 0.15 \times 8000 = 1200 \) of them will be misidentified by mammogram as cancer.

Thus out of 1,700 + 1,200 = 2,900 positive mammograms, 1,700 are actually cancer: 1700/2900 = 0.586, or 58.6% of cases.

Now it looks bad!

Overly optimistic assumption: just 0% of tumors (0 out of 10,000) are malignant.

No worry in this case, whatever the mammogram indicates: no cancer cases

Overly pessimistic assumption: 100% of tumors (all 10,000 out of 10,000) are malignant

The worst scenario: 8,500 mammograms come back positive, and every one of them is a true cancer case

Conclusion: how threatening a positive result is depends not only on the accuracy of the identification, but also on the frequency of malignant tumors among all tumors
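The four scenarios above follow the same arithmetic, so they can be reproduced with one small function. This is a sketch under the assumption already implicit in the examples: the same 85% accuracy applies to malignant and to benign tumors. The name `positive_predictive_value` is chosen here for illustration.

```python
def positive_predictive_value(prevalence, accuracy=0.85):
    """Share of positive mammograms that are truly malignant.

    prevalence: fraction of all tumors that are malignant.
    accuracy:   assumed to hold equally for malignant and benign tumors.
    """
    true_positives = accuracy * prevalence                # malignant, flagged as cancer
    false_positives = (1 - accuracy) * (1 - prevalence)   # benign, flagged as cancer
    flagged = true_positives + false_positives
    if flagged == 0:
        return float("nan")  # no positive results at all
    return true_positives / flagged

# The four assumptions discussed in the text: 0%, 1%, 20%, 100% malignant.
for prevalence in (0.0, 0.01, 0.20, 1.0):
    share = positive_predictive_value(prevalence)
    print(f"{prevalence:.0%} malignant: {share:.1%} of positive mammograms are cancer")
```

With the accuracy fixed at 85%, the result moves from 0% to 100% purely as a function of how common malignant tumors are.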

Assume that 4% of 1,000 athletes (i.e. 40 people) use performance-enhancing drugs

A 95% accurate test will give a false positive for 5% of the remaining 960 athletes

That comes out as \(0.05 \times 960 = 48\) innocent athletes accused

The test will correctly identify \(0.95 \times 40=38\) drug users

Thus out of 38+48=86 accused only 38/86=0.44, i.e. 44% indeed used the drugs, and 56% did not.

That is a result of a 95% accurate test.
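The same count can be written as a single conditional probability (Bayes' rule). This assumes, as the calculation above does, that "95% accurate" means 95% of users test positive and 95% of non-users test negative:

\[
P(\text{user} \mid \text{positive})
= \frac{0.95 \times 0.04}{0.95 \times 0.04 + 0.05 \times 0.96}
= \frac{0.038}{0.086} \approx 0.44
\]

The numerator corresponds to the 38 correctly identified users and the denominator to all 86 accused athletes, each divided by 1,000.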

2011 effect of 2001 tax cut

While the chart on the left suggests that the rich pay a bigger share of taxes, the chart on the right shows that they pay a smaller amount in absolute dollars.

Clearly, that is because the total tax revenue was smaller.