Comparison Tests are about statements like these:
- "90% like Brand A better than Brand X."
- "For the treatment of vegetative myopathy, Simglobulon significantly outperformed placebos in the FDA approval trials."
- "I can't tell the difference between properly frozen and freshly roasted coffee when used for espresso."
In each of these statements, two things are being compared, and an assertion is being made about their difference or the lack of one. How are such statements tested?
The basics are really easy, and frequently misunderstood. Suppose you are comparing A & B:
- If A beats B every single time, the pattern becomes obvious in 3 to 4 trials, and pretty much incontrovertible after 7 to 10 trials.
- But suppose A beats B only 11 times out of 20. Then the pattern only becomes obvious after 100 to 200 trials, and it takes around 500 to 1,000 trials to become pretty much incontrovertible.
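The two cases above can be checked with a little arithmetic. What follows is a minimal sketch, not necessarily the method behind the figures in this post: it assumes a one-sided exact binomial test against a 50/50 coin flip, and the function name `one_sided_p_value` is made up for illustration. It returns the chance of seeing at least that many wins for A if A and B were really equal.

```python
from math import comb

def one_sided_p_value(wins: int, trials: int) -> float:
    """Chance of at least `wins` wins for A in `trials` if A and B are truly 50/50."""
    # Count the outcomes with wins, wins + 1, ..., trials wins for A,
    # then divide by the 2**trials equally likely outcomes.
    favourable = sum(comb(trials, k) for k in range(wins, trials + 1))
    return favourable / 2 ** trials

# A wins every one of 5 trials: unlikely to be luck.
print(one_sided_p_value(5, 5))    # 0.03125

# A wins 11 of 20: entirely consistent with a coin flip (about 0.41).
print(one_sided_p_value(11, 20))
```

A clean sweep of 5 trials would happen by luck only about 3 times in 100, while 11 wins out of 20 would happen by luck about 4 times in 10, which is why the second case needs so many more trials.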
"Pretty much incontrovertible," "saying for certain," "beyond all reasonable doubt": statisticians have a phrase that means the same thing, "a statistically significant result." This does not mean the difference between A & B is big; it means enough trials have been run to say that the difference, whatever its size, exists beyond reasonable doubt. Statisticians put a number on this significance, typically 5%, 1%, or 0.1%. Strictly, that number is the chance of seeing a result this lopsided if A and B were really equal: 5 times out of 100, 1 time out of 100, or 1 time out of 1,000. Loosely, people read it as being 95%, 99%, or 99.9% sure that the asserted difference is real.
So how many trials does it take to establish a significant difference? That depends on how much better A is than B. Here's a handy table:
----------------------------------------------------------------
How often          Trials needed for significance at
A beats B          5%            1%            0.1%
----------------------------------------------------------------
100%               5             7             10
90%                8             11            14
80%                11            19            27
70%                21            37            65
60%                80            145           245
55%                300           560           980
51%                7,000         14,000        24,000
----------------------------------------------------------------
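For readers who want to try other win rates, here is a rough sketch of how trial counts of this kind can be estimated. The post does not say how the table was computed, so this is an assumption: it uses a one-sided normal approximation to the 50/50 coin flip, and the function name `trials_needed` is hypothetical. It lands in the same ballpark as the table for the lower win rates, but it is too optimistic near 100%, where the counts are small and an exact calculation is the better tool.

```python
from math import ceil
from statistics import NormalDist

def trials_needed(win_rate: float, alpha: float) -> int:
    """Rough number of trials at which an observed win_rate becomes
    significant at level alpha (one-sided normal approximation)."""
    if not 0.5 < win_rate <= 1.0:
        raise ValueError("win_rate must be above 0.5 and at most 1.0")
    # Critical value: how many standard deviations above 50% we must be.
    z = NormalDist().inv_cdf(1 - alpha)
    # If A and B are equal, the observed win fraction has standard
    # deviation 0.5 / sqrt(n). Solve (win_rate - 0.5) = z * 0.5 / sqrt(n).
    return ceil((z * 0.5 / (win_rate - 0.5)) ** 2)

for rate in (0.70, 0.60, 0.55, 0.51):
    counts = [trials_needed(rate, a) for a in (0.05, 0.01, 0.001)]
    print(f"{rate:.0%}: {counts}")
```

The key point is the squared term: halving the edge that A has over B roughly quadruples the number of trials needed.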
So the next time you hear about a drug trial that used a cast of thousands, ask whether the drug company was being really thorough, or whether the test needed that many people because the drug's potency isn't all that different from a placebo's.
As far as coffee tests are concerned, my guess is that people are hardly interested in making a change unless it beats their current setup at least 6 times out of 10. Unfortunately, even that requires more trials (75 to 200) than any of us amateur testers can easily handle. So, if we discover no difference, the true split is probably somewhere between 30/70 and 70/30, and if we do discover a difference, it's because it's more extreme than that.
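To see what a small amateur test can actually detect, here is a minimal sketch under the same assumption of a one-sided exact binomial test at the 5% level. The function name `wins_needed` is made up for illustration. It finds the smallest number of wins for A that would count as significant in a given number of trials.

```python
from math import comb

def wins_needed(trials: int, alpha: float = 0.05):
    """Smallest win count for A that is significant at `alpha`
    (one-sided, against a 50/50 coin flip), or None if even a
    clean sweep is not enough."""
    tail = 0        # number of outcomes with at least k wins for A
    needed = None
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / 2 ** trials > alpha:
            break   # k wins is too easy to get by luck
        needed = k
    return needed

for n in (10, 20):
    print(n, "trials:", wins_needed(n), "wins needed")
```

With 10 trials A has to win 9, and with 20 trials A has to win 15. That is a 75% win rate or better, so a test of this size can only pick up fairly lopsided preferences.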
I hope this has been educational. And now, back to our regular programming.



