SonVolt wrote: Have you ever switched from one beer to another and been slapped in the face with how weird/awful the 2nd beer tastes until your palate readjusts?
Keep in mind we're comparing two samples of the same coffee. To put it another way, it's more like comparing Coke and Pepsi than a stout and a lager.
homeburrero wrote: That result is well within expectations due to random chance.
In this specific case, the difference was clear enough that I'm confident that if we had increased the number of samples, the proportions would remain about the same. That is, if I had been served 4 pairs, I'm confident I would have picked the same one every time. I consider myself an average taster and the difference was significant enough to overcome my middling taste abilities.
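To put rough numbers on the "random chance" point, here is a minimal sketch of the arithmetic, assuming each pair is an independent two-way choice and a taster who cannot tell the cups apart picks either one with probability 0.5. The function names and the pair counts are illustrative assumptions, not figures from the actual test.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Chance of k or more picks of one given cup in n independent pairs,
    when each pick lands on that cup with probability p (0.5 = pure guessing)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def prob_unanimous(n):
    """Chance that a pure guesser picks the same cup (either one) in all n pairs."""
    return 2 * 0.5**n

if __name__ == "__main__":
    # Hypothetical numbers of pairs served to a single taster.
    for n in (1, 2, 4, 8):
        print(f"{n} pairs: given cup every time = {prob_at_least(n, n):.4f}, "
              f"either cup every time = {prob_unanimous(n):.4f}")
```

Under these assumptions, a pure guesser served 4 pairs would pick one particular cup all four times about 6% of the time (1 in 16), and would be unanimous for one cup or the other about 12.5% of the time.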
baldheadracing wrote: Two different machines, two different baristas, and two different grinders (same model, but maybe not the same design burrsets!): too many effects for the experiment's design to handle.
I was surprised by Ben's comment about the burrs; nobody mentioned that prior to the test or during our group discussion.
For what it's worth, I have noticed that when the "expected favorite" doesn't win the group taste test, all sorts of interesting theories emerge to explain the unanticipated result. For example, in the Ratio Eight Brewer Review group taste test, there was talk that the Bonavita not sitting level had hurt its results. We did another round with the level verified and the two units in their standard configuration (we had used the same KONE filters in the first round). There was no change.
The main reason reviews include a "Forgiveness Factor" is that I believe it has a significant impact on real-world results. In the hands of a world-class barista and with scrupulous attention to eliminating all possible variables, you may be able to demonstrate statistical significance that most will accept. But most buyers are not world-class baristas. They're average home baristas, and for them I believe ease of use carries more weight than absolute potential. In the case of last week's group taste test, I believe that was the main difference.