For years I've been testing espresso machines and grinders and writing reviews. One of the highlights of the research phase is a group taste test, frequently held at Counter Culture's training center in Durham, NC. The tasting itself is done blinded, i.e., we discreetly mark the bottom of one cup and hand the taster two espressos. They simply have to pick their favorite and place that cup on the "winner" side. In the end, we typically have at least 8 pairs. A few times it's ended in a near draw. Only once has there been a complete blowout where one was picked unanimously. Most of the time, there's a clear winner, but it's far from a landslide victory.
In preparation for the tasting, we've tried different formats to isolate variables. For example, we'll use the same grinder type, coffee, basket, temperature, and dose. We also agree to the same brew ratio. In the past, some have argued that when we don't exploit a particular piece of equipment's unique features (e.g., pressure profiling for the Vesuvius), we effectively handicap one competitor by "normalizing" acceptable output.
This past Friday, we held a group taste test as part of the Profitec Pro 800 Review. Cognizant of the aforementioned concern, we changed the format slightly: while the coffee, basket, dose, and so on were the same, each barista was given 20 minutes to dial in the best espresso they could, with whatever brew ratio or grind setting they deemed appropriate.
Once the final results were tallied, a few participants expressed concerns that the comparison was not valid.
I agreed with them that when cupping coffees, it's absolutely essential that one adhere to a precise recipe. But if testing real-world usage of an espresso machine, isn't it valid to compare the results from two espresso machines that were independently dialed in by competent baristas? I argued that allowing two baristas, both of whom are intimately familiar with the equipment before them, sufficient time to dial in a coffee more accurately reflects real-world usage than a rigid protocol that may favor one machine and disfavor the other.
Maybe, maybe not. Perhaps we ended up comparing the skills of the two baristas. Or maybe it was preference bias that favored one style of preparation. Or maybe it was a perfectly valid real-world comparison. What do you think?

Final score: 6-2. No, it wasn't close. Was the competition fair?