This report covers some taste tests I did for cupped coffees, shots and cappas. My goal with this grinder was to get improved shots on bright SOs usually unsuitable for espresso. A report on that will follow next week.
METHOD: The 4 cup tests were blind. For the 9 cappas or macs, 4 were blind and 5 non-blind. For the 31 straight shots, 15 were blind and 16 non-blind. The cappa and mac tests were done as part of the shot tests: each shot was sipped and rated, then the cappas or macs were made and rated.
The blind requirement raised a severe problem for shot making, since I could not get enough volunteers to have one make, another taste. Doing blind tests of shots requires that the shot flow rates be equal, so that it doesn't act as a tell. This proved impossible to do consistently with a chopped pf, since the visible flow from the M3 is almost always better than the mini's. The sweeper/chute exit of the grinder does a very good job in creating an even density fill that makes for picture perfect naked pf shots.
For a regular PF, another problem arose. I did most of the sighted shots first. After I did the first eight blind shots, it was obvious the results were radically divergent from the sighted ones. Had the sighted shots been an exercise in self-deception? For the blind test, I had set the mini on its sweet spot, then adjusted the M3 to produce the same extraction rate. So I did a second sequence of seven blind shots with the M3 set on its sweet spot and the Mini adjusted to match the extraction. The results from this made sense of all the data. The sighted series may contain some bias, but massive self-deception now seems unlikely. I finished testing with six shots from the Peppina, 2 sighted and 2 blind at each grinder's sweet spot. I broke off testing when the new results showed no more surprises or large changes in the offing, and when I became so familiar with the small taste and extraction idiosyncrasies of each grinder that further blind testing was impossible.
CUPPING: I did four blind cuppings. The first was a terrible coffee where I did just one cup from each grinder; the M3's was really terrible with all the ghastliness very clear, whereas the Mini's was muffled. The difference became especially apparent on cooling. Based on (unfortunately awful) taste clarity, I picked the M3 as the better grinder. The second was a double triangle cupping of an excellent Sumatra Lintong. I picked the odd cup from each group, and designated the M3 as the better grinder, this time based on good taste clarity. Again this was especially apparent on cooling. The third cupping was 5 cups of Kenya Meru; the task was to pick the two M3 cups from the five. I divided the cups into two groups of two that tasted clearly different, but in a way I couldn't characterize or form a preference on, and one cup that was in between. The cooled off taste was no help. One pair was the M3 cups. The final taste test was the last of Barry's Haimi. Since DPs are so variable, I wasn't expecting much. But this is a very clean cup, and by the time they cooled, I had no trouble picking out the two strawberry bombs from the five on the table. The result is that the odds of the grinders being indistinguishable and my guesses lucky are 1 in 1800. In other words, the grinders pretty definitely produce different tastes for steeped coffee, with the M3's better.
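For readers who want to check the 1 in 1800 figure, here is a minimal sketch of one way it comes out. The per-test odds are an assumption on my part, since only the combined figure is given above: 1 in 2 for the single-pair cupping, 1 in 3 for each triangle, and 1 in 10 for each two-from-five pick.

```python
from fractions import Fraction
from math import comb

# Chance of getting each blind cupping right by luck alone.
# This breakdown is an assumed reconstruction, not taken from the test notes.
chance_by_test = {
    "one cup per grinder, pick the clearer one": Fraction(1, 2),
    "double triangle, odd cup in both groups": Fraction(1, 3) ** 2,
    "Meru, the two M3 cups out of five": Fraction(1, comb(5, 2)),
    "Haimi, the two M3 cups out of five": Fraction(1, comb(5, 2)),
}

# Independent tests, so the chances multiply.
combined = Fraction(1)
for chance in chance_by_test.values():
    combined *= chance

print(f"chance of four lucky guesses in a row: 1 in {combined.denominator}")
```

Treating the Meru cupping as a clean two-from-five pick is the generous reading, since I only managed to sort those cups into pairs; dropping that cupping entirely would leave 1 in 180.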
I was particularly impressed on how the grinders differentiated as the cups cooled. The M3 cups stayed relatively clean and crisp, whereas the mini cups became more drab, muddy and bitter (except in the very puzzling Meru, where all the cups stayed ultra-clean and crisp, see also the cappa test). There were clues in the hot cups, but nothing as definitive. My guess at this point was that the grinds from the M3 are more resistant to over-extraction.
CAPPAS: The alties in Seattle had different favorites when it came to straight shots from the best cafes, but we were all struck by the uniform excellence of the short milk drinks. It may be no accident that all these cafes use conical grinders, since the M3 cappas and macs blew the Mini's off the field. It simply wasn't a contest. I had nine in all, and scored them from -2 to +2: -2 is the Mini's being a lot better, -1 somewhat better, up to +2 for the M3's being a lot better. Of the nine cappas, seven (four sighted, three blind) scored a +2, and two (one sighted, one blind) were a tie at 0. The two tied cappas came from the same Meru that didn't get bitter on cooling in the cupping. For what it's worth, the mean score comes to 1.56, and the 95% confidence interval for the mean is from 0.88 to 2.
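The mean and interval can be reproduced from the nine scores with an ordinary t-based confidence interval. This is a minimal sketch; the choice of a t interval and the cap at +2 are assumptions that happen to match the figures quoted, and the critical value is hardcoded from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

cappa_scores = [2] * 7 + [0] * 2  # seven +2 scores and two ties at 0
T_CRIT_DF8 = 2.306                # two-sided 95% t value, 8 degrees of freedom

n = len(cappa_scores)
avg = mean(cappa_scores)
half_width = T_CRIT_DF8 * stdev(cappa_scores) / sqrt(n)

ci_low = avg - half_width
ci_high = min(avg + half_width, 2)  # the scoring scale tops out at +2

print(f"mean {avg:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```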
Basically, there is no detectable generic bitterness at all in the M3 cappas. The experience was odd, almost like not drinking coffee, but an exotic milkshake. However, all the complex coffee tastes were there, just in a sweet, non-acrid form even Robert Parker could love. Again, this suggests a physical difference in the grind that reduces an aspect of overextraction that gets past the milk's taste filtering effect.
SHOTS: As stated in the methods section, I did 15 blind and 16 sighted shots. 7 of the blind shots were on the M3's sweet spot, with the Mini set to match; the other 8 were on the Mini's sweet spot with the M3 set to match. The 16 sighted shots were done at each grinder's sweet spot. 6 shots, 2 from each group, were made on the La Peppina, a spring lever machine with precise shot temperature control that I use for blend development. The La Peppina shots, as usual, tasted substantially better and had worse crema and body than the Tea shots, but their relative scoring for each grinder was completely in line with the Tea shots. Due to sample size limitations, I'm not reporting them separately, but I have shown their occurrence in brackets in the summed data table.
The form of the scoring is the same simple one as in the cappas. I was looking for both pleasant and clearly defined tastes using a blend I personally find both pleasant and having interesting flavors. There were only two cases where I scored a 0 for no preference because one shot had better flavors and the other was more pleasingly balanced. In the other cases, these two aspects went together.
Here are the results:
- Table:
--------------------------------------------------------------------
SCORE        #_SIGHTED   #_BLIND_M3_SPOT   #_BLIND_MINI_SPOT   TOTAL
--------------------------------------------------------------------
-2           0           0                 1                   1
-1           1           0                 3 (1)               4 (1)
0            3           1                 2 (1)               6 (1)
+1           7 (1)       3 (1)             2                   12 (2)
+2           5 (1)       3 (1)             0                   8 (2)
--------------------------------------------------------------------
AVE-SCORE    1.00        1.29              -0.38               0.71
--------------------------------------------------------------------
95% CI LO*   0.52        0.59              -1.26               0.31
95% CI HI*   1.48        1.98              0.51                1.11
--------------------------------------------------------------------
* The bottom two lines show the range for the 95% confidence interval of the average scores. The sighted, blind m3 favorable grind, and combined scores are statistically significant in their preference for the M3 (i.e. 0 is not part of the confidence interval) while the scores on the blind tests favoring the mini are not statistically significant in favor of the mini (although I have little doubt they would have become so with a longer series of tests to narrow down the confidence interval).
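The averages and confidence intervals can be recomputed from the counts in the table. The sketch below assumes a standard t-based interval on the raw scores, which reproduces the table to rounding; the critical values are hardcoded from a standard t table, and the names are only illustrative.

```python
from math import sqrt
from statistics import mean, stdev

SCORES = [-2, -1, 0, 1, 2]
counts = {
    "sighted":          [0, 1, 3, 7, 5],
    "blind, M3 spot":   [0, 0, 1, 3, 3],
    "blind, Mini spot": [1, 3, 2, 2, 0],
    "total":            [1, 4, 6, 12, 8],
}
# Two-sided 95% t values keyed by degrees of freedom (standard table values).
T_CRIT = {6: 2.447, 7: 2.365, 15: 2.131, 30: 2.042}

def summarize(count_row):
    """Return (mean, ci_low, ci_high) for one column of the score table."""
    # Expand the counts back into one score per shot.
    shots = [s for s, k in zip(SCORES, count_row) for _ in range(k)]
    n = len(shots)
    avg = mean(shots)
    half_width = T_CRIT[n - 1] * stdev(shots) / sqrt(n)
    return avg, avg - half_width, avg + half_width

results = {name: summarize(row) for name, row in counts.items()}
for name, (avg, lo, hi) in results.items():
    print(f"{name:18s} mean {avg:5.2f}   95% CI {lo:5.2f} to {hi:5.2f}")
```

An interval that excludes 0 is what the footnote means by statistically significant: that holds for the sighted, blind M3 spot, and total columns, and not for the blind Mini spot column.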
The tests show that the M3 grinder is much better on its "home field," the Mini perhaps slightly better on its home field, and the M3 grinder is distinctly, but not overwhelmingly, better with best practice for both grinders.
Note in particular that there is no overlap in the two blind test confidence intervals, therefore no strong possibility of the results coming from the same underlying process. This shows how critical working out appropriate comparative grinder adjustments is to this type of testing; and, obviously, how critical such adjustment is for the taste of the espresso. My untested but strong impression from doing the dial-ins is that the M3 requires a tighter one for optimum taste than the mini; i.e. in the possible range from 20 second 2.25 ounce shots to 35 second 1 ounce shots for a double, the sweet spot for the M3 is narrower in volume and time.
I'm not sure if and how much bias there is in the sighted tests. The outcome, as one would expect of an unbiased test, falls between the blind tests favoring the Mini and the M3. However, it does lie closer to the M3-favored ones, although not enough for there to be a statistical smoking gun: the average from the sighted test is 1, and the average from the two unsighted tests together, weighted equally, is 0.46. The t-value for a difference between the two means is 1.44, which is not statistically significant. Furthermore, there is no logical reason that the sighted format of best practice on both grinders should exactly split the difference of the two blind formats. Readers can make up their own minds about how much subtle bias I showed; I certainly can't. The results do indicate that rosy-tinted self-deception was not a major factor, which is the important thing.
How would I characterize the taste differences qualitatively?
-- There was no consistent difference between the grinders in crema or body as far as I could discern. On the whole, both grinders performed excellently in these aspects.
-- The aftertaste from the M3 shots is more lingering and sweeter. In unpaired shots, where one can distinguish, the aftertaste lingered several hours.
-- In the aroma, there was less of the acid nip from the bright components, although the fruit was there. Given that the nip evokes in every espresso hound the conditioned Pavlovian reflex of dreading the upcoming shot, this counts as an improvement in an odd sort of way.
-- At the respective grinder's sweet spots, the shots were equally sweet and balanced in flavor (almost by definition), but the M3 taste had slightly less of the irritant edges at the top and bottom. This is its main plus in terms of both clarity and pleasantness.
-- The M3 tastes were quite clear, but more integrated and smoothly blending into each other than on the Mini, where they seemed more separated. I liked this, but did not factor it into the score one way or the other, since my liking may be more for the novelty than the effect itself.
CONCLUSIONS: I know that this small amount of comparative tasting is not overwhelming evidence for my conclusions. But further tasting by me is probably useless. I know enough of the grinders' quirks now to make any blind tasting a fig leaf; moreover, I'm personally convinced the conclusions are sound and unlikely to change in any dramatic fashion. These factors combined would seem to make further taste tests by me a self-fulfilling prophecy.
The results from cup, cappa and shot all support a simple hypothesis: the M3 is a better grinder because its grinds are more resistant to overextraction than the Mini's, and by extension, other commercial flat burr grinders.
This hypothesis is supported by the grinder's construction. The grinder uses a conical set of burrs to crush the beans to a rough grind, followed by a flat set to shear them to a fine grind. Compared to standard flat burr grinders, the crushing and shearing paths are longer, and the beans are ground more gradually and gently. It is reasonable to assume that this leads to less fragmentation of the cell walls. This means fewer dust-like fines containing only cell wall fragments, and more of the fines containing nearly intact cells. The extraction of compounds found in the cell walls but not the cell interiors would be slowed down by this. Given that specialized cell wall flavor compounds would have evolved to discourage ingestion by animals, they could well be responsible for the characteristic unpleasant bitterness in overextracted coffees, e.g. soluble, aka instant, coffees.
The alternative mechanism, that the grinder's slow speed doesn't melt the lipids and transport undesirable flavors to the particle surfaces, seems less likely to be the cause. In sporadic home use, conventional grinders do not overheat enough to melt the lipids.
This hypothesis should be testable directly, by microscopy or measurements of the right sort, perhaps of grinder fines or extraction ratios. I hope others will be able to do this, since evidence of this type would be more probative than further taste testing on my part.
Whether this improvement is enough to justify the M3's added price is up to each reader. My impression is that it is roughly on par with the improvement one gets when upgrading to a higher class of espresso machine.
It should be noted that with the advent of the naked PF, the M3 loses some of a large advantage it would have had earlier. A year ago, I would have considered the grinder's very even deposition of grinds into the basket miraculous in its consistency. Even now it's amazingly good. But the discipline of the naked PF has forced everyone to improve their distribution and packing skills to the point where we are all getting far fewer sink shots with existing equipment. In this respect, the M3 raises the bar a good notch, but hardly launches it into the stratosphere -- the naked PF already did that last year.
LIQUORING TDS TEST: There is one physical measurement I could perform to test the resistance to overextraction hypothesis. I steeped six cups from each grinder with the same amount of water, 4 ounces, and grinds, 8 grams, as best as I could measure. The grinds felt the same, but the Mini grinds looked coarser. These are the same settings I used in the cuppings. Then, for each grinder, I filtered three cups at 4 minutes and three cups at 10 minutes through a swiss gold filter, and measured their total dissolved solids (TDS). If the M3 grinds are more resistant to overextraction, one would expect the M3's 10 minute TDS to be lower than the Mini's.
- Table:
BREWED COFFEE TDS
SAMPLE         M3 @ 4 MIN   MM @ 4 MIN   M3 @ 10 MIN   MM @ 10 MIN
--------------------------------------------------------------------
1              1624         1741         1809          1825
2              1425         1360         1584          1964
3              1508         1504         1704          1907
--------------------------------------------------------------------
MEAN           1519         1535         1699          1899
ST.DEV         100          192          113           70
--------------------------------------------------------------------
SIGNIFICANT?   NO                        NO-ish (6%)
--------------------------------------------------------------------
Note: TDS meter readings for coffee are 1/10 of actual TDS (source: Barry Jarrett).
I do not know if my TDS meter is up to the required precision, nor do I know the margins of error on my weighing and measuring, so the results, while suggestive, are probably uninterpretable. However, they do suggest it is a test worth doing for someone set up to do them precisely.
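For anyone repeating the measurement, the comparison in the table can be checked with a two-sample t-test on the raw readings. This is a minimal sketch that assumes a pooled-variance (Student's) test, which is my guess at the test behind the SIGNIFICANT? row; the cutoffs are standard table values for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Raw meter readings from the table, keyed by (grinder, steep minutes).
tds = {
    ("M3", 4):  [1624, 1425, 1508],
    ("MM", 4):  [1741, 1360, 1504],
    ("M3", 10): [1809, 1584, 1704],
    ("MM", 10): [1825, 1964, 1907],
}

def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Two-sided critical t values for 3 + 3 - 2 = 4 degrees of freedom.
T_10_PERCENT, T_5_PERCENT = 2.132, 2.776

t_by_minutes = {m: pooled_t(tds[("M3", m)], tds[("MM", m)]) for m in (4, 10)}
for minutes, t in t_by_minutes.items():
    print(f"{minutes:2d} min: t = {t:.2f}, past the 5% cutoff: {abs(t) > T_5_PERCENT}")
```

The 10 minute t value lands between the 10% and 5% cutoffs, which fits the "NO-ish (6%)" entry. With only three cups per cell, a single odd reading moves the result a lot, so more cups per cell would be the first thing to change.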




