Warning: Do not attempt to operate dangerous equipment when reading this section!
If you haven't figured it out already, this section is designed for three groups of readers: those who are skeptical and want to know exactly how the testing was done; those who find this sort of thing interesting, want to know more, and might be interested in doing their own studies; and finally those who suffer from insomnia in spite of the best attempts of their medical professionals. We will try to satisfy all of you.
The coffee that was compared in this experiment was single origin MAO Ethiopian Harrar Horse obtained in green (unroasted) form late in the summer of 2006 from coffeewholesalers.com. This coffee was selected for several reasons: it makes nice single origin espresso, I had enough of it in inventory to test, and it tends to show when it is staling by losing its multidimensional flavors and becoming "flat." All of it came in a single 11 pound bag, and all batches were roasted identically, with the same roast parameters, to approximately 442°F, a level at the very beginning of second crack which produced beans with no visible oiling. A 500 gram gas-fired drum sample roaster was used, as shown to the right. The beans were introduced at an approximate drum temperature of 360°F, after which first crack started between 9 minutes and 9 minutes 15 seconds, and the total roast duration was between 12 minutes 15 seconds and 13 minutes. Roast progress was followed with the aid of an internal thermocouple in the roast drum, plus a Fluke digital thermometer. All of the samples, whether fresh and never frozen or previously frozen, were the result of at least two separate batches that were thoroughly mixed together.
The coffee that was destined for the freezer was immediately put into commercial plastic coffee valve bags. Excess air was evacuated by hand and the seams of the bags were sealed. A piece of Scotch tape was placed over the valves because these valves rely on a drop of oil within the valve and the valve can hence freeze in either the open or the closed position. Tape was used to prevent the possible entry of air from the freezer through a valve that might possibly have frozen open. The coffee, now in sealed valve bags, was then put near the bottom of a very cold 7 cubic foot chest freezer, whose measured temperature was generally in the range of -15°F/-26°C to -20°F/-29°C on an NSF freezer thermometer. The coffee then remained in the freezer for periods of 4 or 8 weeks as detailed earlier in this article. When defrosted, the coffee was removed in the sealed bags and allowed to reach room temperature in a dark kitchen pantry; the piece of tape over the valves was removed once the bag reached room temperature.
The fresh, never frozen coffee used for comparison was roasted with the same roast parameters as the coffee that had been roasted and then frozen. It was roasted 4 days before the first day of taste testing, and hence was tasted from days 4 to 8 in the degassing process. The previously frozen coffee was assumed to have aged at least slightly while in the freezer. It was therefore decided to remove the previously frozen coffee from the freezer 1 day after the "fresh, never frozen" coffee was roasted. What this means is that, if one disregards any degassing that may have occurred in the freezer, the previously frozen coffees degassed for 3 days before the first day of tasting and were tasted over a period of 3 to 7 days out of the freezer.
The grinders were adjusted as needed during the trial to produce 1.25-1.5 ounce double espressos within a time range of 20-30 seconds, and generally never more than 35 seconds. There were a couple of instances where one or the other shot was simply not satisfactory due to channeling, too rapid flow, choking, or other problems. In those cases both shots were discarded and another set was made to replace them. Every time the coffee was changed in the grinders, the grinders were completely cleaned of what remained from the earlier coffee. This included using a chopstick to dislodge beans that can hang up above the grinder burrs, and also cleaning out the grinder chutes and the dosers manually, so that the new coffee introduced would not be "contaminated" with the old one. Both of the grinders (which use 64mm burrs) were of approximately the same age, about 3.5 years. One of the grinders had its burr set changed one month before this test was conducted. The other grinder had run approximately 100 to 150 pounds of coffee through it over the lifespan of its burrs.
There were obvious differences between the espresso machines used in this study. One was a 1995-vintage semi-automatic Cimbali Junior pourover with a vibratory pump; the other was a similar automatic Cimbali Junior, equipped with a rotary pump and manufactured in late 2002. Both have identical groups and heat exchangers, but the boilers (although identically sized) are made differently. Both machines have been modified with electronic temperature controls ("PID") in lieu of their original pressurestats. It was necessary to set both machines up in such a way that they delivered nearly identical extraction temperature and pressure profiles to eliminate, as much as possible, the impact of these factors on the taste of the espresso shots they would produce. This was made a little bit easier by the fact that the rotary pump machine has been modified with a pump delay timer, producing "preinfusion" which largely mimics the pressure ramp up characteristics of a vibratory pump. Extensive adjustment and testing were done with a Scace thermofilter and handheld datalogger. Graphs of brew temperatures were obtained and are reproduced below. Both machines had their extraction pressures adjusted to 9 bar with a portafilter manometer.
During the course of this experiment, necessary machine maintenance was performed. This included such things as water backflushing and portafilter "wiggles" two or three times during the three hour test period, plus a chemical backflush of both machines after the end of each day's testing. Every effort was made to adhere as closely as practicable to the program of one set of shot pairs every 7.5 minutes, in order to try to replicate the temperature stability shown on the Scace portafilter graphs further below. Please note that the temperature scales displayed are different due to inherent differences in the shape of the shot temperature curves produced by each machine. Closer examination of the actual shot temperatures demonstrates that they are similar.
I am indebted to Jim Schulman for his work in designing the way that this experiment was to be conducted, and for performing the statistical analysis of the data that was obtained. Originally Jim was to be a taster in this trial but unfortunately he was only here in spirit and I had to turn to other friends for the execution of the actual experiment.
The tastings occurred over 4 days with three different tasters. My friends Bob and Randy each participated in half of the tasting sessions, and I participated in all four of them. Bob has been a friend of mine for two decades and is a foodie with a serious interest in wine. His wife once owned a high end catering company, and still practices gourmet cooking at home. They own an Andreja Premium E61 espresso machine with a commercial Rossi grinder. I have been supplying them with freshly roasted coffee for the last few years. Bob drinks almost exclusively straight espresso shots, never with sugar and seldom in milk drinks. I helped them to select their Andreja Premium as a replacement for a Bezerra espresso machine, and at the time of purchase several years ago gave them refresher lessons on proper espresso preparation and shot timing/volume factors. My other friend and assistant, Randy, does not make home espresso but he is a home coffee roaster using a Freshroast+. He's often over at my house and has had quite a few espressos over the years, produced by my machines. When our testing got underway I was surprised that he was actually able to help me with the machines; it turns out he once dated a barista when he lived in Seattle!
Because this study failed to find significant differences, the raw data, presented either numerically or as diagrams, shows the randomness of the choices made by the tasters. For those interested in seeing more of this, I am reproducing it below. First is a table showing the actual ratings given by the tasters for all 64 shot pairs tasted over 4 days:
FEBRUARY 2007 FROZEN COFFEE vs NEVER FROZEN TEST DATA DATE 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 TASTER RA RA RA RA RA RA RA RA KE KE KE KE KE KE KE KE MACHINE RO VI RO VI RO VI RO VI RO VI RO VI RO VI RO VI GRINDER NB NB NB NB OB OB OB OB NB NB NB NB OB OB OB OB COFFEE 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W OVERALL +1 +1 +1 +2 -1 0 -2 +2 0 -1 +3 +2 -1 +1 +2 -2 CREMA -1 +1 -2 -2 -1 0 +1 -1 -1 +2 +1 +1 -1 0 -1 -1 TASTE +1 +1 +1 +1 -1 0 -1 +2 -1 +1 +2 -1 -2 -1 +1 -2
DATE 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 TASTER BO BO BO BO BO BO BO BO KE KE KE KE KE KE KE KE MACHINE RO VI RO VI RO VI RO VI RO VI RO VI RO VI RO VI GRINDER NB NB NB NB OB OB OB OB NB NB NB NB OB OB OB OB COFFEE 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W OVERALL -1 -2 +1 +2 -1 +1 -1 -1 -1 +1 +1 +2 +1 +2 +1 -1 CREMA -1 +1 +1 -1 +1 +1 +1 -1 +2 -2 -1 -1 +1 +1 0 +1 TASTE -1 +1 +1 -2 +1 +1 +1 -1 -1 +2 +1 0 -1 +1 0 -1
DATE 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 TASTER BO BO BO BO BO BO BO BO KE KE KE KE KE KE KE KE MACHINE RO VI RO VI RO VI RO VI RO VI RO VI RO VI RO VI GRINDER NB NB NB NB OB OB OB OB NB NB NB NB OB OB OB OB COFFEE 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W OVERALL +2 -1 -2 -1 +1 -1 +1 +1 -2 +1 -1 +1 -1 +2 +1 -1 CREMA +2 -1 -2 -1 -1 -1 +1 +1 +1 0 +2 -1 -1 -1 -2 0 TASTE +2 +1 -1 +2 -1 +1 -1 0 -1 -1 -1 +1 -1 -1 -1 +1
DATE 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 TASTER RA RA RA RA RA RA RA RA KE KE KE KE KE KE KE KE MACHINE RO VI RO VI RO VI RO VI RO VI RO VI RO VI RO VI GRINDER NB NB NB NB OB OB OB OB NB NB NB NB OB OB OB OB COFFEE 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W 4W 4W 8W 8W OVERALL -2 -2 0 -1 -2 -2 +1 +1 +1 -1 -1 -2 +1 +2 -1 +2 CREMA -2 0 -1 -1 -1 -2 +1 -1 +1 -1 -2 -1 +1 +2 -1 +1 TASTE -2 -1 0 -1 -2 -2 +1 +1 +1 -1 +1 -2 +1 +2 0 +1
Below are graphical representations of the same data:
Please note that the three charts above show comparison scores of fresh coffee compared with previously frozen coffee. That is why there is one number in each comparison (0, ±1, ±2, or ±3), showing which coffee was preferred over the other and by how much.
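For readers who want to poke at the numbers themselves, here is a minimal Python sketch of one very crude check. It is not the analysis of variance that Jim ran. It is a plain one-sample t-test on the OVERALL row only, and it assumes that all 64 pairs can be treated as independent while ignoring machine, grinder, freezing time, taster and day. The scores are transcribed from the table above, with positive numbers favoring the frozen coffee; the function name `one_sample_t` is my own.

```python
from math import sqrt
from statistics import mean, stdev

# OVERALL scores transcribed from the data table, one list per tasting day
# (February 16-19, 2007). Positive = frozen coffee preferred.
OVERALL = {
    16: [+1, +1, +1, +2, -1, 0, -2, +2, 0, -1, +3, +2, -1, +1, +2, -2],
    17: [-1, -2, +1, +2, -1, +1, -1, -1, -1, +1, +1, +2, +1, +2, +1, -1],
    18: [+2, -1, -2, -1, +1, -1, +1, +1, -2, +1, -1, +1, -1, +2, +1, -1],
    19: [-2, -2, 0, -1, -2, -2, +1, +1, +1, -1, -1, -2, +1, +2, -1, +2],
}


def one_sample_t(scores):
    """Return (mean, t statistic) for the null hypothesis of a zero mean score."""
    n = len(scores)
    m = mean(scores)
    standard_error = stdev(scores) / sqrt(n)  # sample standard deviation
    return m, m / standard_error


# Flatten the four days into one list of 64 paired-shot scores.
scores = [s for day in sorted(OVERALL) for s in OVERALL[day]]
avg, t = one_sample_t(scores)
print(f"n={len(scores)} mean={avg:+.3f} t={t:+.2f}")
```

The scores sum to +6 over 64 pairs, so the mean is about +0.09 on a scale that runs from -3 to +3, and the t statistic comes out near 0.5. Even this simplified look gives no hint of an overall preference in either direction, which is in line with the fuller analysis quoted below.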
Jim Schulman performed the statistical data analysis and has given these explanations of his findings, followed by his own commentary on the findings:
"The test was a sequence of 64 paired shots. One of each pair was fresh coffee, the other either frozen 4 weeks or 8. The tasters were KEN, Ken, RAN, randy, and BOB, Bob. Ken tasted 32 pairs, Randy and Bob 16 each. The tasting was over 4 days. Each pair of shots were done on 2 machines, 2 grinders, and 2 frozen coffees (4 and 8 weeks). The magic number is 8 for this test, 2*2*2, to capture all the combinations once each. The other magic number is 40, which would get 5 datapoints into each of the eight bins and guarantee reliable analyses.
The data is set up as three dependent variables: the overall, crema, and taste scores.
The values of these variables range from -3 to 3, with a positive number always indicating a preference for the frozen coffee.
The data had three independent variables: machine (rotary or vibratory pump), grinder (new or old burrs), and coffee (frozen 4 or 8 weeks).
Running the analyses of variance for all three variables produced no significance on the intercept (a straight overall preference for fresh or frozen), on any of the three variables, or on any of the 4 interaction effects. If one didn't factor in time or taster, the results are completely indistinguishable. In fact, if I were looking at these data from an unknown, I'd be doing cheating chi-squares, since they seem, if anything, too insignificant: 8 variables times three regressions, and none gets close to the 10% mark. In this case, I attribute the extra flat outcome to the mind numbing boredom of tasting nearly identical shots over and over: lots of "whatever dude" scores. In any case, this reverse anomaly shows that any distinction between fresh and frozen was minuscule.
If one does add in the 4 tasting days and 3 tasters as extra variables, one ends up eating up most of the degrees of freedom with interaction effects. However, there are some significant results. Randy disliked the 4 week frozen on the first round, fresh out of the freezer, but liked it on the fourth day. Ken's preference for the 8 week frozen got less as it aged. Bob liked the fresh coffee on the rotary with dull burrs the most in both his trials. These results, along with some even more indescribable ones, register as statistically significant, but are definitely in the "so what and who cares" category.
Ken and I had been discussing the test design since summer 2006. I did some triangle and 2 of 5 cupping tests of fresh and frozen coffees in order to get some sense of how to describe the differences and pose the questions. I was able to pick out the frozen cup or cups from the fresh ones better than chance, but could get no good verbal handle on how they tasted different. When I did the same test with two successive fresh roasts, I achieved the same success rate separating out cups. So my preliminary research found nothing specific to test or taste for. I had two hypotheses which both got shot down: that frozen coffees age faster once unwrapped, and that frozen coffees taste cartoonish, with missing subtleties. Neither panned out in the least.
Ken's test design reflects this: it basically shows there is no difference between fresh and frozen under normal espresso making conditions. Ken never believed frozen coffee was worse. He designed a test that would have disproved his belief had a large, or even a minor but systematic, difference existed.
Now the ball is with the anti-freeze people, not to nit pick his results, but to announce something like "the difference between fresh and frozen is this, here's how you taste for it, here's how to set up a blind test." Any sort of "freezing is just bad—put some vague reason here—" is, after this test, simply BS. The test moves the debate to the realm of discussing narrow differences, precisely specified."
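The "magic numbers" in Jim's explanation follow directly from the factorial layout of the test. As a minimal sketch, the eight combinations and the number of pairs per combination can be enumerated as below. The level labels are illustrative names of mine, not labels from Jim's analysis, and the even spread of 64 pairs over the cells is an assumption based on how the data table is laid out.

```python
from itertools import product

# Factor levels as described in this article. The label strings are
# illustrative assumptions, not names taken from the actual analysis.
FACTORS = {
    "machine": ("rotary", "vibratory"),
    "grinder": ("new burrs", "old burrs"),
    "coffee": ("frozen 4 weeks", "frozen 8 weeks"),
}


def design_cells(factors):
    """Return every combination of factor levels (the full factorial)."""
    return list(product(*factors.values()))


cells = design_cells(FACTORS)

pairs_tasted = 64
# Assumes the pairs are spread evenly over the cells.
replicates_per_cell = pairs_tasted // len(cells)
# Pairs needed to put 5 datapoints into every cell.
pairs_for_five_per_cell = 5 * len(cells)

for cell in cells:
    print(cell)
print(len(cells), replicates_per_cell, pairs_for_five_per_cell)
```

With 2 x 2 x 2 = 8 cells, 40 pairs is the minimum that puts 5 datapoints in each cell, and the 64 pairs actually tasted give 8 per cell under the even-spread assumption.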
Thanks go to my fellow tasters, Bob and Randy, whose buzzed out participation was essential to the conduct of this study, and to Abe Carmeli, Dan Kehn, Jeff Sawdy, Andy Schecter and John Weiss for their editing assistance. And for the 127th time, I am indebted to Jim Schulman for his efforts in both experimental design and data analysis. This study could not have been completed without his assistance.