
Espresso equipment test methodology

Postby cafeIKE on Sat Jun 16, 2007 9:32 pm

Split from Titan Grinder Project by moderator...

Ken Fox wrote:Perhaps when this project is finished, or at the end of it, a real, statistically valid blind tasting can be done on at least a couple of grinder pairs being studied here. It may be that if the differences in the shots are really so obvious, that the tester should be asked not only to pick a preference on the shots but also to try to identify which grinder produced it.

These kinds of studies are a huge PITA to design, and even worse to execute.

I am not volunteering . . . . .

ken

To be valid, you'd need several of each grinder model to ensure differences were not summations of manufacturing tolerances.

Assuming a particular grinder type is preferred, there is the problem of association:
Grinder A on Machine B with Coffee C at Temperature D vs Grinder B on Machine C with Coffee D at Temperature E.
cafeIKE
 
Posts: 2905
Joined: Jun 27, 2006
Location: Woodland Hills, CA

Postby Ken Fox on Sun Jun 17, 2007 12:56 am

cafeIKE wrote:To be valid, you'd need several of each grinder model to ensure differences were not summations of manufacturing tolerances.

Assuming a particular grinder type is preferred, there is the problem of association:
Grinder A on Machine B with Coffee C at Temperature D vs Grinder B on Machine C with Coffee D at Temperature E.


Ian,

You give a seemingly conclusive argument for never doing any kind of controlled coffee research, because you can never possibly know all the variables at play and hence never possibly control for them. The result is that you would have to do an undoable study, so complex and involved, so time-consuming and expensive, that you just throw up your hands and give up without trying.

I find this a sort of rubbish argument, one that says that doing research of any sort isn't worth the effort because you can't possibly account for all the variables. The beauty of the scientific method, when used properly, is that the researcher lays out all the known variables, how it was attempted to control them, and the methodology used to execute the study. Regardless of the results, the onus is then on the reader to design another study (hopefully based on a knowledge of the issues and why a particular result may have occurred) that will show why the first study reached the conclusions that it reached and why those conclusions were either in error or misleading.

Science does not generally advance by leaps and bounds, rather it advances in small steps. If we make no effort to go beyond the descriptive, e.g. "I just bought this huge honkin' grinder and lookit the coffee it makes! Sooooooo much better than that old cheaper grinder I used to own!," then we in fact learn next to nothing from experience. If instead an attempt is made to use the scientific method to study things, we must accept the limitations of that method but at the same time acknowledge that those results are better than anything else available to us.

I invite you to design and execute your own studies and to publish them here or in any other readily accessible venue. It is however much easier to sit around on the periphery and throw stones, than it is to design and execute a study. I guess that is because, as you point out, you really can't ever "prove" anything.

ken
What, me worry?

Alfred E. Neuman, 1955
Ken Fox
 
Posts: 2433
Joined: Oct 28, 2005
Location: Idaho

Postby cafeIKE on Sun Jun 17, 2007 7:14 pm

Ken,

Fine rifles are test fired at completion of manufacture. The very best are set aside and sold as "One of One Hundred" or, on larger production runs, "One of One Thousand". These "One of" units are coveted by marksmen and competition shooters. The run-of-the-mill units are very accurate, but unlikely to be match winners or better than competing brands.

In high-quality optics (Leitz, Zeiss, Swarovski, Nikon, Canon et al.), considerable unit-to-unit variation exists, readily apparent to a skilled observer. All units pass rigorous manufacturing standards, but if one selects from a large sample there is usually one that is head and shoulders better than its brethren. The difference is close to that between TV and HiDef. When someone tries the missus' binocs and acclaims their clarity and brilliance, we are scrupulous to inform them they are "One of..."

When we are talking about a bimodal distribution of particles, with size variations in microns and typical manufacturing tolerances that may be large enough to radically shift the modes, it behoves the testers to determine whether the entire class, not just a specific unit, is worthy of consideration.

And since I'm stowing thrones, shipping the grinders all over hell's ½ acre gives me pause... :wink:

Here's a granite pebble for your burrs: Tester A grinds 20 pounds on new burrs, then ships to Tester B who grinds 30 pounds, then ships to Tester C. Tester C finds after 40 pounds that his results are diametrically opposed to Testers A and B. Revelation or Rubbish?

When one knows the variables and does not account for them, one may have rubbish results. First, Test the Test.

I'm all for 'testing', more correctly 'sampling', provided the results are:
    relevant to the audience at large,
    not just laboratory knob-dicking, and
    not dismissible for lack of rigor.
Anything that will someday enable us to have consecutive, not even necessarily identical, gShots... 40 years and counting.

Postby Ken Fox on Sun Jun 17, 2007 9:32 pm

cafeIKE wrote:Ken,

Fine rifles are test fired at completion of manufacture. The very best are set aside and sold as "One of One Hundred" or, on larger production runs, "One of One Thousand". These "One of" units are coveted by marksmen and competition shooters. The run-of-the-mill units are very accurate, but unlikely to be match winners or better than competing brands.

In high-quality optics (Leitz, Zeiss, Swarovski, Nikon, Canon et al.), considerable unit-to-unit variation exists, readily apparent to a skilled observer. All units pass rigorous manufacturing standards, but if one selects from a large sample there is usually one that is head and shoulders better than its brethren. The difference is close to that between TV and HiDef. When someone tries the missus' binocs and acclaims their clarity and brilliance, we are scrupulous to inform them they are "One of..."

When we are talking about a bimodal distribution of particles, with size variations in microns and typical manufacturing tolerances that may be large enough to radically shift the modes, it behoves the testers to determine whether the entire class, not just a specific unit, is worthy of consideration.

And since I'm stowing thrones, shipping the grinders all over hell's ½ acre gives me pause... :wink:

Here's a granite pebble for your burrs: Tester A grinds 20 pounds on new burrs, then ships to Tester B who grinds 30 pounds, then ships to Tester C. Tester C finds after 40 pounds that his results are diametrically opposed to Testers A and B. Revelation or Rubbish?

When one knows the variables and does not account for them, one may have rubbish results. First, Test the Test.

I'm all for 'testing', more correctly 'sampling', provided the results are:
    relevant to the audience at large,
    not just laboratory knob-dicking, and
    not dismissible for lack of rigor.
Anything that will someday enable us to have consecutive, not even necessarily identical, gShots... 40 years and counting.


Design your own studies and DO them. Point out the flaws in what I and others are doing, and correct them with your own work. Until then, I will use the best information available to me.

There is so little unbiased and objective work in coffee, that to read a post like yours is . . . . . . . painful.

ken

Postby JonR10 on Sun Jun 17, 2007 10:34 pm

cafeIKE wrote:
And since I'm stowing thrones, shipping the grinders all over hell's ½ acre gives me pause... :wink:

Here's a granite pebble for your burrs: Tester A grinds 20 pounds on new burrs, then ships to Tester B who grinds 30 pounds, then ships to Tester C. Tester C finds after 40 pounds that his results are diametrically opposed to Testers A and B. Revelation or Rubbish?


Interesting - but probably moot. These monster grinders are most likely built to process large amounts of coffee between burr changes, so as long as user 1, user 2 and user 3 only grind "normal for home use" amounts, there should be no significant wear in the burrs, yes?

Your point is valid, but I agree with ken in that this should not get in the way of the fun and informative experiment.

Now then, how do I wrangle an invitation to someone's house who is pulling shots from a Mazzer Robur? :D
JonR10
 
Posts: 845
Joined: May 04, 2005
Location: Houston, TX

Postby RapidCoffee on Sun Jun 17, 2007 11:22 pm

JonR10 wrote:Now then, how do I wrangle an invitation to someone's house who is pulling shots from a Mazzer Robur? :D


Any time, my friend - as long as it's in the next week or so. After that you'll have to put up with the Super Jolly. :)
John
RapidCoffee
 
Posts: 2745
Joined: Dec 11, 2005
Location: Rapid City, SD

Postby Walter on Mon Jun 18, 2007 4:42 am

Ken Fox wrote:Perhaps when this project is finished, or at the end of it, a real, statistically valid blind tasting can be done on at least a couple of grinder pairs being studied here.

To read "real, statistically valid" and "blind tasting" in one sentence is mildly amusing. For statistics to be valid you need a huge number of samples and tasters. Done with only a few tasters and samples, the results may still be interesting, but "statistically" they are entirely irrelevant and might tell us exactly nothing...
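To make "how many tastings?" concrete, here is a minimal Python sketch under an assumed design the thread does not specify: each trial is a two-cup blind comparison in which the taster says which grinder made which cup, so pure guessing is right half the time. `tail_prob` and `min_correct` are hypothetical helper names. The sketch finds the smallest number of correct answers out of n that chance alone would produce no more than 5% of the time.

```python
from math import comb


def tail_prob(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more hits by luck alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n: int, p_chance: float = 0.5, alpha: float = 0.05):
    """Smallest k such that scoring k or more out of n by guessing has probability <= alpha.

    Returns None when even a perfect score is not rare enough to count.
    """
    for k in range(n + 1):
        if tail_prob(k, n, p_chance) <= alpha:
            return k
    return None


for n in (4, 5, 10, 20):
    print(n, min_correct(n))
```

Under these assumptions, 4 tastings can never reach significance (a perfect 4 out of 4 happens by luck 1 time in 16), 5 tastings need a perfect 5, 10 tastings need 9 right, and 20 tastings need 15 right.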

Ken Fox wrote:Ian,

You give a seemingly conclusive argument for never doing any kind of controlled coffee research, because you can never possibly know all the variables at play and hence never possibly control for them. The result is that you would have to do an undoable study, so complex and involved, so time-consuming and expensive, that you just throw up your hands and give up without trying.

I beg to disagree here too. Ian's points are, IMHO, very valid, and all he does is point out that we are dealing with very complex matter here, a fact which is all too often entirely ignored in "pseudo-scientific" experiments. After all, we're not doing "science" here: the extractions are not 100% reproducible, and we're not running, e.g., GC/MS examinations of the extracted coffee and comparing the results, and so forth. No, we're preparing shots and we're tasting. That is nothing remotely reminiscent of real science.

What I see here is an interesting comparison of good grinders, with some really interesting facts and a scientific touch, like the SEM images or particle size distribution measurements. Nothing more, nothing less. But to consider this "science" is a tad too ambitious, IMO...
Walter
 
Posts: 98
Joined: Jul 26, 2005
Location: Graz, Austria

Postby Walter on Mon Jun 18, 2007 4:49 am

JonR10 wrote:Interesting - but probably moot. These monster grinders are most likely built to process large amounts of coffee between burr changes, so as long as user 1, user 2 and user 3 only grind "normal for home use" amounts, there should be no significant wear in the burrs, yes?

Your point is valid, but I agree with ken in that this should not get in the way of the fun and informative experiment.

Yes, but these grinders also need a few kilos of beans during the "breaking in" period until they are "stable". I don't recall having read anything about whether this was done before the tests started...

But I entirely agree with you about the "...fun and informative experiment".

Postby Ken Fox on Mon Jun 18, 2007 11:47 am

Walter wrote:
Perhaps when this project is finished, or at the end of it, a real, statistically valid blind tasting can be done on at least a couple of grinder pairs being studied here.

To read "real, statistically valid" and "blind tasting" in one sentence is mildly amusing. For statistics to be valid you need a huge number of samples and tasters. Done with only a few tasters and samples, the results may still be interesting, but "statistically" they are entirely irrelevant and might tell us exactly nothing...


I beg to disagree here too. Ian's points are, IMHO, very valid, and all he does is point out that we are dealing with very complex matter here, a fact which is all too often entirely ignored in "pseudo-scientific" experiments. After all, we're not doing "science" here: the extractions are not 100% reproducible, and we're not running, e.g., GC/MS examinations of the extracted coffee and comparing the results, and so forth. No, we're preparing shots and we're tasting. That is nothing remotely reminiscent of real science.

What I see here is an interesting comparison of good grinders, with some really interesting facts and a scientific touch, like the SEM images or particle size distribution measurements. Nothing more, nothing less. But to consider this "science" is a tad too ambitious, IMO...


Nothing has been posted in the grinder thread, So Far, that would qualify as an experiment using the scientific method. As you state, some scientific tools have been used; however, no real scientific experimentation has been done.

On the other hand, there is nothing special about blind tasting as an experimental technique that requires it to be treated differently than any other test methodology. All that statistics will do is estimate the probability that an observation could have occurred by chance alone. If a properly designed experiment (and YES, you can do a properly designed experiment using blind tasting) shows that an observed result would occur by chance alone only 5% of the time (p = 0.05, the 5% significance level), then YES, ONE out of TWENTY times you might get that result by chance alone. At the 1% significance level, you could expect that result by chance only 1% of the time. The more data points, and the larger the observed difference, the more confident you can be that your results mean something.
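As a worked illustration of the 5% and 1% thresholds, here is a minimal Python sketch assuming a triangle test, a design the thread does not specify: each trial presents three cups, two from one grinder and one from the other, and the taster must pick the odd one out, so a pure guess is right 1/3 of the time. The one-sided p-value is the exact binomial probability of doing at least as well by guessing. `binomial_p_value` is a hypothetical helper name, and the 9-out-of-15 score is an invented example, not a result from the grinder project.

```python
from fractions import Fraction
from math import comb


def binomial_p_value(correct: int, trials: int, p_chance: Fraction) -> Fraction:
    """One-sided exact p-value: the probability of `correct` or more hits in
    `trials` attempts if the taster is only guessing (success rate p_chance)."""
    return sum(
        comb(trials, i) * p_chance**i * (1 - p_chance) ** (trials - i)
        for i in range(correct, trials + 1)
    )


# Hypothetical triangle-test score: 9 odd cups correctly picked out of 15.
p = binomial_p_value(correct=9, trials=15, p_chance=Fraction(1, 3))
print(float(p))
for alpha in (0.05, 0.01):
    print(alpha, "significant" if p <= alpha else "not significant")
```

Using `Fraction` keeps the arithmetic exact. For 9 correct out of 15 the p-value works out to about 0.031: significant at the 5% level but not at the 1% level.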

There are also things to be learned from the more descriptive portions of the aforementioned article, even if they do not rise to the level of being scientific research. For one thing, they are additional pieces of information obtained by people generally having no ax to grind. You can't say that about the other information currently out there on these grinders.

Some people here and in other online coffee venues make an attempt to study and explain things, and they go to the trouble of writing these things up for the benefit of all readers. When those writings have flaws they should be pointed out constructively, giving the original author an opportunity to respond. This would constitute constructive criticism. What I can learn absolutely nothing from is people who criticize these efforts of others while offering absolutely nothing in return. If the "research" you see here does not meet your standard, then do some yourself that does, and enlighten us with the results. Isolated criticism with no other contribution is of no value at all, not to me nor to anyone else here.

ken

Postby cafeIKE on Mon Jun 18, 2007 1:39 pm

Ken Fox wrote:Some people here and in other online coffee venues make an attempt to study and explain things, and they go to the trouble of writing these things up for the benefit of all readers. When those writings have flaws they should be pointed out constructively, giving the original author an opportunity to respond. This would constitute constructive criticism. What I can learn absolutely nothing from is people who criticize these efforts of others while offering absolutely nothing in return. If the "research" you see here does not meet your standard, then do some yourself that does, and enlighten us with the results. Isolated criticism with no other contribution is of no value at all, not to me nor to anyone else here.

ken


Earlier, in the original thread,

Methinks thou dost protest too much. :wink:

Lord knows, we don't need another generation of knick-knack paddy-whackers dumping their perfectly decent grinder and then bemoaning they still can't pull 1 in a row. :roll:

Let me reiterate. I applaud those who undertake projects and make great efforts to communicate their findings. Your work on HX PID was seminal in my Vibiemme decision. Dan's work on grinders last year was the trigger for the MC4. Jim's work on extraction was a large step up the ladder in consistency. Ad infinitum...

Apologies for my 'painful', ineloquent posts, and if they appear baselessly critical. My intent is only to ensure readers are aware that there are 'issues' with the 'test' and that the 'results' are not anointed.
