Acoustics of Coffee Roasting: Machine Learning Edition - Page 2

luma (original poster)
Posts: 77
Joined: 10 years ago

#11: Post by luma (original poster) »

I'm developing this on Windows (with hopes of moving everything to rPi in some distant future) so now I need to get pyAudioAnalysis running on my development machines. Here is the procedure I used to get all of this running on a pair of Windows 10 systems:

Download and install the development environment tools: Python 2.7, Git for Windows, and the precompiled 32-bit scipy and numpy wheels referenced below.

Install and configure the development libraries:
  • Open an admin command prompt
  • You'll likely start in C:\WINDOWS\system32, so let's head over to the default Python scripts folder before we begin:
    cd /d C:\Python27\Scripts
  • Paste the following to make sure pip is all up-to-date. The first command kicked out a bunch of red errors but worked anyway.
    pip install --upgrade pip
    pip install --upgrade urllib3[secure]
  • Install the two compiled libraries you've downloaded. Replace <path to downloads> with the full path to wherever you've deposited these things; add quotes if the path contains spaces.
    pip install <path to downloads>\scipy-0.18.0-cp27-cp27m-win32.whl
    pip install <path to downloads>\numpy-1.11.1+mkl-cp27-cp27m-win32.whl
  • Now we can install the required libraries. Paste the following to get everything rolled out:
    pip install matplotlib
    pip install sklearn
    pip install hmmlearn
    pip install simplejson
    pip install eyed3
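  • Optionally, sanity-check that everything imports cleanly (one quick way to do it):
    python -c "import numpy, scipy, matplotlib, sklearn, hmmlearn, simplejson, eyed3"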
    
Download and test pyAudioAnalysis
  • Clone the pyAudioAnalysis repo with git:
    git clone https://github.com/tyiannak/pyAudioAnalysis.git
  • Change to the new folder:
    cd pyAudioAnalysis
  • Finally, test the installation by running an analysis against the included test data:
    python audioAnalysis.py fileChromagram -i data/doremi.wav
If everything worked correctly, the script should open a window that looks something like this:

[screenshot: chromagram plot of data/doremi.wav]

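If you'd rather smoke-test the library from Python directly, something like the following should work (module and function names are from the 2016-era API and have since been renamed upstream, so treat this as a sketch):

from pyAudioAnalysis import audioBasicIO, audioFeatureExtraction

# Read the bundled sample and extract short-term features
[Fs, x] = audioBasicIO.readAudioFile("data/doremi.wav")
F = audioFeatureExtraction.stFeatureExtraction(x, Fs, 0.050 * Fs, 0.025 * Fs)
print(F.shape)  # (num_features, num_frames) in this era's API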
Now to see if I can't actually train something...

luma (original poster)
Posts: 77
Joined: 10 years ago

#12: Post by luma (original poster) »

RESULTS!

It appears that the 32-bit versions of everything I've installed puke on my recordings with memory errors. I'm going to need to see if I can scrape together enough 64-bit versions of everything to make this work, but for the time being I've resorted to slicing the recordings into 3-4 minute chunks, which seems to work.

For purposes of this exercise I'm working with 3 files:
  • trimmed_recording_16-08-23_1820.wav - a 3 minute recording used to train the model
  • trimmed_recording_16-08-23_1820.segments - the "segments" file, which specifies which class each segment of the recording belongs to (see below)
  • trimmed_recording_16-08-23_2136.wav - a 4 minute recording similar to the first, used as our test case for running the trained model

The segments file looks like this:
0,60,environment
60,115,firstcrack
115,181,environment
This file tells the classifier that the first 60 seconds of the recording are environmental noises, the next 55 seconds are to be classified as "first crack", and the remainder of the recording is once again to be classified as "environment".
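Since the format is just start,end,label with times in seconds, a tiny checker (a hypothetical helper of my own, not part of pyAudioAnalysis) can confirm that a segments file tiles the whole recording with no gaps or overlaps:

def check_segments(path):
    # each line is "start,end,label"; segments must be contiguous
    prev_end = 0.0
    for line in open(path):
        if not line.strip():
            continue
        start, end, label = line.strip().split(",")
        start, end = float(start), float(end)
        assert start == prev_end, "gap or overlap at %s" % start
        assert end > start, "empty segment at %s" % start
        prev_end = end

check_segments("recordings/trimmed_recording_16-08-23_1820.segments")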

These files have been saved into a folder named "recordings" under the "pyAudioAnalysis" folder created in the previous post. From the pyAudioAnalysis folder I then run the following command to create a Hidden Markov Model classifier named "HMMfirstcrack" (the -mw and -ms flags set the mid-term window and step sizes, in seconds):
python audioAnalysis.py trainHMMsegmenter_fromfile -i recordings/trimmed_recording_16-08-23_1820.wav --ground recordings/trimmed_recording_16-08-23_1820.segments -o HMMfirstcrack -mw 0.1 -ms 0.1
Then I can run an unclassified recording through this model:
python audioAnalysis.py segmentClassifyFileHMM -i recordings/trimmed_recording_16-08-23_2136.wav --hmm HMMfirstcrack
The trained classifier then generates output that looks like this:

[image: segmentation output plot]

The spikes coincide with the sounds of first crack. IT WORKS!
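(Side note: the CLI just wraps library calls, so this can also be driven from Python. The function names below are from this era's audioSegmentation module and unverified, so check them against your checkout:)

from pyAudioAnalysis import audioSegmentation as aS

# Train the HMM from the wav plus its ground-truth segments file
aS.trainHMM_fromFile("recordings/trimmed_recording_16-08-23_1820.wav",
                     "recordings/trimmed_recording_16-08-23_1820.segments",
                     "HMMfirstcrack", 0.1, 0.1)

# Segment a new recording; the third argument plots the result
aS.hmmSegmentation("recordings/trimmed_recording_16-08-23_2136.wav",
                   "HMMfirstcrack", True)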

btreichel
Posts: 141
Joined: 8 years ago

#13: Post by btreichel »

Interesting! Any idea if it will need to be trained for each machine that it's on? I'm asking because I'm presuming that it's learning and suppressing the background noise from the target. I would presume that if the algorithm is robust enough, the answer would be no.

luma (original poster)
Posts: 77
Joined: 10 years ago

#14: Post by luma (original poster) replying to btreichel »

I suspect that it might in fact need to be trained per machine, although I also think there may be a way to make that process a little less painful. What I have in mind is to start/stop the recording w/ Artisan (already done), then parse through the Artisan .alog file and use the FCs/FCe/SCs/SCe events to automatically segment and classify the matching recording (a rough sketch of that conversion follows below). That way users could use Artisan to manually flag events the way they always have, and use those flags to train the model against their machine in their environment.
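Here's what that converter might look like. It assumes the .alog is a Python-dict literal containing "timex" (timestamps in seconds) and "timeindex" (event indices for CHARGE, DRYe, FCs, FCe, SCs, SCe, DROP, COOL), which matches Artisan profiles of this era, and that the recording starts at CHARGE. Untested, so treat it as a sketch:

import ast

def alog_to_segments(alog_path, segments_path, total_seconds):
    # Artisan .alog profiles are (assumed to be) a Python-dict literal
    with open(alog_path) as f:
        profile = ast.literal_eval(f.read())
    timex = profile["timex"]    # timestamps in seconds
    idx = profile["timeindex"]  # [CHARGE, DRYe, FCs, FCe, SCs, SCe, DROP, COOL]
    charge = timex[idx[0]] if idx[0] > -1 else 0.0
    def event(i):
        # event time relative to CHARGE; an index of 0 means "not marked"
        return timex[idx[i]] - charge if idx[i] > 0 else None
    fcs, fce = event(2), event(3)
    with open(segments_path, "w") as out:
        if fcs is None or fce is None:
            out.write("0,%d,environment\n" % total_seconds)
        else:
            out.write("0,%d,environment\n" % fcs)
            out.write("%d,%d,firstcrack\n" % (fcs, fce))
            out.write("%d,%d,environment\n" % (fce, total_seconds))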

This is going to take some work and I've got some fairly serious hurdles to deal with first. I either need to figure out a way to get this to ingest a large file (using 64-bit Python and libraries, maybe?) or split the recordings. I need to dial in the classification process a bit, I need to figure out a way to parse those logs in a way people can use (I tend to use PowerShell for everything as I'm not a Python guy, which becomes problematic for non-Windows users), and finally I need some way to classify data in realtime. That last one is going to be a pain because the included scripts require ALSA, which is Linux-only.
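On that last point, one possible cross-platform workaround is to skip the ALSA-only scripts entirely: capture short chunks with the PyAudio library and run each chunk through a trained classifier as a file. This is an untested sketch, and "RoastLearner"/"knn" are placeholders for whatever model you've trained:

import wave
import pyaudio
from pyAudioAnalysis import audioTrainTest as aT

RATE = 44100
FRAMES = 1024

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=FRAMES)
try:
    while True:
        # capture roughly one second of audio and dump it to a temp wav
        buf = [stream.read(FRAMES) for _ in range(RATE // FRAMES)]
        wf = wave.open("chunk.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(buf))
        wf.close()
        # classify the chunk with the trained model (placeholder names)
        result, probs, classes = aT.fileClassification("chunk.wav",
                                                       "RoastLearner", "knn")
        print(dict(zip(classes, probs)))
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()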

Having said all that, I'm happy to accept any help on knocking some of these tasks out if anyone has the chops for it...

btreichel
Posts: 141
Joined: 8 years ago

#15: Post by btreichel »

I'd volunteer to help, but I lack either the skill set or the hardware. I went from a Linux netbook to an Android tablet years ago. My laptop was salvaged from my wife and was intended just to run Artisan.

happycat
Posts: 1464
Joined: 11 years ago

#16: Post by happycat »

On file size: why 44.1 kHz/16-bit sampling? Why not shrink the file down using just enough quality to resolve a flurry of cracks?
LMWDP #603

luma (original poster)
Posts: 77
Joined: 10 years ago

#17: Post by luma (original poster) replying to happycat »

I'd like to say that it was an intentional design choice based on earlier work showing a significant amount of spectral energy above 10 kHz (particularly for second crack), but the real answer is that I just picked an arbitrary value I figured would capture "enough" of the spectrum, and 44.1/16 was a safe choice. The total recording for a 10 minute roast is on the order of 80 MB, which isn't too bad for the training period. Once I have a reasonable corpus it would be straightforward to transcode down in bitrate/sample frequency and then test both training and classification accuracy and performance.
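For example, downsampling a recording for that comparison should be a SoX one-liner along these lines (the output filename is just an example):
    sox trimmed_recording_16-08-23_1820.wav -r 22050 trimmed_recording_16-08-23_1820_22k.wav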

btreichel
Posts: 141
Joined: 8 years ago

#18: Post by btreichel »

Will it let you do a bandpass or other filter?

luma (original poster)
Posts: 77
Joined: 10 years ago

#19: Post by luma (original poster) replying to btreichel »

I'm using SoX for recording, which supports all manner of filters (including bandpass), but the spectrum of the cracks is broadband. The intent here is not to dive into the weeds regarding the characteristics of the recorded signal itself; that approach has been tried, and I don't find a lot of positive results published from those efforts. Instead, I'm looking to grab as much data as possible to throw at training the classifier and let it make the determination as to what it "thinks" FC and SC sound like.

There may be room for optimizing the recording of training data down the road, but I have enough storage and compute to cast a wide net during development, and I don't want to start by throwing away data that might be required later.

luma (original poster)
Posts: 77
Joined: 10 years ago

#20: Post by luma (original poster) »

It works!

[image: Artisan roast graph overlaying the KNN and SVM classifier outputs]

This morning I roasted a batch while recording everything with SoX, splitting the recording into 1-second chunks. I manually copied the 1-second chunks into folders named "environment" and "firstcrack". I ran the training engine against these samples, using the k-nearest neighbor (KNN) and support vector machine (SVM) algorithms to see how they each performed with default settings:
cd /d "C:\Python27\Scripts\pyAudioAnalysis"
python audioAnalysis.py trainClassifier -i "recordings\environment" "recordings\firstcrack"  --method svm -o RoastLearnerSVM

python audioAnalysis.py trainClassifier -i "recordings\environment" "recordings\firstcrack"  --method knn -o RoastLearnerKNN
I then bodged together an unholy pile of batch scripts to wrangle my temp/humidity sensor, launch the SoX recording engine, launch both classifiers, and collect the various outputs from each in a way that doesn't require Artisan to wait during each collection interval.

After some dry runs I charged a new load of beans and let the trained classifiers run against the recording of the roast in realtime, collecting the results and charting them in Artisan. In the image above you see the KNN classifier in green and SVM in purple; FCs/FCe were marked manually. With no manual tuning, the KNN model is extremely accurate: zero false positives and very close alignment with my own by-ear entry of the FCs/FCe events.

Here's what the console looks like while the roast is running; the two classifiers are the bottom four LCDs on the right.

[image: Artisan console during the roast]

The probability is reported for each of the two classes I'm currently training (environment/firstcrack) on a scale of 0-1. In the device assignment configuration I've multiplied that by 100 to obtain a percentage. FCe is long past in this screenshot, and both classifiers are 100% confident that they aren't hearing first crack.

Here's what the device setup looks like:

[image: Artisan device assignment configuration]

When the temperature comes within 50 degrees of my set charge temp, an Artisan alarm starts up the SoX recorder, temp/humidity monitor, and acoustic classifiers in the background. Here's what they look like when running:

[image: the recorder, monitor, and classifiers running in background consoles]

I'm only classifying two sounds for development purposes. The classifiers can handle an arbitrary number of bins, so if you wanted to detect the charge by the sound of beans rushing into the drum, that's totally doable. Same for second crack, drop, or anything else your microphone can pick up.
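For example, a three-class model should just be a matter of passing another labeled folder to trainClassifier (the "secondcrack" folder and model name here are hypothetical):
    python audioAnalysis.py trainClassifier -i "recordings\environment" "recordings\firstcrack" "recordings\secondcrack" --method knn -o RoastLearnerKNN3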

While these results are encouraging, the pile of batch files holding all of this together is extremely unportable. I'm going to need to do some significant work before I can hand this over to someone else with any hope of it working. It'll be 100% Windows, as I have next to zero Python experience with which to port this to other platforms. I'm going to keep monkeying with this, tuning the analysis windows for accuracy and performance.

If anyone thinks they're ready and serious about getting this running, let me know and I'll switch gears toward cleaning up the code to remove things like hard-coded paths and filenames. Otherwise, I'll keep hacking on this to get a better understanding of how to improve accuracy and how to tune the models appropriately.