autocorrelation – Andrew E. Scott

I’ve been recently writing an app that uses the autocorrelation approach to detect the pitch of musical notes. This approach basically tries to see if a given musical note is in a digital audio signal by comparing each sample with the next sample in the signal that ought to be the same (since a given note should repeat periodically as per its frequency). In exploring how to best do this in my app, I’ve come across something I found curious.

Before I get to that, I need to explain a couple of things. Firstly, I am doing this pitch detection for a particular instrument: the flute. The flute is ordinarily considered to be able to play notes from B3 (the B immediately below middle C, but only if a flute has a “B foot”, otherwise from middle C) to C7 (the C three octaves above middle C). However, very skilled players might be able to get a few notes higher, to F7. Also, the piccolo flute can go up to C8, but we’ll ignore that more now. Given that the frequency of B3 is 246.9Hz and that of F7 is 2,793.8Hz, the 43 notes are spread across about 2,550Hz of frequencies.

The other thing to explain is that CDs (and many electronic devices) use a sample frequency of 44,100Hz. This is considered to be sufficiently high to record and reproduce audio signals up to 20,000Hz, which is the general limit of human hearing. However, a higher sample frequency, of 48,000Hz, is being increasingly used, such as in DAT tapes or DVDs.

These two things come together in autocorrelation because it requires knowing the period of each note, measured in numbers of samples. For example, the audio signal for a pure B3 tone should repeat every 178.6 samples if sampled at 44.1kHz or every 194.4 samples if sampled at 48kHz. Similarly, F7 should repeat every 15.8 samples at 44.1kHz or every 17.2 samples at 48kHz. Except there’s no such thing as a fraction of a sample, so for my autocorrelation calculations, I would round to the nearest sample.

Rounding introduces error, so using a period of 16 samples (at 44.1kHz) or 17 samples (at 48kHz) for F7 is not ideal. In fact, these periods correspond to different frequencies – 2,756.3Hz and 2,823.5Hz respectively. The intervals between musical notes are measured in cents, and there are 100 evenly-spaced cents to a semitone. The frequency corresponding to a period of 16 samples at 44.1kHz is 23 cents below the real F7, and the frequency of 17 samples at 48kHz version is 18 cents above F7. Higher notes are more error-prone, and the corresponding errors for a low note like B3 are 4 cents below (for 44.1kHz) and 3 cents above (for 48kHz).

For my autocorrelation algorithm, some error in detecting pitch is okay, since as long as the flute is playing in tune, if the algorithm was less than 50 cents out, it would always get the right note. So, I wrote some code to look at what the maximum error in cents was in following this approach, considering a range of sample frequencies from 2,000Hz to 60,000, and got a curious graph:

You might be able to see small red dots at the points for 44.1kHz and 48kHz (or you can click through to see a bigger version of the photo). This graph shows the maximum error in cents across all notes in the range between A3 and F7, and it is less than 40 cents for both 44.1kHz and 48kHz. In fact, the maximum error for 44.1kHz (29.3 cents, relating to the note G#6) is less than 48kHz (37.0 cents, relating to the note D7), and 44.1kHz is close to the minimum for all sample frequencies up until about 57.8kHz.

There is a general trend that the higher rates result in lower errors, although I wasn’t expecting that the sample rate of 44.1kHz would have lower maximum error than 48kHz. I wondered if this was due to the specific range I was examining, so I wrote some more code to examine the impact of maximum errors on these two frequencies if I used ranges of notes starting at A3 and finishing at between C7 and C8. Here’s the resulting graph:

As with before, for a note range going up to F7, 44.1kHz has a lower maximum error in cents than 48kHz. However, if the note range had stopped at C7, 48kHz would have a lower maximum error. Also, if we’d gone above A7, 48kHz would also be more accurate than using 44.1kHz but at that point the error would be above 50 cents, i.e. not accurate enough to be useful.

So, curiously 44.1kHz happens to be well-suited to autocorrelation of notes in the flute range. I’m sure this wasn’t a consideration when that was selected as a common sample frequency for audio recordings, but it happens to benefit me now.

Tag: autocorrelation

Mathematical, musical curiosity