Wednesday, April 20, 2016

Note Selection - How fast do we need to be?

A Different Approach

As you probably noticed, I was lucky enough to have a guest post from Johan Berglund, who showed how to use a simple algorithm that reads all of the switches (keys) of the electronic woodwind, and adds or subtracts semitones from the "center" MIDI note that the instrument produces with no keys pressed.

After reading Johan's post, I thought some of you might be interested in how important it is to have a fast algorithm for map key events to MIDI note on/note off events. For the impatient: it doesn't matter very much. 

Johan's Algorithm

With no keys pressed, Johan's instrument will play a MIDI note 61, which is a C#4 (one half-step above middle C on a piano). Pressing the left-hand index finger switch will lower the pitch by 2 semitones, which means his instrument will produce a MIDI note 59, or a B3 (one half-step below middle C). When more than one key is pressed, all that's needed is to sum all of the effects of each key, although though there are a few exceptions where several keys interact, so some logical AND and NOT operations are needed.

My Lookup Table Approach

By contrast, my approach uses a lookup table. I scan all of the switches, and put the value of each switch (on/off) into a bit array. So if only the left-hand index finger of Johan's instrument is pressed, my bit value would look like 00100000 00000000. If all of the 14 keys in Johan's instrument are pressed, the bit value would look like 00111111 11111111. The leftmost 2 bits are always zero because we're putting these 14 key switch values into a 16-bit number, so 2 bits are always zero.

I maintain a table that has two columns: a fingering bitmask value, and a MIDI note, like this:

Bitmask            MIDI note
00100000 00000000  59

To figure out which note our instrument should be playing, we read all of the switches and pack them into a bitmask, then start at the beginning of this table and compare the current switch on/off values with the bitmask in the table. When we find a match, we know which MIDI note to produce.

Now, if we naively create a table that has every possible combination of those 14 switches, we'll end up with a table with 2^14 slots, or 16,384 slots. Since the bitmask is 2 bytes (16 bits) and the MIDI note is 1 byte (8 bits), my table will occupy at least 3 * 16,384 = 49,152 byes, or 48 kbytes. That's way more RAM memory than the Arduino Uno has (2 kbytes), so we need to be smarter here.

One trick is to realize that not all fingering combinations need to be considered valid. For example, no saxophone player will use a fingering that includes the ring finger of the left hand and the pinky of the right hand (unless they're doing some sort of weird extended technique). So we can get rid of the majority of the entries in our table because they'll never be used. So, really, we only need as many entries in the table as there are notes that our instrument can produce, plus any alternate fingerings (two ways of playing the same note). Since Johan's instrument produces all 12 semitones of the chromatic scale across two octaves, if we conservatively allow for 3 alternate fingerings for each of those 24 notes, we only need 72 table entries for a total size of 216 bytes. That's more like it!

Let's Race!

Now that we have a compact way to represent all the fingering-to-MIDI note mappings, let's think for a bit about efficiency. In a drag race to convert a fingering to a MIDI note, will Johan's code or my code be faster?

(Caveat: it's been a long time since I took my computer architecture class, so I may bork some of this!)

The Atmel AVR chip in the Arduino Uno has the following timing characteristics, according to

  • Arithmetic operations take one clock cycle, except for multiplication, which takes two cycles
  • Reading data from memory takes two clock cycles
Armed with this knowledge we can get a rough idea of how many clock cycles each method takes:

Johan's statement:

fingeredNote=startNote-2*LH1-(LHb && !(LH1 && LH2))-LH2-(LH2 && LH1)-2*LH3+LHp1-LHp2+(RHs && !LHp1)-RH1-(RH1 && LH3)-RH2-2*RH3+RHp1-RHp2-2*RHp3+12*OCTup

 I count:
  • 22 memory accesses (44 clock cycles)
  • 5 multiplications (10 clock cycles)
  • 22 arithmetic or logical operations (22 clock cycles)
So Johan's code should take about 76 clock cycles

My code's performance will depend on which fingering is selected. If we get lucky and the fingering the player is using is the first entry in our lookup table, we'll only need to do one comparison. If we're unlucky, we'll need to look through all of the table. Let's consider the worst case, because no musician wants their instrument to slow down when she plays certain notes.

So let's assume we have to look through all 72 entries. Each table lookup involves:
  • 2 memory accesses, to bring the two bytes of the mask into registers (4 clock cycles total)
  • 2 clock cycles to compare those byes
And finally, when we have a hit, there's a memory access to retrieve the MIDI note (2 clock cycles) and one to store it (2 cycles).

So, worst case, the table lookup is 72 * (2 + 2) + (2 + 2)  = 292 clock cycles

Checkered flag to Johan!

So, clearly, Johan's code is more efficient. But how much does it really matter? Let's look at the clock frequency of the ATMega328 chip found in the Arduino Uno - 16 Mhz.

That means that each clock cycle takes 1 / (16 * 10^6) seconds, or about .1 microseconds. Both of our algorithms will execute in less than 3 microseconds (0.000003 seconds), which is really really fast, compared to the 0.1 second delay that a human can perceive.

Understandability Wins

The take-away here is to realize that, in many or most cases where you're reading a sensor in your electronic musical instrument project, and then doing some computation, you will almost never need to worry about the efficiency of your algorithm*. Use the algorithm that is the simplest to implement, and that makes sense to you.

Hope this was helpful! In my next post I'll show how to use a cool capacitive touch sensor you can buy for $8 to make an instrument like the Akai EWI.

- Gordon

 *Computer science nerds: Yes, I'm intentionally omitting algorithms that have quadratic behavior here. On microcontrollers, even if the algorithm is quadratic, it's hard to make n very big, given typical microcontroller memory sizes.


  1. Very interesting reading! I was curious of exactly this! Was impressed how little time any of the algorithms actually take to execute. Your conclusion is of course on the spot right. Thanks! -Johan

  2. You got really close with your estimation of the number of useful fingerings on the EWI btw. This is the most extensive listing of useful fingerings I've found:

    That's 85 fingerings if I got it right.