Hacking the Alexa grammar

For Christmas, I got myself an Amazon Echo Dot (and I wasn’t the only one). For me, it’s been a fun and more convenient way to play music in our living room area, and I’ve been listening to more music as a result. I also had the idea that it would be nice to build some speech-driven interfaces to things.

It has been over a decade since I did speech recognition work. Speech recognition was used in one of the early projects that I did after I first left Uni, where I was part of a team that built a speech-driven personal assistant. It was a little ahead of its time, and never went anywhere.

Still, I thought I could put those slightly rusty skills to use on the Echo, since Amazon provides a way to create Alexa skills (the name given to apps that run on the Echo behind the scenes). My idea was to use the Echo to provide a way to help the kids with maths, since they love to talk to “Alexa”.

Last week, my skill was published on Amazon’s list of Alexa skills. It allows someone to say “Alexa, tell me now if 1 plus 1 equals 2”, and will respond by saying that they’ve gotten this correct (or not). Unlike the basic Alexa functionality of doing maths, where someone might say “Alexa, what is 1 plus 1”, this skill forces the speaker to offer the answer and have it checked. This should be useful to anyone wanting to test their maths, and it supports addition, subtraction and multiplication. Basic users would probably use small numbers, but advanced users can use large numbers – the skill supports it all. Not negative numbers or zero, though!

Doing the coding behind this was straightforward; it was some simple Node.js code that runs on AWS Lambda. What was less straightforward was sorting out the grammar to use.
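To give a flavour of the checking logic, here is a minimal sketch in plain JavaScript. It is not the published skill's code: the function name `checkClaim`, the operator words and the spoken responses are all made up for illustration, and it assumes the numbers and operator have already been pulled out of the utterance. It also assumes that the "no zero or negative numbers" limit applies to the claimed answer as well as the inputs.

```javascript
// Hypothetical helper, not the published skill's code.
// Maps the spoken operator word to the arithmetic it stands for.
const OPERATIONS = new Map([
  ["plus", (a, b) => a + b],
  ["minus", (a, b) => a - b],
  ["times", (a, b) => a * b],
]);

// Checks a claim such as "1 plus 1 equals 2".
function checkClaim(left, operator, right, claimed) {
  const apply = OPERATIONS.get(operator);
  if (!apply) {
    return { ok: false, speech: "Sorry, I can only check plus, minus and times." };
  }
  // Assumption: only positive whole numbers are accepted, mirroring the
  // limitation described in the post (no zero, no negative numbers).
  const allPositive = [left, right, claimed].every(
    (n) => Number.isInteger(n) && n > 0
  );
  if (!allPositive) {
    return { ok: false, speech: "Sorry, I can only check positive whole numbers." };
  }
  const correct = apply(left, right) === claimed;
  return {
    ok: true,
    correct,
    speech: correct ? "Yes, that is correct." : "No, that is not correct.",
  };
}

console.log(checkClaim(1, "plus", 1, 2).speech);
```

In a real skill, something like this would sit inside the Lambda handler, with the numbers and operator taken from whatever the Alexa service passes in.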

In speech recognition, the word “grammar” refers to the set of different phrases that an application can recognise at a point in time. A simple grammar is one that consists of just the phrases “yes” and “no”. A complex grammar might include every product for sale on Amazon itself and different ways to order them. The grammar is used by the speech recognition engine to improve its recognition, since it doesn’t need to always listen for every possible word in English, but only the specific words that are contained in the grammar.
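As a toy illustration of the idea (and nothing like how a real recognition engine works internally), a grammar can be thought of as a set of allowed phrases that filters the engine's guesses. The function `pickInGrammar` and the list of guesses below are hypothetical.

```javascript
// Toy illustration: a grammar is the set of phrases accepted right now.
const YES_NO_GRAMMAR = new Set(["yes", "no"]);

// `hypotheses` is a made-up list of guesses from a recogniser, best first.
// Returns the first guess that is in the grammar, or null if none is.
function pickInGrammar(hypotheses, grammar) {
  for (const guess of hypotheses) {
    const phrase = guess.trim().toLowerCase();
    if (grammar.has(phrase)) {
      return phrase;
    }
  }
  return null; // nothing the application can act on
}

// "yet" and "guess" sound similar but are not in the grammar, so "yes" wins.
console.log(pickInGrammar(["yet", "Yes", "guess"], YES_NO_GRAMMAR));
```

The smaller the grammar, the fewer similar-sounding alternatives there are to confuse the recogniser.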

To develop an Alexa skill, you need to hack together the basic Alexa grammar, together with an “invocation name”, and then the grammar that the skill itself can recognise. (Here, I’m using the word hack in its art-of-programming sense, not in the computer-intrusion sense.) Usually, the invocation name is a noun phrase or name, e.g. “Dog Facts”, “Starbucks” or “GE Podcast Theatre”. However, it can be any set of words, and there is an alternative dog fact skill that uses the invocation name “me a dog fact”.

This last one doesn’t seem to make sense until you remember that there is a grammar that comes before the invocation name. It starts with a “wake word” (one of “Alexa”, “Amazon” or “Echo”), then a variety of commands based around words like “tell”, “ask”, “start” or “open”. So, the invocation name gets added to this grammar, e.g. “Alexa, tell” + “me a dog fact”, which makes a lot more sense.
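The way these pieces join up can be sketched as a simple combination of wake word, command word and invocation name. This is only an approximation: the real Alexa grammar has more command patterns than the four words listed here, and the function name `launchPhrases` is made up.

```javascript
// Wake words and command words as listed in the post; the real grammar is richer.
const WAKE_WORDS = ["Alexa", "Amazon", "Echo"];
const COMMAND_WORDS = ["tell", "ask", "start", "open"];

// Builds every "wake word, command + invocation name" combination.
function launchPhrases(invocationName) {
  const phrases = [];
  for (const wake of WAKE_WORDS) {
    for (const command of COMMAND_WORDS) {
      phrases.push(`${wake}, ${command} ${invocationName}`);
    }
  }
  return phrases;
}

// Includes "Alexa, tell me a dog fact", along with combinations that read
// less naturally, such as "Echo, open me a dog fact".
console.log(launchPhrases("me a dog fact"));
```

Not every combination reads well, which is why choosing an invocation name means thinking about which command word users are likely to say before it.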

Amazon publishes a list of constraints relating to invocation names. For my application, it would have been easiest to develop it using an invocation name like “Math Test” and then users would interact with it like “Alexa, ask Math Test to check if 1 plus 1 equals 2”. However, I wanted to see if I could do something that was easier for users.

Initially, I tried out the invocation name “me if”, which would produce nice interactions like “Alexa, tell me if 1 plus 1 equals 2”. However, using “if” violates one of Amazon’s constraints around invocation names, so I needed to find something else. That’s how I ended up with “me now” as my invocation name. Interactions become slightly longer, but still workable, like “Alexa, tell me now if 1 plus 1 equals 2”. To make this approach obvious to users, the skill is named “Tell Me Now”.

Now, I just need to get the kids to speak to Alexa about maths instead of music.

Three Impossible Things

I am sharing three examples of things that I was impressed to find existing. As they exist, they are clearly not impossible; a more accurate word might be inconceivable. Until I came across them, I had no conception of this stuff, and learning about them simply makes me glad. It also reminds me not to assume that something’s impossible just because I’d never heard about it.

Drilling a square hole

It turns out that you can drill a square hole, if you use a drill bit that’s based on a Reuleaux triangle and mount it on a special chuck. Such a thing was built by a guy called Harry Watts in 1914 and apparently you can still get them from the Watts Brothers Tool Works. The resulting hole has slightly rounded corners for practical reasons, but it still has four straight edges at 90 degrees to each other.

Assemble “Stonehenge” without a crane

A retired carpenter has shown on his site how he was able to assemble two vertical pieces and a capping piece (a la Stonehenge) by himself and without a crane. He also demonstrates some techniques that might have been used to move heavy stones in ancient times for other projects. Exactly how the ancient builders did this will remain a mystery, since they didn’t document it and aren’t around anymore, but it’s interesting to see simple techniques that would have made it straightforward.

Sharing a cake fairly

Of course, it’s easy to share a piece of cake two ways while keeping the division fair (or “envy-free”, i.e. no one feels someone else has a bigger piece): one cuts, the other chooses. But how do you do it for more than two people? Well, in 1995, Brams and Taylor published a procedure for sharing between any number of people, involving cutting more pieces than necessary and taking turns trimming them. Assuming the people involved understand the proof, they should be happy that a fair distribution of the cake has been made, even if they each risk ending up with multiple pieces of different sizes.
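The two-person case is simple enough to sketch in code. This is only "one cuts, the other chooses", not the Brams and Taylor procedure, and the model is an assumption made for illustration: the cake is a row of equal-width segments, and each person has an array saying how much they value each segment (non-negative values, with a positive total). The function names are made up.

```javascript
// Value a person places on the stretch of cake between positions `from` and
// `to`, measured in segments (0 up to valuation.length).
function valueOf(valuation, from, to) {
  let total = 0;
  for (let i = 0; i < valuation.length; i++) {
    const overlap = Math.min(to, i + 1) - Math.max(from, i);
    if (overlap > 0) {
      total += valuation[i] * overlap;
    }
  }
  return total;
}

// The cutter cuts where, by their own valuation, both sides are worth the same.
// The chooser then takes whichever side they value more.
function cutAndChoose(cutter, chooser) {
  const n = cutter.length;
  const half = valueOf(cutter, 0, n) / 2;
  let cut = n;
  let seen = 0;
  for (let i = 0; i < n; i++) {
    if (seen + cutter[i] >= half) {
      cut = i + (half - seen) / cutter[i]; // cut partway through segment i
      break;
    }
    seen += cutter[i];
  }
  const left = valueOf(chooser, 0, cut);
  const right = valueOf(chooser, cut, n);
  const chooserPiece = left >= right ? "left" : "right";
  const cutterPiece = chooserPiece === "left" ? "right" : "left";
  return { cut, chooserPiece, cutterPiece };
}

// The cutter values the cake evenly; the chooser loves the first segment.
console.log(cutAndChoose([1, 1, 1, 1], [4, 1, 1, 1]));
```

Here the cutter cuts in the middle and the chooser takes the left half. The cutter thinks both halves are worth the same, and the chooser has the half they prefer, so neither envies the other.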