Easy & Powerful Alexa Skills with TypeScript and ask-gib

tl;dr

The ask-gib library empowers you to code Alexa Skills with minimal plumbing and maximal speech dynamics. You still have to create your AWS Lambda function and configure your Alexa Skill in the Amazon developer portal, but when it comes to coding, using ask-gib greatly simplifies the entire process with:

  • TypeScript & vscode — makes developing JavaScript apps enjoyable, with strong typing, great intellisense, refactoring, and more. Very active community, and it works well cross-platform.
  • FuncyAlexaSkill — write self-contained “transform” functions with the plumbing handled for you. You list an array of functions against the incoming intents; if your transform applies, you return the next state, and if it doesn’t, you return null. Ridiculously simple.
  • Lex — a data helper for alternative-driven data. With Alexa Skills, you don’t want to say “Hi” every single time. Sometimes you want to say “Hi”, sometimes “Hello”, maybe a “Howdy”, with possibly a dash of “Ahoy There” thrown in for kicks. Lex enables you to design your data so that your calling code stays the same, but you are provided at runtime with alternatives. You can even control the probability of alternatives for easter eggs and many other extras. Oh, did I mention that it also does i18n? You can start with no language explicitly specified and then add alternatives when you decide to branch out and translate.

ask-gib — a little context…

When I started fiddling with Alexa Skills, I used one of their original Space Geek demo apps. I wasn’t super familiar with JavaScript and I was used to typed languages. I was actually a little more familiar with TypeScript, so I modified the demo, put a bunch of work into the type definitions going through the (then) entire JSON API reference, and I locked down the skill architecture quite a bit. Thus was born ask-gib.

It was mainly useful only if you happened to prefer TypeScript over the official plain JavaScript helper library, which at that time was basically a repo of (very well-written) examples. I encapsulated this behavior in AlexaSkill, but you still had to know about requests and wire up a bunch of things, and you still had to write plumbing for data, extracting slots, etc. It was very lightweight but nothing to write home about.

NB: You can find the current, very well-written, robust, and official Alexa Skills Kit lib here. I prefer the conveniences of TypeScript with auto-complete and strong-typing, and I certainly prefer the overall syntax and feel of the code of ask-gib. I like to dumb things down and idiot-proof them for my future coding self — plus I write a ton of documentation as I go (but I still need more!). 😃


Getting Func-y (sort of)

For the past few years I’ve been slowly delving into the functional world, first studying Erlang and then Elixir — each being a beautiful functional language (OK, Erlang’s syntax being beautiful may be a stretch, but the parallelism of the BEAM and overall design is absolutely gorgeous IMO).

I’m still far from a functional guru, but I took inspiration from the simplicity of functional composition and reduced cognitive load while still being within the OO paradigm. So I built the FuncyAlexaSkill with the aim of ridiculous simplicity when creating new Alexa Skills, while still allowing for arbitrary growth of complexity.

Here’s how it works. When authoring your skill, you simply write functions in a pipeline associated with intent names.

"MyIntent": [transformA, transformB, ...]

When an incoming intent (or launch request) is received, each transform in the corresponding pipeline array is called in order, and the first one to return a non-null result wins. This makes the order of execution plainly obvious to the programmer.

Here is the signature of each SkillTransform transform function:

export type SkillTransform = {
    (stimulus: Stimulus, history: SkillState[]): SkillState | null;
}

So inside each function, you can examine both the incoming state (stimulus), and all past state (history) and decide if your function applies. If it does, then you produce the next state. If it doesn’t, then you return null. All of the plumbing for persistence of this stuff is handled for you, including storing session information in a DynamoDB table. You just have to write the transform functions.
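Under the hood, this dispatch amounts to a first-match-wins walk over the pipeline. Here is a self-contained sketch of that idea, using simplified stand-in types rather than ask-gib's real Stimulus and SkillState interfaces (the real library also persists the resulting state for you):

```typescript
// Simplified stand-ins for ask-gib's types (assumptions for
// illustration; the real interfaces carry more fields).
interface Stimulus { name: string; }
interface SkillState { id: string; location: string; }

type SkillTransform =
    (stimulus: Stimulus, history: SkillState[]) => SkillState | null;

// First-match-wins dispatch: call each transform in order and
// keep the first non-null result as the next state.
function dispatch(
    transforms: SkillTransform[],
    stimulus: Stimulus,
    history: SkillState[]
): SkillState | null {
    for (const transform of transforms) {
        const next = transform(stimulus, history);
        if (next !== null) { return next; }
    }
    return null; // no transform applied
}

// Two toy transforms: the first only applies on launch.
const onLaunch: SkillTransform = (s, _h) =>
    s.name === "LaunchRequest" ? { id: "1", location: "welcome" } : null;
const fallback: SkillTransform = (_s, _h) =>
    ({ id: "2", location: "fallback" });

const next = dispatch([onLaunch, fallback], { name: "LaunchRequest" }, []);
```

Since `onLaunch` applies to the launch request, it wins and `fallback` never runs; for any other stimulus, `onLaunch` returns null and `fallback` takes over.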

NB: There is a 24 kB response size limit for Alexa Lambda functions, and FuncyAlexaSkill maintains a complete history of your skill’s states in the session. So for most Alexa Skills that require persistent session state, you’ll need to create a DynamoDB table and initialize the skill class with its name. The actual persistence plumbing is done for you.


Trivial Example

So say you have a WelcomeIntent which indicates that the user just launched your Alexa Skill. At the simplest level, you could write something like this:

transformWelcome: ask.SkillTransform = (
    stimulus: ask.Stimulus,
    history: ask.SkillState[]
): ask.SkillState => {
    // Retrieve data from our lex, a powerful data helper
    // for providing alternatives.
    let welcome = lex._('welcome');

    // Use SpeechBuilder's fluent interface to compose the
    // output and reprompt OutputSpeech.
    let output = ask.SpeechBuilder.with()
        .ssml(welcome.ssml)
        .outputSpeech();
    let reprompt = ask.SpeechBuilder.with()
        .ssml(welcome.ssml)
        .outputSpeech();

    // Create an interaction object that contains all of the
    // interaction state for this request/response cycle.
    let interaction: ask.Interaction = {
        stimulus: stimulus,
        type: "ask",
        output: output,
        reprompt: reprompt
    };

    // Create the next SkillState which will be accessible in the
    // next request's history.
    let nextSkillState: ask.SkillState = {
        id: h.generateUUID(),
        interaction: interaction,
        location: "welcome" // wherever the user now "is" in your skill
    };

    // And we're done!
    return nextSkillState;
}

Each transform function follows the same basic structure:

  1. Check incoming stimulus and history. If it doesn’t apply, immediately return null.
  2. Build up what you want Alexa to say, and optionally produce other content like a card.
  3. Create state objects for this Interaction and the entire SkillState and return the SkillState.

So for #1, this is the simplest possible function, in that it doesn’t examine any state of the current stimulus or the past history. It just assumes that if you have gotten here, you want Alexa to say the welcome ssml and prompt.
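For contrast, here is a hypothetical transform that does use the step #1 guard: a repeat handler that bails out with null when there is nothing in history to repeat. The type shapes are simplified stand-ins for illustration, not code from ask-gib:

```typescript
// Simplified stand-in types (illustration only).
interface Stimulus { name: string; }
interface Interaction { type: string; outputSsml: string; }
interface SkillState { id: string; interaction: Interaction; location: string; }

type SkillTransform =
    (stimulus: Stimulus, history: SkillState[]) => SkillState | null;

const transformRepeat: SkillTransform = (stimulus, history) => {
    // Step 1: check stimulus and history; return null if we don't apply
    // (a different intent, or nothing in history means nothing to repeat).
    if (stimulus.name !== "AMAZON.RepeatIntent") { return null; }
    if (history.length === 0) { return null; }

    // Steps 2 & 3: re-speak the previous interaction's output and
    // return the next SkillState.
    const previous = history[history.length - 1];
    return {
        id: String(history.length + 1),
        interaction: {
            type: "ask",
            outputSsml: previous.interaction.outputSsml
        },
        location: previous.location
    };
};
```

If this transform returns null, the dispatcher simply moves on to the next transform in the pipeline (or to none at all).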

For #2, we use Lex to retrieve the “welcome” lex data. But note that this is not just a single piece of static data. Lex is very powerful and I’ll dedicate the next blog to it alone. But for now just realize that this will retrieve a single datum from all available data alternatives that correspond to the id of “welcome”! This includes not only i18n localization, but alternatives within a language to produce variety and avoid sounding monotonous and scripted. For example, this example might have randomly picked from your defined data <p>Welcome!</p>, <p>Welcome friend!</p>, <p>Welcome amigo!</p>, etc.
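As a toy illustration of that alternative-driven idea (this is not Lex's actual data format or API, just the shape of the concept), you can think of it roughly like this:

```typescript
// Toy alternative-driven data store: one id maps to several
// alternatives (illustration only; Lex's real format differs and
// also covers i18n, probabilities, keywords, and more).
const data: { [id: string]: { ssml: string }[] } = {
    "welcome": [
        { ssml: "<p>Welcome!</p>" },
        { ssml: "<p>Welcome friend!</p>" },
        { ssml: "<p>Welcome amigo!</p>" }
    ]
};

// Return one alternative at random for the given id, so the calling
// code stays the same while the spoken output varies at runtime.
function pick(id: string): { ssml: string } {
    const alternatives = data[id];
    const index = Math.floor(Math.random() * alternatives.length);
    return alternatives[index];
}

const welcome = pick("welcome");
```

The calling code never changes; only the data grows as you add alternatives (or whole languages).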

The SpeechBuilder provides you with a fluent interface for building up OutputSpeech objects. You can add ssml, text, and even weave in existing OutputSpeech objects. TypeScript + vscode auto-complete makes this a breeze to work with.
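If you haven't seen the fluent pattern before, the mechanics are simply that each method returns the builder itself. A miniature stand-in (not SpeechBuilder's real implementation; method behavior here is an assumption for illustration) might look like:

```typescript
// Miniature fluent builder in the spirit of SpeechBuilder
// (illustration only; the real class lives in ask-gib).
class MiniSpeechBuilder {
    private parts: string[] = [];

    static with(): MiniSpeechBuilder { return new MiniSpeechBuilder(); }

    ssml(fragment: string): MiniSpeechBuilder {
        this.parts.push(fragment);
        return this; // returning `this` is what makes the chain fluent
    }

    text(plain: string): MiniSpeechBuilder {
        // A real builder would escape/wrap plain text into ssml-safe
        // content here; we keep it simple.
        this.parts.push(plain);
        return this;
    }

    outputSpeech(): { type: string; ssml: string } {
        return {
            type: "SSML",
            ssml: `<speak>${this.parts.join(" ")}</speak>`
        };
    }
}

const speech = MiniSpeechBuilder.with()
    .ssml("<p>Welcome!</p>")
    .text("What would you like to do?")
    .outputSpeech();
```

Each call in the chain mutates the same builder and hands it back, and the terminal `outputSpeech()` call materializes the finished object.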

Once you have your interaction data (which could also include card title/content), then you simply build the Interaction and SkillState objects and that’s it. Everything else is handled for you.


Starter Template and Code Excerpt

You can use modalGib as a starter template to check things out. It also includes some convenient npm scripts for building the bin.zip file to upload to your Lambda function, as well as the vscode config. With these in place, you can create the bin.zip file with a single ctrl+shift+B (build) command.

NB: The first build requires an existing bin.zip file. You can create this manually, or tweak the build tasks for a more robust deploy scenario. This has sufficed for my needs so far.

And here is an excerpt of actual code from modalGib (minus some of the logging).

constructor(appId: string, dynamoDbTableName: string) {
    super(appId, dynamoDbTableName);
    let t = this, lc = `ModalGib.ctor`;
    try {
        // Order is important! The first transform that handles
        // the stimulus + history wins. Transforms that do not
        // apply will return null as the next skill state.
        t.transformsByName = {
            "GetModeIntent": [t.transformGetMode],
            "AMAZON.HelpIntent": [t.transformHelpDefault],
            "AMAZON.RepeatIntent": [t.transformRepeat],
            "AMAZON.CancelIntent": [t.transformGoodbye],
            "AMAZON.StopIntent": [t.transformGoodbye],
            "ThankYouIntent": [t.transformGoodbye],
        };
        t.transformsByName[t.getLaunchRequestName()] =
            [t.transformGetMode];
    } catch (errFunc) {
        h.logError(`errFunc`, errFunc, lc);
        throw errFunc;
    }
}

// further down the file...
transformGetMode: ask.SkillTransform = (
    stimulus: ask.Stimulus,
    history: ask.SkillState[]
): ask.SkillState => {
    let t = this;
    let slots =
        stimulus && stimulus.intent ?
            stimulus.intent.slots :
            null;
    let dayOrNth = t.getDayOrNth(slots);
    let dayNumber = t.getDayNumber(dayOrNth);
    let mode = lex._('modes', {
        specifier: data.modesByDay[dayNumber],
        lineIndex: 0
    });
    let speak = lex._('daysMode', {
        keywords: [dayOrNth.toLowerCase()],
        capitalize: "uppereach",
        vars: { 1: dayOrNth, 2: mode.text }
    });

    let output = ask.SpeechBuilder.with()
        .ssml(speak.ssml)
        .outputSpeech();

    let interaction: ask.Interaction = {
        stimulus: stimulus,
        type: "tell",
        output: output,
    };

    let nextSkillState: ask.SkillState = {
        id: h.generateUUID(),
        interaction: interaction,
        location: "home",
    };

    return nextSkillState;
}

Summary

And so, if you’re interested in creating an Alexa Skill, and especially if you’re like me and prefer TypeScript with all of its compile-time safety and auto-complete goodness, ask-gib can give you much-reduced plumbing with FuncyAlexaSkill, real power for creating dynamic, alternative-driven data with Lex, and more 😄

Alexa, Simon Says Happy Coding!