Learning BotTalk — Creating Complex Alexa Skills with Simple Markup Language

Mar 23, 2018

Problem

When I began with skill development, I had a great teacher: Charlie. He was much more experienced in Alexa development, so he quickly pointed me to a couple of simple drag-and-drop solutions I could use to create my first simple Alexa skill.

And although the approach of those tools is questionable (Amazon’s voice user design guidelines encourage skill developers to separate conceptual mind-mapping from the actual implementation of the skill logic), I could fairly easily create my first simple fact skill.

From the first time Alexa answered the way I wanted it to, I was hooked. I wanted more! Unfortunately, I hit a stone wall of limitations pretty quickly.

Variations

“What if I want to introduce variations in my skill’s responses?” I would ask Charlie. “We all know how boring Alexa can become when she repeats the same thing over and over again. Over and over again. Over and over again. Well, you get the point. Can I do something about it?”

“Hm… well”, — Charlie would sigh — “you really need either Lambda or your own server for that”.

Context and Sessions

“And what if I wanted to store some important information between sessions? Say, I have a crypto currency skill that asks the user the first time which currency she is interested in. The next time she opens the skill, it could jump straight to the current price without asking for the currency name. Is that possible?”

“Hm… you’d need a Lambda and a database for that,” Charlie would sigh again.

Testing

“Ok, but please tell me there is some automated testing! My skill is getting pretty complex, and I’m so tired of manually going through it. Is there a way to test the responses? So I could be sure — there are no dead ends or endless loops in the dialog flow?” — I would ask in frustration.

“Hm… well…”

Solution

There were just too many restrictions. It seemed everything beyond simple trivial skills required programming, servers, databases and manual testing.

Moreover, even if you were ready to take this step and set up your own server, you still ended up with a messy architecture. Your code needed to respond to Alexa requests as well as handle your own business logic.

There had to be a better solution! But there was none.

So we’ve built it.

BotTalk lets you create complex Alexa skills with simple markup language. How simple? Well — stick around — and you’ll see for yourself!

Creating Your First Skill

To create your first skill, just head over to https://bottalk.de and click the big yellow “Login with Amazon” button.

After you authorize BotTalk, you’ll be redirected to the main screen. There you’ll find all your skills. If you’re using BotTalk for the first time, this screen will be empty. But not for long. Go ahead and click on the “Create new skill” button.

You’ll see the three-step wizard that will help you create your first skill.

In the basic settings step you’ll be asked to give your skill a name and choose a language for it. At the moment Alexa is supported in 3 countries, so you can choose US, UK or Germany as your target:

In the templates step you can choose a template for your skill. Those templates are based on the real skills we created with BotTalk and published on Amazon.
For the purpose of this tutorial we’ll choose an empty template. But feel free to poke around other templates for reference or inspiration:

Finally, you’re presented with the confirmation step. Just skim over the details of your skill. You can go back and change the name, language or template, or press the “Finish creation” button if everything is the way you want it to be:

Skill Editor

The skill editor consists of three tabs:

main.yml: This is where the main logic of your skill resides.
intents.yml: This file stores your intents and their utterances.
slots.yml: You guessed it — you put your custom slots in here.

Even when you choose an empty template for your skill, BotTalk still creates a basic structure in these files. This is done for your convenience. This structure is a bare minimum for a skill to actually pass Amazon’s certification process. So technically — even an empty template is a ready-to-ship skill!

Let’s examine this structure in greater detail:

The skills in BotTalk are written in YAML format. This is a human-friendly yet very powerful data serialization standard, which allows you as a skill developer to describe your skills in a beautiful, structured and logical way. If you have ever written a Ruby on Rails or Symfony application, you should be pretty familiar with YAML.

If you haven’t, don’t worry: there are only a couple of things you need to know before you start. YAML is very sensitive to indentation, and you can’t use tabs, only spaces. That’s basically it! And if you do make a mistake now and then, we’ve got you covered. Every time you press “Save”, BotTalk will check your YAML file for formatting errors and tell you exactly on which line it found them.
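If you prefer to catch the tab problem before you even paste a file into the editor, a tiny local check can help. The following is only a sketch in Python and has nothing to do with BotTalk’s own validator; the function name and the sample text (including the locale value) are made up for illustration.

```python
def find_tab_indented_lines(yaml_text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Hypothetical sample: the third line is indented with a tab.
sample = "scenario:\n  name: Demo\n\tlocale: en-US\n"
print(find_tab_indented_lines(sample))
```

Any line number it reports is a line you should re-indent with spaces.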

Let’s take a look at the first four lines of the main.yml file:

The first line is a YAML-specific delimiter: three dashes. It is followed by the directive to include the contents of two other files: intents.yml and slots.yml.

Scenario

The actual scenario starts on line 5 with the scenario directive:

This section describes the basic settings of the scenario, most of which are generated from the form you submitted while creating the skill. But let’s go through each of them to better understand their purpose:

name: This is the name you gave your skill in step one of the wizard.
locale: Remember when you were prompted to choose between US, UK or Germany — this answer is stored here.
category: The category your skill will be published in. Amazon requires each skill to have at least one category. That’s why BotTalk is choosing Education and Reference category by default. Consider this as a placeholder. You can always choose the actual category you want your skill to appear in — when you publish it on Amazon.
invocation: This is the “code word” that in combination with “Alexa” would start your skill. By default BotTalk will use the name of your skill as the invocation.
examplePhrases: Amazon requires each skill to have from one to three example phrases. BotTalk will create one phrase for you. We’ll extend the example phrases when our skill becomes more complex.

Step One

That was the basic structure of each scenario. What follows are the steps. An empty template would have only 3 steps that are required by Amazon for each skill. Let’s take a closer look at each step:

name: Each step must have a name. The steps are linked with each other using names.
actions: Actions are the main building blocks of a step. In this example the first action we see is the sendText action. It describes what Alexa says to the user. After that there is a getInput action, which is empty. It means that Alexa is waiting for the user to say something.
next: What exactly Alexa is expecting a user to say is described by the next directive. Here you connect each possible intent with the corresponding step. Alongside your custom intents there are 3 built-in intents Amazon requires every skill to respond to: AMAZON.CancelIntent, AMAZON.StopIntent and AMAZON.HelpIntent.

Step Two

BotTalk conveniently places these three built-in intents in the first step for you, and creates the corresponding steps so you don’t forget to.

Two of those intents link to the step named Exit. Let’s examine how this step is structured:

There are two things that are different from the step we’ve described before:

It has an entrypoint directive whose value is true. This is a shortcut that essentially means this step can be entered from anywhere in the dialog flow. You don’t have to manually “expect” AMAZON.CancelIntent and AMAZON.StopIntent after each step. Just place them once in the first step, and don’t forget to mark the corresponding Exit step with entrypoint: true.
This step does not have the next directive. Which actually makes total sense, when you think about it. This is the last step of your scenario. There is nothing that is happening next — thus no steps that are linked to it.

Step Three

Let’s have a look at the Help step:

It’s also marked as entrypoint: true — your users obviously need to have an ability to ask for help at any given time. Besides that there’s a simple sendText directive. But what is more interesting — in the next section — there is your first custom intent ok_great. This intent is connected to the first step — Initial step.

So essentially, if a user says “Help”, Alexa will say “Here is the help text” and wait for the user to say something.
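To make the linking between steps a bit more tangible, here is how we read those rules, written down as a small Python sketch. It is a mental model only, not BotTalk’s implementation: the dictionary layout and the resolve_next helper are assumptions made for illustration, while the step and intent names are the ones from the empty template.

```python
# Assumed in-memory picture of the three template steps.
STEPS = {
    "Initial step": {
        "next": {
            "AMAZON.HelpIntent": "Help",
            "AMAZON.CancelIntent": "Exit",
            "AMAZON.StopIntent": "Exit",
        },
    },
    "Help": {"entrypoint": True, "next": {"ok_great": "Initial step"}},
    "Exit": {"entrypoint": True, "next": {}},
}
FIRST_STEP = "Initial step"


def resolve_next(current_step, intent):
    """Return the step an intent leads to, or None if nothing matches."""
    local_links = STEPS[current_step]["next"]
    if intent in local_links:
        return local_links[intent]
    # Fall back to intents declared once in the first step, but only when
    # their target step is marked as enterable from anywhere.
    target = STEPS[FIRST_STEP]["next"].get(intent)
    if target is not None and STEPS[target].get("entrypoint"):
        return target
    return None


print(resolve_next("Help", "ok_great"))
print(resolve_next("Help", "AMAZON.StopIntent"))
```

The second call shows the shortcut in action: the Help step never lists AMAZON.StopIntent itself, yet the user can still reach Exit from there.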

But what utterances, or actual phrases, are behind the ok_great intent? Glad you asked! Come over to the intents.yml tab!

Intents

Here all intents are structured in a beautiful manner. Each intent name, for instance ok_great, can have several utterances, in our case OK and Great. These are created by BotTalk for reference; we will extend them later in this tutorial.

Hello, World!

Now that you understand the basic structure of a BotTalk scenario, let’s go ahead and customize the empty template.

To follow the long tradition, we surely want our first skill to greet the world!

Save the skill. And let’s jump right into the Test section.

Testing

After you hit the “Run Test” button, BotTalk’s automatic tester will launch. The skill will be invoked with the invocation phrase you used before, and our first step will be executed: Hello, World!

After that the tester will randomly choose one of the three next steps we provided. If you remember, those are built-in Amazon intents: AMAZON.CancelIntent, AMAZON.StopIntent or AMAZON.HelpIntent.

As you can see on the screenshot in our case it chose to say Help, invoking AMAZON.HelpIntent. Then the skill jumped to the Help step, said Here is the help text and waited for the reply.

BotTalk’s Tester “said” Great and invoked our custom ok_great intent. As you remember this was linked back to the first step.

That’s why the skill once again said Hello, World! and waited for the user’s reply. This time BotTalk’s Tester chose to reply with ‘Cancel’, invoking AMAZON.CancelIntent, which was linked to the Exit step.

Rerun the test multiple times to watch the Tester choose another route and another utterance. This is a great way to check for dead ends and endless loops, and simply a sound way of developing skills. Testing is great!
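If you are curious why random walks are good at finding dead ends and endless loops, the idea fits in a few lines of Python. This is a rough sketch of the concept and not BotTalk’s Tester; the NEXT table is a simplified, assumed picture of the Hello, World! skill, and random_walk and dead_ends are hypothetical helpers.

```python
import random

# Assumed, simplified dialog graph: step -> {utterance: next step}.
NEXT = {
    "Initial step": {"Help": "Help", "Cancel": "Exit", "Stop": "Exit"},
    "Help": {"OK": "Initial step", "Great": "Initial step"},
    "Exit": {},
}


def random_walk(start, rng, max_turns=50):
    """Follow randomly chosen utterances until a step has no next links."""
    path = [start]
    current = start
    for _ in range(max_turns):
        choices = NEXT[current]
        if not choices:
            return path
        utterance = rng.choice(sorted(choices))
        current = choices[utterance]
        path.append(current)
    raise RuntimeError("no exit reached, possible endless loop")


def dead_ends(terminal_steps=("Exit",)):
    """Steps that stop the dialog although they are not meant to."""
    return [name for name, links in NEXT.items()
            if not links and name not in terminal_steps]


print(random_walk("Initial step", random.Random()))
print(dead_ends())
```

Run it a few times and you will see different routes, just like in the Test section.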

To the Moon!

Ready to deploy your simple skill to Alexa? No problem! We’re excited too. Just head over to the Deploy section and hit the “Deploy” button:

The first time you run the Deployer, it will check if the skill already exists in your Amazon Developer Console. If not, it will create it, initiate the interaction model and deploy to Alexa.

You are all done here! You can go ahead and test your newly created Skill in the Amazon Builder — under the Test section:

Congratulations! You have just created your first skill in BotTalk. From zero to deploy. With the beautifully structured logic and, of course, testing.

We’re sure you are excited and want even more. The following section is exactly right for you!

Creating Complex Skills — Crypto Currency Price

Following best practices of Voice Interface Creation — to create complex Skills — we need to first start with a use case, then draft the mind map of the dialog flow. And only after that start writing the logic.

Mixing these steps is a bad practice. And that’s why BotTalk encourages the separation of these three distinct phases of skill creation.

Use Case

As a user, I want to quickly check the price of my favorite crypto currencies. I also want to know how much the price has changed since the last time I checked.

After you have defined the use case, it’s time to move to a mind map.

Mind Map

This will actually help us along the way in creating steps of our scenario. And in visualizing the dialog flow.

Amazon’s best practices guide suggests we start with the main logic path, and then add the advanced variations as we move along. So that is exactly what we’re going to do.

Logic

Let’s start with the first step. As you can see in the mind map, the first thing the skill does is ask the user which coin she is interested in:

Ask User For Coin Name

We’ve made three changes to our first step:

Renamed the step to Ask User For Coin Name for convenience.
Edited the sendText action.
Added a custom intent get_coin_name that will be linked to a step we’re going to create next: Make Api Request.

get_coin_name Intent

But before we jump to the next step, let’s take a closer look at the intents.yml tab and define the utterances for the get_coin_name intent.

As you can see, we first define a pretty simple utterance: just the name of the coin. Notice how we put coin_name in curly braces? This is how BotTalk identifies the slots.

After a simple utterance, we go to more complex ones, that represent a more realistic and lively way our users may express their intentions: How much is {coin_name} or What is the current price of {coin_name}.
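If you wonder how a phrase like “How much is ripple” ends up as a filled coin_name slot, the following Python sketch shows the general idea with plain regular expressions. Alexa’s real language understanding is far more sophisticated than this; the SLOT_VALUES table and both helper functions are assumptions made purely for illustration, while the three utterances are the ones described above.

```python
import re

UTTERANCES = [
    "{coin_name}",
    "How much is {coin_name}",
    "What is the current price of {coin_name}",
]
# Assumed slot values; the real list lives in slots.yml.
SLOT_VALUES = {"coin_name": ["Bitcoin", "Ripple", "Ethereum"]}


def template_to_regex(template):
    """Turn an utterance with {slot} placeholders into a compiled regex."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)  # literal text between slots
        else:
            values = "|".join(re.escape(v) for v in SLOT_VALUES[part])
            pattern += f"(?P<{part}>{values})"  # slot limited to known values
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def extract_slots(spoken):
    """Return the slot values of the first matching utterance, else None."""
    for template in UTTERANCES:
        match = template_to_regex(template).match(spoken.strip())
        if match:
            return match.groupdict()
    return None


print(extract_slots("How much is ripple"))
```

Restricting each slot to its known values is what keeps the bare {coin_name} utterance from swallowing every sentence.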

coin_name Slot

Let’s head over to the slots.yml tab and actually define the coin_name slot we’re using so heavily in our utterances. For the sake of this tutorial we decided to go with the top 10 crypto currencies. You can, of course, add more:

This is pretty straightforward syntax, but the interesting thing is: there were hundreds of coin names in our original skill. Did we put them there all by hand? Of course not. That would be boring. And when programmers are bored they create automated solutions!

That’s the beauty of BotTalk’s YAML structures. We just wrote a simple script that ran over a list of coins and packed them into this simple slots format.

Remember that trick when you need to work with the loooooooong lists of custom slots!
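Such a generator script can be very small. Here is a minimal Python sketch of the idea. Please note that the output layout (a slots key, then the slot name, then a list of values) is an assumption for illustration only, so copy the exact structure from your own slots.yml tab; build_slots_yaml is a hypothetical helper name.

```python
import json


def build_slots_yaml(slot_name, values):
    """Render slot values as YAML text, dropping duplicates but keeping order."""
    unique_values = list(dict.fromkeys(values))
    # Assumed layout; adjust to match the structure BotTalk generated for you.
    lines = ["---", "slots:", f"  {slot_name}:"]
    for value in unique_values:
        # A JSON string is also a valid double-quoted YAML string.
        lines.append(f"    - {json.dumps(value)}")
    return "\n".join(lines) + "\n"


coins = ["Bitcoin", "Ripple", "Ethereum", "Bitcoin"]
print(build_slots_yaml("coin_name", coins))
```

Feed it a list from a file or an API and paste the result into the editor.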

Make Api Request

We’re all set for our next step. But before making an actual request, let’s just create a placeholder for this step. This way we could actually test if our intents and slots are working:

This step’s structure should be familiar to you, but there is one thing that is new. In the double curly braces there is a coin_name variable. The double curly braces notation comes from the Twig templating engine and is used to display variable values. But where does the variable itself come from?

BotTalk automatically saves the slot value into a variable with the same name. So after the user has said which coin she is interested in, the coin_name slot is filled, and you can now use this variable anywhere in your scenario.
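In case the double curly braces are new to you, here is the substitution idea reduced to a few lines of Python. Twig itself can do much more (filters, tags, functions); this sketch only handles plain variable names, and the render helper as well as the sample sentence are made up for illustration.

```python
import re


def render(template, context):
    """Replace every {{ name }} placeholder with the matching context value."""
    def replace(match):
        # Unknown variables render as an empty string in this sketch.
        return str(context.get(match.group(1), ""))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)


context = {"coin_name": "Ripple"}  # filled from the slot of the same name
print(render("You asked about {{ coin_name }}.", context))
```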

For now we’re just testing — that’s why we’re playing back the name of the coin the user mentioned.

You can go ahead and test it in the Amazon Builder or on your Alexa device yourself, but since we changed intents.yml and slots.yml we need to regenerate the model. We do that by running the Deployer once again:

After the deploy process is finished, you can see the results in Skill Builder:

It’s working! Alexa correctly understands the different utterances we provided and extracts the correct slots from them. Moreover, we see that the coin_name variable we’re using in our test response is parsed correctly.

We now can move forward!

We’ll be using this API to ask for the current price of a crypto currency:

https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD

The response is pretty straightforward:

{
  "USD": 10603.63
}

But if you look at how this API identifies the crypto currency, you’ll notice a problem.

We’re asking our users to tell us the coin name in the human-friendly format: Bitcoin, Ripple, Ethereum. The API, on the other hand, is using coins’ identifiers: BTC, XRP, ETH. So in order for us to use this API we would need to make some conversion first.

setContext Action

And that’s how it’s done:

There are a lot of things happening here, but worry not, we’ll explain everything!

Let’s start with the new action called setContext. It is essentially used to create your own custom variables. The first parameter the setContext action takes is from. It is the value of the variable you are creating, and it can contain any Twig operation.

In our case we’re using Twig’s if tag in order to map each coin_name to the corresponding coin identification. So the result of this operation will be something like BTC or XRP or ETH.

We take this result and save it under the name given in the to argument of the action. This is essentially the name of your variable. In our case we called it coin_id.
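The same name-to-identifier conversion, expressed outside of Twig, is just a lookup table. The Python sketch below mirrors the idea of “compute a value, then store it under a variable name”. The context dictionary and the set_coin_id helper are assumptions for illustration; only the three name and identifier pairs come from the text above.

```python
# Human-friendly names mapped to the identifiers the price API expects.
COIN_IDS = {"bitcoin": "BTC", "ripple": "XRP", "ethereum": "ETH"}


def set_coin_id(context):
    """Derive coin_id from coin_name and store it back in the context."""
    name = context.get("coin_name", "").strip().lower()
    # Unknown coins get an empty identifier instead of raising an error.
    context["coin_id"] = COIN_IDS.get(name, "")
    return context


print(set_coin_id({"coin_name": "Ripple"}))
```

A lookup table like this scales better than a long chain of if checks once you support more than a handful of coins.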

To test this, we then use an action you’re already familiar with, sendText, so that Alexa can say out loud that we correctly mapped the coin_name to a coin_id.

At the end we’ve put a next directive to link this step back to the Ask User For Coin Name step. It’s done for convenience, so you can test several coin names without restarting the skill every time.

Save the scenario, and let’s test it in Skill Builder:

Wow! Isn’t that wonderful! We’ve been able to convert the human-friendly coin names into their ids.

getUrl Action

Now there’s nothing that could stop us from doing that very first API request:

Let’s examine what is happening here. First, we introduce a new action called getUrl. The purpose of this action is to make a GET request to a third-party API. It has one parameter, url, which is self-explanatory. But where does the JSON from the API get stored? This is a great question! If you look at the next action, sendText, you will notice a new variable called urlResponse. This variable stores the response of the last API call that was made.

As you might recall, the JSON that comes from the API looks like this:

{
  "USD": 10603.63
}

So in order for us to get the actual dollar amount, we need to dig one level deeper into the JSON: urlResponse.USD.
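For comparison, here are the same two moves (build the request URL from the coin id, then pull the USD field out of the JSON) as a Python sketch. It deliberately makes no network call and works on the sample response shown above. The helper names are hypothetical; the URL and the fsym and tsyms parameters are the ones from the API link earlier in this section.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://min-api.cryptocompare.com/data/price"


def build_price_url(coin_id, currency="USD"):
    """Compose the price URL for one coin identifier such as BTC."""
    query = urlencode({"fsym": coin_id, "tsyms": currency})
    return f"{BASE_URL}?{query}"


def extract_price(response_body, currency="USD"):
    """Dig one level into the JSON body, like urlResponse.USD does."""
    return float(json.loads(response_body)[currency])


print(build_price_url("BTC"))
print(extract_price('{"USD": 10603.63}'))
```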

Are you ready to test if API requests are working? Head over to the developer console, start the skill, and enter a couple of coin names to get the latest prices:

Great! Seems that everything is working! Let’s take a look at our mind map once more to establish where we are:

Ok, after we’ve said the coin price, we need to ask the user if she wants to know the price of another coin. Let’s do it in the next step!

So after the price has been said, we need to send the user another text with the sendText action and then wait for her reply with the getInput action.

There are two possible answers to this question: either yes or no. We define them in the next section. If the user says yes, we link to the first step, asking her which coin she is interested in. If the answer is no, there is nothing else we can do but say goodbye, so we link to the Exit step.

One more thing before we can test this — we need to add yes and no to our intents.yml file:

And one last reminder before you go to the Amazon developer console: every time you touch the intents.yml or slots.yml files, you need to rebuild your model!

So head over to the Deploy tab, hit Deploy and wait:

All set for testing! Let’s give it a try:

That’s starting to look very nice! There is one little thing we need to add in order to complete the main path of our logic:

In the last step we need to tell the user she can open our skill anytime to get the latest price of the last coin she was interested in.

Last Step

Let’s rewrite the Exit step really quickly:

There are two things happening here. First, we use coin_name variable that we set before. Second, we’re introducing the random Twig function. It takes an array of strings and chooses one of the values at random.

It’s a very good practice to introduce this variety in your skills, since repetitiveness is the one thing that humans find annoying very quickly. And you don’t want your users to be annoyed, right?
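The idea behind random is the same in any language: keep a list of phrasings and pick one each time. Here is a Python sketch of it. The goodbye sentences and the goodbye helper are invented for this example and are not the ones from our skill.

```python
import random

# Hypothetical phrasings; each one mentions the coin the user asked about.
GOODBYES = [
    "Come back anytime for the latest {coin_name} price.",
    "See you soon! I will keep an eye on {coin_name} for you.",
    "Bye! Just open the skill again to check on {coin_name}.",
]


def goodbye(coin_name, rng=random):
    """Pick one phrasing at random and fill in the coin name."""
    return rng.choice(GOODBYES).format(coin_name=coin_name)


print(goodbye("Ripple"))
```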

Quick test before we call it a win:

Great! But as you can see, next time we open Crypto Medium skill it is starting all over again. Let’s fix this and make our skill really smart!

Remembering Things Between Skill Invocations (Sessions)

Just a quick reminder of what we want to achieve:

We want our skill to take a different route when the user answered the questions once. The second invocation of the skill should introduce a slightly easier use case. After all, you said which coin you are interested in — there is no need to repeat it once again.

Let’s take it one step at a time. First, let’s check if this is the first time the user opened the skill or not.

compareContext

In order to do this we’re using compareContext action:

So the first thing we need to do is to put some “check” steps before our initial step. What does the step above do? Well, if the user has already said which coin she’s interested in, we have already saved the coin_id for this coin.

That’s where compareContext action comes in. It takes two arguments: var and is_equal — both of them can contain any Twig expressions.

So what the code above does is it’s checking if the coin_id variable is empty. But since we can’t be sure that this variable was ever created (this could be the first run of the script), we put a default directive there. So if there is no variable called coin_id it will just be the empty string.

And in both cases we’re checking if the coin_id is empty, and if it is we’re displaying ’true’ as a value. After that we’re checking if the expression in the var parameter equals true with is_equal parameter.

There are two possible outcomes. The positive outcome will undergo another check — in the step Compare with old price. The negative outcome will mean that the user is starting a skill for the first time. Thus — we’ll just redirect her to our Initial step.
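Stripped of the Twig details, the intent of this check can be restated as a small Python sketch. It only illustrates the decision (is there a saved coin or not?) and does not reproduce the exact expression from the listing; the context dictionary and both helper functions are assumptions, while the two step names are the ones mentioned above.

```python
def coin_is_known(context):
    """True when an earlier session stored a non-empty coin_id."""
    # The empty-string default covers the very first run of the skill.
    return context.get("coin_id", "") != ""


def first_step_for(context):
    """Choose where the dialog should start for this user."""
    if coin_is_known(context):
        return "Compare with old price"
    return "Initial step"


print(first_step_for({}))
print(first_step_for({"coin_id": "XRP"}))
```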

Uff, that’s a lot of explaining for such simple logic! Let’s test if it all works:

Great, now let’s replace the placeholder text “Not the first time! Doing comparison here!” with the actual logic.

What do we need for this? We need two prices to compare: the current one and the one from the last time the user launched the skill.

So, the first thing we need to do is save the last_price every time we make an API request. We do that by introducing another setContext action:

Just to check if it’s working, let’s include that variable in our placeholder text in the Compare with old price step:

Save and give it a try:

Ok, it seems to be working! The price gets saved in the last_price variable. Moreover, it stays saved between sessions.
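BotTalk takes care of this storage for you, so there is nothing to set up. If you would like a picture of what “staying saved between sessions” means, here is a deliberately naive Python sketch that keeps the context in a JSON file. The file name and both helpers are hypothetical and say nothing about how BotTalk stores data.

```python
import json
import tempfile
from pathlib import Path


def load_context(path):
    """Read the saved variables, or start empty on the very first session."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_context(path, context):
    """Write all variables so that the next session can pick them up."""
    path.write_text(json.dumps(context), encoding="utf-8")


with tempfile.TemporaryDirectory() as folder:
    store = Path(folder) / "context.json"  # hypothetical per-user file
    session_one = load_context(store)
    session_one["last_price"] = 10603.63
    save_context(store, session_one)
    print(load_context(store))  # what a second session would see
```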

Now let’s do the last part — actual comparison of the prices:

By now you should be familiar with all the syntax from the listing above!
One nifty thing we added is in the next section. Although we’re asking the user a yes/no question, we can’t be sure that is exactly what she’s going to answer, especially when she has used the skill for some time and is familiar with what it’s capable of. For example, she might just say “How much is ripple?”. And so we’re putting our get_coin_name intent here just in case!
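The comparison itself is simple arithmetic on the two prices. As a rough Python sketch of that idea (the wording of the sentences and the describe_change helper are made up for illustration and are not taken from our skill):

```python
def describe_change(coin_name, last_price, current_price):
    """Build a spoken sentence about the price move since the last check."""
    difference = current_price - last_price
    if difference == 0:
        return f"{coin_name} is unchanged at {current_price:.2f} dollars."
    direction = "up" if difference > 0 else "down"
    return (
        f"{coin_name} is {direction} {abs(difference):.2f} dollars "
        f"since your last check, now at {current_price:.2f} dollars."
    )


print(describe_change("Bitcoin", 10000.0, 10603.63))
```

After building the sentence, remember to store the current price as the new last_price, so the next session compares against it.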

Let’s give it a try!

OK, great! Now that we’ve finished the skill it’s time to submit it to Amazon and share with friends!

Conclusion

Building Alexa Skills is fun! And we’re hoping we could give you a little taste of how easy it is to build a complex skill with BotTalk!

Join BotTalk Community on Facebook

Subscribe to our YouTube Channel

And please do give us a clap-clap =)


Written by SmartHouse Technologies