Human-as-a-service

How to make human/machine integration reliable with JSON files

Innovating means trying out lots of ideas, and many of them will fail. We therefore like to test our ideas before we spend too much time implementing them properly. This often requires blending manual labor into automated processes. Here is how we make this human/machine integration reliable.

Human/machine API

Right at the start of a project, we define an “API” between the humans and the machine. This is typically a simple JSON file with sample data. With that in place, the developers can start to write code against the “API” while the humans are working on filling out the file with real content.
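As a minimal sketch of this workflow (the sample content here is illustrative, taken from the format discussed below), developers might start coding against the sample data right away:

```python
import json

# Sample content in the agreed format; the humans will later replace it
# with real content, which requires no code changes as long as the
# format stays stable.
sample = """
{
  "groups": [
    { "id": "abc", "title": "New products" }
  ]
}
"""

data = json.loads(sample)

# The consuming code relies only on the agreed structure, not the content.
for group in data["groups"]:
    print(group["id"], "->", group["title"])
```

In practice the sample would live in a checked-in file that the humans overwrite with real content once it is ready.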

We try to avoid changing the format of the JSON file before the end of the project. When pressed, though, we prefer to decide on the API quickly (within one hour at most) rather than losing too much time over-engineering it.

Tips & tricks

We have learned a couple of lessons about how to structure JSON files that are meant to be filled in by humans. Let’s demonstrate them with an example.

For machine-to-machine communication this could be an efficient data exchange format:

[
  {
    "id": "abc",
    "ts": 1318250880000
  },
  ...
]

The developers of the producing and consuming applications would look up the meaning of the various fields, e.g. in a JSON schema. We found that this lookup step doesn’t work well when the JSON file is filled in by a human, especially for non-developers, who are often unfamiliar with JSON schemas.

Instead, we would store the same content in a more human-friendly format:

{
  "groups": [
    {
      "id": "abc",
      "title": "New products",
      "createdAt": "2011-10-10T14:48:00Z",
      "comment": "Has been deleted from log tracking system"
    }
  ]
}

To highlight the changes:

  1. Add an intermediate field to clarify array content: By adding the root field groups, it’s clear to the reader that the file contains a list of groups.
  2. Denormalize to avoid lookups: The group title isn’t required by the consuming application (which could look it up based on the group id), but it helps humans scan through the file and find entries again.
  3. Clearly name fields: createdAt is more specific than timestamp and certainly easier to understand than the abbreviation ts.
  4. Use human-readable formats: The ISO timestamp 2011-10-10T14:48:00Z is far easier for a human to understand than 1318250880000, the number of milliseconds since 1970.
  5. Add a comment field for the creator: The field comment has no value for the consuming application, but it allows the human creator to store notes in place.
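The human-friendly format doesn’t cost the consuming application anything: it can convert the readable values back into machine-friendly ones on load. As an illustrative sketch (not part of the original pipeline), here is one way to convert between the ISO timestamp and the epoch-milliseconds form:

```python
from datetime import datetime, timezone

def to_human(ts_millis):
    """Convert an epoch-milliseconds value into an ISO-8601 UTC string."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

def to_machine(iso_string):
    """Convert an ISO-8601 UTC string back into epoch milliseconds."""
    dt = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    return int(dt.timestamp() * 1000)

# The two representations round-trip without loss.
iso = to_human(1318250880000)
print(iso, "->", to_machine(iso))
```

This keeps the file readable for the humans while the machine still gets exact values.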

Case study: Manual input from Data Science

For us, manual input often comes from Data Science. For example, our data folks recently found a way to classify whether a posted idea is well structured. We concluded that we could use this to send users an email pointing out their “bad ideas” and giving concrete advice on how to improve them.

At this point, this is just a feature idea. We don’t know whether users will appreciate such an email and whether they would really act on it. So we want to test the idea cheaply.

For the test, the Data Science team would manually analyze the data for a couple of test groups. Afterwards, we wanted to inject this manually sourced data into our automated email sending process.

The Data Science team agreed with the email developers on the following sample JSON file:

{
  "groups": [
    {
      "id": "123",
      "title": "My test group",
      "members": [
        "joe@example.com", "sue@example.com"
      ],
      "bad_ideas": [
        {
          "content": "Make things better",
          "reason": "Too generic",
          "improvement": "Ask yourself which steps you would take",
          "created_at": "2011-10-10T14:48:00Z"
        }
      ]
    }
  ]
}

Based on the sample data, the developers started to adjust the email sending process. In parallel, and fully decoupled, the data scientists did their magic to detect bad ideas for the test.

Once both teams were ready, we could easily connect the two sub-solutions and send out our test emails.
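To illustrate the consuming side, here is a small sketch of how the email process might turn the manually sourced file into per-member emails. The email assembly is hypothetical and actual sending is left out:

```python
import json

# Manually sourced data in the agreed format (abridged from the sample above).
sample = """
{
  "groups": [
    {
      "id": "123",
      "title": "My test group",
      "members": ["joe@example.com", "sue@example.com"],
      "bad_ideas": [
        {
          "content": "Make things better",
          "reason": "Too generic",
          "improvement": "Ask yourself which steps you would take"
        }
      ]
    }
  ]
}
"""

data = json.loads(sample)

# Build one email per group member, listing the group's bad ideas
# together with the improvement advice from Data Science.
emails = []
for group in data["groups"]:
    body = "\n".join(
        f"- {idea['content']} ({idea['reason']}). Tip: {idea['improvement']}"
        for idea in group["bad_ideas"]
    )
    for member in group["members"]:
        emails.append({"to": member, "body": body})

print(len(emails), "emails prepared")
```

Because the consuming code only depends on the agreed format, it doesn’t care whether the file was produced by a script or by a data scientist filling it in by hand.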

Happy coding!

Want to learn more about coding? Have a look at our other articles.