How to Generate Synthetic Data with Sketch

Owen Schoppe
Aug 18 · 7 min read

Excited about synthetic data? Maybe you’ve read my article about how design fits into the machine learning process, and the power of synthetic data. When you don’t have access to a clean public dataset, generating synthetic data can be a powerful way to bootstrap your models. And synthetic data has its advantages. You spend much less time on data cleaning and can reduce bias during data generation by including a balanced number of examples. And Sketch randomly distributes images and text values to avoid bias.

Want to create some synthetic data of your own? This tutorial shows you how to create synthetic data to train a classifier that can answer that age-old question, “what is good design?” by predicting whether a design is good or bad. You can then extend these techniques to any UI component — for example, training a model to classify component types, or classify components by state. The possibilities are endless.

As an example, let’s generate a sample dataset of product tiles.

Example of good and bad tiles

Setup

This tutorial relies on Sketch from Bohemian Coding; I recommend installing the latest version (≥63), along with the two plugins used later in this tutorial, Craft and RenameIt, which make the process more efficient.

Add Sketch Data

Now we need to collect the input data we’ll use to create our synthetic data. We’re creating product tiles, so we need product names, prices, and product images. To quote the Sketch documentation, “To create your own text Data source, create a plain text (.txt) file with each data value on a new line. For a new image Data source … create a folder with all the different images you want to use inside and add it via the Data tab in Preferences.”

To start, create a folder next to your Sketch file called Project Data. Create two plain-text files in TextEdit, named product_names.txt and product_prices.txt. Add ~25 entries in each, and save them in the Project Data folder.

To avoid introducing bias, make sure the values you create express a full range of data. For example, if all of your prices are under $100, or all in one currency, your model won’t perform well when it faces data outside those limits.
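If typing 25 entries by hand feels tedious, a short script can write both files. The sketch below uses Python, which is my choice for illustration and not something Sketch requires. The word lists, currency symbols, and price ranges are hypothetical placeholders; swap in values that match your own products. It writes one value per line, as the Sketch documentation quoted above describes, and deliberately spreads prices across several magnitudes and currencies.

```python
import random
from pathlib import Path

# Hypothetical placeholder values; replace with your own product data.
ADJECTIVES = ["Classic", "Ultra-Light", "Waterproof", "Organic", "Limited Edition"]
NOUNS = ["Backpack", "Sneakers", "Water Bottle", "Desk Lamp", "Rain Jacket"]
CURRENCIES = ["$", "€", "£"]


def make_names(count, rng):
    """Return `count` distinct product names of varying length."""
    combos = [f"{a} {n}" for a in ADJECTIVES for n in NOUNS]
    # rng.sample raises ValueError if count exceeds the 25 available combos.
    return rng.sample(combos, count)


def make_prices(count, rng):
    """Return `count` price strings spanning several magnitudes and currencies."""
    prices = []
    for i in range(count):
        magnitude = 10 ** (i % 4)  # cycles through 1, 10, 100, 1000
        amount = rng.uniform(1, 10) * magnitude
        prices.append(f"{rng.choice(CURRENCIES)}{amount:,.2f}")
    return prices


def write_data_files(folder, count=25, seed=0):
    """Write product_names.txt and product_prices.txt, one value per line."""
    rng = random.Random(seed)
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "product_names.txt").write_text(
        "\n".join(make_names(count, rng)) + "\n", encoding="utf-8")
    (folder / "product_prices.txt").write_text(
        "\n".join(make_prices(count, rng)) + "\n", encoding="utf-8")
```

Calling `write_data_files("Project Data")` from the folder that holds your Sketch file would create both text files in place. A fixed seed keeps the output repeatable, which helps if you need to regenerate the dataset later.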

Next, add 25 product images to a folder called product_images inside the Project Data folder.

To add these files as Sketch data sources, open Preferences and click the Data tab, then the Add Data button. Select the two text files and the folder. Click Done, and voilà!

Adding Sketch data sources

Create “Good” Tiles

Down to business. In your Sketch file, create a variety of product tiles. Each one should use the same data, but vary in design, while still adhering to your definition of good design. If you have a well-defined brand standard, stick to its rules. Don’t worry, you’ll get an opportunity to break the rules soon.

We aren’t using nested symbols here, but I recommend them if you want to incorporate sublayouts, such as inline list price and original prices. Don’t use an artboard for each product tile — it’s easier if they are regular groups. As you build your tiles, be consistent in naming product image, product name, and price layers. Layer names become labels in the synthetic data you’ll use to train your model.

Basic product tile with layer names

Let’s start by drawing a single tile. Remember to bind each text layer to Sketch data — when you add a text element for the name, use the Data menu to fill the layer with a product_name from your text file.

Sketch Data menu

To create an image, draw a rectangle, then use the Data menu to select a product_image data source. This fills the background with an image, and crops it to fit the rectangle — making it easier to handle images of different sizes. Make sure to group your layers.

We want to use Sketch’s resizing and smart layout properties to ensure that each tile scales gracefully for every text value in your datasets. Use these properties with any nested symbols as well. Carefully consider edge cases — any bugs will reduce the accuracy of your model.

To create each new tile, duplicate this first tile — your layers will be named correctly and your data attached properly. Make as many tiles as you can; 50 is a solid start.

Giving tiles unique names

Now it’s time to organize and name your tiles. Select all tiles, select Align Left, then Tidy. Set vertical spacing to 50px. This should give you a neat single column of tiles. Reselect all tiles and select RenameIt, then enter Product%NNNN to give each of your tiles a unique name. Add enough Ns to ensure a unique name for each tile.

Layer names

Create “Bad” Tiles

The data to train your classifier should have as many bad tiles as good ones. These are product tile designs that don’t meet your design standards. Got well-defined brand standards for fonts and colors? This is your opportunity to break the rules.

Start by selecting all the good tiles. Duplicate them into a new column below the original one, far enough away that the divide is clear.

With all the good tiles selected, click RenameIt and enter %*-Good. This appends -Good to your original names. Do the same for the bad tiles, this time with %*-Bad.
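To preview what a rename pattern will produce before you apply it, you can mimic the two tokens used in this tutorial in a few lines of Python. This is only an illustration of the naming outcome, not RenameIt's actual implementation; the function name `apply_pattern` is hypothetical, and starting the sequence at 1 is an assumption.

```python
import re


def apply_pattern(pattern, names, start=1):
    """Mimic the two rename tokens used in this tutorial.

    %N, %NN, ... -> zero-padded sequence number (one digit per N)
    %*           -> the layer's current name
    `start=1` is an assumption about where numbering begins.
    """
    renamed = []
    for index, name in enumerate(names, start=start):
        def number(match):
            width = len(match.group(1))
            return str(index).zfill(width)

        # Replace the sequence token first so the inserted name is left as-is.
        result = re.sub(r"%(N+)", number, pattern)
        result = result.replace("%*", name)
        renamed.append(result)
    return renamed


tiles = apply_pattern("Product%NNNN", ["Group", "Group copy"])
good = apply_pattern("%*-Good", tiles)
bad = apply_pattern("%*-Bad", tiles)
print(good)  # ['Product0001-Good', 'Product0002-Good']
print(bad)   # ['Product0001-Bad', 'Product0002-Bad']
```

Because the label is appended with a hyphen, each part of the name stays easy to split apart when you process the exported files.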

Tagging bad tiles

Now comes the fun part. For each bad tile, break the rules in a new way. Make the text overlap, or mess up the alignment. Get creative! As you do this, keep track of how you “break” each tile, and add that data to its name — for example, Product0001-Bad-misaligned.

Stacked columns and layer names

As with the good tiles, do your best to support the range of data in your text files. If you overlap text, for example, make sure it will still overlap even if a shorter value is used. All the bad tiles must be bad!

Duplicate

Time for your hard work to pay off! Select all your tiles, then in the Craft plugin, select Duplicate. Since we’ve been organizing into a single column, use the horizontal checkbox and set the number to match your number of images and product names.

Next, delete the layers and groups that Craft added. Select the whole matrix of tiles you’ve made, and click Ungroup twice. Then delete the Duplicate Control layer.

Duplicate menu

Now select all the tiles again. In the Data menu, select Refresh Data.

Magic! You should now have a massive wall of 2500 unique product tiles.
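The 2500 figure comes from simple multiplication, assuming you built the suggested 50 good tiles, an equal number of bad ones, and 25 data entries. A quick Python check (the function name is hypothetical) lets you confirm the count for your own numbers:

```python
def expected_tile_count(good_templates, bad_templates, data_rows):
    """Each template tile is repeated once per data row."""
    return (good_templates + bad_templates) * data_rows


total = expected_tile_count(good_templates=50, bad_templates=50, data_rows=25)
print(total)  # 2500
```

If the number of tiles on your canvas differs from this result, a group or control layer was probably left behind in the previous step.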

Sketch Data menu — Refresh Data

Export Images

Ok, let’s save these tiles so you can start training models.

First, let’s use RenameIt to give each tile a unique name, so it saves properly. Select all tiles, click RenameIt, and enter %NNNNNN-%*. This prepends a unique ID to each tile, while preserving the template name and other labels.

In the properties panel, mark all tiles as exportable. In Sketch, you can choose export resolution; here I recommend 1x.

Click Export Selected and watch the data flow.

Later, when processing these images, you can parse the final file name, using “-” as a delimiter to get the unique ID, template ID, quality label, and reasons why a tile is bad.
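Here is a minimal sketch of that parsing step in Python. It assumes the naming scheme built up in this tutorial (unique ID, template name, Good or Bad, then zero or more reasons) and that no individual label contains a hyphen. The names `TileLabel` and `parse_tile_name` are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class TileLabel(NamedTuple):
    unique_id: str
    template_id: str
    quality: str   # "Good" or "Bad"
    reasons: list  # empty for good tiles


def parse_tile_name(filename):
    """Split an exported file name on "-" into its labels."""
    stem = Path(filename).stem  # drops the directory and the extension
    parts = stem.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected tile name: {filename!r}")
    unique_id, template_id, quality, *reasons = parts
    return TileLabel(unique_id, template_id, quality, reasons)


label = parse_tile_name("000123-Product0007-Bad-misaligned.png")
print(label.quality, label.reasons)  # Bad ['misaligned']
```

Raising an error on names with fewer than three parts surfaces tiles that missed a rename step, instead of silently mislabeling them.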

JSON Data

In machine learning, images are great, but structured data is super useful. Thankfully, Sketch files are all JSON under the hood. The trick is to unzip the file. (If you’re using any symbols in your Sketch file, now is the time to detach them. The simplest way is to duplicate your file and delete the Symbols page to flatten the whole file.)

On a Mac, pop open your Terminal. Type unzip and a space. Drag your flattened Sketch file onto the terminal window, and press Enter. You should see the contents of your Sketch file in the same folder you’ve been working in.
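If you would rather script this step, or you are not on a Mac, Python's standard `zipfile` module can do the same extraction. This is a sketch under the assumption that your flattened file opens as an ordinary zip archive, as described above; the function name and the file name in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_sketch(sketch_path, dest_dir):
    """Extract a .sketch file (a zip archive) and return the JSON files found."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(sketch_path) as archive:
        archive.extractall(dest)
    # Search recursively so JSON files inside subfolders are included.
    return sorted(dest.rglob("*.json"))
```

For example, `extract_sketch("tiles_flat.sketch", "tiles_flat")` would unpack the archive into a `tiles_flat` folder and list every JSON file inside it.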

Inside the new Pages folder, you should see a .json file with a long name, which contains all the JSON describing your tiles in a property called layers — including layer names showing which properties belong to which layers. (For more on file schemas, see the Sketch documentation.)
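To turn that JSON into rows you can train on, walk the `layers` property recursively. The Python sketch below assumes each top-level layer on the page is one tile group, and that each layer carries a `name` and a `frame` object with its position and size; treat the `frame` key as an assumption and check it against your own file and the Sketch documentation. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_page(path):
    """Read one page JSON file from the unzipped Sketch file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def collect_layers(node, path=()):
    """Yield (path_of_names, layer_dict) for every layer nested under `node`."""
    for layer in node.get("layers", []):
        layer_path = path + (layer.get("name", ""),)
        yield layer_path, layer
        yield from collect_layers(layer, layer_path)


def tile_rows(page):
    """Build one row per top-level tile, keyed by child layer name."""
    rows = []
    for tile in page.get("layers", []):
        children = {
            "/".join(names): layer.get("frame", {})
            for names, layer in collect_layers(tile)
        }
        rows.append({"tile": tile.get("name"), "layers": children})
    return rows
```

Because the tile name carries the quality label and the child layer names are consistent across tiles, each row pairs a label with the geometry of the product image, product name, and price layers.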

File system with JSON data

And that’s it. You’re now the proud owner of several thousand rows of synthetic training data. As you can see, it’s relatively easy to add more rows or more data — so grow your dataset to your heart’s content.

Happy modeling!

Thanks

Sönke Rohde, Jessica Lundin, Michael Sollami, Alan Ross, Brian Lonsdorf, David Woodward

Follow us at @SalesforceUX.

Want to work with us? Contact us at uxcareers@salesforce.com.

Check out the Salesforce Lightning Design System
