Excited about synthetic data? Maybe you’ve read my article about how design fits into the machine learning process, and the power of synthetic data. When you don’t have access to a clean public dataset, generating synthetic data can be a powerful way to bootstrap your models. And synthetic data has its advantages. You spend much less time on data cleaning, and you can reduce bias during data generation by including a balanced number of examples. Sketch also assigns image and text values at random, which helps avoid bias.
Want to create some synthetic data of your own? This tutorial shows you how to create synthetic data to train a classifier that can answer that age-old question, “what is good design?” by predicting whether a design is good or bad. You can then extend these techniques to any UI component — for example, training a model to classify component types, or classify components by state. The possibilities are endless.
As an example, let’s generate a sample dataset of product tiles.
This tutorial relies on Sketch from Bohemian Coding; I recommend installing the latest version (≥63), along with the two plugins used later in this tutorial to make the process more efficient: RenameIt and Craft.
Add Sketch Data
Now we need to collect the input data we’ll use to create our synthetic data. We’re creating product tiles, so we need product names, prices, and product images. To quote the Sketch documentation, “To create your own text Data source, create a plain text (.txt) file with each data value on a new line. For a new image Data source … create a folder with all the different images you want to use inside and add it via the Data tab in Preferences.”
To start, create a folder next to your Sketch file called Project Data. Create two plain-text files in TextEdit, named product_names.txt and product_prices.txt. Add ~25 entries in each, and save them in the Project Data folder.
To avoid introducing bias, make sure the values you create express a full range of data. For example, if all of your prices are under $100, or all in one currency, your model won’t perform well when it faces data outside those limits.
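If you'd rather not type 25 prices by hand, here is a minimal sketch of one way to generate them. The currency symbols, price magnitudes, and function names are all my own assumptions for illustration, so swap in whatever fits your products. It writes one value per line, which is the layout the Sketch documentation describes for a text data source.

```python
import random
from pathlib import Path

# Hypothetical currency symbols and price magnitudes; adjust to your own catalog.
CURRENCIES = ["$", "€", "£"]
MAGNITUDES = [1, 10, 100, 1_000, 10_000]


def make_prices(count=25, seed=0):
    """Return `count` price strings that cycle through every currency and magnitude."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    prices = []
    for i in range(count):
        symbol = CURRENCIES[i % len(CURRENCIES)]
        low = MAGNITUDES[i % len(MAGNITUDES)]
        amount = rng.uniform(low, low * 10)
        prices.append(f"{symbol}{amount:,.2f}")
    return prices


def write_data_source(path, values):
    """Write one value per line to a plain-text file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(values), encoding="utf-8")


if __name__ == "__main__":
    write_data_source("Project Data/product_prices.txt", make_prices())
```

Cycling through the lists, rather than sampling them, guarantees that every currency and every order of magnitude shows up at least once in a 25-line file.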
Next, add 25 product images to a folder called product_images inside the Project Data folder.
To add these files as Sketch data sources, open Preferences and click the Data tab, then the Add Data button. Select the two text files and the folder. Click Done, and voilà!
Create “Good” Tiles
Down to business. In your Sketch file, create a variety of product tiles. Each one should use the same data, but vary in design, while still adhering to your definition of good design. If you have a well-defined brand standard, stick to its rules. Don’t worry, you’ll get an opportunity to break the rules soon.
We aren’t using nested symbols here, but I recommend them if you want to incorporate sublayouts, such as inline list price and original prices. Don’t use an artboard for each product tile — it’s easier if they are regular groups. As you build your tiles, be consistent in naming product image, product name, and price layers. Layer names become labels in the synthetic data you’ll use to train your model.
Let’s start by drawing a single tile. Remember to bind each text layer to Sketch data — when you add a text element for the name, use the Data menu to fill the layer with a product_name from your text file.
To create an image, draw a rectangle, then use the Data menu to select a product_image data source. This fills the background with an image, and crops it to fit the rectangle — making it easier to handle images of different sizes. Make sure to group your layers.
We want to use Sketch’s resizing and smart layout properties to ensure that each tile scales gracefully for every text value in your datasets. Use these properties with any nested symbols as well. Carefully consider edge cases — any bugs will reduce the accuracy of your model.
To create each new tile, duplicate this first tile — your layers will be named correctly and your data attached properly. Make as many tiles as you can; 50 is a solid start.
Now it’s time to organize and name your tiles. Select all tiles, select Align Left, then Tidy. Set vertical spacing to 50px. This should give you a neat single column of tiles. Reselect all tiles and select RenameIt, then enter Product%NNNN to give each of your tiles a unique name. Add enough Ns to ensure a unique name for each tile.
Create “Bad” Tiles
The data to train your classifier should have as many bad tiles as good ones. These are product tile designs that don’t meet your design standards. Got well-defined brand standards for fonts and colors? This is your opportunity to break the rules.
Start by selecting all the good tiles. Duplicate them into a new column below the original one, far enough away that the divide is clear.
With all the good tiles selected, click RenameIt and enter %*-Good. This appends -Good to your original names. Do the same for the bad tiles, this time with %*-Bad.
Now comes the fun part. For each bad tile, break the rules in a new way. Make the text overlap, or mess up the alignment. Get creative! As you do this, keep track of how you “break” each tile, and add that data to its name — for example, Product0001-Bad-misaligned.
As with the good tiles, do your best to support the range of data in your text files. If you overlap text, for example, make sure it will still overlap even if a shorter value is used. All the bad tiles must be bad!
Time for your hard work to pay off! Select all your tiles, then in the Craft plugin, select Duplicate. Since we’ve been organizing into a single column, use the horizontal checkbox and set the number to match your number of images and product names.
Next, delete the layers and groups that Craft added. Select the whole matrix of tiles you’ve made, and click Ungroup twice. Then delete the Duplicate Control layer.
Now select all the tiles again. In the Data menu, select Refresh Data.
Magic! With 50 good and 50 bad tiles, each duplicated 25 times, you should now have a massive wall of 2,500 unique product tiles.
Ok, let’s save these tiles so you can start training models.
First, let’s use RenameIt to give each tile a unique name, so it saves properly. Select all tiles, click RenameIt, and enter %NNNNNN-%*. This prepends a unique ID to each tile, while preserving the template name and other labels.
In the properties panel, mark all tiles as exportable. In Sketch, you can choose export resolution; here I recommend 1x.
Click Export Selected and watch the data flow.
Later, when processing these images, you can parse the final file name, using “-” as a delimiter to get the unique ID, template ID, quality label, and reasons why a tile is bad.
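As a rough sketch of that parsing step, the function below splits an exported file name into its parts. It assumes the names follow the pattern built up in this tutorial (for example, 000001-Product0001-Bad-misaligned.png) and that you exported at 1x, so there is no scale suffix. The `TileLabel` and `parse_tile_name` names are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class TileLabel(NamedTuple):
    unique_id: str    # prefix added by the final rename, e.g. "000001"
    template_id: str  # original tile name, e.g. "Product0001"
    quality: str      # "Good" or "Bad"
    reasons: tuple    # e.g. ("misaligned",); empty for good tiles


def parse_tile_name(filename):
    """Split an exported tile file name on "-" into its label fields."""
    stem = Path(filename).stem  # drop the folder and the .png extension
    parts = stem.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected tile name: {filename!r}")
    unique_id, template_id, quality, *reasons = parts
    return TileLabel(unique_id, template_id, quality, tuple(reasons))


if __name__ == "__main__":
    print(parse_tile_name("000001-Product0001-Bad-misaligned.png"))
```

Because "-" is the delimiter, a reason that itself contains a hyphen would be split into two reasons, so keep the reason labels hyphen-free when you name your bad tiles.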
In machine learning, images are great, but structured data is super useful. Thankfully, Sketch files are all JSON under the hood. The trick is to unzip the file. (If you’re using any symbols in your Sketch file, now is the time to detach them. The simplest way is to duplicate your file and delete the Symbols page to flatten the whole file.)
On a Mac, pop open your Terminal. Type unzip and a space. Drag your flattened Sketch file onto the terminal window, and press Enter. You should see the contents of your Sketch file in the same folder you’ve been working in.
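If you're not on a Mac, or you want this step inside a script, Python's standard `zipfile` module can do the same job. This is a minimal sketch. The `unpack_sketch` name is hypothetical, and it assumes the page files sit in a folder named pages inside the archive.

```python
import zipfile
from pathlib import Path


def unpack_sketch(sketch_path, out_dir=None):
    """Extract a .sketch file (a zip archive) and return the page JSON paths."""
    sketch_path = Path(sketch_path)
    # Default to a sibling folder named after the file: tiles.sketch -> tiles/
    out_dir = Path(out_dir) if out_dir else sketch_path.with_suffix("")
    with zipfile.ZipFile(sketch_path) as archive:
        archive.extractall(out_dir)
    # Assumption: page documents live in a "pages" folder inside the archive.
    return sorted((out_dir / "pages").glob("*.json"))
```

Calling `unpack_sketch("tiles.sketch")` on a hypothetical tiles.sketch file would extract everything into a tiles folder next to it and hand back the list of page files.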
Inside the new Pages folder, you should see a .json file with a long name, which contains all the JSON describing your tiles in a property called layers — including layer names showing which properties belong to which layers. (For more on file schemas, see the Sketch documentation.)
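To turn that JSON into rows you can train on, one approach is to walk the nested layers property and record each layer's name and position. The sketch below assumes each layer carries `name`, `_class`, and `frame` keys; check them against your own file and the Sketch file format documentation. The function names are hypothetical.

```python
import json


def load_page(path):
    """Read one page .json file from the unzipped Sketch document."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_layers(layer, path=()):
    """Yield (path, layer) for every layer nested under `layer`, depth first."""
    for child in layer.get("layers", []):
        child_path = path + (child.get("name", ""),)
        yield child_path, child
        yield from iter_layers(child, child_path)


def tile_rows(page):
    """Flatten a page into one row per layer, keyed by its top-level tile name."""
    rows = []
    for path, layer in iter_layers(page):
        frame = layer.get("frame", {})
        rows.append({
            "tile": path[0],              # top-level group, e.g. the tile name
            "layer": "/".join(path[1:]),  # empty string for the tile group itself
            "class": layer.get("_class"),
            "x": frame.get("x"),
            "y": frame.get("y"),
            "width": frame.get("width"),
            "height": frame.get("height"),
        })
    return rows
```

Each tile contributes one row for its group plus one row per child layer, so the consistent layer names you chose earlier become the labels in the "layer" column.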
And that’s it. You’re now the proud owner of several thousand rows of synthetic training data. As you can see, it’s relatively easy to add more rows or more data — so grow your dataset to your heart’s content.