<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Shyp Engineering - Medium]]></title>
        <description><![CDATA[All things engineering at Shyp - Medium]]></description>
        <link>https://medium.com/shyp-engineering?source=rss----6513e7cf74b---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Shyp Engineering - Medium</title>
            <link>https://medium.com/shyp-engineering?source=rss----6513e7cf74b---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 11 May 2026 16:49:37 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/shyp-engineering" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[A Unique Journey in Search of Keys]]></title>
            <link>https://medium.com/shyp-engineering/a-unique-journey-in-search-of-keys-3bb250471104?source=rss----6513e7cf74b---4</link>
            <guid isPermaLink="false">https://medium.com/p/3bb250471104</guid>
            <category><![CDATA[postgresql]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Derek Barnes]]></dc:creator>
            <pubDate>Wed, 24 Aug 2016 18:17:45 GMT</pubDate>
            <atom:updated>2016-08-24T18:17:45.031Z</atom:updated>
            <content:encoded><![CDATA[<h3>Introduction</h3><p>At Shyp we’ve begun an extensive maintenance project on our primary API’s database: changing its tables’ primary keys from a prefixed UUID stored as text to the database’s native uuid type, and adding the prefix in the application layer. This blog post gives some general background on primary keys, outlines why we made this change, describes how we prepared our application for it, and explains how we’ve gone about migrating the existing data.</p><h3>Primary Keys and UUIDs</h3><p>Relational databases store records as rows in tables. Each table has a strict definition of the columns that make up each row, along with their types. For example, you might have a contacts table defined as:</p><pre>first: text<br>last: text<br>birthday: date</pre><p>Then, you might have a table that looks like:</p><pre>| first   | last     | birthday   |<br>|---------|----------|------------|<br>| Lindsey | Homer    | 1984-06-24 |<br>| Melvin  | Anderson | 1987-01-23 |<br>| Jessie  | Bell     | 1957-09-16 |</pre><p>In this case, the table does not have a primary key — the only way to query it is based on the information it contains. Sometimes this is appropriate, but often this becomes difficult when other information needs to be associated with a record. For example, we might want to have a notes table where we could add a note for each contact, but there’s no way to clearly associate a particular contact with a note, as it’s possible that two contacts can share the same name and birthday. So we need a way of uniquely identifying the record with what’s called a “key”. Some database designs use another attribute about a record, such as a person’s Social Security Number, in order to provide a way of looking up the record; this is called a “natural key”. 
However, this can be problematic when we can’t guarantee that the value of the natural key is truly unique, and so we need to use something else in order to identify records.</p><p>Since relying on intrinsic properties of a dataset is problematic for establishing the identity of records in the set, databases are often designed using synthetic primary keys. These keys are guaranteed to be unique by the database system, and are used for the sole purpose of looking up a record. The most common approach is to use a sequential integer. In our contacts example, this might look something like:</p><pre>| id | first   | last     | birthday   |<br>|----|---------|----------|------------|<br>| 1  | Lindsey | Homer    | 1984-06-24 |<br>| 2  | Melvin  | Anderson | 1987-01-23 |<br>| 3  | Jessie  | Bell     | 1957-09-16 |</pre><p>When the responsibility of generating the primary keys is entirely in the domain of the database system (as is often the case), this works well. In a distributed system, however, you might want to have an identifier for an entity without relying on the database to generate that for you. In this case, you can take a randomized value and format it in a consistent way; one category of these identifiers is known as UUIDs: 128-bit values, usually formatted as a hyphen-separated hexadecimal string. For example, “5c3ad9ab-a0fe-474c-87a2-38fc818d2b03” is a UUID. There are several variants of UUIDs, but they all have the same format, and the variants have to do with how the values are generated. V4 UUIDs are generated almost entirely from random data, and are what Shyp uses for primary keys.</p><p>In addition, at Shyp we prefix the id with a short identifier. Although this is a less common practice, it’s not a quirk that’s unique to us (Stripe does this too). For example, a pickup is prefixed with “pik_”, and a shipment is prefixed with “shp_”. 
This is useful from an application and API perspective: our operations apps can scan a QR code containing the id of a number of things (a shipment, a container, a warehouse worker’s badge) and route the UI appropriately. It’s also useful for debugging, as we can readily know the kind of data we’re dealing with just from its identifier.</p><p>These ids have historically been stored as text in our PostgreSQL database. While this works and has no impact on lookup speed, it’s an inefficient use of space. This is especially important with primary keys, whose indexes should fit in memory. And while disk space is cheap these days, RAM is comparatively expensive, and as we scale we’d like to keep these costs under control. There is also no real reason to have the database store these values as text, since PostgreSQL supports UUIDs as a first-class data type, and at the database level, you always know <em>what</em> you’re querying for. The two representations differ considerably in size. A text representation of a uuid (with hyphens) is 36 bytes (or 288 bits) — more than double the 16 bytes a UUID actually requires. An index for these values reflects this difference as well, as we will later see.</p><p>If there’s such a big difference, why did these get stored as text in the first place? Shyp’s API was first built on Sails, and Sails’s ORM, Waterline, does not have built-in support for handling UUIDs — it simply treats them as text. We maintain our own fork of Waterline, from which we’ve stripped a lot of features, and we’ve removed all other parts of Sails from our API. 
In any case, the first step in this endeavor was to prepare our API and Waterline to handle UUIDs.</p><h3>Planning</h3><p>Changing the type in the database is fairly easy in some sense — it’s a single statement (the USING clause strips the prefix and casts the remainder):</p><pre>ALTER TABLE users ALTER COLUMN id TYPE uuid USING regexp_replace(id, &#39;\w+_&#39;, &#39;&#39;)::uuid;</pre><p>This sort of schema change could cause problems in a production environment, particularly when the table is so large that running the query will take more than a second or two, but it’s a good start in terms of getting the application code prepared for using the prefix-less ids.</p><p>In integrating with the existing codebase, we needed to challenge our assumptions around prefixes. For example, in my initial exploration I added a migration for one of our tables, ran the unit tests, and everything passed (great, ship it!). Since I knew that the prefixes weren’t being added, I had to explicitly encode the assumptions around our prefixing behavior.</p><p>First, I wrote an integration test that attempted to create a record, asserted that the newly created record was serialized by the application with a prefix, and finally, that the record could be found by querying with the prefixed id. This ensures that the boundary around having prefixes or not is flexible; ideally we’d go with one or the other, but we have a living codebase with many teams, and it’s impossible to change everything overnight while continuing to deliver new features.</p><p>In other words, not only did we want to handle UUIDs, but we also wanted our models to accept prefixed UUIDs and simply ignore the prefix when running queries.</p><p>In order to handle this, we <a href="https://github.com/Shyp/waterline/pull/32">added some UUID type coercion functionality</a> to our fork of Waterline. 
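The gist of that coercion, sketched here in plain JavaScript (this is not the actual Waterline patch, and the function names are invented), is to strip any `\w+_` prefix, validate what remains, and re-add the prefix when serializing:

```javascript
const UUID_RE = /^[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}$/i;

// Strip an optional "usr_"-style prefix, leaving a bare UUID for the query.
function coerceUUID(value) {
  const bare = value.replace(/^\w+_/, '');
  if (!UUID_RE.test(bare)) {
    throw new TypeError('not a valid UUID: ' + value);
  }
  return bare;
}

// Re-add the prefix on the way out, if it is not already present.
function serializeId(prefix, id) {
  return id.startsWith(prefix + '_') ? id : prefix + '_' + id;
}
```
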
This was pretty straightforward, as Waterline already has built-in type coercion functionality; that is, if you give a model a string for an integer primary key, it will attempt to coerce it to an actual integer.</p><p>This results in the following Waterline query</p><pre>User.findOne(&#39;usr_abc123&#39;) // ...</pre><p>only using the UUID in issuing the query</p><pre>SELECT * FROM users WHERE id = &#39;abc123&#39;;</pre><p>instead of using the prefix</p><pre>SELECT * FROM users WHERE id = &#39;usr_abc123&#39;;</pre><p>(The latter query would throw an error in Postgres, because the value is not a valid uuid).</p><p>Next, we needed to add the prefix in the models. This is relatively straightforward; all models in our system have “toJSON()” invoked when being serialized, so we simply override the id serialization there, essentially adding the prefix if it’s not present on the id.</p><p>As a result of all this, we’ve got our models covered on both ends, compatible with our prefixed UUIDs internally, and our integration test passes. One nice quality of this approach in the model layer is that our code will be compatible with the schema as we migrate.</p><h3>Data Migration</h3><p>Now that our codebase will work with the prefix-less UUIDs, we need to migrate the existing data. This is a little tricky, because while it affects an entire table, it needs to happen without interrupting service. 
It also needs to be easily reversed in case something goes wrong.</p><p>To do this, we can break the migration into 5 steps:</p><ol><li>Create a new column that will be the eventual new id column.</li><li>Fill any new records with the prefix-less id via a trigger on the table.</li><li>Backfill this column with the prefix-less data.</li><li>Create an index (concurrently) on that column.</li><li>Swap the new column in, and keep the old one (in a transaction).</li></ol><p>Our migration was on the “trackingevents” table, which records a shipment’s various updates from the carrier.</p><p>This is what the migration looks like in SQL:</p><pre>ALTER TABLE trackingevents ADD COLUMN newid uuid;</pre><pre>CREATE FUNCTION shyp_copy_newid() RETURNS TRIGGER AS $$<br>  BEGIN<br>    IF NEW.id IS NOT NULL THEN<br>      NEW.newid := regexp_replace(NEW.id, &#39;\w+_&#39;, &#39;&#39;)::uuid;<br>    END IF;<br>    RETURN NEW;<br>  END<br>$$ LANGUAGE plpgsql;</pre><pre>CREATE TRIGGER shyp_copy_newid BEFORE INSERT OR UPDATE ON trackingevents<br>  FOR EACH ROW EXECUTE PROCEDURE shyp_copy_newid();</pre><pre>UPDATE trackingevents SET id = id;</pre><pre>CREATE UNIQUE INDEX CONCURRENTLY trackingevents_newid_idx ON trackingevents(newid);<br>CREATE UNIQUE INDEX CONCURRENTLY trackingevents_oldid_idx ON trackingevents(id);</pre><pre>BEGIN;<br>  DROP TRIGGER shyp_copy_newid ON trackingevents;<br>  DROP FUNCTION shyp_copy_newid();</pre><pre>  ALTER TABLE trackingevents DROP CONSTRAINT trackingevents_pkey;<br>  ALTER TABLE trackingevents RENAME COLUMN id TO oldid;<br>  ALTER TABLE trackingevents ALTER COLUMN oldid DROP DEFAULT;<br>  ALTER TABLE trackingevents ALTER COLUMN oldid DROP NOT NULL;</pre><pre>  ALTER TABLE trackingevents RENAME COLUMN newid TO id;<br>  ALTER TABLE trackingevents ADD PRIMARY KEY USING INDEX trackingevents_newid_idx;<br>  ALTER INDEX trackingevents_newid_idx RENAME TO trackingevents_pkey;<br>  ALTER TABLE trackingevents ALTER COLUMN id SET DEFAULT 
gen_random_uuid();<br>COMMIT;</pre><p>Then if everything goes well, drop the old id column:</p><pre>ALTER TABLE trackingevents DROP COLUMN oldid;</pre><p>And if something goes wrong, the old column can be swapped back in:</p><pre>BEGIN;<br>  UPDATE trackingevents SET oldid = &#39;trk_&#39; || id WHERE oldid IS NULL;<br>  ALTER TABLE trackingevents DROP COLUMN id;<br>  ALTER TABLE trackingevents RENAME COLUMN oldid TO id;<br>  ALTER TABLE trackingevents ALTER COLUMN id SET DEFAULT &#39;trk_&#39; || gen_random_uuid();<br>  ALTER TABLE trackingevents ADD PRIMARY KEY USING INDEX trackingevents_oldid_idx;<br>  ALTER INDEX trackingevents_oldid_idx RENAME TO trackingevents_pkey;<br>COMMIT;</pre><p>Following those steps, we migrated the table. The resulting index size was a little less than half the existing one. Not bad!</p><h3>Next Steps</h3><p>There’s been a lot of work going into this. We’ve migrated two of our largest tables, and any new tables use uuids for their primary keys.</p><p>As of writing our database has 67 tables, ten of which have uuid primary keys.</p><p>Some of these have foreign keys, so the foreign key needs to be changed as well. The setup, such as backfilling to a temporary column, will be the same, but the transaction of swapping the columns out would be slightly different. Let’s pretend a table called “trackingeventdetails” existed and had a foreign key pointed at the “trackingevents” table’s id. We’d have to write something like:</p><pre>BEGIN;</pre><pre>-- New! 
Drop the foreign key reference to the table<br>ALTER TABLE trackingeventdetails DROP CONSTRAINT &quot;trackingeventdetails_trackingEventId_fkey&quot;;</pre><pre>-- Same as above<br>ALTER TABLE trackingevents DROP CONSTRAINT trackingevents_pkey;<br>ALTER TABLE trackingevents RENAME COLUMN id TO oldid;<br>ALTER TABLE trackingevents ALTER COLUMN oldid DROP DEFAULT;<br>ALTER TABLE trackingevents ALTER COLUMN oldid DROP NOT NULL;</pre><pre>ALTER TABLE trackingevents RENAME COLUMN newid TO id;<br>ALTER TABLE trackingevents ADD PRIMARY KEY USING INDEX trackingevents_newid_idx;<br>ALTER INDEX trackingevents_newid_idx RENAME TO trackingevents_pkey;<br>ALTER TABLE trackingevents ALTER COLUMN id SET DEFAULT gen_random_uuid();</pre><pre>-- Now we have to migrate the table pointing here as well<br>-- (pretend we have a backfilled column)<br>ALTER TABLE trackingeventdetails RENAME COLUMN &quot;trackingEventId&quot; TO &quot;oldTrackingEventId&quot;;<br>ALTER TABLE trackingeventdetails RENAME COLUMN &quot;newTrackingEventId&quot; TO &quot;trackingEventId&quot;;<br>ALTER TABLE trackingeventdetails<br>  ADD CONSTRAINT &quot;trackingeventdetails_trackingEventId_fkey&quot;<br>  FOREIGN KEY (&quot;trackingEventId&quot;) REFERENCES trackingevents(id);</pre><pre>COMMIT;</pre><p>Overall, it’s pretty similar — we just add the modifications to the foreign keys, and since it’s in a transaction, to the outside it appears as if nothing happened.</p><p>There are more complicated situations we can get into with regard to our foreign keys, but thankfully these are on our smaller tables. If you have any questions or improvements, please let us know! Also, a shout out to Braintree for posting their <a href="https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql/">summary of high volume operations in Postgres</a>. 
It’s been a great resource for this project as well as some of our ongoing feature work.</p><hr><p><a href="https://medium.com/shyp-engineering/a-unique-journey-in-search-of-keys-3bb250471104">A Unique Journey in Search of Keys</a> was originally published in <a href="https://medium.com/shyp-engineering">Shyp Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Speeding up Javascript Test Time 1000x]]></title>
            <link>https://medium.com/shyp-engineering/speeding-up-javascript-test-time-1000x-460c528418e7?source=rss----6513e7cf74b---4</link>
            <guid isPermaLink="false">https://medium.com/p/460c528418e7</guid>
            <category><![CDATA[continuous-delivery]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[testing]]></category>
            <category><![CDATA[tdd]]></category>
            <category><![CDATA[continuous-integration]]></category>
            <dc:creator><![CDATA[Kevin Burke]]></dc:creator>
            <pubDate>Mon, 18 Jul 2016 17:53:49 GMT</pubDate>
            <atom:updated>2016-07-18T19:36:19.842Z</atom:updated>
            <content:encoded><![CDATA[<p>Eight months ago, when I ran our core API tests at Shyp, it took 100 seconds from starting the test to seeing output on the screen. Today it takes about 100 milliseconds:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hwzje9MbyeyFnNHB0YDg9Q.gif" /></figure><h3>Why Bother?</h3><p>There are a lot of different demands on your time, so why is a speedy test suite so important? Here are a few reasons we put a premium on being able to run tests quickly.</p><ol><li><strong>Deployments get faster.</strong> Our build/deployment server runs every test before a deployment. Making our tests faster means that we spend less time between pushing code and seeing it live in production. In situations where production is broken and we need to push a fix, faster tests mean we can get that fix to production more quickly.</li><li><strong>Gains accrue over time.</strong> We run tests 400–500 times a day; a ten-second improvement in every test run translates to an hour of saved developer time, <em>every day</em>.</li><li><strong>Slow tests lead to context switches. Context switches are harmful.</strong> Distractions are everywhere with Slack, push notifications and open-plan offices. A ten or twenty second delay between kicking off a test run and viewing the results means you might be tempted to check email or Twitter or do any other activity which causes you to lose focus on the task at hand. Faster tests mean you can stay focused more easily.</li><li>Our target is to get feedback within 100ms; that’s <a href="http://www.nngroup.com/articles/response-times-3-important-limits/">about as much time as a UI can take before it stops feeling responsive</a> and your mind starts to wander.</li><li><strong>Fast tests lead to better code.</strong> We are subconsciously biased against things that are slow. 
<a href="http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it">Amazon famously found that 100ms of latency costs 1% of sales.</a> You might want to refactor a module, but subconsciously decide not to, because refactoring implies running the tests, and the tests are slow. You might not write an extra test you really should write, because it means <em>running the test suite again</em>, and the tests are slow. You might decide <a href="http://yellerapp.com/posts/2015-03-16-incuriosity-killed-the-infrastructure.html">not to be curious about a test anomaly</a>, because narrowing down the issue would require running the tests, and the tests are slow.</li></ol><p>For these reasons, it’s important to us that we start to see test output in under 100ms. Fortunately, we were able to hit that goal. How did we do it?</p><h3>Measuring Performance</h3><p>The first step to making anything faster is to measure how fast or slow it is. At a minimum, you’ll want to measure:</p><ul><li>How long it takes before the first test starts running</li><li>How long it takes to perform any global, per-test setup/teardown actions</li><li>Minimum amount of time to do a database read (for us, 4ms)</li><li>How long each test takes. If you run your tests with <a href="http://mochajs.org/">mocha</a>, I encourage you to set the --slow flag to 2 (milliseconds) so you can clearly see how long each test takes to run.</li></ul><p>There’s <a href="http://unix.stackexchange.com/a/26797/9519">an awesome Unix tool called ts</a> (available on Macs via brew install moreutils) that will annotate every line of your test output with a timestamp. 
Pipe your test output through ts like so:</p><pre>mocha test/api/responses/notFound.test.js | ts &#39;[%Y-%m-%d %H:%M:%.S]&#39;</pre><p>And you’ll get test output annotated with timestamps with millisecond precision; all you need to do is find the right place to put console.log lines.</p><pre>[2015-04-19 21:53:45.730679] verbose: request hook loaded successfully. <br>[2015-04-19 21:53:45.731032] verbose: Loading the app&#39;s models and adapters... <br>[2015-04-19 21:53:45.731095] verbose: Loading app models... <br>[2015-04-19 21:53:47.928104] verbose: Loading app adapters... <br>[2015-04-19 21:53:47.929343] verbose: Loading blueprint middleware...</pre><p>We observed right away that a) our Node framework requires a <em>ton</em> of files before starting a test run, b) require is synchronous, and <em>really</em> slow in Node (on the order of milliseconds per file imported), and c) There were a <em>ton</em> of stat() syscalls to try and load modules in places where those modules did not exist.</p><p>The latter problem is <a href="https://kev.inburke.com/kevin/node-require-is-dog-slow/">documented in more detail on my personal blog</a>, and there have been two promising developments in that area. First, <a href="http://glebbahmutov.com/">Gleb Bahmutov</a> developed <a href="https://github.com/bahmutov/cache-require-paths">the cache-require-paths library</a>, which helps Node remember where it found a file the last time it was imported, and avoids many wasteful/incorrect file seeks. Gleb observed a 36–38% speedup when loading an Express project — our speedup was closer to 20%, but we are still really glad this tool exists.</p><p>Second, <a href="https://github.com/pierreinglebert">Pierre Ingelbert</a> <a href="https://github.com/nodejs/io.js/pull/1920">submitted a patch</a> to io.js to avoid extraneous stat() syscalls in directories that do not exist. 
This patch was part of the io.js 2.3.1 release.</p><p>We run our tests in a virtual machine, so our development environment matches production. The core api project is shared between the host machine (usually a Mac) and the VM. Loading the test suite means that a lot of files in the node_modules directory are being loaded. If that node_modules folder lives inside the folder that’s being shared, the VM will have to reach across the system boundary to read it, which is much slower than reading a file inside the virtual machine.</p><p>It was clear we needed to install our node_modules folder somewhere inside the VM, but outside of the shared folder. Node’s NODE_PATH environment variable provides a mechanism for loading a folder saved elsewhere on the filesystem, but our Javascript framework <a href="https://github.com/balderdashy/sails/issues/2505">hard-codes the location of the node_modules folder</a> in its imports, so it failed to find the files we had placed elsewhere.</p><p>Instead we installed the node_modules folder elsewhere and symlinked it into place. Here’s a bash snippet you can use to replicate this behavior.</p><pre>pushd ~/api <br>    npm install --prefix /opt/lib/node_modules <br>    ln -s /opt/lib/node_modules ~/api/node_modules <br>popd</pre><p><em>Savings</em>: This sped up test initialization time by <strong>a whopping 75%</strong>, or one minute and fifteen seconds.</p><h3>More Specific Regex</h3><p>We used to specify a test to run by typing mocha -g ‘testname’ at the command line. Unfortunately, mocha always loads a mocha.opts file if it is present, and our mocha.opts file hard coded a filename regex that matched every single test file in our system (100+ files). 
We added a new test runner and instructed people to manually specify the test file they want to run.</p><p><em>Savings</em>: This sped up test run time by about 50% (10–13 seconds).</p><h3>Stubbed database reads/writes</h3><p>The old test framework would do authentication by writing a valid access token to the database, then reading/returning the valid access token in the controller. We introduced a synchronous signIn test helper that stubbed these two network calls.</p><p><em>Savings</em>: 10–20ms across ~600 tests, a 6–12 second improvement.</p><h3>Batched writes</h3><p>In some instances we need to instantiate test environments with users, drivers and other objects. Previously the test helper would write one record, read the id, and then write a record that depended on it. By generating ids up front, we were able to perform multiple writes at the same time.</p><p><em>Savings</em>: 10–20ms across ~200 tests, a 2–4 second improvement.</p><h3>Faster test cleanup</h3><p>Between each test, we delete all records from the database. The helper responsible for this would open one database connection per table and then each one would call DELETE FROM &lt;tablename&gt;. Not only would this occasionally hang with no stack trace, it meant that the speed of the cleanup operation was the same as the slowest DELETE query.</p><p>Instead we grouped all of the deletes and sent them to the database in a single connection (e.g. Model.query(“DELETE FROM users; DELETE FROM pickups; …”)). Some reading online indicated <a href="https://stackoverflow.com/a/11423886/329700">TRUNCATE would be faster than DELETE</a>; plus, it lets you use one command for everything, e.g. TRUNCATE TABLE users, pickups, …. 
For large data sets it is likely faster; however, we observed this to be much slower than DELETE for our small test data sets, on the order of 200ms per action.</p><p>An optimization we’d like to implement in the future would only issue DELETEs for tables that have dirty data, which would also let us avoid issuing a DELETE if a test didn’t hit the database. Currently we’re not sure about the best way to hook into the ORM and determine this.</p><p>We’re also interested in running every test in a transaction. Unfortunately, the ORM we use <a href="https://github.com/balderdashy/waterline/issues/755">doesn’t support transactions</a>, and we are very worried about upgrading it.</p><p><em>Savings</em>: Clearing the DB used to take 14–30ms per test; now it takes 3–11ms, a ~20 second improvement.</p><h3>Don’t Load Sails</h3><p>Our Javascript framework (Sails.js) needs to require every single model, controller, and service before it runs; this is the slowest part of the test run by far.</p><p>Where possible, we try to avoid loading Sails when writing/running tests. We try to implement most of our business logic in plain Javascript classes or modules, outside of models and controllers. If this logic operates on objects in memory, you can use fake model objects in your tests — var user = {email: ‘foo@bar.com’}, and avoid loading Sails entirely.</p><p>Avoiding Sails isn’t possible in every situation, but we’ve found it’s a useful design principle — it helps us separate our business logic and our data storage layer, and will make it easier for us to move off Sails in the future.</p><p>For more on this technique, you should check out <a href="https://www.destroyallsoftware.com/screencasts">Gary Bernhardt’s excellent <em>Destroy All Software</em> series of videos</a>. 
He gave <a href="https://www.youtube.com/watch?v=RAxiiRPHS9k">a great introduction to his “don’t load Rails” style of testing at Pycon in 2012</a> (summary <a href="http://pycon-2012-notes.readthedocs.org/en/latest/fast_tests_slow_tests.html">here</a>).</p><h3>Tighter Editor Integration</h3><p>Most of the time the test you want to run is the same one that is right underneath your cursor, so manually specifying the filename at the command line wastes time. The awesome <a href="https://github.com/janko-m/vim-test">vim-test plugin</a> makes it incredibly easy to map leader commands for running the current suite of tests or the current file. It worked on literally the first try; I’ve rarely been so impressed with a piece of software.</p><p>Some of our team members use <a href="http://www.sublimetext.com/">Sublime Text</a>, and we haven’t figured out how to get their editor integration set up yet. It’s on our to-do list for this quarter.</p><p><em>Savings</em>: One context switch and ~1–3 seconds per test run.</p><h3>Avoid Reloading All Dependencies Every Test Run</h3><p>After all of this we were able to get test run time down to about 7 seconds. Most of this time is spent waiting for v8 to parse/require ~500 files.</p><p>We can go much faster if we load every dependency once when we sit down at the computer, and then only re-require the files that have changed. You can do this by hacking with Node’s require.cache. I don’t recommend it for production, but it works just fine.</p><p>Anyway, the end result of this is <a href="https://github.com/Shyp/lucifer">a command line client and a server called Lucifer</a>. Start Lucifer when you sit down at your computer. When we change a file, we call lucifer invalidate [filename] (or configure our editors to do this for us), which invalidates the cache for that file, and then re-requires it. Then you can kick off a test run by calling lucifer run [filename] from the command line (or from your editor). 
When you kick off a test, every dependency is ready to go, which means your tests start in about 100ms.</p><p><em>Savings</em>: Test initialization time went from 6–7 seconds to 100ms, a 60x speedup.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hwzje9MbyeyFnNHB0YDg9Q.gif" /></figure><h3>Caveats</h3><p>This is similar to the approach taken by <a href="https://github.com/sporkrb/spork">spork</a> in Ruby — keep a server up and reload changed files. The same failure modes that apply to Spork — increased complexity, subtle inconsistencies, failure to reload files — apply here. In particular, we’ve found that sometimes Node doesn’t wipe the cache for files that have changed, so new code won’t get loaded. This is probably a result of our limited understanding of how and where Node caches loaded dependencies. We always do a clean run of our test suite on our build server before deploying to production.</p><p>Still, it’s been very useful in certain situations — if you’ve fleshed out a method or a controller and are writing several tests in quick succession, the only file that’s changing is your test file, so the chance of subtle dependency breakage is low.</p><h3>Lessons</h3><p><em>Your slow test suite isn’t hopeless!</em> But improving it is probably going to take a concerted investment. You also have to get to know your stack really well, which is probably a good idea anyway. Look at all of the places we had to inspect to find performance improvements:</p><ul><li>File reads in a virtual machine</li><li>The require function in Node</li><li>Test workflow</li><li>Batching to avoid database connection overhead</li><li>Editor integration</li></ul><p>You don’t know what will be slow until you measure it. 
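A tiny helper like this (a sketch, not part of our actual tooling) is often all you need to start measuring:

```javascript
// Wrap any synchronous step and report how long it took.
function timed(label, fn) {
  const start = process.hrtime();
  const result = fn();
  const [s, ns] = process.hrtime(start);
  console.log(label + ': ' + (s * 1e3 + ns / 1e6).toFixed(1) + 'ms');
  return result;
}

// e.g. timed('load app', () => require('./app'));
```
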
You’ll also want to know how to use basic profiling tools for your stack:</p><ul><li><a href="http://unix.stackexchange.com/a/26797/9519">the ts command</a></li><li>Strace/DTruss (not covered in depth here, but extremely useful for observing system calls/timings for a running process)</li><li>Logging / timing for queries that hit the database</li><li>Logging and profiling your test run time</li></ul><h3>We’re hiring!</h3><p>Our team of ten engineers powers four warehouses, four mobile apps, hundreds of drivers and loads of pickups every day. We’re looking for people who are curious about the tools they use and eager to improve their productivity day in and day out. <a href="https://shyp.com/jobs">We’d love to hear from you</a>; you can’t waste our time by getting in touch.</p><h3>Errata</h3><p>Some odd things I found while profiling:</p><ul><li>Accessing the .stack property of an Error object <a href="https://groups.google.com/forum/#!searchin/nodejs/stack$20slow/nodejs/-U2hIDWcc30/5WRuCeoA8HgJ">can block a process for up to 100ms</a>. Stub your logger before running tests and you can see a quick win.</li><li>Some imports are extremely slow; require(‘faker’), for example, <a href="https://github.com/Marak/faker.js/issues/167">tacks a 60ms penalty onto your test runs</a>. 
If you don’t need all the locale data, you can speed this up quite a bit by loading only the locales you need.</li></ul><p><em>Originally published at </em><a href="https://shyp.github.io/2015/07/13/speed-up-your-javascript-tests.html"><em>shyp.github.io</em></a><em> on July 12, 2015.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=460c528418e7" width="1" height="1" alt=""><hr><p><a href="https://medium.com/shyp-engineering/speeding-up-javascript-test-time-1000x-460c528418e7">Speeding up Javascript Test Time 1000x</a> was originally published in <a href="https://medium.com/shyp-engineering">Shyp Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How we Automate Mobile and Web Projects at Shyp with CircleCI]]></title>
            <link>https://medium.com/shyp-engineering/how-we-automate-mobile-and-web-projects-at-shyp-with-circleci-1219eb9467a3?source=rss----6513e7cf74b---4</link>
            <guid isPermaLink="false">https://medium.com/p/1219eb9467a3</guid>
            <category><![CDATA[circleci]]></category>
            <category><![CDATA[github]]></category>
            <category><![CDATA[continuous-deployment]]></category>
            <category><![CDATA[continuous-delivery]]></category>
            <category><![CDATA[continuous-integration]]></category>
            <dc:creator><![CDATA[Dan Rummel]]></dc:creator>
            <pubDate>Wed, 13 Jul 2016 20:36:28 GMT</pubDate>
            <atom:updated>2016-07-13T20:37:34.496Z</atom:updated>
            <content:encoded><![CDATA[<p>At Shyp we have 4 mobile apps and a relatively small team. To be effective and move fast we automate everything we can: testing, building, deployments (we even automate <em>waiting</em>; seriously, come to <a href="http://www.meetup.com/CircleCI-Office-Hours/events/229869816/">the event on Thursday</a> to learn more about that). We have a customer-facing app for Android and for iOS, an iOS app for our couriers (Compass), and an iOS app for the operations team in our processing facility (Anchor). Each of these exists in its own GitHub repo and each repo is connected to CircleCI. In addition, we have three repos that contain various code collections shared by the iOS apps. As you can imagine, CircleCI saves us a ton of time; here’s how we do it.</p><p>With CircleCI you often have a choice to configure your project settings either through the web or in the yml file. We try to do configuration in the yml file when possible in order to review and track changes. Here is how we set up the yml file for a typical iOS app project.</p><h3>Setup</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a99a4eb1b36774e3d500cb8b165b0421/href">https://medium.com/media/a99a4eb1b36774e3d500cb8b165b0421/href</a></iframe><p>The <strong><em>general — branches — ignore</em></strong> section tells CircleCI to skip processing any commits on branches that start with “WIP-”, which allows us to push Work In Progress branches to GitHub without worrying about failing builds or tests.</p><p>In the <strong><em>machine</em></strong> section we explicitly name the Xcode version and set an environment variable. The <strong><em>SKIP_SHYP_DEV_VALIDATIONS</em></strong> variable concerns a mechanism we’ve kludged into our Podfile that does some validation of the development environment whenever we <strong><em>pod install</em></strong> (for example making sure our git hooks are installed in the local working directory). 
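</p><p>For reference, that part of circle.yml has roughly the following shape. This is a sketch reconstructed from the description above; the Xcode version shown is illustrative, not necessarily the one we pin.</p>

```yaml
general:
  branches:
    ignore:
      - /WIP-.*/                     # skip CI for Work In Progress branches
machine:
  xcode:
    version: "7.3"                   # pin the Xcode version explicitly (illustrative)
  environment:
    SKIP_SHYP_DEV_VALIDATIONS: "1"   # bypass our Podfile's local-dev checks on CI
```

<p>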
On CircleCI this validation would fail, but it doesn’t matter in that context so we skip it.</p><p>Under <strong><em>notify — webhooks</em></strong> we set up a ping to our Slackerbot app. This hook looks for CI failures, figures out which developer made the commit and at what step in the CI process it failed. If it looks like a legitimate build or test failure (as opposed to an occasional CircleCI hiccup), it maps the GitHub name to a Slack nick and posts <strong><em>@devnick-- for breaking CI</em></strong> to our Slack channel. This “minus minus” deducts from their Slack score. The criterion for legitimacy is that the failure is within the build or test steps: <strong><em>(step_name =~ /xcodebuild/ || step_name =~ /gradlew/)</em></strong>. We already use the Slack integration in the CircleCI Project Settings to report build failures, but that lacked sufficient public shaming.</p><h3>Testing</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/70789fad6dc718214e644ba4a5f17ade/href">https://medium.com/media/70789fad6dc718214e644ba4a5f17ade/href</a></iframe><p>The <strong><em>test — override</em></strong> section is split into two steps. The first does a build and the second runs the tests. The CircleCI support team suggested this split to make it easier to track down intermittent build issues that were happening early on when iOS support was still experimental. We probably could combine these again, but it doesn’t hurt much to leave them separated, and it might be helpful if issues arise again. The output of each is <strong><em>tee</em></strong>ed into a separate artifact log.</p><p>As developers we tend to build and test against the most recent iOS version throughout the day, so we felt it wise to run the automated tests against an older build destination. Our apps generally still support iOS 8.0. The oldest simulator version CircleCI provides is 8.4, so we select that, on an iPhone 5. 
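</p><p>Putting the pieces above together, the test section looks roughly like this. It is a sketch: the workspace and scheme names are placeholders, and the exact flags differ in our real file.</p>

```yaml
test:
  override:
    # Step 1: build only, so build failures surface as a separate CI step
    - set -o pipefail && xcodebuild build
        -workspace App.xcworkspace -scheme App
        -destination 'platform=iOS Simulator,name=iPhone 5,OS=8.4'
        | tee "$CIRCLE_ARTIFACTS/xcodebuild_build.log" | xcpretty
    # Step 2: run the tests against the oldest simulator CircleCI provides
    - set -o pipefail && xcodebuild test
        -workspace App.xcworkspace -scheme App
        -destination 'platform=iOS Simulator,name=iPhone 5,OS=8.4'
        | tee "$CIRCLE_ARTIFACTS/xcodebuild_test.log" | xcpretty --report junit
```

<p>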
We pipe to xcpretty for better display on the website and for JUnit-style test reports.</p><h3>Deploying</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ecb0731dfd1c7c02368cb7dd0857b27b/href">https://medium.com/media/ecb0731dfd1c7c02368cb7dd0857b27b/href</a></iframe><p>In the <strong><em>deployment</em></strong> section we match on certain branch names to do a deploy. Most deploys are for internal testing so they use our Enterprise certificate for code signing. Only the <strong><em>release-x.y.z</em></strong> branches use the App Store code signing. Here is the deploy script:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b0612f92c98eff335a2901365d53ebe3/href">https://medium.com/media/b0612f92c98eff335a2901365d53ebe3/href</a></iframe><p>This current deploy script illustrates both the old and new way of doing code signing on CircleCI. The new way is used for the App Store builds. In this case the code signing certificate is uploaded to CircleCI; they encrypt it on their end for storage, then decrypt and install it in a local keychain when running a build.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1003/0*nNkICtDGk8qXBrr5.png" /></figure><p>The old way involved encrypting the signing certificate p12 file yourself, including the encrypted p12 in your repo, setting the decryption password in the CircleCI Project Settings as an environment variable (<strong><em>$P12_PASSWORD</em></strong> here), and then decrypting and installing it yourself in the deploy script. The new way is nicer.</p><p>During the deploy we use the <strong><em>CIRCLE_BUILD_NUM</em></strong>, appended to the marketing version as a fourth semantic versioning field, to form the build version number. This makes it easier for testers to attribute app behavior to particular CI builds. 
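</p><p>The versioning step is a couple of lines of shell. The marketing version below is an illustrative value, not a real release; <strong><em>CIRCLE_BUILD_NUM</em></strong> is set by CircleCI at build time.</p>

```shell
# Append the CircleCI build number as a fourth version field,
# e.g. marketing version 2.3.1 + build 874 -> 2.3.1.874 (illustrative values).
MARKETING_VERSION="2.3.1"
CIRCLE_BUILD_NUM="${CIRCLE_BUILD_NUM:-874}"   # provided by CircleCI; default for local runs
BUILD_VERSION="${MARKETING_VERSION}.${CIRCLE_BUILD_NUM}"
echo "$BUILD_VERSION"
```

<p>The resulting string is what testers see, so a bug report can be traced straight back to a particular CI build.</p><p>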
We also use <strong><em>CIRCLE_BRANCH</em></strong> to update the app name with the build number and the distinctive part of the branch name. This typically gets truncated on the device home screen, but it is good enough to be helpful for our QA team.</p><p>The final .ipa (iOS) or .apk (Android) binaries are uploaded to our S3 storage, or to TestFlight for App Store release builds. Our S3 bucket has a web frontend that allows installation directly to a device.</p><p>For our Compass and Anchor builds we deploy both Beta and Live versions (these are versions with different bundle ids) at the same time. We field-test the Beta builds in one processing facility for a few days before rolling out a new Live version nationwide. It’s useful to have the Live and Beta versions co-exist on the field-testing devices at the same time. If a new version has a production issue, the employee can always fall back to the current Live version in an instant. It’s better to simply build and store the Beta and Live versions together to make absolutely sure code changes don’t sneak in between the two.</p><p>We are pleased with CircleCI’s iOS features and support. Making things like fastlane and xcpretty available is quite helpful. Also, they tend to keep up with the latest Xcode versions. Whenever a problem arises, the support team has been responsive. In particular, the ability to ssh directly into the machine during a problem build to investigate the situation ourselves has sped up problem identification a few times. CircleCI is a vital component of our mobile engineering process at Shyp.</p><h3>Automating Web projects with CircleCI</h3><p>We rely heavily on CircleCI to test, build, and deploy our Web projects here at Shyp. 
We’ve got quite a few projects that are deployed by Circle, including:</p><ul><li><a href="https://www.shyp.com">Shyp.com</a> (our main marketing site and signup/onboarding flow)</li><li>Various internal tools for Customer Experience and Operations teams</li><li>Our tracking site (“track.shyp.com”) and referral site (“get.shyp.com”)</li></ul><p>All of these sites consist of static assets (JavaScript, SCSS stylesheets, EJS templates, images) that we compile and optimize as part of the build process, and some contain additional server logic for dynamic pages and routes.</p><h3>Testing</h3><p>Whenever a new pull request is opened against the Shyp.com repository, we run:</p><ul><li>Static analysis linters (eslint, stylelint)</li><li>A complete build (producing bundled assets — SCSS, JS, EJS). This acts as a basic sanity check and is also used when deploying a built copy (we’ll get there later on).</li><li>Hundreds of unit tests for frontend and server code</li><li>Server integration tests — we spin up a local web server and make some sample requests to validate that things like routing, redirects, and middleware are applied correctly.</li><li>UI automation tests using Selenium WebDriver</li></ul><p>These tasks are all broken down into Makefile targets and shell scripts, so our circle.yml block ends up looking something like:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5699e3387cf696d0934e4a5fd5d964f6/href">https://medium.com/media/5699e3387cf696d0934e4a5fd5d964f6/href</a></iframe><p>We recently set up the UI automation test suite, and have been using this both to assert that our interface is behaving correctly and to record screenshots at various points in the UI. 
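</p><p>A circle.yml block of the shape described above might look like the following sketch. The Makefile target names are illustrative, not our exact ones.</p>

```yaml
test:
  pre:
    - make lint            # eslint + stylelint
    - make build           # bundle SCSS/JS/EJS as a sanity check
  override:
    - make test-unit       # frontend and server unit tests
    - make test-server     # boot a local server and hit sample routes
    - make test-ui         # Selenium WebDriver UI automation
```

<p>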
CircleCI containers provide a <strong><em>chromedriver</em></strong> binary, so using the Node.js <strong><em>selenium-webdriver</em></strong> module is as simple as:</p><pre><strong>const</strong> webdriver <strong>=</strong> require(&#39;selenium-webdriver&#39;);<br><strong>const</strong> driver <strong>=</strong> <strong>new</strong> webdriver.Builder()<br>  .withCapabilities({ browserName: &#39;chrome&#39; })<br>  .build();</pre><p>We’ve added a small helper function so that we can take screenshots from anywhere in our test suite:</p><pre><strong>function</strong> saveScreenshot(filename) {<br>  <strong>const</strong> dir <strong>=</strong> process.env.CIRCLE_ARTIFACTS <strong>||</strong> &#39;/tmp&#39;;<br>  <strong>const</strong> screenshotPath <strong>=</strong> `${dir}/${filename}`;<br><br>  <em>// writeFileAsync is a promisified fs.writeFile</em><br>  <strong>return</strong> driver.takeScreenshot().then((image) <strong>=&gt;</strong> {<br>    <strong>return</strong> writeFileAsync(screenshotPath, image, &#39;base64&#39;);<br>  });<br>}</pre><p>Any files we add to the <strong><em>CIRCLE_ARTIFACTS</em></strong> directory are persisted by CircleCI in their “Artifacts” tab, so we can easily click through from a GitHub pull request to review them:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4hlVRTufcyCmoZXg." /></figure><p>In the future, we’d like to embed key screenshots as comments on each pull request and possibly run some automated visual regression tests.</p><h3>Deployment</h3><p>Many of our projects use CircleCI’s Heroku integration directly, but some have more complicated needs. 
One example is one of our internal tools, which we deploy statically to S3 using their ‘website hosting’ feature.</p><p>Rather than setting up bucket configuration by hand and risking inconsistencies as we add deployments, we’ve committed the configuration files to the repository and use CircleCI to apply them to the buckets we deploy to:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/4bc095e64f986173be7455cd60b2a2fb/href">https://medium.com/media/4bc095e64f986173be7455cd60b2a2fb/href</a></iframe><p>This lets us manage the configuration with source control and add new deployment environments simply by creating more buckets.</p><p>Another example is a project that we deploy to Heroku, but which needs to compile some static assets before it can be served. We didn’t want to check the compiled code into source control, and we don’t want to run the compilation on Heroku since the build process is fairly slow and involves pulling in some private dependencies. Instead, we compile the bundle on CircleCI, create a Git commit, and push it to the Heroku deployment apps ourselves:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/3d78d72e57cb70da15e73ff67b846537/href">https://medium.com/media/3d78d72e57cb70da15e73ff67b846537/href</a></iframe><p>We’ve been very happy with CircleCI for all of our CI and deployment needs!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1219eb9467a3" width="1" height="1" alt=""><hr><p><a href="https://medium.com/shyp-engineering/how-we-automate-mobile-and-web-projects-at-shyp-with-circleci-1219eb9467a3">How we Automate Mobile and Web Projects at Shyp with CircleCI</a> was originally published in <a href="https://medium.com/shyp-engineering">Shyp Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>