Continuous Delivery for Android at QuizUp

Berglind Ósk
Published in QuizUp Blog · 11 min read · Feb 12, 2016

At QuizUp we use continuous delivery for our mobile clients. But on the road to a smooth process we have hit some bumps, some expected, others not. To give you some context on our user base: we currently have 42 million users who play 5 million games per day. A bit over half of them are on iOS, but recently over 50% of our DAU (Daily Active Users) have been on Android! We can thank a recent surge in popularity in Brazil for that.

But why is it so important to ship often, and especially on mobile? In general, it’s much better to ship smaller incremental changes. This reduces time spent on releasing, and the cost and risk of broken code. At QuizUp, we want to get our code out to the users as fast as we can.

The main difference between releasing on iOS and on Android is the review time in the App Store. Apple reviews every update, which takes an average of 7 days and can end with rejection. Fortunately, we don’t have to deal with that delay on Android: we upload to the Play Store, and it’s out in the wild within two hours. This means we release around every two weeks on iOS vs. every week on Android. Another difference between releasing on mobile and on the web is that on the web you can deploy new code to your website, and bam, it’s out there for everyone in the blink of an eye! Ok, maybe in a couple of blinks, but you get what I’m saying. On mobile, when you release a new version, you have no guarantee that people will actually update to it, so there is always a trail of old versions out there. Versions with old, bad code instead of your new and shiny good code. So by releasing often, you also increase the chances of people having more recent versions. Fortunately, Google set the Play Store to update apps automatically shortly after we released QuizUp for Android. Probably because of that, we have a pretty good pick-up rate on Android, with an average adoption time of 7 days.

Here’s a rough timeline of the lifespan of QuizUp, not counting the smaller QuizUp apps we created before. We released QuizUp only two years ago, which doesn’t sound like a lot, but in the mobile world, it really is a long time. So much has changed since then, not only in our organisation, but in the mobile development environment and mobile market in general. A year after our initial release, in May 2015, we launched a big revamped version, QuizUp 2.0. Before the release of QuizUp 2.0 we realised we had to make our release process smoother so we could ship more often. At the time, the release process was all manual.

After the initial Android release, we split our discipline teams up to form two feature teams and one release team, which was responsible for the stability and quality of the build before it could be released. The release team controlled the so-called release trains, which were scheduled at certain intervals. The feature teams then aimed to get on board a release train with their new releases. If a feature team missed a train, it just took the next one. In theory, this sounded pretty solid, but in practice it was a really slow process. When a feature team missed a train, there was a lot of pressure to catch the next one, and sometimes neither feature team made it. At the time we were also stuck in the mindset of having the same release train for Android and iOS, which meant that a new Android build had to wait until the iOS app had been approved. Since we hadn’t yet fully committed to test driven development, we spent a whole week on manual testing and fixing the small bugs that came up, while code with larger bugs was sent back to the feature team for fixing. I think we only released a new Android version every 3–4 weeks, which we thought was a shame, but our process at the time didn’t offer a better solution. Another discovery from this experience was that people’s mindset just isn’t right when they are not responsible for the reliability of the code they ship. That happens on a subconscious level.

We iterated on the process to improve it: we organised the manual testing much better around specific changes, we moved the responsibility for releasing into the feature teams, where each team scheduled a release after each sprint, and we implemented our shake and report system. The shake and report system works like this: if you are a team member or a tester, you can shake your device to send us an email that contains screenshots, logs, and device and player info. This makes debugging a reported bug so much easier. We have now made this feature opt-in for all our users. We also started to automate the distribution of a daily in-house build, and soon changed that to an hourly debug build to get the code even faster into the hands of our QA department and more people. And importantly, when we were working on our 2.0 version, where we basically rewrote everything, we made it a goal to be very test driven. Writing unit tests has the benefit of producing better structured, better written and more reliable code.

Another thing we started to do to be more confident in our builds was to utilise the Play Store’s Beta group. In the Play Store, when you upload a version of your app, you can upload it to either Alpha, Beta or Production. I’m just going to talk about the Beta group here, not the Alpha, but the same practices apply to both. Adding people to the Beta group is done by creating a Google+ group and giving it permission for the Beta. This is so simple and so effective. We get a lot of valuable feedback very fast and see the power of having our users test for us: users who both think differently from the in-house staff and have a wider variety of devices. In fact, you can download QuizUp on almost 10,000 different types of Android devices! Thus we decided to always use the Beta group to test our build before releasing. The only thing that bothered us about this arrangement was that the Google+ page we used became very noisy. What we learned from this was: give people a platform to complain about something and they will, and they will also complain about everything else. We have now switched to open Beta, which means that anyone can register as a beta user by clicking a link, without us needing to accept them like we had to when using the Google+ pages. The downside is that we don’t have a concrete overview of our beta testers, which would come in handy when we need to communicate with them. Sometimes we write special release notes for the Beta build, but we soon realised that most of the Beta users are also active in our own QuizUp Feedback topic, which we now use to relay information.

The power of a Beta group

But still, this wasn’t enough. Too much time was still going into manual testing, and the release process had too many steps that were done so rarely that people felt uncomfortable doing them. Even on Android, where you can pull back an update rather quickly, it is stressful to release to millions of users if you are not confident in what you are doing.

We wanted this process to be as smooth as possible, and to be able to release as often as makes sense while still assuring the quality of the build. We had grown a lot more confident in our code after we started doing test driven development, so we decided to ship a potential release candidate to Beta every day. We have also been iterating on our agile processes, so our QA is very quick to verify that tasks work. So even if something is broken in the Beta, we fix it as soon as we find out. That should happen within 24 hours, and a new build is automatically released the following day; we decided that it is acceptable to have a small bug in the Beta build for 24 hours.

We also wanted to identify all the steps of distributing in-house, to Beta and releasing and see if we could automate it all! So for both our hourly builds and the Beta build we have to:
* Build all the tests. One of the downsides of programming on Android is the build time. Running Android tests that require the Android SDK and a device connected to the build server takes a long time: running all of our Android tests with a clean build takes about 20 minutes. Last year, Android finally added support for JVM tests that are pure Java and a lot faster, so we only write JVM tests now. I can’t stress enough how much faster test builds have sped up development and helped maintain good mental health! Recently we have also been able to build targeting Android 6.0, which is also a lot faster. We have started to do that locally and are looking into doing it on our build servers as well.
* Obfuscation for the release build. It’s very easy to reverse engineer an Android apk file, and we don’t really want people to do that to create fake QuizUp apps or to cheat, so we obfuscate the code before releasing, which usually takes another couple of minutes. We also do other sorcery like stripping out logs other than error logs, shrinking resources, etc.
* In-house distribution with HockeyApp, both the release version and an identical debug version. With the debug build we get more logs, and it contains all kinds of cool developer options. We still need the hourly release version as well for our QA team to use for testing, since the obfuscation can cause bugs.
* Translations are updated before shipping to Beta. In case you didn’t know, QuizUp is available in 5 other languages besides English.
* Tag the version on GitHub.
* Release notes added in all languages.

Release pipeline flow chart

Feature branches are merged into master after code review. Jenkins builds hourly debug and release builds, which are sent to HockeyApp, and daily builds, which are also sent to the Beta group through the Play Store.
We have now automated all of these processes and in theory could release automatically to production every day.
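To make the flow concrete, here is a minimal sketch of the steps listed above as an ordered pipeline that stops at the first failure. It is written in Python for brevity and is not QuizUp's actual Jenkins setup: the `Step` type, the `run_pipeline` function, the placeholder `ok` action, and the choice of which steps are Beta-only are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[], bool]   # returns True on success
    beta_only: bool = False   # assumption: skipped for hourly in-house builds


def run_pipeline(steps: List[Step], is_beta: bool) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for step in steps:
        if step.beta_only and not is_beta:
            continue
        if not step.run():
            break
        completed.append(step.name)
    return completed


def ok() -> bool:
    """Placeholder action; a real step would shell out to the build tools."""
    return True


# Hypothetical step list mirroring the bullets above.
STEPS = [
    Step("build and run JVM tests", ok),
    Step("build obfuscated release", ok),
    Step("upload to HockeyApp", ok),
    Step("update translations", ok, beta_only=True),
    Step("tag version on GitHub", ok),
    Step("add release notes", ok, beta_only=True),
    Step("upload to Play Store Beta", ok, beta_only=True),
]
```

Calling `run_pipeline(STEPS, is_beta=False)` models an hourly build, and `is_beta=True` models the daily Beta build. Stopping at the first failed step means a build with failing tests never reaches HockeyApp or the Play Store.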

Continuous Integration health dashboard

In reality, we release weekly, which is done manually by clicking a button in the Play Store and the Amazon Appstore. The reason is simply that we have not automated the process of verifying whether a build is healthy enough to be released, though that is totally doable. We have made some progress on that front, like using this very cool build dashboard that hangs in our space. It highlights a build in red if a CI job failed. We have a rotating weekly role within the Android team of being responsible for a release. We call that role “Having the release hat” (and yes, we have a physical hat). That person watches the health of both the latest release and the beta build by watching crash numbers, error dialogs and other metrics we track. This process is very similar for the iOS team, and we feel it’s very important that every single one of our mobile developers is comfortable with releasing. On top of that, we have great QA and support teams who watch user reports, filter out all the noise and let us developers know if something is horribly broken.
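An automated health check of the kind described as "totally doable" could look roughly like the sketch below. This is a hypothetical illustration in Python, not something QuizUp has built: the `BuildHealth` fields, the `min_sessions` cut-off and the `tolerance` factor are all invented for the example, and the real metrics and thresholds would have to come from the crash and error tracking in use.

```python
from dataclasses import dataclass


@dataclass
class BuildHealth:
    sessions: int
    crashes: int
    error_dialogs: int

    def crash_rate(self) -> float:
        return self.crashes / self.sessions if self.sessions else 0.0

    def error_dialog_rate(self) -> float:
        return self.error_dialogs / self.sessions if self.sessions else 0.0


def healthy_enough(beta: BuildHealth, production: BuildHealth,
                   min_sessions: int = 1000, tolerance: float = 1.2) -> bool:
    """Decide whether a beta build may be promoted to production.

    Assumed rule: the beta needs enough sessions to be meaningful, and its
    crash and error-dialog rates must not exceed the current production
    rates by more than the given tolerance factor.
    """
    if beta.sessions < min_sessions:
        return False  # not enough data to judge
    return (beta.crash_rate() <= production.crash_rate() * tolerance
            and beta.error_dialog_rate()
            <= production.error_dialog_rate() * tolerance)
```

Comparing against the current production build rather than a fixed number keeps the gate meaningful as overall crash levels change over time.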

Now you might think you know all there is to know about our release process, but I have still not mentioned A/B tests! We ship almost all new features with A/B testing, which means that we have a large control group, group A, and a smaller experiment group, group B. Only users in the experiment group get the new feature. This approach is very powerful: it lets us back our decisions up with data and tweak features for better results before they are released to a larger group, released to everyone, or reverted. With this we can also control the release of features without actually releasing a new build. This has also proved useful in cases where a feature was broken in some way. In those cases we can just turn the experiment group all the way down so the feature won’t bother users.
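One common way to implement this kind of group assignment is to hash the user into a stable bucket and compare it with a rollout percentage delivered by the server. The sketch below, in Python, shows that general technique; it is an assumption about how such a system can work, not a description of QuizUp's implementation, and the names `bucket`, `in_experiment_group` and `rollout_percent` are made up for the example.

```python
import hashlib


def bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def in_experiment_group(user_id: str, experiment: str,
                        rollout_percent: int) -> bool:
    """True if the user belongs to group B (gets the new feature).

    rollout_percent is assumed to come from the server, so it can be
    changed without shipping a new build. Setting it to 0 turns the
    feature off for everyone; 100 releases it to everyone.
    """
    return bucket(user_id, experiment) < rollout_percent
```

Because the bucket depends only on the user and the experiment name, a user stays in the same group between sessions, and raising the percentage only adds users to group B without removing anyone.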

Lastly, I want to share with you a few mind-boggling problems we encountered during our conversion to continuous delivery.

Semantic versioning was a struggle! You all know this: X.Y.Z, a major version number for large changes, a minor version for new features, and a patch version for bug fixes. How does this make sense when releasing a new version daily? After discussing this back and forth, we realised that the users really don’t care; this was only a problem in the heads of us developers.

So we ended up appending the build number to the normal semantic version and updating it with every daily release to Beta. Then, when we release to production, we just update the semantic version like normal, and only use that version number when talking about the release.
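A minimal Python sketch of that scheme follows. The exact format is not given in the text, so the dot separator, the `BuildVersion` type and the helper names are assumptions, as is the choice to keep the build number increasing across production releases.

```python
from typing import NamedTuple


class BuildVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    build: int

    def full_name(self) -> str:
        """Version string for a daily Beta build, e.g. X.Y.Z.build."""
        return f"{self.major}.{self.minor}.{self.patch}.{self.build}"

    def release_name(self) -> str:
        """Version string used when talking about a production release."""
        return f"{self.major}.{self.minor}.{self.patch}"


def next_daily(version: BuildVersion) -> BuildVersion:
    """Daily Beta release: only the build number changes."""
    return version._replace(build=version.build + 1)


def bump(version: BuildVersion, part: str) -> BuildVersion:
    """Production release: update the semantic part like normal."""
    if part == "major":
        return version._replace(major=version.major + 1, minor=0, patch=0)
    if part == "minor":
        return version._replace(minor=version.minor + 1, patch=0)
    if part == "patch":
        return version._replace(patch=version.patch + 1)
    raise ValueError(f"unknown version part: {part}")
```

For example, `next_daily(BuildVersion(2, 0, 0, 100)).full_name()` gives `"2.0.0.101"`, while `release_name()` on the same build still reads `"2.0.0"`.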

There is no beta build. The beta build becomes a production build, which makes it difficult to know afterwards which build was a release build. It has been difficult to communicate to people that there isn’t really a separate beta build that can contain some special feature. This has complicated things a few times when working with external tools that treat beta builds and release builds as separate packages. It also complicates things in-house, when we look at data and want to compare beta builds to the release build, or want to have features only in the beta build. For example, we allow all users to shake ’n’ report, but reports from normal users go to support, while reports from Beta users have more urgency so they go directly to QA. For that feature, we check the timestamp of the build, and if the build was created within the last 25 hours, we assume you are a Beta user. This feels a bit hacky, but works.
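The timestamp check can be sketched in a few lines of Python. The 25-hour window comes from the text; the function names and the `"qa"` / `"support"` return values are placeholders invented for this example.

```python
from datetime import datetime, timedelta, timezone

# Daily Beta builds plus one hour of slack, as described above.
BETA_WINDOW = timedelta(hours=25)


def looks_like_beta_user(build_time: datetime, now: datetime) -> bool:
    """Assume a Beta user if the installed build is at most 25 hours old."""
    return now - build_time <= BETA_WINDOW


def route_report(build_time: datetime, now: datetime) -> str:
    """Send shake 'n' report emails from Beta users to QA, others to support."""
    return "qa" if looks_like_beta_user(build_time, now) else "support"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(route_report(now - timedelta(hours=3), now))   # recent build
    print(route_report(now - timedelta(days=5), now))    # older build
```

The heuristic works because a build only stays exclusive to Beta for about a day; once it is older than that, it has either been replaced by the next daily build or promoted to production.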

Release notes. One big pain with releasing has always been getting relevant release notes and translating them to all languages in time for release. This seemed to be a blocker for our automation process, so we had to really think about how important descriptive release notes are. Do people read them? Is all of this time-consuming trouble worth it? There are different opinions on this, and while we all agree that descriptive, fun release notes are really cool, they weren’t really worth all the trouble, especially since we release features under A/B tests and can’t include those in the release notes. We had a period last summer with very boring static “Bug fixes and stability improvements” release notes that we incorporated into our automation process, but it really bugged us, so we now have a lot of fun copies in all languages that basically say the same thing, but in a more fun manner. Every once in a while we do include descriptive release notes, which is still a manual process, but we are close to having that automated as well.

“We’ve made a few changes to make QuizUp harder, better, faster, stronger.”

“A bunch of bugs were harmed in the making of this update.”
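One simple way to plug such pre-translated copies into an automated release is to rotate through them by build number. The Python sketch below assumes that approach; how QuizUp actually picks a copy is not stated, and the `RELEASE_NOTES` table and `pick_release_notes` function are hypothetical. Only the two English copies quoted above are included, since the translations are not part of this post.

```python
# Hypothetical table of pre-written copies, keyed by language code.
# The other supported languages would get their own lists here.
RELEASE_NOTES = {
    "en": [
        "We’ve made a few changes to make QuizUp harder, better, faster, stronger.",
        "A bunch of bugs were harmed in the making of this update.",
    ],
}


def pick_release_notes(build_number: int) -> dict:
    """Pick one copy per language, cycling through the list by build number."""
    return {
        language: copies[build_number % len(copies)]
        for language, copies in RELEASE_NOTES.items()
    }
```

Using the build number as the index means consecutive releases get different copies without the pipeline having to store any state.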

Video Version

I talked about this at an Icelandic tech conference called UT Messan.
