Building software to run elections in Sri Lanka (and beyond)

National elections in Sri Lanka are conducted by the Election Commission (EC) of Sri Lanka, https://elections.gov.lk/web/. In fact, they have a bigger mandate: they can run elections not only for national issues but for anything .. they are designed to be an election-as-a-service organization. Of course, the more important elections they run are the Presidential, Parliamentary, Provincial and Local Government elections of our country.

Since the 1980s, the EC has worked with the University of Colombo (and the University of Kelaniya for the last local government election) to use software to add up and disseminate results. The way that worked was that the university brought in a team a few days before the election and entered all the data from the Colombo office. Data would come to Colombo via fax.

The EC also built a bunch of systems internally for managing the elector register (that’s what you hit when you check whether you’re on the list) and for various other functions. Those are on-going systems and not “for the moment” solutions like the tabulation systems done with the universities.

Agreeing to build a platform

The first discussions with the Election Commission IT team took place in March 2018. We met at the WSO2 office (I was still working there then) and we sketched out a vision for how we could design a platform that would allow them to manage the entire election process via software from beginning to end:

That document we started is available here.

However, due to various reasons that first start fizzled out and we couldn’t get going then.

Later last year, Wasantha Deshapriya became the Secretary to the Digital Ministry. Wasantha had been at the ICT Agency for like 11 years prior to that (and then was Director General of SLIDA for a while) and was a key person in many e-government projects in the country.

As his name suggests, Wasantha happens to be the brother of Mahinda Deshapriya, the (in!)famous Chairman of the Election Commission. [On a side note, one of their other siblings is Sunanda Deshapriya, a human rights activist and a very vocal critic of pretty much every government! Oh what I would give to be a fly on the wall in a family “discussion” of politics.]

The recent local government elections results delivery had gone a bit awry because of tabulation & dissemination system issues and the Election Commission had a bit of egg on their face about that. (That was not as simple as saying “bad code” but rather much more due to the mismatched expectations.)

So Mahinda Deshapriya was rather skeptical about trying something new, but he knew he needed some software solutions to help run elections. The first meeting was in the neighborhood of “this IT crap” ;-).

I have known Wasantha for many years and he knew about LSF and that we were trying to help government digital technology in various areas, including elections. After many months of discussions, with Wasantha and Prof. Rohan Samarajiva, Chairman of the ICT Agency, playing critical roles in building confidence in LSF, and after multiple discussions with all three commission members and various leaders of the organization, we finally signed an MOU with the Election Commission in November or so last year. Per the MOU, LSF would become the technology partner of the commission to help them build the software they needed and to help them build the capacity to own it themselves. And LSF would not receive any money from the government and would find its own money and do it.

The MOU can be seen here.

And so we started.

The ICT Agency, the Digital Ministry, the University of Colombo (which had partnered with the EC since the 1980s) and later SLCert all became part of this effort. The University of Moratuwa also agreed to provide (volunteer) CS/IT students to be in all the district result centers to provide technology assistance, but that was scrapped at the last minute due to concerns raised by district level leaders.

Going beyond the various orgs who lined up to support this effort, someone had to write the actual code! LSF is currently funded by donations from a few organizations (99X and WSO2 only right now) and can’t afford to hire a bunch of people to write code. In any case, we wanted this to be truly an open project with lots of contributors as this is a country critical system that needed to be trusted by everyone — the more open, the more transparent, the better. LSF hired a core team of 3 people for the 3 main systems we were doing (incidents, nominations and results; more later) but we knew that’s not enough to write everything and to do all the security & deployment related work.

We openly called for help (I tweeted and posted on LinkedIn) in various specializations and we had so much help that we even set up a volunteer team to manage volunteers! Sri Lanka’s software engineers are very civic minded for sure, and we had help from several outside of LK too. Sustaining volunteer contributions is hard, but there were many who walked in and never left, and it’s safe to say that if not for them we would never have made it.

This is not a blog about the people who helped implement this. I will either write a separate blog about that or find another way to appreciate all of them. There were close to 100 people who helped put this together so I don’t want to mistakenly leave anyone out or not acknowledge the key people who made it possible.

There was of course a small core team inside LSF that worked incredibly hard to get this done. In addition to the core LSF engineers, that team included key people inside the Election Commission, cleverly managed and coordinated by Bandula Ranathunga, who was tasked to manage this project on their side. On the LSF side, Sherazad Hamit, who leads the LSF Code for Sri Lanka initiative, played the same role and led our fearless band of (mostly young .. not counting me!) engineers who constantly joked about going to jail or getting beaten up if we screwed up. Without those two, there’s no way in hell we’d have got it done, so thank you Bandula & Sherazad!

As part of our capacity building program for the EC, we helped interview and recruit 8 new people to the EC via the LK government “IT Service”. (It was really interesting to understand how government recruitment works and to be part of the interview process actually.) The new people, along with existing technical team members, will become the anchor of this work over time. They were fantastic and really came along with us on the journey; we look forward to helping them become the total owners of the system we’re building with them.

The software that was used for the 2019 Presidential Election

Well as with all software projects, we didn’t get everything we had hoped to do done in time for this election. We had broken the work down into several areas:

  • Nominations
  • Incidents
  • Results Tabulation
  • Voter Counting
  • Elector Data Analytics
  • Deployment
  • Identity
  • SQA
  • Security
  • Performance
  • Docs
  • Training
  • Operations
  • Staffing
  • Bulk Config Data Entry
  • Results Dissemination
  • Results Website

We identified the EC owner, the EC technical project manager, the LSF technical project manager, the business analyst person to help clarify the problem, the person responsible for architecture and the contributors to each area. For some of them we had no names :-) .. still got some stuff done!

The list above doesn’t even count everything — for example we needed to deeply integrate with the existing elector registry for some stuff.

But we built a lot of it! Here’s an explanation of what we did get done in the last few months of writing code (since about July I’d say).

The system (as of PRE2019)

The nominations system (written using NodeJS and React) was ready but, even with 35 (!) aspirants for president, it was not worth trying to use it for this election. The problem is a lot harder with parliamentary elections (for 196 seats), provincial council and local government elections. The latter draws on the order of 100,000 aspirants! The system is designed to let political parties manage their nominations right in the system and do direct online submissions as well as generate the paperwork that is still required by law.

The staffing system was in early stages of development. This component is critical as the EC manages on the order of 400,000 people for the duration of the election. They need transport, payment and the issuing of all kinds of things. Obviously we didn’t use this.

The greyed out Issuing/Receiving is actually close to being done and I just forgot to mark that as under Development! (These images are from the “handover” presentation we did for the EC last Wednesday.) That’s a mobile app that is used to issue ballot boxes, ballot books etc. and to also get them back at the counting centers. The system lets you take pictures of the seals etc. so that we can improve accountability and transparency. The app is written using Google Flutter. Anyway, we didn’t use this either.

Tabulation was the master of ceremonies on the day of the election. That’s where people entered tally sheet data from all 22 electoral districts (see my blog on how vote counting works if you want to understand more) and where all reports were generated, reviewed / confirmed, printed, signed, re-verified and issued out to media as a result. That system was of course ready and indeed was the fantastic master of ceremonies! This is written using Python and React.

Dissemination is what receives a result from the tabulation system and gives it out to media. We set up both a push mechanism (with software that media could download and run) and a registered media-only, access controlled website where they could download any issued result. All results were made available to media in JSON, XML, HTML (for human readers — as requested by radio media) and the scan of the official signed result document. This system has a server component written in Ballerina and a program we gave to media which is also written in Ballerina.

Then the final piece of the election puzzle is displaying results to the public. For that we had a separate app (at https://results.elections.gov.lk/) that also has all the old election results and was tapped into the results being pushed out by the dissemination system and gave a running update on the results. That is a NodeJS app with a React front end.

Of course behind the scenes right throughout (and live for about a month now) is the incident management system. This is where all election related incidents (complaints) are logged. The system also has a public complaint page (at https://incidents.elections.gov.lk/report) and is used by both the EC staff as well as related organizations like the Police. The daily numbers you saw saying there were this many incidents of that type were coming from this system. This is written in React and Python.

Testing the system

A system that is so sensitive (especially results tabulation) had to go through several iterations of verification. We ran 7 mock elections using real 2015 Presidential Election result tally sheets for one district at a time where staff from the Election Commission sat and entered data, did the verifications and ran the flow through. Each one gave us useful insights on what worked, what parts were shaky and so on.

And in this process, about 2–3 weeks ago we rewrote the entire front end of the tabulation system, the most important part of the system! It had become just too unwieldy after repeatedly adjusting to meet requirements that we only understood over time.

The scariest part was that we were not able to test the whole flow all the way to media delivery at all. All those pieces didn’t get completed and integrated until the very last minute, plus we had the issue of not being able to test with real 2019 names/parties as that could result in possible “computer jilmart” accusations (imagine finding a DB dump with test data with real candidates).

So, as you can imagine, we were scared shitless until the first result went all the way through to media.

Yes it did successfully, close to 2am on Sunday November 17th. Phew.

Development, CICD & deployment

The system is designed to be deployed on Kubernetes for all the services and with a MySQL cluster in VMs and an NFS file system for storing uploaded images (scans of docs).

We created 3 environments — a development environment, a staging environment and of course the production environment. More on the latter later.

All of these are on Lanka Government Cloud (LGC v2, https://lgc.gov.lk/), which offers OpenStack managed VMs. We got a whole bunch of VMs and set up K8s on some of them and got going.

For CICD and overall management of the K8s clusters, we used a fantastic tool called Platformer (see https://platformer.com/). This is an Australian company started by Sri Lankan guys, with the core technical team in Colombo, and they really made the process of managing our CICD and even production environment such a breeze. (More on the latter later when I say how the day & night went.) Awesome product and great team of people!

Checking system security

A system this sensitive needed to be checked by as many different parties as possible to give confidence to the EC that it was safe to use.

As part of the MOU, the ICT Agency took on responsibility to do a QA pass through the system. ICTA folks were also part of our various testing sessions and helped review usability of the system.

The EC also engaged SLCert to do a vulnerability assessment and penetration test of the system. They found it was good ‘nuf to go (complained about some textbook stuff but that’s a story for another blog on what practical VAPT should be).

Finally we asked a few selected, trusted members of the Colombo White Hat Security group to give it a go. They ran some DDOS attacks and various other attacks but were not able to make it through.

Phew.

We of course had lots of system monitoring and tracing in place that we monitored right throughout. We were confident we could run the system safely enough (and we did).

Managing identities, roles & permissions

The system required different users with different authorizations, from being able to enter data for a particular district to verifying data to validating reports to approving them, etc.

We use the WSO2 Identity Server as the platform to manage identities. All users were issued personalized user IDs, a system generated password, a requirement to enter their NIC on first login and an OTP code sent via SMS to be entered on login. However, because most users had no access to mobile phones we had to issue pre-generated codes to them as well.
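The pre-generated codes idea can be illustrated with a small Python sketch. This is not how the WSO2 Identity Server does it and not what ran in production; the function names, the code length and the "set of unused codes" storage are all assumptions made up for illustration.

```python
import secrets


def generate_codes(count: int = 10, digits: int = 6) -> set:
    """Return `count` distinct random numeric codes, zero-padded to `digits`."""
    if count > 10 ** digits:
        raise ValueError("cannot generate that many distinct codes")
    codes = set()
    while len(codes) < count:
        # secrets.randbelow draws from a cryptographically strong source.
        codes.add(f"{secrets.randbelow(10 ** digits):0{digits}d}")
    return codes


def redeem(code: str, unused_codes: set) -> bool:
    """Accept a code at most once: remove it from the unused set if present."""
    if code in unused_codes:
        unused_codes.remove(code)
        return True
    return False
```

A user without a phone would be handed a printed list like this, and each code would stop working after its first use.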

Each user was assigned particular roles for particular geographies. The latter was encoded as claims that went with the JWT after they logged in. As the apps were all SPA style, the WSO2 API Manager was used to front all APIs and enforce authenticated access.
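To make the roles-per-geography idea concrete, here is a minimal Python sketch of checking such claims. The claim layout (a `roles` object mapping a role name to a list of districts) and both function names are assumptions for illustration; the actual claim format used with the WSO2 products is not described here. The sketch also assumes the token signature has already been verified upstream, for example at the API gateway.

```python
import base64
import json


def decode_claims(jwt_token: str) -> dict:
    """Decode the payload part of a JWT WITHOUT verifying its signature.

    Only safe when verification has already happened upstream.
    """
    payload_b64 = jwt_token.split(".")[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_allowed(claims: dict, role: str, district: str) -> bool:
    """Return True if the (hypothetical) claims grant `role` for `district`."""
    return district in claims.get("roles", {}).get(role, [])
```

With claims like `{"roles": {"data_entry": ["Colombo"]}}`, a data entry user for Colombo would pass the check for Colombo and fail it for any other district or role.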

Production deployment

The production deployment was designed with 2 disaster recovery (DR) sites in mind. We used LGC as our primary deployment and had a DR site in Digital Ocean ready to go in case of a problem in LGC and also had planned to have a DR site inside the EC itself. The DR situations we were ready for were the following:

We originally planned to use async master-master synchronization for the databases but later we backed down from that to a simple master-slave setup to get data sync’ed to DR sites as we just ran out of time to get that verified. We were also dumping data and moving copies to a separate location every 30 minutes. The data base sync model looked like this:

Each deployment looked like this, with fewer resources pre-allocated for the DR as we could scale that up if needed:

There was also a bastion host to access everything to further increase protection.

In the end we ran out of time to set up the DR site inside the EC itself. We used that hardware as a data backup only instead, felt we could install the system there if we really had to, and decided to hope like hell that Internet access would not simply go away.

It didn’t.

Dashboards

One key advantage of having a central system to record all results as they were being entered is the ability to see them before they’re final. So we built a whole bunch of dashboards for both the district levels and the EC to see what was going on. This was done using Grafana, a fantastic tool.

Releasing results

When the district level returning officers signed off on a result (after verifying that the system-computed data and the manual additions matched), it became available in Colombo.

Here the generated result document was compared to yet another manual verification (done with faxed tally sheets!). If they matched, the result was ready to be released.
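The "computed totals must match the manual additions" check is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the tabulation system's code; the data shapes (one dict of candidate to votes per tally sheet) and the function names are assumptions.

```python
def sum_tally_sheets(tally_sheets: list) -> dict:
    """Add up per-candidate votes across tally sheets (each a dict)."""
    totals = {}
    for sheet in tally_sheets:
        for candidate, votes in sheet.items():
            totals[candidate] = totals.get(candidate, 0) + votes
    return totals


def find_mismatches(system_totals: dict, manual_totals: dict) -> dict:
    """Return {candidate: (system, manual)} for every total that differs.

    A candidate missing from one side is treated as having 0 votes there.
    """
    mismatches = {}
    for candidate in sorted(set(system_totals) | set(manual_totals)):
        system = system_totals.get(candidate, 0)
        manual = manual_totals.get(candidate, 0)
        if system != manual:
            mismatches[candidate] = (system, manual)
    return mismatches
```

An empty result from `find_mismatches` would correspond to the "they matched" case; anything else points at the exact candidate whose numbers need to be re-checked.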

The process was as follows:

We also had a dashboard that showed us where each result was:

Delivering results to media

The EC has a long standing model where they released the results to media in a machine processable form. They used to deliver it as XML and a text format. This time we gave it in JSON with the option to convert that to XML (or even to HTML for human readability).
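As an illustration of the JSON to XML option, here is a minimal Python sketch using only the standard library. The field names (`election`, `district`, `votes`, `candidate`) and the XML element names are invented for this example; they are not the actual result schema given to media.

```python
import json
import xml.etree.ElementTree as ET


def result_json_to_xml(result_json: str) -> str:
    """Convert a (hypothetical) JSON result document to an XML string."""
    result = json.loads(result_json)
    root = ET.Element(
        "result",
        attrib={"election": result["election"], "district": result["district"]},
    )
    for entry in result["votes"]:
        # One <candidate> element per entry, with the vote count as its text.
        candidate = ET.SubElement(root, "candidate", attrib={"name": entry["candidate"]})
        candidate.text = str(entry["votes"])
    return ET.tostring(root, encoding="unicode")
```

Keeping JSON as the single source and deriving the other formats from it means every format carries the same numbers.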

When an authorized person certified the result, they’d click on “Notify” and then “Release”. That would HTTP POST the result to the results dissemination system which delivered the data to media.
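The release step boils down to an HTTP POST with the result as the body. Below is a minimal Python sketch of building such a request with the standard library. The endpoint URL and the payload fields are placeholders, and the real call between the tabulation and dissemination systems would also carry authentication that is not shown here.

```python
import json
import urllib.request


def build_release_request(result: dict, endpoint: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the result as JSON."""
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
request = build_release_request(
    {"election": "2019PRE", "district": "Colombo", "level": "district"},
    "https://dissemination.example/results",
)
```

Passing `request` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can run without a network.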

Results to public

The results display system showed results to the public live as the data was being pushed to other media (this system tapped into the same feed):

This was running on GKE on Google Cloud. The load was significant (scaled up to 11 containers) but not a big deal, although the nginx ingress there was feeling quite stressed.

The day

Early on Saturday we created the production election in the system, created users, assigned roles (all automated obviously) and distributed access credentials to all 22 districts.

There were a few problems with login for various people (they had given the wrong NIC so couldn’t log in, didn’t have the right credentials delivered etc.) but by the time tally sheets were ready to be entered we were all good to go. I think we gave out about 200 or so data entry level accounts (and not sure how many were used .. we should check).

With every result we were also sending an SMS to all the media folks. (We used Twilio and after paying a crazy amount we finally shut it off later.) However, the first result went with um an interesting title:

Notice anything awry about the first one?! We sent results for the “foo” election instead of “2019PRE” which was the code for the presidential election of 2019. Ooooops. Fixed from the next.

We did 12 production system updates that night live while the system was running. While most changes were minor, the process showed the power of deploying on Kubernetes — no users knew we did system updates!

We were monitoring all aspects of the system via Kibana and the Platformer tools.

The software was ready if we had to go to round 2 of the election (if no candidate got >50% of the votes), including a custom dashboard for that, but luckily we didn’t need to. I say luckily because that would’ve added another 6–12 hours to the long run it had already been :).
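The round 2 trigger itself is just a majority check. Here is a minimal Python sketch of that condition only; the function name and the input shape (a dict of candidate to valid votes) are assumptions, and it does not attempt to model how the second count is actually carried out.

```python
def needs_round_two(votes: dict) -> bool:
    """Return True if no candidate has strictly more than 50% of the votes.

    Uses integer arithmetic (2 * top > total) to avoid rounding issues.
    """
    total = sum(votes.values())
    top = max(votes.values(), default=0)
    return not (2 * top > total)
```

Note that an exact 50% share is not enough under this check, since the rule is "more than 50%".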

In the end we ran the whole night (and until 3pm the next day) without any problems. It was cool to have the Director General log in and verify the national result before it became final as that’s how the system should be working!

We set off on this path to help the Election Commission of Sri Lanka become an election-as-a-service organization. We’ve made some progress but we’re far from there yet! Also, the need and the problem are not unique to Sri Lanka, and part of LSF’s goal is to build software that is useful to anyone in the world.

We will be continuing this project, now fueled by the internal team of engineers as well. We always want more help — so if you want to join, please come to lsf-elections@googlegroups.com; https://groups.google.com/forum/#!forum/lsf-elections.

This is only one of many projects that LSF is doing under its Code for Sri Lanka initiative. More on that soon!

Making the world a better place with software