Cyberattacks on (and the defence of) the 2019 Macau Civil Vote on Chief Executive Election
This article is based on my presentation at the Taiwanese g0v Summit given on 05 December 2020. There is a related article on the political threats to the vote.
This article and the presentation represent only my own views, not those of the organiser of the 2019 vote.
Experience from the 2014 Civil Referendum
The 2014 Civil Referendum was conducted with high rigour. The ID data of all voters were checked. The numbers generated were highly credible. Based on the experience from the 2014 vote, the volunteers and the leaders of the organiser — New Macau Association — wished that the vote could take place in a safe and low-risk environment. As a result, this vote was not named a “referendum”. Also, no ID data would be collected. Macau phone numbers were the only means of verification.
Design considerations of the voting application
Although the 2019 vote was expected to carry a much lower risk because it would not process sensitive data such as personal information, the voting application was designed under the assumption that it would process sensitive data. In addition, the counting of the votes and the release of results were to be more transparent than they had been in 2014.
The core business logic of a voting application could be very simple, just a few lines of pseudocode: verify whether someone has voted and, if not, let the person vote. Some might wonder whether it is necessary to overengineer a voting application at all. I would say that a simple design like this would have dire consequences.
Firstly, it is necessary to record and throttle the verification attempts made with the same phone number or ID card number. The logic is commonplace, so I think no elaboration is necessary here.
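The naive logic described above might look like the following minimal sketch. The names `cast_vote`, `voted` and `ballots` are hypothetical and the state is held in memory purely for illustration; this is not the code of the actual voting application.

```python
# Illustrative only: in-memory state with hypothetical names.
voted = set()     # phone numbers that have already cast a ballot
ballots = []      # recorded choices, in order of arrival


def cast_vote(phone, choice):
    """Return True if the ballot was accepted, False if the number already voted."""
    if phone in voted:
        return False
    voted.add(phone)
    ballots.append(choice)
    return True
```

Nothing here limits verification attempts, protects the stored data or lets outsiders audit the count, which is what the following points address.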
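For readers unfamiliar with the pattern, a sliding-window throttle is one common way to do this. The sketch below assumes a limit of three attempts per hour per number; both values and all names are my illustrative assumptions, not the settings used in the vote.

```python
import time
from collections import defaultdict, deque
from typing import Optional

MAX_ATTEMPTS = 3        # assumed limit, not taken from the real application
WINDOW_SECONDS = 3600   # assumed window length

_attempts = defaultdict(deque)   # phone number -> timestamps of recent attempts


def allow_verification(phone: str, now: Optional[float] = None) -> bool:
    """Record an attempt and report whether it stays within the limit."""
    now = time.time() if now is None else now
    log = _attempts[phone]
    # Forget attempts that have fallen out of the window.
    while log and now - log[0] >= WINDOW_SECONDS:
        log.popleft()
    if len(log) >= MAX_ATTEMPTS:
        return False
    log.append(now)
    return True
```

Passing `now` explicitly makes the function easy to test; in production the default clock would be used.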
Secondly, data were encrypted on different layers, so that a security breach on one of the layers would not jeopardise the security of the vote. Three keys were used in this encryption setting. Losing any of the keys would cause the permanent loss of the original data.
Thirdly, the verification of Macau phone numbers had to go through the short message service of Macau’s telecom networks. The overwhelming majority of mobile phone users in Macau use WeChat for instant messaging, so I could not use more secure services like WhatsApp for verification. What I could do was to send out verification messages from telecom operators outside Macau. The voting application rotated sender identifiers to avoid interception. Unfortunately, we realised that the messages sent from non-local numbers could not reach about 17% of Macau numbers. The Macau market was so small that foreign telecoms did not bother to investigate the issue promptly. In the end, I added a fall-back mechanism that sent verification messages through a local 4G modem whenever no delivery status update arrived from the U.S.-based telecom operator.
Fourthly, the transparency of vote counting had to be enhanced. Every voter would receive a “query code” after successfully casting a ballot. The voters could then track whether their individual votes had been counted in the final dataset. Also, every vote entry was linked cryptographically to the previous vote entry to form a chain.
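The fall-back can be sketched roughly as follows. The three callables stand in for the foreign SMS provider, its delivery-status lookup and the local modem; they are hypothetical placeholders, and a real implementation would poll for the status after a grace period rather than check once.

```python
from typing import Callable, Optional


def send_with_fallback(
    phone: str,
    text: str,
    send_foreign: Callable[[str, str], str],            # returns a message id
    delivery_status: Callable[[str], Optional[str]],    # None means no update yet
    send_local: Callable[[str, str], None],
) -> str:
    """Send via the foreign operator first; resend locally if no status arrives."""
    message_id = send_foreign(phone, text)
    if delivery_status(message_id) is None:
        # No delivery status update: assume the message did not arrive.
        send_local(phone, text)
        return "local"
    return "foreign"
```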
When voting was open, the two-hourly turnouts were released in conjunction with cumulative digests. After the vote was closed, external parties could check the consistency of the dataset and the digests to independently verify whether the data had been altered.
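The chaining and the external check can be illustrated with a small sketch. I am assuming SHA-256 and a simple `previous digest | query code | choice` layout here; the actual hash function and record format of the application may differ, and all names are illustrative.

```python
import hashlib

GENESIS = "0" * 64   # assumed starting value for the first entry


def link(prev_digest, query_code, choice):
    """Digest of one vote entry, bound to the digest of the entry before it."""
    payload = f"{prev_digest}|{query_code}|{choice}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(votes):
    """Return the cumulative digest after each (query_code, choice) entry."""
    digests, prev = [], GENESIS
    for query_code, choice in votes:
        prev = link(prev, query_code, choice)
        digests.append(prev)
    return digests


def verify(votes, published):
    """Check the dataset against digests published at given vote counts.

    `published` maps a cumulative turnout (e.g. at a two-hourly release)
    to the digest announced at that moment.
    """
    digests = build_chain(votes)
    return all(
        0 < count <= len(digests) and digests[count - 1] == digest
        for count, digest in published.items()
    )
```

Because each digest depends on every earlier entry, altering or removing any vote after a digest has been published changes all later digests, and `verify` fails.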
The attacks and the defence
First, the attackers scanned the application for known vulnerabilities. Since the voting application was newly developed rather than based on an existing content management system, vulnerability scanning yielded nothing. From the simplified Chinese characters visible in the presentation slide, it was not hard to figure out the origin of the attack.
Second, I had assumed that the limit on message sending was in place, but it had in fact never been activated because of a design change during development: the idea of resending a verification message had been abandoned. As a result, an API endpoint could be exploited to send short messages to the same phone number without limit. Fortunately, the issue was discovered and fixed within 1.5 hours. However, in just those 1.5 hours, the malicious requests cost about USD 170. This amount of money was the only material loss arising from the cyberattack against the vote.
After fixing the cap on message deliveries, I did not want the attackers to realise that the attack pattern had stopped working. I wanted them to believe it was still working. So, I intentionally kept returning the success status code in response to malicious requests. Initially, the rate of malicious requests was just about 1 per second, and I thought it acceptable to let the ineffective attacks continue to come through. The purpose was to consume the attackers' time and so reduce the chance of them changing the attack pattern.
Blocking IP addresses is an endless cat-and-mouse game, as the attackers constantly change the exit points of the network flow. So, I abandoned IP address blocking and turned my attention to controlling the network flow. The settings shown in the presentation were those in place at the beginning of the vote, when the control was pretty lax. I am sorry for not keeping a record of the final settings.
The “laissez-faire” strategy in response to ineffective attacks probably consumed the time of the attackers. I believe the attackers were deceived by the fake success status code. They gradually increased the rate of ineffective attacks from 1 request per second to 260 requests per second. However, this strategy could not last long.
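The idea of answering capped requests exactly like successful ones can be sketched as below. The limit value, the response body and the function names are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

SMS_LIMIT_PER_NUMBER = 1   # assumed cap; the real value is not stated here


def handle_send_request(
    phone: str,
    sent_counts: Dict[str, int],
    send_sms: Callable[[str], None],
) -> Tuple[int, dict]:
    """Send at most SMS_LIMIT_PER_NUMBER messages, but always answer the same way."""
    if sent_counts.get(phone, 0) < SMS_LIMIT_PER_NUMBER:
        sent_counts[phone] = sent_counts.get(phone, 0) + 1
        send_sms(phone)
    # Capped requests get the identical response, so the cap cannot be
    # observed from the outside.
    return 200, {"status": "ok"}
```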
The rate-limiting mechanism of Cloudflare took seconds to kick in for traffic from unseen IP addresses. Due to a high-concurrency problem, the voting application could not always stop the flood of requests in the first 0.5 seconds. The server processed multiple requests almost simultaneously, so the first batch of requests passed the limit check before anything had been written to the database. The limit could therefore be exceeded once that first batch of entries was written. The high-concurrency problem had no impact on the integrity and the security of the vote because the verification and the subsequent attempts would not be successful. However, technically, it could be considered a small breach of the designated rate limit. There are solutions to high-concurrency problems, but revising the application would have taken too much time.
I made some interesting observations while working on the defence of the voting application. First, the Chinese attackers were on a 996-ish shift. “996”, which means working from 9 am to 9 pm six days a week, is synonymous with exploitation in the Chinese tech industry. I found that the changes of IP addresses only occurred from 10 am to 10 pm Chinese time. I guess this period was their shift. I was on British Summer Time. So, I could analyse the log and carefully decide the best response to the attack after the attackers had got off work. In other words, the time difference between the attackers and me was an advantage for the defence.
The second observation is that the design had to strike a balance between ease of use and security. For example, reCAPTCHA (picking pictures to prove you are a human) would be easy for us but difficult for the older generations. Users would easily get frustrated after multiple failed attempts. This is a piece of genuine feedback from a frontline volunteer. A Turing test is an important mechanism to prevent interference from automated programmes. But in the future, we might need to find a better way to prove humanity.
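One common remedy for this kind of check-then-write race is to make the check and the write a single atomic step. The sketch below does so with a lock inside one process. It is not what the voting application did, and a multi-process deployment would need the equivalent at the database level, such as an atomic conditional update.

```python
import threading


class AtomicLimiter:
    """Count requests per key, checking and incrementing under one lock."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._counts = {}
        self._lock = threading.Lock()

    def try_acquire(self, key: str) -> bool:
        # Without the lock, many threads could read the same count, all pass
        # the check, and only then write, which overshoots the limit.
        with self._lock:
            count = self._counts.get(key, 0)
            if count >= self.limit:
                return False
            self._counts[key] = count + 1
            return True
```

With fifty near-simultaneous requests for the same key and a limit of five, exactly five are accepted regardless of how the threads interleave.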
Reflection and the lessons learnt
Despite the success in the defence against cyberattacks, the vote could not withstand the political threats.
From my observations, the suppression was discreet yet powerful. I guess the reason was to avoid attracting more attention. For example, why was there no massive DDoS attack? Had there been one, the international news media would have paid some attention to the vote. So, technically, the attackers pursued an attack pattern that was more time-consuming and had a lower success rate.
Overengineering the voting application may be worthwhile to provide more layers of protection. Moreover, technical support provided by technologists working remotely, free from interference on the ground, could be integral to the security of the data. Yet, one thing may need reconsideration. More experiments will be needed to tell whether the laissez-faire response to ineffective attack attempts works or creates problems of its own.
On a personal note, developing and maintaining an application takes a lot of time. Indeed, in August 2019, I was busy with a law dissertation in parallel. Initially, I planned to take two weeks to help with the vote. It turned out to be four weeks. After the result announcement, there were fewer than 10 days to the submission deadline, but the dissertation was just 60% complete. So, do not take a great responsibility on your shoulders when you already have something else to be busy with.