That thing about regulating fintech
I have recently written about the way N26 bank was pwned (by a whitehat, fortunately), and on Twitter I expressed my surprise that this happened to a (recently) regulated bank. Some of the push-back I got was that classic banks’ websites and apps are not really more secure, and whilst the number of vulnerabilities in the case of N26 was breathtaking, I believe that ultimately this is true — with the important difference of course that a major bank will be able to survive even a major breach, whilst a challenger bank might well fold.
People do expect that regulated banks adhere to certain standards, and whilst this is generally true with respect to portfolio risk, capital coverage etc (with the well-known exceptions that we all have lived through), in my view neither the banks nor the regulators have really tackled their systems security risk — and certainly not to the same level that other risks have been taken care of.
Within the regulatory framework the risk of being pwned would fall under Operational Risk I suppose, which ultimately is a catch-all for everything that is neither credit nor market risk. I’d argue however that it is worth singling this particular risk out in more detail, and developing best practices that all players should adhere to.
Security standards in other disciplines
Safety standards in engineering
Most engineering disciplines have best practices and regulations in place that ensure a minimum level of quality, and that ensure that the inherent risks are reduced to an acceptable level. Those standards are usually drawn up by industry associations, and ultimately encoded into standards defined by the various international standardisation organisations. Those standards in turn form the basis of the applicable regulations.
Those standards are generally designed by product line. The most stringent standards (for obvious reasons) are in aerospace, notably DO-178C. A general standard for safety-related systems, covering both software and hardware, is IEC 61508, which has been adapted for the automotive industry in ISO 26262. For pure coding there are the CERT Coding Standards, which are a generalisation of the MISRA standards that had been developed for the automotive industry.
Best practices in computer security
When we look at software failures there are essentially two types:
- those where the software suffers an unforced error, eg a self-driving car coming off the road or getting into an avoidable accident
- those where malicious actors — commonly known as hackers — are trying to trick the software into doing things that it should not do
Computer security deals with the second scenario, ie it tries to assess whether malicious actors would be able to take advantage of flaws in the system. It is important that computer security is assessed in a holistic manner, not with a narrow focus on software and hardware. Importantly, the behaviour of the typical user must be taken into account.
Arguably the most important concept in computer security is that of a threat model:
What am I worried that could happen?
Against what do I want to protect myself?
The naive answer to this is everything, but this is not realistic. It is widely accepted that if a nation state level actor wants to get into your systems, and they are willing to spend the necessary resources, then they’ll probably be able to do that. And of course no amount of technology protects against an attacker who resorts to threats of physical violence against the people holding the keys.
So for a banking application a reasonable assumption might be to disregard both nation state attackers (because they have better things to do than to spend a lot of money to steal a small amount of money) and threats of physical violence (because it is the state’s job to protect against this, and in any case it is rather difficult to avoid). However, someone attempting a man-in-the-middle attack on the connection, or installing monitoring software on someone’s computer, should probably be part of the threat model considered — but this will be the topic of the last section in this article.
The key toolkit for computer security is cryptography. The main components of this toolbox are
- symmetric ciphers, where participants agree on a common key and use it to protect their data exchange from eavesdropping (interestingly there are algorithms — notably Diffie-Hellman — that allow participants to negotiate such a shared key over an open line)
- asymmetric ciphers that allow everyone to encrypt messages (using someone’s public key) that only the owner of the key can decrypt (using his private key)
- hash (or digest) algorithms that make it possible to verify that the content of a message has not been tampered with
- electronic signatures which use the same public/private key pair as asymmetric ciphers, and which allow the owner of a private key to sign any message, in a way that can be verified by everyone who has access to his public key.
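To make the first two points concrete, here is a toy sketch in Python of how two parties can derive a shared key over an open line, and how a digest can serve as a tamper check. The parameters are deliberately tiny and illustrative — real systems use vetted 2048-bit+ groups and audited crypto libraries, never hand-rolled values like these.

```python
import hashlib
import secrets

# Toy Diffie-Hellman key agreement (illustration only).
p = 0xFFFFFFFB  # a small prime (2**32 - 5); real groups are far larger
g = 5           # public generator

a = secrets.randbelow(p - 2) + 2   # Alice's private exponent (kept secret)
b = secrets.randbelow(p - 2) + 2   # Bob's private exponent (kept secret)

A = pow(g, a, p)  # Alice sends this over the open line
B = pow(g, b, p)  # Bob sends this over the open line

# Both sides derive the same shared secret without ever transmitting it
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob

# The shared secret seeds a symmetric key; a digest of a message lets the
# recipient detect any tampering with the content in transit
key = hashlib.sha256(str(shared_alice).encode()).digest()
digest = hashlib.sha256(b"pay GBP 100 to account 12345").hexdigest()
```

An eavesdropper sees `p`, `g`, `A` and `B` but recovering the shared secret from those requires solving a discrete logarithm, which is infeasible at real-world key sizes.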
Now this topic is extremely complex, and I’ll not go deeper into it, but it is clear that this addresses exactly the issues encountered in electronic banking, for example
- How can I ensure that no-one listens in?
- How can I ensure that the person or institution on the other end is who he/she/it pretends to be?
- How can I ensure that no-one can change instructions — or any other data for that matter — on the way?
Software safety standards in banking
The case for regulatory software safety standards
Now arguably banking is not as critical an application as say aerospace or automotive — if things go wrong the main risk is usually that money is lost, and money can be replaced. From this point of view, it is not clear that any regulation is proportionate: in perfect markets it is up to the customers and shareholders to decide which level of security they want (and how much they are willing to pay for it) and market forces do the rest.
There are a number of holes in this argument. To name but a few:
1. if the security breaches are serious and widespread, they can lead to the bank folding, which would lead either to depositors losing money, or to the deposit protection scheme being drawn upon, both of which regulation is meant to avoid
2. access to banking is a core need in people’s lives; system failures can lead to situations that take a long time to sort out, and during this time people can face consequences that cannot be easily rectified financially (eg going bankrupt, losing their home, not being able to start or continue university)
3. the interconnectedness of the banking system, and the connections that banks have to payment systems etc, means that a systems failure in one institution could temporarily bring down eg the entire payments system, with all the knock-on effects this has on the economy
4. cost efficiency: customers have no capacity to individually assess the safety of a bank’s systems, or rather, assessing it would be prohibitively expensive. Also there is a free-rider problem, so ultimately it is economically best if adequate security standards are certified by a trusted body.
Integrating software safety standards into regulation
As I said above, software safety standards are — or at least should be — already part of the operational risk assessment of a bank that is regulated under the Basel framework. Strictly speaking however, regulators’ only concerns under the Basel regulation with respect to software failures are whether
- those failures put the bank — and therefore its depositors — at risk, or whether
- those failures pose a systemic threat to the overall banking system, including payments and other ancillary systems
This addresses points 1. and 3. above, but it does not really deal with points 2. and 4., which are more to do with conduct than with prudential regulation. I will ignore those subtleties here (especially the case where the conduct and prudential regulator are not the same, like in the UK) but will simply posit that the regulator has both an interest and the right to look at software safety.
The most natural route would be for the regulator to include this in the banks’ Pillar 2 review. In principle, regulators could do this on their own, simply by asking their regulated banks to add the respective chapter to their submissions under the Pillar 2 review process. For internationally active banks — and in particular for banks active across the EU — however, this would be suboptimal when the requirements of home and host regulators diverge considerably. At least within the European Union it would be sensible if the EBA were to manage and coordinate the process of coming up with best practices. I do not know whether it already has the power to do so — at least on an advisory basis — but it might be sensible if it was given an explicit mandate to do so by the EU.
In any case, there is a precedent for the EBA to deal with technical software standards: in the context of PSD2, banks have to provide API access to so-called PISPs (lightly regulated companies that initiate payments on a customer’s behalf) and AISPs (ditto that aggregate a customer’s account information), and the EBA is tasked with coming up with the standards for those (which they incidentally did whilst I was writing this article).
Some practical suggestions
Regulators need to tread a fine line between not impeding innovation by being too rigid in what they prescribe, and being too forgiving and having things fall apart, with outcomes that range from just bad to potentially catastrophic. The regulatory pendulum swings back and forth over time — in the early 2000s we had the infamous ARROW principle that led to Northern Rock receiving very little regulatory attention, until the day they collapsed, and in the aftermath banks have been swamped under a deluge of new regulations.
Fintechs have so far been treated significantly more lightly, but this will certainly change once they become more relevant as providers of financial services. It is important to find the right middle way here between over- and under-regulation.
As I have said above, computer safety considerations would fit most neatly into the Pillar 2 document as a sub-section of the operational risk section, with the caveat that some of it is more conduct than prudential regulation and therefore not really part of the Basel framework.
Be that as it may, let’s assume there will be a requirement for regulated companies to provide a section on computer safety during the periodical review, and let’s discuss what should be in it.
As said above, computer safety needs to deal with two mostly independent categories, namely
- unforced errors that can manifest themselves without external intervention, and
- weaknesses that can be exploited by malicious third parties to gain advantages for themselves, or simply to wreak havoc
I’ll discuss those two aspects in more detail now.
Unforced errors have a hardware and a software aspect.
Hardware. On the hardware side this is mostly about having enough redundancy in the servers and other electronic resources to ensure that outages and data losses do not happen, to a confidence level that is commensurate with the potential to cause damage. Key considerations in this respect would almost always be backup procedures, and how long it would take to restore lost data from backups. In most cases backups would be handled by geographically separated secondary data centres that could act as fast failovers if the primary data centres failed. All of this is of course basic data centre security that almost every company — tech or not — has to deal with, so whilst being tedious and possibly costly to deal with, it is reasonably straightforward.
Software. On the software side, the aforementioned standards are a good starting point. Of course in banking systems, errors can be more easily forgiven than in the more safety critical applications mentioned above (with the possible exception of fundamental infrastructure systems like payment or trading infrastructure where outages could lead to a serious disruption). In particular, financial services applications are almost never time critical, so they are not subject to the design constraints that real-time systems are facing, allowing a much wider set of possible architectures.
Ultimately systems reliability on the software side comes down to testing — both unit and integration testing — and having a proper rollout procedure in place. Software faults are very similar to other faults that can slip into a financial institution’s processes, so the standard operational risk frameworks apply. Notably, it is important to maintain accurate statistics of those faults and their impact, which should form the basis of the regulator’s operational risk assessment, extrapolating the likelihood of larger system failures from the smaller failures observed.
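As an illustration of the kind of statistics meant here, the following is a minimal sketch — with made-up incident counts and severity tiers — of extrapolating the likelihood of rare severe failures from the frequency fall-off of smaller ones:

```python
from collections import Counter

# Toy fault log for one year; the numbers and tiers are purely illustrative
incidents = ["minor"] * 120 + ["moderate"] * 14 + ["severe"] * 2
counts = Counter(incidents)

# How much rarer is each severity tier than the one below it?
fall_off = counts["moderate"] / counts["severe"]   # here: 7x rarer per tier

# Crude extrapolation: if the fall-off ratio is roughly stable, a
# hypothetical 'catastrophic' tier one step beyond 'severe' would be
# expected about fall_off times less often again
expected_catastrophic_per_year = counts["severe"] / fall_off
print(f"severe: {counts['severe']}/year, "
      f"extrapolated catastrophic: ~{expected_catastrophic_per_year:.2f}/year")
```

The point is not the arithmetic, which is trivial, but that such an extrapolation is only as good as the accuracy and completeness of the underlying fault log — which is why maintaining it should be a regulatory expectation.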
Processing capacity. One area particularly important for financial services that straddles both hardware and software is (the security margin on) processing capacity: in distress situations — and in particular in idiosyncratic distress situations — there might be a significantly larger volume than usual on a financial institution’s systems, for example caused by customers who want to withdraw assets from that institution. It is generally important that those transaction volumes can be executed without the systems falling over, lest the panic spreads and the confidence in the institution is destroyed (there is an interesting story — which is possibly even true — of a bank in Hong Kong that suffered a run because there was a long queue at the store next door, and people thought it was people queuing to withdraw deposits from the bank).
As discussed above, the area of exploitable weaknesses is very close to that of computer security, so the key tool to work with is the threat model. An important point here: when we do threat modelling there are three questions we need to consider
- How likely is it that this threat manifests itself and is successful?
- Are our systems protected against this threat, and if not, could we mitigate it and at what cost?
- What is the potential cost of the threat manifesting itself?
Note that out of those three points, the first one is the least(!) important. Most of the time companies will have no idea how to even frame this question, and in any case the most likely answer will be “that is really unlikely to happen”, so focussing energy on points (2) and (3) is justifiable.
All regulated companies should be compelled to produce a comprehensive list of threat models. This should contain all threats that have ever manifested themselves against any financial services company, plus anything else that seems a relevant threat (if in doubt, include it). Regulators should ensure that this list is centrally maintained and circulated, and companies should be required to address all issues on a comply-or-explain basis.
Note that comply does not mean that a company must be insulated against a threat — it must simply explain to its regulator what the vulnerabilities are and why it considers them acceptable.
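A comply-or-explain register of this kind could be as simple as the following sketch; all names, fields and statuses are illustrative, not a proposed standard:

```python
from dataclasses import dataclass, field

@dataclass
class ThreatEntry:
    name: str
    description: str
    status: str = "open"    # "comply" | "explain" | "open"
    rationale: str = ""     # required when status == "explain"

@dataclass
class ThreatRegister:
    entries: list = field(default_factory=list)

    def add(self, entry: ThreatEntry):
        self.entries.append(entry)

    def unaddressed(self):
        # anything neither complied with nor explained must be escalated
        return [e for e in self.entries if e.status == "open"]

register = ThreatRegister()
register.add(ThreatEntry("mitm", "Attacker reads/modifies customer traffic",
                         status="comply"))
register.add(ThreatEntry("sim-swap", "Attacker obtains a replacement SIM",
                         status="explain",
                         rationale="SMS is not used for credential resets"))
register.add(ThreatEntry("insider-db", "Rogue admin reads customer database"))
print([e.name for e in register.unaddressed()])  # ['insider-db']
```

The value of such a structure is less the code than the discipline: every threat on the centrally circulated list must carry either a compliance claim or a written rationale.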
Important threat models
To get this list started, I want to briefly discuss a few threats that should be present in the discussions of most financial services companies.
Successful man-in-the-middle attack
An attacker is able to insert itself between a company and its customer and read all data traffic in clear text. Sub-scenario considerations are that the attacker is able to modify the traffic, and that he is able to operate the API independently using the information obtained.
The certificate system underlying https is not particularly robust, and successful MitM attacks are a real possibility. This is particularly true for browser-based end-points where the service provider does not control the root certificates that are installed on a customer’s computer — if even one of those is compromised then a MitM attack is possible. Certificate pinning addresses this issue, but it is only a partial fix, notably because it relies on there having been at least one direct connection in the past. For bona fide apps (which can define their own set of root certificates) MitM attacks can be made significantly more difficult when a company chooses to use its own — and exclusively its own — certificate to secure the connection.
Generally https-based protocols do not attempt to identify the connecting party — it is only the authenticity of the server that is assured (with the caveat above), as well as the privacy of the connection. Customer authentication therefore happens in the payload. If a simple password or token mechanism is used then a MitM fully opens up the API to the attacker. Many banks implement a simple challenge-response protocol at login (‘please provide letters 3, 5, 2, and 8 of your password’) which provides some protection in case of a not too long-lasting MitM breach. Note however that even those systems tend to rely on a time-limited token after the authentication has been successful, which does give the attacker a time window during which he can operate the API.
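The partial-password challenge-response flow can be sketched as follows. This is illustrative only: a real implementation must also solve the problem of checking individual characters without storing the password in plain text, which is a security trade-off in its own right.

```python
import hmac
import secrets

def make_challenge(password_length: int, k: int = 4):
    """Pick k distinct 1-based character positions to ask the customer for."""
    positions = list(range(1, password_length + 1))
    secrets.SystemRandom().shuffle(positions)
    return sorted(positions[:k])

def verify(password: str, challenge, answers) -> bool:
    """Check the supplied characters against the requested positions."""
    expected = "".join(password[i - 1] for i in challenge)
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(expected, "".join(answers))

password = "correcthorse"                       # illustrative credential
challenge = make_challenge(len(password))       # e.g. [2, 5, 8, 11]
answers = [password[i - 1] for i in challenge]  # what the customer types
assert verify(password, challenge, answers)
```

Because each login reveals only a few characters, a one-off MitM observation does not yield the full password — though, as noted above, the session token issued after a successful login remains exposed.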
Again, standalone apps are in a slightly better place here as they can actually contain some kind of authentication information, like a shared secret, or a public/private key pair that can be used for authentication. However, even this needs a secure channel to exchange information about those authentication keys, so either we need to rely on the https connection not being MitM’d all the time, or send this information via a different means (eg letter, SMS, phone).
A good compromise is often to have two levels of protection: you accept that a successful attacker can read important information (passively, or possibly actively) but can not in fact take any action. Typically this is implemented using a second factor, for example
- a company-provided second factor device (eg an RSA keyfob, or a challenge/response device using a chip&PIN card)
- a customer-owned second factor device (eg smartphone app, SMS or even a voice call)
- a one-time pad, eg a collection of TANs, each of them to be used only once
- letters of the password that had not been used to log into the system, or similar information (name of first pet etc)
Some of those factors work better than others, and as usual there is a distinct trade-off between security and user experience.
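As an example of a customer-owned second factor, here is a sketch of a time-based one-time password (TOTP, RFC 6238), the mechanism behind common smartphone authenticator apps. The shared secret shown is illustrative and must be exchanged once over a trusted channel.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, for_time: float = None, step: int = 30,
         digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA-1)."""
    key = base64.b32decode(secret_b32)
    counter = int((for_time if for_time is not None else time.time()) // step)
    msg = struct.pack(">Q", counter)                   # 8-byte big-endian
    mac = hmac.new(key, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                            # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

secret = "JBSWY3DPEHPK3PXP"   # base32 shared secret (illustrative)
code = totp(secret)
print(code)  # changes every 30 seconds; the server computes the same value
```

Because the code is derived from the current time and a secret the attacker never sees on the wire, a MitM who captures one code gains at most a 30-second window, not lasting access.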
The impact of a successful MitM attack can range from annoying (if it only allows the attacker to read some irrelevant data; note however that this data could in turn be used to fuel other forms of attack, eg when trying to authenticate with a call centre agent) to catastrophic (if the attacker can fully operate the API over an extended period of time).
Customer device infected by virus
The device that the customer uses to connect to the service (smartphone, computer) is infected with a virus that has user privileges. A sub-scenario is that the virus has root privileges.
This is a very high likelihood scenario — at any given point in time there will be a certain number of customers who will have their computers hacked. User-privilege exploits are typically easier than root-privilege exploits, so one might expect the number of customers infected with the former to be significantly higher than those infected with the latter.
For root-privilege exploits there is not very much one can do to protect the customer other than using a genuine second factor (the letter-two-and-five-of-the-password style factors don’t help much here, given that the virus will be sitting around undetected for a long time in most cases). One exception is devices that have something akin to the iPhone’s secure enclave: if there is a sub-system that even root-privilege hackers cannot take over, then this system can be used to provide a secure second factor even if the device itself is compromised.
For user-privilege exploits there are some ways apps (but not websites) can protect themselves. The principle is similar to that of the secure enclave in that a second factor is provided by a process that can not be influenced by a user-level process. For example, there could be a daemon process that independently communicates with the company servers and that launches a confirmation prompt (with or without need for PIN/password) whenever a transaction is attempted, or that is even used for the login process.
The impact of those exploits can be dramatic, especially if they allow the attacker to make transfers, and/or take over the affected account.
Company systems infected by virus
Parts of the company’s systems are compromised and allow malicious third parties to run arbitrary code with user privileges. A sub-scenario is that the exploit has root privileges.
This is a vast scenario, in that many different parts of a company’s infrastructure can suffer an exploit, and depending on the details the impact can be quite different. For simplicity, let’s assume four main system groups: front-end, back-end, database, and app development. The first three are obvious and I’ll take them in turn, so let’s start with the last one.
Apps. An exploit could attack the development or distribution platform for the banking apps, inserting a backdoor into the apps customers are using. Unless there is a genuine second factor in place that can not be influenced by the app this would be catastrophic as it would mean a complete take-over of all customer accounts.
Back-end. An exploit in the back-end is equally catastrophic — generally the back-end code can do whatever it wants, there are very few if any protections in place once this part of the system is compromised.
Front-end. An exploit in the front-end can be less catastrophic than one in the back-end, depending on the system design: if the front-end is fully trusted, both are equally bad. However, if the authentication is performed in the back-end then this exploit might be less catastrophic (it still is equivalent to a MitM attack on every single user on the system though).
Database. Depending on system design this is the one exploit that can be comparatively benign. For example, if the entire database is encrypted, then the attacker can not learn much. However, for performance reasons the database will usually be at best partially encrypted, in which case an attacker can learn something. Database attackers will not usually be able to execute external transfers, unless there is some kind of ‘operations to be scheduled’ table. They might however be able to change account balances internally, which could allow them to remove funds via an account they own or have hijacked.
Of course any of those attacks will be able to wreak major havoc on a bank’s systems and bring them down for extended periods of time.
An attacker has access to a customer’s registered email account
The account the customer used to register for the service is no longer under the user’s control. A sub-scenario consideration is whether the user can still read the emails coming into the account.
This is a very common scenario — every bank will at any point in time have a significant number of accounts where the account owner either no longer has control over the registered email account, or some third party can read (and possibly delete) emails sent to this account.
Any reset of credentials that relies solely on sending codes to an email account can not possibly be considered safe, and neither can a bank assume that emails sent to an account will reach the intended recipient. Sending an email to the registered address can possibly be part of a process for resetting credentials, but even this is questionable from a security point-of-view.
An attacker has physical access to a customer’s registered phone
The phone the customer used to register is no longer in her physical possession. A sub-scenario consideration is whether the phone is locked or not, whether the SIM is locked or not, and whether the customer is aware of it or not. This also covers the scenario where an attacker is able to obtain a replacement SIM.
Again, this is a very common scenario. Phones are lost and stolen, and in many cases configured to display SMSs on the lock screen, so any reset code sent via SMS can be accessible. Also, it is often possible for attackers to obtain replacement SIM cards from mobile phone companies. Last but not least, some phones can be broken into even if they are locked — note in this context also the possibility to lift fingerprints from various objects (eg a glass) and use those prints to unlock a phone.
The impact depends on what the phone is used for. If for example it contains a banking app, and the banking app is not protected via a password (or only protected by a fingerprint that has been lifted), then physical access to the phone can mean full control over an account.
If the phone is only a second factor then again it depends on the scenario: eg if the second factor is an SMS, and the SMS can be seen on the lock screen, or the phone cannot be locked, or a second SIM card can be obtained, then this second factor is compromised. So if it is used to make a payment then this payment will go through, and if it is used to take over the account then the account will be taken over.
Like email, phones make bad second factors. SMSs can be very easily compromised, and — depending on the platform — even dedicated second-factor apps can often be hacked when the phone is physically accessible.
An attacker has physical access to the provided 2FA device
The two-factor authentication device (including one-time-pads) provided by the company is in physical possession of a third party. A sub-scenario consideration is whether it is operational (eg a card-reader might need a card and the associated PIN to be operational).
On the assumption that customers keep this device reasonably safe, loss of this device is less likely than loss of a phone. An interesting aside here is that more widespread usage of a device actually might make it less secure: if it is meant to be used at every login then the customer might carry it with him all the time, whilst if it is only to add new payees, and/or make significant payments it might be kept in the safe.
For many devices (eg RSA tokens, one-time pads) physical access alone makes them usable (one-time pads can even be replicated with a photocopier). For others, ‘something you know’ is required to unlock them (eg chip&PIN card based challenge & response devices).
Especially the latter devices provide a significant additional layer of security, which might be well worth the extra effort they introduce for the customer.
An attacker has access to a customer’s key biographical data
An attacker has comprehensive access to data relevant to the customer (date of birth, social security number, address, mother’s maiden name, first school etc). A sub-scenario consideration is that passwords the customer uses on other services are known.
That threat is essentially about call centre procedures, and other procedures to be followed when a customer has lost access, eg because he forgot his credentials, or because they have been compromised. Those are often the weakest part of the system: as most customers don’t take lightly to recovery procedures that involve a lot of effort and time (eg, go to your nearest branch with your passport), and as secure procedures tend to be costly on top of it, resetting credentials nowadays is often reasonably easy when only a few biographical details are known.
Further discussion of this issue without looking at specific procedures is not particularly useful, but it is clear that this is the Achilles heel of many banking operations, and an area where progress is direly needed.
An attacker has multi-factor access
A combination of all relevant ‘an attacker has access’ scenarios, including those mentioned above.
That is an extension of the points above — especially for account recovery, often a number of different factors are needed, eg some biographical data, some account data (‘last two transactions’), or factor access (phone, email). Whilst every single one might not be an issue, a combination of those might be. For example, any procedure that relies on phone and email is probably not much more secure than one that relies on the phone alone, as there is a decent chance that access to the phone allows the attacker to take over the email address as well. Similarly, if access to the email allows the attacker to reset the phone number in the banking system, the phone has not much value as a second factor.
It is useful to draw up a matrix of account actions vs factors needed, and to analyse carefully whether key actions (transfers, changing account credentials) are really adequately protected, taking all possible factor interdependencies into account.
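Such a matrix analysis can be sketched in a few lines; the factor names, dependencies and actions below are purely illustrative:

```python
# If factor A can be used to take over factor B, then B adds little
# independent security on top of A.
DEPENDS_ON = {
    "email": {"phone"},   # phone access often allows an email takeover
}

# Which factors are nominally required for each account action
ACTIONS = {
    "view balance":      {"password"},
    "external transfer": {"password", "phone"},
    "reset credentials": {"phone", "email"},
}

def effective_factors(factors):
    """Drop factors that are reachable from another required factor."""
    collapsed = set(factors)
    for factor, parents in DEPENDS_ON.items():
        if factor in collapsed and parents & collapsed:
            collapsed.discard(factor)   # adds no independent security
    return collapsed

for action, factors in ACTIONS.items():
    eff = effective_factors(factors)
    flag = "WEAK" if len(eff) < 2 else "ok"
    print(f"{action}: nominal={len(factors)} effective={len(eff)} [{flag}]")
```

Note how ‘reset credentials’ nominally requires two factors but effectively only one, exactly the phone-plus-email trap described above.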
To conclude, this article was a whirlwind tour of the issue of computer safety for regulated banks, and of the best practices companies and regulators should develop to ensure mishaps like the N26 incident do not happen.
Given the size of the subject and the medium used (pun intended), there are probably more questions left open than have been put to rest. I hope however that it is a good basis from which to develop robust procedures that will allow the design of proportionate measures, keeping a good balance between stifling innovation and leaving customers — or the banking system — at risk.