Open Banking & Screen Scraping

Legislating for APIs is easier said than done!

May 9, 2017

There is a legislative push towards Open Banking here in Europe. PSD2 comes into law on the 13th January 2018 and it defines a new type of regulated activity — Account Information Services (AIS).

These services have existed for several years and include accounting software that automatically imports your bank statements, personal finance dashboards that aggregate all your accounts into one place, etc. Many of them have been operating in a legal grey area due to the way they retrieve customers’ data: “screen scraping”.

Screen scraping banking data usually involves collecting a user’s banking credentials and then using those credentials to login and retrieve data from a bank’s customer-facing website (or the API powering the bank’s mobile app). Some banks explicitly prohibit sharing credentials with such services and some are purposely unclear in their Terms & Conditions. Most banks implicitly allow screen scraping as they choose not to block access.

The way that screen scraping is implemented at most fintechs is safe for the end-user, but it is not a long term solution.

PSD2 and the EBA’s regulatory technical standards aim to stop screen scraping and mandate banks (ASPSPs) to provide stable documented interfaces for third party firms to use. This should be good news for Fintechs, but the devil is in the detail. Recently a group of fintechs have published a “manifesto” where they worry that:

…if some of the proposed standards are adopted, specifically those in relation to how fintechs communicate with banks on behalf of the consumer, they will have a … critical negative impact on the future trajectory of innovation in Europe.

The key proposition of the manifesto is:
screen scraping is secure,
banks are bad,
PSD2 interfaces are theoretical,
so let us carry on screen scraping.

While I agree with some of the points in the manifesto, I feel that some fundamental issues have been left unaddressed. Legislating for APIs was always going to be difficult: I’m unaware of anything quite like PSD2, where private companies are required by legislation to provide APIs. However, I don’t buy the argument that it will completely fail because banks are obstructive. Maybe I’m a naive optimist :-), but hear me out and please feel free to disagree in the comments! (I’ll mainly be looking at this from the viewpoint of account information rather than payment initiation; however, many of the points apply to both.)

It’s worth pointing out that in an ideal world this whole problem would be solved by the market. Banks would develop APIs because their customers wanted them and a common standard would emerge based on real-world implementations. But we’re not in an ideal world: banks are dragging their feet, and so the regulators are forcing them to open up access. With that in mind, let’s look at the 3 (often conflated) issues with screen scraping or “direct access”:

1. Accessing data via an undocumented interface

Most fintechs currently either “scrape” the HTML from online banking interfaces or interact with the private APIs that banks use to power their online banking interfaces or mobile apps. While this may work and has allowed innovation, it is a workaround. No developer anywhere wants to work with an undocumented API — we only do it because there is no alternative. It often results in a bad user experience (especially when scraping HTML) and is expensive for fintechs to maintain.

2. Impersonation

In most current implementations the bank doesn’t authoritatively know whether they are interacting with the user or software acting on behalf of the user.

3. Collecting (and often storing) a user’s banking credentials

I think this is the main issue with screen scraping. While it can be done securely, it isn’t a good long-term solution:

It doesn’t allow users to grant fine-grained access
If a user wants to revoke access to a single service they must revoke access to all (by changing their password)
It goes against the usual security advice of not giving out your passwords
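
To make the contrast concrete, here is a minimal sketch of what per-service, scoped access could look like on the bank side. Everything in it (the `GrantRegistry` class, the service names and the scope names) is hypothetical and only for illustration; it is not taken from PSD2 or from any bank’s API.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class GrantRegistry:
    """Hypothetical bank-side store of access grants, one per service."""
    grants: dict = field(default_factory=dict)  # token -> (service, scopes)

    def grant(self, service: str, scopes: set) -> str:
        # Each service gets its own random token instead of the user's password.
        token = secrets.token_urlsafe(16)
        self.grants[token] = (service, frozenset(scopes))
        return token

    def allows(self, token: str, scope: str) -> bool:
        entry = self.grants.get(token)
        return entry is not None and scope in entry[1]

    def revoke(self, service: str) -> None:
        # Removing one service's grants leaves every other service untouched.
        self.grants = {t: g for t, g in self.grants.items() if g[0] != service}


registry = GrantRegistry()
budget_token = registry.grant("budget-app", {"balances"})
accounting_token = registry.grant("accounting-app", {"balances", "transactions"})

# The user stops using the budgeting app; the accounting app keeps working.
registry.revoke("budget-app")
```

With a shared password none of this is possible: every service holds the same secret, so access is all-or-nothing both when it is granted and when it is revoked.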

So with this split in mind, let’s take a look at some specific claims in the European Fintech Manifesto:

Direct Access is secure

Direct Access is a secure technology that has been used for the last 15 years by both European Fintechs and Banks to provide AIS and PIS services to millions of consumers… there hasn’t been, until this day, one single documented incident of data fraud or compromise of personal credentials.

This is a simplification.

Accessing an undocumented interface protected by TLS is “secure”.
Storing non-hashed passwords (common in a lot of screen scraping) isn’t ideal. These passwords should be encrypted, and to date there hasn’t been a breach, but it would be hard to suggest this as a long-term solution.
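
Why can’t a screen scraper simply hash the passwords it holds? A rough sketch using Python’s standard library shows the asymmetry. The function names and the iteration count are my own illustrative choices, not anything a particular bank or fintech is known to use.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # One-way derivation: the password cannot be recovered from the result.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    # A verifier only needs to compare hashes, never to read the password back.
    return hmac.compare_digest(hash_password(password, salt), stored)


salt = os.urandom(16)
stored = hash_password("example-password", salt)

print(verify("example-password", salt, stored))
print(verify("wrong-password", salt, stored))
```

A bank checking a login can work like `verify` and never keep the password itself. A scraper has to type the actual password into the bank’s login form on every refresh, so it must keep it in a recoverable form (encrypted at best), and whatever can decrypt it becomes the thing to protect.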

Secure Identification can take place with Direct Access

PSD2 requires that TPPs identify themselves vis-a-vis Banks (ASPSPs) when providing their services. By using the exact same identification mechanism, as the one requested for the dedicated interface in the RTS, Direct Access becomes Secure Authenticated Direct Access

This addresses the “impersonation” issue with screen scraping. I presume they are talking about the use of eIDAS certificates for identification which could be layered on top of screen scraping. We are working through a similar solution with FDATA which would allow the bank to know which fintech it was interacting with. However this only solves 1 out of the 3 issues with screen scraping.

Dedicated interfaces are theoretical

Moreover, well-functioning and always up-to-date dedicated interfaces only exist as a hypothesis so far. They have not yet been developed or tested, and they have yet to become a reality.

This detracts from the argument as it is demonstrably false. Banks such as BBVA and Česká spořitelna have been providing APIs for quite some time. Furthermore, APIs and third-party delegated access to data are a solved problem, not a “non-proven technology”.

Screen scraping works well in other industries, e.g. online travel

Direct Access (using screen scraping) is a well-established technology that has been used and leveraged by other industries, of which we would highlight the following: Online travel…

In the travel industry, screen scraping works to aggregate non-user data, i.e. flight prices. This cannot be compared to accessing private user data. The other 2 examples given are from banking, not other industries, which makes the argument even weaker. I’d rather we, as an industry, were honest and said: “we’re screen scraping not because it’s good technology, but because it’s currently our only option”.

Banks can choose whether or not to use a dedicated interface, we should be able to as well

Therefore, the only way to ensure that Banks (ASPSPs) have the right incentives to provide and maintain a well-functioning dedicated interface and that competition and innovation continue to grow, is to make them optional. It will be only through real and direct competition that we will be able to ensure that PSD2 objectives are achieved.

This is the crux of the manifesto and I’m tempted to agree with this statement. But I think there needs to be a discussion of potential unintended consequences, e.g.

If a bank has to provide access via screen scraping then what is the incentive for them to develop an API?
Would the bank need to document and provide notification of changes for both interfaces or just the dedicated interface?
Wouldn’t the consumer experience be even more complicated? With the API based access it is envisaged that consumers will have a page within their online banking where they can see which AISPs they have given access and can revoke when they want.

It’s the consumer’s data, so they can do what they want with it #GDPR

The PSD2 RTS however restricts the consumer right to use software to access and share his/her own data ex ante and as such violates not only the spirit and wording of PSD2, but also the fundamental data ownership principles in the GDPR and grants Banks the possibility to monopolise the consumers’ data.

I think that the issue is consumer protection and liability. PSD2 creates new regulated entities and a new liability framework. In the current unregulated market consumers can do what they want but there is no specific redress if something goes wrong when using account information services. As part of creating this regulated market, there unfortunately has to be regulation.

ASPSPs face regulation:

they can no longer tell their customers not to use AISPs,
they MUST provide access to EVERY registered AISP
they can only block an AISP with good reason and must immediately notify the FCA (or competent authority)
they have to provide and support a documented interface
they have to apply strong customer authentication

AISPs face regulation:

they need to use the documented interface that the ASPSP provides
they can only request data 4 times every 24 hours
they must gain explicit consent from the consumer
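
As an illustration of the request limit in the list above, here is a minimal sketch of how an ASPSP might enforce “4 times every 24 hours” for one AISP and one customer. The rolling-window interpretation, the `AccessLimiter` class and the use of plain second counts as timestamps are all assumptions made for the example; the RTS does not prescribe an implementation.

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60  # 24 hours
MAX_REQUESTS = 4               # the limit quoted in the list above


class AccessLimiter:
    """Hypothetical rolling-window limiter for one AISP and one customer."""

    def __init__(self):
        self._timestamps = deque()

    def allow(self, now: float) -> bool:
        # Forget requests that are 24 hours old or older.
        while self._timestamps and now - self._timestamps[0] >= WINDOW_SECONDS:
            self._timestamps.popleft()
        if len(self._timestamps) >= MAX_REQUESTS:
            return False
        self._timestamps.append(now)
        return True


limiter = AccessLimiter()
# One request per hour for six hours: only the first four are served.
results = [limiter.allow(hour * 3600) for hour in range(6)]
print(results)
```

For a dashboard that wants to look up to date all day, a cap like this is a real product constraint, which is part of why fintechs care so much about the detail.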

By requiring ASPSPs to provide access, the legislation has had to put limits and some definition on that access. The regulators would argue that these limits are there to safeguard the consumer, not to reduce competition.

I argue that the most contentious issue in the RTS is not the interface but rather “Strong Customer Authentication” (SCA).

Strong Customer Authentication & Direct Access & Redirects

This wasn’t discussed in the manifesto, but I believe it is a major issue. The liability framework that PSD2 creates depends on the ASPSP performing strong customer authentication.

As there will be no contract between the ASPSP and the AISP, I would argue that the only way that an ASPSP can be sure that SCA has been performed is for them to interact directly with the end-user, i.e. the AISP redirects the user to the ASPSP.

SCA requires 2 out of 3 of the following elements: knowledge, inherence & possession. If an ASPSP is confirming any of these elements indirectly (i.e. with an AISP in the middle) then they have a lower degree of confidence that it is the end-user inputting the data. For example, an AISP could save a user’s password and any time the ASPSP wants to perform SCA they could input the password on behalf of the user, and the user would then only need to provide 1 element rather than 2.
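
The password example can be sketched in a few lines. This is only a model of the 2-out-of-3 rule as described above; the `Factor` enum and the helper functions are hypothetical names, not part of any SCA specification.

```python
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "knowledge"    # something the user knows, e.g. a password
    POSSESSION = "possession"  # something the user has, e.g. a phone
    INHERENCE = "inherence"    # something the user is, e.g. a fingerprint


def satisfies_sca(factors) -> bool:
    # SCA needs elements from at least two different categories.
    return len(set(factors)) >= 2


def supplied_by_user(presented, replayed_by_intermediary):
    # Elements a third party replayed from storage did not come from the user.
    return set(presented) - set(replayed_by_intermediary)


# What the ASPSP sees: a password plus a one-time code from a phone.
presented = {Factor.KNOWLEDGE, Factor.POSSESSION}
seen_by_bank = satisfies_sca(presented)

# What actually happened: the AISP replayed a stored password.
really_performed = satisfies_sca(supplied_by_user(presented, {Factor.KNOWLEDGE}))

print(seen_by_bank, really_performed)
```

From the bank’s side both cases look identical, which is exactly the problem: it cannot tell the two apart unless it collects the elements from the user directly.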

The ASPSP, AISP, PSU triangle is a classic 3 party auth problem that has been solved by standards such as OAuth & SAML. These standards allow a decent user experience for the PSU while having a clear separation of concerns (and liability) between the ASPSP and the AISP.
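
For readers who haven’t seen the redirect flow, here is a minimal sketch of the first step of an OAuth 2.0 authorization code grant: the AISP builds a URL and sends the user to the ASPSP to authenticate there. The parameter names are standard OAuth 2.0; the endpoint, client identifier, callback URL and scope name are made up for the example.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical ASPSP authorization endpoint, for illustration only.
AUTHORIZE_ENDPOINT = "https://bank.example/oauth/authorize"


def build_authorization_url(client_id: str, redirect_uri: str, scopes: list):
    # 'state' ties the bank's eventual redirect back to this user session.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",    # ask for an authorization code
        "client_id": client_id,     # identifies the AISP to the bank
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # the fine-grained access being requested
        "state": state,
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}", state


url, state = build_authorization_url(
    "example-aisp", "https://aisp.example/callback", ["accounts:read"]
)

# What the bank receives when the user's browser follows the redirect.
params = parse_qs(urlparse(url).query)
print(params["client_id"], params["scope"])
```

The user’s credentials never appear in this exchange. They are entered on the bank’s own pages, and the AISP only ever receives a code that it exchanges for a token limited to the requested scope.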

The Wrong Message

I believe the manifesto sends the wrong message. Rather than saying “let us keep screen scraping because it’s secure” we should be saying “we welcome the move to APIs but want to ensure a graceful transition away from screen scraping”.

We should be arguing for modern APIs protected by modern authorisation frameworks and not be fighting for the status quo.

(*) I’ve been thinking about banking and APIs for a while now. I took part in the UK Open Banking Working Group, I’m a contributor to the OpenID Foundation’s Financial API Working Group and the technical standards being developed by the UK Open Banking Implementation Entity. I am heavily involved with FDATA through which I have had significant interaction with the FCA, HMT, CMA and EBA.