Stopping (https) phishing

a socio/technical answer

Henry Story
Cyber Security Southampton
Jun 30, 2018 · 12 min read


As reported by the Anti-Phishing Working Group (@APWG) in its report released May 15th 2018, there was a sixfold increase in phishing sites hosted on https during the year covered. I explain here what that means, why it is happening, and what is needed to stem the efficacy of these attacks.

Graph taken from the APWG Trends report Q4 2017

This came to my attention while listening to the first episode of the Hacking Humans podcast by The Cyberwire. Indeed, they start their show with it. It is worth listening to in addition to reading this, if only because one sometimes gets a much better feel for the importance of a problem when hearing experts discuss it with passion.

The report goes on to describe an informal study PhishLabs conducted on Twitter, asking what people thought the green lock on an https connection meant. Close to 80 per cent of respondents seemed to think that the icon means the site is legitimate and/or safe. A grave error.

I hasten to point out that this misunderstanding is not just due to a lack of public education: it arises in part because browser user interfaces are misleading here. Nor does the fault lie only in the design of those URL bars. It is mostly caused by a crucial piece of security architecture missing from the web, one that needs to be in place before the browser bar can show the message it wants to show.

The best way to grasp the problem is to consider this snapshot of a site with a green padlock. Everything looks fine and inviting. Facebook is a well-known company, and this login page is especially widely known. A user might expect to see it when travelling and trying to check some mail in a cyber cafe. To see what is wrong, however, you need to look a lot closer at the URL in the address bar. This reveals that the domain is not www.facebook.com as expected, but rather www.facebook.com.infoknown.com. So the green lock only tells the user that he has correctly connected to the server at infoknown.com. The scam relies on the fact that many users will be misled into entering their Facebook username and password in the form, which can then be collected by the owners of that domain. If the scammers don’t want to attract attention, the site will be programmed to immediately redirect the user to the real Facebook web site, where he can continue as desired. Having collected the password, they can now use that social network account to spread fake links to the friends of the victim, and, more importantly, since many web services allow their customers to authenticate with a Facebook login, the phishers now gain access to those sites too.
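
The deception works because the registrable part of a hostname is read from the right, so everything to its left is attacker-controlled decoration. Here is a minimal Python sketch of that parsing, assuming the third-party tldextract package; it is an illustration only, not part of any proposal in this article.

# A minimal sketch: extract the registrable domain of a hostname.
# Assumes the third-party tldextract package (pip install tldextract),
# which uses the Public Suffix List to find where the real domain starts.
import tldextract

for host in ["www.facebook.com", "www.facebook.com.infoknown.com"]:
    ext = tldextract.extract(host)
    print(host, "->", ext.registered_domain)

# prints:
# www.facebook.com -> facebook.com
# www.facebook.com.infoknown.com -> infoknown.com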

We have a green padlock, but the website is neither safe nor legitimate. That is because the padlock can only tell us that we are connected to the site with the given domain. For a much more complete list of such scams, with pictures, see the July 2017 article by Patrik Nohe, “The Browsers Need to Stop Helping Hackers Phish”. There he argues that one needs to completely get rid of the word “Secure” and the green lock, a suggestion that Google Chrome will follow in its release 69, due September 2018.

Consider now a legitimate site, beta.companieshouse.gov.uk. Here the lock icon in the URL bar of Chrome does not tell us anything more than it did in the illegitimate case. It tells us precisely this: that the website we are connected to holds the private key matching the public key shipped in the certificate for companieshouse.gov.uk, and that some trusted Authority has certified this. We know we are connected to the site referred to by the domain name. We trust that site, if we are English, because we know the convention that domains ending in .gov.uk are controlled by the UK government, just as those ending in .gov are controlled by the US government (a naming scheme revealing of the global ambitions of one of the actors). This naming convention has to be learned for it to work: humans have to be taught it. Now it is somewhat reasonable to expect UK citizens to know about *.gov.uk. It is not reasonable, however, to expect them to know about all the other conventions.

Crossing cultural boundaries helps reveal assumptions we take for granted. Consider www.unternehmensregister.de. The web site there is written in German. On clicking on the “Secure” padlock, we can open a window that shows us the contents of the certificate sent to us by the server. We find there the public key used by the web server, and, in the Subject Alternative Name field, the names of the domains that are certified. We also see that this has been signed by DigiCert Global, whose public key comes with our browser. However, many questions remain: Is this an official site? Is this a private company? Is this a fake site? If we look at the certificate, there is no statement as to how trustworthy it may be.
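
What that certificate window shows can also be retrieved programmatically. Below is a minimal sketch using only Python’s standard library: it connects to the server, validates the certificate against the local trust store, and prints the issuer and the Subject Alternative Names. Note that, as argued above, nothing in the output says anything about legitimacy.

# A minimal sketch: fetch a server's certificate and print the fields
# the padlock dialog shows, using only the Python standard library.
import socket, ssl

hostname = "www.unternehmensregister.de"
ctx = ssl.create_default_context()  # validates against the local trust store

with socket.create_connection((hostname, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
        cert = tls.getpeercert()

print("Issuer:", dict(rdn[0] for rdn in cert["issuer"]))
print("Subject Alternative Names:")
for kind, name in cert["subjectAltName"]:
    print(" ", kind, name)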

If we use the Google Translate feature built into Google Chrome, we see that unternehmensregister.de has a search box whose use is explained in the text translated here:

Here you can search, free of charge and without registration, for all important data on companies that are required to be published, and gain access to the electronic commercial, cooperative and partnership registers.

So it seems to claim to be something like CompaniesHouse… But could it not have scraped some real official site and republished the data, waiting for the right moment to change the information about a few key companies and set off a bear run on the stock market? How could we know? How would we know whether this is any different from infoknown.com? Especially as there is so much more money in faking company information than in stealing Facebook passwords.

Scepticism is raising its head here once more, emerging from the grave to taint all reality with a ghostly spell. Where we thought we were treading firmly, we find nothing below our feet. A voice tries to comfort us that we do know something: the name of the web site we are on. Great! But such a small relief… We have no clear way of finding the relation of that website to institutions that we could call on to help us. People do, it seems, rely on this unstated information in their local dealings. But they would be foolish to generalise from this local knowledge to any global knowledge: on the Web every site is potentially just a few clicks away.

What we need is a way to allow legitimate websites to be anchored in an institutional web of trust, linked to a web of nations spanning the globe, and for this data to be used by web browsers to inform us about what type of website we are looking at.

An institutional web of trust in which web sites can hook themselves to give their online presence some legitimacy.

To see this, let us look at how a future, improved browser would use such an institutional web of trust to enrich the URL bar interface with extra verifiable legal information.

Let us start with a little-known company, co-operating.systems, which we may come across by following links, such as the one placed temptingly in this sentence. This website has an https URL generated by the free and convenient certificate authority letsencrypt.org, which automates the verification that the server is at the IP address specified in the DNS registry. The certificates given out are only valid for 3 months, but the handy certbot script provided by the Electronic Frontier Foundation can fully automate the creation and renewal of the certificate. The green padlock tells us that we are on the site named by the domain name of the URL.

This is useful, but still very unsatisfactory. How can we tie co-operating.systems into the institutional web of trust pictured above? Well, one could place a link from the server to the companieshouse.gov.uk page describing that company. This link could be placed in the X509 certificate served by co-operating.systems, but that would require getting an OID 2.5.4 registry element accepted(*) and used by the Certificate Authorities, which may take a lot of time. Instead we can add an RFC 8288 Link header carrying the same information, whose value is the URL of the official page, and which would be sent along with every page that the site deems official.

$ curl -I https://co-operating.systems/
HTTP/1.1 200 OK
Date: Mon, 02 Jul 2018 07:45:18 GMT
Server: Apache/2.4.25 (Debian)
Last-Modified: Mon, 02 Jul 2018 01:29:11 GMT
ETag: "22fb-56ffa1fec7282"
Accept-Ranges: bytes
Content-Length: 8955
Vary: Accept-Encoding
Link: <https://api.companieshouse.gov.uk/company/09920845>; rel="registry"
Content-Type: text/html

The browser would then notice this Link header and asynchronously fetch that document. The JSON(-LD) result would only need to be enriched with a link back to the domain(s) owned by the company, as shown below. Pay attention especially to the domain attribute, which I had to add to the result myself.

$ curl -u token https://api.companieshouse.gov.uk/company/09920845
{
  "registered_office_address": {
    "locality": "London",
    "address_line_1": "2, Harlequin Court",
    "address_line_2": "6 Thomas More Street",
    "country": "United Kingdom",
    "postal_code": "E1W 1AR"
  },
  "undeliverable_registered_office_address": false,
  "has_insolvency_history": false,
  "company_number": "09920845",
  "jurisdiction": "england-wales",
  "company_status": "active",
  "has_charges": false,
  "type": "ltd",
  "company_name": "CO-OPERATING SYSTEMS LTD.",
  "date_of_creation": "2015-12-17",
  "domain": ["co-operating.systems", "www.co-operating.systems"],
  "accounts": {
    "next_due": "2018-09-30",
    "accounting_reference_date": {
      "day": "31",
      "month": "12"
    },
    "last_accounts": {
      "period_start_on": "2015-12-17",
      "made_up_to": "2016-12-31",
      "period_end_on": "2016-12-31",
      "type": "dormant"
    }
  },
  ...
}

(The -u token is required at present by CompaniesHouse to access that page, which is odd given that the information is openly available in human-readable form on the parallel beta.companieshouse.gov.uk page. However, it is straightforward to register and get a token.)

After this request, the web browser would have verified the first link in the chain shown in the diagram as arrows 1a and 1b.
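
To make those two steps concrete, here is a minimal Python sketch of what such a browser could do, using the third-party requests library. Remember that the rel="registry" Link relation and the domain attribute are this article’s proposal, not a deployed standard, so the sketch only runs against a server set up as described above.

# A sketch of verifying links 1a and 1b of the proposed chain.
# Assumes the third-party requests package (pip install requests).
import requests

site = "https://co-operating.systems/"

# 1a: the site points at its official registry record via a Link header.
resp = requests.head(site)
registry_url = resp.links["registry"]["url"]  # requests parses RFC 8288 Link headers

# 1b: the registry record must link back to the site's domain.
# (CompaniesHouse currently requires an API token, passed here as basic auth.)
record = requests.get(registry_url, auth=("token", "")).json()
host = requests.utils.urlparse(site).hostname

if host in record.get("domain", []):
    print("1a/1b verified:", record["company_name"], "controls", host)
else:
    print("no back-link: the registry claim remains unverified")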

But why would the browser trust api.companieshouse.gov.uk? After all, that could also be a fake website. Or perhaps it once was the right place to look things up, but the hostname was later changed — as is likely to happen — and the data is still hanging around because someone forgot to turn off the machine. We don’t want these servers hardcoded in the browser. The way to solve this intelligently is to use the same technique and have api.companieshouse.gov.uk point, in a Registry header or in the content, to the root of UK trust, which would be gov.uk. That root would in turn link back to the registry domain, specifying that CompaniesHouse is the official source describing companies in the UK. Developing the right high-level ontologies for this would, of course, require a W3C Working Group with technical representatives from the nations involved in setting the standard. With that standardised, the browser could verify the second link, 2a and 2b, in the trust chain from co-operating.systems. For a UK citizen’s browser where gov.uk has been set as the root trust anchor, the verification would stop there.
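
Generalised, verification becomes a walk up the chain of registry links until the browser reaches its configured trust anchor. A hypothetical sketch follows, under the same assumptions as above (a rel="registry" Link header at each level; the back-link checks from the previous sketch are elided for brevity):

# A sketch of walking the proposed trust chain up to the user's anchor.
# All names and relations here follow this article's proposal, not a standard.
import requests

TRUST_ANCHOR = "gov.uk"  # configured in a UK citizen's browser

def registry_of(url):
    """Follow the rel="registry" Link header one level up, if present."""
    links = requests.head(url).links
    return links["registry"]["url"] if "registry" in links else None

def verify_chain(site, max_hops=5):
    url, chain = site, []
    for _ in range(max_hops):
        parent = registry_of(url)
        if parent is None:
            return None          # chain broken before an anchor was reached
        chain.append(parent)
        if requests.utils.urlparse(parent).hostname.endswith(TRUST_ANCHOR):
            return chain         # links 2a/2b verified up to the anchor
        url = parent
    return None

print(verify_chain("https://co-operating.systems/"))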

But what about a browser owned by a German, Japanese, Russian, US, Chinese, … citizen? Why would they trust gov.uk to state what is the case about a random company? If that sounds implausible, think of it the other way around: why would a UK citizen trust the statements of another country’s root authority? Indeed, how would the browser actually know that gov.uk is a root authority, and not just a fake website? Here we continue the process, but in a peer-to-peer mode. We need the states involved to create a web of nations, where each, having described itself, links to those it trusts to keep such information up to date. Links need not go both ways, nor be complete, and indeed at the beginning they won’t. This part is illustrated in the diagram by the link formed by the two arrows 3a and 3b.
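
For the cross-border case, the browser would search outward from its own trust anchor through the nations’ endorsement links. A hypothetical sketch of that idea follows; the endorsement table is invented for illustration, since no such machine-readable web of nations exists today (bund.de and gouv.fr are simply stand-ins for possible national roots).

# A hypothetical sketch of the web of nations: starting from the user's
# own trust anchor, breadth-first search the endorsement links (3a/3b)
# until the foreign root is reached. The data below is invented.
from collections import deque

ENDORSEMENTS = {                      # each root lists the peers it vouches for
    "bund.de": ["gouv.fr", "gov.uk"],
    "gouv.fr": ["bund.de", "gov.uk"],
    "gov.uk":  ["bund.de", "gouv.fr"],
}

def trusts(anchor, target):
    """Is target reachable from anchor via endorsement links?"""
    seen, queue = {anchor}, deque([anchor])
    while queue:
        root = queue.popleft()
        if root == target:
            return True
        for peer in ENDORSEMENTS.get(root, []):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return False

print(trusts("bund.de", "gov.uk"))    # a German browser evaluating a UK claim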

If the browser can verify such a chain of trust from the website to the user’s trust anchor, then it could notify the user by showing a green padlock which, when clicked, could present a lot more information about the company than what we have seen in the certificates published currently. It could show whatever the institution wanted or was legally obliged to make public. It could show on a globe the location the company operates from. It could show the jurisdiction it is liable under, the owners and founders of the company, its age, when it was bought, and much more. More immediately visible: instead of being self-asserted as it is now — which allows infoknown.com to serve Facebook’s icon — the favicon could be published by CompaniesHouse or an equivalent organisation and added to the certificate, so that each web site could have an official icon. This would make countries responsible for the uniqueness of the logos used. This enriched and legally backed live information would allow trust to flow back into the web, and make all kinds of phishing attacks a lot easier to detect.

Picture of a web site owning an EV Certificate

This proposal should make it possible to automate the creation of Extended Validation (EV) Certificates, which certify that the web site is owned by a particular legal entity in a given country, and at what address. Automating their creation would help reduce their cost — which currently lies anywhere between $450 and $2000 a year — and so grow their adoption. Furthermore, EV Certificates would no longer need to be limited to the fixed data they contained at creation time, but could link to live information about the company, such as its stock market valuation, any legal problems encountered, when it changed ownership, and so on, as we saw above by looking at the data currently published by CompaniesHouse.

We now have all that is needed to make a good security UI, and the ball would then be in the User Interface Design camp, where a major re-thinking of security UI in browsers is called for. The information about the website cannot be hidden behind one little padlock icon as it is now — which made sense when there were only two pieces of information to display: domain name and security — but has to be shown in full before a user sees the content, as he waits for the download to complete, especially if he has not visited that web site in the recent past. This is essential on cell phones, which have so little screen real estate that even URLs are mostly invisible. One would need to show elements such as a picture of where on the globe the company is located, the name of the company, and as many of the important legal details as make sense. This, perhaps combined with the adoption of client certificate authentication, would make phishing next to impossible. The meta-data view should at all times be easily and intuitively accessible, perhaps especially when the user is on a page asking him to enter data into a form.

So the answer to stemming phishing attacks, understood as the art of hacking humans, is, I suggest, to make use of human institutions that have been built over centuries: ask them to publish machine-readable information about themselves and others, so that browsers can reliably signal these relations to the user.


Henry Story
Cyber Security Southampton

is writing his PhD on http://co-operating.systems/. A Social Web Architect, he develops ideas in Scala, guided by philosophy and a little category theory.