Phishing in Context

Epistemology of the screen

Henry Story
Cyber Security Southampton
Jul 15, 2018


We learn very young to distinguish between what someone says and what we believe. We distinguish between «Donald says “I am a great salesman”» and «Donald is a great salesman», even though, if the second sentence is true, Donald would be saying something true in the first. However, it is quite possible for the second one to be true and for Donald never to have uttered those words. And vice versa, it is possible for him to utter “I am a great salesman” and for it not to be true. Thus we keep track of why we believe things. Was it my friend who sang the praises of the salesman at the door, or was it the salesman who told me that my friend had done so? Or did he perhaps prove this by showing me a document from Trump University? This is known as keeping track of context. Who said what? What evidence did they show to support their claim? Perhaps he does not need any: if I buy his line, then is he not a great salesman? If I am a Phish and I bite, does that make the Phisher great?

The URL bar in the browser frame tells us where we got the information from

On the web, we find the same distinction. Our browser frame has two parts: the content below, and above it the black bar containing the URL (shown in red) from which that content was fetched. The URL gives us the context of the information. In green we get the verified name of the company responsible for the server, and the fact that it is a US company. But what if the whole content, including the URL bar, had been generated by someone else, a technically savvy Phisher? That is what Z. E. Ye and S. Smith demonstrated at USENIX in 2002, and what they described in their paper “Trusted Paths for Browsers”.

At the time they were able to build a web page that removed all the real browser bars with JavaScript calls and showed only fake ones. The resulting page would have been very convincing. (You can check out how this looks in your browser here.) Having got someone to click on a link that brought them to such a proxied page, the Phisher would have a complete view of what the user saw and typed, and would also be able to change any element programmatically if needed. As a result, the user would be unable to distinguish the content from the context, and would be in the same position as the person confronted by a good salesman who has mastered the art of convincing people of his skills. It is as if the salesman were able to make it appear that every sentence such as “Constable Smith was very happy with my work” really came from Smith. With some urgency added to the deal, a signature could soon follow.

The precise nature of the error can be made clear using a doxastic modal logic with a says relation relating an agent to a proposition (developed in detail by Martin Abadi). The error consists in concluding from X says { S says P } that S says P, where the { } brackets delimit a context as in N3. One layer of context has been unwrapped automatically for us. So long as what S says is not entirely unbelievable, suspicions may not be aroused until it is too late.
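The quoted-context idea can be sketched in a few lines of Python. The names `Atom`, `Says` and `outer_speaker` below are illustrative, not part of Abadi’s formalism: the point is only that a statement wrapped in a says context records whom we actually heard it from, and nothing licenses unwrapping that layer.

```python
from dataclasses import dataclass

# Minimal sketch of statements with N3-style quoted { } contexts.
@dataclass(frozen=True)
class Atom:
    proposition: str

@dataclass(frozen=True)
class Says:
    agent: str
    body: object  # an Atom, or another Says for a nested context

def outer_speaker(stmt):
    """The only agent we directly heard from: the outermost 'says'."""
    return stmt.agent if isinstance(stmt, Says) else None

# X says { S says P }: all we may conclude is that X made the claim.
claim = Says("Phisher", Says("Bank", Atom("your account is locked")))
print(outer_speaker(claim))  # Phisher -- not Bank
```

The error described above amounts to reading off `claim.body` as if it stood on its own, discarding the outer layer of quotation.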

What is interesting in this case is that the proposition is a visual one. In logic, we usually take the semantics of a string-based statement to be the set of states or worlds (often called the proposition) in which that statement is true. However, statements need not be textual; they can be graphical objects like browser windows. These have a basic visual syntax cut into two parts: in our browser, the top part consists of the metadata referring to the agent S, and the bottom part of the content P, with the relation between them being something like says. We need not get hung up on the exact name of the relation here.

Now, since it is possible to install new applications on any useful operating system, it is always possible for an application to look exactly like a browser and so run the same attack as the one demonstrated in 2002. If in our logical formalism we now take the application A into account, we can see that in that situation the error consists in getting us to pass from A says { S says P } to B says { S says P }, where A is emphatically not the well-known browser B. Given that we trust B, we would correctly conclude that S says P had B said it. But it was A that did, and so we were misled.
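To make the misattribution concrete, here is a toy sketch (all names hypothetical): a spoofing application A can render chrome pixel-identical to that of the trusted browser B, so the acceptance decision, which should depend on who presented the window, has no visual basis for the user.

```python
# The browsers we have decided to trust as presenters of 'S says P'.
TRUSTED_PRESENTERS = {"B"}

def render(presenter, site, content):
    """What the user sees: chrome plus content. The presenter's identity
    appears nowhere in the pixels -- A and B produce identical output."""
    return f"[{site}] {content}"

def accept(presenter, site, content):
    """Believe 'site says content' only if a trusted presenter showed it."""
    return presenter in TRUSTED_PRESENTERS

# Identical pixels, different presenters:
print(render("B", "bank.example", "enter your password"))  # drawn by trusted B
print(render("A", "bank.example", "enter your password"))  # drawn by impostor A
```

Here `accept("B", …)` is true and `accept("A", …)` is false, but since `render` ignores the presenter, nothing on screen distinguishes the two cases.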

Since software can do pretty much what it wants and has access to the internet, it is always possible for software to change its look and feel once running on the user’s computer. There may be some aspects of the UI that the program cannot affect, such as the icon, but these may in many circumstances be hard to spot. The problem is that once an application takes over a window, or even a whole screen, it can decide most of the UI features and make itself look like any other application, thereby misleading the user into a mistaken attribution.

In a 2015 paper entitled “What the App is That? Deception and Countermeasures in the Android User Interface”, a team of researchers looked into what they called GUI confusion attacks on Android, that is ways Apps can mislead people into mistaking one app for another. They write:

What compromises user security (and we consider the root cause of our attacks) is that there is simply no way for the user to know with which application she is actually interacting

What is needed, therefore, is a part of the screen that cannot be altered by any application and that the user recognises as such. If only one screen is available, then it has to be divided into two parts, one of which is always under the control of the OS.

For this they took inspiration from the UI design of web browsers and added a similar feature to their modified version of Android, as shown in the picture to the left. By adding a clear space in the UI controlled by the OS, they could display the official name of the author of the App. The paper contains the results of their user studies, which show that this was very effective in helping people spot such malicious apps.
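The countermeasure can be sketched as a screen split into an app-controlled area and a bar that only OS code can write. The class and method names below are hypothetical, not taken from the paper:

```python
class Screen:
    """Toy model of a display with an OS-reserved trusted region."""
    def __init__(self):
        self.trusted_bar = ""   # writable only via the OS call below
        self.app_area = ""      # fully under the foreground app's control

    def os_set_foreground(self, verified_author, app_pixels):
        # Only the OS knows which app is foreground and who signed it.
        self.trusted_bar = f"Running: {verified_author}"
        self.app_area = app_pixels

    def app_draw(self, pixels):
        # An app can repaint its own area, but can never reach trusted_bar.
        self.app_area = pixels

screen = Screen()
screen.os_set_foreground("Acme Bank Inc. (verified)", "login form")
screen.app_draw("pixel-perfect imitation of another app")
# screen.trusted_bar still names the real author of what is on screen
```

However convincingly the app area imitates another application, the trusted bar keeps reporting the verified author, restoring the context the attack erased.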

On desktop or laptop screens it often happens that people want to give the whole screen over to an application, for example to watch a video. In that case one would need two screens: the main one, which can be entirely controlled by the application, and another controlled much more strictly by the OS. As it happens, Apple’s MacBook Pro now comes with a second screen, appearing above the keyboard, known as the Touch Bar.

It looks as if this is configurable by the application, but it could be designed by Apple in such a way that part of it, say the application’s icon, always remained under the control of the OS. That icon would act as a hardwired ‘About’ button: it would show the official icon as registered with some equivalent of companieshouse.gov.uk, and, on being clicked, open on the main screen full details of the maker of the software, as given by a future institutional web of trust and detailed in a previous post, “Stopping (https) Phishing”. This official description of the company could be shown to the user on first opening the application, and any time there is a major change of ownership or a legal problem at the company producing the app. In that case, the Touch Bar could turn into a news ticker for official information about that company, scrolling the important changes the user should know about.

For web browsers, there could then be a second button showing the official icon, from the institutional web of trust, of the website the user is looking at, and perhaps next to it the full URL of that page, or at the very least the fully spelt-out domain. When clicked, this would open a page giving all the information about the owners of that website from the institutional web of trust, just as was done for the application. If this information is compelling enough, people will find it natural to check it from time to time.

This would then give us a clean and secure way for the computer to let us know that App says { S says MainScreenContent } in a way that could not be spoofed, using information from an institutional web of trust, tied to a web of nations.

For personal web sites that may not be part of an institutional web of trust, there should be a button that shows the user through what path of links he reached that page, integrating institutional web of trust information where it exists. This would go beyond the current tab history to also show other pages, linked to from that site, that the user may have visited, so that he can build up an information-geographical view of the site that is enriched over time. After all, the web has worked until now because people have linked to pages that they one way or another know to be serious or trustworthy, and this peer-to-peer relation of links created the web. Mining this information is what made Google famous, with its PageRank algorithm. We don’t want to lose that richness of peer-to-peer trust, but to strengthen it by integrating the legal system into the web, forcing phishers to acquire a few PhDs to be effective, at which point they will find that there are far more interesting things to do in life.

For more, see the evolving discussion on Twitter.


Henry Story is writing his PhD on http://co-operating.systems/ . A Social Web Architect, he develops in Scala ideas guided by Philosophy, and a little Category Theory.