Automated testing of AJAX content from a third party

Usually, when you are creating an automated E2E suite with Selenium, or just some REST tests, you have access to the developers and can ask them to simplify your work. You can negotiate adding special ids to elements in the DOM so you don’t end up with flaky E2E tests that fail every time an element is moved; you can have an endpoint added to the API to simplify authorization for tests or environment creation.

But when the service you are developing the suite for depends on a third party, and the content of its pages is filled in from sources outside your reach, you don’t have the luxuries mentioned above.

If you’re lucky, the third party won’t block your requests and will show its content to your Selenium tests. But some of us are not that lucky.

I took part in developing a suite for a service that simplifies placing PPC advertising campaigns. Although all the pages and endpoints I tried to test were owned by our team, a big part of the relevant content was supplied by a third party.

While working on the suite, I ran into problems such as:

  • the need to log AJAX request data in different browsers in E2E tests
  • the need to recognize whether content was loaded correctly and actually shown

Log AJAX requests while browsing with Selenium

If you think about it, under perfect conditions this problem could be classified as trivial. You just need to connect your Driver to a proxy and let it log everything you need.
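As a rough illustration of the "connect your Driver to a proxy" step, here is a minimal Python sketch. The helper names, the address 127.0.0.1:8080 and the choice of Chrome are all assumptions made for the example; the suite described in this post used Selenide, so treat this as an outline of the idea rather than the actual setup.

```python
def chrome_proxy_args(host="127.0.0.1", port=8080, ignore_cert_errors=True):
    """Build Chrome command-line flags that route browser traffic via a proxy."""
    args = [f"--proxy-server=http://{host}:{port}"]
    if ignore_cert_errors:
        # An intercepting proxy re-signs HTTPS traffic, so Chrome would
        # otherwise reject its certificate. Use this in test environments only.
        args.append("--ignore-certificate-errors")
    return args


def make_proxied_driver(host="127.0.0.1", port=8080):
    """Start Chrome through Selenium with the proxy flags applied."""
    # Imported lazily so chrome_proxy_args() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_proxy_args(host, port):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

With a proxy listening on that address, every page the returned driver opens goes through it, so the proxy sees the AJAX calls as well as the page loads.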

But in reality, a third-party service will probably stop responding to you if it notices even the smallest sign of a proxy. That is the kind of third party I ran into.

We tried to proxy the traffic via Charles (to get at least some results) and Browsermob (as it is very simple to connect to Selenium), but failed with both of them.

The problem is that Browsermob does forward your requests, but it adds headers to them that make it possible to identify you as a proxy user. In particular, I’ve seen the ‘HTTP_VIA’ header and a modified ‘HTTP_X_FORWARDED_FOR’ header, and that was enough.

You can try to modify those headers in the ‘proxy_to_server’ method of Browsermob, but it will add its own headers after yours. And while ‘HTTP_VIA’ can be removed, ‘X_FORWARDED_FOR’ will be added back even if you asked for it to be removed.
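To make the detection mechanism concrete, here is a small sketch of the kind of check a server could run on incoming request headers. It is only a guess at the general idea, not the third party's real logic, and the function name and sample values are made up for the example. ‘HTTP_VIA’ and ‘HTTP_X_FORWARDED_FOR’ are the server-side (CGI-style) names of the Via and X-Forwarded-For request headers.

```python
# Header names (lowercase) that typically appear only when a proxy is involved.
PROXY_HEADERS = {"via", "x-forwarded-for"}


def proxy_revealing_headers(headers):
    """Return the sorted names of headers that hint at a proxy in the chain."""
    return sorted(name for name in headers if name.lower() in PROXY_HEADERS)


# Sample data for illustration only.
direct = {"Host": "example.com", "User-Agent": "Mozilla/5.0"}
proxied = dict(direct, **{"Via": "1.1 some-proxy", "X-Forwarded-For": "203.0.113.7"})

print(proxy_revealing_headers(direct))   # []
print(proxy_revealing_headers(proxied))  # ['Via', 'X-Forwarded-For']
```

The same helper can be pointed at the request headers your own proxy setup produces, to see whether it would give you away before the third party does.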

So, to solve this problem, I used a proxy named ‘mitmproxy’. It’s an open-source Python project that can forward your requests and let you see logs of them.

It has a wonderful console debugger and a Python API. For our needs it was necessary to save those logs somewhere else, so we had the proxy dump its data into a ‘flow’ file while it was working, and that file is later transformed into a HAR.

To do so, you basically need two console commands:

mitmproxy -w dump
mitmproxy -nr dump -s " result.har"

After that, you can work with the HAR archive in your code and assert anything you want.

Note the script passed with ‘-s’ in the second command. It is the script that transforms your flow file into a HAR. It can be found in the repository of the proxy.
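Once the HAR file exists, assertions on it can look roughly like the sketch below. It relies only on the standard HAR layout (‘log’, ‘entries’, then ‘request’ and ‘response’ per entry); the function names and the URL fragment in the usage note are hypothetical.

```python
import json


def load_entries(har_path):
    """Read a HAR file and return its list of request/response entries."""
    with open(har_path, encoding="utf-8") as fh:
        return json.load(fh)["log"]["entries"]


def find_requests(entries, url_part):
    """Return the entries whose request URL contains the given fragment."""
    return [e for e in entries if url_part in e["request"]["url"]]


def assert_all_ok(entries):
    """Fail if any of the given entries got a 4xx or 5xx response."""
    failed = [
        (e["request"]["url"], e["response"]["status"])
        for e in entries
        if e["response"]["status"] >= 400
    ]
    assert not failed, f"failed requests: {failed}"
```

A test could then call something like `ads = find_requests(load_entries("result.har"), "ads.example.com")`, check that `ads` is not empty, and pass it to `assert_all_ok`.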

Recognize advertisement

Now we come to the part where we needed to assert the visibility and contents of the third-party content. Under normal conditions there are a lot of ways to do it, such as getting an element from the DOM and looking for attributes that indicate correct behavior. But third-party content may come to you in very different forms. Sometimes it’s an iframe, sometimes it’s plain text followed by an image.

How do you recognize that the content really is the content and not some error, when you don’t know what all the possible errors look like? Use computer vision. It’s 2017 and the Internet is full of recognition tools. Some of them even let you operate via HTTP: send them a picture and they will tell you what’s in it.

I quite like Google Cloud for such tasks. Their Vision API can be used with a simple request: all you need to do is take a screenshot with Selenium, send it with your favorite HTTP library, and get the category list out of the response. With that list you can make assertions and be sure of what was on the page when the test ran.
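The screenshot-to-categories round trip can be sketched as follows. This is a minimal outline that assumes the Vision REST endpoint ‘images:annotate’ with the LABEL_DETECTION feature and an API key for authentication; the function names are made up for the example, and error handling is left out.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(png_bytes, max_results=10):
    """Wrap raw image bytes into a label-detection request body."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(png_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }


def extract_labels(response_json):
    """Map each returned label (lowercased) to its confidence score."""
    annotations = response_json["responses"][0].get("labelAnnotations", [])
    return {a["description"].lower(): a["score"] for a in annotations}


def detect_labels(png_bytes, api_key):
    """Send the screenshot to the Vision API and return its labels."""
    body = json.dumps(build_label_request(png_bytes)).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,  # passing data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_labels(json.load(response))
```

With Selenium's Python bindings, the bytes can come from `driver.get_screenshot_as_png()`, so a test would call something like `detect_labels(driver.get_screenshot_as_png(), api_key)`.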

Note, though, that such services won’t tell you with 100% certainty that the content is what you expect. For example, the service may report that a screenshot contains an advertisement only some of the time. To make the results as reliable as possible, look at the category lists you get back and analyze them so that your assertions cover all the positive cases.
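One way to cover several positive cases is to accept any label from a list you have collected from real runs, above a score threshold. The sketch below assumes the labels arrive as a mapping of lowercase label to score; the accepted labels and the threshold are placeholders for illustration, not values from the actual suite.

```python
# Placeholder labels; in practice, collect these from real screenshots.
ACCEPTED_LABELS = ("advertising", "banner", "poster")


def looks_like_ad(labels, accepted=ACCEPTED_LABELS, min_score=0.6):
    """Return True if any accepted label is present with a high enough score."""
    return any(labels.get(name, 0.0) >= min_score for name in accepted)


print(looks_like_ad({"advertising": 0.91, "text": 0.80}))  # True
print(looks_like_ad({"banner": 0.30, "text": 0.95}))       # False
print(looks_like_ad({}))                                   # False
```

Keeping the accepted labels and the threshold in one place makes it easy to extend them whenever a valid advertisement gets classified in a new way.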

So in the end the test suite came to depend on a proxy and a recognition service, in addition to Selenium (Selenide), browser drivers, and HTTP and JSON libraries. In return, I can now say with confidence that it verifies the content is shown correctly and without errors.