Appium with Image Recognition

Szymon Kazmierczak
Feb 8, 2016

It’s time for another Appium guide — this time it’s a simple implementation of image recognition to enable finding elements with images.

Code can be found in my GitHub repository -> https://github.com/Simon-Kaz/AppiumFindByImage

‘Neko Atsume’ Viewed in Appium GUI

In certain apps and the majority of games, you will not be able to access the elements on the screen. When you load the app and open the Appium GUI, you will be presented with a single Android View rather than a layout with elements, as in the ‘Neko Atsume’ capture above.

To work around this problem, I decided to use OpenCV image recognition to find elements on the screen using screenshots. To achieve that, I used the SikuliX API.

Due to the nature of the SikuliX API, the flow isn’t as straightforward as one would expect. You can’t run a comparison directly against the device; instead, you compare against a screenshot taken from the device.

The flow in my code is as follows:

  1. Take a screenshot of the device
  2. Compare the image of the element you want to find to the screenshot of the device
  3. If a match is found, return the coordinates of the centre of the element
  4. Use those coordinates to tap on the screen
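The four steps above can be sketched in plain Java. This is a minimal illustration, not the code from the repository: it swaps SikuliX/OpenCV matching for a naive exact pixel comparison, and the class name `FindByImageSketch`, the method `findCentre` and the in-memory "screenshot" are all hypothetical stand-ins. On a real device, step 1 would come from the Appium driver and step 4 would be a tap command.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.Optional;

public class FindByImageSketch {

    /**
     * Steps 2 and 3: slide the element image over the screenshot and return the
     * centre of the first position where every pixel matches exactly.
     * (Assumption: exact matching stands in for SikuliX's fuzzy matching.)
     */
    public static Optional<Point> findCentre(BufferedImage screenshot, BufferedImage element) {
        int maxX = screenshot.getWidth() - element.getWidth();
        int maxY = screenshot.getHeight() - element.getHeight();
        for (int y = 0; y <= maxY; y++) {
            for (int x = 0; x <= maxX; x++) {
                if (matchesAt(screenshot, element, x, y)) {
                    return Optional.of(new Point(x + element.getWidth() / 2,
                                                 y + element.getHeight() / 2));
                }
            }
        }
        return Optional.empty();
    }

    private static boolean matchesAt(BufferedImage screenshot, BufferedImage element,
                                     int offsetX, int offsetY) {
        for (int y = 0; y < element.getHeight(); y++) {
            for (int x = 0; x < element.getWidth(); x++) {
                if (screenshot.getRGB(offsetX + x, offsetY + y) != element.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Step 1 stand-in: a blank 100x60 "screenshot" with a red 10x10 "button" at (40, 20).
        BufferedImage screenshot = new BufferedImage(100, 60, BufferedImage.TYPE_INT_RGB);
        BufferedImage button = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 10; y++) {
            for (int x = 0; x < 10; x++) {
                screenshot.setRGB(40 + x, 20 + y, 0xFF0000);
                button.setRGB(x, y, 0xFF0000);
            }
        }

        // Steps 2 and 3: locate the element and get its centre.
        Optional<Point> centre = findCentre(screenshot, button);

        // Step 4 stand-in: a real test would tap at these coordinates.
        centre.ifPresent(p -> System.out.println("tap at " + p.x + "," + p.y));
    }
}
```

Returning an `Optional` keeps the "no match" case explicit, so the caller decides whether a missing element is a test failure or just a reason to retry.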

The main methods in the OCR class:

  1. clickByImage: the main method you should be using. It lets you find and tap on an element on the screen by passing in the path to the image of the element. It aggregates all the convenience methods into a single, easy-to-use method.
  2. takeScreenshot: convenience method that takes a screenshot and returns a BufferedImage for further processing.
  3. getCoords: requires the screenshot as a BufferedImage and the path to the image of the element we’re looking for. If a match is found, the coordinates are returned in a Point2D object.
  4. elementExists: returns true if the element is found on the screen.
  5. waitUntilImageExists: explicit wait using image recognition. Waits for the specified duration until a match for the specified image is found.

Usage

I have created some simple tests to showcase the framework in the OCRTest class.

As a huge fan of explicit waits (as opposed to hard-coded sleeps such as Thread.sleep), I created a custom explicit wait, waitUntilImageExists, that allows you to wait until your expected element is visible before acting on it.

A typical use case would be to:

  1. Wait for an element to be visible on the screen (waitUntilImageExists)
  2. Tap on the element using the clickByImage method.
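The wait-then-tap pattern can be illustrated with a small polling loop. This is a sketch under assumptions, not the repository's waitUntilImageExists: the class `ImageWaitSketch`, the method `waitUntil` and its parameters are hypothetical, and the condition passed in stands in for a call such as elementExists.

```java
import java.util.function.BooleanSupplier;

public class ImageWaitSketch {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical stand-in for an image check: the "element" appears after 300 ms.
        BooleanSupplier imageVisible = () -> System.currentTimeMillis() - start >= 300;

        if (waitUntil(imageVisible, 2000, 50)) {
            System.out.println("image found, safe to tap");
        } else {
            System.out.println("timed out waiting for image");
        }
    }
}
```

Unlike a fixed sleep, the loop returns as soon as the condition holds, so the test only pays the full timeout when the element never shows up.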

The image comparison requires a minimum similarity value. In my example it is set in the @Before method.

Settings.MinSimilarity = 0.8;

I found values of 0.9 and above to be flaky, requiring a pixel-perfect image. 0.8 with a small enough image of a unique element has proven to work best.
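To make the effect of the threshold concrete, here is a deliberately simplified sketch. It does not reproduce how SikuliX or OpenCV compute their score; as an assumption for illustration, it defines similarity as the fraction of identical pixels between two same-sized images. `SimilaritySketch`, `similarity` and `isMatch` are hypothetical names.

```java
import java.awt.image.BufferedImage;

public class SimilaritySketch {

    /** Fraction of pixels that are identical in both images (0.0 to 1.0). */
    public static double similarity(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same dimensions");
        }
        int same = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) == b.getRGB(x, y)) {
                    same++;
                }
            }
        }
        return (double) same / (a.getWidth() * a.getHeight());
    }

    /** A match is accepted only if the score reaches the minimum similarity. */
    public static boolean isMatch(BufferedImage a, BufferedImage b, double minSimilarity) {
        return similarity(a, b) >= minSimilarity;
    }

    public static void main(String[] args) {
        BufferedImage expected = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage onScreen = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        // Simulate rendering differences: 15 of the 100 pixels differ.
        for (int i = 0; i < 15; i++) {
            onScreen.setRGB(i % 10, i / 10, 0xFFFFFF);
        }
        System.out.println("score = " + similarity(expected, onScreen));
        System.out.println("match at 0.8: " + isMatch(expected, onScreen, 0.8));
        System.out.println("match at 0.9: " + isMatch(expected, onScreen, 0.9));
    }
}
```

With 15% of the pixels off, the image passes at 0.8 and fails at 0.9, which mirrors the flakiness described above: the stricter the threshold, the less rendering noise it tolerates.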

Shortcomings

  • Screenshots/images need to be rotated to find the match

It’s possible to handle this in the code (see TestDroid’s approach). I haven’t done this (yet). Feel free to submit a PR :)
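As a rough idea of what handling rotation in code could look like, the sketch below rotates an image by 90 degrees clockwise using only the Java standard library. It is not TestDroid's approach and not part of the repository; `RotateSketch` and `rotateClockwise` are hypothetical names, and which direction a given device needs is left as an assumption for the caller.

```java
import java.awt.image.BufferedImage;

public class RotateSketch {

    /** Returns a copy of the image rotated 90 degrees clockwise. */
    public static BufferedImage rotateClockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap places after a quarter turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Source pixel (x, y) lands at (h - 1 - y, x) in the rotated image.
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage landscape = new BufferedImage(1920, 1080, BufferedImage.TYPE_INT_RGB);
        BufferedImage portrait = rotateClockwise(landscape);
        System.out.println(portrait.getWidth() + "x" + portrait.getHeight());
    }
}
```

The rotated copy could then be passed to the matching step in place of the raw screenshot, so the reference images only need to exist in one orientation.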

  • You need a set of screenshots for each device that you want to run the scripts on.

That’s the nature of testing with images. It’s possible to resize and adjust the images, but you will still need to lower the similarity threshold, which in turn increases the chance of an incorrect match. This is the main reason why image testing is not a high priority for me: the set of screenshots to maintain grows with every device/resolution added.
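Resizing a reference image for another resolution might look like the sketch below. It assumes the UI scales uniformly with screen width, which is not guaranteed for every app; `ScaleSketch`, `scaleForDevice` and the example widths are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ScaleSketch {

    /**
     * Scales a reference image captured on a screen of referenceScreenWidth pixels
     * so that it fits a screen of targetScreenWidth pixels.
     */
    public static BufferedImage scaleForDevice(BufferedImage reference,
                                               int referenceScreenWidth,
                                               int targetScreenWidth) {
        double factor = (double) targetScreenWidth / referenceScreenWidth;
        int w = Math.max(1, (int) Math.round(reference.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(reference.getHeight() * factor));

        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = scaled.createGraphics();
        // Bilinear interpolation smooths edges, so the result is rarely pixel-identical
        // to what the target device actually renders.
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(reference, 0, 0, w, h, null);
        g.dispose();
        return scaled;
    }

    public static void main(String[] args) {
        // A 100x50 element image captured on a 1080 px wide screen, scaled for 720 px.
        BufferedImage reference = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage scaled = scaleForDevice(reference, 1080, 720);
        System.out.println(scaled.getWidth() + "x" + scaled.getHeight());
    }
}
```

Because interpolation alters pixel values, a scaled image will typically score lower than a native capture, which is why resizing goes hand in hand with a lower similarity threshold.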

  • Even the smallest change to the UI will cause the image to be invalid

We’re using images of the UI to find elements on the screen. Any change, be it size, color, displayed text etc., will require us to update the screenshots. Note that this is why, in regular apps, the use of static, unique identifiers is highly recommended as the go-to selector strategy.

Improvements (to be made):

  • Make waitUntilImageExists return the coordinates so that you can chain the commands or even wrap both wait and click in a single convenience method.
  • Use SikuliX server to speed up image processing.
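The first improvement could take a shape like the following. This is only a sketch of the idea, not planned repository code: `WaitForCoordsSketch`, `waitForCoords` and `waitAndTap` are hypothetical, the finder stands in for a call such as getCoords on a fresh screenshot, and the tap action stands in for the driver's tap command.

```java
import java.awt.Point;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class WaitForCoordsSketch {

    /** Polls the finder until it yields coordinates or the timeout elapses. */
    public static Optional<Point> waitForCoords(Supplier<Optional<Point>> finder,
                                                long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        do {
            Optional<Point> coords = finder.get();
            if (coords.isPresent()) {
                return coords;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        } while (System.nanoTime() - deadline < 0);
        return Optional.empty();
    }

    /** Wait and tap in one call; returns false if the image never appeared. */
    public static boolean waitAndTap(Supplier<Optional<Point>> finder, Consumer<Point> tap,
                                     long timeoutMillis, long pollMillis) {
        Optional<Point> coords = waitForCoords(finder, timeoutMillis, pollMillis);
        coords.ifPresent(tap);
        return coords.isPresent();
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Hypothetical finder: the image is "found" on the third attempt.
        Supplier<Optional<Point>> finder = () ->
                ++attempts[0] < 3 ? Optional.<Point>empty() : Optional.of(new Point(45, 25));

        boolean tapped = waitAndTap(finder,
                p -> System.out.println("tap at " + p.x + "," + p.y), 1000, 20);
        System.out.println("tapped: " + tapped);
    }
}
```

Returning the coordinates from the wait avoids a second screenshot and comparison just to tap on something that was located a moment earlier.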

Feel free to suggest changes/improvements, be it via pull requests or comments.
