If Appium had a Brain

Appium has a brain today: yours. What if test frameworks had a brain of their own? What if you could ask that little brain to ‘click the search button’ for you? What if you could ask the brain to ‘verify this is the login page’? Oh, and while the brain is doing that, it could go ahead and enter an email and password combination whenever it sees the login page. What if the brain was able to do all of this out of the box, and worked on applications it has never seen before? What if Appium had a brain?

Appium and Selenium are the great software test frameworks of the day. They do all the hard work of communicating between the test case logic and the world’s devices, browsers, and apps. However, they still require humans to hand-craft the method of finding each element on the screen, because of course, these elements are unique to each application. Test case automation built on these frameworks is dependent on the cleverness of both the test developer and the product developer to make sure these elements can be found. These tests need even more cleverness to make sure they can still find the elements if they change slightly in a newer build. Using human brains to define these element selectors is a frustrating and expensive aspect of writing test cases. What if we gave Appium a brain so that, just like a human, it could find these elements all on its own?

What would such a brain look like? How would it be trained? The brain would need to have ‘eyes’ to see the screen. Having eyes sounds like a simple screenshot, and Appium can already do that. The brain would also have to be taught what a sign-in button typically looks like. That sounds a lot like how machine learning can be trained to recognize dogs and cats in pictures, but instead of millions of pictures of cats and dogs, it needs pictures of buttons and text boxes. So, with a large number of element examples, a brain could be built to find these common elements in applications. A brain such as this could see pictures of the running application and find the elements the humans asked it to find. Seems like a pretty straightforward way to give Appium a brain.
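
To make that idea a little more concrete, here is a minimal Python sketch of the "eyes plus training" loop: take candidate regions from a screenshot, ask a classifier what each one looks like, and return the region that best matches the requested label. Everything named here (`Region`, `find_by_label`, `toy_classifier`, the label names, the 0.5 threshold) is hypothetical and invented for illustration; a real brain would replace the toy classifier with a trained image model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """A candidate rectangle cropped from a screenshot (pixel coordinates)."""
    x: int
    y: int
    width: int
    height: int
    pixels: bytes  # stand-in for the cropped image data a real model would read

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


# A classifier maps one cropped region to {label: confidence between 0 and 1}.
Classifier = Callable[[Region], Dict[str, float]]


def find_by_label(regions: List[Region], label: str,
                  classify: Classifier, threshold: float = 0.5) -> Optional[Region]:
    """Return the region that most confidently matches `label`, or None."""
    best, best_score = None, 0.0
    for region in regions:
        score = classify(region).get(label, 0.0)
        if score >= threshold and (best is None or score > best_score):
            best, best_score = region, score
    return best


def toy_classifier(region: Region) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model: keys off fake pixel data."""
    if region.pixels.startswith(b"magnifier"):
        return {"search_button": 0.93, "login_button": 0.02}
    if region.pixels.startswith(b"padlock"):
        return {"login_button": 0.88, "search_button": 0.05}
    return {}


# Three made-up regions, as if they had been cut out of one screenshot.
screen = [
    Region(0, 0, 320, 48, b"title-bar"),
    Region(272, 8, 32, 32, b"magnifier-icon"),
    Region(40, 400, 240, 44, b"padlock-sign-in"),
]

match = find_by_label(screen, "search_button", toy_classifier)
print(match.center() if match else "not found")
```

The returned center point is what a test framework would tap or click. The threshold is there so the brain can say "I don't see one" instead of guessing.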

What if the brain was trained on a large number of common labels? Well, then the brain could find the most common elements in just about any application.

How would the test author (a human, for now) ask such a brain to find something? Appium already has an API for finding elements in the application. Today, that API needs to be told the IDs, CSS selectors, or XPaths for the elements (see http://appium.io/docs/en/writing-running-appium/finding-elements/). What if there was a simpler way: adding a new AI-based locator strategy to the existing ‘findElement()’ function? Without such a brain, humans have to figure out those CSS selectors or XPaths, for example:

driver.FindElement(By.XPath("//*[@id='search']/ul/li[3]/h5"));

If Appium had a brain, perhaps the API calls could be:

driver.FindElement(By.Label("search_box"));

How would a human test automation engineer’s work change with the addition of brains to Appium? Humans would be happy to be freed from the tedious work of finding all those element IDs, CSS selectors, and XPaths. These hypothetical future-humans could also stop arguing with developers to add all of those magic IDs to the app. They could also spend less time debugging their test code when these IDs, CSS selectors, and XPaths in the product change. Humans would likely be a bit happier and have more time for test and code design.

What if this new brain was a bit slower than the hard-coded approach to finding elements? Test engineers would quickly learn to trade a little speed during execution for faster test authoring and less maintenance. What if this new brain was faster?

How would such a brain be added to Appium? There are several straightforward approaches, from minimally invasive to total integration. On the lighter side, a quick additional import to existing test code could overload the existing Appium API bindings to add these new AI-based features. You could also imagine all this just being built directly into Appium. Much of Appium’s value lies in the fact that it is an open source project, so it is likely that the training data and training systems for such a brain would also be open sourced. If the data and training were open sourced, anyone could improve upon them. Anyone could add new elements for the AI to train on and recognize. The community could even hold Kaggle (https://www.kaggle.com/) competitions to build better brains. You could imagine much of the work of software test automation becoming an open, reusable, collaborative exercise in machine learning.
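
As a rough illustration of the lighter-touch option, the Python sketch below wraps an existing driver and adds one extra locator strategy while passing every other call straight through. It is only a sketch under stated assumptions: `BrainyDriver`, the `"label"` strategy name, and `toy_brain` are hypothetical and not part of Appium, and `FakeDriver` stands in for a real driver (it mimics the `find_element(by, value)` and `get_screenshot_as_png()` method names of the Selenium-style Python client) so the example runs on its own.

```python
class FakeDriver:
    """Stand-in for a real driver so the sketch runs without Appium installed."""

    def find_element(self, by, value):
        return f"<element {by}={value}>"

    def get_screenshot_as_png(self):
        return b"fake-png-bytes"


class BrainyDriver:
    """Wraps a driver and adds a hypothetical 'label' locator strategy."""

    LABEL = "label"  # hypothetical strategy name, not part of Appium

    def __init__(self, driver, brain):
        self._driver = driver
        self._brain = brain  # callable: (screenshot_bytes, label) -> element or None

    def find_element(self, by, value):
        if by == self.LABEL:
            # Let the brain look at the current screen and pick the element.
            element = self._brain(self._driver.get_screenshot_as_png(), value)
            if element is None:
                raise LookupError(f"no element labelled {value!r}")
            return element
        # Every existing strategy (id, xpath, ...) passes through untouched.
        return self._driver.find_element(by, value)

    def __getattr__(self, name):
        # Delegate all other driver methods to the wrapped driver.
        return getattr(self._driver, name)


def toy_brain(screenshot, label):
    """Hypothetical placeholder for a trained model; it 'knows' one label."""
    return "<element at (288, 24)>" if label == "search_box" else None


driver = BrainyDriver(FakeDriver(), toy_brain)
print(driver.find_element("xpath", "//h5"))         # existing strategy still works
print(driver.find_element("label", "search_box"))   # new brain-backed strategy
```

Because the wrapper only intercepts the new strategy, existing tests would keep running unchanged, and the brain itself is just a parameter that could be swapped for a better one.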

What if every Appium test engineer could leverage machine learning in their projects with one straightforward API call? Would Appium engineers now be ‘machine learning engineers’ and get a corresponding pay increase? :)

How would test frameworks other than Appium get a brain too? They’d likely get similar brain functionality with a similar extension of their element-finding methods. What if every test framework had a brain? The same brain?

How soon could Appium have a brain? Perhaps an existing team working on such a brain could open source a reference implementation. Perhaps a little work could happen to extend the Appium framework to allow for pluggable brains. It seems it is just a matter of time.

What if Appium had a brain?

— Jason Arbon, CEO at test.ai
Bonus: https://goo.gl/N9Pdcm