Selenium AI Automation: Image Processing with Gemini

Vishal Mysore
8 min read · Apr 20, 2024


Tools4AI with Selenium can automate UI validation by verifying UI elements and checking them against design specifications. Rather than examining individual elements one at a time, as traditional UI validation does, this approach validates entire UI sections at once. As a result, integrating Tools4AI with Selenium can streamline testing and let you automate comprehensive verification of web-based applications.
With this integration, you can leverage a combination of natural language and Java code to create Selenium test scripts in a more human-readable format. This simplifies UI testing and increases efficiency by allowing non-programmers to write test scenarios in plain English.

Selenium Integration with Tools4AI

Tools4AI’s integration with Selenium introduces a flexible way to automate UI testing. Instead of traditional Java code for Selenium scripts, Tools4AI allows you to define test scenarios in plain English, offering a more accessible approach to testing web applications. These English-based commands can be converted into Selenium code to automate web-based interactions and streamline testing.

Example of Selenium Test with Tools4AI

WebDriver driver = new ChromeDriver();
SeleniumProcessor processor = new SeleniumProcessor(driver);

// Navigate to the website
processor.processWebAction("go to website https://the-internet.herokuapp.com");

// Check if a specific button is present
boolean buttonPresent = processor.trueFalseQuery("do you see Add/Remove Elements?");
if (buttonPresent) {
    // Perform a click action
    processor.processWebAction("click on Add/Remove Elements");
    // Further English-based instructions can be added
}

// Check if checkboxes are visible and interact with them
processor.processWebAction("go to website https://the-internet.herokuapp.com");
boolean isCheckboxPresent = processor.trueFalseQuery("do you see Checkboxes?");
if (isCheckboxPresent) {
    processor.processWebAction("click on Checkboxes");
    processor.processWebAction("select checkbox 1");
}

In this example, the SeleniumProcessor processes commands in plain English and converts them into Selenium actions. This approach allows for complex interactions without manually writing Java code for each test. Tools4AI serves as a bridge between natural language and Selenium, making it easier to automate UI testing in a way that is both efficient and intuitive.
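
To make the idea of "an English sentence becomes a browser action" more concrete, here is a toy sketch. It is not how Tools4AI works internally (the library relies on an AI model to interpret the command); the class WebCommandSketch and its simple prefix rules are hypothetical and only illustrate the mapping from a sentence to a structured action.

```java
// Hypothetical illustration only: a rule-based stand-in for the AI-driven
// interpretation of commands. None of these names come from Tools4AI.
final class WebCommandSketch {

    enum Kind { NAVIGATE, CLICK, UNKNOWN }

    static final class Action {
        final Kind kind;
        final String target;

        Action(Kind kind, String target) {
            this.kind = kind;
            this.target = target;
        }

        @Override
        public String toString() {
            return kind + "(" + target + ")";
        }
    }

    private static final String NAVIGATE_PREFIX = "go to website ";
    private static final String CLICK_PREFIX = "click on ";

    // Maps a plain-English command to a structured action using
    // case-insensitive prefix matching.
    static Action parse(String command) {
        String text = command.trim();
        if (text.regionMatches(true, 0, NAVIGATE_PREFIX, 0, NAVIGATE_PREFIX.length())) {
            return new Action(Kind.NAVIGATE, text.substring(NAVIGATE_PREFIX.length()).trim());
        }
        if (text.regionMatches(true, 0, CLICK_PREFIX, 0, CLICK_PREFIX.length())) {
            return new Action(Kind.CLICK, text.substring(CLICK_PREFIX.length()).trim());
        }
        return new Action(Kind.UNKNOWN, text);
    }

    public static void main(String[] args) {
        System.out.println(parse("go to website https://the-internet.herokuapp.com"));
        System.out.println(parse("click on Add/Remove Elements"));
    }
}
```

A fixed rule set like this breaks as soon as the wording changes, which is exactly the gap a language model is meant to close.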

Screen Validations
One of the most powerful and unique features of Tools4AI is its ability to convert an image into a structured Java object. This capability can be invaluable for various applications, especially in scenarios where you need to extract data from an image or a screenshot, then manipulate or analyze it within your Java code.
By transforming an image into a Java object, Tools4AI opens up a wide range of possibilities:

UI Testing and Validation: You can convert screenshots of a user interface into Java objects to validate UI elements and their attributes. This feature simplifies the testing process by allowing you to automate the verification of entire sections or components without manually interacting with individual elements.

Data Extraction: Tools4AI’s image-to-POJO functionality can be used to extract data from images, such as scanned documents, infographics, or screenshots, and convert it into a structured format. This can be useful for creating data-driven applications, automating workflows, or extracting information for further processing.

Simplifying Automated Tests: By converting images into Java objects, you can write automated tests that operate at a higher level of abstraction. Instead of interacting with individual web elements, you can work with an entire data structure that represents the content of a web page or UI screen.

Examples

Example 1

The picture above can be converted to a Java object with a few lines of code:

GeminiImageActionProcessor processor = new GeminiImageActionProcessor();
Sales sales = (Sales) processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/sales.PNG"), Sales.class);

The Sales pojo looks like this

@Getter
@Setter
@ToString
@NoArgsConstructor
public class Sales {

@MapKeyType(Integer.class)
@MapValueType(Double.class)
Map<Integer,Double> yearlySales;
}
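
If you have not used Lombok before, the annotations on Sales may look opaque. The sketch below is a hand-written approximation of what they generate. The class name PlainSales is made up for this illustration, and the Tools4AI @MapKeyType and @MapValueType annotations are left out so that it compiles with the JDK alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-written approximation of the members Lombok generates for Sales.
// PlainSales is a hypothetical name used only in this sketch.
final class PlainSales {
    private Map<Integer, Double> yearlySales;

    // Roughly what @NoArgsConstructor provides
    public PlainSales() {
    }

    // Roughly what @Getter provides
    public Map<Integer, Double> getYearlySales() {
        return yearlySales;
    }

    // Roughly what @Setter provides
    public void setYearlySales(Map<Integer, Double> yearlySales) {
        this.yearlySales = yearlySales;
    }

    // Roughly what @ToString provides: ClassName(field=value)
    @Override
    public String toString() {
        return "PlainSales(yearlySales=" + yearlySales + ")";
    }

    public static void main(String[] args) {
        Map<Integer, Double> byYear = new LinkedHashMap<>();
        byYear.put(2013, 58.0);
        byYear.put(2015, 67.0);

        PlainSales sales = new PlainSales();
        sales.setYearlySales(byYear);
        System.out.println(sales); // PlainSales(yearlySales={2013=58.0, 2015=67.0})
    }
}
```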

If you don't want to create a POJO, you can get the data as a simple HashMap or a JSON string instead

log.info(processor.imageToJson(GeminiImageExample.class.getClassLoader().
getResource("images/sales.PNG"),"sales in 2013"));

and the response would be

INFO: {"fields":[{"fieldName":"sales in 2013","fieldType":"String","fieldValue":"58"}]}

Or you can get multiple values in this way

log.info(processor.imageToJson(GeminiImageExample.class.getClassLoader().
getResource("images/sales.PNG"),"sales in 2013", "sales in 2015"));

and your output will be

INFO: {
"fields": [
{
"fieldName": "sales in 2013",
"fieldType": "String",
"fieldValue": "58"
},
{
"fieldName": "sales in 2015",
"fieldType": "String",
"fieldValue": "67"
}
]
}
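
Because every value comes back as a string in the "fields" array, a test usually has to pull the values out and convert them before asserting on them. The sketch below does that with a regular expression so that it runs on the JDK alone; FieldsJsonSketch is a hypothetical helper, it assumes the exact fieldName/fieldType/fieldValue order shown above, and in a real project a JSON library such as Jackson or Gson would be the more robust choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts fieldName -> fieldValue pairs from the
// response shape shown above. Assumes the key order fieldName, fieldType,
// fieldValue and no escaped quotes inside the values.
final class FieldsJsonSketch {

    private static final Pattern FIELD = Pattern.compile(
            "\"fieldName\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*"
          + "\"fieldType\"\\s*:\\s*\"[^\"]*\"\\s*,\\s*"
          + "\"fieldValue\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, String> extract(String json) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(json);
        while (matcher.find()) {
            values.put(matcher.group(1), matcher.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String json = "{\"fields\":["
                + "{\"fieldName\":\"sales in 2013\",\"fieldType\":\"String\",\"fieldValue\":\"58\"},"
                + "{\"fieldName\":\"sales in 2015\",\"fieldType\":\"String\",\"fieldValue\":\"67\"}]}";

        Map<String, String> values = extract(json);
        double sales2013 = Double.parseDouble(values.get("sales in 2013"));
        double sales2015 = Double.parseDouble(values.get("sales in 2015"));

        // Growth between the two years read from the chart
        System.out.println(sales2015 - sales2013); // 9.0
    }
}
```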

Example 2

We will convert this pie chart image into a Java object

FoodConsumption foodConsume = (FoodConsumption) processor.
imageToPojo(GeminiImageExample.class.getClassLoader().
getResource("images/PieChart.PNG"), FoodConsumption.class);
log.info(foodConsume.toString());

FoodConsumption Pojo looks like this

@Getter
@Setter
@NoArgsConstructor
@ToString
public class FoodConsumption {
@MapValueType(Double.class)
@MapKeyType(String.class)
private Map<String, Double> foodTypeToPercentage;
}

Output looks like this

INFO: FoodConsumption(foodTypeToPercentage={Rice Dishes=0.3, Leafy 
Greens=0.15, Soups=0.25, Root Vegetables=0.2, Hot Drinks=0.1})
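
Once the chart is a map, validating it is ordinary Java. As a minimal sketch, the check below confirms that the extracted shares add up to 1.0 within a tolerance; PieChartCheckSketch and sumsToOne are hypothetical names, and the map is built by hand here in place of the object returned by imageToPojo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical validation helper for a map of pie chart shares.
final class PieChartCheckSketch {

    // Returns true when the shares add up to 1.0 within the given tolerance.
    static boolean sumsToOne(Map<String, Double> shares, double tolerance) {
        double total = 0.0;
        for (double share : shares.values()) {
            total += share;
        }
        return Math.abs(total - 1.0) <= tolerance;
    }

    public static void main(String[] args) {
        // Same values as the logged FoodConsumption output above
        Map<String, Double> foodTypeToPercentage = new LinkedHashMap<>();
        foodTypeToPercentage.put("Rice Dishes", 0.3);
        foodTypeToPercentage.put("Leafy Greens", 0.15);
        foodTypeToPercentage.put("Soups", 0.25);
        foodTypeToPercentage.put("Root Vegetables", 0.2);
        foodTypeToPercentage.put("Hot Drinks", 0.1);

        System.out.println(sumsToOne(foodTypeToPercentage, 1e-9)); // true
    }
}
```

A tolerance is used because summing doubles rarely lands on exactly 1.0, and because values read from an image may be rounded.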

Example 3

log.info(processor.imageToPojo(GeminiImageExample.class.getClassLoader().
getResource("images/FruitsSold.PNG"), WeeklyFruitSales.class).toString());

and our pojo looks like this

@Getter
@Setter
@NoArgsConstructor
@ToString
public class WeeklyFruitSales {
@ListType(DailyFruitSales.class)
private List<DailyFruitSales> dailySales;

// Constructor
public WeeklyFruitSales(List<DailyFruitSales> dailySales) {
this.dailySales = dailySales;
}

// Getter and Setter
public List<DailyFruitSales> getDailySales() {
return dailySales;
}

public void setDailySales(List<DailyFruitSales> dailySales) {
this.dailySales = dailySales;
}
}

and output is

INFO: WeeklyFruitSales(dailySales=[DailyFruitSales(dayOfWeek=Monday, 
fruitSales={Mango=12, Orange=10, Banana=5}), DailyFruitSales
(dayOfWeek=Tuesday, fruitSales={Mango=15, Orange=13, Banana=6}),
DailyFruitSales(dayOfWeek=Wednesday, fruitSales={Mango=7, Orange=9,
Banana=6}), DailyFruitSales(dayOfWeek=Thursday, fruitSales={Mango=6,
Orange=14, Banana=5}), DailyFruitSales(dayOfWeek=Friday, fruitSales
={Mango=19, Orange=17, Banana=8}), DailyFruitSales(dayOfWeek=Saturday,
fruitSales={Mango=19, Orange=21, Banana=10}), DailyFruitSales(dayOfWeek
=Sunday, fruitSales={Mango=15, Orange=21, Banana=9})])
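
The DailyFruitSales class is not shown in this article. Judging only from the logged output, it holds a day name and a map from fruit to units sold; the sketch below assumes that shape (using plain Java instead of Lombok) and shows how the extracted list could be aggregated into per-fruit totals. FruitTotalsSketch and its helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FruitTotalsSketch {

    // Assumed shape, inferred from the logged output; not the author's class.
    static final class DailyFruitSales {
        final String dayOfWeek;
        final Map<String, Integer> fruitSales;

        DailyFruitSales(String dayOfWeek, Map<String, Integer> fruitSales) {
            this.dayOfWeek = dayOfWeek;
            this.fruitSales = fruitSales;
        }
    }

    // Convenience factory for the three fruits that appear in the chart.
    static DailyFruitSales day(String dayOfWeek, int mango, int orange, int banana) {
        Map<String, Integer> sales = new LinkedHashMap<>();
        sales.put("Mango", mango);
        sales.put("Orange", orange);
        sales.put("Banana", banana);
        return new DailyFruitSales(dayOfWeek, sales);
    }

    // Sums the units sold per fruit across all days.
    static Map<String, Integer> totals(List<DailyFruitSales> days) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (DailyFruitSales daily : days) {
            for (Map.Entry<String, Integer> entry : daily.fruitSales.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Monday and Tuesday values from the output above
        List<DailyFruitSales> days = Arrays.asList(
                day("Monday", 12, 10, 5),
                day("Tuesday", 15, 13, 6));
        System.out.println(totals(days)); // {Mango=27, Orange=23, Banana=11}
    }
}
```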

Example 4

Imagine you’re testing an online library system, and you encounter a complex user interface with various elements representing different sections, like books, members, and other UI components. Traditionally, you would inspect each element individually to validate its content and functionality. This involves identifying and interacting with each UI element separately, which can be time-consuming and error-prone.
Tools4AI transforms this process by converting an entire screen or webpage into a Plain Old Java Object (POJO). This lets you extract the structure and content of a complex UI in one step, which significantly streamlines the testing process.

The POJOs look like this

@Setter
@Getter
@NoArgsConstructor
@ToString
public class LibraryScreen {
@ListType(Book.class)
private List<Book> latestBooks;
@ListType(Member.class)
private List<Member> members;
}

and

@Setter
@Getter
@NoArgsConstructor
@ToString
public class Book {
private String title;
private String author;
private String genre;
private boolean isAvailable;
// Constructors, getters, and setters
}

and

@Setter
@Getter
@NoArgsConstructor
@ToString
public class Member {
private String id;
private String name;
@Prompt(dateFormat = "ddMMyyyy")
private Date membershipStart;
private int booksLoaned;
// Constructors, getters, and setters
}

Please pay special attention to @Prompt(dateFormat = "ddMMyyyy") as this will convert the date to the specified format automatically
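
For readers unfamiliar with the pattern letters, here is a small sketch of what "ddMMyyyy" means: two digits for the day, two for the month, four for the year. It uses java.time from the JDK and assumes the dateFormat value follows the usual Java date pattern conventions; how Tools4AI applies the pattern internally is not shown here, and DatePatternSketch is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the "ddMMyyyy" pattern using java.time.
final class DatePatternSketch {

    private static final DateTimeFormatter COMPACT = DateTimeFormatter.ofPattern("ddMMyyyy");

    // Parses text such as "12012023" (12 January 2023).
    static LocalDate parseCompact(String text) {
        return LocalDate.parse(text, COMPACT);
    }

    public static void main(String[] args) {
        System.out.println(parseCompact("12012023")); // 2023-01-12
    }
}
```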
Since this is a complex screen, we can first convert the image to text and then use a transformer to turn that text into the POJO

String text = processor.imageToText(GeminiImageExample.class.getClassLoader()
.getResource("images/library.PNG"),"convert the entire screen to text");

GeminiV2PromptTransformer transformer = new GeminiV2PromptTransformer();

log.info(transformer.transformIntoPojo(text, LibraryScreen.class).toString());

and the entire screen will be converted to POJO

INFO: LibraryScreen(latestBooks=[Book(title=The Great Gatsby, 
author=F. Scott Fitzgerald, genre=Fiction, isAvailable=true),
Book(title=1984, author=George Orwell, genre=Dystopian, isAvailable=false)],
members=[Member(id=001, name=Alice Smith, membershipStart=Thu Jan 12 00:00:00
EST 2023, booksLoaned=4), Member(id=002, name=Bob Johnson,
membershipStart=Wed Feb 15 00:00:00 EST 2023, booksLoaned=2)])
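
With the screen available as an object, section-level checks become plain Java. The sketch below uses simplified stand-ins for Book and Member (only the fields needed here, written without Lombok) and two hypothetical helpers to show the kind of assertions a test could make on the extracted data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical checks on data shaped like the LibraryScreen output above.
final class LibraryChecksSketch {

    // Simplified stand-in for Book: only the fields used below.
    static final class Book {
        final String title;
        final boolean available;

        Book(String title, boolean available) {
            this.title = title;
            this.available = available;
        }
    }

    // Simplified stand-in for Member: only the fields used below.
    static final class Member {
        final String name;
        final int booksLoaned;

        Member(String name, int booksLoaned) {
            this.name = name;
            this.booksLoaned = booksLoaned;
        }
    }

    // Titles of all books currently marked as available.
    static List<String> availableTitles(List<Book> books) {
        List<String> titles = new ArrayList<>();
        for (Book book : books) {
            if (book.available) {
                titles.add(book.title);
            }
        }
        return titles;
    }

    // Total number of books on loan across all members.
    static int totalBooksLoaned(List<Member> members) {
        int total = 0;
        for (Member member : members) {
            total += member.booksLoaned;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("The Great Gatsby", true),
                new Book("1984", false));
        List<Member> members = Arrays.asList(
                new Member("Alice Smith", 4),
                new Member("Bob Johnson", 2));

        System.out.println(availableTitles(books));    // [The Great Gatsby]
        System.out.println(totalBooksLoaned(members)); // 6
    }
}
```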

Example 5

String jsonStr = processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/auto.PNG"),"Full Inspection");
log.info(jsonStr);

and the result is

INFO: {
"fields": [
{
"fieldName": "Full Inspection",
"fieldType": "String",
"fieldValue": "Starting at $99.99"
}
]
}

Or it can be converted into a POJO like this

@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
public class AutoRepairScreen {
double fullInspectionValue;
double tireRotationValue;
double oilChangeValue;
Integer phoneNumber;
String email;
String[] customerReviews;
}

Example 6

Imagine encountering a project report screen like the one shown in the image. Tools4AI can transform this screen into a Java object, allowing you to interact with its data in a structured manner for further analysis or testing.

We will convert this entire project report to a POJO so that we can act on its data

@NoArgsConstructor
@AllArgsConstructor
@Getter
@Setter
@EqualsAndHashCode
@ToString
public class ProjectDashboard {
@MapKeyType(String.class)
@MapValueType(Double.class)
private Map<String, Double> featuresImplemented; // Map of quarter to percentage
@MapKeyType(String.class)
@MapValueType(Double.class)
private Map<String, Double> expenses; // Map of quarter to percentage
@ListType(ProjectStatus.class)
private List<ProjectStatus> projectStatuses; // List of project status entries
@ListType(Task.class)
private List<Task> tasks; // List of tasks
@ListType(String.class)
private List<String> criticalItems; // List of critical items
@ListType(String.class)
private List<String> blockers;
}

and

@NoArgsConstructor
@AllArgsConstructor
@Getter
@Setter
@EqualsAndHashCode
@ToString
public class Task {
private String assignedTo;
private String priority;
private String status;
private int completion;
}

and

@NoArgsConstructor
@AllArgsConstructor
@Getter
@Setter
@EqualsAndHashCode
@ToString
public class ProjectStatus {
private String projectName;
@MapKeyType(String.class)
@MapValueType(Integer.class)
private Map<String, Integer> statusCounts;
}

and then we call our image processor

GeminiImageActionProcessor processor = new GeminiImageActionProcessor();
ProjectDashboard projectDashboard = (ProjectDashboard) processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/RAG.PNG"), ProjectDashboard.class);
log.info(projectDashboard.toString());

and here is the result

INFO: ProjectDashboard(featuresImplemented={1st Qtr=64.0, 3rd Qtr=11.0, 
2nd Qtr=25.0}, expenses={1st Qtr=25.0, 3rd Qtr=40.0, 2nd Qtr=35.0},
projectStatuses=[ProjectStatus(projectName=Neo, statusCounts={Issues=3,
Features=2, Backlog=4}), ProjectStatus(projectName=Wypal,
statusCounts={Issues=3, Features=1, Backlog=2}),
ProjectStatus(projectName=Dorake, statusCounts={Issues=1,
Features=3, Backlog=2}), ProjectStatus(projectName=Symphony,
statusCounts={Issues=2, Features=3, Backlog=1})],
tasks=[Task(assignedTo=John, priority=High, status=Done, completion=100),
Task(assignedTo=Smith, priority=Normal, status=In progress, completion=20),
Task(assignedTo=Zoya, priority=Low, status=Not started, completion=0),
Task(assignedTo=Ellie, priority=High, status=In progress, completion=40)],
criticalItems=[Order more RAM], blockers=[Server Upgrades, Core Processors])
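
A dashboard like this lends itself to consistency rules across fields. As a minimal sketch, the checks below verify that a map of quarterly percentages adds up to 100 and that each task's status agrees with its completion value. The rules themselves, the class DashboardChecksSketch, and the simplified Task stand-in are assumptions made for illustration, not part of Tools4AI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consistency checks on data shaped like the dashboard output.
final class DashboardChecksSketch {

    // Simplified stand-in for Task: only the fields used below.
    static final class Task {
        final String assignedTo;
        final String status;
        final int completion;

        Task(String assignedTo, String status, int completion) {
            this.assignedTo = assignedTo;
            this.status = status;
            this.completion = completion;
        }
    }

    // True when the quarterly percentages add up to 100 (within 0.01).
    static boolean sumsToHundred(Map<String, Double> quarters) {
        double total = 0.0;
        for (double value : quarters.values()) {
            total += value;
        }
        return Math.abs(total - 100.0) <= 0.01;
    }

    // Names of assignees whose status contradicts the completion value.
    static List<String> inconsistentTasks(List<Task> tasks) {
        List<String> problems = new ArrayList<>();
        for (Task task : tasks) {
            boolean doneButIncomplete = "Done".equals(task.status) && task.completion != 100;
            boolean notStartedButProgress = "Not started".equals(task.status) && task.completion != 0;
            if (doneButIncomplete || notStartedButProgress) {
                problems.add(task.assignedTo);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Double> expenses = new LinkedHashMap<>();
        expenses.put("1st Qtr", 25.0);
        expenses.put("2nd Qtr", 35.0);
        expenses.put("3rd Qtr", 40.0);

        List<Task> tasks = Arrays.asList(
                new Task("John", "Done", 100),
                new Task("Smith", "In progress", 20),
                new Task("Zoya", "Not started", 0),
                new Task("Ellie", "In progress", 40));

        System.out.println(sumsToHundred(expenses));  // true
        System.out.println(inconsistentTasks(tasks)); // []
    }
}
```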

Conclusion

Tools4AI’s image-to-Java object conversion feature provides a bridge between raw visual data and structured information, allowing you to process, analyze, and validate data in a way that is both efficient and accessible. It can streamline many tasks, from automated UI testing to data-driven applications, by enabling complex data manipulations with minimal effort.

Code for this article is here

🔥 Revolutionize Your UI Testing with Tools4AI! 🔥

Tired of manually inspecting every element in a complex UI? Tools4AI is here to change the game. This groundbreaking technology converts entire screens or images into structured Java objects (POJOs), enabling you to streamline your automated UI testing, data extraction, and data analysis. No more tedious element-by-element validation: Tools4AI does it all in one step!

Key Highlights:

🚀 Comprehensive Data Extraction: Convert entire screens into Java objects for efficient data processing.
⚡ Simplified Automation: Automate UI testing with a structured approach, interacting with POJOs instead of individual elements.
🔥 Enhanced Efficiency: Quickly validate entire UI sections to ensure design consistency and accuracy.

Uses:

🏥 Healthcare: Convert images of medical records into structured data to streamline electronic health records (EHR) management and automate data entry.
🛒 Retail: Turn product images into Java objects to enhance inventory management and automate real-time stock tracking.
🏫 Education: Convert classroom whiteboard notes or scanned documents into Java objects for automated digitization and learning analytics.
🏭 Manufacturing: Automate quality control by converting product images into Java objects to detect defects and ensure compliance with manufacturing standards.
💰 Finance: Transform financial reports or stock data into Java objects to automate financial analysis and streamline data processing.

#artificialintelligence #selenium #Java #automation #Gemini #OpenAI #AI
