CV_POM: Revolutionizing Web and App Testing with Object Detection

A screen showing the CV_POM framework using object detection on a website

In the fast-paced world of web and app testing, staying ahead is not just a goal; it's a necessity. That is why we brought object detection into website and application testing. Read on to learn about CV_POM and how object detection can elevate your testing game.

CV_POM: An All-Inclusive Framework

Imagine a framework that takes an image as input and returns JSON output: a page object model that lists every recognized element along with its object name and coordinates. We've built such a framework, called CV_POM, and it opens up a world of possibilities for developers and testers alike. In addition, the CVPOMDriver lets you connect it to your favorite automation tools, making your testing process smoother and more efficient. CV_POM's versatility extends beyond basic automation to a range of use cases:

  • Unique locators for UI elements. Users can leverage the JSON output to identify distinct locators for specific elements within the UI.
  • Effortless integration with automation frameworks. CV_POM integrates with popular automation frameworks such as Selenium or Appium, so users can automate interactions with the UI using familiar tools. With CVPOMDriver, you only need to override a couple of methods in the 'CVPOMDriver' class. Once that's done, you can use it to find and interact with UI elements in your app without worrying about the specifics.
  • Platform-agnostic automation. As CV_POM doesn't rely on application APIs, it becomes a generic solution applicable to any platform or app combination. Users can employ the same APIs for automation across various platforms.
  • Workflow automation based on UI representation. CV_POM goes beyond traditional automation by enabling users to automate workflows based on the UI's visual representation. This unique feature allows for the validation of element stylings and placements, addressing a commonly overlooked aspect in many automation frameworks.
  • Optical character recognition. CV_POM includes Optical Character Recognition (OCR) for individual detected objects, so the text inside an element, such as a button label, can be read and used in tests.
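To make the JSON output and the idea of unique locators more concrete, here is a minimal Python sketch. The field names (`label`, `x`, `y`, `width`, `height`, `text`) and the helpers `unique_locators` and `center` are hypothetical, chosen for illustration; CV_POM's actual JSON schema and API may differ. For each element, the sketch picks the simplest description that matches exactly one element: its class name alone, then class name plus recognized text, then a positional index as a last resort.

```python
import json
from collections import Counter

# Hypothetical page object model for one screenshot. CV_POM's real JSON
# schema may use different field names; these are assumptions.
POM_JSON = """
[
  {"label": "button",  "x": 40,  "y": 300, "width": 120, "height": 40, "text": "Log in"},
  {"label": "button",  "x": 200, "y": 300, "width": 120, "height": 40, "text": "Sign up"},
  {"label": "textbox", "x": 40,  "y": 200, "width": 280, "height": 36, "text": ""},
  {"label": "logo",    "x": 20,  "y": 20,  "width": 80,  "height": 80, "text": ""}
]
"""


def unique_locators(elements):
    """Return, for each element, the simplest locator that matches only it."""
    label_counts = Counter(e["label"] for e in elements)
    label_text_counts = Counter((e["label"], e["text"]) for e in elements)
    seen = Counter()  # running index per label, in list order
    locators = []
    for e in elements:
        if label_counts[e["label"]] == 1:
            loc = {"label": e["label"]}
        elif e["text"] and label_text_counts[(e["label"], e["text"])] == 1:
            loc = {"label": e["label"], "text": e["text"]}
        else:
            # Last resort: position among elements sharing this label.
            loc = {"label": e["label"], "index": seen[e["label"]]}
        seen[e["label"]] += 1
        locators.append(loc)
    return locators


def center(e):
    """Pixel coordinates of the element's center, e.g. for a click."""
    return (e["x"] + e["width"] // 2, e["y"] + e["height"] // 2)


elements = json.loads(POM_JSON)
for element, locator in zip(elements, unique_locators(elements)):
    print(locator, center(element))
```

Preferring class name and text over a positional index is a deliberate choice: an index breaks as soon as another element of the same class appears on screen, while a readable label usually survives layout changes.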

Fundamentally, CV_POM brings a new approach to automation, providing an inclusive solution that meets the diverse needs of testers. The CVPOMDriver simplifies UI automation, works across platforms, and helps verify that your app looks the way it should. Combined with the other features of CV_POM, it gives you a robust, all-encompassing solution that makes UI automation more effective and efficient. Give it a try and see how easy automation can be.
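The driver idea can be sketched as an abstract base class in which only the tool-specific operations are left to subclasses. This is a hypothetical illustration, not the actual `CVPOMDriver` interface: the class `VisionDriver`, the methods `_get_screenshot` and `_click_coordinates`, and the element schema are all assumptions. `FakeDriver` stands in for a real Selenium or Appium adapter so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

Element = Dict[str, Any]


class VisionDriver(ABC):
    """Hypothetical CVPOMDriver-style base class (not the real CV_POM API).

    High-level actions are written once against detected elements; a
    subclass only supplies the two tool-specific primitives.
    """

    def __init__(self, detect: Callable[[bytes], List[Element]]):
        # `detect` turns a screenshot into element dicts with
        # label/x/y/width/height keys (an assumed schema).
        self._detect = detect

    @abstractmethod
    def _get_screenshot(self) -> bytes:
        """Capture the current screen, e.g. as PNG bytes."""

    @abstractmethod
    def _click_coordinates(self, x: int, y: int) -> None:
        """Click at screen position (x, y) using the underlying tool."""

    def find_elements(self, label: str) -> List[Element]:
        """Detect elements on the current screen and keep those with `label`."""
        return [e for e in self._detect(self._get_screenshot()) if e["label"] == label]

    def click(self, label: str, index: int = 0) -> None:
        """Click the center of the index-th element with the given label."""
        matches = self.find_elements(label)
        if index >= len(matches):
            raise LookupError(f"no element #{index} with label {label!r}")
        e = matches[index]
        self._click_coordinates(e["x"] + e["width"] // 2, e["y"] + e["height"] // 2)


class FakeDriver(VisionDriver):
    """Stand-in for a Selenium/Appium adapter: records clicks instead of clicking."""

    def __init__(self, detect: Callable[[bytes], List[Element]]):
        super().__init__(detect)
        self.clicks = []

    def _get_screenshot(self) -> bytes:
        return b""  # a real adapter would return a screenshot here

    def _click_coordinates(self, x: int, y: int) -> None:
        self.clicks.append((x, y))
```

In a Selenium-backed subclass, `_get_screenshot` could return `driver.get_screenshot_as_png()`, and `_click_coordinates` could dispatch a pointer action at the given position. On high-DPI displays, screenshot pixels and the coordinates the browser expects may differ by the device pixel ratio, so an adapter may need to scale them.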

Unleashing the Power of Object Detection

Object detection, a valuable tool in computer vision, gives computers the ability to identify and locate objects within images or video streams. It goes beyond recognizing which objects are present by pinpointing where each one is in the image. Modern object detectors rely on deep learning models, such as convolutional neural networks (CNNs), to analyze visual information and make predictions. For each detected object, the model reports the object's class name, its location as a bounding box, and a confidence score.
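A detector's output for one screenshot can be modeled as a list of records like the ones below. This is a minimal Python sketch assuming a `Detection` type of our own and a default confidence threshold of 0.5; neither comes from CV_POM. It shows a common first post-processing step: discarding predictions the model is unsure about.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One prediction from an object detector (hypothetical record type)."""
    label: str                        # class name, e.g. "button"
    confidence: float                 # model score between 0 and 1
    box: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Drop detections below `threshold`; return the rest, most confident first."""
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


detections = [
    Detection("button", 0.91, (10, 10, 110, 50)),
    Detection("checkbox", 0.32, (0, 0, 20, 20)),
    Detection("link", 0.77, (5, 60, 80, 75)),
]
print([d.label for d in keep_confident(detections)])  # ['button', 'link']
```

A higher threshold yields fewer false positives but can miss faint or unusual elements, so the value is typically tuned per model.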

Object detection handles a wide variety of situations and adapts better than techniques such as template matching and feature detection. Template matching searches the screen for regions that look like predefined reference images, and feature detection concentrates on identifying distinctive key features; both methods struggle with dynamic content and with variations in size and appearance.
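The size problem with template matching is easy to reproduce. The pure-Python sketch below is an illustration, not part of CV_POM: it slides a template over a small grayscale grid and scores each position by the sum of squared differences (SSD), where 0 means a pixel-perfect match. The same icon rendered at twice the size no longer matches the original template; it only matches once the template is resized to the known new scale. An object detector trained on examples at many sizes does not need a separate template for each one.

```python
def match_template(image, template):
    """Exhaustive template matching by sum of squared differences (SSD).

    Returns ((row, col), score) for the best position; score 0 is a perfect match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best is None or ssd < best[1]:
                best = ((r, c), ssd)
    return best


def upscale(grid, factor):
    """Nearest-neighbour upscaling: simulates an element rendered larger."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]


def paste(size, patch, top, left):
    """Place `patch` on a blank size x size 'screen' at (top, left)."""
    canvas = [[0] * size for _ in range(size)]
    for i, row in enumerate(patch):
        for j, v in enumerate(row):
            canvas[top + i][left + j] = v
    return canvas


ICON = [[0, 9, 0], [9, 9, 9], [0, 9, 0]]  # a tiny "plus" icon
same_size = paste(10, ICON, 2, 3)
twice_size = paste(10, upscale(ICON, 2), 2, 3)
print(match_template(same_size, ICON))          # ((2, 3), 0): exact match
print(match_template(twice_size, ICON)[1] > 0)  # True: no exact match at 2x
```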

You might wonder why we brought object detection into website and application testing. Existing tools for automating website tests often require access to the page's code or structure, and that's not always feasible. Some websites run scripts that testing tools cannot inspect, so the only way to analyze and test them is through the visual information they display. The Flutter Gallery website is a good example. That's where object detection comes to the rescue.

To tackle this challenge, we've developed two types of models within CV_POM. The first is a generic model that detects 47 classes and is well suited to testing overall website accessibility. The second is a personalized model with customizable classes for specific testing scenarios.

Seamless Testing with CV_POM

To train the generic model, we used a diverse dataset drawn from more than 20 different websites. This broad exposure enables the generic model to detect objects across varied layouts with high confidence and accuracy, as the examples below show:

Images showing object detection using a generic model
Left: original images. Right: object detection with a generic model.

In web and app testing, one-size-fits-all solutions often fall short. That is why we are also developing personalized models for specific use cases. Unlike the generic model, each of these models has a focused mission: identifying only the objects that matter for a specific testing scenario and ignoring the rest. We make sure these models stay accurate across different screen sizes and element positions, as the examples below show:

Images showing object detection using a personalized model
Left: original images. Right: object detection with a personalized model.

Now let's look behind the scenes. How did we achieve these results, and what goes into creating these models? The next section walks you through the development process.

CV_POM Model Development Process

Our work started with gathering and testing a dataset. Training the models effectively requires a large and varied set of images. Images can be captured from browsers, apps, or any other platform and stored for later use in model training, although we are still exploring the best ways to retrieve them.

After collecting the images, the next crucial step is labeling. For this we use the Computer Vision Annotation Tool (CVAT), an open-source, web-based tool for annotating images and videos. This step is time-consuming but absolutely essential. Initially, we focused on developing a generic model capable of detecting 47 object classes. We labeled around 1,300 images, each containing multiple objects, and checked the labels carefully to minimize human error. As we went further, we moved on to specialized testing scenarios, labeling only the objects relevant to a specific testing need, under custom names. The number of images needed for this approach depends on the testing use case. The second model, tailored to specific scenarios, reaches remarkable accuracy (close to 99%), outperforming the generic model (83% accuracy).
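Whatever tool produces the labels, YOLO-style training expects one text file per image with one line per object: the class index followed by the box center, width, and height, each normalized by the image dimensions to the range 0 to 1. CVAT can export annotations in a YOLO format directly; the Python sketch below is a hypothetical helper (not CVAT or CV_POM code), and its `CLASS_NAMES` list is an assumed example, included only to make the format concrete.

```python
# Hypothetical class list for a personalized model; list order defines class ids.
CLASS_NAMES = ["button", "textbox", "checkbox"]


def to_yolo_line(class_name, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    Format: "<class_id> <x_center> <y_center> <width> <height>",
    with all four box values divided by the image width or height.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit a {img_w}x{img_h} image")
    class_id = CLASS_NAMES.index(class_name)
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x60 px "checkbox" box on a 1000x500 screenshot:
print(to_yolo_line("checkbox", (100, 200, 300, 260), 1000, 500))
# 2 0.200000 0.460000 0.200000 0.120000
```

Normalized coordinates keep the labels valid when images are resized during training, which is one reason the format stores fractions rather than pixels.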

With labeled data in hand, the training phase begins, using the YOLO architecture. YOLO (You Only Look Once) is an object detection algorithm that uses a single neural network to predict bounding boxes and class probabilities for all objects in an image in a single pass.
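Because YOLO predicts many candidate boxes at once, several of them often cover the same element. Most YOLO versions clean this up with non-maximum suppression (NMS): keep the most confident box, drop other boxes of the same class that overlap it heavily (measured by intersection over union, IoU), and repeat. The sketch below is a plain-Python version of greedy, per-class NMS for illustration only; the 0.5 IoU threshold is an assumption, and production implementations are vectorized.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy per-class non-maximum suppression.

    `detections` is a list of (label, confidence, box). Boxes are visited from
    most to least confident; a box is kept unless it overlaps an already kept
    box of the same class by at least `iou_threshold`.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        label, _, box = det
        if all(label != k[0] or iou(box, k[2]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


raw = [
    ("button", 0.90, (0, 0, 100, 40)),
    ("button", 0.80, (2, 1, 101, 41)),   # near-duplicate of the first box
    ("button", 0.70, (200, 0, 300, 40)),
    ("link", 0.60, (0, 0, 100, 40)),     # same area, different class
]
print([(label, conf) for label, conf, _ in nms(raw)])
# [('button', 0.9), ('button', 0.7), ('link', 0.6)]
```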

The results? Impressive accuracy scores, seamless object detection, and a framework that's set to transform web and app testing.

Pros and Cons of Object Detection in Web and App Interfaces

Integrating innovative technologies such as object detection brings a range of benefits, including:

  • Adaptability to changes: Websites and applications are updated and modified frequently. Object detection algorithms can cope with many of these changes, which helps keep the testing process durable and relevant over time.
  • Automation: With the ability to automate the identification of objects, object detection significantly reduces the manual effort required in testing. This results in more efficient and faster testing processes.
  • Flexibility: Object detection can identify a wide spectrum of objects on a webpage, including images, buttons, forms, and dynamic content. This flexibility allows for broader testing scenarios.

While object detection offers immense potential for web and app interfaces, it also comes with challenges, such as:

  • Ethical considerations: Object detection must be used responsibly. Developers and organizations should ensure that their object detection implementations align with relevant regulations and best practices.
  • Data security: Object detection involves processing and analyzing data, often screenshots that may contain sensitive information, so that data must be secured. Strong security protocols should protect it from breaches and unauthorized access.
  • Privacy concerns: Because screens under test can display user data, implementing object detection raises privacy concerns. Developers and organizations must make sure that user information is handled responsibly.

Object detection, when managed with caution, can truly transform the digital landscape into a more inclusive and efficient space for users.

What's Next?

We have released CV_POM as an open-source framework under the AGPL-3.0 license. We invite all developers to dive in, explore, and contribute to making it even better.

But that's not all! Our team is working on improving optical character recognition and developing additional tools, including a large language model for codeless automation testing scenarios, inspector tools, and more.

Let's transform web and app development together! Stay tuned for the exciting tools coming your way. Happy coding! :) 
