Automated Functional Testing for Audio & Video Apps

According to insights into the global app economy, social networking, messaging, and communication apps were the most downloaded category of apps in 2021, and their popularity is only expected to grow. So the question is: how can you tap into this ever-growing user base and make sure your app succeeds?

To stay on top of the app leaderboard, you need to provide quality services to your user base and always keep their best interests in mind. Specifically, your application must offer the right set of features, an easy-to-use interface, high audio and video quality, and outstanding performance. In some of our previous blog posts, we have explained how we test quality in video conferencing applications, looked at functional testing for live streaming applications, and addressed performance in short video applications. In this article, we will look at how we perform automated functional testing for audio and video applications.

But what exactly is functional testing of an application? In simple terms, functional testing checks all of the app’s functionalities against predetermined specifications and determines whether they work as intended. It is not necessary to look at quality or performance metrics during functional testing; the main goal is to catch the bugs that prevent the user from performing certain actions within the application. When this process is repeated for each new version of the application and focuses on the core functionalities, it is called smoke testing. Automated functional testing is an effective way to save time and resources, ensure consistency between test runs, achieve higher test coverage, and increase the reliability of test results.

How do we set up automated functional testing?

There are many ways to set up an automated test suite. The one we have chosen to explain in this article is a combination of Cucumber, a behavior-driven development tool, and Py-TestUI, an internally developed testing framework written in Python and based on Appium and Selenium.

As our goal is to test core audio and video functionalities, we also need to simulate multiple users to test audio, video, and screen sharing on a call. For one additional user we can simply add one more Appium or Selenium driver. However, if we want to simulate, say, 10 additional users, we can create a Selenium Grid with the necessary Selenium nodes in Docker, where each node, a simple Docker container, hosts one additional user.
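
To make this concrete, here is a minimal sketch of how additional users could be attached to such a grid, assuming a Selenium Grid hub already running on localhost:4444 (for example from the official selenium/hub and selenium/node-chrome Docker images) and a hypothetical meeting URL:

```python
# Sketch: attach simulated call participants through a Selenium Grid.
# Assumes a grid hub at http://localhost:4444 with Chrome nodes.
from selenium import webdriver

CALL_URL = "https://example.com/call/room-id"  # hypothetical meeting URL

def create_extra_user() -> webdriver.Remote:
    options = webdriver.ChromeOptions()
    # Fake media devices let each simulated user send audio/video
    # without real hardware.
    options.add_argument("--use-fake-device-for-media-stream")
    options.add_argument("--use-fake-ui-for-media-stream")
    return webdriver.Remote(
        command_executor="http://localhost:4444/wd/hub",
        options=options,
    )

# Ten additional users, one per grid node.
extra_users = [create_extra_user() for _ in range(10)]
for user in extra_users:
    user.get(CALL_URL)
```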

The next step is to build the feature file containing all the steps to be performed in each testing scenario. For this, we use the Gherkin language. Its simple structure benefits the client: the test suite is high-level and comprehensible to anyone.

As we follow the Page Object design pattern, once the feature files are complete, we are ready to build the step definitions, in this case written in Python. A step definition is a function that references a step in the feature file and connects it to interactable elements in the application under test by calling functions from page objects. We define the elements, and the actions performed with them, in page objects to separate the element logic from the scenario logic.

[Diagram: feature file → step definition file → page object file]

Let’s look at an example. In the feature file we would write the following steps: “When User unmutes themself” and “Then User is heard by others”. Subsequently, we would write step definitions that reference those two steps, and in those we would call functions from the corresponding page objects. For example, in the first step definition (“User unmutes themself”) we would call the “callPage.pressUnmuteButton()” function, and in the second (“User is heard by others”) we would call something like the “helperFunctions.isAudioHeard()” function.
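
As an illustration, here is roughly what those two step definitions could look like in Python, assuming behave-style decorators (one of the Cucumber-style BDD tools for Python); call_page and helper_functions stand in for the callPage and helperFunctions page objects mentioned above and are hypothetical:

```python
# Sketch of the two step definitions, assuming behave-style decorators.
from behave import when, then

@when("User unmutes themself")
def step_user_unmutes(context):
    # The page object hides the element locator and the click action.
    context.call_page.press_unmute_button()

@then("User is heard by others")
def step_user_is_heard(context):
    # Record a short sample on the receiving side and check its volume.
    assert context.helper_functions.is_audio_heard(), \
        "Expected the unmuted user's audio to be heard by other participants"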

With Py-TestUI it is possible to set up TestUIDriver for multiple platforms: on Android and iOS using Appium automation, and on browsers and Electron apps using Selenium. And, of course, once we have a dedicated setup with the devices under test, we can set up a Jenkins Pipeline or a TeamCity Build Chain to execute the tests more easily.
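
The exact Py-TestUI wrapper calls are omitted here, but as a rough sketch, the underlying drivers could be created along these lines, assuming an Appium server on localhost:4723 and hypothetical device and app names:

```python
# Rough sketch of driver setup for two platforms; Py-TestUI's TestUIDriver
# wraps drivers like these, but its exact API is not shown here.
from appium import webdriver as appium_webdriver
from appium.options.android import UiAutomator2Options
from selenium import webdriver as selenium_webdriver

# Android device via Appium (device name and app path are hypothetical).
android_options = UiAutomator2Options().load_capabilities({
    "platformName": "Android",
    "appium:deviceName": "test-device",
    "appium:app": "/path/to/app-under-test.apk",
})
android_driver = appium_webdriver.Remote(
    "http://localhost:4723", options=android_options
)

# Desktop browser via Selenium; an Electron app can be driven similarly
# by pointing ChromeOptions.binary_location at the Electron executable.
browser_driver = selenium_webdriver.Chrome()
```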

How do we test audio?

When testing audio and video applications, we need to make sure that every aspect of their functionality works. Let’s start with audio. When testing audio, we need to make sure that while on a call, a user can talk and the other user can hear them clearly. Additionally, we need to confirm that when the user mutes themself, the other user cannot hear them. There are various tools we can use to determine whether audio is being sent or not.

One way to check this is with the FFmpeg tool. With FFmpeg’s “volumedetect” filter we can easily determine whether or not audio is present in an audio file. First, we would record an audio sample and then run the FFmpeg command with the necessary filter to obtain the output for the sample. Then we would validate that the volume level is high enough, confirming that the user is able to send audio. In a similar fashion, when the user is muted, we would validate that the volume level stays below what would be considered normal speech volume.
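
A minimal sketch of this check might look as follows, assuming an already recorded sample file and an illustrative -35 dB silence threshold (the real threshold would be calibrated per application):

```python
# Sketch: run FFmpeg's volumedetect filter on a recorded sample and
# assert on the detected mean volume.
import re
import subprocess

def mean_volume_db(audio_file: str) -> float:
    # volumedetect writes its report to stderr; the actual output is
    # discarded via the null muxer.
    result = subprocess.run(
        ["ffmpeg", "-i", audio_file, "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    match = re.search(r"mean_volume:\s*(-?\d+(\.\d+)?) dB", result.stderr)
    if not match:
        raise RuntimeError("volumedetect produced no volume report")
    return float(match.group(1))

SILENCE_THRESHOLD_DB = -35.0  # illustrative assumption

def is_audio_heard(audio_file: str) -> bool:
    return mean_volume_db(audio_file) > SILENCE_THRESHOLD_DB
```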

Another tool we can use to test audio is SoX. With the SoX “stat” effect we can obtain the volume level of the recorded audio sample. Just as with FFmpeg, we would then evaluate this result and validate whether the audio sample contains nothing but silence or some audio as well.
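
A similar sketch with SoX, assuming an illustrative RMS amplitude threshold:

```python
# Sketch: use SoX's stat effect to get the RMS amplitude of a sample.
import subprocess

def rms_amplitude(audio_file: str) -> float:
    # "sox <file> -n stat" prints its statistics to stderr.
    result = subprocess.run(
        ["sox", audio_file, "-n", "stat"],
        capture_output=True, text=True,
    )
    for line in result.stderr.splitlines():
        if line.startswith("RMS") and "amplitude" in line:
            return float(line.split(":")[1].strip())
    raise RuntimeError("SoX stat produced no RMS amplitude line")

def is_silent(audio_file: str, threshold: float = 0.01) -> bool:
    # The 0.01 threshold is an illustrative assumption.
    return rms_amplitude(audio_file) < threshold
```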

How do we test video?

To find out if the most important video functionalities work, we use image recognition algorithms implemented in Py-TestUI that come from the OpenCV library.

[Image: example of image recognition]

To validate that other users on a video call can see the video from the user on the device being tested, we either film a known scene or feed in our sample video. We then use an image recognition algorithm to compare the expected video with the actual video. This validation has two possible outcomes: either we confirm that other users see the correct video, or we discover that they do not see the expected video, in which case we need to investigate more deeply to find out whether there is a bug in the application.
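
As a sketch of such a check, assuming a screenshot of the receiving user’s call window and a reference frame extracted from the sample video, OpenCV template matching (one possible image recognition approach) could be used like this; the 0.8 score threshold is an illustrative assumption:

```python
# Sketch: check whether a reference frame from the expected video shows
# up in a screenshot of the receiving user's call window.
import cv2

def video_frame_visible(screenshot_path: str, reference_path: str,
                        threshold: float = 0.8) -> bool:
    screenshot = cv2.imread(screenshot_path)
    # The reference frame must be no larger than the screenshot,
    # so it is typically a cropped or scaled-down sample frame.
    reference = cv2.imread(reference_path)
    scores = cv2.matchTemplate(screenshot, reference, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, _ = cv2.minMaxLoc(scores)
    return best_score >= threshold
```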

We also need to verify that the user on the device under test is able to receive videos from other users. Once again, the image recognition algorithm is used to compare our custom video—sent from other users—to the videos that the user receives on the device being tested.

If there is an option to share the device’s screen, this functionality must also be validated. Namely, the user should be able to share their screen and others should be able to receive this stream. This applies the other way around as well: the user should be able to see when other users are sharing their screens. To validate screen sharing, we again use the image recognition algorithm to compare the expected video with the actual one.

Another aspect to look at during testing is how the application displays videos during calls. For example, is there an option for a dynamic (dominant speaker) view or a grid view? We would also need to validate that these views work as intended in different scenarios.

To wrap up

An automated test suite can be built with various tools and all kinds of different frameworks. The best choice, however, depends on which capabilities your test suite actually needs. In this article we looked at functional testing for audio and video applications and went over the different tools we use to validate different aspects of functionality. For instance, when validating a video call where every user can see the other users’ videos, the image recognition algorithms implemented in Py-TestUI prove to be one of the best options.

Of course, familiarity with different tools also plays a big part when choosing which ones to use for functional testing of audio and video applications. Maybe the QA engineer tasked with developing the automated test suite is more familiar with the Java programming language; in this case, they could choose a Java-based testing framework such as our TestUI. Or, if the engineer prefers Ruby and YAML syntax for writing tests, they can use our Testray automation framework, which supports a wide range of platforms and can even run Windows and Mac applications.

That just about sums up how we perform automated functional testing for audio and video applications. Our engineers are experts in the field and we do absolutely everything in our power to improve our clients’ products. Your application is meant to be seen and heard in its true form. Learn more about our audio and video quality testing capabilities and get in touch to discuss your project.
