How We Automate Audio and Video Quality Testing
Video conferencing applications have never been more popular. People use them regularly to study online, discuss work, or stream a new game to friends. What users want most from a video conferencing application is great audio quality, without large delays and with enhancements such as echo cancellation or noise suppression, as well as good video quality without freezes during the call. That’s why quality needs to be controlled.
A very important aspect of audio and video applications is that quality standards become more rigorous each year. Video quality expectations ten years ago, for example, were much lower than they are today: people were happy enough if the video simply worked and they could see each other. Advancements in technology and changes in video capabilities have pushed users’ expectations up. Previously, video calls were between two people without any advanced features, whereas now users often hold calls with multiple participants, use features such as screen sharing, and join from devices with different screen sizes and aspect ratios. Therefore, users expect to see and hear everyone in high quality, without exception.
This shift in user expectations when using video conferencing solutions means audio and video quality testing is more important than ever. In this blog post we will explain how we automate audio and video quality testing and why this type of testing is crucial.
Why do we need to automate audio/video calls?
Creating a team of manual testers would be a good idea if your goal is to test an application as close to real-world usage as possible; however, automation has some great advantages: it can execute a large number of tests and is much more convenient.
It’s true that we cannot automate everything. Nonetheless, an experienced automation engineer can adapt the automation solution to simulate the behavior of real users and obtain results that reflect the actual user experience. Choosing the right setup, tuning the evaluation algorithms, and picking the right tools open up the opportunity to test many application features on different platforms, including tests under network limitations, and to provide reliable data to the client.
While manually testing the audio and video quality of applications is possible, the main problem with manual testing is that human and time resources are limited. Manual testers cannot run tests around the clock for a whole week or month. They also have to keep checking that the testing process is correct: that the network connection behaves as intended and that media capture and playback start at exactly the right moment. This requires a lot of focus, mistakes during test execution cannot be ruled out, and correcting them takes additional time. Automated testing, by contrast, can run with minimal downtime, and the test setup can usually be expanded, which leads to a larger number of executed tests and, consequently, more test data.
How do we automate audio and video quality testing?
So, what does the automation workflow for A/V testing look like? For us, the best solution for audio and video test automation is a tool developed internally: TestRay. TestRay is based on Selenium and Appium, lets us run automated tests on Android, iOS, Web, Windows, and Mac applications, and supports multi-platform test automation.
Automation faces many roadblocks and challenges, including security features such as CAPTCHAs, anti-automation protections, and two-factor authentication. A manual tester would hardly be bothered by such precautions from the developer, yet automation engineers need to outsmart or work around them to find the best solutions.
To automate audio and video quality testing, we start by establishing the connection between different users, who may be on any of the platforms the application supports. This is where UI interactions come in: we make our moves the same way a user would, through the user interface. Users need to log in, join the right meeting or call each other, confirm that each step has been performed correctly, and check that the call functions and settings are configured as expected. TestRay is the glue that holds it all together: it allows us to automate these UI interactions with Selenium and Appium scripts in the most convenient way, which keeps the coding process quick. A simplified sketch of such a join flow is shown below.
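As an illustration only, here is a minimal Python sketch of what such a scripted join flow can look like with plain Selenium; the URL, element locators, and meeting code are hypothetical placeholders, and in practice TestRay drives these interactions for us.

```python
# Minimal illustration of a scripted "join a call" flow with plain Selenium.
# The URL, locators, and meeting code below are hypothetical placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 30)

driver.get("https://conferencing.example.com/join")  # hypothetical app URL

# Enter the meeting code and a display name, then join the call.
wait.until(EC.visibility_of_element_located((By.ID, "meeting-code"))).send_keys("123-456-789")
driver.find_element(By.ID, "display-name").send_keys("Receiver Bot")
driver.find_element(By.ID, "join-button").click()

# Confirm the call screen is actually shown before the media step starts.
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".in-call-toolbar")))
print("Receiver joined the call")
```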
The next step is the test process itself or, more precisely, file playback and recording. Network limitations are applied if needed, video and/or audio recording starts, and the network trace is captured. At the same time, the media feed starts: audio and/or video is sent from the sender to all call participants, depending on the test scenario (for example, audio-only tests or audio-video tests). The network limitations can depend on how many users take part in the call and how many of them send video to the other participants.
Different tools are used for video recording on different platforms: for desktop applications we use ffmpeg, for Apple products (Mac and iOS) we use QuickTime, and for Android we use the screenrecord CLI. Audio recordings are done using SoX.
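To give a feel for this step, here is a simplified sketch of how such recorders can be launched from a test script; the exact flags, devices, and file names vary per platform and machine, so treat these as assumptions rather than our exact commands.

```python
# Simplified sketch of launching screen and audio capture from a test script.
# Exact flags, devices, and paths are assumptions and differ per platform/machine.
import signal
import subprocess

# Desktop screen capture with ffmpeg (X11 example; gdigrab or avfoundation would
# be the corresponding input devices on Windows and Mac).
screen = subprocess.Popen([
    "ffmpeg", "-y", "-f", "x11grab", "-framerate", "30",
    "-i", ":0.0", "received_video.mp4",
])

# Received audio capture with SoX from the default input device (the audio card).
audio = subprocess.Popen(["sox", "-d", "received_audio.wav"])

# Android screen capture via the screenrecord CLI over adb (blocks for 60 s here).
subprocess.run([
    "adb", "shell", "screenrecord", "--time-limit", "60", "/sdcard/received_video.mp4",
])

# After the call scenario finishes, stop the desktop recorders gracefully.
screen.send_signal(signal.SIGINT)
audio.send_signal(signal.SIGINT)
```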
The last step in our automated testing process is data analysis. When the test run ends, all processes are stopped and the data is gathered: users disconnect from the call, and the video, audio, and network trace captures stop and are saved into sorted folders so that all the test data ends up in the correct place. Then the analysis begins: audio, video, and network analysis scripts process all the media files so that the data about the overall call quality can be asserted and provided to the client. Media analysis is carried out using different algorithms and tools: video data comes from Full-Reference (e.g. VMAF, PSNR, SSIM) and No-Reference (e.g. BRISQUE) analysis algorithms, audio goes through the POLQA or VISQOL algorithms, and network traces are analyzed with Tshark, the Wireshark CLI tool.
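As a small illustration of this step, a full-reference VMAF score can be computed with an ffmpeg build that includes libvmaf, and basic throughput statistics can be pulled from a captured trace with Tshark; the file names here are placeholders.

```python
# Illustrative analysis commands; file names are placeholders and the ffmpeg
# build is assumed to include the libvmaf filter.
import subprocess

# Full-reference scoring: the first input is the received (distorted) recording,
# the second is the original reference video.
subprocess.run([
    "ffmpeg", "-i", "received_video.mp4", "-i", "reference_video.mp4",
    "-lavfi", "libvmaf=log_fmt=json:log_path=vmaf.json",
    "-f", "null", "-",
], check=True)

# One-second I/O statistics from the captured network trace.
subprocess.run(["tshark", "-r", "trace.pcap", "-q", "-z", "io,stat,1"], check=True)
```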
The analysis provides us with important data about the audio and video content. It not only gives us the quality scores of the video or audio, but also lets us look at this information in more depth. We can evaluate image and sound quality, video frame rate (FPS), video or audio delay, stalls and freeze time, synchronization between audio and video, and the detected resolution.
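As one example of this deeper look, audio delay can be roughly estimated by cross-correlating the reference audio with the received recording; the sketch below assumes mono WAV files at the same sample rate and uses hypothetical file names.

```python
# Rough audio-delay estimate via cross-correlation (assumes mono WAV files
# at the same sample rate; file names are placeholders).
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate_ref, reference = wavfile.read("reference_audio.wav")
rate_rec, received = wavfile.read("received_audio.wav")
assert rate_ref == rate_rec, "both files must share the same sample rate"

reference = reference.astype(np.float64)
received = received.astype(np.float64)

# The lag at which the cross-correlation peaks tells us how far the received
# signal is shifted relative to the reference.
corr = correlate(received, reference, mode="full")
lag = int(np.argmax(corr)) - (len(reference) - 1)
print(f"Estimated audio delay: {lag / rate_ref * 1000:.1f} ms")
```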
Once all three of these steps can be automated, the automation objective can be considered complete.
How do we set everything up?
Automation test setups may differ a lot; however, the “classic” test setup looks like the one in the image above. One setup may include both desktop and mobile testing. There are two essential roles in each test scenario: receiver and sender.
Both devices are connected to audio cards. For the sender, the audio card works as an audio input device, allowing the sender to feed the original audio into the call. For the receiver, the audio card works as an output device that captures the audio received from the call. Using these recordings, we can compare the original and received audio and obtain all the audio quality metrics.
Video recording happens differently in desktop and mobile tests. The sender device records the screen that plays back the test video and sends it to the call. The receiver device records its own screen, so the recorded receiver video can be compared with the original one. For desktop testing, the camera is usually imitated by a virtual camera (with applications like OBS or ManyCam), as in the sketch below.
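For illustration, on a Linux test machine one way to imitate a camera is to stream the reference clip into a v4l2loopback device with ffmpeg (OBS or ManyCam plays the equivalent role on Windows and Mac); the device path here is an assumption and the v4l2loopback kernel module must be loaded.

```python
# Illustrative only: feeding the reference video into a virtual camera on Linux.
# Requires the v4l2loopback module; /dev/video10 is an assumed device path.
import subprocess

virtual_cam = subprocess.Popen([
    "ffmpeg", "-re", "-stream_loop", "-1",   # real-time playback, loop the clip
    "-i", "reference_video.mp4",
    "-vf", "format=yuv420p",
    "-f", "v4l2", "/dev/video10",
])
```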
Both the sender and the receiver are connected to their own routers, so each device can be limited individually and network traces can be captured for each device separately; a simple example of such a limitation is shown below.
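As an example of what such a limitation can look like, when the router is a Linux machine (or exposes a shell), bandwidth can be capped with tc; the interface name and the rate below are assumptions for the sketch.

```python
# Illustrative bandwidth cap on a Linux router in front of one device.
# The interface name (eth1) and the 2 Mbit/s rate are assumptions.
import subprocess

subprocess.run([
    "tc", "qdisc", "add", "dev", "eth1", "root",
    "tbf", "rate", "2mbit", "burst", "32kbit", "latency", "400ms",
], check=True)
```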
All the test data is stored on a large disk, because audio and video files can be quite large. The stored test data can be shown to the customer in a report portal, where it can be sorted by date, application, platform, and other parameters.
The stored data is also available as graphs. Separate graphs can be produced for each metric, each application under test, each network condition, and each date range. The image below shows FPS results for different applications under different network limitations. We can see that the violet application (last Unlimited column) shows the best FPS performance; however, it does not appear under the other limitations, which means that this application freezes or even crashes already at the 2MB limitation.
As we mentioned before, tests can be run with very little downtime. The best solution for running them is to use CI/CD tools, which make test execution and process tracking more convenient.
Automated testing of audio and video applications opens up great opportunities to check whether the application performs at a high level and to find weak spots under limited network conditions and cross-platform usage. Stable test execution ensures that a huge amount of test data can be analyzed, so issues can be detected, and resolved, as soon as possible.
Do you have an application with audio and video capabilities? How confident are you that it meets all quality standards and users’ expectations? We can help you find out where your application’s audio and video quality stands. Get in touch and let’s discuss your project.