Testing Quality in Video Conference Applications


With the pandemic ongoing, almost all socializing and co-working has shifted online. Most universities and office staff have started to use different tools to help them communicate throughout the workday. One type of tool that keeps getting more popular every single day is the video conferencing application. In less than a year, downloads of video conferencing applications in some countries have grown as much as 30-fold.

Growth in downloads of video conferencing apps
Figure 1. Growth in downloads of video conferencing apps. Data from Statista Research Department, Feb 4, 2021.

On the theoretical side, a conference call application is one that supports the multiple functionalities people use in a modern office or study environment. Such functionalities include multi-participant video and audio calls, the possibility to share the screen, and other tools that ensure a comfortable meeting for all participants.

Today, people have a variety of solutions to choose from. According to the vendors themselves, their user bases skyrocketed in 2020. The Zoom platform has 300 million daily active users on average, Microsoft Teams stands at around 145 million daily active users, Google Meet isn't far behind with 100 million, and Cisco Webex reports 300 million users worldwide. These numbers give a sense of how popular these tools have become during this time. That is why other applications, such as Facebook Rooms and BlueJeans, try to compete with new and innovative offerings, and why testing conference call quality has become crucial to creating a more pleasant experience for users.

The functionality of video conferencing applications

Nowadays, video conferencing technology gives its users new opportunities to create the best meeting the digital world can offer. There are several main parts of video conferencing that can be tested, not only for functionality but also for quality.

These parts include:

  • Audio calls – voice-only call;
  • Video calls – calls that include transmitting not only audio but also video;
  • Screen-sharing – the ability to share the screen during the audio or video call;
  • Group calls – audio or video calls with more than two participants;
  • Cross-platform support – the application is easy and comfortable to use on various devices and platforms, from desktop to mobile.

Each of these functionalities provides convenience to the user. Since the pandemic started, the popularity of search terms for certain functionalities has visibly increased.

The popularity of search terms worldwide
Figure 2. The popularity of search terms worldwide. Data by Google Trends.

The value on the y-axis is the relative popularity of each search term over a 12-month period. Based on the data, we can see that lockdowns and the need for offices to shut down and switch to working from home have made screen sharing and video calls objects of interest. Thus, it is crucial to create a high-quality experience for users so that neither the vendor's nor the tool's popularity suffers.

What impacts the call from the participant’s side?

Unfortunately, there are many potential problems on the way to a good connection and a good user experience. Most of their causes are invisible to the user and hard to spot while using the software. Finding the problem usually takes time and can disrupt an efficient workflow: the user might need to check the router at home, Google for answers, and figure out how the platform or the device works to solve the issue. To avoid this time loss and keep users happy, more and more vendors choose to test their product to find the problematic areas more efficiently and be ready for future updates. In this part of the article, we will look at some of the most common issues that might impact video and audio conferencing calls.

  1. Sometimes users experience delayed responses in the call, which are usually a consequence of the round trip time (RTT) of the user's network. In simple words, round trip time is the amount of time a signal needs to travel from one end of the call to the other and back. RTT is also known as ping time, a more common term for the everyday user. Round trip time is affected by the distance the signal has to cover, the transmission environment (e.g., Wi-Fi or cable), and network load (see the simple measurement sketch after this list).
  2. Drops in video and audio quality during the call are quite common issues. Fluctuations during a call are usually caused by changes in the available bandwidth or by packets lost from the sent or received stream. If the user moves during a call or has an unstable network connection at home, it is realistic to experience information or signal loss at any point.
  3. Another problem worth mentioning is changing the device. Users usually pick the device that is most comfortable for them and expect the same quality on other devices. In reality, that is not the case: the quality depends on the device's hardware and on the platform's ability to adapt to that hardware and deliver the best performance possible.
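
To make the RTT discussion more concrete, here is a minimal Python sketch that approximates round trip time by timing TCP handshakes. It is only an illustration: real conferencing clients typically read RTT from the media stack (for example, RTCP reports), and the host name below is a placeholder.

```python
import socket
import time

def estimate_rtt(host: str, port: int = 443, samples: int = 5) -> float:
    """Roughly estimate RTT by timing TCP handshakes to a host.

    This is only an approximation: a real conferencing client would measure
    RTT on the media path (e.g., via RTCP reports), not with separate TCP
    connections.
    """
    times = []
    for _ in range(samples):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            pass  # completing the handshake is enough for one timing sample
        times.append((time.monotonic() - start) * 1000)  # milliseconds
    return sum(times) / len(times)

if __name__ == "__main__":
    # Hypothetical server address, used only for illustration.
    print(f"Approximate RTT: {estimate_rtt('example.com'):.1f} ms")
```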

There are many more things that can create an unpleasant experience for the user. These include the inability to suppress background noise, freezes in the video stream, video artifacts during the call, and many more. Most of them, however, share a common cause: the network and the quality of the network available to the user. So the next question is how platforms can improve and what they can do to solve these issues.

What impacts the call from the vendor’s side?

Fortunately, vendors have solutions they can implement to create an ever better user experience during conference calls. These solutions include call architectures and adaptation to common network issues.

One example of a conference call architecture is the so-called Selective Forwarding Unit (SFU). The idea is that each participant sends their video to the server, and the server distributes copies of that original video to all other participants. Each endpoint then decodes and composes all received videos itself. This architecture is less demanding in terms of server resources, and each endpoint has only one outgoing stream to the server. However, SFU needs more available bandwidth on the receiving side, because every participant has to download a copy of every other participant's video.
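
As an illustration of the forwarding idea, here is a minimal, non-networked Python sketch of an SFU-style relay: the server never decodes or mixes media, it simply copies each incoming packet to every other participant's queue. The class and method names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, List

class ToySFU:
    """Toy Selective Forwarding Unit: relays packets without processing them."""

    def __init__(self) -> None:
        # One outbound queue per participant; a real SFU would manage RTP sessions.
        self.outboxes: Dict[str, List[bytes]] = defaultdict(list)

    def join(self, participant: str) -> None:
        self.outboxes[participant]  # create an empty queue for the newcomer

    def on_packet(self, sender: str, packet: bytes) -> None:
        # Forward the packet untouched to everyone except the sender.
        for participant, queue in self.outboxes.items():
            if participant != sender:
                queue.append(packet)

sfu = ToySFU()
for name in ("alice", "bob", "carol"):
    sfu.join(name)
sfu.on_packet("alice", b"frame-001")
print(sfu.outboxes["bob"])    # [b'frame-001']
print(sfu.outboxes["alice"])  # [] – the sender does not get their own copy back
```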

Another architecture is the Multipoint Control Unit (MCU), in which the server processes each participant's video. For each participant, the server combines all video thumbnails into one stream. This stream is encoded to meet the set bandwidth value, and each participant receives a processed copy. This type of architecture is less demanding on bandwidth and CPU power for the users, and the connection goes through the vendor's server.

Finally, another commonly used approach is simulcast, which allows more flexibility in network consumption. It is based on constant communication between each participant's device and the vendor's server. The principle is that the server receives the video from each participant in multiple quality levels. The server then sends each participant the copy that corresponds to their device's capabilities and available bandwidth. This approach does not require the server to process the video streams, but it demands more resources from the participant's side.
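
A rough back-of-the-envelope comparison helps show why the choice of architecture matters. The sketch below estimates per-participant uplink and downlink stream counts for the three approaches described above; the simulcast layer count is an illustrative assumption, not a vendor figure.

```python
def stream_counts(participants: int, simulcast_layers: int = 3):
    """Return (uplink_streams, downlink_streams) per participant for each architecture.

    Assumes every participant sends video and views everyone else; the
    simulcast layer count is an illustrative assumption.
    """
    others = participants - 1
    return {
        "SFU":       (1,                others),  # send once, receive a copy from every peer
        "MCU":       (1,                1),       # send once, receive one composited stream
        "Simulcast": (simulcast_layers, others),  # send several quality layers, receive one per peer
    }

for arch, (up, down) in stream_counts(participants=6).items():
    print(f"{arch:9s} uplink={up} streams, downlink={down} streams")
```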

What can be tested and how?

Whatever the functionality, good practice is to provide it in high quality, no matter where in the world the user is. Due to the increasing popularity worldwide, it has become crucial to control delay and network consumption to sustain good audio and video quality during the call. That said, the biggest question is how to test the application to ensure that level of quality.

Audio quality

The first thing that comes to mind is the quality of the audio stream. Does the person sound as they should? To evaluate the quality, we use algorithms such as Perceptual Objective Listening Quality Analysis (POLQA) or the Virtual Speech Quality Objective Listener (ViSQOL). From these we calculate the Mean Opinion Score (MOS), which expresses the quality as a single value. The algorithm takes two audio samples – the original, and the one processed or degraded by the service provider during the call. The main principle is to compare the two and score how similar the degraded audio is to the original speech.
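
As a small illustration of how such a comparison can be scripted, the sketch below calls the open-source ViSQOL command-line tool and parses the resulting MOS value. It is a sketch under the assumption that a locally built `visqol` binary is on the PATH; flag names and output format follow the project's documented CLI and may differ between versions.

```python
import re
import subprocess

def visqol_mos(reference_wav: str, degraded_wav: str) -> float:
    """Run the open-source ViSQOL binary and parse the MOS-LQO value.

    Assumes a locally built `visqol` executable; flag names follow the
    project's documented CLI and may differ between versions.
    """
    result = subprocess.run(
        ["visqol",
         "--reference_file", reference_wav,
         "--degraded_file", degraded_wav,
         "--use_speech_mode"],          # speech-tuned model, as used for call audio
        capture_output=True, text=True, check=True,
    )
    match = re.search(r"MOS-LQO:?\s*([0-9.]+)", result.stdout)
    if not match:
        raise RuntimeError(f"Could not find a MOS-LQO value in: {result.stdout!r}")
    return float(match.group(1))

# Example (hypothetical file names):
# print(visqol_mos("reference.wav", "degraded_call_recording.wav"))
```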

You might be interested in: How We Test Dominant Speaker Detection

Video quality

As mentioned previously, video conferencing is growing in popularity, which means video quality is a big part of a conference call platform. To test the video quality, we use the Blind/Referenceless Image Spatial Quality Evaluator (also known as BRISQUE). It doesn't give a single score for the whole video but scores each image (frame) separately. As there is no reference, we train and improve the algorithm to evaluate quality using pre-made pictures and data. In the end, we can analyze the quality score for each second of the video and compute an overall average value. The graph shows the variance in video quality during the call (see Figure 3). Full-reference quality metrics, such as Video Multimethod Assessment Fusion (VMAF), can also be used to evaluate video quality.

Overtime of BRISQUE results
Figure 3. BRISQUE results over time.
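
To give a flavor of per-frame, no-reference scoring, here is a Python sketch using the BRISQUE implementation in OpenCV's contrib `quality` module. It is not our production pipeline; the file names are placeholders, and the model/range files are assumed to be the pre-trained ones distributed with opencv_contrib.

```python
import cv2  # requires opencv-contrib-python for the cv2.quality module

def brisque_per_second(video_path: str, model_path: str, range_path: str):
    """Score every frame with BRISQUE and average the scores per second.

    model_path/range_path point to pre-trained BRISQUE model and range files
    (shipped with opencv_contrib samples); lower scores mean better perceived
    quality.
    """
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS metadata is missing
    per_second, current, frame_index = [], [], 0

    while True:
        ok, frame = capture.read()
        if not ok:
            break
        score = cv2.quality.QualityBRISQUE_compute(frame, model_path, range_path)[0]
        current.append(score)
        frame_index += 1
        if frame_index % int(round(fps)) == 0:   # one bucket per second of video
            per_second.append(sum(current) / len(current))
            current = []

    capture.release()
    return per_second

# Example (hypothetical file names):
# scores = brisque_per_second("received_call.mp4",
#                             "brisque_model_live.yml", "brisque_range_live.yml")
```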

Video frame rate

Another quite important video metric during the call is the frame rate. The standard frame rate for movies and TV shows is 24 frames per second, so the same value should be enough to produce smooth motion in video conference calls as well. Nowadays, the target for mobile applications is to keep the frame rate above 20 frames per second, while video-oriented or desktop applications can reach 30 frames per second. Frame rate is also one of the first metrics to be affected by network issues.

But how can this metric be measured? We measure frames per second with QR-type markers that create a unique combination of four QR squares for each frame. The received video is then split into frames for each second, and our algorithm counts the number of unique QR combinations. That generates a value for each second of the call and precise over-time graphs that we analyze to find anomalies and reactions to various network issues (a simplified sketch of the idea follows after Figure 4).

Video metric markers that are used during video testing
Figure 4. Video metric markers that are used during video testing.
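
Here is a simplified Python sketch of the approach: it detects a single QR code per frame with OpenCV and counts unique payloads per second as an effective frame rate. Our actual setup uses a four-square marker combination as shown in Figure 4; this single-QR version is only an approximation.

```python
import cv2

def effective_fps(video_path: str):
    """Count unique QR payloads per second of received video as an FPS estimate.

    Each rendered frame is assumed to carry a unique QR payload; duplicated or
    undecodable frames therefore do not add to that second's count.
    """
    detector = cv2.QRCodeDetector()
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30
    per_second, seen, frame_index = [], set(), 0

    while True:
        ok, frame = capture.read()
        if not ok:
            break
        payload, _, _ = detector.detectAndDecode(frame)
        if payload:                       # empty string means no QR was decoded
            seen.add(payload)
        frame_index += 1
        if frame_index % int(round(fps)) == 0:
            per_second.append(len(seen))  # unique frames actually delivered this second
            seen = set()

    capture.release()
    return per_second

# Example (hypothetical file name):
# print(effective_fps("received_call.mp4"))
```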

Delay / E2E latency

Having a delayed signal is normal, as it has to travel great distances to make the call possible. As mentioned previously, there are multiple factors that alter the delay. A common target is to keep the delay within 500–700 ms at all times (or under 1 s if the user is only sharing the screen). To measure audio delay, we use the same POLQA algorithms, which include delay calculation. For video, we use color markers.

The video has three color markers in total, which change color during the video. These markers are embedded in the sender device's video, which is transmitted to the receiver. When the recording is processed, the algorithm compares the markers in the original video with those in the received video. The markers change every second, so the combinations are counted until the same color combination is found on both videos on the receiver device. This type of measurement lets us capture delays of up to 64 seconds and calculate the delay for every second of the call, which produces precise results.
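
As a simplified illustration of marker matching, the sketch below represents each frame's three color markers as a tuple, then aligns receiver frames to sender frames to derive per-frame delay. The data structures are illustrative; the real measurement works on captured video rather than pre-built logs.

```python
from typing import Dict, List, Tuple

Marker = Tuple[str, str, str]  # the three color markers visible in one frame

def per_frame_delay(sender: List[Tuple[float, Marker]],
                    receiver: List[Tuple[float, Marker]]) -> List[float]:
    """Match each received marker combination to the moment it was sent.

    Both lists hold (timestamp_seconds, marker_combination) pairs; combinations
    are assumed unique within the matching window (up to 64 s in our setup).
    """
    sent_at: Dict[Marker, float] = {markers: t for t, markers in sender}
    delays = []
    for received_time, markers in receiver:
        if markers in sent_at:
            delays.append(received_time - sent_at[markers])
    return delays

# Toy example: the ("red", "green", "blue") combination was rendered at t=10.0 s
# on the sender and first appears at t=10.6 s on the receiver -> 0.6 s delay.
sender_log = [(10.0, ("red", "green", "blue")), (11.0, ("blue", "red", "green"))]
receiver_log = [(10.6, ("red", "green", "blue")), (11.7, ("blue", "red", "green"))]
print(per_frame_delay(sender_log, receiver_log))  # [0.6, 0.7] (approximately)
```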

Network bandwidth

The principle we work from is that network bandwidth limitations impact all the other metrics mentioned above. That is why it is crucial to understand how bandwidth works and how different applications use what is available to keep their services up and working. Bandwidth and mobile network speed vary from country to country due to service availability. The difference in speed can be very noticeable, which means that the user base demands flexibility from application vendors in terms of network consumption (see Figure 5).

Bandwidth is the independent variable that we change and manipulate during testing. Our laboratory allows us to set custom bandwidth, router queue size, and packet loss during regular conference calls, which helps us investigate how the service reacts and how the quality changes (i.e., the change in the other metrics).

Average mobile network speed in Mbps in different countries
Figure 5. Average mobile network speed throughout 03/20 – 03/21 in Mbps in different countries (Data by Speedtest.com)
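
For readers who want to reproduce a simple version of such a setup, the sketch below shapes a Linux network interface with `tc`/`netem` to cap bandwidth, add delay, and inject packet loss before a test call. It is a minimal illustration rather than our lab tooling; the interface name and values are placeholders, and the commands require root privileges.

```python
import subprocess

def shape_network(interface: str = "eth0",
                  rate: str = "1mbit",
                  loss_percent: float = 2.0,
                  delay_ms: int = 50) -> None:
    """Apply a bandwidth cap, added delay, and packet loss with Linux tc/netem.

    Requires root; run clear_network() afterwards to restore the interface.
    """
    commands = [
        # A token bucket filter caps the outgoing rate on the interface.
        ["tc", "qdisc", "add", "dev", interface, "root", "handle", "1:",
         "tbf", "rate", rate, "burst", "32kbit", "latency", "400ms"],
        # netem, attached under the tbf, adds delay and random packet loss.
        ["tc", "qdisc", "add", "dev", interface, "parent", "1:1", "handle", "10:",
         "netem", "delay", f"{delay_ms}ms", "loss", f"{loss_percent}%"],
    ]
    for command in commands:
        subprocess.run(command, check=True)

def clear_network(interface: str = "eth0") -> None:
    """Remove the shaping rules and return the interface to its defaults."""
    subprocess.run(["tc", "qdisc", "del", "dev", interface, "root"], check=True)

# Example: shape_network("eth0", rate="512kbit", loss_percent=5.0), run the call,
# then clear_network("eth0").
```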

Typical test process

The network limitations that the vendor is willing to test form the basis of our typical test process. Based on the vendor's needs, we create test plans that include various limitations, going from mild to extreme. We investigate how the call changes under each bandwidth limitation and give the client a detailed analysis of the findings.

Last thoughts

The world continues to implement more and more digital technologies in the day-to-day workflow. Video conferencing is one of them, as it has multiple positive outcomes: increasing productivity, decreasing costs, and helping people connect from any country. Moreover, it brings an international dimension to companies. Considering the number of video conferencing technologies available on the market right now, every vendor wants to ensure that their service is the one users choose.

And that is what we do at TestDevLab. Our engineers help vendors test and analyze the quality of video conferencing. We do whatever we can to help them understand their users and improve their product. Head to audiovideotestlab.com to learn more about our audio/video quality testing capabilities.
