
Automated UI Testing: 8 Best Practices to Reduce Flaky Tests


TL;DR

30-second summary

Eliminating flakiness is essential for maintaining trust in automated UI suites and ensuring seamless CI/CD workflows. By addressing non-deterministic factors like timing, environmental instability, and shared state, teams can transform unreliable scripts into robust assets. Prioritizing strategic synchronization, isolation, and data management minimizes false negatives, accelerates delivery cycles, and preserves engineering resources. Implementing these best practices empowers developers to focus on genuine defects, ultimately fostering a resilient testing culture that supports high-quality software releases.

  • Strategic synchronization and dynamic waiting: Using explicit or intelligent waits ensures tests interact with elements only when they are fully ready.
  • Strict test isolation and state management: Independent test execution prevents side effects from previous runs from causing unexpected failures.
  • Deterministic data and environment control: Standardizing inputs and maintaining stable test environments eliminates external variables that trigger inconsistency.
  • Modular design and robust element locators: Building scripts with resilient selectors and reusable components minimizes breaks during UI updates.
  • Continuous monitoring and proactive refactoring: Regularly auditing test performance helps teams identify, quarantine, and repair unstable scripts immediately.

The quality of your product’s user interface can make or break your customer experience. Research shows that 88% of users are less likely to return to a website after a poor experience, and about 33% of shoppers abandon purchases because of frustrating UI glitches and poor performance.

For a product manager, the ultimate goal is a suite that provides a definitive "yes" or "no" on release day. For a tester, the priority is a suite that operates without triggering many false failures in the middle of the night.

This guide explores the best practices for writing high-quality automated UI tests. Having built frameworks in both Selenium and Cypress, I’ve learned that the secret to a high ROI suite isn't the tools you choose, but the design patterns you employ.

In the following sections, we will cover how to prioritize your automation tests, build resilient locator strategies, master architectural patterns, and handle modern challenges.

1. Prioritize automated tests

Effective automated UI testing does not begin with tools, frameworks, or code. It begins with strategic restraint, deciding which scenarios genuinely deserve automation and which do not.

The first best practice is applied before you have written a single line of code. In most cases, it's impossible—and often counterproductive—to automate every manual test case. But how do you determine which test cases to automate?

Adopt a risk-based approach. This three-tier strategy focuses automation effort on the critical user journeys, such as authentication, checkout, and payment processing, where a technical failure would result in immediate and direct revenue loss for the business.

  • Tier 1 (critical): Login, product search, checkout, payment processing.
  • Tier 2 (high): Account creation, password reset, profile update.
  • Tier 3 (low): About pages, FAQ links, footer icons, and minor non-functional UI issues.

By targeting tiers 1 and 2, automation delivers the highest return on investment: roughly 80% risk coverage for 20% of the effort. Aiming for 100% UI coverage is a common pitfall. Such suites usually fail under pressure because maintaining low-impact tests eventually consumes the time needed for the high-value ones. It’s important to note that tier 3 tests aren't "ignored"; they are simply better suited for exploratory testing or occasional smoke tests rather than full automation.

2. Implement robust locator strategies

Once high-value scenarios are selected, the reliability of the entire suite depends on how consistently the automation can interact with the UI.

Automated UI tests frequently fail without code changes due to unstable element identification. This instability is often caused by reliance on auto-generated XPaths or CSS selectors that are tightly coupled to the page’s visual structure.

Establishing a "testing contract"

The most resilient strategy is to implement a "testing contract" between developers and testers. By using dedicated attributes that are independent of CSS styles and HTML structure, automated tests remain stable even when the visual design or layout of the application changes.

  • Best: [data-testid="submit-login"] (stable; clearly intended for testing).
  • Acceptable with caution: .btn-primary (brittle; breaks if a designer changes the button style).
  • Bad: //div[@id='root']/div[2]/form/button (extremely brittle; breaks if any intermediate div is added).
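As a small illustration, a contract-based locator used from Selenium in Java might look like the sketch below; the class name and helper method are assumptions for the example, not part of any framework:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginLocators {
    // Tied to the testing contract, not to styling or DOM structure
    public static final By SUBMIT_LOGIN = By.cssSelector("[data-testid='submit-login']");

    public static void submitLogin(WebDriver driver) {
        // Survives redesigns as long as the data-testid attribute stays in place
        driver.findElement(SUBMIT_LOGIN).click();
    }
}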

Comparison of locator strategies

Strategy | Stability | Maintainability | Recommendation
Data-testid | Highest | Highest | Industry standard.
ID / Name | High | High | Use if unique and constant.
Text content | Moderate | Moderate | Use for user-facing assertions.
CSS class | Low | Moderate | Avoid styling classes.
Absolute XPath | Very low | Very low | Never use it.

3. Use Page Object Model (POM)

Stable locators prevent immediate failures, but long-term maintainability requires a deliberate architectural foundation.

The Page Object Model (POM) is a foundational architectural pattern in UI automation. It separates the what (test intent) from the how (UI implementation details).

The benefits of separation  

By decoupling the test logic from the UI elements, the automation suite gains several key advantages:

  • Readability: Test scripts focus on user actions (e.g., loginPage.login(user)) rather than technical details (e.g., driver.findElement(By.id("user")).sendKeys(...)).  
  • Reusability: Common actions, like logging in or navigating a menu, are defined once and inherited by all relevant tests.  
  • Stability: Structural changes to the application only impact the Page Object layer, leaving the core test logic untouched.

Code example: Selenium (Java)

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage {
    private final WebDriver driver;

    // Locators are stored in one place, so a UI change requires only one update
    private final By emailField = By.id("login-email");
    private final By loginButton = By.id("submit-button");

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    public void login(String email) {
        driver.findElement(emailField).sendKeys(email);
        driver.findElement(loginButton).click();
    }
}
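To see the separation in action, here is a hedged sketch of a test that consumes this page object; the TestNG runner, the driver setup, and the /dashboard URL are assumptions for illustration:

import org.openqa.selenium.WebDriver;
import org.testng.Assert;
import org.testng.annotations.Test;

public class LoginTest {
    // Assumes the driver is created in a setup hook (omitted for brevity)
    private WebDriver driver;

    @Test
    public void userCanLogIn() {
        LoginPage loginPage = new LoginPage(driver);
        // The test reads as user intent; locator details stay inside LoginPage
        loginPage.login("qa.user@example.com");
        Assert.assertTrue(driver.getCurrentUrl().contains("/dashboard"),
                "User should land on the dashboard after logging in");
    }
}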

4. Avoid static pauses

Even with a clean architecture, UI automation can quickly become unreliable if it handles asynchronous behavior incorrectly.

One of the most common anti-patterns in UI automation is the use of hardcoded delays such as Thread.sleep(). This approach is often a quick fix for timing issues, but it creates a fragile and inefficient automation suite.

Hardcoded delays create two major issues:

  • Wasted time: If a test suite contains 100 tests and each includes a redundant 2-second sleep, over 3 minutes of execution time is wasted in every single run. This delay scales poorly as the suite grows.
  • False failures: Static sleeps are not adaptable. If the server is under heavy load and an element takes 4 seconds to appear, a 3-second sleep will still result in a test failure, despite the application eventually functioning correctly.

Explicit and fluent waits

Instead of using “sleep”, use “explicit waits.” These tell the driver to poll the browser every 500ms until a specific condition is met (like an element being clickable).

  • Selenium: Use WebDriverWait with ExpectedConditions.
  • Cypress: Cypress does this automatically with its built-in retry logic.

Code example: Selenium (Java)

Do not use:

try {
    // Hardcoded pause: wastes two seconds when the app is fast, still fails when it is slower
    Thread.sleep(2000);
    signInButton.click();
} catch (InterruptedException e) {
    e.printStackTrace();
}

Instead use:

// signInButton is assumed here to be a By locator (e.g. By.id("sign-in-button"))
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));

// Polls roughly every 500 ms until the button is visible, then clicks it
WebElement element = wait.until(ExpectedConditions.visibilityOfElementLocated(signInButton));
element.click();

5. Test data management: API seeding

After synchronization issues are addressed, test execution time and reliability are most often impacted by how test data is created. 

One of the biggest bottlenecks in UI testing is setting up the state of the application via the UI. If you want to test the "Delete Order" button, you shouldn't have to script logging in, searching for a product, adding it to the cart, and checking out just to delete the resulting order.

Instead, adopt a shift-left data strategy and set up your test data through API calls. Take a look at the example below:

  1. POST to /api/login to get a session token.
  2. POST to /api/orders to create a fresh order.
  3. Navigate directly to /orders/{id}.
  4. UI Test: Click "Delete" and verify the UI response.

By decoupling the test from the prerequisite UI steps, the suite gains significant stability and performance. This method ensures that the test remains a targeted validation of the component, rather than an accidental end-to-end journey that is vulnerable to unrelated changes in the application.
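As an illustration only, here is a minimal sketch of step 2 using Java's built-in HttpClient; the base URL, endpoint payload, and bearer-token scheme are assumptions standing in for your real API:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrderSeeder {
    private static final String BASE_URL = "https://app.example.com"; // assumed base URL
    private final HttpClient http = HttpClient.newHttpClient();

    // Creates an order through the API and returns the raw response body (e.g. JSON containing the order id)
    public String createOrder(String sessionToken) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/api/orders"))
                .header("Authorization", "Bearer " + sessionToken) // assumed auth scheme
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"productId\": 42}")) // assumed payload
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}

The UI test then calls createOrder(...), extracts the order id from the response, and navigates straight to /orders/{id} before exercising the "Delete" button.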


6. Idempotency and test isolation

A fast test setup alone is not sufficient if tests interfere with one another through shared state.

A test is considered idempotent if it can be executed multiple times in succession and consistently produces the same result. The primary obstacle to this reliability is "shared state"—a scenario where tests interfere with one another by modifying the same data or environmental settings. When tests are coupled through a shared state, the suite becomes prone to "flakiness" that is difficult to debug and resolve.

To maintain strict isolation and ensure high-signal results, the automation should follow these best practices:

  • Unique data generation: Instead of hardcoded values, use libraries like Faker to generate unique attributes, such as a distinct email address for every run. This prevents "email already exists" errors and enables parallel execution without data collisions (see the sketch after this list).
  • Independent test logic: Every test must be entirely decoupled. If Test A depends on the success of Test B, a single failure triggers a domino effect of false alarms. Each script must be capable of standing on its own.
  • Cleanup: Use hooks such as "@After" or "afterEach" to delete created data or reset the database state. However, a better approach is to ensure your tests are designed to work even if the data from the last run is still there.
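For the first point above, here is a minimal sketch, assuming the JavaFaker library (or its Datafaker fork) is on the classpath; the factory class and helper names are illustrative:

import com.github.javafaker.Faker;

public class TestDataFactory {
    private static final Faker FAKER = new Faker();

    // A timestamp suffix keeps emails distinct across runs and parallel workers
    public static String uniqueEmail() {
        return "qa+" + System.currentTimeMillis() + "@" + FAKER.internet().domainName();
    }

    public static String randomFullName() {
        return FAKER.name().fullName();
    }
}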

Shifting from manual cleanup to resilient, independent design ensures that the suite remains stable even in the face of unexpected interruptions or environment lags. This focus on isolation transforms a fragile collection of scripts into a professional, scalable automation framework.

7. Advanced assertions: Beyond surface-level checks

Stability and isolation ensure tests run consistently—but consistency alone does not guarantee meaningful validation.

High-value automation validates business rules, application state, and user experience—not just element visibility. Moving beyond basic presence checks ensures the suite provides meaningful feedback on the quality of the software.

  • Negative testing: Don't just test that a valid login works. Test that an invalid login shows the correct error message, and that the "Submit" button becomes disabled after three failed attempts (see the sketch after this list).
  • Visual regression: For visually complex UI components (like a dashboard chart), standard functional tests are insufficient. Integrate tools like Applitools or Percy to compare pixel-by-pixel screenshots against a baseline.
  • Accessibility (A11y) testing: Ensure the site is usable for everyone by using axe-core, which can be integrated into Selenium or Cypress. It automatically flags issues like poor color contrast or missing labels for screen readers during every test run.
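For instance, a negative login check in Selenium might look like the sketch below; the error locator, its text, and the LoginPage class from earlier are assumptions for illustration:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.Test;

import java.time.Duration;

public class InvalidLoginTest {
    // Assumes the driver is created in a setup hook (omitted for brevity)
    private WebDriver driver;

    @Test
    public void invalidLoginShowsErrorMessage() {
        new LoginPage(driver).login("not-a-registered-user@example.com");

        // Assert the business rule, not just that an element exists
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        String error = wait.until(ExpectedConditions.visibilityOfElementLocated(
                By.cssSelector("[data-testid='login-error']"))).getText();
        Assert.assertTrue(error.contains("Invalid email or password"),
                "Expected a clear error message for invalid credentials");
    }
}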

8. Handle flakiness with intelligent retries

Even with perfectly written code, the environment may still be "noisy." Network blips, slow microservice responses, or infrastructure lag can cause false negatives even when the application itself is working as intended.

To protect your tests from these false alarms, implement a smart retry policy. Instead of failing a build at the first sign of trouble, the framework should be configured to retry the failed test one or two additional times. This creates a vital distinction for the team:

  • The "flaky" flag: If a test fails initially but passes on a subsequent try, it is marked as "flaky." This signals that the feature works, but the environment or the test itself needs investigation.
  • The "real bug" alert: If a test fails on every attempt, treat it as a genuine defect.

This clarity is essential for product managers, as it allows them to distinguish between an unstable testing environment and a broken feature that should block a release.
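As one way to implement this policy, assuming TestNG as the runner, a minimal retry analyzer might look like the sketch below; the class name and retry count are illustrative:

import org.testng.IRetryAnalyzer;
import org.testng.ITestResult;

public class FlakyRetryAnalyzer implements IRetryAnalyzer {
    private static final int MAX_RETRIES = 2;
    private int attempts = 0;

    @Override
    public boolean retry(ITestResult result) {
        // Returning true re-runs the failed test; after MAX_RETRIES the failure
        // is reported as a real defect rather than flagged as flaky
        if (attempts < MAX_RETRIES) {
            attempts++;
            return true;
        }
        return false;
    }
}

Individual tests opt in with @Test(retryAnalyzer = FlakyRetryAnalyzer.class), and the report can then label first-attempt failures that later pass as flaky rather than broken.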

Final thoughts

Building automation tests with high ROI is not about achieving 100% coverage; it is about achieving 100% trust. By shifting your focus toward a risk-based approach, you ensure that your team spends its time protecting the user journeys that actually drive revenue.

The transition from a fragile collection of scripts to a professional framework is defined by the design patterns you choose. By implementing robust locators, mastering the Page Object Model (POM), and using API seeding for faster setup, you create tests that are resilient to most application changes.

When these practices are combined with advanced assertions and intelligent retries, automation stops being a burden that needs updating every few weeks and becomes the definitive "yes" or "no" your organization needs to release quickly and with confidence.

FAQ

Most common questions

What is the primary cause of flaky UI tests?

Flakiness typically stems from timing issues, where tests attempt to interact with elements before the application has fully rendered or processed background tasks.

How does test isolation improve reliability?

Isolation ensures each test starts with a clean state, preventing data or settings from one test from "leaking" into another and affecting its outcome.

Why should fixed sleep commands be avoided? 

Hardcoded delays waste time if the app is fast and fail if it is slow; dynamic waits adapt to the actual application speed.

What role does the test environment play?

Unstable or shared environments introduce external dependencies, such as network latency or database locks, which cause tests to fail without any code changes.

Stop wasting time on flaky UI tests—turn your automation into a source of trust.

Schedule a consultation and let our team help you build stable, high-ROI tests that deliver real confidence in every release.

