# Flaky Test Detection

{% hint style="info" %}
**Who is this for?** QA managers and engineering managers who need to separate intermittent test noise from real regressions — so failures that matter get attention and flaky tests don't block deployments.
{% endhint %}

> **Flaky test:** A test case that produces inconsistent results — passing on some executions and failing on others — without any change to the application code or test definition, typically caused by timing issues, environment variability, or non-deterministic UI behavior.

Flaky tests are the primary reason development teams lose confidence in automated test suites. When every pipeline failure requires human triage to determine whether it is a real regression or noise, velocity drops and eventually the suite is ignored. ContextQA addresses this by classifying every failure with an AI-derived root cause category, making flakiness visible as a distinct failure type rather than an undifferentiated red status.

## What is a flaky test?

A flaky test passes on some runs and fails on others under conditions that have not changed — same code, same environment, same test definition. Common causes include:

* **Timing dependencies:** The test clicks a button before an async operation completes.
* **Order dependence:** The test relies on state left by a previous test case that sometimes runs in a different order.
* **Environment variability:** Network latency spikes, DNS resolution delays, or shared database contention.
* **Non-deterministic UI:** Animations, lazy-loaded components, or third-party widgets that render at unpredictable times.

ContextQA defines flakiness as a failure category distinct from an application bug. If the application has a genuine regression, every run fails consistently and ContextQA classifies it as an **application bug**. If the failure is in the test logic, ContextQA classifies it as a **test bug**. If the failure is intermittent with no reproducible cause in the application, ContextQA classifies it as a **flaky failure**.

## How ContextQA classifies failures

ContextQA uses AI root cause analysis on every failed test case. The analysis pipeline examines:

* The failing step and the error message
* The browser console log for JavaScript errors
* The HAR network log for failed or slow requests
* The DOM state at the time of failure (from the Playwright trace)
* Historical execution data for the same test case

From this data, ContextQA assigns one of four failure categories:

| Category              | Meaning                                                                                        |
| --------------------- | ---------------------------------------------------------------------------------------------- |
| **Application Bug**   | The application behaved incorrectly; the test is functioning as designed                       |
| **Test Bug**          | The test definition has an error — incorrect selector, wrong expected value, missing wait      |
| **Flaky Failure**     | The failure is intermittent; root cause is timing, environment variability, or non-determinism |
| **Environment Issue** | Infrastructure-level failure — network timeout, missing credential, environment not responding |

The AI explanation accompanying each classification states the specific evidence that led to it. For a flaky failure, the explanation typically notes that the test has both passed and failed on identical code, and identifies the non-deterministic step and condition.

## Accessing flaky test data in the Analytics Dashboard

1. Open **Analytics** in the left navigation.
2. Click the **Execution Dashboard** tab.
3. Locate the **Consistently Failing Tests** widget. This widget lists test cases with repeated failures across recent runs, ranked by failure frequency.
4. Click any test case in the widget to open the failure detail panel.
5. The detail panel shows the failure category distribution (how many runs were classified as flaky vs. application bug vs. other) and the AI explanation for each failure type.

The **Consistently Failing Tests** widget surfaces test cases that failed in multiple consecutive runs. ContextQA distinguishes between cases that always fail (likely an application bug or test bug) and cases that alternate between passing and failing (likely flaky). The visual indicator for flakiness is a mixed pass/fail run history in the sparkline column.

For a broader view, the **Failure Analysis** report (accessed from **Analytics → Failure Analysis**) shows aggregate failure categories across an entire test suite or date range, allowing you to measure what percentage of your failures are flaky versus genuine regressions.

## Using the get\_root\_cause MCP tool

The `get_root_cause` MCP tool returns the AI failure classification for a specific execution programmatically. The response includes:

* `failureCategory`: one of `APPLICATION_BUG`, `TEST_BUG`, `FLAKY_FAILURE`, `ENVIRONMENT_ISSUE`
* `explanation`: the AI's natural-language explanation of the root cause
* `fixSuggestion`: a concrete suggestion for resolving the failure
* `affectedStep`: the step number and action that failed
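For integration code, it can help to model this response as a small typed structure. A minimal Python sketch using the field names listed above — the class and enum names here are illustrative, not part of the ContextQA API:

```python
from dataclasses import dataclass
from enum import Enum

class FailureCategory(str, Enum):
    """The four values the failureCategory field can take."""
    APPLICATION_BUG = "APPLICATION_BUG"
    TEST_BUG = "TEST_BUG"
    FLAKY_FAILURE = "FLAKY_FAILURE"
    ENVIRONMENT_ISSUE = "ENVIRONMENT_ISSUE"

@dataclass
class RootCauseResult:
    """Illustrative model of a get_root_cause response."""
    failure_category: FailureCategory
    explanation: str
    fix_suggestion: str
    affected_step: str

def parse_root_cause(payload: dict) -> RootCauseResult:
    """Convert a raw JSON payload into the typed model."""
    return RootCauseResult(
        failure_category=FailureCategory(payload["failureCategory"]),
        explanation=payload["explanation"],
        fix_suggestion=payload["fixSuggestion"],
        affected_step=payload["affectedStep"],
    )
```

Parsing the category into an enum up front means a typo in a downstream comparison raises immediately instead of silently never matching.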

When building CI integrations that need to distinguish "block the release" from "likely noise," use the `failureCategory` field from `get_root_cause` to gate your pipeline logic. A pipeline that fails the build on `APPLICATION_BUG` but creates a Jira ticket and continues on `FLAKY_FAILURE` is a common pattern for teams managing large suites.

Example MCP invocation pattern:

```
get_root_cause(executionId: "<execution_id>")
→ { failureCategory: "FLAKY_FAILURE", explanation: "...", fixSuggestion: "Add an explicit wait for the modal animation to complete before asserting." }
```
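The gating pattern described above — fail the build on `APPLICATION_BUG`, ticket-and-continue on `FLAKY_FAILURE` — might look like this in a CI script. A hedged sketch: `create_ticket` stands in for your issue-tracker integration and is not a ContextQA API; `result` is the parsed `get_root_cause` response.

```python
def triage_failure(result: dict, create_ticket) -> bool:
    """Return True if the pipeline should fail (block the release).

    `result` is a parsed get_root_cause response; `create_ticket` is a
    callback into your issue tracker (hypothetical, e.g. a Jira client).
    """
    category = result["failureCategory"]
    if category in ("APPLICATION_BUG", "TEST_BUG"):
        return True  # genuine regression or broken test: block the release
    if category == "FLAKY_FAILURE":
        create_ticket(f"Flaky test: {result['explanation']}")
        return False  # record it, but let the pipeline continue
    # ENVIRONMENT_ISSUE: infrastructure problem, not a product defect
    create_ticket(f"Environment issue: {result['explanation']}")
    return False
```

Whether `TEST_BUG` should block is a team decision; here it blocks because a broken assertion can hide a real regression.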

## What to do when a test is classified as flaky

ContextQA classifies the failure; resolving it requires one of three approaches depending on the root cause:

**1. Fix timing issues.** If the AI reasoning log identifies a race condition (for example, clicking an element before it is interactable), add an explicit wait step in the test case. In ContextQA's step editor, add a **Wait** action before the problematic step. The appropriate wait target is either a specific element becoming visible or a network request completing.
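Conceptually, an explicit wait is a bounded poll: re-check the readiness condition until it holds or a timeout elapses, instead of acting immediately and losing the race. A generic sketch of that pattern (not ContextQA's internal Wait implementation):

```python
import time

def wait_for(condition, timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    This mirrors what a Wait step does: give the async operation time
    to complete before the next action runs, rather than failing on a
    race between the click and the render.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline
```

The key property is the deadline: unlike a fixed `sleep`, the wait returns as soon as the condition holds, so fixing flakiness this way does not slow down the happy path.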

**2. Configure retry behavior.** ContextQA test plans support a recovery action for failed test cases. In **Test Plans → \[Plan] → Settings**, the **Recovery Action** field accepts `Run_Next_Testcase` as a value. This instructs ContextQA to continue executing remaining test cases when one fails, rather than halting the plan. Combine this with post-execution flakiness analysis rather than treating every failure as a blocker.

**3. Isolate environment dependencies.** If ContextQA consistently classifies failures as `ENVIRONMENT_ISSUE` for a specific test case, the test may be hitting a dependency that is unreliable in your staging environment. Use a dedicated test environment or mock the external dependency.

## Retry configuration and the recovery action

The **Recovery Action** in test plan settings controls what ContextQA does when a test case fails mid-plan:

| Recovery Action     | Behavior                                                    |
| ------------------- | ----------------------------------------------------------- |
| `Stop`              | Halt plan execution immediately on first failure            |
| `Run_Next_Testcase` | Mark the failed case and continue executing remaining cases |

`Run_Next_Testcase` is recommended for suites with known flaky tests. It allows the plan to complete, giving you a full picture of which failures are consistent and which are intermittent. After the run, use the failure category data from the Analytics Dashboard to triage.
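After a `Run_Next_Testcase` run completes, the per-case categories can be bucketed to separate consistent failures from noise. A sketch, assuming you have collected `(test_name, category)` pairs from the dashboard or the `get_root_cause` MCP tool:

```python
from collections import defaultdict

def summarize_failures(failures):
    """Group failed test cases by AI failure category.

    `failures` is an iterable of (test_name, category) tuples.
    Returns {category: [test names]} so blocking categories such as
    APPLICATION_BUG can be triaged before intermittent ones.
    """
    buckets = defaultdict(list)
    for name, category in failures:
        buckets[category].append(name)
    return dict(buckets)
```

A run summary like `{"APPLICATION_BUG": ["checkout"], "FLAKY_FAILURE": ["login", "search"]}` makes the triage order obvious at a glance.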

## Frequently Asked Questions

### How many runs does ContextQA need before it can identify a flaky test?

ContextQA can classify a single failure as `FLAKY_FAILURE` on the first occurrence if the AI reasoning from the HAR, console log, and DOM snapshot is consistent with timing or environment variability. The **Consistently Failing Tests** widget in the Analytics Dashboard requires at least two runs to establish a pattern, but single-run classification is always available via the `get_root_cause` MCP tool.

### Does ContextQA automatically retry flaky tests?

ContextQA does not automatically retry individual test cases within an execution. The platform classifies failures and surfaces them for human action. Automatic retries can mask genuine regressions, so ContextQA's design philosophy is to classify and report rather than silently re-run. Use the retry logic in your CI pipeline if you want to re-run an entire plan.
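Re-running a whole plan from CI is a bounded retry loop. A sketch, assuming a hypothetical `run_plan` callable that triggers one plan execution and returns `True` on success:

```python
def run_plan_with_retries(run_plan, max_attempts: int = 2) -> bool:
    """Re-run the entire plan up to `max_attempts` times.

    `run_plan` is your trigger for a ContextQA plan execution
    (hypothetical callable). Returns True once any attempt passes,
    False if every attempt fails.
    """
    for _ in range(max_attempts):
        if run_plan():
            return True
    return False
```

Keep `max_attempts` low: a plan that needs three attempts to pass is signal that flaky cases need fixing, not re-running.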

### Can I mark a test case as "known flaky" to suppress notifications?

There is no explicit "known flaky" flag on test cases in the current UI. The recommended approach is to use the failure category data from `get_root_cause` in your CI integration to suppress notifications for `FLAKY_FAILURE` classifications while still alerting on `APPLICATION_BUG`.

## Related

* [Failure analysis report](https://learning.contextqa.com/reporting/failure-analysis)
* [Analytics dashboard](https://learning.contextqa.com/reporting/analytics-dashboard)
* [Test results](https://learning.contextqa.com/reporting/test-results)
* [Video recording and screenshots](https://learning.contextqa.com/execution/video-and-screenshots)
* [Exporting reports](https://learning.contextqa.com/reporting/exporting-reports)

{% hint style="info" %}
**Get release readiness reports your stakeholders understand.** [**Book a Demo →**](https://contextqa.com/book-a-demo/) — See the analytics dashboard, failure analysis, and flaky test detection for your test suite.
{% endhint %}
