Autonomous Agent Pipeline

Who is this for? All roles — particularly engineers and QA leads who want to understand how ContextQA's AI pipeline executes tests, captures evidence, and self-heals broken steps.

When you run a ContextQA test case, an autonomous AI pipeline takes over. You do not write a script. You describe what should happen in natural language, and ContextQA's AI pipeline figures out how to make it happen in a real browser.

This page explains each stage of that pipeline, what it does, and how it produces the evidence artifacts you see in test reports.


Overview

Every test execution goes through the same pipeline in sequence:

  1. Navigation Agent

  2. Element Discovery Agent

  3. Intent Parser

  4. Action Executor

  5. Screenshot Agent

  6. Network Monitor

  7. Console Monitor

  8. Verification Agent

  9. Self-Healing Agent

Stages 6 and 7 run in parallel with the main pipeline throughout the entire execution. Stages 8 and 9 are conditional — they activate only when needed.


Stage 1: Navigation Agent

The pipeline begins with the Navigation Agent, which opens a clean browser session and navigates to the test case URL.

What it does:

  • Launches a headless Chromium (or specified browser) instance in a clean profile — no cookies, no cached state from previous runs

  • Navigates to the test case URL

  • Handles HTTP redirects automatically (301, 302, 307, 308)

  • Waits for the page to reach a stable loaded state — specifically, waits for the load event and for network activity to go idle (no pending requests for 500ms)

  • Captures the initial screenshot after the page is fully loaded

  • Records the actual final URL (post-redirect) as the starting point

Why this matters: A clean browser session ensures test isolation. Each execution starts from the same known state regardless of what ran before it. If your application requires a logged-in starting state, the test steps themselves must include the login flow, or you can use a pre-requisite test case that establishes the session.
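The "network idle" wait described above (no pending requests for 500ms) can be modeled as a simple polling loop. This is an illustrative sketch, not ContextQA's implementation; pending_requests is a hypothetical callback that reports whether any requests are in flight:

```python
import time

def wait_for_network_idle(pending_requests, idle_ms=500, timeout_s=30):
    """Return True once no requests have been pending for `idle_ms`
    milliseconds; give up and return False after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    idle_since = None
    while time.monotonic() < deadline:
        if pending_requests():
            idle_since = None          # network activity resets the idle clock
        elif idle_since is None:
            idle_since = time.monotonic()
        elif (time.monotonic() - idle_since) * 1000 >= idle_ms:
            return True                # quiet for long enough: page is stable
        time.sleep(0.01)
    return False
```

The same pattern underlies the "smart waiting" in Stage 4: wait for a meaningful signal with a timeout, rather than sleeping for a fixed interval.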


Stage 2: Element Discovery Agent

Before executing any steps, the pipeline builds a comprehensive map of every interactive element on the current page.

What it does:

  • Performs a full DOM traversal to identify all interactive elements: button, input, select, textarea, a, and elements with click handlers

  • Applies visual AI to the screenshot to identify elements that may not be semantically marked up correctly (e.g., a div styled as a button)

  • Records multiple candidate locators for each element (test ID, ARIA label, visible text, CSS selector), ranked by confidence for use in the Action Executor's fallback sequence

  • Stores the element map in memory for use by the Action Executor in Stage 4

The element map is refreshed automatically after any action that triggers a navigation or significant DOM change (detected by watching MutationObserver events).
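One way to picture the element map is as a dictionary of ranked locator candidates per element. The shape below is an assumption for illustration only; ContextQA does not publish its internal schema:

```python
# Illustrative element map entry (field names are assumptions, not
# ContextQA's actual schema). Each element carries ranked locator
# candidates that the Action Executor (Stage 4) tries in order.
element_map = {
    "submit-button": {
        "role": "button",
        "visible_text": "Submit",
        "locators": [  # ordered by confidence, strongest first
            {"strategy": "data-testid", "value": "[data-testid='submit']"},
            {"strategy": "aria-label", "value": "[aria-label='Submit']"},
            {"strategy": "text", "value": "text=Submit"},
            {"strategy": "css", "value": "form > button.primary"},
        ],
    },
}

def best_locator(name):
    """Return the highest-confidence locator for a mapped element."""
    return element_map[name]["locators"][0]
```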


Stage 3: Intent Parser

The Intent Parser processes each natural language step and converts it into a structured action specification.

What it does:

  • Reads the NLP step description (e.g., "Click the Submit button")

  • Classifies the action type: navigate, click, type, select, scroll, hover, assert, wait, api_call

  • Identifies the target element reference from the step (e.g., "Submit button")

  • Extracts any data values embedded in the step (e.g., "type '[email protected]' into the Email field")

  • Resolves variable references if test data profiles are in use (e.g., "{{username}}" → the actual value from the data row)

  • Produces a structured action object:
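For example, a step like "type '[email protected]' into the Email field" might parse into something like the following. This is a sketch; the field names are assumptions, not ContextQA's actual schema:

```python
# Hypothetical shape of the structured action object produced by the
# Intent Parser (illustrative field names).
action = {
    "type": "type",                # one of the action types listed below
    "target": "Email field",       # element reference extracted from the step
    "value": "[email protected]",     # data value embedded in the step
    "variables_resolved": True,    # any {{...}} references already substituted
}
```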

Action types recognized:

  • click — single click on an element

  • double_click — double click

  • right_click — right-click (context menu)

  • type — type text into a focused input

  • clear_and_type — clear existing value and type new text

  • select — choose an option from a dropdown

  • check / uncheck — checkbox state

  • upload — file upload input

  • scroll — scroll the page or an element

  • hover — hover over an element (for tooltips, dropdowns)

  • navigate — navigate to a URL

  • assert — verify a condition without performing an action

  • wait — explicit wait for a condition or time

  • api_call — make an HTTP request and assert the response

Why this matters: Separating intent parsing from action execution allows the pipeline to validate the full step sequence before beginning execution, flag any steps that are ambiguous or unrecognizable, and optimize the execution order where steps can be parallelized.
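As a toy illustration of action-type classification, a keyword matcher can mimic the first parsing decision. The real Intent Parser is an AI model, not a regex table; this sketch only shows the input/output contract:

```python
import re

# Toy classifier for illustration only. Patterns are checked in order,
# so the more specific double/right-click rules come before plain click.
PATTERNS = [
    (r"\b(go to|navigate)\b", "navigate"),
    (r"\bdouble[- ]click\b", "double_click"),
    (r"\bright[- ]click\b", "right_click"),
    (r"\bclick\b", "click"),
    (r"\b(type|enter|fill)\b", "type"),
    (r"\bselect\b", "select"),
    (r"\b(verify|assert)\b", "assert"),
    (r"\bwait\b", "wait"),
]

def classify(step):
    """Return the action type for a natural language step description."""
    for pattern, action_type in PATTERNS:
        if re.search(pattern, step, re.IGNORECASE):
            return action_type
    return "unknown"
```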


Stage 4: Action Executor

The Action Executor performs the actual browser action using Playwright's automation API, guided by the structured action from Stage 3 and the element map from Stage 2.

What it does:

  1. Receives the structured action from the Intent Parser

  2. Looks up the target element in the pre-built element map from Stage 2

  3. Attempts to locate the element in the live DOM using the highest-confidence locator strategy

  4. If found: performs the action (click, type, select, etc.)

  5. If not found on first attempt: tries alternative locators from the element map

  6. If all locators fail: activates the Self-Healing Agent (Stage 9)

  7. Waits for the post-action state to stabilize before returning control

Locator fallback sequence:

  1. data-testid attribute match

  2. ARIA label match

  3. Exact visible text match

  4. Partial visible text match

  5. CSS selector from the element map

  6. Visual match (screenshot-based element identification)

  7. → Self-Healing Agent if all above fail

Smart waiting: Rather than using fixed sleep() calls, the Action Executor waits for meaningful signals: network idle (for actions that trigger API calls), DOM stability (for actions that modify the page structure), or specific element visibility (for assertions). This makes tests faster and more reliable than fixed-delay approaches.
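The locator fallback sequence is essentially a first-match loop with a healing fallback. A minimal sketch, assuming each strategy is a callable that returns an element handle or None:

```python
def locate(strategies, self_heal):
    """Try each locator strategy in confidence order; fall back to the
    Self-Healing Agent (Stage 9) only when every strategy fails."""
    for strategy in strategies:
        element = strategy()
        if element is not None:
            return element          # first successful strategy wins
    return self_heal()              # all locators failed: attempt healing
```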


Stage 5: Screenshot Agent

A screenshot is captured automatically after every action in the pipeline.

What it does:

  • Takes a full-page screenshot immediately after each action completes and the page stabilizes

  • Generates a unique filename with the step number, action type, and timestamp

  • Uploads the screenshot to S3 storage and returns a publicly accessible URL

  • Also captures a screenshot on failure, showing the page state at the exact moment of failure rather than just before the failing action

Screenshot types captured:

  • Pre-action: the page state before an action (for complex assertion steps)

  • Post-action: the page state after each action (stored as evidence for every step)

  • Failure screenshot: captured immediately when an error occurs

  • Assertion screenshot: for assertion steps, captures the specific region being verified when possible

Why this matters: Every test report in ContextQA shows a screenshot for each step. This means test failures are always visually evident — you can see exactly what the browser looked like when a step failed, which is far more useful than a stack trace or error message alone.
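A filename scheme combining step number, action type, and timestamp (the three components mentioned above) might look like this. The exact format is an assumption; ContextQA's real naming may differ:

```python
from datetime import datetime, timezone

def screenshot_name(step_number, action_type, when=None):
    """Build a unique per-step screenshot filename (naming scheme is an
    illustrative assumption, not ContextQA's documented format)."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%S%f")   # microseconds keep names unique
    return f"step-{step_number:03d}_{action_type}_{stamp}.jpg"
```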


Stage 6: Network Monitor

The Network Monitor runs as a parallel process throughout the entire execution, not just during specific steps.

What it does:

  • Intercepts all outgoing HTTP/HTTPS requests made by the browser

  • Captures request details: method, URL, headers, request body

  • Captures response details: status code, headers, response body (up to 1MB)

  • Flags requests that return 4xx or 5xx status codes

  • Tracks API response times

  • Writes the complete log to HAR (HTTP Archive) format

Data captured per request:
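An illustrative per-request entry, based on the fields listed above. It is simplified relative to the full HAR schema, and the field names are assumptions:

```python
# Example of the data captured per request (HAR-like, simplified).
entry = {
    "method": "POST",
    "url": "https://app.example.com/api/orders",
    "request_headers": {"content-type": "application/json"},
    "request_body": '{"item": "sku-123"}',
    "status": 500,                              # 4xx/5xx responses are flagged
    "response_headers": {"content-type": "application/json"},
    "response_body": '{"error": "internal"}',   # captured up to 1MB
    "duration_ms": 842,                         # API response time
}

def is_flagged(e):
    """A request is flagged when it returned a 4xx or 5xx status code."""
    return e["status"] >= 400
```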

Why this matters: Many test failures are caused by API errors, not UI problems. If a form submission fails silently (the page does not show an error but the data was not saved), the network log shows the failed API call immediately. This makes the HAR log one of the most valuable debugging artifacts for modern single-page applications.

The network log is accessible via the get_network_logs MCP tool after execution.


Stage 7: Console Monitor

The Console Monitor also runs in parallel throughout the entire execution, capturing all browser console output.

What it does:

  • Intercepts all console events: console.log, console.warn, console.error, console.info, console.debug

  • Captures unhandled JavaScript exceptions and promise rejections

  • Records the console message text, the originating file, and the line number

  • Timestamps each entry to correlate with the execution timeline

Error types captured:

  • JavaScript runtime errors (TypeError, ReferenceError, etc.)

  • Unhandled promise rejections

  • Application-level error logs

  • Warning messages from frameworks (React, Angular, Vue deprecation warnings)

  • Custom application logging

Why this matters: Frontend applications often log errors to the console when something goes wrong — errors that are invisible to the end user and do not cause obvious UI failures. A test that "passes" but generates multiple console errors may indicate latent bugs. The console log gives QA engineers visibility into JavaScript health that screenshot-based testing cannot provide.

The console log is accessible via the get_console_logs MCP tool after execution.
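Triage of a captured console log often starts by filtering the error-level events out of the noise. A sketch, assuming entries carry the level, message text, and source fields described above:

```python
# Entry shape is an assumption modeled on the fields described above
# (level, message text, originating file and line).
def error_entries(console_log):
    """Return entries that indicate JavaScript problems: console.error
    output, runtime exceptions, and unhandled promise rejections."""
    error_kinds = {"error", "exception", "unhandledrejection"}
    return [e for e in console_log if e["level"] in error_kinds]

log = [
    {"level": "log", "text": "app started", "source": "main.js:12"},
    {"level": "error", "text": "TypeError: x is undefined", "source": "cart.js:88"},
    {"level": "warn", "text": "deprecated prop", "source": "react-dom.js:1"},
]
```

A test that passes while this filter returns a non-empty list is exactly the "passes but generates console errors" case the page warns about.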


Stage 8: Verification Agent

The Verification Agent is activated specifically for assertion steps — steps that check that something is (or is not) present, visible, or in a particular state.

What it does:

  1. Receives the assertion description from the Intent Parser (e.g., "verify the success message is visible")

  2. Takes a screenshot of the current page state

  3. Analyzes the screenshot + DOM state against the natural language assertion condition

  4. Returns a pass/fail determination with a reasoning explanation

Assertion types handled:

  • Element visibility: "verify the Submit button is visible"

  • Element absence: "verify the error message is not displayed"

  • Text content: "verify the page title says 'Dashboard'"

  • Text contains: "verify the success banner contains 'Order placed'"

  • URL verification: "verify the URL contains '/dashboard'"

  • Count assertion: "verify at least 3 items appear in the list"

  • Input value: "verify the email field contains '[email protected]'"

  • Checkbox state: "verify the Terms checkbox is checked"

  • Element enabled/disabled: "verify the Submit button is enabled"

  • Visual state: "verify the status indicator is green"

AI-powered assertions: For assertions that are difficult to express as DOM queries (e.g., "verify the chart shows an upward trend"), the Verification Agent uses visual AI to analyze the screenshot directly. It returns a confidence score alongside the pass/fail result. Assertions below 0.70 confidence are flagged in the report for human review.
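The review-flag rule for AI-powered assertions reduces to a threshold check. Only the 0.70 threshold comes from this page; the result shape is illustrative:

```python
REVIEW_THRESHOLD = 0.70  # stated threshold for flagging assertions for review

def assertion_result(passed, confidence):
    """Package a visual-AI assertion outcome with its review flag
    (field names are illustrative, not ContextQA's schema)."""
    return {
        "passed": passed,
        "confidence": confidence,
        "needs_review": confidence < REVIEW_THRESHOLD,
    }
```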

Why this matters: Natural language assertions are far more maintainable than XPath assertions like //div[@class='alert alert-success']/span[contains(text(), 'Order')]. When the implementation changes (different CSS class, different DOM structure), the natural language assertion still works as long as the visible state is correct.


Stage 9: Self-Healing Agent

When the Action Executor cannot find the target element using any of its locator strategies, it activates the Self-Healing Agent.

What it does:

  1. Receives the step description and the failed locator information

  2. Takes a fresh screenshot of the current page state

  3. Searches the current DOM and screenshot for elements that are visually or semantically similar to the target

  4. Scores each candidate element against the intent of the original test step

  5. Selects the best candidate if it meets the confidence threshold

Healing outcomes:

  • High confidence: heal and continue. Performs the action using the new locator and marks the step as auto-healed.

  • Medium confidence: heal with warning. Performs the action but flags the step for review in the report.

  • Low confidence: fail. Marks the step as FAILED and shows the best candidate as a suggested fix.
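The outcome selection above reduces to confidence banding. The numeric band boundaries below are illustrative; ContextQA does not publish the actual thresholds:

```python
def healing_outcome(confidence, high=0.85, medium=0.60):
    """Map a healing confidence score to one of the three outcomes.
    Thresholds are hypothetical, chosen only to illustrate the banding."""
    if confidence >= high:
        return "heal_and_continue"      # step marked as auto-healed
    if confidence >= medium:
        return "heal_with_warning"      # step flagged for review
    return "fail_with_suggestion"       # step FAILED, best candidate suggested
```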

What counts as a healing:

  • Button text changed (e.g., "Save" → "Save Changes") — the text similarity matcher finds it

  • Element moved to a different position on the page — the visual AI locates it in the new position

  • CSS class renamed — the visual match finds the same visual element with the new class

  • Element wrapped in an additional container — the DOM traversal finds it at a deeper path

What cannot be healed:

  • Element genuinely removed from the page (intentionally deleted from the UI)

  • Functionality moved to a completely different page

  • Element hidden behind authentication or permissions that the test user lacks

  • Complete page redesign where no equivalent element exists

Reviewing healed steps: In the test report, auto-healed steps are marked with a healing indicator. Click the step to see the original locator, the new locator, and the confidence level. You can accept the healing (which updates the test case definition permanently) or reject it (which reverts the step and marks it for manual review).

Healings can also be reviewed and applied via the MCP tools get_auto_healing_suggestions and approve_auto_healing.


Evidence Package

Every execution produces a complete set of artifacts regardless of whether the test passed or failed:

Per-Step Screenshots

.jpg images captured after every action. Stored in S3 with public URLs, retained for 90 days. Viewable in the test report step-by-step view or accessible via get_execution_step_details.

Full Session Video

A .webm video recording of the entire browser session from first navigation to last action. The video is synchronized with the step timeline in the report — click any step to jump to that moment in the video. Retained for 30 days.

Playwright Trace File

A .zip binary file in Playwright's trace format. Contains:

  • Complete DOM snapshots before and after every action

  • All network requests with full request/response data

  • Console output synchronized with the action timeline

  • Screenshots at every step

Viewable by uploading to trace.playwright.dev — no installation required. This is the deepest debugging artifact available. Access the URL via get_trace_url.

HAR Network Log

A JSON file in HAR (HTTP Archive) format containing every network request and response made during the session. Import into Chrome DevTools, Fiddler, or any HAR viewer for network analysis. Access via get_network_logs.

Browser Console Log

A JSON array of all console events: errors, warnings, info messages, and JavaScript exceptions. Access via get_console_logs.

AI Reasoning Log

A JSON structure containing the AI's decision-making trace for every step: which element was targeted and what actions were taken. Access via get_ai_reasoning. Useful for diagnosing flaky tests where the AI sometimes makes different decisions.


Execution Infrastructure

Tests run on managed cloud infrastructure:

  • Browser: Chromium (default), Firefox, or WebKit (Safari)

  • OS: Linux (Ubuntu 22.04) for browser tests; iOS/Android device farm for mobile

  • Isolation: Each execution gets a clean container with a fresh browser profile

  • Concurrency: Multiple tests can run in parallel (subject to your plan's concurrency limits)

  • Timeouts: Default step timeout is 30 seconds; default total execution timeout is 30 minutes. Both are configurable per test case.

  • Geolocation: Tests can be run from specific geographic regions if latency or geo-based routing is relevant to your tests


How to Interpret a Failure

When a test fails, use the evidence package to diagnose it:

  1. Start with the failure screenshot — it shows the browser state at the exact moment of failure. Often the cause is visually obvious: a modal blocking the target element, a validation error preventing form submission, or a loading spinner that never resolved.

  2. Check the network log — if the screenshot looks correct but the test still failed, check whether an API call returned an error. A 500 on a form submission is often the real root cause.

  3. Check the console log — if both the screenshot and network log look clean, a JavaScript exception may have broken client-side logic without any visible UI error.

  4. Use AI root cause analysis — call get_root_cause via MCP or click the AI Insights button in the UI. The AI correlates all evidence sources and provides a plain-English explanation.

  5. Open the Playwright trace — for complex failures where you need DOM-level detail, the trace file shows the exact DOM state before and after every action. Use it to verify that an element was in the expected state when an action was attempted.

