Autonomous Agent Pipeline
Who is this for? All roles — particularly engineers and QA leads who want to understand how ContextQA's AI pipeline executes tests, captures evidence, and self-heals broken steps.
When you run a ContextQA test case, an autonomous AI pipeline takes over. You do not run a script — you write a natural language description of what should happen, and ContextQA's AI pipeline figures out how to make it happen in a real browser.
This page explains each stage of that pipeline, what it does, and how it produces the evidence artifacts you see in test reports.
Overview
Every test execution goes through the same pipeline in sequence:
Stages 6 and 7 run in parallel with the main pipeline throughout the entire execution. Stages 8 and 9 are conditional — they activate only when needed.
Stage 1: Navigation Agent
The pipeline begins with the Navigation Agent, which opens a clean browser session and navigates to the test case URL.
What it does:
Launches a headless Chromium (or specified browser) instance in a clean profile — no cookies, no cached state from previous runs
Navigates to the test case URL
Handles HTTP redirects automatically (301, 302, 307, 308)
Waits for the page to reach a stable loaded state — specifically, waits for the load event and for network activity to go idle (no pending requests for 500ms)
Captures the initial screenshot after the page is fully loaded
Records the actual final URL (post-redirect) as the starting point
Why this matters: A clean browser session ensures test isolation. Each execution starts from the same known state regardless of what ran before it. If your application requires a logged-in starting state, the test steps themselves must include the login flow, or you can use a pre-requisite test case that establishes the session.
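The "network idle" part of the load-stability check can be sketched as a small idle detector: pending requests are counted, and the network is considered idle once none have been in flight for the 500 ms window. The class and method names below are illustrative, not ContextQA internals.

```python
import time

IDLE_WINDOW_MS = 500  # no pending requests for this long => "network idle"

class NetworkIdleDetector:
    """Tracks in-flight requests and reports when the network has gone idle."""

    def __init__(self, idle_window_ms: int = IDLE_WINDOW_MS):
        self.idle_window_ms = idle_window_ms
        self.pending = 0
        self.last_settled = time.monotonic()

    def on_request_started(self) -> None:
        self.pending += 1

    def on_request_finished(self) -> None:
        self.pending = max(0, self.pending - 1)
        if self.pending == 0:
            # all requests settled; the idle window starts now
            self.last_settled = time.monotonic()

    def is_idle(self) -> bool:
        if self.pending > 0:
            return False
        elapsed_ms = (time.monotonic() - self.last_settled) * 1000
        return elapsed_ms >= self.idle_window_ms
```

A real implementation would hook browser request events (e.g., via Playwright) rather than being called manually, but the idle condition is the same.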
Stage 2: Element Discovery Agent
Before executing any steps, the pipeline builds a comprehensive map of every interactive element on the current page.
What it does:
Performs a full DOM traversal to identify all interactive elements: button, input, select, textarea, a, and elements with click handlers
Applies visual AI to the screenshot to identify elements that may not be semantically marked up correctly (e.g., a div styled as a button)
Records multiple locator candidates for each element (test ID, ARIA label, visible text, CSS selector) so the Action Executor has fallbacks available in Stage 4
Stores the element map in memory for use by the Action Executor in Stage 4
The element map is refreshed automatically after any action that triggers a navigation or significant DOM change (detected by watching MutationObserver events).
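One way to picture an element-map entry and the interactivity check is the sketch below. The field names are assumptions chosen for illustration, not ContextQA's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Semantic tags that are interactive by definition
INTERACTIVE_TAGS = {"button", "input", "select", "textarea", "a"}

@dataclass
class ElementEntry:
    tag: str
    test_id: Optional[str] = None     # data-testid value, highest-confidence locator
    aria_label: Optional[str] = None
    visible_text: str = ""
    css_selector: str = ""
    has_click_handler: bool = False

def is_interactive(entry: ElementEntry) -> bool:
    # Semantic tags count directly; anything else qualifies only if it has a
    # click handler (covers e.g. a div styled as a button, found by visual AI)
    return entry.tag in INTERACTIVE_TAGS or entry.has_click_handler
```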
Stage 3: Intent Parser
The Intent Parser processes each natural language step and converts it into a structured action specification.
What it does:
Reads the NLP step description (e.g., "Click the Submit button")
Classifies the action type: navigate, click, type, select, scroll, hover, assert, wait, api_call
Identifies the target element reference from the step (e.g., "Submit button")
Extracts any data values embedded in the step (e.g., "type '[email protected]' into the Email field")
Resolves variable references if test data profiles are in use (e.g., "{{username}}" → the actual value from the data row)
Produces a structured action object for the Action Executor
Action types recognized:
click: single click on an element
double_click: double click
right_click: right-click (context menu)
type: type text into a focused input
clear_and_type: clear existing value and type new text
select: choose an option from a dropdown
check / uncheck: checkbox state
upload: file upload input
scroll: scroll the page or an element
hover: hover over an element (for tooltips, dropdowns)
navigate: navigate to a URL
assert: verify a condition without performing an action
wait: explicit wait for a condition or time
api_call: make an HTTP request and assert the response
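A rough illustration of how a step might become a structured action object. Real parsing is done by an AI model, so this regex-based sketch (with hypothetical helper names) only approximates the idea.

```python
import re

# Illustrative keyword-to-action mapping; the real classifier is AI-driven
ACTION_KEYWORDS = {
    "click": "click", "type": "type", "select": "select",
    "verify": "assert", "navigate": "navigate", "hover": "hover",
    "wait": "wait", "scroll": "scroll",
}

def parse_step(step: str) -> dict:
    """Classify the action, pull out a quoted data value, and guess the
    target element phrase. A sketch only, not ContextQA's parser."""
    first_word = step.strip().split()[0].lower()
    action = ACTION_KEYWORDS.get(first_word, "unknown")
    value_match = re.search(r"'([^']*)'", step)
    target_match = re.search(
        r"\b(?:into|on|the)\b\s+(.+?)(?:\s+(?:field|button))?$", step)
    target = target_match.group(1) if target_match else None
    if target:
        target = re.sub(r"^the\s+", "", target)  # drop a leading article
    return {
        "action": action,
        "target": target,
        "value": value_match.group(1) if value_match else None,
    }
```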
Why this matters: Separating intent parsing from action execution allows the pipeline to validate the full step sequence before beginning execution, flag any steps that are ambiguous or unrecognizable, and optimize the execution order where steps can be parallelized.
Stage 4: Action Executor
The Action Executor performs the actual browser action using Playwright's automation API, guided by the structured action from Stage 3 and the element map from Stage 2.
What it does:
Receives the structured action from the Intent Parser
Looks up the target element in the pre-built element map from Stage 2
Attempts to locate the element in the live DOM using the highest-confidence locator strategy
If found: performs the action (click, type, select, etc.)
If not found on first attempt: tries alternative locators from the element map
If all locators fail: activates the Self-Healing Agent (Stage 9)
Waits for the post-action state to stabilize before returning control
Locator fallback sequence:
data-testid attribute match
ARIA label match
Exact visible text match
Partial visible text match
CSS selector from the element map
Visual match (screenshot-based element identification)
→ Self-Healing Agent if all above fail
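The fallback sequence can be sketched as a loop over strategies in confidence order, where find stands in for a real DOM query. The names are illustrative, not ContextQA's API.

```python
from typing import Callable, Optional, Tuple

# Mirrors the fallback order listed above, highest confidence first
FALLBACK_ORDER = [
    "test_id", "aria_label", "exact_text",
    "partial_text", "css_selector", "visual_match",
]

def locate(element_map_entry: dict,
           find: Callable[[str, str], Optional[object]]) -> Tuple[object, object]:
    """Try each locator strategy in confidence order; return the first hit
    plus the strategy that found it, or (None, None) to trigger self-healing."""
    for strategy in FALLBACK_ORDER:
        selector = element_map_entry.get(strategy)
        if not selector:
            continue  # the element map has no candidate for this strategy
        node = find(strategy, selector)
        if node is not None:
            return node, strategy
    return None, None  # caller activates the Self-Healing Agent (Stage 9)
```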
Smart waiting: Rather than using fixed sleep() calls, the Action Executor waits for meaningful signals: network idle (for actions that trigger API calls), DOM stability (for actions that modify the page structure), or specific element visibility (for assertions). This makes tests faster and more reliable than fixed-delay approaches.
Stage 5: Screenshot Agent
A screenshot is captured automatically after every action in the pipeline.
What it does:
Takes a full-page screenshot immediately after each action completes and the page stabilizes
Generates a unique filename with the step number, action type, and timestamp
Uploads the screenshot to S3 storage and returns a publicly accessible URL
Also captures a screenshot on failure — showing the page at the exact moment of failure, not just the state before the failing action
Screenshot types captured:
Pre-action: the page state before an action (for complex assertion steps)
Post-action: the page state after each action (stored as evidence for every step)
Failure screenshot: captured immediately when an error occurs
Assertion screenshot: for assertion steps, captures the specific region being verified when possible
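The unique filename described above (step number, action type, timestamp) might be built as follows. The exact format is an assumption for illustration, not ContextQA's real naming scheme.

```python
from datetime import datetime, timezone
from typing import Optional

def screenshot_filename(step_number: int, action_type: str,
                        kind: str = "post_action",
                        now: Optional[datetime] = None) -> str:
    """Build a unique screenshot name from step number, action type,
    screenshot kind, and a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S%f")  # microsecond precision avoids collisions
    return f"step{step_number:03d}_{action_type}_{kind}_{stamp}.jpg"
```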
Why this matters: Every test report in ContextQA shows a screenshot for each step. This means test failures are always visually evident — you can see exactly what the browser looked like when a step failed, which is far more useful than a stack trace or error message alone.
Stage 6: Network Monitor
The Network Monitor runs as a parallel process throughout the entire execution, not just during specific steps.
What it does:
Intercepts all outgoing HTTP/HTTPS requests made by the browser
Captures request details: method, URL, headers, request body
Captures response details: status code, headers, response body (up to 1MB)
Flags requests that return 4xx or 5xx status codes
Tracks API response times
Writes the complete log to HAR (HTTP Archive) format
Data captured per request:
Why this matters: Many test failures are caused by API errors, not UI problems. If a form submission fails silently (the page does not show an error but the data was not saved), the network log shows the failed API call immediately. This makes the HAR log one of the most valuable debugging artifacts for modern single-page applications.
The network log is accessible via the get_network_logs MCP tool after execution.
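Because the log is standard HAR, failed requests can be pulled out with a few lines against the HAR 1.2 structure (log.entries[].response.status). This helper is a sketch for working with the exported file, not part of ContextQA.

```python
def failed_requests(har: dict) -> list:
    """Return the 4xx/5xx entries from a HAR log as small summary dicts."""
    entries = har.get("log", {}).get("entries", [])
    return [
        {
            "url": e["request"]["url"],
            "method": e["request"]["method"],
            "status": e["response"]["status"],
        }
        for e in entries
        if e["response"]["status"] >= 400  # client and server errors
    ]
```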
Stage 7: Console Monitor
The Console Monitor also runs in parallel throughout the entire execution, capturing all browser console output.
What it does:
Intercepts all console events: console.log, console.warn, console.error, console.info, console.debug
Captures unhandled JavaScript exceptions and promise rejections
Records the console message text, the originating file, and the line number
Timestamps each entry to correlate with the execution timeline
Error types captured:
JavaScript runtime errors (TypeError, ReferenceError, etc.)
Unhandled promise rejections
Application-level error logs
Warning messages from frameworks (React, Angular, Vue deprecation warnings)
Custom application logging
Why this matters: Frontend applications often log errors to the console when something goes wrong — errors that are invisible to the end user and do not cause obvious UI failures. A test that "passes" but generates multiple console errors may indicate latent bugs. The console log gives QA engineers visibility into JavaScript health that screenshot-based testing cannot provide.
The console log is accessible via the get_console_logs MCP tool after execution.
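A console log like this can be filtered down to the entries worth reviewing even on a passing run. The event shape assumed here (level and timestamp keys) is illustrative, not the exact schema.

```python
# Levels that indicate JavaScript health problems, even if the test passed
NOISY_LEVELS = {"error", "exception", "unhandled_rejection"}

def console_errors(events: list) -> list:
    """Keep only error-class console entries, ordered by timestamp so they
    line up with the execution timeline."""
    return sorted(
        (e for e in events if e.get("level") in NOISY_LEVELS),
        key=lambda e: e.get("timestamp", 0),
    )
```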
Stage 8: Verification Agent
The Verification Agent is activated specifically for assertion steps — steps that check that something is (or is not) present, visible, or in a particular state.
What it does:
Receives the assertion description from the Intent Parser (e.g., "verify the success message is visible")
Takes a screenshot of the current page state
Analyzes the screenshot + DOM state against the natural language assertion condition
Returns a pass/fail determination with a reasoning explanation
Assertion types handled:
Element visibility: "verify the Submit button is visible"
Element absence: "verify the error message is not displayed"
Text content: "verify the page title says 'Dashboard'"
Text contains: "verify the success banner contains 'Order placed'"
URL verification: "verify the URL contains '/dashboard'"
Count assertion: "verify at least 3 items appear in the list"
Input value: "verify the email field contains '[email protected]'"
Checkbox state: "verify the Terms checkbox is checked"
Element enabled/disabled: "verify the Submit button is enabled"
Visual state: "verify the status indicator is green"
AI-powered assertions: For assertions that are difficult to express as DOM queries (e.g., "verify the chart shows an upward trend"), the Verification Agent uses visual AI to analyze the screenshot directly. It returns a confidence score alongside the pass/fail result. Assertions below 0.70 confidence are flagged in the report for human review.
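The confidence gating described above might look like the sketch below, using the 0.70 threshold from the text. The function and field names are assumptions.

```python
REVIEW_THRESHOLD = 0.70  # below this, the assertion is flagged for human review

def assertion_verdict(passed: bool, confidence: float) -> dict:
    """Combine the visual-AI pass/fail result with its confidence score.
    Low-confidence verdicts keep their result but are flagged for review."""
    return {
        "result": "pass" if passed else "fail",
        "confidence": confidence,
        "needs_review": confidence < REVIEW_THRESHOLD,
    }
```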
Why this matters: Natural language assertions are far more maintainable than XPath assertions like //div[@class='alert alert-success']/span[contains(text(), 'Order')]. When the implementation changes (different CSS class, different DOM structure), the natural language assertion still works as long as the visible state is correct.
Stage 9: Self-Healing Agent
When the Action Executor cannot find the target element using any of its locator strategies, it activates the Self-Healing Agent.
What it does:
Receives the step description and the failed locator information
Takes a fresh screenshot of the current page state
Searches the current DOM and screenshot for elements that are visually or semantically similar to the target
Evaluates candidate elements to find the best match for the original test step.
Selects the best candidate if it meets the confidence threshold
Healing outcomes:
High confidence: heal and continue. Performs the action using the new locator and marks the step as auto-healed
Medium confidence: heal with warning. Performs the action but flags the step for review in the report
Low confidence: fail. Marks the step as FAILED and shows the best candidate as a suggested fix
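The three outcomes can be sketched as a simple decision function. Note the numeric band boundaries below are assumptions: the document names the bands (high, medium, low) but not their cut-offs.

```python
# Band boundaries are illustrative assumptions, not documented values
HIGH_CONFIDENCE = 0.90
MEDIUM_CONFIDENCE = 0.70

def healing_outcome(confidence: float) -> str:
    """Map a candidate element's confidence score to a healing outcome."""
    if confidence >= HIGH_CONFIDENCE:
        return "heal_and_continue"    # auto-heal, mark the step as healed
    if confidence >= MEDIUM_CONFIDENCE:
        return "heal_with_warning"    # heal but flag the step for review
    return "fail_with_suggestion"     # FAIL, surface the best candidate
```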
What can be healed:
Button text changed (e.g., "Save" → "Save Changes") — the text similarity matcher finds it
Element moved to a different position on the page — the visual AI locates it in the new position
CSS class renamed — the visual match finds the same visual element with the new class
Element wrapped in an additional container — the DOM traversal finds it at a deeper path
What cannot be healed:
Element genuinely removed from the page (intentionally deleted from the UI)
Functionality moved to a completely different page
Element hidden behind authentication or permissions that the test user lacks
Complete page redesign where no equivalent element exists
Reviewing healed steps: In the test report, auto-healed steps are marked with a healing indicator. Click the step to see the original locator, the new locator, and the confidence level. You can accept the healing (which updates the test case definition permanently) or reject it (which reverts the step and marks it for manual review).
Healings can also be reviewed and applied via the MCP tools get_auto_healing_suggestions and approve_auto_healing.
Evidence Package
Every execution produces a complete set of artifacts regardless of whether the test passed or failed:
Per-Step Screenshots
.jpg images captured after every action. Stored in S3 with public URLs, retained for 90 days. Viewable in the test report step-by-step view or accessible via get_execution_step_details.
Full Session Video
A .webm video recording of the entire browser session from first navigation to last action. The video is synchronized with the step timeline in the report — click any step to jump to that moment in the video. Retained for 30 days.
Playwright Trace File
A .zip binary file in Playwright's trace format. Contains:
Complete DOM snapshots before and after every action
All network requests with full request/response data
Console output synchronized with the action timeline
Screenshots at every step
Viewable by uploading to trace.playwright.dev — no installation required. This is the deepest debugging artifact available. Access the URL via get_trace_url.
HAR Network Log
A JSON file in HAR (HTTP Archive) format containing every network request and response made during the session. Import into Chrome DevTools, Fiddler, or any HAR viewer for network analysis. Access via get_network_logs.
Browser Console Log
A JSON array of all console events: errors, warnings, info messages, and JavaScript exceptions. Access via get_console_logs.
AI Reasoning Log
A JSON structure containing the AI's decision-making trace for every step: which element was targeted and what actions were taken. Access via get_ai_reasoning. Useful for diagnosing flaky tests where the AI sometimes makes different decisions.
Execution Infrastructure
Tests run on managed cloud infrastructure:
Browser: Chromium (default), Firefox, or WebKit (Safari)
OS: Linux (Ubuntu 22.04) for browser tests; iOS/Android device farm for mobile
Isolation: Each execution gets a clean container with a fresh browser profile
Concurrency: Multiple tests can run in parallel (subject to your plan's concurrency limits)
Timeouts: Default step timeout is 30 seconds; default total execution timeout is 30 minutes. Both are configurable per test case.
Geolocation: Tests can be run from specific geographic regions if latency or geo-based routing is relevant to your tests
How to Interpret a Failure
When a test fails, use the evidence package to diagnose it:
Start with the failure screenshot — it shows the browser state at the exact moment of failure. Often the cause is visually obvious: a modal blocking the target element, a validation error preventing form submission, or a loading spinner that never resolved.
Check the network log — if the screenshot looks correct but the test still failed, check whether an API call returned an error. A 500 on a form submission is often the real root cause.
Check the console log — if both the screenshot and network log look clean, a JavaScript exception may have broken client-side logic without any visible UI error.
Use AI root cause analysis — call get_root_cause via MCP or click the AI Insights button in the UI. The AI correlates all evidence sources and provides a plain-English explanation.
Open the Playwright trace — for complex failures where you need DOM-level detail, the trace file shows the exact DOM state before and after every action. Use it to verify that an element was in the expected state when an action was attempted.
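The triage order above can be encoded as a small helper that names which artifact to inspect next. This is a sketch with hypothetical parameters, not a ContextQA API.

```python
def next_diagnostic_step(screenshot_shows_cause: bool,
                         network_error_count: int,
                         console_error_count: int) -> str:
    """Follow the diagnosis order: screenshot, then network log, then
    console log, then escalate to AI analysis or the Playwright trace."""
    if screenshot_shows_cause:
        return "failure screenshot"
    if network_error_count > 0:
        return "network log"
    if console_error_count > 0:
        return "console log"
    return "AI root cause analysis / Playwright trace"
```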