Platform Architecture
A detailed explanation of ContextQA's 9-stage AI execution pipeline, its 13+ specialized agents, infrastructure components, and the MCP server that connects the platform to external AI assistants.
Who is this for? Engineers and technical leads who want to understand ContextQA's 9-stage AI pipeline, infrastructure components, and how evidence is captured during execution.
ContextQA executes tests through a 9-stage agentic pipeline in which 13 or more specialized AI agents collaborate to navigate, interact, verify, and repair tests against real browsers and devices. This page explains how each stage works, how the agents communicate, and how the supporting infrastructure stores and serves the evidence those agents produce.
Understanding the architecture helps you write better tests, interpret execution results accurately, and troubleshoot failures more effectively.
Prerequisites
You have read the Introduction and Core Concepts.
You understand the difference between a test case, a test suite, and a test plan.
The 9-Stage AI Execution Pipeline
Every test execution — whether triggered manually, via CI/CD, or by a schedule — passes through all nine stages in order. Stages run sequentially for each step, but the platform can run multiple test cases in parallel across browser instances.
Stage 1: Navigation
The pipeline begins by launching a browser instance and navigating to the starting URL defined in the test case. The navigation agent establishes the initial page context, waits for the page to reach a stable "load complete" state (using both the browser's load event and an AI-assessed visual stability check), and hands off control to the next stage.
If the navigation fails — for example, because the URL is unreachable or redirects to an error page — the execution terminates immediately with a root cause entry explaining the navigation failure.
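As a rough sketch of the visual stability check (the heuristic and the frame count here are illustrative assumptions, not ContextQA's actual algorithm), a page can be treated as settled once several consecutive screenshot captures hash identically after the browser's load event has fired:

```python
# Illustrative sketch: treat a page as visually stable when the last N
# screenshot captures produce identical content hashes.
import hashlib

def is_visually_stable(frames: list[bytes], required_identical: int = 3) -> bool:
    """Return True if the last `required_identical` frames hash identically."""
    if len(frames) < required_identical:
        return False
    tail = frames[-required_identical:]
    digests = {hashlib.sha256(frame).hexdigest() for frame in tail}
    return len(digests) == 1

# A page still rendering (frames differ) vs. a settled page (frames repeat):
assert not is_visually_stable([b"a", b"b", b"c"])
assert is_visually_stable([b"a", b"c", b"c", b"c"])
```

Combining this kind of visual check with the load event catches pages that fire `load` early but continue rendering asynchronously.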
Stage 2: Element Discovery
Before executing any step, the element discovery agent performs a full scan of the current page. It combines two complementary signals:
DOM analysis — the agent inspects the HTML structure, ARIA roles, data-testid attributes, input types, form labels, and button text to build an element inventory.
Visual analysis — a screenshot is analyzed using computer vision to identify interactive regions, buttons, form fields, navigation menus, and modal dialogs that may not have clean DOM representations.
The element inventory is used by the step execution agent to match natural language step descriptions to real page elements. This dual-signal approach handles both well-structured SPAs and legacy applications with inconsistent DOM hygiene.
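The dual-signal merge can be sketched as a bounding-box match between DOM elements and vision-detected regions; the field names and the IoU threshold below are illustrative assumptions, not ContextQA's actual data model:

```python
# Hypothetical sketch of the dual-signal merge: DOM-derived elements gain a
# visual confirmation flag when a vision-detected region overlaps them enough;
# unmatched regions survive as vision-only entries.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def merge_inventories(dom_elements, visual_regions, threshold=0.5):
    """Attach visual confirmation to DOM elements; keep unmatched regions
    as vision-only entries (elements without clean DOM representations)."""
    merged, matched = [], set()
    for el in dom_elements:
        el = dict(el, visual_confirmed=False)
        for i, region in enumerate(visual_regions):
            if iou(el["box"], region["box"]) >= threshold:
                el["visual_confirmed"] = True
                matched.add(i)
                break
        merged.append(el)
    merged += [dict(r, source="vision-only")
               for i, r in enumerate(visual_regions) if i not in matched]
    return merged
```

The vision-only entries are what let the pipeline act on controls that legacy applications render without meaningful DOM attributes.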
Stage 3: Step Execution
The step execution agent interprets each natural language step and performs the corresponding browser action. The agent:
Reads the step description (e.g., "Type user@example.com in the Email field").
Consults the element inventory from Stage 2 to locate the target element.
Executes the action via the browser automation layer (click, type, select, scroll, hover, drag-and-drop).
Waits for the browser to stabilize after the action (dynamic content settling, navigation completing, network requests resolving).
Updates the element inventory for the next step.
The step execution agent handles implicit waits automatically. You do not need to add sleep or wait steps for standard navigation and form interactions.
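Element matching could be sketched as a token-overlap score between the step's target phrase and each inventory entry; the real agent uses language models, and the field names here are assumptions for illustration:

```python
# Illustrative matcher: score inventory elements against the step's target
# phrase using token overlap of labels, text, test ids, and ARIA roles.

def score_element(target: str, element: dict) -> float:
    tokens = set(target.lower().split())
    haystack = " ".join(str(element.get(k, "")) for k in
                        ("label", "text", "data_testid", "aria_role")).lower()
    hits = sum(1 for t in tokens if t in haystack)
    return hits / len(tokens) if tokens else 0.0

def locate(target: str, inventory: list[dict]) -> dict:
    """Return the best-scoring element for a phrase like 'Email field'."""
    return max(inventory, key=lambda el: score_element(target, el))

inventory = [
    {"label": "Email", "aria_role": "textbox", "selector": "#email"},
    {"label": "Password", "aria_role": "textbox", "selector": "#password"},
]
best = locate("email field", inventory)
```

In practice the model-based matcher also weighs element position, type, and surrounding context, which is why ambiguous step wording can still resolve correctly.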
Stage 4: Screenshot Capture
After every step execution (regardless of pass or fail), the screenshot capture agent saves a full-page screenshot to object storage. Screenshots are:
Stored in S3 with a unique key per step per execution.
Linked to the step result record in the database.
Accessible in the execution report immediately after the step completes.
Available for 90 days by default (configurable per workspace).
Screenshot capture can be configured per step: Always (default), On Failure Only, or Never.
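The capture decision and storage key can be sketched directly from the policy options above and the key format described in the Storage section; the function names are illustrative:

```python
# Sketch of the per-step capture policy and the S3 key layout
# ({workspace_id}/{execution_id}/{step_index}/screenshot.png).

def should_capture(policy: str, step_passed: bool) -> bool:
    if policy == "Always":
        return True
    if policy == "On Failure Only":
        return not step_passed
    return False  # "Never"

def screenshot_key(workspace_id: str, execution_id: str, step_index: int) -> str:
    return f"{workspace_id}/{execution_id}/{step_index}/screenshot.png"

assert should_capture("Always", step_passed=True)
assert not should_capture("On Failure Only", step_passed=True)
assert should_capture("On Failure Only", step_passed=False)
assert screenshot_key("ws1", "exec9", 3) == "ws1/exec9/3/screenshot.png"
```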
Stage 5: Network Monitoring
The network monitoring agent runs continuously throughout execution, intercepting all HTTP and HTTPS traffic produced by the browser. For every request it captures:
Request method, URL, headers, and body
Response status code, headers, and body
Timing data (DNS resolution, connection, TLS handshake, time-to-first-byte, download duration)
The full HAR (HTTP Archive) log is attached to the execution record and viewable in the results report. This is particularly useful for diagnosing failures caused by API errors, authentication token expiry, or third-party service outages that are not visible in the UI screenshots.
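A helper that scans a HAR-style entry list for failed API calls, the kind of check you would apply when the UI screenshot looks fine but a backend request errored, might look like this (the entry shape follows the HAR format's `log.entries` structure):

```python
# Scan HAR entries for responses at or above a given status code.

def failed_requests(har: dict, min_status: int = 400) -> list[tuple[str, int]]:
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry["response"]["status"]
        if status >= min_status:
            failures.append((entry["request"]["url"], status))
    return failures

har = {"log": {"entries": [
    {"request": {"url": "https://api.example.com/login"},
     "response": {"status": 401}},
    {"request": {"url": "https://api.example.com/profile"},
     "response": {"status": 200}},
]}}
assert failed_requests(har) == [("https://api.example.com/login", 401)]
```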
Stage 6: Console Monitoring
The console monitoring agent captures every message written to the browser console during execution: console.log, console.warn, console.error, and unhandled JavaScript exceptions. Console entries are timestamped and correlated with the step that was executing when they were emitted.
For applications that log structured diagnostic data to the console, this stage often reveals the root cause of a failure faster than the screenshots alone.
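The timestamp correlation can be sketched as follows; the record shapes are assumptions for illustration:

```python
# Attribute each console entry to the step whose execution window
# contains the entry's timestamp.

def correlate(entries, steps):
    """steps: list of dicts with 'index', 'start', 'end' (epoch seconds)."""
    for entry in entries:
        entry["step_index"] = next(
            (s["index"] for s in steps
             if s["start"] <= entry["ts"] < s["end"]), None)
    return entries

steps = [{"index": 1, "start": 0.0, "end": 2.0},
         {"index": 2, "start": 2.0, "end": 5.0}]
entries = [{"level": "error", "ts": 3.1, "text": "Uncaught TypeError"}]
assert correlate(entries, steps)[0]["step_index"] == 2
```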
Stage 7: AI Verification
When a step is an assertion ("verify the welcome message is displayed", "confirm the order total is correct"), the AI verification agent processes the step differently from an action step.
The verification agent:
Takes the current screenshot.
Reads the natural language verification condition.
Uses a vision-language model to evaluate whether the condition is satisfied in the screenshot.
Produces a boolean result (pass/fail) and a confidence score.
Adds a natural language explanation of what was observed to the step result.
This approach handles dynamic, unpredictable content that cannot be verified with hard-coded assertions. For example, "verify a unique order ID was generated and displayed" does not require knowing the exact order ID — the AI verifies that a plausible order ID is present in the expected location.
For deterministic assertions (exact string matching, specific numeric values), the agent performs the string comparison directly rather than relying solely on visual interpretation.
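The split between deterministic and AI-evaluated assertions can be sketched as a dispatcher; the condition schema and the stubbed vision-language model below are illustrative assumptions, not ContextQA's internal interface:

```python
# Hypothetical verification dispatcher: exact-text conditions are compared
# directly, open-ended conditions defer to a vision-language model (stubbed).

def verify(condition: dict, screenshot: bytes, page_text: str,
           vlm=lambda prompt, img: (True, 0.9)):
    if condition["type"] == "exact_text":
        passed = condition["expected"] in page_text
        return {"passed": passed, "confidence": 1.0,
                "explanation": f"Exact match for {condition['expected']!r}"}
    # Open-ended condition: the VLM returns a judgment and confidence score.
    passed, confidence = vlm(condition["prompt"], screenshot)
    return {"passed": passed, "confidence": confidence,
            "explanation": "VLM judgment on the captured screenshot"}

r = verify({"type": "exact_text", "expected": "Welcome, John"},
           b"", "Welcome, John | Dashboard")
assert r["passed"] and r["confidence"] == 1.0
```

Routing exact comparisons around the model is what keeps deterministic assertions fast and free of interpretation drift.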
Stage 8: Self-Healing
The self-healing agent activates when a step execution fails because an expected element was not found at its known location.
When a locator fails, the self-healing agent analyzes the current page state, uses AI to identify the best-matching candidate element, and applies the repair only when the match clears a confidence threshold.
Healing is persistent: the healed locator is applied to the test case going forward, so future runs do not need to heal the same step again.
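The conservative policy can be sketched as a threshold gate over candidate matches; the threshold value and record shape here are assumptions for illustration (candidate scores would come from the AI matcher):

```python
# Heal only when the best candidate clears a confidence threshold;
# otherwise fail the step so a human reviews the change.

HEAL_THRESHOLD = 0.85  # assumed value for illustration

def try_heal(candidates: list[dict], threshold: float = HEAL_THRESHOLD):
    """candidates: [{'selector': str, 'confidence': float}, ...]"""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["confidence"])
    return best if best["confidence"] >= threshold else None

assert try_heal([{"selector": "#submit-btn", "confidence": 0.93}]) is not None
assert try_heal([{"selector": "div:nth-child(4)", "confidence": 0.40}]) is None
```

Returning `None` on a low-confidence match is deliberate: a wrong healing would silently mask a real regression.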
Stage 9: Evidence Collection
After all steps have executed, the evidence collection agent compiles the final execution record:
Video recording — the full browser session stitched from the frame captures taken during execution, encoded as an MP4.
Playwright trace — a structured trace file compatible with Playwright's trace viewer, containing DOM snapshots, action timeline, and network log for deep debugging.
Root cause analysis — for any failed steps, the AI generates a structured root cause analysis that identifies the likely cause of the failure, distinguishes flakiness (transient network issues, race conditions) from genuine application bugs, and suggests corrective actions.
Execution summary — overall pass/fail, duration, step count, browser/device used, environment.
All evidence assets are stored in S3 and linked to the execution record. The execution report page in the portal assembles all of this data into a readable, navigable view.
Infrastructure Components
Frontend
The ContextQA web portal is an Angular TypeScript single-page application. It communicates with the backend via REST APIs and WebSocket connections (for live execution streaming). The portal handles workspace management, test case authoring, execution triggering, report viewing, and administration.
AI Engine
The AI engine is a multi-agent pipeline comprising 13 or more specialized agents. Agents are implemented as fine-tuned and prompted language models and vision-language models, orchestrated by a central pipeline coordinator. Agent specializations include:
Navigation Agent — URL navigation, page load stabilization
Element Discovery Agent — DOM and visual element inventory
Step Interpretation Agent — natural language to browser action mapping
Verification Agent — AI-powered assertion evaluation
Self-Healing Agent — broken locator detection and repair
Root Cause Analysis Agent — failure diagnosis and explanation
Test Generation Agent — creating test steps from natural language tasks
Code Generation Agent — exporting tests as Playwright/Selenium code
Figma Analysis Agent — extracting test scenarios from design files
Ticket Analysis Agent — extracting test scenarios from Jira/Linear tickets
Video Analysis Agent — extracting test scenarios from screen recordings
API Analysis Agent — extracting test scenarios from Swagger/OpenAPI specs
Performance Analysis Agent — load test result interpretation
MCP Server
The ContextQA MCP Server is a Python application built with the FastMCP framework. It exposes 67 platform capabilities as callable tools following the Model Context Protocol specification.
The server accepts connections from any MCP-compatible client (Claude Desktop, Cursor, custom integrations) and translates tool calls into authenticated requests to the ContextQA backend API.
Key tool categories exposed by the MCP server:
Test authoring — create_test_case, update_test_case_step, create_complex_test_step
Execution — execute_test_case, execute_test_suite, execute_test_plan
Results retrieval — get_execution_status, get_test_step_results, get_test_case_results
Evidence — get_console_logs, get_network_logs, get_trace_url
Self-healing — get_auto_healing_suggestions, approve_auto_healing
AI insights — get_ai_insights, get_root_cause, investigate_failure
Test generation — generate_tests_from_jira_ticket, generate_tests_from_figma, generate_tests_from_swagger
The MCP server source code is open source and available in the ContextQA GitHub organization.
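A client invokes these tools with standard MCP `tools/call` messages. A minimal sketch of the JSON-RPC 2.0 request follows the Model Context Protocol's shape; the argument names are illustrative assumptions:

```python
# Build an MCP tools/call request (JSON-RPC 2.0, per the Model Context
# Protocol): params carries the tool name and its arguments.
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = json.loads(tools_call(1, "execute_test_case", {"test_case_id": "tc-42"}))
assert msg["method"] == "tools/call"
assert msg["params"]["name"] == "execute_test_case"
```

In practice your MCP client (Claude Desktop, Cursor, or a custom integration) builds these messages for you; the sketch only shows what crosses the wire.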
Storage
Object Storage (S3) — screenshots, video recordings, Playwright traces, HAR files. Assets are stored per execution with a structured key format: {workspace_id}/{execution_id}/{step_index}/screenshot.png.
Relational Database — test case definitions, execution records, step results, workspace configuration, user accounts.
Secrets Storage — environment parameter values of type password are stored encrypted at rest using AES-256 and are never returned in plain text through any API endpoint.
Authentication
ContextQA uses session-based authentication for the web portal. Users authenticate with email and password (or SSO via SAML 2.0 for enterprise accounts). API access and MCP server connections use API keys scoped to a workspace, generated in Settings → API Keys.
Execution Flow: End to End
From the moment you click Run on a test case, the flow is: the portal sends the execution request to the backend, the AI engine provisions a browser instance, the nine stages run in order (navigation through evidence collection), evidence assets are written to S3 as they are produced, and the execution report becomes available in the portal once Stage 9 completes. While the run is in flight, the live execution view streams progress over a WebSocket.
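From a client's perspective, the end-to-end flow is a trigger-and-poll loop. The sketch below uses a stubbed stand-in client, not a published SDK; only the method names, which mirror the MCP tools execute_test_case and get_execution_status, come from the platform:

```python
# Hypothetical trigger-and-poll loop; FakeContextQAClient simulates the
# backend so the sketch is self-contained.
import time

class FakeContextQAClient:
    """Stub that returns 'running' twice, then 'passed'."""
    def __init__(self):
        self._polls = 0
    def execute_test_case(self, test_case_id):
        return {"execution_id": "exec-123", "status": "running"}
    def get_execution_status(self, execution_id):
        self._polls += 1
        done = self._polls >= 3
        return {"execution_id": execution_id,
                "status": "passed" if done else "running"}

def run_and_wait(client, test_case_id, poll_interval=0.0):
    execution = client.execute_test_case(test_case_id)
    while True:
        status = client.get_execution_status(execution["execution_id"])
        if status["status"] in ("passed", "failed"):
            return status
        time.sleep(poll_interval)

result = run_and_wait(FakeContextQAClient(), "tc-1")
```

With a real connection, the same loop would run against the MCP server or backend API, and the returned execution id is what links to the report and its evidence assets.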
Tips & Best Practices
Understand Stage 7 for better assertions — The AI verification agent is powerful but works best with clear, observable conditions. "Verify the page title says Welcome, John" is more reliable than "Verify the user logged in successfully" because the former specifies an observable UI element.
Use network logs for API failures — If your test fails on a page action but the screenshot shows the UI looks correct, check the network log. A 401 or 500 response from an API call is often the real cause.
HAR logs contain sensitive data — Network logs capture request and response bodies, which may include tokens, session cookies, or user data. Manage access to execution reports accordingly.
Self-healing is conservative by design — The confidence threshold is intentional. A lower threshold would cause incorrect healings that mask real failures. If a step consistently fails the self-healing check, it is a signal that the application change is significant enough to warrant a manual test review.
Troubleshooting
Screenshots are not loading in the report
S3 asset retrieval uses pre-signed URLs that expire after 1 hour. If you are viewing a report after the URL expiry, refresh the page to generate new pre-signed URLs.
The live execution view is not updating in real time
The live view uses a WebSocket connection. If you are behind a proxy or corporate firewall that terminates WebSockets, the view will fall back to polling, which updates every 5 seconds. Check with your network administrator if the fallback is noticeably delayed.
Root cause analysis says "Analysis unavailable"
Root cause analysis requires at least one failed step with a screenshot. If the execution failed before any screenshot was captured (e.g., a Stage 1 navigation failure due to a network timeout), the AI has insufficient evidence to generate analysis. The raw error message from the browser is still shown in the step result.
The MCP server is not connecting
Verify that the API key in your MCP client configuration belongs to the correct workspace and has not been revoked. API keys can be managed in Settings → API Keys.