Platform Architecture

A detailed explanation of ContextQA's 9-stage AI execution pipeline, its 13+ specialized agents, infrastructure components, and the MCP server that connects the platform to external AI assistants.

Who is this for? Engineers and technical leads who want to understand ContextQA's 9-stage AI pipeline, infrastructure components, and how evidence is captured during execution.

ContextQA executes tests through a 9-stage agentic pipeline in which 13 or more specialized AI agents collaborate to navigate, interact, verify, and repair tests against real browsers and devices. This page explains how each stage works, how the agents communicate, and how the supporting infrastructure stores and serves the evidence those agents produce.

Understanding the architecture helps you write better tests, interpret execution results accurately, and troubleshoot failures more effectively.

Prerequisites

  • You have read the Introduction and Core Concepts.

  • You understand the difference between a test case, a test suite, and a test plan.


The 9-Stage AI Execution Pipeline

Every test execution — whether triggered manually, via CI/CD, or on a schedule — passes through all nine stages in order. Stages run sequentially for each step, but the platform can execute multiple test cases in parallel across browser instances.
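This concurrency model — sequential stages within a case, parallel cases across browser instances — can be sketched with a thread pool. This is a toy illustration, not ContextQA's actual scheduler; `run_pipeline` stands in for the real 9-stage pipeline.

```python
# Toy sketch: each test case runs its stages in order, while
# independent test cases run concurrently on separate instances.
from concurrent.futures import ThreadPoolExecutor

STAGES = ["navigate", "discover", "execute", "screenshot", "network",
          "console", "verify", "heal", "evidence"]

def run_pipeline(test_case: str) -> str:
    for stage in STAGES:   # stages run sequentially within one case
        pass               # real pipeline work happens here
    return f"{test_case}: passed"

# Test cases run in parallel across worker instances.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, ["login", "checkout", "search"]))
```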

Stage 1: Navigation

The pipeline begins by launching a browser instance and navigating to the starting URL defined in the test case. The navigation agent establishes the initial page context, waits for the page to reach a stable "load complete" state (using both the browser's load event and an AI-assessed visual stability check), and hands off control to the next stage.

If the navigation fails — for example, because the URL is unreachable or redirects to an error page — the execution terminates immediately with a root cause entry explaining the navigation failure.

Stage 2: Element Discovery

Before executing any step, the element discovery agent performs a full scan of the current page. It combines two complementary signals:

  • DOM analysis — the agent inspects the HTML structure, ARIA roles, data-testid attributes, input types, form labels, and button text to build an element inventory.

  • Visual analysis — a screenshot is analyzed using computer vision to identify interactive regions, buttons, form fields, navigation menus, and modal dialogs that may not have clean DOM representations.

The element inventory is used by the step execution agent to match natural language step descriptions to real page elements. This dual-signal approach handles both well-structured SPAs and legacy applications with inconsistent DOM hygiene.

Stage 3: Step Execution

The step execution agent interprets each natural language step and performs the corresponding browser action. The agent:

  1. Reads the step description (e.g., "Type user@example.com in the Email field").

  2. Consults the element inventory from Stage 2 to locate the target element.

  3. Executes the action via the browser automation layer (click, type, select, scroll, hover, drag-and-drop).

  4. Waits for the browser to stabilize after the action (dynamic content settling, navigation completing, network requests resolving).

  5. Updates the element inventory for the next step.

The step execution agent handles implicit waits automatically. You do not need to add sleep or wait steps for standard navigation and form interactions.
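As a toy illustration of step 1 above — mapping a natural language step onto a structured action — here is a pattern-based interpreter. The real agent is model-based, not regex-based; the patterns and the action dictionary shape are assumptions for the sketch.

```python
# Toy natural-language step interpreter: maps a step description to a
# structured browser action (kind, target element, optional value).
import re

def interpret(step: str) -> dict:
    m = re.match(r'Type "?(?P<value>[^"]+)"? in the (?P<target>.+?) field', step, re.I)
    if m:
        return {"kind": "type", "target": m["target"], "value": m["value"]}
    m = re.match(r'Click (?:the )?(?P<target>.+?)(?: button)?$', step, re.I)
    if m:
        return {"kind": "click", "target": m["target"], "value": None}
    raise ValueError(f"Unrecognized step: {step}")
```

The structured action would then be handed to the browser automation layer, and the element inventory consulted to resolve `target` to a concrete locator.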

Stage 4: Screenshot Capture

After every step execution (regardless of pass or fail), the screenshot capture agent saves a full-page screenshot to object storage. Screenshots are:

  • Stored in S3 with a unique key per step per execution.

  • Linked to the step result record in the database.

  • Accessible in the execution report immediately after the step completes.

  • Available for 90 days by default (configurable per workspace).

Screenshot capture can be configured per step: Always (default), On Failure Only, or Never.

Stage 5: Network Monitoring

The network monitoring agent runs continuously throughout execution, intercepting all HTTP and HTTPS traffic produced by the browser. For every request it captures:

  • Request method, URL, headers, and body

  • Response status code, headers, and body

  • Timing data (DNS resolution, connection, TLS handshake, time-to-first-byte, download duration)

The full HAR (HTTP Archive) log is attached to the execution record and viewable in the results report. This is particularly useful for diagnosing failures caused by API errors, authentication token expiry, or third-party service outages that are not visible in the UI screenshots.
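Because the attached log follows the standard HAR format, the triage described above can be automated. A minimal sketch that scans a HAR 1.2 document for error responses:

```python
# Scan a HAR log for failed requests (status >= threshold).
# Works on any standard HAR 1.2 JSON structure.
import json

def failed_requests(har_text: str, threshold: int = 400) -> list[tuple[int, str]]:
    """Return (status, url) for every entry with an error status code."""
    har = json.loads(har_text)
    return [
        (e["response"]["status"], e["request"]["url"])
        for e in har["log"]["entries"]
        if e["response"]["status"] >= threshold
    ]
```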

Stage 6: Console Monitoring

The console monitoring agent captures every message written to the browser console during execution: console.log, console.warn, console.error, and unhandled JavaScript exceptions. Console entries are timestamped and correlated with the step that was executing when they were emitted.

For applications that log structured diagnostic data to the console, this stage often reveals the root cause of a failure faster than the screenshots alone.
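The timestamp correlation described above amounts to finding the step whose start time most recently precedes each console entry. A minimal sketch, assuming step start times are known:

```python
# Correlate a timestamped console entry with the step that was
# executing when it was emitted, via binary search on step start times.
import bisect

def correlate(entry_ts: float, step_starts: list[float]) -> int:
    """Return the index of the step active at entry_ts."""
    return max(0, bisect.bisect_right(step_starts, entry_ts) - 1)
```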

Stage 7: AI Verification

When a step is an assertion ("verify the welcome message is displayed", "confirm the order total is correct"), the AI verification agent processes the step differently from an action step.

The verification agent:

  1. Takes the current screenshot.

  2. Reads the natural language verification condition.

  3. Uses a vision-language model to evaluate whether the condition is satisfied in the screenshot.

  4. Produces a boolean result (pass/fail) and a confidence score.

  5. Adds a natural language explanation of what was observed to the step result.

This approach handles dynamic, unpredictable content that cannot be verified with hard-coded assertions. For example, "verify a unique order ID was generated and displayed" does not require knowing the exact order ID — the AI verifies that a plausible order ID is present in the expected location.

For deterministic assertions (exact string matching, specific numeric values), the agent performs the string comparison directly rather than relying solely on visual interpretation.

Stage 8: Self-Healing

The self-healing agent activates when a step execution fails because an expected element was not found at its known location.

When a locator fails, the agent re-scans the current page and uses AI to identify the candidate element that best matches the original target's role, text, and position. If the match confidence clears the configured threshold, the step is retried with the repaired locator; otherwise the step fails rather than guessing.

Healed locators are persisted: the repaired locator is saved to the test case going forward, so future runs do not need to heal the same step again.

Stage 9: Evidence Collection

After all steps have executed, the evidence collection agent compiles the final execution record:

  • Video recording — the full browser session stitched from the frame captures taken during execution, encoded as an MP4.

  • Playwright trace — a structured trace file compatible with Playwright's trace viewer, containing DOM snapshots, action timeline, and network log for deep debugging.

  • Root cause analysis — for any failed steps, the AI generates a structured root cause analysis that identifies the likely cause of the failure, distinguishes flakiness (transient network issues, race conditions) from genuine application bugs, and suggests corrective actions.

  • Execution summary — overall pass/fail, duration, step count, browser/device used, environment.

All evidence assets are stored in S3 and linked to the execution record. The execution report page in the portal assembles all of this data into a readable, navigable view.


Infrastructure Components

Frontend

The ContextQA web portal is an Angular TypeScript single-page application. It communicates with the backend via REST APIs and WebSocket connections (for live execution streaming). The portal handles workspace management, test case authoring, execution triggering, report viewing, and administration.

AI Engine

The AI engine is a multi-agent pipeline comprising 13 or more specialized agents. Agents are implemented as fine-tuned and prompted language models and vision-language models, orchestrated by a central pipeline coordinator. Agent specializations include:

  • Navigation Agent — URL navigation, page load stabilization

  • Element Discovery Agent — DOM and visual element inventory

  • Step Interpretation Agent — natural language to browser action mapping

  • Verification Agent — AI-powered assertion evaluation

  • Self-Healing Agent — broken locator detection and repair

  • Root Cause Analysis Agent — failure diagnosis and explanation

  • Test Generation Agent — creating test steps from natural language tasks

  • Code Generation Agent — exporting tests as Playwright/Selenium code

  • Figma Analysis Agent — extracting test scenarios from design files

  • Ticket Analysis Agent — extracting test scenarios from Jira/Linear tickets

  • Video Analysis Agent — extracting test scenarios from screen recordings

  • API Analysis Agent — extracting test scenarios from Swagger/OpenAPI specs

  • Performance Analysis Agent — load test result interpretation

MCP Server

The ContextQA MCP Server is a Python application built with the FastMCP framework. It exposes 67 platform capabilities as callable tools following the Model Context Protocol specification.

The server accepts connections from any MCP-compatible client (Claude Desktop, Cursor, custom integrations) and translates tool calls into authenticated requests to the ContextQA backend API.

Key tool categories exposed by the MCP server:

  • Test authoring — create_test_case, update_test_case_step, create_complex_test_step

  • Execution — execute_test_case, execute_test_suite, execute_test_plan

  • Results retrieval — get_execution_status, get_test_step_results, get_test_case_results

  • Evidence — get_console_logs, get_network_logs, get_trace_url

  • Self-healing — get_auto_healing_suggestions, approve_auto_healing

  • AI insights — get_ai_insights, get_root_cause, investigate_failure

  • Test generation — generate_tests_from_jira_ticket, generate_tests_from_figma, generate_tests_from_swagger

The MCP server source code is open source and available in the ContextQA GitHub organization.
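Conceptually, the translation layer maps a tool name and its arguments onto an authenticated REST call. The sketch below illustrates that idea only: the routes, payload shape, and bearer-token scheme are assumptions, and only the tool names come from the categories above.

```python
# Illustrative sketch of translating an MCP tool call into a backend
# API request. Routes and auth scheme are hypothetical.
def to_backend_request(tool: str, args: dict, api_key: str) -> dict:
    """Map an MCP tool call onto a REST request description."""
    routes = {
        "execute_test_case": ("POST", "/api/v1/test-cases/{test_case_id}/execute"),
        "get_execution_status": ("GET", "/api/v1/executions/{execution_id}"),
    }
    method, template = routes[tool]
    return {
        "method": method,
        "path": template.format(**args),
        "headers": {"Authorization": f"Bearer {api_key}"},  # assumed scheme
        "json": args if method == "POST" else None,
    }
```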

Storage

  • Object Storage (S3) — screenshots, video recordings, Playwright traces, HAR files. Assets are stored per execution with a structured key format: {workspace_id}/{execution_id}/{step_index}/screenshot.png.

  • Relational Database — test case definitions, execution records, step results, workspace configuration, user accounts.

  • Secrets Storage — environment parameter values of type password are stored encrypted at rest using AES-256 and are never returned in plain text through any API endpoint.
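The documented key format can be expressed as a small helper; only the format string itself comes from the description above.

```python
# Build the per-step screenshot key:
# {workspace_id}/{execution_id}/{step_index}/screenshot.png
def screenshot_key(workspace_id: str, execution_id: str, step_index: int) -> str:
    return f"{workspace_id}/{execution_id}/{step_index}/screenshot.png"
```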

Authentication

ContextQA uses session-based authentication for the web portal. Users authenticate with email and password (or SSO via SAML 2.0 for enterprise accounts). API access and MCP server connections use API keys scoped to a workspace, generated in Settings → API Keys.


Execution Flow: End to End

The following trace summarizes what happens from the moment you click Run on a test case to the moment the report is available:

  1. The portal sends an execution request to the backend, and a browser instance is launched.

  2. Stage 1 navigates to the starting URL and waits for a stable page state.

  3. For each step, Stages 2–8 run in order: the page is scanned for elements, the step is interpreted and executed, a screenshot is captured, network and console activity are recorded, assertion steps go through AI verification, and failed locators trigger self-healing.

  4. After the final step, Stage 9 compiles the video recording, Playwright trace, root cause analysis for any failures, and the execution summary.

  5. Evidence assets are written to S3, results are recorded in the database, and the execution report page assembles everything into the final view.


Tips & Best Practices

  • Understand Stage 7 for better assertions — The AI verification agent is powerful but works best with clear, observable conditions. "Verify the page title says Welcome, John" is more reliable than "Verify the user logged in successfully" because the former specifies an observable UI element.

  • Use network logs for API failures — If your test fails on a page action but the screenshot shows the UI looks correct, check the network log. A 401 or 500 response from an API call is often the real cause.

  • HAR logs contain sensitive data — Network logs capture request and response bodies, which may include tokens, session cookies, or user data. Manage access to execution reports accordingly.

  • Self-healing is conservative by design — The confidence threshold is intentional. A lower threshold would cause incorrect healings that mask real failures. If a step consistently fails the self-healing check, it is a signal that the application change is significant enough to warrant a manual test review.

Troubleshooting

Screenshots are not loading in the report
S3 asset retrieval uses pre-signed URLs that expire after 1 hour. If you are viewing a report after the URL expiry, refresh the page to generate new pre-signed URLs.

The live execution view is not updating in real time
The live view uses a WebSocket connection. If you are behind a proxy or corporate firewall that terminates WebSockets, the view falls back to polling, which updates every 5 seconds. Check with your network administrator if the fallback is noticeably delayed.

Root cause analysis says "Analysis unavailable"
Root cause analysis requires at least one failed step with a screenshot. If the execution failed before any screenshot was captured (e.g., a Stage 1 navigation failure due to a network timeout), the AI has insufficient evidence to generate an analysis. The raw error message from the browser is still shown in the step result.

The MCP server is not connecting
Verify that the API key in your MCP client configuration belongs to the correct workspace and has not been revoked. API keys can be managed in Settings → API Keys.
