Platform Architecture

A detailed explanation of ContextQA's 9-stage AI execution pipeline, its 13+ specialized agents, infrastructure components, and the MCP server that connects the platform to external AI assistants.

Who is this for? Engineers and technical leads who want to understand ContextQA's 9-stage AI pipeline, infrastructure components, and how evidence is captured during execution.

ContextQA executes tests through a 9-stage agentic pipeline in which 13 or more specialized AI agents collaborate to navigate, interact, verify, and repair tests against real browsers and devices. This page explains how each stage works, how the agents communicate, and how the supporting infrastructure stores and serves the evidence those agents produce.

Understanding the architecture helps you write better tests, interpret execution results accurately, and troubleshoot failures more effectively.

Prerequisites

  • You have read the Introduction and Core Concepts.

  • You understand the difference between a test case, a test suite, and a test plan.


The 9-Stage AI Execution Pipeline

Every test execution — whether triggered manually, via CI/CD, or on a schedule — passes through all nine stages in order. Stages run sequentially for each step, but the platform can execute multiple test cases in parallel across browser instances.
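This concurrency model — sequential stages within a case, parallel cases across browser instances — can be sketched with a thread pool. This is a toy illustration, not ContextQA's actual scheduler; `run_pipeline` stands in for the real 9-stage pipeline.

```python
# Toy sketch: each test case runs its stages in order, while
# independent test cases run concurrently on separate instances.
from concurrent.futures import ThreadPoolExecutor

STAGES = ["navigate", "discover", "execute", "screenshot", "network",
          "console", "verify", "heal", "evidence"]

def run_pipeline(test_case: str) -> str:
    for stage in STAGES:   # stages run sequentially within one case
        pass               # real pipeline work happens here
    return f"{test_case}: passed"

# Test cases run in parallel across worker instances.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, ["login", "checkout", "search"]))
```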

Stage 1: Navigation

The pipeline begins by launching a browser instance and navigating to the starting URL defined in the test case. The navigation agent establishes the initial page context, waits for the page to reach a stable "load complete" state (using both the browser's load event and an AI-assessed visual stability check), and hands off control to the next stage.

If the navigation fails — for example, because the URL is unreachable or redirects to an error page — the execution terminates immediately with a root cause entry explaining the navigation failure.

Stage 2: Element Discovery

Before executing any step, the element discovery agent performs a full scan of the current page. It combines two complementary signals:

  • DOM analysis — the agent inspects the HTML structure, ARIA roles, data-testid attributes, input types, form labels, and button text to build an element inventory.

  • Visual analysis — a screenshot is analyzed using computer vision to identify interactive regions, buttons, form fields, navigation menus, and modal dialogs that may not have clean DOM representations.

The element inventory is used by the step execution agent to match natural language step descriptions to real page elements. This dual-signal approach handles both well-structured SPAs and legacy applications with inconsistent DOM hygiene.

Stage 3: Step Execution

The step execution agent interprets each natural language step and performs the corresponding browser action. The agent:

  1. Reads the step description (e.g., "Type user@example.com in the Email field").

  2. Consults the element inventory from Stage 2 to locate the target element.

  3. Executes the action via the browser automation layer (click, type, select, scroll, hover, drag-and-drop).

  4. Waits for the browser to stabilize after the action (dynamic content settling, navigation completing, network requests resolving).

  5. Updates the element inventory for the next step.

The step execution agent handles implicit waits automatically. You do not need to add sleep or wait steps for standard navigation and form interactions.
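As a toy illustration of step 1 above — mapping a natural language step onto a structured action — here is a pattern-based interpreter. The real agent is model-based, not regex-based; the patterns and the action dictionary shape are assumptions for the sketch.

```python
# Toy natural-language step interpreter: maps a step description to a
# structured browser action (kind, target element, optional value).
import re

def interpret(step: str) -> dict:
    m = re.match(r'Type "?(?P<value>[^"]+)"? in the (?P<target>.+?) field', step, re.I)
    if m:
        return {"kind": "type", "target": m["target"], "value": m["value"]}
    m = re.match(r'Click (?:the )?(?P<target>.+?)(?: button)?$', step, re.I)
    if m:
        return {"kind": "click", "target": m["target"], "value": None}
    raise ValueError(f"Unrecognized step: {step}")
```

The structured action would then be handed to the browser automation layer, and the element inventory consulted to resolve `target` to a concrete locator.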

Stage 4: Screenshot Capture

After every step execution (regardless of pass or fail), the screenshot capture agent saves a full-page screenshot to object storage. Screenshots are:

  • Stored in S3 with a unique key per step per execution.

  • Linked to the step result record in the database.

  • Accessible in the execution report immediately after the step completes.

  • Available for 90 days by default (configurable per workspace).

Screenshot capture can be configured per step: Always (default), On Failure Only, or Never.

Stage 5: Network Monitoring

The network monitoring agent runs continuously throughout execution, intercepting all HTTP and HTTPS traffic produced by the browser. For every request it captures:

  • Request method, URL, headers, and body

  • Response status code, headers, and body

  • Timing data (DNS resolution, connection, TLS handshake, time-to-first-byte, download duration)

The full HAR (HTTP Archive) log is attached to the execution record and viewable in the results report. This is particularly useful for diagnosing failures caused by API errors, authentication token expiry, or third-party service outages that are not visible in the UI screenshots.
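Because the attached log follows the standard HAR format, the triage described above can be automated. A minimal sketch that scans a HAR 1.2 document for error responses:

```python
# Scan a HAR log for failed requests (status >= threshold).
# Works on any standard HAR 1.2 JSON structure.
import json

def failed_requests(har_text: str, threshold: int = 400) -> list[tuple[int, str]]:
    """Return (status, url) for every entry with an error status code."""
    har = json.loads(har_text)
    return [
        (e["response"]["status"], e["request"]["url"])
        for e in har["log"]["entries"]
        if e["response"]["status"] >= threshold
    ]
```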

Stage 6: Console Monitoring

The console monitoring agent captures every message written to the browser console during execution: console.log, console.warn, console.error, and unhandled JavaScript exceptions. Console entries are timestamped and correlated with the step that was executing when they were emitted.

For applications that log structured diagnostic data to the console, this stage often reveals the root cause of a failure faster than the screenshots alone.
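The timestamp correlation described above amounts to finding the step whose start time most recently precedes each console entry. A minimal sketch, assuming step start times are known:

```python
# Correlate a timestamped console entry with the step that was
# executing when it was emitted, via binary search on step start times.
import bisect

def correlate(entry_ts: float, step_starts: list[float]) -> int:
    """Return the index of the step active at entry_ts."""
    return max(0, bisect.bisect_right(step_starts, entry_ts) - 1)
```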

Stage 7: AI Verification

When a step is an assertion ("verify the welcome message is displayed", "confirm the order total is correct"), the AI verification agent processes the step differently from an action step.

The verification agent:

  1. Takes the current screenshot.

  2. Reads the natural language verification condition.

  3. Uses a vision-language model to evaluate whether the condition is satisfied in the screenshot.

  4. Produces a boolean result (pass/fail) and a confidence score.

  5. Adds a natural language explanation of what was observed to the step result.

This approach handles dynamic, unpredictable content that cannot be verified with hard-coded assertions. For example, "verify a unique order ID was generated and displayed" does not require knowing the exact order ID — the AI verifies that a plausible order ID is present in the expected location.

For deterministic assertions (exact string matching, specific numeric values), the agent performs the string comparison directly rather than relying solely on visual interpretation.

Stage 8: Self-Healing

The self-healing agent activates when a step execution fails because an expected element was not found at its known location.

When a locator fails, the agent re-scans the current page and uses AI to identify the candidate element that best matches the original target's role, text, and position. If the match confidence clears the configured threshold, the step is retried with the repaired locator; otherwise the step fails rather than guessing.

Healed locators are persisted: the repaired locator is saved to the test case going forward, so future runs do not need to heal the same step again.

Stage 9: Evidence Collection

After all steps have executed, the evidence collection agent compiles the final execution record:

  • Video recording — the full browser session stitched from the frame captures taken during execution, encoded as an MP4.

  • Playwright trace — a structured trace file compatible with Playwright's trace viewer, containing DOM snapshots, action timeline, and network log for deep debugging.

  • Root cause analysis — for any failed steps, the AI generates a structured root cause analysis that identifies the likely cause of the failure, distinguishes flakiness (transient network issues, race conditions) from genuine application bugs, and suggests corrective actions.

  • Execution summary — overall pass/fail, duration, step count, browser/device used, environment.

All evidence assets are stored in S3 and linked to the execution record. The execution report page in the portal assembles all of this data into a readable, navigable view.


Infrastructure Components

Frontend

The ContextQA web portal is an Angular TypeScript single-page application. It communicates with the backend via REST APIs and WebSocket connections (for live execution streaming). The portal handles workspace management, test case authoring, execution triggering, report viewing, and administration.

AI Engine

The AI engine is a multi-agent pipeline comprising 13 or more specialized agents. Agents are implemented as fine-tuned and prompted language models and vision-language models, orchestrated by a central pipeline coordinator. Agent specializations include:

  • Navigation Agent — URL navigation, page load stabilization

  • Element Discovery Agent — DOM and visual element inventory

  • Step Interpretation Agent — natural language to browser action mapping

  • Verification Agent — AI-powered assertion evaluation

  • Self-Healing Agent — broken locator detection and repair

  • Root Cause Analysis Agent — failure diagnosis and explanation

  • Test Generation Agent — creating test steps from natural language tasks

  • Code Generation Agent — exporting tests as Playwright/Selenium code

  • Figma Analysis Agent — extracting test scenarios from design files

  • Ticket Analysis Agent — extracting test scenarios from Jira/Linear tickets

  • Video Analysis Agent — extracting test scenarios from screen recordings

  • API Analysis Agent — extracting test scenarios from Swagger/OpenAPI specs

  • Performance Analysis Agent — load test result interpretation

MCP Server

The ContextQA MCP Server is a Python application built with the FastMCP framework. It exposes 67 platform capabilities as callable tools following the Model Context Protocol specification.

The server accepts connections from any MCP-compatible client (Claude Desktop, Cursor, custom integrations) and translates tool calls into authenticated requests to the ContextQA backend API.

Key tool categories exposed by the MCP server:

  • Test authoring — create_test_case, update_test_case_step, create_complex_test_step

  • Execution — execute_test_case, execute_test_suite, execute_test_plan

  • Results retrieval — get_execution_status, get_test_step_results, get_test_case_results

  • Evidence — get_console_logs, get_network_logs, get_trace_url

  • Self-healing — get_auto_healing_suggestions, approve_auto_healing

  • AI insights — get_ai_insights, get_root_cause, investigate_failure

  • Test generation — generate_tests_from_jira_ticket, generate_tests_from_figma, generate_tests_from_swagger

The MCP server source code is open source and available in the ContextQA GitHub organization.
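Conceptually, the translation layer maps a tool name and its arguments onto an authenticated REST call. The sketch below illustrates that idea only: the routes, payload shape, and bearer-token scheme are assumptions, and only the tool names come from the categories above.

```python
# Illustrative sketch of translating an MCP tool call into a backend
# API request. Routes and auth scheme are hypothetical.
def to_backend_request(tool: str, args: dict, api_key: str) -> dict:
    """Map an MCP tool call onto a REST request description."""
    routes = {
        "execute_test_case": ("POST", "/api/v1/test-cases/{test_case_id}/execute"),
        "get_execution_status": ("GET", "/api/v1/executions/{execution_id}"),
    }
    method, template = routes[tool]
    return {
        "method": method,
        "path": template.format(**args),
        "headers": {"Authorization": f"Bearer {api_key}"},  # assumed scheme
        "json": args if method == "POST" else None,
    }
```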

Storage

  • Object Storage (S3) — screenshots, video recordings, Playwright traces, HAR files. Assets are stored per execution with a structured key format: {workspace_id}/{execution_id}/{step_index}/screenshot.png.

  • Relational Database — test case definitions, execution records, step results, workspace configuration, user accounts.

  • Secrets Storage — environment parameter values of type password are stored encrypted at rest using AES-256 and are never returned in plain text through any API endpoint.
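The documented key format can be expressed as a small helper; only the format string itself comes from the description above.

```python
# Build the per-step screenshot key:
# {workspace_id}/{execution_id}/{step_index}/screenshot.png
def screenshot_key(workspace_id: str, execution_id: str, step_index: int) -> str:
    return f"{workspace_id}/{execution_id}/{step_index}/screenshot.png"
```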

Authentication

ContextQA uses session-based authentication for the web portal. Users authenticate with email and password (or SSO via SAML 2.0 for enterprise accounts). API access and MCP server connections use API keys scoped to a workspace, generated in Settings → API Keys.


Execution Flow: End to End

The following trace summarizes what happens from the moment you click Run on a test case to the moment the report is available:

  1. The portal sends an execution request to the backend, and a browser instance is launched.

  2. Stage 1 navigates to the starting URL and waits for a stable page state.

  3. For each step, Stages 2–8 run in order: the page is scanned for elements, the step is interpreted and executed, a screenshot is captured, network and console activity are recorded, assertion steps go through AI verification, and failed locators trigger self-healing.

  4. After the final step, Stage 9 compiles the video recording, Playwright trace, root cause analysis for any failures, and the execution summary.

  5. Evidence assets are written to S3, results are recorded in the database, and the execution report page assembles everything into the final view.


Tips & Best Practices

  • Understand Stage 7 for better assertions — The AI verification agent is powerful but works best with clear, observable conditions. "Verify the page title says Welcome, John" is more reliable than "Verify the user logged in successfully" because the former specifies an observable UI element.

  • Use network logs for API failures — If your test fails on a page action but the screenshot shows the UI looks correct, check the network log. A 401 or 500 response from an API call is often the real cause.

  • HAR logs contain sensitive data — Network logs capture request and response bodies, which may include tokens, session cookies, or user data. Manage access to execution reports accordingly.

  • Self-healing is conservative by design — The confidence threshold is intentional. A lower threshold would cause incorrect healings that mask real failures. If a step consistently fails the self-healing check, it is a signal that the application change is significant enough to warrant a manual test review.

Troubleshooting

Screenshots are not loading in the report
S3 asset retrieval uses pre-signed URLs that expire after 1 hour. If you are viewing a report after the URL expiry, refresh the page to generate new pre-signed URLs.

The live execution view is not updating in real time
The live view uses a WebSocket connection. If you are behind a proxy or corporate firewall that terminates WebSockets, the view falls back to polling, which updates every 5 seconds. Check with your network administrator if the fallback is noticeably delayed.

Root cause analysis says "Analysis unavailable"
Root cause analysis requires at least one failed step with a screenshot. If the execution failed before any screenshot was captured (e.g., a Stage 1 navigation failure due to a network timeout), the AI has insufficient evidence to generate an analysis. The raw error message from the browser is still shown in the step result.

The MCP server is not connecting
Verify that the API key in your MCP client configuration belongs to the correct workspace and has not been revoked. API keys can be managed in Settings → API Keys.
