
Architecture

Prometheal is a server-side agent runtime built with Next.js. This document explains how the major systems work together.


High-level overview

                ┌──────────┐    ┌───────────────┐    ┌──────────────────┐
                │  Browser │    │ LLM Providers │    │ MCP Integrations │
                │          │    │  OpenRouter   │    │  GitHub, Slack,  │
                │  chat +  │    │  Anthropic    │    │  Google, Postgres│
                │  desktop │    │  OpenAI       │    │  Notion, +custom │
                └────┬─────┘    └───────┬───────┘    └────────┬─────────┘
                     │ HTTPS            │ API                  │ MCP
                     ▼                  ▼                      ▼
  ┌──────────────────────────────────────────────────────────────────────┐
  │                          Prometheal Server                           │
  │                                                                      │
  │   Agent Runtime ──── LLM Proxy ──── MCP Manager ──── Data Flow Audit │
  │   (LLM loop,        (routes to      (server-side     (every byte     │
  │    tool routing)     providers)      credentials)     logged)        │
  │                                                                      │
  └──────────────────────────────┬───────────────────────────────────────┘
                                 │ only allowed connection
                                 ▼
              ┌───────────────────────────────────────────┐
              │               Agent Sandbox               │
              │                                           │
              │   Shell execution    File system          │
              │   Desktop (noVNC)    /documents           │
              │                                           │
              │   iptables: DENY ALL, ALLOW Prometheal    │
              │   Docker + gVisor  |  E2B Cloud           │
              └───────────────────────────────────────────┘

Agent runtime

File: src/lib/agent-runtime.ts

The agent runtime is the core loop that orchestrates LLM calls, tool execution, and response streaming. When a user sends a message:

  1. Load context — Fetch conversation history, agent config, system prompt, agent memories, and available tools (sandbox + browser + memory + MCP)
  2. Call LLM — Send the conversation to the configured provider (OpenRouter, Anthropic, or OpenAI)
  3. Parse response — If the LLM returns tool calls, execute them. If it returns text, stream it to the user.
  4. Execute tools — Tools run in parallel via Promise.all. Sandbox tools execute inside the container. MCP tools execute server-side via the MCP manager.
  5. Loop — Feed tool results back to the LLM and repeat (max 30 rounds)
  6. Track usage — Record token counts and estimated cost
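The loop above can be sketched in TypeScript. This is a minimal illustration, not the actual src/lib/agent-runtime.ts API: the message and response shapes, and the function names, are assumptions.

```typescript
// Hypothetical shapes; the real runtime also streams output and tracks usage.
type Msg = { role: string; content: string };
type ToolCall = { name: string; args: unknown };
type LLMResponse = { text?: string; toolCalls?: ToolCall[] };

const MAX_ROUNDS = 30; // the loop cap described above

async function runAgentTurn(
  messages: Msg[],
  callLLM: (msgs: Msg[]) => Promise<LLMResponse>,
  execTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const res = await callLLM(messages);
    // A plain text response (no tool calls) ends the turn.
    if (!res.toolCalls?.length) return res.text ?? "";
    // Tool calls run in parallel; results are fed back for the next round.
    const results = await Promise.all(res.toolCalls.map(execTool));
    results.forEach((r, i) =>
      messages.push({ role: "tool", content: `${res.toolCalls![i].name}: ${r}` }),
    );
  }
  return "(max rounds reached)";
}
```

The LLM call and tool executor are injected here only to keep the sketch self-contained; in the real runtime they are wired to the LLM proxy and the tool routers described below.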

Built-in sandbox tools

These execute inside the agent's sandbox via execCommand():

Tool                   What it does
---------------------  -------------------------------------------------------------------
sandbox__shell         Run a bash command (120s timeout)
sandbox__read_file     Read a file's contents
sandbox__write_file    Write/create a file (base64-encoded for safety)
sandbox__edit_file     Targeted string replacement via Python script
sandbox__list_files    List directory contents (ls -la)
sandbox__search_files  Search files with grep patterns
sandbox__screenshot    Capture desktop screenshot (ImageMagick)
sandbox__computer      Mouse/keyboard control via xdotool (click, type, scroll, key press)

Browser tools

Agents can control a full Chrome browser inside the sandbox via CDP (Chrome DevTools Protocol). The browser helper script (sandbox/browser-helper.py) communicates with Chrome over raw WebSocket.

Tool                 What it does
-------------------  ---------------------------------------------------------------------------
browser__launch      Start Chrome and optionally navigate to a URL
browser__navigate    Navigate to a URL
browser__screenshot  Capture a JPEG screenshot (returned as base64 image for vision models)
browser__snapshot    Walk the DOM and return a structured element tree with data-ref attributes
browser__click       Click an element by its ref ID (e.g., e5)
browser__type        Type text into an element by ref ID
browser__press       Press a keyboard key (Enter, Escape, Tab, etc.)
browser__scroll      Scroll the page (up/down/left/right)
browser__evaluate    Execute arbitrary JavaScript in the page
browser__tabs        List open browser tabs

Ref-based targeting: The snapshot tool injects JavaScript that walks the DOM, assigns data-ref attributes to interactive and visible elements, and returns a structured list. Agents use these refs (like e5, e13) to target elements for clicking and typing — no coordinate guessing needed.

Screenshot + Vision: When the agent takes a screenshot, the base64 image is included in the tool result as a multimodal content block. Vision-capable models (Claude, GPT-4o, Qwen) can interpret the screenshot to understand visual layout and content.

Agent memory tools

Agents have persistent, cross-conversation memory stored in the database (AgentMemory model). Memory tools are always available regardless of sandbox state.

Tool            What it does
--------------  ----------------------------------------------------------------------------
memory__save    Save a memory with a key, content, and optional category
memory__search  Search memories by query (word-based matching across key, content, category)
memory__list    List all memories, optionally filtered by category
memory__delete  Delete a memory by key

Memories are scoped per-agent. At the start of each conversation, the agent's most recent 50 memories are injected into the system prompt so the agent has context from prior interactions. Memories use upsert — saving with an existing key updates the value.
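The upsert and word-based search semantics can be sketched with a hypothetical in-memory stand-in for the AgentMemory model (the real store is a database table; the class and method names here are illustrative):

```typescript
type Memory = { key: string; content: string; category?: string };

class MemoryStore {
  private memories = new Map<string, Memory>();

  // memory__save: upsert, so saving with an existing key updates the value.
  save(key: string, content: string, category?: string): void {
    this.memories.set(key, { key, content, category });
  }

  // memory__search: word-based matching across key, content, and category.
  search(query: string): Memory[] {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    return [...this.memories.values()].filter((m) => {
      const haystack = `${m.key} ${m.content} ${m.category ?? ""}`.toLowerCase();
      return words.some((w) => haystack.includes(w));
    });
  }

  // memory__delete: remove by key; returns whether the key existed.
  delete(key: string): boolean {
    return this.memories.delete(key);
  }
}
```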

Background heartbeat

File: src/lib/heartbeat.ts

Agents can be configured to run periodic background turns without user interaction. This enables agents to proactively check on tasks, update their memory, or perform maintenance.

Configuration (per agent):

  • heartbeatEnabled — Toggle on/off
  • heartbeatIntervalMinutes — How often to run (5–1440 minutes)
  • heartbeatPrompt — The message sent to the agent each heartbeat

How it works:

  1. The heartbeat runner starts via Next.js instrumentation.ts on server boot
  2. Every 60 seconds, it checks for agents with heartbeatEnabled: true whose last run was more than heartbeatIntervalMinutes ago
  3. For each due agent, it loads memories into the system prompt, runs a full agent turn (with all tools), and logs the result to HeartbeatLog
  4. The agent can use its memory tools during heartbeat to record observations for future conversations
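The due-agent check in step 2 can be sketched as follows; the record shape and function name are assumptions, but the fields mirror the configuration above:

```typescript
type HeartbeatAgent = {
  id: string;
  heartbeatEnabled: boolean;
  heartbeatIntervalMinutes: number;
  lastHeartbeatAt: Date | null; // null: heartbeat has never run
};

// Called every 60 seconds by the runner; returns agents whose last run
// was more than heartbeatIntervalMinutes ago (or that have never run).
function dueAgents(agents: HeartbeatAgent[], now: Date): HeartbeatAgent[] {
  return agents.filter((a) => {
    if (!a.heartbeatEnabled) return false;
    if (!a.lastHeartbeatAt) return true;
    const elapsedMin = (now.getTime() - a.lastHeartbeatAt.getTime()) / 60_000;
    return elapsedMin >= a.heartbeatIntervalMinutes;
  });
}
```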

MCP tool execution

External MCP tools (GitHub, Slack, etc.) execute server-side — the sandbox never sees credentials:

  1. Load the agent's bound integrations from the database
  2. For each integration, call mcpManager.listTools() to get available tools
  3. Apply policy filters (blocklist, allowlist, readOnly, scope rules)
  4. Convert MCP tool schemas to OpenAI function format with namespaced names ({integrationId}__{toolName})
  5. When the LLM calls an MCP tool, parse the namespace, find the integration, and call mcpManager.callTool()
  6. Log the data flow (direction, size, summary)
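The namespacing in steps 4 and 5 can be sketched like this. Note that built-in prefixes (sandbox__, browser__, memory__, agent__) would be checked before MCP dispatch; that check and the function names here are assumptions:

```typescript
// {integrationId}__{toolName}: the first "__" separates the integration ID
// from the tool name, which may itself contain underscores.
function toNamespacedName(integrationId: string, toolName: string): string {
  return `${integrationId}__${toolName}`;
}

function parseNamespacedName(
  name: string,
): { integrationId: string; toolName: string } | null {
  const idx = name.indexOf("__");
  if (idx <= 0) return null; // not a namespaced tool name
  return { integrationId: name.slice(0, idx), toolName: name.slice(idx + 2) };
}
```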

Tool confirmation

Some tools can be configured to require user approval before execution. When the agent calls a confirmable tool:

  1. A confirmation request is created with a 5-minute timeout
  2. The UI shows the tool name, arguments, and approve/deny buttons
  3. The agent waits for the user's decision
  4. If approved, the tool executes. If denied or timed out, the agent receives a denial message.

Configure confirmable tools per integration in the agent's integration bindings (confirmTools array).
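The wait in steps 3 and 4 can be sketched as a race between the user's decision and the 5-minute timer (names and shapes are illustrative, not the actual implementation):

```typescript
const CONFIRMATION_TIMEOUT_MS = 5 * 60 * 1000; // the 5-minute window above

type Decision = "approved" | "denied" | "timeout";

// Resolves with the user's decision, or "timeout" if none arrives in time.
function waitForDecision(
  decision: Promise<"approved" | "denied">,
  timeoutMs: number = CONFIRMATION_TIMEOUT_MS,
): Promise<Decision> {
  const timer = new Promise<Decision>((resolve) =>
    setTimeout(() => resolve("timeout"), timeoutMs),
  );
  return Promise.race([decision, timer]);
}
```

On "denied" or "timeout" the runtime would substitute a denial message as the tool result instead of executing the tool.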

External channels

Files: src/lib/channels/handler.ts, src/app/api/channels/

Channels let users interact with agents via external platforms (Slack, Telegram) instead of the web UI.

Slack/Telegram → Webhook endpoint → Channel handler → Agent runtime → Response → Platform API

Flow:

  1. External platform sends a webhook to /api/channels/{platform}/...
  2. The webhook route verifies the request (HMAC signature for Slack, secret token for Telegram)
  3. It matches the request to an enabled AgentChannel with an assigned agent
  4. The channel handler creates or reuses a synthetic user ({type}-{userId}@channel.internal) and conversation
  5. It builds message history (last 20 messages) and runs runAgentTurn()
  6. The response is sent back via the platform's API (Slack chat.postMessage, Telegram sendMessage)

Channels can be created without an agent — they store the platform config (bot tokens, webhook secrets) and become active once an agent is assigned.

Agent-to-agent communication

File: src/lib/agent-runtime.ts

Agents can delegate tasks to other agents via a built-in agent__<targetId> tool. This enables specialization — a coordinator agent can route requests to a code agent, research agent, etc.

Agent A (coordinator)
  └─ calls agent__<B_id>(message: "Analyze this data")
       └─ Agent B runs a full turn (with its own tools, sandbox, integrations)
            └─ returns response text as the tool result

Security model:

Control             Implementation
------------------  ----------------------------------------------------------------------
Explicit allowlist  Agent.allowedCallTargets — array of agent IDs this agent can call. Empty by default. Directional (A→B ≠ B→A).
No history leaking  Target agent gets a fresh context — only its system prompt + the single message. No access to existing conversations.
Cycle detection     A call chain tracks visited agent IDs. If the target is already in the chain, the call is rejected.
Depth limit         Max 3 nested calls. After that, agent tools are not offered to the LLM.
Spending            Charged to the calling agent, not the target. Prevents budget draining across agents.
Defense-in-depth    The allowlist is re-verified at execution time, not just at tool loading.
Audit               Every cross-agent call creates DataFlowLog entries (OUTBOUND from caller, RESPONSE with result).

The target agent uses its own tools and integrations as normal — this is the whole point. The admin explicitly authorizes the call path, so the target's capabilities are an intended part of the delegation.
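The allowlist, cycle-detection, and depth-limit checks can be sketched together; the function name and return shape are assumptions:

```typescript
const MAX_CALL_DEPTH = 3; // max nested agent-to-agent calls

function canCallAgent(
  callerAllowedTargets: string[], // Agent.allowedCallTargets of the caller
  targetId: string,
  callChain: string[], // agent IDs already visited in this turn
): { ok: boolean; reason?: string } {
  if (!callerAllowedTargets.includes(targetId))
    return { ok: false, reason: "not in allowlist" };
  if (callChain.includes(targetId))
    return { ok: false, reason: "cycle detected" };
  if (callChain.length >= MAX_CALL_DEPTH)
    return { ok: false, reason: "max depth reached" };
  return { ok: true };
}
```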

Streaming

The messages endpoint streams events to the browser using server-sent events:

Event type                  Payload
--------------------------  ----------------------------------------------------
text                        LLM text chunk
tool_start                  Tool name, arguments, and location (sandbox/server)
tool_end                    Tool result (success/failure)
tool_confirmation_pending   Waiting for user approval
tool_confirmation_resolved  User decision
done                        Token counts and cost
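As a sketch, one of these events framed as a named server-sent event looks like this (whether the endpoint uses named events or a type field inside the data line is an assumption):

```typescript
// SSE wire format: optional "event:" line, "data:" line, blank-line terminator.
function sseFrame(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```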

Sandbox system

Files: src/lib/sandbox/provider.ts, src/lib/sandbox/docker-manager.ts, src/lib/sandbox/e2b-manager.ts

Every agent gets its own isolated sandbox — a container with a full Linux environment including shell, filesystem, and a virtual desktop.

Sandbox lifecycle

  1. Create — On first message, ensureSandboxRunning() provisions a new sandbox
  2. Execute — execCommand(agentId, command, timeout) runs shell commands inside the container
  3. Stream — noVNC provides a live desktop view in the browser
  4. Extend — Activity extends the idle timeout
  5. Stop — Manual stop or idle timeout (default: 1 hour) destroys the container

Docker provider

  • Uses dockerode to manage sibling containers (Prometheal talks to the host Docker daemon)
  • Auto-detects gVisor (runsc) runtime; falls back to runc with a warning
  • Creates a dedicated bridge network (prometheal-sandbox)
  • Applies iptables rules inside the container to block all outbound traffic except to the Prometheal host
  • Documents are synced to /documents/ via tar upload
  • Persistent workspaces: Each agent gets a Docker named volume (prometheal-workspace-{agentId}) mounted at /home/user. Files created by the agent survive sandbox restarts — only destroying the volume removes them.
  • Workspace quotas: Each agent has a configurable workspaceMaxSizeMB (default 3 GB, 0 = unlimited). The runtime checks disk usage before file writes and blocks them when over quota. Shell commands get a warning appended when quota is exceeded. Admins can reset workspaces via the API or UI.
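The pre-write quota check can be sketched as a simple comparison (the function name and how usage is measured are assumptions; the 0 = unlimited convention is from the description above):

```typescript
// Returns whether a file write of incomingBytes fits within the agent's
// workspaceMaxSizeMB quota, where 0 means unlimited.
function canWrite(
  usedBytes: number,
  incomingBytes: number,
  workspaceMaxSizeMB: number,
): boolean {
  if (workspaceMaxSizeMB === 0) return true; // unlimited
  return usedBytes + incomingBytes <= workspaceMaxSizeMB * 1024 * 1024;
}
```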

E2B provider

  • Uses the @e2b/desktop SDK to create Firecracker microVMs in E2B's cloud
  • Desktop streaming via E2B's built-in stream API
  • Network isolation handled by E2B's infrastructure

Sandbox image

The sandbox image (sandbox/Dockerfile) includes:

  • Ubuntu 22.04 base
  • Desktop: Xvfb (virtual X server, 1280x800), Fluxbox (window manager), x11vnc + noVNC + websockify (streaming)
  • Browser: Google Chrome stable (for CDP-based browser tools)
  • Browser helper: Python CDP controller (browser-helper.py) for programmatic browser control
  • Tools: Python 3, Git, ImageMagick (screenshots), xdotool (computer use), iptables (network isolation)
  • User: Unprivileged user account
  • Port: 8080 (noVNC web interface)

MCP system

Files: src/lib/mcp/client-manager.ts, src/lib/mcp/policy.ts, src/lib/mcp/tool-format.ts, src/lib/mcp/catalog.ts

OAuth flow

Some integrations (like Google Workspace) require an OAuth flow instead of a static API key. Prometheal handles the entire flow through the web UI:

  1. Admin enters the OAuth client ID and client secret in the integration form
  2. Admin clicks Authorize in the integration list
  3. Prometheal generates a CSRF-protected state parameter (stored in KV with 10-minute TTL) and redirects to the provider's consent screen
  4. After the user grants access, the provider redirects back to /api/integrations/{id}/oauth/callback
  5. Prometheal exchanges the authorization code for access and refresh tokens
  6. Tokens are stored encrypted in the integration's credentials field in the database
  7. When the MCP server is spawned, writeGoogleOAuthFiles() writes credentials.json and tokens.json to managed XDG paths — the MCP server reads from these files transparently

The catalog entry's oauthFlow field defines the provider, auth URL, token URL, and scopes. This system is extensible to any OAuth 2.0 provider.

Client manager

Singleton that manages long-lived MCP client connections:

  • Supports stdio (subprocess) and HTTP (remote server) transports
  • Credentials are decrypted from the database and passed as environment variables to stdio subprocesses
  • Protected env vars (PATH, DATABASE_URL, etc.) can never be overwritten by integration credentials
  • Tool list is cached for 5 minutes (in Redis for multi-instance, in-memory fallback)
  • Idle connections are closed after 30 minutes

Policy enforcement

Each agent-integration binding can define:

  • blockedTools — Tools that are never available
  • allowedTools — If set, only these tools are available (allowlist mode)
  • confirmTools — Tools that require user approval before execution
  • readOnly — Blocks all tools marked as destructiveHint in their annotations
  • Scope rules — Restrict tool arguments (e.g., only allow specific GitHub repos or filesystem paths)

Evaluation order: blocklist > readOnly+destructive > allowlist > scope rules.
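The evaluation order can be sketched as follows (scope-rule argument checks are omitted; the destructive hint comes from MCP tool annotations, and the function name is an assumption):

```typescript
type Policy = {
  blockedTools?: string[];
  allowedTools?: string[]; // if set, allowlist mode
  readOnly?: boolean;
};

function isToolAllowed(
  policy: Policy,
  toolName: string,
  destructiveHint: boolean,
): boolean {
  if (policy.blockedTools?.includes(toolName)) return false; // 1. blocklist
  if (policy.readOnly && destructiveHint) return false;      // 2. readOnly + destructive
  if (policy.allowedTools && !policy.allowedTools.includes(toolName))
    return false;                                            // 3. allowlist
  return true; // 4. scope rules would then restrict arguments
}
```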

Tool format conversion

MCP tool schemas are converted to OpenAI function calling format for the LLM. Tool names are namespaced as {integrationId}__{toolName} to prevent collisions when multiple integrations are bound to the same agent.

Catalog

Pre-defined integration templates (src/lib/mcp/catalog.ts) with:

  • Server command and transport type
  • Required credential fields (with labels, placeholders, and help text)
  • Optional scope fields (with constraint types: prefix or enum)
  • Documentation links

LLM proxy

File: src/lib/llm-proxy.ts

Routes LLM requests to the correct upstream provider:

Model pattern                               Provider    Endpoint
------------------------------------------  ----------  -------------------------------------
openrouter/...                              OpenRouter  openrouter.ai/api/v1/chat/completions
anthropic/... or claude-*                   Anthropic   api.anthropic.com/v1/messages
openai/... or gpt-* or o1*                  OpenAI      api.openai.com/v1/chat/completions
Multi-segment (e.g., minimax/minimax-m2.5)  OpenRouter  Auto-detected

API keys are stored encrypted in the Settings singleton and decrypted at request time. The proxy also:

  • Counts tokens from provider responses
  • Estimates cost using provider pricing
  • Stores usage records in the database
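The routing table can be sketched as a pattern match (the fallback rule, return shape, and prefix stripping are assumptions based on the table):

```typescript
type Provider = "openrouter" | "anthropic" | "openai";

function routeModel(model: string): { provider: Provider; upstreamModel: string } {
  if (model.startsWith("openrouter/"))
    return { provider: "openrouter", upstreamModel: model.slice("openrouter/".length) };
  if (model.startsWith("anthropic/") || model.startsWith("claude-"))
    return { provider: "anthropic", upstreamModel: model.replace(/^anthropic\//, "") };
  if (model.startsWith("openai/") || model.startsWith("gpt-") || model.startsWith("o1"))
    return { provider: "openai", upstreamModel: model.replace(/^openai\//, "") };
  // Multi-segment names (vendor/model) are auto-detected as OpenRouter.
  return { provider: "openrouter", upstreamModel: model };
}
```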

External access

The LLM proxy is also available at /api/llm-proxy/v1/chat/completions for external use (e.g., from within sandbox tools that need LLM access). Authentication uses:

  • Authorization: Bearer <agentToken> header
  • x-prometheal-agent-id header

Authentication and sessions

File: src/lib/auth.ts

  • Passwords are hashed with bcryptjs (12 rounds)
  • Sessions use JWT tokens (HS256, 7-day expiry) stored in httpOnly cookies
  • Session validation is cached in KV with a 60-second TTL
  • Middleware automatically refreshes tokens when they're past 50% of their lifetime (3.5 days)
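The 50%-lifetime refresh rule reduces to a single comparison; a sketch (the constant and function name are illustrative):

```typescript
const SESSION_LIFETIME_MS = 7 * 24 * 60 * 60 * 1000; // 7-day JWT expiry

// Refresh once the token is past half its lifetime (3.5 days).
function shouldRefresh(issuedAt: Date, now: Date): boolean {
  return now.getTime() - issuedAt.getTime() > SESSION_LIFETIME_MS / 2;
}
```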

Database

File: prisma/schema.prisma

PostgreSQL with Prisma ORM. Key models:

Model                                   Purpose
--------------------------------------  -------------------------------------------------------------------
User                                    Authentication, roles (ADMIN/MANAGER/USER)
Agent                                   AI agent config (model, system prompt, container state)
Conversation / Message                  Chat history with role, content, attachments, tool calls
Integration                             MCP server config with encrypted credentials
AgentIntegration                        Agent-to-integration binding with policy (allowed/blocked/confirm tools)
Document                                Agent-specific uploaded files
LibraryDocument / AgentLibraryDocument  Shared document pool with agent bindings
Settings                                Singleton instance config (API keys, registration, domains)
SpendingLimit / UsageRecord             Per-agent cost tracking and limits
AuditLog                                Event-based activity trail
DataFlowLog                             Directional data movement tracking (INBOUND/OUTBOUND/RESPONSE)
InviteLink                              Registration invites with role, max uses, expiry
AgentAccess                             Per-user agent access grants
AgentChannel                            External channel config (Slack, Telegram webhooks)
AgentMemory                             Per-agent persistent key-value memories (key, content, category)
HeartbeatLog                            Background heartbeat run records (prompt, response, tokens)

Agent.allowedCallTargets (String array) controls which agents can be called via the agent__ tool prefix.


KV store

File: src/lib/kv.ts

A key-value store used for:

  • Rate limit counters (with TTL)
  • Session cache (60s TTL)
  • MCP tool cache (5min TTL)
  • Sandbox creation locks (distributed mutex)

Uses Redis when REDIS_URL is set (required for multi-instance deployment). Falls back to an in-memory Map for single-instance setups.
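The in-memory fallback can be sketched as a Map with per-key expiry, mirroring the TTL semantics Redis provides (the class and method names are hypothetical):

```typescript
class MemoryKV {
  private store = new Map<string, { value: string; expiresAt: number | null }>();

  set(key: string, value: string, ttlSeconds?: number): void {
    const expiresAt = ttlSeconds ? Date.now() + ttlSeconds * 1000 : null;
    this.store.set(key, { value, expiresAt });
  }

  get(key: string): string | null {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== null && Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy expiry on read
      return null;
    }
    return entry.value;
  }
}
```

Unlike Redis, this fallback is per-process, which is why multi-instance deployments require REDIS_URL.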


Data flow and audit

Data flow logging

Every data movement is recorded in DataFlowLog:

  • INBOUND — User messages, file uploads
  • OUTBOUND — LLM prompts, MCP tool calls
  • RESPONSE — LLM responses, tool results, sandbox output

Includes: agent ID, conversation ID, data type, size in bytes, summary, and metadata.

Audit logging

System events are recorded in AuditLog:

  • User actions: login, signup, password change
  • Agent events: created, updated, deleted, sandbox started/stopped
  • Admin actions: settings changed, integration added, user role changed
  • Tool events: tool calls, confirmations

Admins can search, filter, and export audit logs as CSV.

Released under the MIT License.