Architecture
Prometheal is a server-side agent runtime built with Next.js. This document explains how the major systems work together.
High-level overview
┌──────────┐ ┌──────────────┐ ┌──────────────────┐
│ Browser │ │ LLM Providers│ │ MCP Integrations │
│ │ │ OpenRouter │ │ GitHub, Slack, │
│ chat + │ │ Anthropic │ │ Google, Postgres│
│ desktop │ │ OpenAI │ │ Notion, +custom │
└────┬─────┘ └──────┬───────┘ └────────┬─────────┘
│ HTTPS │ API │ MCP
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Prometheal Server │
│ │
│ Agent Runtime ──── LLM Proxy ──── MCP Manager ──── Data Flow Audit │
│ (LLM loop, (routes to (server-side (every byte │
│ tool routing) providers) credentials) logged) │
│ │
└──────────────────────────────┬───────────────────────────────────────┘
│ only allowed connection
▼
┌──────────────────────────────────────────┐
│ Agent Sandbox │
│ │
│ Shell execution File system │
│ Desktop (noVNC) /documents │
│ │
│ iptables: DENY ALL, ALLOW Prometheal │
│ Docker + gVisor | E2B Cloud │
└──────────────────────────────────────────┘
Agent runtime
File: src/lib/agent-runtime.ts
The agent runtime is the core loop that orchestrates LLM calls, tool execution, and response streaming. When a user sends a message:
- Load context — Fetch conversation history, agent config, system prompt, agent memories, and available tools (sandbox + browser + memory + MCP)
- Call LLM — Send the conversation to the configured provider (OpenRouter, Anthropic, or OpenAI)
- Parse response — If the LLM returns tool calls, execute them. If it returns text, stream it to the user.
- Execute tools — Tools run in parallel via Promise.all. Sandbox tools execute inside the container. MCP tools execute server-side via the MCP manager.
- Loop — Feed tool results back to the LLM and repeat (max 30 rounds)
- Track usage — Record token counts and estimated cost
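The loop above can be sketched in TypeScript. This is a minimal illustration of the control flow only: `callLLM`, `executeTool`, and the message shapes are stand-ins for the runtime's internals, not the actual `agent-runtime.ts` API.

```typescript
// Illustrative sketch of the agent loop: call the LLM, run any tool calls in
// parallel, feed results back, and stop on plain text or after MAX_ROUNDS.
type ToolCall = { name: string; args: Record<string, unknown> };
type LLMResponse = { text?: string; toolCalls?: ToolCall[] };

const MAX_ROUNDS = 30;

async function runAgentTurn(
  callLLM: (history: unknown[]) => Promise<LLMResponse>,
  executeTool: (call: ToolCall) => Promise<string>,
  history: unknown[],
): Promise<string> {
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const response = await callLLM(history);
    const calls = response.toolCalls ?? [];
    // Plain text means the turn is finished.
    if (calls.length === 0) return response.text ?? "";
    // Tools run in parallel, as described above.
    const results = await Promise.all(calls.map(executeTool));
    history.push({ role: "assistant", toolCalls: calls });
    results.forEach((r, i) =>
      history.push({ role: "tool", name: calls[i].name, content: r }),
    );
  }
  return "Max tool rounds reached.";
}
```

The real runtime also streams partial output and tracks usage per round; those concerns are omitted here.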
Built-in sandbox tools
These execute inside the agent's sandbox via execCommand():
| Tool | What it does |
|---|---|
| sandbox__shell | Run a bash command (120s timeout) |
| sandbox__read_file | Read a file's contents |
| sandbox__write_file | Write/create a file (base64-encoded for safety) |
| sandbox__edit_file | Targeted string replacement via Python script |
| sandbox__list_files | List directory contents (ls -la) |
| sandbox__search_files | Search files with grep patterns |
| sandbox__screenshot | Capture desktop screenshot (ImageMagick) |
| sandbox__computer | Mouse/keyboard control via xdotool (click, type, scroll, key press) |
Browser tools
Agents can control a full Chrome browser inside the sandbox via CDP (Chrome DevTools Protocol). The browser helper script (sandbox/browser-helper.py) communicates with Chrome over raw WebSocket.
| Tool | What it does |
|---|---|
| browser__launch | Start Chrome and optionally navigate to a URL |
| browser__navigate | Navigate to a URL |
| browser__screenshot | Capture a JPEG screenshot (returned as base64 image for vision models) |
| browser__snapshot | Walk the DOM and return a structured element tree with data-ref attributes |
| browser__click | Click an element by its ref ID (e.g., e5) |
| browser__type | Type text into an element by ref ID |
| browser__press | Press a keyboard key (Enter, Escape, Tab, etc.) |
| browser__scroll | Scroll the page (up/down/left/right) |
| browser__evaluate | Execute arbitrary JavaScript in the page |
| browser__tabs | List open browser tabs |
Ref-based targeting: The snapshot tool injects JavaScript that walks the DOM, assigns data-ref attributes to interactive and visible elements, and returns a structured list. Agents use these refs (like e5, e13) to target elements for clicking and typing — no coordinate guessing needed.
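The ref-assignment idea can be sketched over a simple element tree. This is illustrative only: the real helper walks the live DOM via injected JavaScript, and the `El`/`RefEntry` shapes and `INTERACTIVE` set here are assumptions.

```typescript
// Sketch of ref-based targeting: walk an element tree, assign sequential refs
// (e1, e2, ...) to interactive elements, and return a flat list the agent can
// target by ref instead of coordinates.
type El = { tag: string; text?: string; children?: El[] };
type RefEntry = { ref: string; tag: string; text: string };

const INTERACTIVE = new Set(["a", "button", "input", "select", "textarea"]);

function snapshot(root: El): RefEntry[] {
  const out: RefEntry[] = [];
  let n = 0;
  const walk = (el: El) => {
    if (INTERACTIVE.has(el.tag)) {
      n += 1; // refs are assigned in document order
      out.push({ ref: `e${n}`, tag: el.tag, text: el.text ?? "" });
    }
    el.children?.forEach(walk);
  };
  walk(root);
  return out;
}
```

An agent would then issue `browser__click` with `ref: "e1"` rather than pixel coordinates.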
Screenshot + Vision: When the agent takes a screenshot, the base64 image is included in the tool result as a multimodal content block. Vision-capable models (Claude, GPT-4o, Qwen) can interpret the screenshot to understand visual layout and content.
Agent memory tools
Agents have persistent, cross-conversation memory stored in the database (AgentMemory model). Memory tools are always available regardless of sandbox state.
| Tool | What it does |
|---|---|
| memory__save | Save a memory with a key, content, and optional category |
| memory__search | Search memories by query (word-based matching across key, content, category) |
| memory__list | List all memories, optionally filtered by category |
| memory__delete | Delete a memory by key |
Memories are scoped per-agent. At the start of each conversation, the agent's most recent 50 memories are injected into the system prompt so the agent has context from prior interactions. Memories use upsert — saving with an existing key updates the value.
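The upsert and word-based search semantics can be sketched with an in-memory map standing in for the AgentMemory table. This is an illustration of the behavior described above, not the Prisma-backed implementation.

```typescript
// Sketch of agent memory semantics: keyed upsert, word-based search across
// key/content/category, and list/delete. Backed by a Map for illustration.
type Memory = { key: string; content: string; category?: string };

class MemoryStore {
  private byKey = new Map<string, Memory>();

  save(mem: Memory): void {
    // Upsert: saving with an existing key replaces the value.
    this.byKey.set(mem.key, mem);
  }

  search(query: string): Memory[] {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    return [...this.byKey.values()].filter((m) => {
      const haystack = `${m.key} ${m.content} ${m.category ?? ""}`.toLowerCase();
      return words.some((w) => haystack.includes(w));
    });
  }

  list(category?: string): Memory[] {
    const all = [...this.byKey.values()];
    return category ? all.filter((m) => m.category === category) : all;
  }

  delete(key: string): boolean {
    return this.byKey.delete(key);
  }
}
```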
Background heartbeat
File: src/lib/heartbeat.ts
Agents can be configured to run periodic background turns without user interaction. This enables agents to proactively check on tasks, update their memory, or perform maintenance.
Configuration (per agent):
- heartbeatEnabled — Toggle on/off
- heartbeatIntervalMinutes — How often to run (5–1440 minutes)
- heartbeatPrompt — The message sent to the agent each heartbeat
How it works:
- The heartbeat runner starts via Next.js instrumentation.ts on server boot
- Every 60 seconds, it checks for agents with heartbeatEnabled: true whose last run was more than heartbeatIntervalMinutes ago
- For each due agent, it loads memories into the system prompt, runs a full agent turn (with all tools), and logs the result to HeartbeatLog
- The agent can use its memory tools during heartbeat to record observations for future conversations
MCP tool execution
External MCP tools (GitHub, Slack, etc.) execute server-side — the sandbox never sees credentials:
- Load the agent's bound integrations from the database
- For each integration, call mcpManager.listTools() to get available tools
- Apply policy filters (blocklist, allowlist, readOnly, scope rules)
- Convert MCP tool schemas to OpenAI function format with namespaced names ({integrationId}__{toolName})
- When the LLM calls an MCP tool, parse the namespace, find the integration, and call mcpManager.callTool()
- Log the data flow (direction, size, summary)
Tool confirmation
Some tools can be configured to require user approval before execution. When the agent calls a confirmable tool:
- A confirmation request is created with a 5-minute timeout
- The UI shows the tool name, arguments, and approve/deny buttons
- The agent waits for the user's decision
- If approved, the tool executes. If denied or timed out, the agent receives a denial message.
Configure confirmable tools per integration in the agent's integration bindings (confirmTools array).
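The wait-with-timeout behavior can be sketched with Promise.race. This is a simplification under stated assumptions: the real flow persists confirmation requests server-side and resolves them from the UI, while the names here are illustrative.

```typescript
// Sketch of tool confirmation: race the user's decision against a 5-minute
// timeout. A timeout is treated like a denial by the caller.
type Decision = "approved" | "denied" | "timeout";

function awaitConfirmation(
  userDecision: Promise<"approved" | "denied">,
  timeoutMs = 5 * 60 * 1000,
): Promise<Decision> {
  const timer = new Promise<Decision>((resolve) =>
    setTimeout(() => resolve("timeout"), timeoutMs),
  );
  return Promise.race([userDecision, timer]);
}
```

On "denied" or "timeout", the runtime would return a denial message to the LLM instead of executing the tool.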
External channels
Files: src/lib/channels/handler.ts, src/app/api/channels/
Channels let users interact with agents via external platforms (Slack, Telegram) instead of the web UI.
Slack/Telegram → Webhook endpoint → Channel handler → Agent runtime → Response → Platform API
Flow:
- External platform sends a webhook to /api/channels/{platform}/...
- The webhook route verifies the request (HMAC signature for Slack, secret token for Telegram)
- It matches the request to an enabled AgentChannel with an assigned agent
- The channel handler creates or reuses a synthetic user ({type}-{userId}@channel.internal) and conversation
- It builds message history (last 20 messages) and runs runAgentTurn()
- The response is sent back via the platform's API (Slack chat.postMessage, Telegram sendMessage)
Channels can be created without an agent — they store the platform config (bot tokens, webhook secrets) and become active once an agent is assigned.
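The HMAC verification in the flow above can be sketched with Node's crypto module. Note this is a generic SHA-256 HMAC check for illustration; Slack's actual signing scheme additionally incorporates a timestamp and a version prefix into the signed string.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of webhook signature verification: recompute the HMAC over the raw
// request body and compare in constant time to resist timing attacks.
function verifyHmac(body: string, secret: string, signature: string): boolean {
  const expected = createHmac("sha256", secret).update(body).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check length first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```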
Agent-to-agent communication
File: src/lib/agent-runtime.ts
Agents can delegate tasks to other agents via a built-in agent__<targetId> tool. This enables specialization — a coordinator agent can route requests to a code agent, research agent, etc.
Agent A (coordinator)
└─ calls agent__<B_id>(message: "Analyze this data")
└─ Agent B runs a full turn (with its own tools, sandbox, integrations)
└─ returns response text as the tool result
Security model:
| Control | Implementation |
|---|---|
| Explicit allowlist | Agent.allowedCallTargets — array of agent IDs this agent can call. Empty by default. Directional (A→B ≠ B→A). |
| No history leaking | Target agent gets a fresh context — only its system prompt + the single message. No access to existing conversations. |
| Cycle detection | A call chain tracks visited agent IDs. If the target is already in the chain, the call is rejected. |
| Depth limit | Max 3 nested calls. After that, agent tools are not offered to the LLM. |
| Spending | Charged to the calling agent, not the target. Prevents budget draining across agents. |
| Defense-in-depth | The allowlist is re-verified at execution time, not just at tool loading. |
| Audit | Every cross-agent call creates DataFlowLog entries (OUTBOUND from caller, RESPONSE with result). |
The target agent uses its own tools and integrations as normal — this is the whole point. The admin explicitly authorizes the call path, so the target's capabilities are an intended part of the delegation.
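The allowlist, cycle-detection, and depth checks from the table can be sketched as one gate function. The field name `allowedCallTargets` comes from the schema above; the function shape and `callChain` representation are illustrative.

```typescript
// Sketch of the cross-agent call gate: explicit allowlist, cycle detection
// over the chain of visited agent IDs, and a depth limit of 3.
const MAX_CALL_DEPTH = 3;

function canCallAgent(
  targetId: string,
  allowedCallTargets: string[], // Agent.allowedCallTargets of the caller
  callChain: string[], // agent IDs already visited in this chain
): { allowed: boolean; reason?: string } {
  if (!allowedCallTargets.includes(targetId))
    return { allowed: false, reason: "target not in allowlist" };
  if (callChain.includes(targetId))
    return { allowed: false, reason: "cycle detected" };
  if (callChain.length >= MAX_CALL_DEPTH)
    return { allowed: false, reason: "max depth exceeded" };
  return { allowed: true };
}
```

Per the table, this check runs both when tools are offered to the LLM and again at execution time (defense-in-depth).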
Streaming
The messages endpoint streams events to the browser using server-sent events:
| Event type | Payload |
|---|---|
| text | LLM text chunk |
| tool_start | Tool name, arguments, and location (sandbox/server) |
| tool_end | Tool result (success/failure) |
| tool_confirmation_pending | Waiting for user approval |
| tool_confirmation_resolved | User decision |
| done | Token counts and cost |
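Each event in the table is serialized in the standard server-sent events wire format, which can be sketched as a small formatter (illustrative; the real endpoint writes these frames to a streaming Next.js response):

```typescript
// Sketch of SSE framing: an "event:" line naming the type, a "data:" line
// with the JSON payload, and a blank line terminating the frame.
function sseEvent(type: string, payload: unknown): string {
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

The browser consumes these frames with EventSource (or a fetch-based reader), dispatching on the event type.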
Sandbox system
Files: src/lib/sandbox/provider.ts, src/lib/sandbox/docker-manager.ts, src/lib/sandbox/e2b-manager.ts
Every agent gets its own isolated sandbox — a container with a full Linux environment including shell, filesystem, and a virtual desktop.
Sandbox lifecycle
- Create — On first message, ensureSandboxRunning() provisions a new sandbox
- Execute — execCommand(agentId, command, timeout) runs shell commands inside the container
- Stream — noVNC provides a live desktop view in the browser
- Extend — Activity extends the idle timeout
- Stop — Manual stop or idle timeout (default: 1 hour) destroys the container
Docker provider
- Uses dockerode to manage sibling containers (Prometheal talks to the host Docker daemon)
- Auto-detects the gVisor (runsc) runtime; falls back to runc with a warning
- Creates a dedicated bridge network (prometheal-sandbox)
- Applies iptables rules inside the container to block all outbound traffic except to the Prometheal host
- Documents are synced to /documents/ via tar upload
- Persistent workspaces: Each agent gets a Docker named volume (prometheal-workspace-{agentId}) mounted at /home/user. Files created by the agent survive sandbox restarts — only destroying the volume removes them.
- Workspace quotas: Each agent has a configurable workspaceMaxSizeMB (default 3 GB, 0 = unlimited). The runtime checks disk usage before file writes and blocks them when over quota. Shell commands get a warning appended when quota is exceeded. Admins can reset workspaces via the API or UI.
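The quota gate can be sketched as a pure check (illustrative; the real runtime measures usage inside the container before each write):

```typescript
// Sketch of the workspace quota check: a write is allowed while usage stays
// under workspaceMaxSizeMB, with 0 meaning unlimited.
function isWriteAllowed(usageBytes: number, workspaceMaxSizeMB: number): boolean {
  if (workspaceMaxSizeMB === 0) return true; // 0 = unlimited, per the config above
  return usageBytes < workspaceMaxSizeMB * 1024 * 1024;
}
```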
E2B provider
- Uses the @e2b/desktop SDK to create Firecracker microVMs in E2B's cloud
- Desktop streaming via E2B's built-in stream API
- Network isolation handled by E2B's infrastructure
Sandbox image
The sandbox image (sandbox/Dockerfile) includes:
- Ubuntu 22.04 base
- Desktop: Xvfb (virtual X server, 1280x800), Fluxbox (window manager), x11vnc + noVNC + websockify (streaming)
- Browser: Google Chrome stable (for CDP-based browser tools)
- Browser helper: Python CDP controller (browser-helper.py) for programmatic browser control
- Tools: Python 3, Git, ImageMagick (screenshots), xdotool (computer use), iptables (network isolation)
- User: Unprivileged user account
- Port: 8080 (noVNC web interface)
MCP system
Files: src/lib/mcp/client-manager.ts, src/lib/mcp/policy.ts, src/lib/mcp/tool-format.ts, src/lib/mcp/catalog.ts
OAuth flow
Some integrations (like Google Workspace) require an OAuth flow instead of a static API key. Prometheal handles this entirely in the browser:
- Admin enters the OAuth client ID and client secret in the integration form
- Admin clicks Authorize in the integration list
- Prometheal generates a CSRF-protected state parameter (stored in KV with 10-minute TTL) and redirects to the provider's consent screen
- After the user grants access, the provider redirects back to /api/integrations/{id}/oauth/callback
- Prometheal exchanges the authorization code for access and refresh tokens
- Tokens are stored encrypted in the integration's credentials field in the database
- When the MCP server is spawned, writeGoogleOAuthFiles() writes credentials.json and tokens.json to managed XDG paths — the MCP server reads from these files transparently
The catalog entry's oauthFlow field defines the provider, auth URL, token URL, and scopes. This system is extensible to any OAuth 2.0 provider.
Client manager
Singleton that manages long-lived MCP client connections:
- Supports stdio (subprocess) and HTTP (remote server) transports
- Credentials are decrypted from the database and passed as environment variables to stdio subprocesses
- Protected env vars (PATH, DATABASE_URL, etc.) can never be overwritten by integration credentials
- Tool list is cached for 5 minutes (in Redis for multi-instance, in-memory fallback)
- Idle connections are closed after 30 minutes
Policy enforcement
Each agent-integration binding can define:
- blockedTools — Tools that are never available
- allowedTools — If set, only these tools are available (allowlist mode)
- confirmTools — Tools that require user approval before execution
- readOnly — Blocks all tools marked as destructiveHint in their annotations
- Scope rules — Restrict tool arguments (e.g., only allow specific GitHub repos or filesystem paths)
Evaluation order: blocklist > readOnly+destructive > allowlist > scope rules.
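That evaluation order can be sketched as a single predicate. The policy field names follow the list above; the `scopeCheck` predicate is a simplification of the argument-level scope rules, and the function itself is illustrative rather than the `policy.ts` implementation.

```typescript
// Sketch of policy enforcement in the documented order:
// blocklist > readOnly+destructive > allowlist > scope rules.
type ToolInfo = { name: string; destructiveHint?: boolean };
type Policy = {
  blockedTools?: string[];
  allowedTools?: string[]; // if set, allowlist mode
  readOnly?: boolean;
  scopeCheck?: (tool: ToolInfo) => boolean; // stand-in for argument scope rules
};

function isToolAllowed(tool: ToolInfo, policy: Policy): boolean {
  if (policy.blockedTools?.includes(tool.name)) return false;
  if (policy.readOnly && tool.destructiveHint) return false;
  if (policy.allowedTools && !policy.allowedTools.includes(tool.name)) return false;
  if (policy.scopeCheck && !policy.scopeCheck(tool)) return false;
  return true;
}
```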
Tool format conversion
MCP tool schemas are converted to OpenAI function calling format for the LLM. Tool names are namespaced as {integrationId}__{toolName} to prevent collisions when multiple integrations are bound to the same agent.
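The namespacing scheme can be sketched as a pair of helpers; these are illustrative, not the `tool-format.ts` code. The split is on the first `__`, so the tool name itself may contain further underscores.

```typescript
// Sketch of tool-name namespacing: "{integrationId}__{toolName}" out,
// split on the first "__" back in.
function namespaceTool(integrationId: string, toolName: string): string {
  return `${integrationId}__${toolName}`;
}

function parseNamespacedTool(name: string): { integrationId: string; toolName: string } {
  const idx = name.indexOf("__");
  if (idx < 0) throw new Error(`Not a namespaced tool: ${name}`);
  return { integrationId: name.slice(0, idx), toolName: name.slice(idx + 2) };
}
```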
Catalog
Pre-defined integration templates (src/lib/mcp/catalog.ts) with:
- Server command and transport type
- Required credential fields (with labels, placeholders, and help text)
- Optional scope fields (with constraint types: prefix or enum)
- Documentation links
LLM proxy
File: src/lib/llm-proxy.ts
Routes LLM requests to the correct upstream provider:
| Model pattern | Provider | Endpoint |
|---|---|---|
| openrouter/... | OpenRouter | openrouter.ai/api/v1/chat/completions |
| anthropic/... or claude-* | Anthropic | api.anthropic.com/v1/messages |
| openai/... or gpt-* or o1* | OpenAI | api.openai.com/v1/chat/completions |
| Multi-segment (e.g., minimax/minimax-m2.5) | OpenRouter | Auto-detected |
API keys are stored encrypted in the Settings singleton and decrypted at request time. The proxy also:
- Counts tokens from provider responses
- Estimates cost using provider pricing
- Stores usage records in the database
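The routing rules in the table above can be sketched as a function; the patterns come from the table, while the function shape is illustrative rather than the `llm-proxy.ts` implementation.

```typescript
// Sketch of model-to-provider routing per the table: explicit prefixes first,
// then any remaining multi-segment name falls through to OpenRouter.
type Provider = "openrouter" | "anthropic" | "openai";

function routeModel(model: string): Provider {
  if (model.startsWith("openrouter/")) return "openrouter";
  if (model.startsWith("anthropic/") || model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("openai/") || model.startsWith("gpt-") || model.startsWith("o1"))
    return "openai";
  if (model.includes("/")) return "openrouter"; // multi-segment auto-detection
  throw new Error(`Unrecognized model: ${model}`);
}
```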
External access
The LLM proxy is also available at /api/llm-proxy/v1/chat/completions for external use (e.g., from within sandbox tools that need LLM access). Authentication uses:
- Authorization: Bearer <agentToken> header
- x-prometheal-agent-id header
Authentication and sessions
File: src/lib/auth.ts
- Passwords are hashed with bcryptjs (12 rounds)
- Sessions use JWT tokens (HS256, 7-day expiry) stored in httpOnly cookies
- Session validation is cached in KV with a 60-second TTL
- Middleware automatically refreshes tokens when they're past 50% of their lifetime (3.5 days)
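The 50%-lifetime refresh rule can be sketched as a pure check (illustrative; the real middleware reads the issue time from the JWT claims):

```typescript
// Sketch of the refresh threshold: with a 7-day session lifetime, tokens
// older than half that (3.5 days) are reissued on the next request.
const SESSION_LIFETIME_MS = 7 * 24 * 60 * 60 * 1000;

function shouldRefresh(issuedAt: Date, now: Date): boolean {
  return now.getTime() - issuedAt.getTime() > SESSION_LIFETIME_MS / 2;
}
```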
Database
File: prisma/schema.prisma
PostgreSQL with Prisma ORM. Key models:
| Model | Purpose |
|---|---|
| User | Authentication, roles (ADMIN/MANAGER/USER) |
| Agent | AI agent config (model, system prompt, container state) |
| Conversation / Message | Chat history with role, content, attachments, tool calls |
| Integration | MCP server config with encrypted credentials |
| AgentIntegration | Agent-to-integration binding with policy (allowed/blocked/confirm tools) |
| Document | Agent-specific uploaded files |
| LibraryDocument / AgentLibraryDocument | Shared document pool with agent bindings |
| Settings | Singleton instance config (API keys, registration, domains) |
| SpendingLimit / UsageRecord | Per-agent cost tracking and limits |
| AuditLog | Event-based activity trail |
| DataFlowLog | Directional data movement tracking (INBOUND/OUTBOUND/RESPONSE) |
| InviteLink | Registration invites with role, max uses, expiry |
| AgentAccess | Per-user agent access grants |
| AgentChannel | External channel config (Slack, Telegram webhooks) |
| AgentMemory | Per-agent persistent key-value memories (key, content, category) |
| HeartbeatLog | Background heartbeat run records (prompt, response, tokens) |
Agent.allowedCallTargets (String array) controls which agents can be called via the agent__ tool prefix.
KV store
File: src/lib/kv.ts
A key-value store used for:
- Rate limit counters (with TTL)
- Session cache (60s TTL)
- MCP tool cache (5min TTL)
- Sandbox creation locks (distributed mutex)
Uses Redis when REDIS_URL is set (required for multi-instance deployment). Falls back to an in-memory Map for single-instance setups.
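The in-memory fallback can be sketched as a Map with lazy TTL expiry. This is an illustration of the fallback behavior only; the real `kv.ts` also wraps a Redis client behind the same interface.

```typescript
// Sketch of the in-memory KV fallback: values carry an optional absolute
// expiry and are dropped lazily on read once past it.
class MemoryKV {
  private store = new Map<string, { value: string; expiresAt: number | null }>();

  set(key: string, value: string, ttlSeconds?: number): void {
    const expiresAt = ttlSeconds ? Date.now() + ttlSeconds * 1000 : null;
    this.store.set(key, { value, expiresAt });
  }

  get(key: string): string | null {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== null && Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy expiry on read
      return null;
    }
    return entry.value;
  }
}
```

A Map-backed store only works for a single instance, which is why Redis is required for multi-instance deployments: counters, caches, and locks must be shared across processes.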
Data flow and audit
Data flow logging
Every data movement is recorded in DataFlowLog:
- INBOUND — User messages, file uploads
- OUTBOUND — LLM prompts, MCP tool calls
- RESPONSE — LLM responses, tool results, sandbox output
Includes: agent ID, conversation ID, data type, size in bytes, summary, and metadata.
Audit logging
System events are recorded in AuditLog:
- User actions: login, signup, password change
- Agent events: created, updated, deleted, sandbox started/stopped
- Admin actions: settings changed, integration added, user role changed
- Tool events: tool calls, confirmations
Admins can search, filter, and export audit logs as CSV.