
Architecture

Prometheal is a server-side agent runtime built with Next.js. This document explains how the major systems work together.


High-level overview

                ┌──────────┐    ┌───────────────┐    ┌──────────────────┐
                │  Browser │    │ LLM Providers │    │ MCP Integrations │
                │          │    │  OpenRouter   │    │  GitHub, Slack,  │
                │  chat +  │    │  Anthropic    │    │  Google, Postgres│
                │  desktop │    │  OpenAI       │    │  Notion, +custom │
                └────┬─────┘    └───────┬───────┘    └────────┬─────────┘
                     │ HTTPS            │ API                  │ MCP
                     ▼                  ▼                      ▼
  ┌──────────────────────────────────────────────────────────────────────┐
  │                          Prometheal Server                           │
  │                                                                      │
  │   Agent Runtime ──── LLM Proxy ──── MCP Manager ──── Data Flow Audit │
  │   (LLM loop,        (routes to      (server-side     (every byte     │
  │    tool routing)     providers)      credentials)     logged)        │
  │                                                                      │
  └──────────────────────────────┬───────────────────────────────────────┘
                                 │ only allowed connection
                                 ▼
              ┌───────────────────────────────────────────┐
              │               Agent Sandbox               │
              │                                           │
              │   Shell execution    File system          │
              │   Desktop (noVNC)    /documents           │
              │                                           │
              │   iptables: DENY ALL, ALLOW Prometheal    │
              │   Docker + gVisor  |  E2B Cloud           │
              └───────────────────────────────────────────┘

Agent runtime

File: src/lib/agent-runtime.ts

The agent runtime is the core loop that orchestrates LLM calls, tool execution, and response streaming. When a user sends a message:

  1. Load context — Fetch conversation history, agent config, system prompt, agent memories, and available tools (sandbox + browser + memory + MCP)
  2. Call LLM — Send the conversation to the configured provider (OpenRouter, Anthropic, or OpenAI)
  3. Parse response — If the LLM returns tool calls, execute them. If it returns text, stream it to the user.
  4. Execute tools — Tools run in parallel via Promise.all. Sandbox tools execute inside the container. MCP tools execute server-side via the MCP manager.
  5. Loop — Feed tool results back to the LLM and repeat (max 30 rounds)
  6. Track usage — Record token counts and estimated cost
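The loop above can be sketched in TypeScript. This is a minimal illustration, not the actual src/lib/agent-runtime.ts API: the message and response shapes, and the function names, are assumptions.

```typescript
// Hypothetical shapes; the real runtime also streams output and tracks usage.
type Msg = { role: string; content: string };
type ToolCall = { name: string; args: unknown };
type LLMResponse = { text?: string; toolCalls?: ToolCall[] };

const MAX_ROUNDS = 30; // the loop cap described above

async function runAgentTurn(
  messages: Msg[],
  callLLM: (msgs: Msg[]) => Promise<LLMResponse>,
  execTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const res = await callLLM(messages);
    // A plain text response (no tool calls) ends the turn.
    if (!res.toolCalls?.length) return res.text ?? "";
    // Tool calls run in parallel; results are fed back for the next round.
    const results = await Promise.all(res.toolCalls.map(execTool));
    results.forEach((r, i) =>
      messages.push({ role: "tool", content: `${res.toolCalls![i].name}: ${r}` }),
    );
  }
  return "(max rounds reached)";
}
```

The LLM call and tool executor are injected here only to keep the sketch self-contained; in the real runtime they are wired to the LLM proxy and the tool routers described below.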

Built-in sandbox tools

These execute inside the agent's sandbox via execCommand():

Tool                   What it does
---------------------  -------------------------------------------------------------------
sandbox__shell         Run a bash command (120s timeout)
sandbox__read_file     Read a file's contents
sandbox__write_file    Write/create a file (base64-encoded for safety)
sandbox__edit_file     Targeted string replacement via Python script
sandbox__list_files    List directory contents (ls -la)
sandbox__search_files  Search files with grep patterns
sandbox__screenshot    Capture desktop screenshot (ImageMagick)
sandbox__computer      Mouse/keyboard control via xdotool (click, type, scroll, key press)

Browser tools

Agents can control a full Chrome browser inside the sandbox via CDP (Chrome DevTools Protocol). The browser helper script (sandbox/browser-helper.py) communicates with Chrome over raw WebSocket.

Tool                 What it does
-------------------  ---------------------------------------------------------------------------
browser__launch      Start Chrome and optionally navigate to a URL
browser__navigate    Navigate to a URL
browser__screenshot  Capture a JPEG screenshot (returned as base64 image for vision models)
browser__snapshot    Walk the DOM and return a structured element tree with data-ref attributes
browser__click       Click an element by its ref ID (e.g., e5)
browser__type        Type text into an element by ref ID
browser__press       Press a keyboard key (Enter, Escape, Tab, etc.)
browser__scroll      Scroll the page (up/down/left/right)
browser__evaluate    Execute arbitrary JavaScript in the page
browser__tabs        List open browser tabs

Ref-based targeting: The snapshot tool injects JavaScript that walks the DOM, assigns data-ref attributes to interactive and visible elements, and returns a structured list. Agents use these refs (like e5, e13) to target elements for clicking and typing — no coordinate guessing needed.

Screenshot + Vision: When the agent takes a screenshot, the base64 image is included in the tool result as a multimodal content block. Vision-capable models (Claude, GPT-4o, Qwen) can interpret the screenshot to understand visual layout and content.

Agent memory tools

Agents have persistent, cross-conversation memory stored in the database (AgentMemory model). Memory tools are always available regardless of sandbox state.

Tool            What it does
--------------  ----------------------------------------------------------------------------
memory__save    Save a memory with a key, content, and optional category
memory__search  Search memories by query (word-based matching across key, content, category)
memory__list    List all memories, optionally filtered by category
memory__delete  Delete a memory by key

Memories are scoped per-agent. At the start of each conversation, the agent's most recent 50 memories are injected into the system prompt so the agent has context from prior interactions. Memories use upsert — saving with an existing key updates the value.
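The upsert and word-based search semantics can be sketched with a hypothetical in-memory stand-in for the AgentMemory model (the real store is a database table; the class and method names here are illustrative):

```typescript
type Memory = { key: string; content: string; category?: string };

class MemoryStore {
  private memories = new Map<string, Memory>();

  // memory__save: upsert, so saving with an existing key updates the value.
  save(key: string, content: string, category?: string): void {
    this.memories.set(key, { key, content, category });
  }

  // memory__search: word-based matching across key, content, and category.
  search(query: string): Memory[] {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    return [...this.memories.values()].filter((m) => {
      const haystack = `${m.key} ${m.content} ${m.category ?? ""}`.toLowerCase();
      return words.some((w) => haystack.includes(w));
    });
  }

  // memory__delete: remove by key; returns whether the key existed.
  delete(key: string): boolean {
    return this.memories.delete(key);
  }
}
```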

Background heartbeat

File: src/lib/heartbeat.ts

Agents can be configured to run periodic background turns without user interaction. This enables agents to proactively check on tasks, update their memory, or perform maintenance.

Configuration (per agent):

  • heartbeatEnabled — Toggle on/off
  • heartbeatIntervalMinutes — How often to run (5–1440 minutes)
  • heartbeatPrompt — The message sent to the agent each heartbeat

How it works:

  1. The heartbeat runner starts via Next.js instrumentation.ts on server boot
  2. Every 60 seconds, it checks for agents with heartbeatEnabled: true whose last run was more than heartbeatIntervalMinutes ago
  3. For each due agent, it loads memories into the system prompt, runs a full agent turn (with all tools), and logs the result to HeartbeatLog
  4. The agent can use its memory tools during heartbeat to record observations for future conversations
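The due-agent check in step 2 can be sketched as follows; the record shape and function name are assumptions, but the fields mirror the configuration above:

```typescript
type HeartbeatAgent = {
  id: string;
  heartbeatEnabled: boolean;
  heartbeatIntervalMinutes: number;
  lastHeartbeatAt: Date | null; // null: heartbeat has never run
};

// Called every 60 seconds by the runner; returns agents whose last run
// was more than heartbeatIntervalMinutes ago (or that have never run).
function dueAgents(agents: HeartbeatAgent[], now: Date): HeartbeatAgent[] {
  return agents.filter((a) => {
    if (!a.heartbeatEnabled) return false;
    if (!a.lastHeartbeatAt) return true;
    const elapsedMin = (now.getTime() - a.lastHeartbeatAt.getTime()) / 60_000;
    return elapsedMin >= a.heartbeatIntervalMinutes;
  });
}
```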

MCP tool execution

External MCP tools (GitHub, Slack, etc.) execute server-side — the sandbox never sees credentials:

  1. Load the agent's bound integrations from the database
  2. For each integration, call mcpManager.listTools() to get available tools
  3. Apply policy filters (blocklist, allowlist, readOnly, scope rules)
  4. Convert MCP tool schemas to OpenAI function format with namespaced names ({integrationId}__{toolName})
  5. When the LLM calls an MCP tool, parse the namespace, find the integration, and call mcpManager.callTool()
  6. Log the data flow (direction, size, summary)
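The namespacing in steps 4 and 5 can be sketched like this. Note that built-in prefixes (sandbox__, browser__, memory__, agent__) would be checked before MCP dispatch; that check and the function names here are assumptions:

```typescript
// {integrationId}__{toolName}: the first "__" separates the integration ID
// from the tool name, which may itself contain underscores.
function toNamespacedName(integrationId: string, toolName: string): string {
  return `${integrationId}__${toolName}`;
}

function parseNamespacedName(
  name: string,
): { integrationId: string; toolName: string } | null {
  const idx = name.indexOf("__");
  if (idx <= 0) return null; // not a namespaced tool name
  return { integrationId: name.slice(0, idx), toolName: name.slice(idx + 2) };
}
```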

Tool confirmation

Some tools can be configured to require user approval before execution. When the agent calls a confirmable tool:

  1. A confirmation request is created with a 5-minute timeout
  2. The UI shows the tool name, arguments, and approve/deny buttons
  3. The agent waits for the user's decision
  4. If approved, the tool executes. If denied or timed out, the agent receives a denial message.

Configure confirmable tools per integration in the agent's integration bindings (confirmTools array).
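The wait in steps 3 and 4 can be sketched as a race between the user's decision and the 5-minute timer (names and shapes are illustrative, not the actual implementation):

```typescript
const CONFIRMATION_TIMEOUT_MS = 5 * 60 * 1000; // the 5-minute window above

type Decision = "approved" | "denied" | "timeout";

// Resolves with the user's decision, or "timeout" if none arrives in time.
function waitForDecision(
  decision: Promise<"approved" | "denied">,
  timeoutMs: number = CONFIRMATION_TIMEOUT_MS,
): Promise<Decision> {
  const timer = new Promise<Decision>((resolve) =>
    setTimeout(() => resolve("timeout"), timeoutMs),
  );
  return Promise.race([decision, timer]);
}
```

On "denied" or "timeout" the runtime would substitute a denial message as the tool result instead of executing the tool.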

External channels

Files: src/lib/channels/handler.ts, src/app/api/channels/

Channels let users interact with agents via external platforms (Slack, Telegram) instead of the web UI.

Slack/Telegram → Webhook endpoint → Channel handler → Agent runtime → Response → Platform API

Flow:

  1. External platform sends a webhook to /api/channels/{platform}/...
  2. The webhook route verifies the request (HMAC signature for Slack, secret token for Telegram)
  3. It matches the request to an enabled AgentChannel with an assigned agent
  4. The channel handler creates or reuses a synthetic user ({type}-{userId}@channel.internal) and conversation
  5. It builds message history (last 20 messages) and runs runAgentTurn()
  6. The response is sent back via the platform's API (Slack chat.postMessage, Telegram sendMessage)

Channels can be created without an agent — they store the platform config (bot tokens, webhook secrets) and become active once an agent is assigned.

Agent-to-agent communication

File: src/lib/agent-runtime.ts

Agents can delegate tasks to other agents via a built-in agent__<targetId> tool. This enables specialization — a coordinator agent can route requests to a code agent, research agent, etc.

Agent A (coordinator)
  └─ calls agent__<B_id>(message: "Analyze this data")
       └─ Agent B runs a full turn (with its own tools, sandbox, integrations)
            └─ returns response text as the tool result

Security model:

Control             Implementation
------------------  ----------------------------------------------------------------------
Explicit allowlist  Agent.allowedCallTargets — array of agent IDs this agent can call. Empty by default. Directional (A→B ≠ B→A).
No history leaking  Target agent gets a fresh context — only its system prompt + the single message. No access to existing conversations.
Cycle detection     A call chain tracks visited agent IDs. If the target is already in the chain, the call is rejected.
Depth limit         Max 3 nested calls. After that, agent tools are not offered to the LLM.
Spending            Charged to the calling agent, not the target. Prevents budget draining across agents.
Defense-in-depth    The allowlist is re-verified at execution time, not just at tool loading.
Audit               Every cross-agent call creates DataFlowLog entries (OUTBOUND from caller, RESPONSE with result).

The target agent uses its own tools and integrations as normal — this is the whole point. The admin explicitly authorizes the call path, so the target's capabilities are an intended part of the delegation.
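The allowlist, cycle-detection, and depth-limit checks can be sketched together; the function name and return shape are assumptions:

```typescript
const MAX_CALL_DEPTH = 3; // max nested agent-to-agent calls

function canCallAgent(
  callerAllowedTargets: string[], // Agent.allowedCallTargets of the caller
  targetId: string,
  callChain: string[], // agent IDs already visited in this turn
): { ok: boolean; reason?: string } {
  if (!callerAllowedTargets.includes(targetId))
    return { ok: false, reason: "not in allowlist" };
  if (callChain.includes(targetId))
    return { ok: false, reason: "cycle detected" };
  if (callChain.length >= MAX_CALL_DEPTH)
    return { ok: false, reason: "max depth reached" };
  return { ok: true };
}
```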

Streaming

The messages endpoint streams events to the browser using server-sent events:

Event type                  Payload
--------------------------  ----------------------------------------------------
text                        LLM text chunk
tool_start                  Tool name, arguments, and location (sandbox/server)
tool_end                    Tool result (success/failure)
tool_confirmation_pending   Waiting for user approval
tool_confirmation_resolved  User decision
done                        Token counts and cost
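As a sketch, one of these events framed as a named server-sent event looks like this (whether the endpoint uses named events or a type field inside the data line is an assumption):

```typescript
// SSE wire format: optional "event:" line, "data:" line, blank-line terminator.
function sseFrame(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```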

Sandbox system

Files: src/lib/sandbox/provider.ts, src/lib/sandbox/docker-manager.ts, src/lib/sandbox/e2b-manager.ts

Every agent gets its own isolated sandbox — a container with a full Linux environment including shell, filesystem, and a virtual desktop.

Sandbox lifecycle

  1. Create — On first message, ensureSandboxRunning() provisions a new sandbox
  2. Execute — execCommand(agentId, command, timeout) runs shell commands inside the container
  3. Stream — noVNC provides a live desktop view in the browser
  4. Extend — Activity extends the idle timeout
  5. Stop — Manual stop or idle timeout (default: 1 hour) destroys the container

Docker provider

  • Uses dockerode to manage sibling containers (Prometheal talks to the host Docker daemon)
  • Auto-detects gVisor (runsc) runtime; falls back to runc with a warning
  • Creates a dedicated bridge network (prometheal-sandbox)
  • Applies iptables rules inside the container to block all outbound traffic except to the Prometheal host
  • Documents are synced to /documents/ via tar upload
  • Persistent workspaces: Each agent gets a Docker named volume (prometheal-workspace-{agentId}) mounted at /home/user. Files created by the agent survive sandbox restarts — only destroying the volume removes them.
  • Workspace quotas: Each agent has a configurable workspaceMaxSizeMB (default 3 GB, 0 = unlimited). The runtime checks disk usage before file writes and blocks them when over quota. Shell commands get a warning appended when quota is exceeded. Admins can reset workspaces via the API or UI.
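The pre-write quota check can be sketched as a simple comparison (the function name and how usage is measured are assumptions; the 0 = unlimited convention is from the description above):

```typescript
// Returns whether a file write of incomingBytes fits within the agent's
// workspaceMaxSizeMB quota, where 0 means unlimited.
function canWrite(
  usedBytes: number,
  incomingBytes: number,
  workspaceMaxSizeMB: number,
): boolean {
  if (workspaceMaxSizeMB === 0) return true; // unlimited
  return usedBytes + incomingBytes <= workspaceMaxSizeMB * 1024 * 1024;
}
```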

E2B provider

  • Uses the @e2b/desktop SDK to create Firecracker microVMs in E2B's cloud
  • Desktop streaming via E2B's built-in stream API
  • Network isolation handled by E2B's infrastructure

Sandbox image

The sandbox image (sandbox/Dockerfile) includes:

  • Ubuntu 22.04 base
  • Desktop: Xvfb (virtual X server, 1280x800), Fluxbox (window manager), x11vnc + noVNC + websockify (streaming)
  • Browser: Google Chrome stable (for CDP-based browser tools)
  • Browser helper: Python CDP controller (browser-helper.py) for programmatic browser control
  • Tools: Python 3, Git, ImageMagick (screenshots), xdotool (computer use), iptables (network isolation)
  • User: Unprivileged user account
  • Port: 8080 (noVNC web interface)

MCP system

Files: src/lib/mcp/client-manager.ts, src/lib/mcp/policy.ts, src/lib/mcp/tool-format.ts, src/lib/mcp/catalog.ts

OAuth flow

Some integrations (like Google Workspace) require an OAuth flow instead of a static API key. Prometheal handles the entire flow through the web UI:

  1. Admin enters the OAuth client ID and client secret in the integration form
  2. Admin clicks Authorize in the integration list
  3. Prometheal generates a CSRF-protected state parameter (stored in KV with 10-minute TTL) and redirects to the provider's consent screen
  4. After the user grants access, the provider redirects back to /api/integrations/{id}/oauth/callback
  5. Prometheal exchanges the authorization code for access and refresh tokens
  6. Tokens are stored encrypted in the integration's credentials field in the database
  7. When the MCP server is spawned, writeGoogleOAuthFiles() writes credentials.json and tokens.json to managed XDG paths — the MCP server reads from these files transparently

The catalog entry's oauthFlow field defines the provider, auth URL, token URL, and scopes. This system is extensible to any OAuth 2.0 provider.

Client manager

Singleton that manages long-lived MCP client connections:

  • Supports stdio (subprocess) and HTTP (remote server) transports
  • Credentials are decrypted from the database and passed as environment variables to stdio subprocesses
  • Protected env vars (PATH, DATABASE_URL, etc.) can never be overwritten by integration credentials
  • Tool list is cached for 5 minutes (in Redis for multi-instance, in-memory fallback)
  • Idle connections are closed after 30 minutes

Policy enforcement

Each agent-integration binding can define:

  • blockedTools — Tools that are never available
  • allowedTools — If set, only these tools are available (allowlist mode)
  • confirmTools — Tools that require user approval before execution
  • readOnly — Blocks all tools marked as destructiveHint in their annotations
  • Scope rules — Restrict tool arguments (e.g., only allow specific GitHub repos or filesystem paths)

Evaluation order: blocklist > readOnly+destructive > allowlist > scope rules.
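The evaluation order can be sketched as follows (scope-rule argument checks are omitted; the destructive hint comes from MCP tool annotations, and the function name is an assumption):

```typescript
type Policy = {
  blockedTools?: string[];
  allowedTools?: string[]; // if set, allowlist mode
  readOnly?: boolean;
};

function isToolAllowed(
  policy: Policy,
  toolName: string,
  destructiveHint: boolean,
): boolean {
  if (policy.blockedTools?.includes(toolName)) return false; // 1. blocklist
  if (policy.readOnly && destructiveHint) return false;      // 2. readOnly + destructive
  if (policy.allowedTools && !policy.allowedTools.includes(toolName))
    return false;                                            // 3. allowlist
  return true; // 4. scope rules would then restrict arguments
}
```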

Tool format conversion

MCP tool schemas are converted to OpenAI function calling format for the LLM. Tool names are namespaced as {integrationId}__{toolName} to prevent collisions when multiple integrations are bound to the same agent.

Catalog

Pre-defined integration templates (src/lib/mcp/catalog.ts) with:

  • Server command and transport type
  • Required credential fields (with labels, placeholders, and help text)
  • Optional scope fields (with constraint types: prefix or enum)
  • Documentation links

LLM proxy

File: src/lib/llm-proxy.ts

Routes LLM requests to the correct upstream provider:

Model pattern                               Provider    Endpoint
------------------------------------------  ----------  -------------------------------------
openrouter/...                              OpenRouter  openrouter.ai/api/v1/chat/completions
anthropic/... or claude-*                   Anthropic   api.anthropic.com/v1/messages
openai/... or gpt-* or o1*                  OpenAI      api.openai.com/v1/chat/completions
Multi-segment (e.g., minimax/minimax-m2.5)  OpenRouter  Auto-detected

API keys are stored encrypted in the Settings singleton and decrypted at request time. The proxy also:

  • Counts tokens from provider responses
  • Estimates cost using provider pricing
  • Stores usage records in the database
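The routing table can be sketched as a pattern match (the fallback rule, return shape, and prefix stripping are assumptions based on the table):

```typescript
type Provider = "openrouter" | "anthropic" | "openai";

function routeModel(model: string): { provider: Provider; upstreamModel: string } {
  if (model.startsWith("openrouter/"))
    return { provider: "openrouter", upstreamModel: model.slice("openrouter/".length) };
  if (model.startsWith("anthropic/") || model.startsWith("claude-"))
    return { provider: "anthropic", upstreamModel: model.replace(/^anthropic\//, "") };
  if (model.startsWith("openai/") || model.startsWith("gpt-") || model.startsWith("o1"))
    return { provider: "openai", upstreamModel: model.replace(/^openai\//, "") };
  // Multi-segment names (vendor/model) are auto-detected as OpenRouter.
  return { provider: "openrouter", upstreamModel: model };
}
```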

External access

The LLM proxy is also available at /api/llm-proxy/v1/chat/completions for external use (e.g., from within sandbox tools that need LLM access). Authentication uses:

  • Authorization: Bearer <agentToken> header
  • x-prometheal-agent-id header

Authentication and sessions

File: src/lib/auth.ts

  • Passwords are hashed with bcryptjs (12 rounds)
  • Sessions use JWT tokens (HS256, 7-day expiry) stored in httpOnly cookies
  • Session validation is cached in KV with a 60-second TTL
  • Middleware automatically refreshes tokens when they're past 50% of their lifetime (3.5 days)
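The 50%-lifetime refresh rule reduces to a single comparison; a sketch (the constant and function name are illustrative):

```typescript
const SESSION_LIFETIME_MS = 7 * 24 * 60 * 60 * 1000; // 7-day JWT expiry

// Refresh once the token is past half its lifetime (3.5 days).
function shouldRefresh(issuedAt: Date, now: Date): boolean {
  return now.getTime() - issuedAt.getTime() > SESSION_LIFETIME_MS / 2;
}
```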

Database

File: prisma/schema.prisma

PostgreSQL with Prisma ORM. Key models:

Model                                   Purpose
--------------------------------------  -------------------------------------------------------------------
User                                    Authentication, roles (ADMIN/MANAGER/USER)
Agent                                   AI agent config (model, system prompt, container state)
Conversation / Message                  Chat history with role, content, attachments, tool calls
Integration                             MCP server config with encrypted credentials
AgentIntegration                        Agent-to-integration binding with policy (allowed/blocked/confirm tools)
Document                                Agent-specific uploaded files
LibraryDocument / AgentLibraryDocument  Shared document pool with agent bindings
Settings                                Singleton instance config (API keys, registration, domains)
SpendingLimit / UsageRecord             Per-agent cost tracking and limits
AuditLog                                Event-based activity trail
DataFlowLog                             Directional data movement tracking (INBOUND/OUTBOUND/RESPONSE)
InviteLink                              Registration invites with role, max uses, expiry
AgentAccess                             Per-user agent access grants
AgentChannel                            External channel config (Slack, Telegram webhooks)
AgentMemory                             Per-agent persistent key-value memories (key, content, category)
HeartbeatLog                            Background heartbeat run records (prompt, response, tokens)

Agent.allowedCallTargets (String array) controls which agents can be called via the agent__ tool prefix.


KV store

File: src/lib/kv.ts

A key-value store used for:

  • Rate limit counters (with TTL)
  • Session cache (60s TTL)
  • MCP tool cache (5min TTL)
  • Sandbox creation locks (distributed mutex)

Uses Redis when REDIS_URL is set (required for multi-instance deployment). Falls back to an in-memory Map for single-instance setups.
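The in-memory fallback can be sketched as a Map with per-key expiry, mirroring the TTL semantics Redis provides (the class and method names are hypothetical):

```typescript
class MemoryKV {
  private store = new Map<string, { value: string; expiresAt: number | null }>();

  set(key: string, value: string, ttlSeconds?: number): void {
    const expiresAt = ttlSeconds ? Date.now() + ttlSeconds * 1000 : null;
    this.store.set(key, { value, expiresAt });
  }

  get(key: string): string | null {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== null && Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy expiry on read
      return null;
    }
    return entry.value;
  }
}
```

Unlike Redis, this fallback is per-process, which is why multi-instance deployments require REDIS_URL.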


Data flow and audit

Data flow logging

Every data movement is recorded in DataFlowLog:

  • INBOUND — User messages, file uploads
  • OUTBOUND — LLM prompts, MCP tool calls
  • RESPONSE — LLM responses, tool results, sandbox output

Includes: agent ID, conversation ID, data type, size in bytes, summary, and metadata.

Audit logging

System events are recorded in AuditLog:

  • User actions: login, signup, password change
  • Agent events: created, updated, deleted, sandbox started/stopped
  • Admin actions: settings changed, integration added, user role changed
  • Tool events: tool calls, confirmations

Admins can search, filter, and export audit logs as CSV.

Released under the MIT License.