Architecture

System Design

System Design

This reference describes how the Jan Server components fit together. Use it when reviewing cross-service changes or planning deployments.

1. System Overview

Jan Server is a microservices platform that exposes OpenAI-compatible APIs through Kong. Each service owns a focused domain:

  • LLM API (8080) - chat completions, conversations, projects, model catalog.
  • Response API (8082) - multi-step orchestration and MCP tool coordination.
  • Media API (8285) - binary ingestion, jan_* IDs, presigned URL management.
  • MCP Tools (8091) - JSON-RPC endpoint that proxies Serper/SearXNG search, scraping, file search, and SandboxFusion execution.
  • Memory Tools (8090) - semantic memory using BGE-M3 embeddings with caching and batch processing.
  • Realtime API (8186) - WebRTC session management via LiveKit for real-time audio/video communication.
  • Shared infrastructure - Kong (8000), Keycloak (8085), PostgreSQL, vLLM (8101), observability stack.

Note: Template API (8185) is a development scaffold and not part of the deployed stack.

Kong terminates TLS (in production), validates JWT/API keys, applies rate limits, and forwards requests to the internal services.

2. Architecture Layers

LayerComponentsNotes
EdgeKong Gateway, KeycloakCentralized auth, rate limiting, guest-token endpoint.
ApplicationLLM API, Response API, Media API, MCP Tools, Memory Tools, Realtime APIWritten in Go using Gin + zerolog, configured via pkg/config.
ToolingSearXNG, Serper, SandboxFusion, vector-storeOnly accessible from MCP Tools.
Data/StoragePostgreSQL (api-db, keycloak-db), S3-compatible storageMedia files live in object storage; metadata lives in PostgreSQL.
InferencevLLM (local) or remote OpenAI-compatible providersSelected per request using the provider metadata catalog.
ObservabilityOpenTelemetry Collector, Prometheus, Grafana, JaegerEnabled with OTEL_ENABLED=true + make monitor-up.

3. Component Diagram

             +------------------------------+
             |  External Clients / SDKs     |
             +---------------+--------------+
                             |
                             v
                   +-------------------+
                   |   Kong Gateway    | 8000
                   +---+---+----+------+ 
                       |   |    |
        +--------------+   |    +----------------+
        |                  |                     |
        v                  v                     v
  +-----------+    +---------------+      +---------------+
  |  LLM API  |    |  Response API |      |   Media API   |
  | (8080)    |    |    (8082)     |      |    (8285)     |
  +-----+-----+    +-------+-------+      +-------+-------+
        |                  |                     |
        |                  v                     |
        |        +-------------------+           |
        +------->|    MCP Tools      |<----------+
        |        |     (8091)        |
        |        +----+---+----+-----+
        |             |   |    |
        |             |   |    +--> SandboxFusion
        |             |   +-------> Vector Store
        |             +-----------> SearXNG / Serper
        |
        v
  +---------------+      +----------------+
  | Memory Tools  |      | Realtime API   |
  |   (8090)      |      |    (8186)      |
  +---------------+      +----------------+

Shared dependencies (not shown): PostgreSQL (api-db), S3/Object storage, Keycloak (JWT issuer), vLLM (8101), BGE-M3 (embeddings), LiveKit (WebRTC).

4. Request Lifecycles

Chat Completions

  1. Client calls POST /v1/chat/completions on http://localhost:8000.
  2. Kong validates the JWT/API key and forwards to llm-api:8080.
  3. LLM API resolves jan_* placeholders via Media API, selects a provider (local vLLM or remote), and streams tokens back to the gateway.
  4. Conversations/projects are persisted in PostgreSQL.

Response Orchestration

  1. Client calls POST /responses/v1/responses (streaming optional).
  2. Response API loads the conversation context and iteratively issues tools/list / tools/call requests to MCP Tools.
  3. Tool executions are capped by RESPONSE_MAX_TOOL_DEPTH and TOOL_EXECUTION_TIMEOUT.
  4. Final synthesis is delegated to LLM API and streamed back to the caller.

Media Handling

  1. Upload via POST /media/v1/media (remote URL or data URL) or request a presigned upload with POST /media/v1/media/prepare-upload.
  2. Media API deduplicates content, issues a jan_* ID, and stores metadata in PostgreSQL.
  3. Other services embed the jan_* ID; LLM API resolves them to presigned URLs right before inference.

MCP JSON-RPC

  1. Response API or external automation sends JSON-RPC requests to POST /v1/mcp.
  2. MCP Tools validates the method (tools/list, tools/call, prompts/*, resources/*) and dispatches to the Serper/SearXNG/SandboxFusion clients.
  3. Results are returned as SSE events (streaming) or plain JSON when the response fits a single chunk.

5. Data & Network Topology

  • Docker Compose defines two primary networks: jan-server_default (Kong + core services + databases) and jan-server_mcp-network (MCP-only helpers such as SearXNG, vector store, SandboxFusion).
  • Production deployments should mirror this split using Kubernetes namespaces or NetworkPolicies.
  • Persistent data:
    • api-db (LLM/Response/Media metadata) - each service uses its own schema.
    • keycloak-db - Keycloak realm and client configuration.
    • Object storage (S3, MinIO, etc.) - Media files and presigned URLs.

6. Deployment Modes

ModeDescriptionCommands
Local (recommended)make quickstart prompts for providers, writes .env, and runs docker compose up with all services.make quickstart
ProfilesStart a subset of services (API only, MCP only, GPU inference).make up-api, make up-mcp, make up-gpu
Monitoring stackOptional Prometheus/Grafana/Jaeger.make monitor-up
KubernetesUse k8s/jan-server Helm chart. Values mirror pkg/config defaults.helm install jan ./k8s/jan-server -f values.yaml

7. Change Impact Checklist

When modifying the system architecture:

  1. Update the relevant service README and API docs.
  2. Reflect new ports/paths in Kong configuration.
  3. Adjust docs/architecture/services.md and docs/architecture/data-flow.md.
  4. Regenerate configuration artifacts (make config-generate) if pkg/config changes.
  5. Update Kubernetes values and Helm defaults as needed.

Maintainer: Jan Server Architecture Group - Last Reviewed: November 2025