System Design

This reference describes how the Jan Server components fit together. Use it when reviewing cross-service changes or planning deployments.

1. System Overview

Jan Server is a microservices platform that exposes OpenAI-compatible APIs through Kong. Each service owns a focused domain:

LLM API (8080) - chat completions, conversations, projects, model catalog.
Response API (8082) - multi-step orchestration and MCP tool coordination.
Media API (8285) - binary ingestion, jan_* IDs, presigned URL management.
MCP Tools (8091) - JSON-RPC endpoint that proxies Serper/SearXNG search, scraping, file search, and SandboxFusion execution.
Memory Tools (8090) - semantic memory using BGE-M3 embeddings with caching and batch processing.
Realtime API (8186) - WebRTC session management via LiveKit for real-time audio/video communication.
Shared infrastructure - Kong (8000), Keycloak (8085), PostgreSQL, vLLM (8101), observability stack.

Note: Template API (8185) is a development scaffold and not part of the deployed stack.

Kong terminates TLS (in production), validates JWT/API keys, applies rate limits, and forwards requests to the internal services.

2. Architecture Layers

Layer	Components	Notes
Edge	Kong Gateway, Keycloak	Centralized auth, rate limiting, guest-token endpoint.
Application	LLM API, Response API, Media API, MCP Tools, Memory Tools, Realtime API	Written in Go using Gin + zerolog, configured via `pkg/config`.
Tooling	SearXNG, Serper, SandboxFusion, vector-store	Only accessible from MCP Tools.
Data/Storage	PostgreSQL (`api-db`, `keycloak-db`), S3-compatible storage	Media files live in object storage; metadata lives in PostgreSQL.
Inference	vLLM (local) or remote OpenAI-compatible providers	Selected per request using the provider metadata catalog.
Observability	OpenTelemetry Collector, Prometheus, Grafana, Jaeger	Enabled with `OTEL_ENABLED=true` + `make monitor-up`.

3. Component Diagram

             +------------------------------+
             |  External Clients / SDKs     |
             +---------------+--------------+
                             |
                             v
                   +-------------------+
                   |   Kong Gateway    | 8000
                   +---+---+----+------+ 
                       |   |    |
        +--------------+   |    +----------------+
        |                  |                     |
        v                  v                     v
  +-----------+    +---------------+      +---------------+
  |  LLM API  |    |  Response API |      |   Media API   |
  | (8080)    |    |    (8082)     |      |    (8285)     |
  +-----+-----+    +-------+-------+      +-------+-------+
        |                  |                     |
        |                  v                     |
        |        +-------------------+           |
        +------->|    MCP Tools      |<----------+
        |        |     (8091)        |
        |        +----+---+----+-----+
        |             |   |    |
        |             |   |    +--> SandboxFusion
        |             |   +-------> Vector Store
        |             +-----------> SearXNG / Serper
        |
        v
  +---------------+      +----------------+
  | Memory Tools  |      | Realtime API   |
  |   (8090)      |      |    (8186)      |
  +---------------+      +----------------+

Shared dependencies (not shown): PostgreSQL (api-db), S3/Object storage, Keycloak (JWT issuer), vLLM (8101), BGE-M3 (embeddings), LiveKit (WebRTC).

4. Request Lifecycles

Chat Completions

Client calls POST /v1/chat/completions on http://localhost:8000.
Kong validates the JWT/API key and forwards to llm-api:8080.
LLM API resolves jan_* placeholders via Media API, selects a provider (local vLLM or remote), and streams tokens back to the gateway.
Conversations/projects are persisted in PostgreSQL.

Response Orchestration

Client calls POST /responses/v1/responses (streaming optional).
Response API loads the conversation context and iteratively issues tools/list / tools/call requests to MCP Tools.
Tool executions are capped by RESPONSE_MAX_TOOL_DEPTH and TOOL_EXECUTION_TIMEOUT.
Final synthesis is delegated to LLM API and streamed back to the caller.

Media Handling

Upload via POST /media/v1/media (remote URL or data URL) or request a presigned upload with POST /media/v1/media/prepare-upload.
Media API deduplicates content, issues a jan_* ID, and stores metadata in PostgreSQL.
Other services embed the jan_* ID; LLM API resolves them to presigned URLs right before inference.

MCP JSON-RPC

Response API or external automation sends JSON-RPC requests to POST /v1/mcp.
MCP Tools validates the method (tools/list, tools/call, prompts/*, resources/*) and dispatches to the Serper/SearXNG/SandboxFusion clients.
Results are returned as SSE events (streaming) or plain JSON when the response fits a single chunk.

5. Data & Network Topology

Docker Compose defines two primary networks: jan-server_default (Kong + core services + databases) and jan-server_mcp-network (MCP-only helpers such as SearXNG, vector store, SandboxFusion).
Production deployments should mirror this split using Kubernetes namespaces or NetworkPolicies.
Persistent data:
- api-db (LLM/Response/Media metadata) - each service uses its own schema.
- keycloak-db - Keycloak realm and client configuration.
- Object storage (S3, MinIO, etc.) - Media files and presigned URLs.

6. Deployment Modes

Mode	Description	Commands
Local (recommended)	`make quickstart` prompts for providers, writes `.env`, and runs `docker compose up` with all services.	`make quickstart`
Profiles	Start a subset of services (API only, MCP only, GPU inference).	`make up-api`, `make up-mcp`, `make up-gpu`
Monitoring stack	Optional Prometheus/Grafana/Jaeger.	`make monitor-up`
Kubernetes	Use `k8s/jan-server` Helm chart. Values mirror `pkg/config` defaults.	`helm install jan ./k8s/jan-server -f values.yaml`

7. Change Impact Checklist

When modifying the system architecture:

Update the relevant service README and API docs.
Reflect new ports/paths in Kong configuration.
Adjust docs/architecture/services.md and docs/architecture/data-flow.md.
Regenerate configuration artifacts (make config-generate) if pkg/config changes.
Update Kubernetes values and Helm defaults as needed.

Maintainer: Jan Server Architecture Group - Last Reviewed: November 2025

System Design

On this page