Architecture
System Design
System Design
This reference describes how the Jan Server components fit together. Use it when reviewing cross-service changes or planning deployments.
1. System Overview
Jan Server is a microservices platform that exposes OpenAI-compatible APIs through Kong. Each service owns a focused domain:
- LLM API (8080) - chat completions, conversations, projects, model catalog.
- Response API (8082) - multi-step orchestration and MCP tool coordination.
- Media API (8285) - binary ingestion, jan_* IDs, presigned URL management.
- MCP Tools (8091) - JSON-RPC endpoint that proxies Serper/SearXNG search, scraping, file search, and SandboxFusion execution.
- Memory Tools (8090) - semantic memory using BGE-M3 embeddings with caching and batch processing.
- Realtime API (8186) - WebRTC session management via LiveKit for real-time audio/video communication.
- Shared infrastructure - Kong (8000), Keycloak (8085), PostgreSQL, vLLM (8101), observability stack.
Note: Template API (8185) is a development scaffold and not part of the deployed stack.
Kong terminates TLS (in production), validates JWT/API keys, applies rate limits, and forwards requests to the internal services.
2. Architecture Layers
| Layer | Components | Notes |
|---|---|---|
| Edge | Kong Gateway, Keycloak | Centralized auth, rate limiting, guest-token endpoint. |
| Application | LLM API, Response API, Media API, MCP Tools, Memory Tools, Realtime API | Written in Go using Gin + zerolog, configured via pkg/config. |
| Tooling | SearXNG, Serper, SandboxFusion, vector-store | Only accessible from MCP Tools. |
| Data/Storage | PostgreSQL (api-db, keycloak-db), S3-compatible storage | Media files live in object storage; metadata lives in PostgreSQL. |
| Inference | vLLM (local) or remote OpenAI-compatible providers | Selected per request using the provider metadata catalog. |
| Observability | OpenTelemetry Collector, Prometheus, Grafana, Jaeger | Enabled with OTEL_ENABLED=true + make monitor-up. |
3. Component Diagram
+------------------------------+
| External Clients / SDKs |
+---------------+--------------+
|
v
+-------------------+
| Kong Gateway | 8000
+---+---+----+------+
| | |
+--------------+ | +----------------+
| | |
v v v
+-----------+ +---------------+ +---------------+
| LLM API | | Response API | | Media API |
| (8080) | | (8082) | | (8285) |
+-----+-----+ +-------+-------+ +-------+-------+
| | |
| v |
| +-------------------+ |
+------->| MCP Tools |<----------+
| | (8091) |
| +----+---+----+-----+
| | | |
| | | +--> SandboxFusion
| | +-------> Vector Store
| +-----------> SearXNG / Serper
|
v
+---------------+ +----------------+
| Memory Tools | | Realtime API |
| (8090) | | (8186) |
+---------------+ +----------------+
Shared dependencies (not shown): PostgreSQL (api-db), S3/Object storage, Keycloak (JWT issuer), vLLM (8101), BGE-M3 (embeddings), LiveKit (WebRTC).4. Request Lifecycles
Chat Completions
- Client calls
POST /v1/chat/completionsonhttp://localhost:8000. - Kong validates the JWT/API key and forwards to
llm-api:8080. - LLM API resolves
jan_*placeholders via Media API, selects a provider (local vLLM or remote), and streams tokens back to the gateway. - Conversations/projects are persisted in PostgreSQL.
Response Orchestration
- Client calls
POST /responses/v1/responses(streaming optional). - Response API loads the conversation context and iteratively issues
tools/list/tools/callrequests to MCP Tools. - Tool executions are capped by
RESPONSE_MAX_TOOL_DEPTHandTOOL_EXECUTION_TIMEOUT. - Final synthesis is delegated to LLM API and streamed back to the caller.
Media Handling
- Upload via
POST /media/v1/media(remote URL or data URL) or request a presigned upload withPOST /media/v1/media/prepare-upload. - Media API deduplicates content, issues a
jan_*ID, and stores metadata in PostgreSQL. - Other services embed the
jan_*ID; LLM API resolves them to presigned URLs right before inference.
MCP JSON-RPC
- Response API or external automation sends JSON-RPC requests to
POST /v1/mcp. - MCP Tools validates the method (
tools/list,tools/call,prompts/*,resources/*) and dispatches to the Serper/SearXNG/SandboxFusion clients. - Results are returned as SSE events (streaming) or plain JSON when the response fits a single chunk.
5. Data & Network Topology
- Docker Compose defines two primary networks:
jan-server_default(Kong + core services + databases) andjan-server_mcp-network(MCP-only helpers such as SearXNG, vector store, SandboxFusion). - Production deployments should mirror this split using Kubernetes namespaces or NetworkPolicies.
- Persistent data:
api-db(LLM/Response/Media metadata) - each service uses its own schema.keycloak-db- Keycloak realm and client configuration.- Object storage (S3, MinIO, etc.) - Media files and presigned URLs.
6. Deployment Modes
| Mode | Description | Commands |
|---|---|---|
| Local (recommended) | make quickstart prompts for providers, writes .env, and runs docker compose up with all services. | make quickstart |
| Profiles | Start a subset of services (API only, MCP only, GPU inference). | make up-api, make up-mcp, make up-gpu |
| Monitoring stack | Optional Prometheus/Grafana/Jaeger. | make monitor-up |
| Kubernetes | Use k8s/jan-server Helm chart. Values mirror pkg/config defaults. | helm install jan ./k8s/jan-server -f values.yaml |
7. Change Impact Checklist
When modifying the system architecture:
- Update the relevant service README and API docs.
- Reflect new ports/paths in Kong configuration.
- Adjust
docs/architecture/services.mdanddocs/architecture/data-flow.md. - Regenerate configuration artifacts (
make config-generate) ifpkg/configchanges. - Update Kubernetes values and Helm defaults as needed.
Maintainer: Jan Server Architecture Group - Last Reviewed: November 2025