Jan Server Architecture

Overview

Jan Server is made up of small, independent services (microservices) that sit behind a single API gateway. Each service has one job, such as LLM requests, streaming responses, media handling, or MCP tools.

Main Parts

  1. System Design - How all the pieces fit together
  2. Services - What each service does
  3. Data Flow - How requests move through the system
  4. Security - How we keep things secure
  5. Observability - How we monitor and debug
  6. Test Flows - How we test everything

Quick Reference

Service Ports (Docker Compose defaults)

Service      | Port  | Access / Notes
-------------|-------|------------------------------------------------------------------
Kong Gateway | 8000  | Entry point for /llm/*, /responses/*, /media/*, /mcp (routing + auth)
LLM API      | 8080  | Internal; exposed through Kong routes
Response API | 8082  | Internal; SSE streaming via Kong /responses
Media API    | 8285  | Internal; proxied by Kong /media
MCP Tools    | 8091  | Internal; routed through Kong /mcp
Memory Tools | 8090  | Internal; semantic memory service
Realtime API | 8186  | Internal; WebRTC session management
Template API | 8185  | Dev scaffold (not deployed by default)
Keycloak     | 8085  | Admin console (protect behind VPN/SSO in production)
vLLM         | 8101  | Inference backend (local GPU/CPU profile)
Prometheus   | 9090  | Dev-only monitoring UI (make monitor-up)
Jaeger       | 16686 | Trace UI
Grafana      | 3331  | Dashboards (admin/admin in dev)
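
To make the routing column concrete, here is a minimal Go sketch of the path-prefix matching the gateway performs on port 8000. It is not Kong's configuration or the project's code: the upstream hostnames (llm-api, response-api, media-api, mcp-tools) are assumed Docker Compose service names, the sample request paths are hypothetical, and details such as path stripping and authentication are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// route pairs a public path prefix on the gateway (port 8000) with an internal upstream.
type route struct {
	prefix   string
	upstream string
}

// routes mirrors the Kong entry points in the port table. The hostnames are
// assumed Docker Compose service names; only the ports come from the table.
var routes = []route{
	{prefix: "/llm", upstream: "http://llm-api:8080"},
	{prefix: "/responses", upstream: "http://response-api:8082"},
	{prefix: "/media", upstream: "http://media-api:8285"},
	{prefix: "/mcp", upstream: "http://mcp-tools:8091"},
}

// upstreamFor returns the upstream a request path would be forwarded to.
// A prefix matches the exact path or any path below it, so "/mcp" and
// "/mcp/tools" match while "/mcpx" does not.
func upstreamFor(path string) (string, bool) {
	for _, r := range routes {
		if path == r.prefix || strings.HasPrefix(path, r.prefix+"/") {
			return r.upstream, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical request paths, for illustration only.
	paths := []string{"/llm/v1/models", "/responses/stream", "/mcp", "/admin"}
	for _, p := range paths {
		if up, ok := upstreamFor(p); ok {
			fmt.Printf("%-20s -> %s\n", p, up)
		} else {
			fmt.Printf("%-20s -> no route\n", p)
		}
	}
}
```

Funneling every internal service through these few prefixes means authentication only has to be enforced at one entry point.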

Technology Stack

Component     | Technology
--------------|--------------------------------------------
API Gateway   | Kong 3.5 + keycloak-apikey plugin
Services      | Go 1.21+ (Gin framework, zerolog, wire DI)
MCP Server    | mark3labs/mcp-go v0.7.0
ORM           | GORM + goose migrations
Database      | PostgreSQL 15/16 (Docker) or a managed service
Auth          | Keycloak (OpenID Connect)
Inference     | vLLM (OpenAI-compatible) or remote providers
Observability | OpenTelemetry Collector
Metrics       | Prometheus 2.48
Tracing       | Jaeger 1.51
Dashboards    | Grafana 10.2
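
Because vLLM serves an OpenAI-compatible API, any OpenAI-style client can query it directly. The Go sketch below lists the models served by the local vLLM profile via GET /v1/models. It assumes vLLM is published on localhost:8101 (per the port table) and was started without an API key; if a key is configured, the request would also need an "Authorization: Bearer <key>" header.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// modelList matches the OpenAI-style response of GET /v1/models,
// e.g. {"object":"list","data":[{"id":"...","object":"model"}]}.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// parseModelIDs extracts the model IDs from a /v1/models response body.
func parseModelIDs(body []byte) ([]string, error) {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return nil, fmt.Errorf("decode /v1/models: %w", err)
	}
	ids := make([]string, 0, len(ml.Data))
	for _, m := range ml.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

// run queries the inference backend at baseURL and prints the served model IDs.
func run(baseURL string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/v1/models")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	ids, err := parseModelIDs(body)
	if err != nil {
		return err
	}
	fmt.Println("models:", ids)
	return nil
}

func main() {
	// Assumption: the vLLM profile is running and reachable on localhost:8101.
	if err := run("http://localhost:8101"); err != nil {
		fmt.Fprintln(os.Stderr, "vLLM check failed:", err)
		os.Exit(1)
	}
}
```

Pointing the same call at a remote OpenAI-compatible provider only means changing the base URL and supplying that provider's key.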

How to Run It

Jan Server can run in two ways: Docker Compose for local development, or Kubernetes for production.

Docker Compose (For Development)

Use Docker Compose to run the stack on your local machine:

  • make quickstart - interactive wizard (creates .env, starts the stack)
  • make up-full - bring up all services (docker-compose.yml + infra/docker/*.yml)
  • make up-vllm-gpu / make up-vllm-cpu - start the vLLM profile (GPU or CPU)
  • make monitor-up - start Prometheus, Grafana, and Jaeger
  • make down / make down-clean - stop the stack (down keeps volumes; down-clean removes them)

Kubernetes (For Production)

Use Kubernetes for cloud or on-premises deployments:

  • Local testing: Minikube or kind (see k8s/SETUP.md)
  • Production: k8s/jan-server Helm chart + managed Postgres + managed Keycloak
  • Hybrid: Run inference locally while other services run in the cluster

References