Jan Server Architecture

Overview

Jan Server is composed of several small, cooperating services (microservices), each responsible for one job.

Service	Port	Access Notes
Kong Gateway	8000	Entry point for `/llm/`, `/responses/`, `/media/*`, `/mcp` (routing + auth)
LLM API	8080	Internal; exposed through Kong routes
Response API	8082	Internal; streaming SSE via Kong `/responses`
Media API	8285	Internal; proxied by Kong `/media`
MCP Tools	8091	Internal; routed through Kong `/mcp`
Memory Tools	8090	Internal; semantic memory service
Realtime API	8186	Internal; WebRTC session management
Template API	8185	Dev scaffold (not deployed by default)
Keycloak	8085	Admin console (protect behind VPN/SSO in production)
vLLM	8101	Inference backend (local GPU/CPU profile)
Prometheus	9090	Dev-only monitoring UI (`make monitor-up`)
Jaeger	16686	Trace UI
Grafana	3331	Dashboards (admin/admin in dev)
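To illustrate how the table reads in practice, here is a minimal sketch of a helper that builds the public URL for a service reached through Kong. The port (8000) and route prefixes come from the table above. The short service labels, the `gateway_url` name, the `localhost` host, and the plain `http` scheme are assumptions for a local dev stack, not part of Jan Server.

```python
# Kong is the single entry point; the port is taken from the service table.
KONG_PORT = 8000

# Route prefixes listed in the table. The dictionary keys are hypothetical
# labels chosen for this sketch, not official service identifiers.
KONG_ROUTES = {
    "llm": "/llm",
    "responses": "/responses",
    "media": "/media",
    "mcp": "/mcp",
}


def gateway_url(service: str, path: str = "", host: str = "localhost") -> str:
    """Return the URL a client would call through Kong for a routed service.

    Assumes plain HTTP on a local host. Raises KeyError for services that
    the table lists as internal-only (for example Memory Tools).
    """
    prefix = KONG_ROUTES[service]
    base = f"http://{host}:{KONG_PORT}{prefix}"
    suffix = path.lstrip("/")
    return f"{base}/{suffix}" if suffix else base


if __name__ == "__main__":
    for name in KONG_ROUTES:
        print(name, gateway_url(name))
```

Clients go through port 8000 rather than the internal ports (8080, 8082, and so on), so routing and auth are applied in one place.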

Component	Technology
API Gateway	Kong 3.5 + `keycloak-apikey` plugin
Services	Go 1.21+ (Gin framework, zerolog, wire DI)
MCP Server	mark3labs/mcp-go v0.7.0
ORM	GORM + goose migrations
Database	PostgreSQL 15/16 (Docker) / managed service
Auth	Keycloak (OpenID Connect)
Inference	vLLM (OpenAI-compatible) or remote providers
Observability	OpenTelemetry Collector
Metrics	Prometheus 2.48
Tracing	Jaeger 1.51
Dashboards	Grafana 10.2

You can run Jan Server in different ways:

Use Docker Compose to run on your local computer:

make quickstart - interactive wizard (creates .env, starts stack)
make up-full - bring up all services (docker-compose.yml + infra/docker/*.yml)
make up-vllm-gpu / make up-vllm-cpu - start vLLM profile
make monitor-up - start Prometheus, Grafana, Jaeger
make down / make down-clean - stop stack (preserve or remove volumes)
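The targets above can be chained from a script. The following is a rough sketch, assuming the Makefile is in the current directory and that running `up-full` before the vLLM and monitoring targets is an acceptable order (the source does not state an order). The function names `plan_startup` and `run_targets` are hypothetical helpers, not part of the repository.

```python
import subprocess

# Make targets named in the list above; nothing else is assumed to exist.
KNOWN_TARGETS = {
    "quickstart", "up-full", "up-vllm-gpu", "up-vllm-cpu",
    "monitor-up", "down", "down-clean",
}


def plan_startup(inference=None, monitoring=False):
    """Return the make targets to run, in an assumed order.

    inference: None, "gpu", or "cpu" (selects the vLLM profile target).
    monitoring: also start Prometheus, Grafana, and Jaeger.
    """
    targets = ["up-full"]
    if inference in ("gpu", "cpu"):
        targets.append(f"up-vllm-{inference}")
    elif inference is not None:
        raise ValueError(f"unknown inference profile: {inference!r}")
    if monitoring:
        targets.append("monitor-up")
    return targets


def run_targets(targets, dry_run=True):
    """Build (and, unless dry_run is set, execute) one `make` call per target."""
    commands = []
    for target in targets:
        if target not in KNOWN_TARGETS:
            raise ValueError(f"unknown make target: {target!r}")
        command = ["make", target]
        commands.append(command)
        if not dry_run:
            # check=True stops at the first failing target.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in run_targets(plan_startup(inference="cpu", monitoring=True)):
        print(" ".join(command))
```

The dry-run default only prints the planned commands; pass `dry_run=False` to actually invoke `make`.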

Use Kubernetes to run in the cloud or on servers:

Local testing: Minikube or kind (see k8s/SETUP.md)
Production: k8s/jan-server Helm chart + managed Postgres + managed Keycloak
Hybrid: Run inference locally while other services run in the cluster

Helpful references: