Architecture
Jan Server Architecture
Jan Server Architecture
Overview
Jan Server is built from multiple small services (microservices) that work together. Each service has a specific job.
Main Parts
- System Design - How all the pieces fit together
- Services - What each service does
- Data Flow - How requests move through the system
- Security - How we keep things secure
- Observability - How we monitor and debug
- Test Flows - How we test everything
Quick Reference
Service Ports (Docker Compose defaults)
| Service | Port | Access Notes |
|---|---|---|
| Kong Gateway | 8000 | Entry point for /llm/*, /responses/*, /media/*, /mcp (routing + auth) |
| LLM API | 8080 | Internal; exposed through Kong routes |
| Response API | 8082 | Internal; streaming SSE via Kong /responses |
| Media API | 8285 | Internal; proxied by Kong /media |
| MCP Tools | 8091 | Internal; routed through Kong /mcp |
| Memory Tools | 8090 | Internal; semantic memory service |
| Realtime API | 8186 | Internal; WebRTC session management |
| Template API | 8185 | Dev scaffold (not deployed by default) |
| Keycloak | 8085 | Admin console (protect behind VPN/SSO in production) |
| vLLM | 8101 | Inference backend (local GPU/CPU profile) |
| Prometheus | 9090 | Dev-only monitoring UI (make monitor-up) |
| Jaeger | 16686 | Trace UI |
| Grafana | 3331 | Dashboards (admin/admin in dev) |
Technology Stack
| Component | Technology |
|---|---|
| API Gateway | Kong 3.5 + keycloak-apikey plugin |
| Services | Go 1.21+ (Gin framework, zerolog, wire DI) |
| MCP Server | mark3labs/mcp-go v0.7.0 |
| ORM | GORM + goose migrations |
| Database | PostgreSQL 15/16 (Docker) / managed service |
| Auth | Keycloak (OpenID Connect) |
| Inference | vLLM (OpenAI-compatible) or remote providers |
| Observability | OpenTelemetry Collector |
| Metrics | Prometheus 2.48 |
| Tracing | Jaeger 1.51 |
| Dashboards | Grafana 10.2 |
How to Run It
You can run Jan Server in different ways:
Docker Compose (For Development)
Use Docker Compose to run on your local computer:
make quickstart- interactive wizard (creates.env, starts stack)make up-full- bring up all services (docker compose.yml+infra/docker/*.yml)make up-vllm-gpu/make up-vllm-cpu- start vLLM profilemake monitor-up- start Prometheus, Grafana, Jaegermake down/make down-clean- stop stack (preserve or remove volumes)
Kubernetes (For Production)
Use Kubernetes to run in the cloud or on servers:
- Local testing: Minikube or kind (see
k8s/SETUP.md) - Production:
k8s/jan-serverHelm chart + managed Postgres + managed Keycloak - Hybrid: Run inference locally while other services run in the cluster
Helpful references:
- Kubernetes Setup Guide - minikube/bootstrap walkthrough
- Kubernetes README - Helm values, ingress, TLS
- Deployment Guide - Docker, hybrid, CI/CD instructions