Architecture
Data Flow Reference
1. Chat Completion (LLM API)
- Client calls Kong Gateway: `POST /v1/chat/completions` with a Bearer token.
- Kong forwards to `llm-api:8080` (internal DNS) and injects request headers.
- LLM API:
  - Validates the JWT via Keycloak JWKS.
  - Resolves any `jan_*` media IDs by calling Media API `/v1/media/resolve`.
  - Selects a provider (local vLLM or a configured upstream) and forwards the request.
- Provider (vLLM) streams tokens back to LLM API.
- LLM API streams data to the client (SSE) via Kong and persists conversation rows in PostgreSQL.
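From the client's side, the SSE leg of this flow can be sketched as a small stream parser. The `data:` event framing and the OpenAI-style `chat.completion.chunk` delta shape below are assumptions for illustration, not taken from the service code:

```python
import json

def parse_sse_chunks(raw_stream: str) -> str:
    """Assemble the assistant message from an SSE chat stream.

    Each event line looks like `data: {...}`; the stream ends with
    `data: [DONE]`. The chunk payload shape is assumed (OpenAI-style).
    """
    tokens = []
    for line in raw_stream.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            tokens.append(delta["content"])
    return "".join(tokens)

# Illustrative stream: a role delta, two content deltas, then the terminator.
stream = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n'
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: [DONE]\n'
)
print(parse_sse_chunks(stream))  # -> Hello
```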
2. Response API Orchestration
- Client calls `POST /v1/responses`.
- Response API looks up conversation state and enqueues tool steps.
- For each tool call:
  - Executes a JSON-RPC request against MCP Tools (`/v1/mcp`).
  - Records execution metadata in PostgreSQL.
  - Applies depth/timeout limits (`MAX_TOOL_EXECUTION_DEPTH`, `TOOL_EXECUTION_TIMEOUT`).
- The final synthesis request is sent to LLM API.
- The completed response is stored and streamed back to the caller (SSE `response.*` events).
3. Media Upload and Resolution
- Client uploads via `POST /v1/media` (server-proxied, data URL or remote fetch), or `POST /v1/media/prepare-upload` followed by a direct S3 upload.
- Media API stores metadata rows and issues `jan_<snowflake>` IDs.
- Other services reference those IDs instead of exposing raw S3 URLs.
- Before inference, LLM API calls `/v1/media/resolve` with the request payload; Media API rewrites each placeholder with a fresh presigned URL.
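The resolution step can be sketched as a recursive rewrite over the request payload; `presign` is a hypothetical stand-in for Media API's presigned-URL lookup, and the payload shape is illustrative:

```python
import re

# jan_<snowflake>: the snowflake part is numeric.
JAN_ID = re.compile(r"^jan_\d+$")

def resolve_media(payload, presign):
    """Recursively replace jan_* placeholders with presigned URLs.

    Walks dicts and lists; any string matching the jan_* pattern is
    swapped for the URL returned by `presign` (a stand-in callback).
    """
    if isinstance(payload, dict):
        return {k: resolve_media(v, presign) for k, v in payload.items()}
    if isinstance(payload, list):
        return [resolve_media(v, presign) for v in payload]
    if isinstance(payload, str) and JAN_ID.match(payload):
        return presign(payload)
    return payload

# Illustrative request body with one media placeholder.
payload = {"messages": [{"content": [{"type": "image", "media": "jan_123456789"}]}]}
resolved = resolve_media(payload, lambda mid: f"https://s3.example/{mid}?sig=abc")
print(resolved["messages"][0]["content"][0]["media"])
# -> https://s3.example/jan_123456789?sig=abc
```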
4. MCP Tool Execution
- Response API or external clients send MCP JSON-RPC requests to `mcp-tools:8091`.
- MCP Tools selects the proper backend:
- Web search -> Serper or SearXNG (via redis-searxng cache)
- Scrape -> HTTP fetcher with metadata
- File search -> vector-store service
- Python exec -> SandboxFusion container
- Results are returned synchronously; streaming support is planned via incremental notifications.
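A request to MCP Tools can be sketched as a JSON-RPC 2.0 `tools/call` envelope (the standard MCP method for invoking a tool). The tool-name-to-backend table mirrors the list above, but the exact tool names are assumptions:

```python
import json

# Hypothetical tool-name -> backend mapping mirroring the list above.
BACKENDS = {
    "web_search": "serper/searxng",
    "scrape": "http-fetcher",
    "file_search": "vector-store",
    "python_exec": "sandboxfusion",
}

def make_jsonrpc_call(tool: str, arguments: dict, req_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 envelope for a known tool."""
    if tool not in BACKENDS:
        raise ValueError(f"unknown tool: {tool}")
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = make_jsonrpc_call("web_search", {"query": "jan architecture"})
print(json.dumps(req))
```

The synchronous result described above would come back as a matching JSON-RPC response with the same `id`.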
5. Observability Pipeline
- Services emit traces and metrics via OTLP (4317).
- The OpenTelemetry Collector forwards metrics to Prometheus and traces to Jaeger.
- Logs are structured JSON printed to stdout; Docker/Kubernetes aggregates them for your logging stack.
- Grafana dashboards connect to Prometheus and Jaeger for live inspection.
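The structured-JSON-to-stdout logging convention can be sketched with the standard library; the field names (`ts`, `level`, `logger`, `msg`) are illustrative, not the services' actual schema:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (field names illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

# Wire the formatter to stdout so a container runtime can collect it.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

# Format one record directly to show the output shape.
record = logging.LogRecord("llm-api", logging.INFO, __file__, 1, "ready", None, None)
line = JsonFormatter().format(record)
print(line)
```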
Use this file when onboarding engineers or mapping changes that span multiple services.