Prompt Orchestration
Overview
The Prompt Orchestration system is a pipeline component within the LLM API service that dynamically composes and enhances prompts before they are sent to inference providers. It applies conditional modules based on context, user preferences, conversation history, and database-driven templates.
Architecture Decision: Prompt orchestration is implemented as a processor within the LLM API service, not as an isolated microservice. This keeps dynamic control at runtime: the service can add memory, tools, and templates, customize tone, and assemble the final prompt automatically.
What is a Prompt Orchestration Processor?
A Prompt Orchestration Processor is a processing layer within LLM API that:
- Takes a user's raw input (before it reaches the inference engine)
- Checks conditions (flags, context, user settings, memory, etc.)
- Composes a final prompt programmatically by applying conditional modules
- Passes that composed prompt to the inference provider (vLLM or remote)
The processor sits in the request pipeline within llm-api, between the HTTP handler and the inference provider client.
Architecture
HTTP Request (POST /v1/chat/completions)
↓
Gin Handler (llm-api/internal/interfaces/httpserver/handlers/chathandler)
↓
1. Conversation Context Loading
- Get or create conversation
- Load conversation items
- Load project instructions
↓
2. Memory Loading
- Collect prompt memory from headers/metadata
- Load memory context via memory-tools service
↓
3. Provider Selection
- Select provider model based on request
- Load model catalog for context length
↓
4. Media Resolution
- Resolve jan_* media placeholders
↓
5. Project Instruction Injection
- Prepend project instruction as first system message
↓
6. Prompt Orchestration Processor
- Build prompt context with preferences
- Apply conditional modules in priority order
- Return enhanced messages
↓
7. Context Trimming
- Trim messages to fit model context length
↓
Inference Provider Client (internal/infrastructure/inference)
↓
vLLM or Remote Provider
Package Structure
services/llm-api/internal/domain/prompt/
├── types.go # Core interfaces and types
├── modules.go # Built-in module implementations
├── deep_research_module.go # Deep Research module
├── processor.go # Main processor implementation
└── processor_test.go # Unit tests
Features & Capabilities
What the Processor Can Do
The processor can automatically attach optional modules as part of the LLM API request pipeline:
Deep Research
Inject specialized research prompts for comprehensive analysis when deep_research: true is set.
User Profile Personalization
Inject user profile settings including:
- Base style (concise, friendly, professional)
- Custom instructions
- Nickname, occupation, and personal context
Memory
If the user enables memory, insert memory hints and preferences into the prompt.
Tool Usage
Conditionally include instructions like:
- "use the retrieval tool when needed"
- "use the calculator tool if numbers appear"
Templates / Prompt Patterns
For example:
- Chain-of-Thought structure
- Code assistant guidance
- Output format
- "First think step-by-step, then answer"
Project Instructions
Inject project-specific instructions with highest priority.
Timing Context
Add current date and AI assistant introduction.
Built-in Modules
The processor includes several built-in modules that are automatically applied based on context:
-20. Deep Research Module (Conditional)
- Purpose: Injects comprehensive research prompts for deep analysis
- Activation: Enabled when `deep_research: true` is set in preferences (see the activation sketch after this module list)
- Adds: Research methodology and comprehensive analysis instructions
- Priority: -20 (runs before all other modules)
-15. Timing Module (Always Active)
- Purpose: Ensures a base system prompt with current date is present
- Activation: Always registered when prompt orchestration is enabled
- Adds: AI assistant intro and current date to the system prompt
- Priority: -15
-10. Project Instruction Module (Conditional)
- Purpose: Injects project-specific instructions with highest priority
- Activation: When conversation has a linked project with instructions
- Adds: Project instructions as first system message with priority note
- Priority: -10
5. User Profile Module (Conditional)
- Purpose: Injects user profile personalization settings
- Activation: When user has profile settings (base style, custom instructions, nickname, etc.)
- Adds: Style preferences, custom instructions, and user context
- Priority: 5
10. Memory Module (Optional)
- Purpose: Injects user-specific memory/preferences into prompts
- Activation: Enabled via `PROMPT_ORCHESTRATION_MEMORY=true` and memory items present
- Adds: Memory hints stitched into the system prompt
- Priority: 10
20. Tool Instructions Module (Optional)
- Purpose: Adds instructions for tool usage
- Activation: `PROMPT_ORCHESTRATION_TOOLS=true` and tools present in the request
- Adds: Tool selection and usage guidelines
- Priority: 20
30. Code Assistant Module (Template-Gated)
- Purpose: Enhances prompts for code-related questions
- Activation: `PROMPT_ORCHESTRATION_TEMPLATES=true` and code keywords detected
- Adds: Code formatting guidelines, best practices, error handling tips
- Priority: 30
40. Chain-of-Thought Module (Template-Gated)
- Purpose: Encourages step-by-step reasoning for complex questions
- Activation: `PROMPT_ORCHESTRATION_TEMPLATES=true` and reasoning signals detected
- Adds: Instructions to break down problems and think systematically
- Priority: 40
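Every gated module above follows the same pattern: a cheap condition check before any prompt text is added. As an illustration only (the helper name is hypothetical, not the actual code in deep_research_module.go), the gate for the deep_research flag amounts to:
// shouldApplyDeepResearch is a hypothetical helper illustrating a
// preference-gated check: apply only when the flag is present and truthy.
func shouldApplyDeepResearch(promptCtx *Context) bool {
	enabled, _ := promptCtx.Preferences["deep_research"].(bool)
	return enabled
}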
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| PROMPT_ORCHESTRATION_ENABLED | false | Enable/disable the processor |
| PROMPT_ORCHESTRATION_MEMORY | false | Enable memory injection |
| PROMPT_ORCHESTRATION_TEMPLATES | false | Enable template-based prompts (CoT + code assistant) |
| PROMPT_ORCHESTRATION_TOOLS | false | Enable tool usage instructions |
YAML Configuration
In config/defaults.yaml:
services:
  llm_api:
    prompt_orchestration:
      enabled: false
      enable_memory: false
      enable_templates: false
      enable_tools: false
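These YAML keys presumably map onto the prompt.ProcessorConfig consumed by the Wire provider below. A sketch of that mapping, with field names inferred from the config keys (only Enabled is confirmed by the provider code shown below; the other fields are assumptions):
// ProcessorConfig sketch; field names are inferred from the YAML keys above,
// not taken from the verified struct definition.
type ProcessorConfig struct {
	Enabled         bool // prompt_orchestration.enabled
	EnableMemory    bool // prompt_orchestration.enable_memory
	EnableTemplates bool // prompt_orchestration.enable_templates
	EnableTools     bool // prompt_orchestration.enable_tools
}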
Wire Integration
The processor is integrated via dependency injection in services/llm-api/internal/domain/provider.go:
// ProvidePromptProcessor creates the prompt processor with all modules including Deep Research
func ProvidePromptProcessor(
config prompt.ProcessorConfig,
log zerolog.Logger,
templateService *prompttemplate.Service,
) *prompt.ProcessorImpl {
processor := prompt.NewProcessorWithTemplateService(config, log, templateService)
// Register Deep Research module if prompt orchestration is enabled
if config.Enabled && templateService != nil {
processor.RegisterModule(prompt.NewDeepResearchModule(templateService))
}
return processor
}
Implementation Details
Module Interface
Each module implements the Module interface:
type Module interface {
Name() string
ShouldApply(ctx context.Context, promptCtx *Context, messages []openai.ChatCompletionMessage) bool
Apply(ctx context.Context, promptCtx *Context, messages []openai.ChatCompletionMessage) ([]openai.ChatCompletionMessage, error)
}
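To make the contract concrete, here is a minimal sketch of a hypothetical custom module. The module name, condition, and prompt text are illustrative, and the sketch assumes the openai package is github.com/sashabaranov/go-openai, which matches the message types shown above:
import (
	"context"

	openai "github.com/sashabaranov/go-openai"
)

// LanguageHintModule is a hypothetical module that nudges the model to
// reply in the language carried by the prompt context.
type LanguageHintModule struct{}

func (m *LanguageHintModule) Name() string { return "language_hint" }

// ShouldApply fires only when the context carries a non-empty language.
func (m *LanguageHintModule) ShouldApply(ctx context.Context, promptCtx *Context, messages []openai.ChatCompletionMessage) bool {
	return promptCtx.Language != ""
}

// Apply prepends a short system hint; the built-in modules typically merge
// into an existing system message instead of always prepending.
func (m *LanguageHintModule) Apply(ctx context.Context, promptCtx *Context, messages []openai.ChatCompletionMessage) ([]openai.ChatCompletionMessage, error) {
	hint := openai.ChatCompletionMessage{
		Role:    openai.ChatMessageRoleSystem,
		Content: "Respond in the user's preferred language: " + promptCtx.Language,
	}
	return append([]openai.ChatCompletionMessage{hint}, messages...), nil
}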
Context Structure
The prompt context contains all information needed for module decisions:
type Context struct {
UserID uint
ConversationID string
Language string
Preferences map[string]interface{}
Memory []string
ProjectInstruction string
AppliedModules []string
Profile *usersettings.ProfileSettings
}
Processing Flow
- Context Building: Create a `prompt.Context` with user ID, conversation ID, preferences, memory, project instruction, and user profile
- Module Evaluation: Each registered module checks whether it should apply via `ShouldApply()`
- Module Application: Applicable modules modify the messages via `Apply()` in priority order
- Result: Enhanced messages are passed to the inference provider
Module Priority System
Modules are executed in priority order to ensure correct composition:
- Priority -20: Deep Research Module (comprehensive research prompts)
- Priority -15: Timing Module (creates base system prompt with date)
- Priority -10: Project Instruction Module (highest priority instructions)
- Priority 5: User Profile Module (personalization settings)
- Priority 10: Memory Module (adds user context)
- Priority 20: Tool Instructions (adds tool capabilities)
- Priority 30: Code Assistant (adds code-specific guidance)
- Priority 40: Chain-of-Thought (adds reasoning structure)
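The priority mechanics reduce to a stable sort plus a guarded loop. A minimal sketch, assuming a hypothetical Priority() int method on modules (the Module interface above does not show it, and the real processor may track priority differently):
import (
	"context"
	"sort"

	openai "github.com/sashabaranov/go-openai"
)

// prioritizedModule is an assumed extension of Module with a priority value.
type prioritizedModule interface {
	Module
	Priority() int
}

// applyInPriorityOrder sorts ascending (lowest runs first), skips modules
// whose conditions fail, and never lets one failing module break the run.
func applyInPriorityOrder(ctx context.Context, promptCtx *Context, modules []prioritizedModule, messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage {
	sort.SliceStable(modules, func(i, j int) bool { return modules[i].Priority() < modules[j].Priority() })
	for _, m := range modules {
		if !m.ShouldApply(ctx, promptCtx, messages) {
			continue
		}
		if out, err := m.Apply(ctx, promptCtx, messages); err == nil {
			messages = out
			promptCtx.AppliedModules = append(promptCtx.AppliedModules, m.Name())
		}
	}
	return messages
}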
Usage in Chat Handler
The processor is integrated into the chat completion flow in chat_handler.go:
// Apply prompt orchestration (if enabled)
if h.promptProcessor != nil {
observability.AddSpanEvent(ctx, "processing_prompts")
preferences := make(map[string]interface{})
if len(request.Tools) > 0 || request.ToolChoice != nil {
preferences["use_tools"] = true
}
if persona := strings.TrimSpace(reqCtx.GetHeader("X-Prompt-Persona")); persona != "" {
preferences["persona"] = persona
}
if persona := strings.TrimSpace(reqCtx.Query("persona")); persona != "" {
preferences["persona"] = persona
}
// Pass deep_research flag to prompt orchestration
if request.DeepResearch != nil && *request.DeepResearch {
preferences["deep_research"] = true
}
var profileSettings *usersettings.ProfileSettings
if userSettings != nil {
profileSettings = &userSettings.ProfileSettings
}
promptCtx := &prompt.Context{
UserID: userID,
ConversationID: conversationID,
Language: strings.TrimSpace(reqCtx.GetHeader("Accept-Language")),
Preferences: preferences,
Memory: loadedMemory,
ProjectInstruction: projectInstruction,
Profile: profileSettings,
}
processedMessages, processErr := h.promptProcessor.Process(ctx, promptCtx, request.Messages)
if processErr != nil {
// Continue with original messages
} else {
request.Messages = processedMessages
if len(promptCtx.AppliedModules) > 0 {
reqCtx.Header("X-Applied-Prompt-Modules", strings.Join(promptCtx.AppliedModules, ","))
}
observability.AddSpanEvent(ctx, "prompts_processed")
}
}
Template Service Integration
Modules can load prompts from the database via the prompttemplate.Service:
// Try to fetch template from database and render with variables
if m.templateService != nil {
template, err := m.templateService.GetByKey(ctx, prompttemplate.TemplateKeyTiming)
if err == nil && template != nil && template.IsActive {
rendered, renderErr := m.templateService.RenderTemplate(ctx, prompttemplate.TemplateKeyTiming, map[string]any{
"CurrentDate": currentDate,
})
if renderErr == nil {
timingText = rendered
}
}
}
// Fallback to hardcoded text if template not loaded
if timingText == "" {
timingText = "You are Jan, a helpful AI assistant..."
}
Available Template Keys
| Template Key | Module | Variables |
|---|---|---|
| timing | TimingModule | CurrentDate |
| user_profile | UserProfileModule | BaseStyle, CustomInstructions, NickName, Occupation, MoreAboutYou |
| memory | MemoryModule | MemoryItems |
| tool_instructions | ToolInstructionsModule | ToolDescriptions |
| code_assistant | CodeAssistantModule | (none) |
| chain_of_thought | ChainOfThoughtModule | (none) |
| deep_research | DeepResearchModule | (none) |
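As an example of the rendering call for one of these keys, assuming a TemplateKeyUserProfile constant analogous to the TemplateKeyTiming shown above (the variable values here are illustrative):
// Render the user_profile template with per-user values; on error the
// module falls back to its hardcoded prompt text, as the built-ins do.
rendered, err := m.templateService.RenderTemplate(ctx, prompttemplate.TemplateKeyUserProfile, map[string]any{
	"BaseStyle":          "friendly",
	"CustomInstructions": "Prefer concise answers with examples.",
	"NickName":           "Sam",
	"Occupation":         "backend engineer",
	"MoreAboutYou":       "Works mostly in Go.",
})
if err != nil {
	// fall back to hardcoded prompt text
}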
Example Transformations
Before Processing
{
"messages": [
{"role": "user", "content": "How do I implement binary search in Go?"}
]
}
After Processing
With Timing + User Profile + Code Assistant modules applied:
{
"messages": [
{
"role": "system",
"content": "You are Jan, a helpful AI assistant. Jan is trained by Menlo Research (https://www.menlo.ai).\nToday is: December 16, 2025.\nAlways treat this as the current date.\n\nUser-level settings are preferences for style and context. If they ever conflict with explicit project or system instructions, always follow the project or system instructions.\n\nUse a friendly, warm, and encouraging tone while staying helpful.\n\nWhen providing code assistance:\n1. Provide clear, well-commented code.\n2. Explain your approach and reasoning.\n3. Include error handling where appropriate.\n4. Follow best practices and conventions.\n5. Suggest testing approaches when relevant.\n6. Respect project instructions and user constraints; never violate them to simplify code."
},
{"role": "user", "content": "How do I implement binary search in Go?"}
]
}
With Project Instructions
When a conversation is linked to a project with instructions:
{
"messages": [
{
"role": "system",
"content": "Always respond in JSON format. Use TypeScript conventions.\n\nProject priority: These project-specific instructions have the highest priority. If any user settings, style preferences, or other guidance conflict with these project instructions, you must follow the project instructions."
},
{
"role": "system",
"content": "You are Jan, a helpful AI assistant...\n\nUse a friendly, warm, and encouraging tone..."
},
{"role": "user", "content": "Create a function to validate email addresses"}
]
}
Disabling Modules
Users can disable specific modules via preferences:
promptCtx := &prompt.Context{
Preferences: map[string]interface{}{
"disable_modules": []string{"chain_of_thought", "code_assistant"},
},
}
Or via the helper function:
promptCtx = prompt.WithDisabledModules(promptCtx, []string{"memory"})
Supported formats for disable_modules:
- Comma-separated string: `"chain_of_thought,code_assistant"`
- String slice: `[]string{"chain_of_thought", "code_assistant"}`
- Interface slice: `[]interface{}{"chain_of_thought", "code_assistant"}`
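A sketch of how these three shapes could be normalized into a single lookup set (the function name and placement are hypothetical, not the processor's actual parser):
import "strings"

// parseDisabledModules normalizes the accepted disable_modules shapes
// into a set of module names.
func parseDisabledModules(prefs map[string]interface{}) map[string]bool {
	disabled := map[string]bool{}
	add := func(name string) {
		if name = strings.TrimSpace(name); name != "" {
			disabled[name] = true
		}
	}
	switch v := prefs["disable_modules"].(type) {
	case string: // "chain_of_thought,code_assistant"
		for _, name := range strings.Split(v, ",") {
			add(name)
		}
	case []string: // []string{"chain_of_thought", "code_assistant"}
		for _, name := range v {
			add(name)
		}
	case []interface{}: // []interface{}{"chain_of_thought", "code_assistant"}
		for _, item := range v {
			if name, ok := item.(string); ok {
				add(name)
			}
		}
	}
	return disabled
}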
Observability
The processor emits:
- OTEL events: `processing_prompts`, `prompts_processed`
- Logs: applied module list, processing errors, module priority order
- HTTP header: `X-Applied-Prompt-Modules` (comma-separated) for debugging
Example log output:
{
"level": "debug",
"component": "prompt-processor",
"conversation_id": "conv-123",
"applied_modules": ["timing", "user_profile", "memory", "code_assistant"],
"message": "applied prompt orchestration modules"
}
Testing
Run the test suite:
cd services/llm-api
go test ./internal/domain/prompt/... -v
Tests cover:
- Individual module behavior
- Module conditional logic
- Processor integration
- Configuration handling
- Module priority ordering
- Template service integration
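A sketch of what one such module-level test might look like inside the prompt package (the constructor name NewTimingModule is assumed by analogy with NewDeepResearchModule; the real tests may differ):
import (
	"context"
	"testing"
)

// The Timing module is documented as always active, so its condition
// should hold for any context.
func TestTimingModuleAlwaysApplies(t *testing.T) {
	m := NewTimingModule() // assumed constructor
	promptCtx := &Context{UserID: 1}
	if !m.ShouldApply(context.Background(), promptCtx, nil) {
		t.Fatal("timing module should always apply")
	}
}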
Future Enhancements
Potential additions to the processor:
- Template Library: Pre-built templates for common tasks (writing, analysis, translation)
- Dynamic Persona: Adjust assistant personality based on context
- Language Detection: Automatically adapt to user's language
- Safety Filters: Add content moderation and safety rules
- A/B Testing: Compare different prompt strategies
- Custom Module Registry: Allow users to register custom modules
- Module Composition Rules: Define dependencies and conflicts between modules
- Prompt Versioning: Track and version prompt templates
- Performance Optimization: Cache compiled prompts for common scenarios
Troubleshooting
Modules Not Applying
Check:
- Is `PROMPT_ORCHESTRATION_ENABLED=true`?
- Are the specific module flags enabled (`MEMORY`, `TEMPLATES`, `TOOLS`)?
- Does the module's `ShouldApply()` logic match your request?
- Check the logs and the `X-Applied-Prompt-Modules` header
Module Order Issues
Solution: Modules execute in priority order (-20, -15, -10, 5, 10, 20, 30, 40). Deep Research runs first, then Timing, then Project Instructions.
Memory Not Loading
Check:
- Is memory provided via the `X-Prompt-Memory` header or conversation metadata?
- Is `promptCtx.Memory` populated with items?
- Is `PROMPT_ORCHESTRATION_MEMORY=true`?
- Is `MEMORY_ENABLED=true` for the memory-tools integration?
User Profile Not Applying
Check:
- Does the user have profile settings configured?
- Is at least one profile field non-empty (BaseStyle, CustomInstructions, NickName, etc.)?
- Is the `user_profile` module absent from the disabled list?
Template Not Loading from Database
Check:
- Is the template service properly initialized?
- Is the template active (`is_active: true`)?
- Check the logs for template loading errors
- Fallback prompts are used if template loading fails
Performance Concerns
Optimization:
- Modules are sorted once during processor initialization
- Each module applies only if `ShouldApply()` returns true
- The template service caches templates
- Consider disabling unused modules via environment variables