What is LLM Configuration?

The large language model (LLM) is the brain of your voice agent, powering conversation understanding, response generation, and decision-making. Configure which model to use, how creative or consistent it should be, and how long its responses can run to tune performance for your specific use case.
Supported LLM Providers:
  • OpenAI - GPT-5, GPT-4.1, GPT-4o (and Mini/Nano variants)
  • Gemini - Gemini 2.5 Pro, Gemini 2.5 Flash
  • DeepMyst - Voice-optimized GPT-4.1 models
  • Custom - Self-hosted or third-party models via OpenAI-compatible API
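Because custom providers expose an OpenAI-compatible API, pointing at a self-hosted model is mostly a matter of changing the base URL and model name. The sketch below builds the request body for a `POST /v1/chat/completions` call without sending it; the endpoint URL and model name are placeholders, not real values.

```python
import json

# Hypothetical self-hosted endpoint; any OpenAI-compatible server
# (e.g. vLLM, Ollama, LM Studio) serves the same /v1/chat/completions route.
BASE_URL = "https://llm.internal.example.com/v1"

def build_chat_request(model: str, messages: list, **params) -> dict:
    """Build the JSON body for a POST to {BASE_URL}/chat/completions."""
    body = {"model": model, "messages": messages}
    body.update(params)  # temperature, max_tokens, top_p, ...
    return body

body = build_chat_request(
    "my-finetuned-llama",  # placeholder model name
    [{"role": "user", "content": "Hello"}],
    temperature=0.4,
    max_tokens=150,
)
print(json.dumps(body, indent=2))
```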

Key Configuration Parameters

Model Selection
  • Choose the right balance of speed, quality, and cost
  • Faster models (GPT-5 Nano, Gemini Flash) for high volume
  • Powerful models (GPT-5, GPT-4.1) for complex reasoning
Temperature (0.0 - 2.0)
  • Controls randomness and creativity of responses
  • Lower (0.2-0.4) for consistent, factual responses
  • Higher (0.7-1.0) for natural, varied conversations
Max Tokens (50 - 4096)
  • Limits the length of agent responses
  • Short (50-150) for quick confirmations
  • Medium (150-500) for balanced conversations
  • Long (500+) for detailed explanations
Advanced Parameters
  • Top P (nucleus sampling)
  • Frequency and presence penalties
  • Stop sequences
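The parameters above can be gathered into a single configuration record. This is a minimal sketch, not the platform's actual schema: the field names follow common OpenAI-style parameters, and the validation simply enforces the documented ranges (temperature 0.0–2.0, max tokens 50–4096).

```python
from dataclasses import dataclass, field

@dataclass
class LLMConfig:
    model: str = "gpt-4.1-mini"
    temperature: float = 0.4          # 0.0 - 2.0
    max_tokens: int = 300             # 50 - 4096
    top_p: float = 1.0                # nucleus sampling
    frequency_penalty: float = 0.0    # discourage repeated tokens
    presence_penalty: float = 0.0     # encourage new topics
    stop: list = field(default_factory=list)  # stop sequences

    def __post_init__(self):
        # Reject values outside the documented ranges.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if not 50 <= self.max_tokens <= 4096:
            raise ValueError("max_tokens must be between 50 and 4096")

# A customer-support-style configuration with a stop sequence that
# keeps the agent from speaking the caller's turn.
cfg = LLMConfig(temperature=0.3, max_tokens=150, stop=["\nCaller:"])
```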

Model Comparison

Model Type                                    | Speed  | Quality | Cost | Best For
Premium (GPT-5, GPT-4.1, Gemini Pro)          | ⭐⭐⭐⭐   | ⭐⭐⭐⭐⭐   | $$$  | Complex reasoning, critical conversations
Balanced (GPT-5 Mini, GPT-4.1 Mini, GPT-4o)   | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐    | $$   | Most voice agent use cases
Fast (GPT-5 Nano, GPT-4.1 Nano, Gemini Flash) | ⭐⭐⭐⭐⭐  | ⭐⭐⭐     | $    | High volume, cost-sensitive

Common Use Cases

Customer Support (Temperature: 0.3-0.5, Max Tokens: 150-300)
  • Consistent, accurate information delivery
  • Professional and predictable tone
  • Factual responses without creativity
Sales & Lead Qualification (Temperature: 0.6-0.8, Max Tokens: 200-400)
  • Engaging, natural conversations
  • Personality and warmth
  • Adaptive to caller mood
Appointment Scheduling (Temperature: 0.3-0.5, Max Tokens: 100-200)
  • Precise date/time handling
  • Minimal errors in critical details
  • Clear, unambiguous confirmations
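The use-case guidance above can be captured as a small preset table. The values below are midpoints of the recommended ranges, not official defaults, and the use-case keys are illustrative.

```python
# Presets derived from the recommended ranges above (midpoint values).
PRESETS = {
    "customer_support": {"temperature": 0.4, "max_tokens": 225},
    "sales":            {"temperature": 0.7, "max_tokens": 300},
    "scheduling":       {"temperature": 0.4, "max_tokens": 150},
}

def settings_for(use_case: str) -> dict:
    # Unknown use cases fall back to a conservative default.
    return PRESETS.get(use_case, {"temperature": 0.4, "max_tokens": 300})

print(settings_for("scheduling"))  # {'temperature': 0.4, 'max_tokens': 150}
```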

Getting Started

Choose your approach based on how you want to configure models:

Configure in Dashboard

Select models and adjust parameters using the web interface. Perfect for testing different configurations and finding optimal settings.

Set via API

Configure LLM settings programmatically. Ideal for dynamic model selection and parameter tuning based on use case.
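As a rough sketch of what a programmatic update looks like: the snippet below builds (but does not send) a PATCH request that changes an agent's LLM settings. The endpoint path, auth header, and field names here are assumptions for illustration, not the platform's documented API — check the API reference for the real routes.

```python
import json
import urllib.request

def build_update_request(agent_id: str, api_key: str, llm_settings: dict):
    """Build a PATCH request updating an agent's LLM settings (illustrative URL)."""
    req = urllib.request.Request(
        f"https://api.example.com/v1/agents/{agent_id}",  # hypothetical endpoint
        data=json.dumps({"llm": llm_settings}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    return req  # send with urllib.request.urlopen(req)

req = build_update_request(
    "agent_123", "sk-...",
    {"model": "gpt-4.1-mini", "temperature": 0.5, "max_tokens": 300},
)
```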

Learn More

Prompt Engineering

Write effective prompts for better results

Testing

Test different LLM configurations

Single Prompt Agents

Configure LLM for Single Prompt Agents

Flow Agents

Set LLM parameters in Flow Agents

Best Practices

Model Selection
  • Start with balanced models (GPT-5 Mini, GPT-4.1 Mini)
  • Use faster models for simple, high-volume tasks
  • Reserve premium models for complex reasoning
Temperature Tuning
  • Begin conservative (0.4) and increase if needed
  • Test extensively with real scenarios
  • Lower temperature reduces hallucinations
  • Higher temperature for more natural conversations
Cost Optimization
  • Monitor token usage and costs continuously
  • Set appropriate max tokens for your use case
  • Use cheaper models where quality isn’t critical
  • Implement smart routing based on complexity
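Smart routing can be as simple as a heuristic that sends short, simple turns to a fast model and escalates longer or reasoning-heavy requests to a premium one. This is a rough sketch: the keyword list and length threshold are illustrative assumptions, not tuned values.

```python
def route_model(user_turn: str) -> str:
    """Pick a model tier based on a crude complexity heuristic (illustrative)."""
    complex_markers = ("why", "explain", "compare", "refund", "policy")
    words = user_turn.lower().split()
    # Long turns or reasoning-style keywords escalate to the premium tier.
    if len(words) > 30 or any(m in words for m in complex_markers):
        return "gpt-5"       # premium: complex reasoning
    return "gpt-5-nano"      # fast: high-volume simple turns

print(route_model("yes please"))                           # gpt-5-nano
print(route_model("can you explain your refund policy"))   # gpt-5
```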
Performance Monitoring
  • Track response times and latency
  • Monitor error rates and quality metrics
  • Adjust parameters based on user feedback
  • A/B test different configurations
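For A/B testing configurations, a deterministic bucket assignment keeps each caller on the same variant across calls. A minimal sketch, assuming the caller ID is available as a string; the variant settings are illustrative.

```python
import hashlib

# Two candidate configurations under test (illustrative values).
VARIANTS = {"A": {"temperature": 0.4}, "B": {"temperature": 0.7}}

def assign_variant(caller_id: str) -> str:
    """Hash the caller ID so assignment is stable across calls."""
    digest = hashlib.sha256(caller_id.encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same caller always lands in the same bucket.
assert assign_variant("+15551234567") == assign_variant("+15551234567")
```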