
What is LLM Configuration?

The Large Language Model (LLM) is the brain of your voice agent, powering conversation understanding, response generation, and decision-making. Configure which model to use, how creative or consistent it should be, and how much it can say to optimize performance for your specific use case.
Supported LLM Providers:
  • OpenAI - GPT-5, GPT-4.1, GPT-4o (and Mini/Nano variants)
  • Gemini - Gemini 2.5 Pro, Gemini 2.5 Flash
  • DeepMyst - Voice-optimized GPT-4.1 models
  • Custom - Self-hosted or third-party models via OpenAI-compatible API
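Because custom and self-hosted models are reached through the OpenAI-compatible API, every provider ultimately receives the same request shape. A minimal sketch of that payload, assuming a hypothetical endpoint URL and model name (both are placeholders, not real values):

```python
import json

# Hypothetical self-hosted endpoint; substitute your own URL and key.
BASE_URL = "https://models.example.com/v1"
API_KEY = "sk-..."  # placeholder, never hard-code real keys

def build_chat_request(model: str, user_text: str) -> dict:
    """Assemble an OpenAI-compatible /chat/completions request."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_text}],
        }),
    }

req = build_chat_request("my-self-hosted-model", "Hello!")
```

Any backend that accepts this shape, hosted or local, can be dropped in without changing agent logic.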

Key Configuration Parameters

Model Selection
  • Choose the right balance of speed, quality, and cost
  • Faster models (GPT-5 Nano, Gemini Flash) for high volume
  • Powerful models (GPT-5, GPT-4.1) for complex reasoning
Temperature (0.0 - 2.0)
  • Controls randomness and creativity of responses
  • Lower (0.2-0.4) for consistent, factual responses
  • Higher (0.7-1.0) for natural, varied conversations
Max Tokens (50 - 4096)
  • Limits the length of agent responses
  • Short (50-150) for quick confirmations
  • Medium (150-500) for balanced conversations
  • Long (500+) for detailed explanations
Advanced Parameters
  • Top P (nucleus sampling)
  • Frequency and presence penalties
  • Stop sequences

Model Comparison

Model Type | Speed | Quality | Cost | Best For
Premium (GPT-5, GPT-4.1, Gemini Pro) | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | $$$ | Complex reasoning, critical conversations
Balanced (GPT-5 Mini, GPT-4.1 Mini, GPT-4o) | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $$ | Most voice agent use cases
Fast (GPT-5 Nano, GPT-4.1 Nano, Gemini Flash) | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | $ | High volume, cost-sensitive

Common Use Cases

Customer Support (Temperature: 0.3-0.5, Max Tokens: 150-300)
  • Consistent, accurate information delivery
  • Professional and predictable tone
  • Factual responses without creativity
Sales & Lead Qualification (Temperature: 0.6-0.8, Max Tokens: 200-400)
  • Engaging, natural conversations
  • Personality and warmth
  • Adaptive to caller mood
Appointment Scheduling (Temperature: 0.3-0.5, Max Tokens: 100-200)
  • Precise date/time handling
  • Minimal errors in critical details
  • Clear, unambiguous confirmations
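The use-case guidance above translates naturally into named presets. A sketch, using values picked from the middle of each recommended range (the preset names are illustrative):

```python
# Presets mirror the use-case recommendations: temperature and max_tokens
# values are taken from within each documented range.
PRESETS = {
    "customer_support": {"temperature": 0.4, "max_tokens": 300},
    "sales":            {"temperature": 0.7, "max_tokens": 400},
    "scheduling":       {"temperature": 0.4, "max_tokens": 150},
}

def config_for(use_case: str) -> dict:
    """Look up a preset, falling back to a conservative default."""
    try:
        return dict(PRESETS[use_case])
    except KeyError:
        # Unknown use cases get the safe, factual configuration.
        return {"temperature": 0.4, "max_tokens": 300}
```

Returning a copy (`dict(...)`) keeps callers from mutating the shared preset table.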

Getting Started

Choose your approach based on how you want to configure models: pick a provider and model from the supported list above, then tune temperature and max tokens for your use case.


Best Practices

Model Selection
  • Start with balanced models (GPT-5 Mini, GPT-4.1 Mini)
  • Use faster models for simple, high-volume tasks
  • Reserve premium models for complex reasoning
Temperature Tuning
  • Begin conservative (0.4) and increase if needed
  • Test extensively with real scenarios
  • Lower temperature reduces hallucinations
  • Higher temperature for more natural conversations
Cost Optimization
  • Monitor token usage and costs continuously
  • Set appropriate max tokens for your use case
  • Use cheaper models where quality isn’t critical
  • Implement smart routing based on complexity
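Smart routing can start as a simple heuristic: send short, routine turns to a fast model and escalate long or sensitive ones to a premium model. A sketch, assuming hypothetical marker keywords and a word-count threshold (both would need tuning for a real deployment):

```python
# Hypothetical complexity-based router; the markers and the 40-word
# threshold are illustrative heuristics, not tested recommendations.
FAST_MODEL = "gpt-5-nano"
PREMIUM_MODEL = "gpt-5"

COMPLEX_MARKERS = ("refund", "legal", "complaint", "escalate")

def pick_model(user_text: str) -> str:
    """Route a turn to a cheap model unless it looks complex."""
    text = user_text.lower()
    if len(text.split()) > 40 or any(m in text for m in COMPLEX_MARKERS):
        return PREMIUM_MODEL
    return FAST_MODEL
```

Even a crude router like this can cut costs substantially when most traffic is simple, while preserving quality on the turns that need it.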
Performance Monitoring
  • Track response times and latency
  • Monitor error rates and quality metrics
  • Adjust parameters based on user feedback
  • A/B test different configurations
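A/B testing and latency tracking can be combined in a few lines: assign each caller deterministically to a variant so repeat callers get a consistent experience, and record per-variant latencies for comparison. A sketch, with illustrative variant configs:

```python
import hashlib
import statistics

# Two hypothetical configurations under test; values are examples only.
VARIANTS = {
    "A": {"temperature": 0.4},
    "B": {"temperature": 0.7},
}

def assign_variant(caller_id: str) -> str:
    """Deterministic assignment: the same caller always gets the same variant."""
    digest = hashlib.sha256(caller_id.encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

latencies: dict[str, list[float]] = {"A": [], "B": []}

def record_latency(variant: str, seconds: float) -> None:
    latencies[variant].append(seconds)

def p50(variant: str) -> float:
    """Median response time for a variant."""
    return statistics.median(latencies[variant])
```

Hashing the caller ID (rather than randomizing per call) keeps the experiment consistent across a conversation and makes results reproducible.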