Overview
Creating a Text to Speech (TTS) job involves entering text, selecting a voice, adjusting controls, and generating audio. The system processes your text and generates high-quality speech audio that you can preview, download, or use in other applications.
TTS jobs are processed in real time or near real time. Most jobs complete within seconds to minutes, depending on text length. You can preview audio while it’s being generated.
Step-by-Step Process
Step 1: Navigate to Text to Speech
- Click on Text to Speech in the navigation menu
- The TTS interface opens with the text editor and voice controls

Step 2: Enter Text Content
Enter or paste the text you want to convert to speech.
Text Input Options:
- Type directly: Type text in the text editor
- Paste text: Copy and paste text from another source
Tips:
- Use clear, well-formatted text
- Add punctuation for natural pauses
- Break long paragraphs into shorter ones
- Check spelling and grammar
Step 3: Select a Voice
Choose the voice you want to use for speech generation. Click the voice selection area to open the voice selection modal.
Modal tabs:
- Explore: Handpicked collections by use case (e.g. Arabic Narration, Social Media, Studio Conversational, Character Voices) and a “Weekly spotlight - New Voices” list. Use this to discover voices by context.
- My Voices: Your favorite voices in one place for quick access.
- All Voices: Full voice library with infinite scroll.
Finding voices:
- Search: Type a voice name in the search field for instant results.
- Filter: Narrow by language, gender, style, dialect, or use case. Use “Clear filters” to reset.
- Explore collections: On the Explore tab, click a collection card to see voices for that use case.
Selecting a voice:
- Click the voice selection area to open the modal.
- Switch between Explore, My Voices, or All Voices as needed.
- Preview a voice by clicking the play icon on a voice row.
- Click the voice (or Select in the modal) to apply it to your job. The modal closes and the chosen voice is shown in the TTS interface.
Voice types:
- System voices: Pre-trained voices from the library (Arabic and English, multiple styles and dialects).
- Custom voices: Your cloned or custom voices, same controls as system voices.
- Favorites: Voices you’ve marked as favorites appear under My Voices.
You can preview any voice before selecting it. Click the play icon next to a voice to hear a sample.
Step 4: Adjust Voice Controls (Optional)
Fine-tune the voice characteristics.
Speed Control:
- Range: 0x to 2x (default: 1x)
- Adjustment: Drag slider or enter value
- Effect: Controls how fast the voice reads
- Use cases: Slower for clarity, faster for quick playback
Expressiveness Control:
- Range: 0 to 2 (default: 1)
- Adjustment: Drag slider or enter value
- Effect: Controls emotional range and variation
- Use cases: More neutral for consistency, more expressive for dynamics
Step 5: Configure Additional Settings (Optional)
Dictionaries (Optional):
- Open Manage (or “Click To Manage Dictionaries”) in the Dictionaries section to open the Dictionaries modal
- Add a new dictionary with “+ New Dictionary”; delete a dictionary using the trash icon next to it
- Add and edit words inside a dictionary: click the pencil (edit) icon on a dictionary to open the editor, then add word–pronunciation pairs and save
- Select which dictionaries apply to this TTS job by checking the box next to each dictionary in the list; selected dictionaries apply custom pronunciation rules
- Useful for technical terms, proper nouns, or brand names
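The effect of word–pronunciation pairs can be pictured with a small sketch. This is only an illustration: the product's actual dictionary mechanism is not described here, and the function name `apply_dictionaries`, the whole-word matching rule, and the sample entries are all assumptions.

```python
import re


def apply_dictionaries(text, dictionaries):
    """Replace whole-word matches with their pronunciation spelling.

    `dictionaries` is a list of {word: pronunciation} mappings (a hypothetical
    format). When the same word appears in several dictionaries, the last wins.
    """
    merged = {}
    for entries in dictionaries:
        merged.update({word.lower(): spoken for word, spoken in entries.items()})
    if not merged:
        return text
    # Try longer entries first so multi-word entries match before their parts.
    pattern = "|".join(re.escape(w) for w in sorted(merged, key=len, reverse=True))
    # Lookarounds keep "SQL" from matching inside a longer word such as "MySQL".
    regex = re.compile(rf"(?<!\w)(?:{pattern})(?!\w)", re.IGNORECASE)
    return regex.sub(lambda m: merged[m.group(0).lower()], text)


# Sample entries are placeholders, not part of any shipped dictionary.
brand_terms = {"SQL": "sequel", "nginx": "engine x"}
print(apply_dictionaries("Deploy nginx, then query SQL.", [brand_terms]))
```

Only the dictionaries passed in the list are applied, which mirrors checking the box next to each dictionary you want for a job.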
Emojis (Optional):
- Add supported emojis in text for emotion
- Affects voice delivery

Silence Breaks (Optional):
- Add short or long pauses in text
- Creates natural speech rhythm
- Useful for emphasis or pacing

Step 6: Generate Audio
Create the TTS job:
Review Settings:
- Check text content
- Verify voice selection
- Review control settings
- Ensure everything is correct
Click Generate:
- Click the “Generate Speech” button
- Job is created and processing starts
Monitor Progress:
- Live audio viewer shows progress while the job is being generated
- Processing typically completes quickly
- Audio preview available when ready
Job Completion:
- Audio is available for playback
- Download and share options available
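The review checklist in Step 6 can be sketched as a simple pre-flight check. The `TtsDraft` class and `review_settings` function are hypothetical names used for illustration only. The numeric limits come from the ranges documented in Step 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsDraft:
    """Hypothetical container for the settings reviewed before generating."""
    text: str
    voice: Optional[str] = None
    speed: float = 1.0            # documented range: 0x to 2x
    expressiveness: float = 1.0   # documented range: 0 to 2


def review_settings(draft):
    """Return a list of problems; an empty list means the draft looks ready."""
    problems = []
    if not draft.text.strip():
        problems.append("text is empty")
    if not draft.voice:
        problems.append("no voice selected")
    if not 0.0 <= draft.speed <= 2.0:
        problems.append("speed outside 0x-2x")
    if not 0.0 <= draft.expressiveness <= 2.0:
        problems.append("expressiveness outside 0-2")
    return problems


# "Sample Voice" is a placeholder, not a voice from the library.
print(review_settings(TtsDraft(text="Welcome to the show.", voice="Sample Voice")))
```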
Text Input Details
Text Editor Features
Editing Capabilities:
- Inline editing: Edit text directly in the editor
- Copy and paste: Full clipboard support
- Undo/redo: Standard text editing functions
- Character count: Real-time character counting
Advanced Text Features
- Supported emojis: Add supported emojis in text for emotion
- Silence breaks: Insert pauses for natural pacing
- Short breaks: Brief pauses between phrases
- Long breaks: Extended pauses for emphasis
- Fillers: Add natural filler words (Uh, Umm) for realism
Voice Selection Details
Browsing Voices
Voice Library:
- Scroll through available voices
- See voice names and metadata
- Preview voices with play button
- Filter and search options
Voice Information:
- Name: Voice identifier
- Language: Supported language
- Dialect: Regional variant
- Gender: Male or female
- Style: Narrator, Conversational, etc.
Filtering Voices
Filter Options:
- Language: Filter by language (Arabic, English, etc.)
- Gender: Filter by gender (Male, Female)
- Style: Filter by style (Narrator, Conversational)
- Dialect: Filter by regional dialect
Search:
- Search by voice name only
- Case-insensitive search
- Real-time results
- Clear search to reset
Voice Preview
Preview Features:
- Click play icon to hear sample
- Sample audio plays automatically
- Compare different voices
- Helps choose right voice
Preview Tips:
- Preview multiple voices
- Compare similar voices
- Listen to sample quality
- Choose voice that matches content
Favorite Voices
Marking Favorites:
- Click star icon on voice
- Voice added to favorites
- Quick access in favorites section
- Personal voice library
Using Favorites:
- Access favorites quickly
- Filter to show only favorites
- Organize frequently used voices
- Save time on voice selection
Voice Controls Details
Speed Control
Speed Range:
- Minimum: 0x (very slow)
- Maximum: 2x (very fast)
- Default: 1x (normal speed)
- Step: 0.1x increments
Speed Guidelines:
- 0.5x - 0.8x: Very slow, clear delivery
- 0.9x - 1.1x: Normal conversational speed
- 1.2x - 1.5x: Fast, energetic delivery
- 1.6x - 2.0x: Very fast, quick playback
Usage Tips:
- Slower for important information
- Normal for general content
- Faster for quick summaries
- Adjust based on content type
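If you type a speed value instead of dragging the slider, the documented range (0x to 2x) and step (0.1x) imply that the value lands on a fixed grid. The sketch below shows one way such snapping could work. The function name `snap_speed` is hypothetical, and whether the interface clamps and rounds exactly like this is an assumption.

```python
def snap_speed(value, minimum=0.0, maximum=2.0, step=0.1):
    """Clamp `value` to [minimum, maximum] and round it to the nearest step."""
    clamped = min(max(value, minimum), maximum)
    steps = round((clamped - minimum) / step)
    # Round to one decimal to hide floating-point noise; this matches a 0.1 step.
    return round(minimum + steps * step, 1)


print(snap_speed(1.234))  # 1.2
print(snap_speed(2.7))    # 2.0
```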
Expressiveness Control
Expressiveness Range:
- Minimum: 0 (more neutral)
- Maximum: 2 (more expressive)
- Default: 1 (balanced)
- Step: 0.1 increments
Expressiveness Guidelines:
- 0 - 0.5: Neutral, consistent delivery
- 0.6 - 1.0: Balanced, natural variation
- 1.1 - 1.5: Expressive, dynamic delivery
- 1.6 - 2.0: Very expressive, emotionally varied
Usage Tips:
- Neutral for formal content
- Balanced for general content
- Expressive for engaging content
- Very expressive for dramatic content
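The expressiveness bands listed above can be expressed as a small lookup, which may help when documenting preferred settings for a project. The names below are hypothetical. Values that fall between two documented bands (for example 0.55) are assigned to the higher band, which is an assumption.

```python
import bisect

# Upper bound of each band, taken from the ranges documented in this section.
BAND_LIMITS = [0.5, 1.0, 1.5, 2.0]
BAND_LABELS = ["neutral", "balanced", "expressive", "very expressive"]


def expressiveness_band(value):
    """Return the documented band label for an expressiveness value."""
    if not 0.0 <= value <= 2.0:
        raise ValueError("expressiveness must be between 0 and 2")
    # bisect_left finds the first band whose upper bound is >= value.
    return BAND_LABELS[bisect.bisect_left(BAND_LIMITS, value)]


print(expressiveness_band(1.0))  # balanced
print(expressiveness_band(1.6))  # very expressive
```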
Job Creation and Processing
Job Creation
Job Information:
- Job ID assigned automatically
- Title (if supported)
- Creation timestamp
Job History:
- Job saved to history
- Accessible from jobs list
- Can be viewed, edited, or deleted
- Links to generated audio
Processing Time:
- Typically seconds to minutes
- Depends on text length
- Real-time or near real-time for short text
- Longer for very long text
Audio Generation
Generation Process:
- Text is processed
- Voice model applied
- Controls applied
- Audio generated
- Available for playback
Audio Quality:
- High-quality output
- Natural speech patterns
- Clear pronunciation
- Professional quality
Credit Usage
Cost Calculation
Credit Usage:
- Based on audio duration
- Credits per minute displayed
- Total cost estimated before generation
- Actual cost shown after completion
Cost Factors:
- Audio length (minutes)
- Credit rate per minute
- Voice type (some voices may vary)
- No additional charges for controls
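Because credits are based on audio duration and a per-minute rate, a rough estimate is a single multiplication. The sketch below assumes linear billing with no rounding or minimum charge. The function name and the rate of 10 credits per minute are placeholders, so use the rate displayed in the interface.

```python
def estimate_credits(audio_seconds, credits_per_minute):
    """Estimate credits for a clip, assuming linear per-minute billing."""
    if audio_seconds < 0 or credits_per_minute < 0:
        raise ValueError("duration and rate must be non-negative")
    return audio_seconds / 60 * credits_per_minute


# A 90-second clip at a placeholder rate of 10 credits per minute.
print(estimate_credits(90, 10))  # 15.0
```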
Best Practices
Text Preparation
Content Quality:
- Use clear, well-written text
- Check spelling and grammar
- Add appropriate punctuation
- Break long text into paragraphs
Text Optimization:
- Use the optimize button for Arabic text
- Add punctuation for pauses
- Consider text length
- Review before generating
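Breaking long text into shorter paragraphs can be done by hand, or scripted before pasting into the editor. The sketch below groups whole sentences into paragraphs under a length limit. The function name and the 300-character default are arbitrary assumptions, not limits of the product.

```python
import re


def split_into_paragraphs(text, max_chars=300):
    """Group whole sentences into paragraphs of at most `max_chars` characters.

    A single sentence longer than `max_chars` stays intact in its own paragraph.
    """
    # Split after ., !, ? or the Arabic question mark (U+061F) plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?\u061F])\s+", text.strip()) if s]
    paragraphs, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            paragraphs.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        paragraphs.append(current)
    return paragraphs


print(split_into_paragraphs("One. Two. Three.", max_chars=9))
```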
Voice Selection
Choosing the Right Voice:
- Match voice to content type
- Consider target audience
- Preview multiple voices
- Use favorites for consistency
Voice Consistency:
- Use the same voice for a series
- Mark frequently used voices as favorites
- Maintain voice across related content
- Create voice guidelines
Control Settings
Starting Point:
- Begin with default settings
- Adjust based on content
- Test different settings
- Save preferred settings
Fine-Tuning:
- Speed: Match content pace
- Expressiveness: Match content tone
- Adjust gradually
- Preview before final generation
Job Management
Organization:
- Use descriptive titles (if supported)
- Organize jobs by project
- Review job history regularly
- Delete unused jobs
Quality Check:
- Preview audio before using
- Review generated audio
- Regenerate if needed
- Export high-quality versions
Troubleshooting
Text Input Issues
Problem: Text not accepted
Solutions:
- Check text length limits
- Verify text format
- Remove special characters if needed
- Try simpler text
Voice Selection Issues
Problem: Voice not available
Solutions:
- Check voice filters
- Clear search/filters
- Verify voice availability
- Try different voice
Generation Issues
Problem: Job fails to generate
Solutions:
- Check text content
- Verify voice selection
- Check credit balance
- Try again with simpler text
Audio Quality Issues
Problem: Poor audio quality
Solutions:
- Check text quality
- Try different voice
- Adjust controls
- Review text formatting
Next Steps
After creating a TTS job:
- Voice Selection - Learn about voice options and selection
- Voice Controls - Understand control settings
- Managing Jobs - Organize and manage your TTS jobs
- Overview - Learn about Text to Speech features


