What is Speech to Text?

Speech to Text (STT) converts audio and video content into structured, editable text transcripts. It is the foundational feature that enables content reuse across the platform, providing accurate transcription with speaker detection, time-based segmentation, and comprehensive management tools.
Speech to Text enables you to:
  • Transcribe audio and video files into editable text
  • Automatically detect and separate speakers
  • Edit and refine transcriptions
  • Export transcripts in multiple formats
  • Manage multiple transcription jobs

Core Capabilities

Audio and Video Transcription

Transform your media files into accurate text transcripts with support for:
  • Multiple file formats: MP3, MP4, WAV, AVI, MOV, WMV, WEBM
  • File uploads: Direct upload from your device (max 200MB)
  • YouTube videos: Transcribe directly from YouTube links
  • Live recording: Record audio directly in the browser
  • Language selection: Arabic by default with English support

Speaker Detection and Separation

Automatically identify and separate different speakers in your recordings:
  • Automatic speaker separation: Distinguishes between different speakers
  • Speaker labeling: Automatically assigns labels (Speaker 1, Speaker 2, etc.)
  • Speaker management: Rename, merge, and organize speakers
  • Visual separation: Clear visual indicators for different speakers

Time-Based Segmentation

Transcripts are organized into segments synchronized with audio playback:
  • Automatic segmentation: Content divided into manageable segments
  • Timestamp synchronization: Each segment linked to specific time points
  • Audio playback: Click segments to play corresponding audio
  • Segment editing: Edit individual segments independently
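As a rough illustration of how click-to-play and timestamp synchronization could work, the sketch below models a segment and looks up the segment that covers a given playback position. The `Segment` fields and the `segment_at` helper are hypothetical names chosen for this example; the platform's actual data model is not documented here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical segment shape: times in seconds, plus speaker and text."""
    start: float
    end: float
    speaker: str
    text: str


def segment_at(segments: list[Segment], position: float) -> Optional[Segment]:
    """Return the segment covering `position`, or None if it falls in a gap.

    Assumes `segments` is sorted by start time and segments do not overlap.
    """
    starts = [s.start for s in segments]
    # Index of the last segment whose start is <= position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < segments[i].end:
        return segments[i]
    return None


segments = [
    Segment(0.0, 4.2, "Speaker 1", "Welcome, everyone."),
    Segment(4.2, 9.8, "Speaker 2", "Thanks for having me."),
]
current = segment_at(segments, 5.0)  # the segment to highlight at 5 seconds
```

Treating each segment as a half-open interval (start included, end excluded) means a position exactly on a boundary belongs to the later segment, so no position matches two segments.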

Input Methods

File Upload

Upload audio or video files directly from your device:
  • Supported formats: MP3, MP4, WAV, AVI, MOV, WMV, WEBM
  • Maximum size: 200MB per file
  • Multiple files: Upload up to 5 files at once
  • Drag and drop: Easy file selection interface
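
The upload limits above can be checked on the client before any data is sent. This is a minimal sketch, not the platform's own validation: `validate_batch` is a hypothetical helper, and it assumes "200MB" means 200 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from the list above; the byte interpretation is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".avi", ".mov", ".wmv", ".webm"}
MAX_FILE_BYTES = 200 * 1024 * 1024
MAX_FILES_PER_BATCH = 5


def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """Return a list of problems for a batch of (file name, size in bytes).

    An empty list means the batch passes all three checks.
    """
    problems = []
    if len(files) > MAX_FILES_PER_BATCH:
        problems.append(f"too many files: {len(files)} > {MAX_FILES_PER_BATCH}")
    for name, size in files:
        # Compare extensions case-insensitively so "TALK.MP3" is accepted.
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds 200MB")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets an upload form show all issues in a single pass.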

YouTube Videos

Transcribe videos directly from YouTube:
  • URL input: Paste YouTube video links
  • Automatic extraction: System extracts audio from video
  • Video metadata: Automatically retrieves video title
  • Language selection: Specify the primary language
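
Before submitting a pasted link, it can help to confirm that it actually points at a YouTube video. The sketch below handles the two common link shapes (`youtube.com/watch?v=...` and `youtu.be/...`) using only the Python standard library. `youtube_video_id` is a hypothetical helper, and the platform may accept other link forms that this example does not cover.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video id from a YouTube link, or return None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the id as the first path component.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Standard links carry the id in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```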

Live Audio Recording

Record audio directly in your browser:
  • In-browser recording: No external software required
  • Real-time monitoring: See recording status and duration
  • Pause and resume: Full control over recording process
  • Immediate processing: Start transcription right after recording

Transcript Management

Viewing and Organization

  • Transcript list: View all your transcription jobs
  • Status tracking: Monitor job progress (In Progress, Completed, Failed)
  • Search functionality: Find transcripts by file name
  • Filtering options: Filter by status
  • Pagination: Navigate through large lists efficiently

Editing Capabilities

  • Inline editing: Edit transcript text directly
  • Segment-level editing: Modify individual segments
  • Copy and move: Copy or move text between segments
  • Create segments: Add new segments as needed
  • Merge content: Combine segments across speakers

File Management

  • Rename transcripts: Update job titles
  • Delete transcripts: Remove jobs with confirmation dialog
  • Job status: Track processing status in real-time
  • Job history: Access previous transcription jobs

Speaker Management

Speaker Organization

  • Automatic detection: System identifies speakers automatically
  • Speaker labels: View all speakers in your transcript
  • Rename speakers: Assign meaningful names to speakers
  • Add speakers: Manually add new speakers if needed

Speaker Operations

  • Merge speakers: Combine multiple speakers into one
  • Validation: System prevents invalid merge operations
  • Deletion protection: Speakers with associated segments cannot be deleted
  • Speaker count: Visual indicators for speaker distribution
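
The merge validation and deletion protection described above can be pictured with a small in-memory model. Everything here (`Transcript`, `merge_speakers`, `delete_speaker`, and the specific validation rules) is a hypothetical sketch of the behavior, not the platform's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical model: speaker ids mapped to display names, plus segments."""
    speakers: dict[str, str]
    # Each segment is a (speaker_id, text) pair.
    segments: list[tuple[str, str]] = field(default_factory=list)


def merge_speakers(t: Transcript, source: str, target: str) -> None:
    """Reassign every segment from `source` to `target`, then drop `source`."""
    if source == target:
        raise ValueError("cannot merge a speaker into itself")
    if source not in t.speakers or target not in t.speakers:
        raise ValueError("unknown speaker")
    t.segments = [
        (target if sid == source else sid, text) for sid, text in t.segments
    ]
    del t.speakers[source]


def delete_speaker(t: Transcript, speaker_id: str) -> None:
    """Delete a speaker only if no segment still references it."""
    if speaker_id not in t.speakers:
        raise ValueError("unknown speaker")
    if any(sid == speaker_id for sid, _ in t.segments):
        raise ValueError("speaker still has segments; merge or reassign first")
    del t.speakers[speaker_id]
```

In this model, merging is the way to remove a speaker who still has segments: the segments move to the target first, so no segment is ever left without a speaker.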

Segment Editing

Segment Operations

  • Edit segments: Modify text content of individual segments
  • Copy segments: Duplicate segment content
  • Move segment content: Reorganize content between segments
  • Create segments: Add new segments manually
  • Delete segments: Remove unwanted segments

Text Management

  • Inline editing: Direct text editing within segments
  • Copy and paste: Easy text manipulation
  • Search within segments: Find specific content
  • Timestamp preservation: Maintain audio synchronization

Export Capabilities

Export your transcripts in multiple formats:

Document Formats

  • DOCX: Microsoft Word format for editing
  • PDF: Portable document format for sharing
  • TXT: Plain text format for simple use
  • HTML: Web-ready format with formatting

Structured Formats

  • JSON: Structured data format for integration
  • SRT: Subtitle format for video synchronization

Export Features

  • One-click download: Quick export in any format
  • Format-specific formatting: Optimized output per format
  • Metadata preservation: Includes timestamps and speaker information
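
To show how timestamps and speaker information can carry over into a subtitle file, here is a minimal sketch that renders segments as SRT cues. The cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the SRT format. The segment dictionary keys and the choice to prefix each cue with the speaker name are assumptions for illustration, and the platform's own SRT output may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments (keys: start, end, speaker, text) as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


example = to_srt([
    {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Welcome."},
    {"start": 4.2, "end": 9.8, "speaker": "Speaker 2", "text": "Thanks."},
])
```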

Processing States

Transcription jobs move through different states:

Status     | Description                          | Next Action
PENDING    | Transcription is being processed     | Wait for completion
COMPLETED  | Transcription finished successfully  | View and edit transcript
FAILED     | Processing encountered an error      | Review error and retry

Failed jobs cannot be used until the error is resolved. Check the error message for details on what went wrong.
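
A client that tracks a job could model these states and wait until the job leaves PENDING. The sketch below is an assumption-heavy illustration: `wait_for_job` is a hypothetical helper, `fetch_status` stands in for whatever call returns the current status string, and polling at a fixed interval is just one possible approach.

```python
import time
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> JobStatus:
    """Poll until the job is COMPLETED or FAILED, then return that status.

    `fetch_status` is a caller-supplied function (hypothetical here) that
    returns one of the status strings from the table above.
    """
    for _ in range(max_polls):
        status = JobStatus(fetch_status())
        if status is not JobStatus.PENDING:
            return status
        sleep(interval)
    raise TimeoutError("job still pending after max_polls checks")
```

Passing `sleep` as a parameter keeps the loop easy to test, since a test can supply a no-op instead of waiting in real time.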

Key Features

Language Handling

  • Bilingual support: Handles bilingual Arabic-English content
  • Language selection: Specify language when creating job
  • Accent recognition: Handles various accents and dialects

Real-Time Status Updates

  • Live status tracking: See job progress in real-time
  • Status indicators: Clear visual status indicators
  • Completion notifications: Get notified when jobs complete
  • Error notifications: Immediate feedback on failures

Search and Filter

  • Global search: Search across all transcript titles
  • Status filtering: Filter by job status
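
Title search combined with status filtering and pagination might look like the following. This is a sketch over an in-memory list, with hypothetical `filter_jobs` and `paginate` helpers and an assumed job shape (`title` and `status` keys). It is not a description of how the platform implements search.

```python
from typing import Optional


def filter_jobs(
    jobs: list[dict], query: str = "", status: Optional[str] = None
) -> list[dict]:
    """Keep jobs whose title contains `query` (case-insensitive) and,
    if `status` is given, whose status matches it exactly."""
    q = query.strip().casefold()
    return [
        job
        for job in jobs
        if q in job["title"].casefold()
        and (status is None or job["status"] == status)
    ]


def paginate(items: list, page: int, page_size: int = 10) -> list:
    """Return the items on a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]
```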

Use Cases

Meeting Transcription

Transcribe meetings and conferences:
  • Upload meeting recordings
  • Automatic speaker separation
  • Export the transcript

Content Creation

Create high-quality written content from audio and video using Speech to Text and AI Docs:
  • Accurately transcribe podcasts, interviews, and meetings
  • Transform audio recordings into well-structured blog posts and articles
  • Generate engaging social media posts from spoken content
  • Produce documentation from audio and video sources

Accessibility

Make content accessible:
  • Generate captions for videos
  • Create transcripts for audio content
  • Export in multiple formats

Research and Documentation

Document research and interviews:
  • Transcribe research interviews
  • Organize by speaker
  • Export for analysis
  • Maintain timestamps for reference

Getting Started

  1. Create Your First Transcription
    • Upload a file, paste a YouTube link, or record audio
    • Select the primary language
    • Enter a title for your transcript
    • Submit and wait for processing
  2. Review and Edit
    • View your completed transcript
    • Edit segments as needed
    • Rename speakers
    • Organize content
  3. Export Your Transcript
    • Choose your export format
    • Download your transcript
    • Share with your team
