What is Speech to Text?
Speech to Text (STT) converts audio and video content into structured, editable text transcripts. It is the foundational feature that enables content reuse across the platform, providing accurate transcription with speaker detection, time-based segmentation, and comprehensive management tools.
Speech to Text enables you to:
- Transcribe audio and video files into editable text
- Automatically detect and separate speakers
- Edit and refine transcriptions
- Export transcripts in multiple formats
- Manage multiple transcription jobs
Core Capabilities
Audio and Video Transcription
Transform your media files into accurate text transcripts with support for:
- Multiple file formats: MP3, MP4, WAV, AVI, MOV, WMV, WEBM
- File uploads: Direct upload from your device (max 200MB)
- YouTube videos: Transcribe directly from YouTube links
- Live recording: Record audio directly in the browser
- Language selection: Arabic by default with English support
Speaker Detection and Separation
Automatically identify and separate different speakers in your recordings:
- Automatic speaker separation: Distinguishes between different speakers
- Speaker labeling: Automatically assigns labels (Speaker 1, Speaker 2, etc.)
- Speaker management: Rename, merge, and organize speakers
- Visual separation: Clear visual indicators for different speakers
Time-Based Segmentation
Transcripts are organized into segments synchronized with audio playback:
- Automatic segmentation: Content divided into manageable segments
- Timestamp synchronization: Each segment linked to specific time points
- Audio playback: Click segments to play corresponding audio
- Segment editing: Edit individual segments independently
Input Methods
File Upload
Upload audio or video files directly from your device:
- Supported formats: MP3, MP4, WAV, AVI, MOV, WMV, WEBM
- Maximum size: 200MB per file
- Multiple files: Upload up to 5 files at once
- Drag and drop: Easy file selection interface
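Before uploading, a client can check a batch against the limits listed above. This is a minimal sketch, not the platform's own validation: the constant and function names are hypothetical, and reading "200MB" as 200 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Limits from the list above; treating 200MB as 200 * 1024 * 1024 bytes is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".avi", ".mov", ".wmv", ".webm"}
MAX_FILE_BYTES = 200 * 1024 * 1024
MAX_FILES_PER_UPLOAD = 5


def validate_upload(files: list[tuple[str, int]]) -> list[str]:
    """Return problems found in a batch of (file name, size in bytes) pairs.

    An empty list means the batch passes these checks.
    """
    errors = []
    if len(files) > MAX_FILES_PER_UPLOAD:
        errors.append(f"Too many files: {len(files)} (maximum {MAX_FILES_PER_UPLOAD})")
    for name, size in files:
        # Compare extensions case-insensitively, so "talk.MP3" is accepted.
        ext = Path(name).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            errors.append(f"{name}: exceeds the 200MB limit")
    return errors


print(validate_upload([("interview.MP3", 12_000_000), ("notes.pdf", 1_000)]))
# ['notes.pdf: unsupported format .pdf']
```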
YouTube Link Transcription
Transcribe videos directly from YouTube:
- URL input: Paste YouTube video links
- Automatic extraction: System extracts audio from video
- Video metadata: Automatically retrieves video title
- Language selection: Specify the primary language
Live Audio Recording
Record audio directly in your browser:
- In-browser recording: No external software required
- Real-time monitoring: See recording status and duration
- Pause and resume: Full control over recording process
- Immediate processing: Start transcription right after recording
Transcript Management
Viewing and Organization
- Transcript list: View all your transcription jobs
- Status tracking: Monitor job progress (In Progress, Completed, Failed)
- Search functionality: Find transcripts by file name
- Filtering options: Filter by status
- Pagination: Navigate through large lists efficiently
Editing Capabilities
- Inline editing: Edit transcript text directly
- Segment-level editing: Modify individual segments
- Copy and move: Move text between segments
- Create segments: Add new segments as needed
- Merge content: Combine segments across speakers
File Management
- Rename transcripts: Update job titles
- Delete transcripts: Remove jobs after confirming in a dialog
- Job status: Track processing status in real time
- Job history: Access previous transcription jobs
Speaker Management
Speaker Organization
- Automatic detection: System identifies speakers automatically
- Speaker labels: View all speakers in your transcript
- Rename speakers: Assign meaningful names to speakers
- Add speakers: Manually add new speakers if needed
Speaker Operations
- Merge speakers: Combine multiple speakers into one
- Validation: System prevents invalid merge operations
- Deletion protection: Cannot delete speakers with associated segments
- Speaker count: Visual indicators for speaker distribution
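The speaker rules above can be expressed as a few checks over the segment list. In this minimal sketch, speakers are plain names and each segment is a dictionary with a `speaker` key (both assumptions). Which merges count as invalid is not spelled out in this documentation, so the sketch assumes that merging a speaker into itself or into a nonexistent speaker is rejected.

```python
from collections import Counter


def merge_speakers(speakers: set[str], segments: list[dict], source: str, target: str) -> None:
    """Reassign every segment from `source` to `target`, then remove `source`."""
    # Assumed validation rules; the documentation does not list them.
    if source == target:
        raise ValueError("Cannot merge a speaker into itself")
    if source not in speakers or target not in speakers:
        raise ValueError("Both speakers must exist")
    for segment in segments:
        if segment["speaker"] == source:
            segment["speaker"] = target
    speakers.remove(source)


def delete_speaker(speakers: set[str], segments: list[dict], name: str) -> None:
    """Delete a speaker only when no segment is still assigned to it."""
    if any(segment["speaker"] == name for segment in segments):
        raise ValueError(f"{name} still has associated segments")
    speakers.discard(name)


def speaker_counts(segments: list[dict]) -> Counter[str]:
    """Count segments per speaker, the kind of figure a distribution indicator shows."""
    return Counter(segment["speaker"] for segment in segments)


speakers = {"Speaker 1", "Speaker 2", "Speaker 3"}
segments = [
    {"speaker": "Speaker 1", "text": "Hello."},
    {"speaker": "Speaker 3", "text": "Hi there."},
    {"speaker": "Speaker 1", "text": "Shall we start?"},
]
merge_speakers(speakers, segments, source="Speaker 3", target="Speaker 1")
print(speaker_counts(segments))  # Counter({'Speaker 1': 3})
delete_speaker(speakers, segments, "Speaker 2")  # allowed: no segments reference it
```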
Segment Editing
Segment Operations
- Edit segments: Modify text content of individual segments
- Copy segments: Duplicate segment content
- Move segment content: Reorganize content between segments
- Create segments: Add new segments manually
- Delete segments: Remove unwanted segments
Text Management
- Inline editing: Direct text editing within segments
- Copy and paste: Easy text manipulation
- Search within segments: Find specific content
- Timestamp preservation: Maintain audio synchronization
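Timestamp preservation matters most when segments are combined. This hypothetical sketch merges two consecutive segments so the result spans both original time ranges; keeping the first segment's speaker when merging across speakers is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment record; times are seconds from the start of the media.
    start: float
    end: float
    speaker: str
    text: str


def merge_segments(first: Segment, second: Segment) -> Segment:
    """Combine two consecutive segments into one that covers both time ranges.

    The merged segment starts where the first began and ends where the second
    ended, so its text still lines up with the audio during playback.
    """
    if second.start < first.start:
        raise ValueError("Segments must be passed in playback order")
    return Segment(
        start=first.start,
        end=max(first.end, second.end),
        speaker=first.speaker,  # assumption: the earlier segment's speaker wins
        text=f"{first.text} {second.text}".strip(),
    )


merged = merge_segments(
    Segment(0.0, 4.2, "Speaker 1", "Welcome,"),
    Segment(4.2, 9.8, "Speaker 2", "everyone."),
)
print(merged)  # Segment(start=0.0, end=9.8, speaker='Speaker 1', text='Welcome, everyone.')
```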
Export Capabilities
Export your transcripts in multiple formats:
Document Formats
- DOCX: Microsoft Word format for editing
- PDF: Portable document format for sharing
- TXT: Plain text format for simple use
- HTML: Web-ready format with formatting
Structured Formats
- JSON: Structured data format for integration
- SRT: Subtitle format for video synchronization
Export Features
- One-click download: Quick export in any format
- Format-specific formatting: Optimized output per format
- Metadata preservation: Includes timestamps and speaker information
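SRT is a simple, widely supported subtitle format: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the cue text and a blank line. This sketch shows how timed, speaker-labeled segments map onto that layout. It is not the platform's exporter; the tuple shape and the `Speaker: text` prefix are assumptions, since SRT has no dedicated speaker field.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str, str]]) -> str:
    """Render (start, end, speaker, text) segments as SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        # Speaker name as a text prefix is an assumed convention.
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    return "\n".join(cues)


print(srt_timestamp(3725.5))  # 01:02:05,500
print(to_srt([(0.0, 4.2, "Speaker 1", "Welcome."), (4.2, 9.8, "Speaker 2", "Thanks.")]))
```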
Processing States
Transcription jobs move through different states:
| Status | Description | Next Action |
|---|---|---|
| PENDING | Transcription is being processed | Wait for completion |
| COMPLETED | Transcription finished successfully | View and edit transcript |
| FAILED | Processing encountered an error | Review error and retry |
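The table can be read as a small state machine. In this sketch the states and next actions come from the table, while the transitions (a pending job either completes or fails, and a retry puts a failed job back into PENDING) are assumptions; the documentation does not say whether a retry reuses the job or creates a new one.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transitions; the platform's internal rules may differ.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.FAILED: {JobStatus.PENDING},  # retry
    JobStatus.COMPLETED: set(),  # terminal
}

# Next actions copied from the table above.
NEXT_ACTION = {
    JobStatus.PENDING: "Wait for completion",
    JobStatus.COMPLETED: "View and edit transcript",
    JobStatus.FAILED: "Review error and retry",
}


def transition(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the move is not an allowed transition."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Invalid transition: {current.value} -> {new.value}")
    return new


status = transition(JobStatus.PENDING, JobStatus.FAILED)
print(NEXT_ACTION[status])  # Review error and retry
```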
Key Features
Language Handling
- Bilingual support: Handles mixed Arabic and English content
- Language selection: Specify language when creating job
- Accent recognition: Handles various accents and dialects
Real-Time Status Updates
- Live status tracking: See job progress in real time
- Status indicators: Clear visual status indicators
- Completion notifications: Get notified when jobs complete
- Error notifications: Immediate feedback on failures
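How the interface receives status updates is not described here. For illustration only, a simple polling loop is one way a script could wait for a job to reach a final state. In this sketch `fetch_status` is a hypothetical callable standing in for whatever returns the job's current status; it is not a documented API.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll until the job reports COMPLETED or FAILED, then return that status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout_s} seconds")
        time.sleep(interval_s)


# Simulated status sequence standing in for a real job.
statuses = iter(["PENDING", "PENDING", "COMPLETED"])
print(wait_for_job(lambda: next(statuses), interval_s=0.1))  # COMPLETED
```

A fixed interval keeps the sketch simple; a real client might back off between polls or use push notifications instead.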
Search and Filter
- Global search: Search across all transcript titles
- Status filtering: Filter by job status
Use Cases
Meeting Transcription
Transcribe meetings and conferences:
- Upload meeting recordings
- Automatic speaker separation
- Export transcription
Content Creation
Create high-quality written content from audio and video using Speech to Text and AI Docs:
- Accurately transcribe podcasts, interviews, and meetings
- Transform audio recordings into well-structured blog posts and articles
- Generate engaging social media posts from spoken content
- Produce documentation from audio and video sources
Accessibility
Make content accessible:
- Generate captions for videos
- Create transcripts for audio content
- Export in multiple formats
Research and Documentation
Document research and interviews:
- Transcribe research interviews
- Organize by speaker
- Export for analysis
- Maintain timestamps for reference
Getting Started
Create Your First Transcription
- Upload a file, paste a YouTube link, or record audio
- Select the primary language
- Enter a title for your transcript
- Submit and wait for processing
Review and Edit
- View your completed transcript
- Edit segments as needed
- Rename speakers
- Organize content
Export Your Transcript
- Choose your export format
- Download your transcript
- Share with your team
What’s Next?
- Creating Transcriptions - Learn how to create transcription jobs
- Managing Transcripts - Organize and edit your transcripts
- Speaker Management - Work with speaker separation
- Export Options - Export in various formats