Voice-Over Glossary
Plain-English definitions of voice-over and audio production terminology — from IVR and ADR to M&E, ACX, walla, tajweed, and broadcast loudness.
Voice-over and audio production have their own vocabulary, which can confuse clients new to the industry. This glossary defines the terms you will encounter when ordering or producing voice-over work: telephony terms (IVR, on-hold, µ-Law, A-Law), broadcast terms (LUFS, true-peak, M&E), narration terms (ACX, finished hour, pickup), dubbing terms (ADR, lip-sync, walla), and Arabic-specific terms (tajweed, tashkeel, MSA, Khaleeji).
IVR
Interactive Voice Response. The telephone menu system that answers business phone calls and routes callers to the right department. IVR voice-over consists of recorded prompts ("Press 1 for sales, 2 for support") combined at runtime by the IVR engine.
On-Hold (or Music On-Hold)
Audio content played to callers waiting in a phone queue. Best practice is rotating informational segments (services, promotions, location) interleaved with licensed music — not silence and not generic music alone.
µ-Law (Mu-Law) and A-Law
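Companding codecs used in telephone systems, both defined in ITU-T G.711. µ-Law is standard in North America and Japan and is a common default on Cisco systems; A-Law is standard in European telephony and appears in many Avaya deployments. IVR voice-over is typically delivered as 8 kHz mono WAV in one of these codecs; confirm the exact format against the phone system's documentation.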
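To illustrate what companding does, here is a minimal Python sketch of the continuous µ-law curve with µ = 255. It is an illustration only: real G.711 encoders use a piecewise-linear approximation and emit 8-bit codes, and the function names are hypothetical.

```python
import math

MU = 255  # value used by North American / Japanese telephony


def mu_law_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Invert mu_law_compress, recovering the linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    # Quiet samples are boosted so they survive coarse 8-bit quantisation.
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, round(mu_law_compress(sample), 3))
```

The curve gives quiet passages proportionally more of the available range, which is why speech remains intelligible at telephone bit rates.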
Concatenable Prompts
IVR prompts recorded so that short phrases can be combined at runtime to form longer messages, for example "Your balance is" + "five hundred" + "and twenty-three" + "dirhams". Requires careful prosodic engineering so that the seams are inaudible.
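The runtime assembly can be sketched in Python as follows. This is a simplified illustration, not any particular IVR engine's API: the prompt identifiers (such as your_balance_is) are hypothetical file names, and only whole amounts from 0 to 999 are handled.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_prompts(n):
    """Return the ordered prompt IDs that read out an integer 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]}_hundred")
    if rest or not hundreds:
        # A separate "and_..." recording carries mid-sentence intonation.
        prefix = "and_" if hundreds else ""
        if rest < 20:
            word = ONES[rest]
        else:
            tens, ones = divmod(rest, 10)
            word = TENS[tens] + (f"_{ONES[ones]}" if ones else "")
        parts.append(prefix + word)
    return parts


def balance_prompts(amount, currency="dirhams"):
    """Build the full playlist for a balance announcement."""
    return ["your_balance_is", *number_prompts(amount), currency]


if __name__ == "__main__":
    print(balance_prompts(523))
```

Recording "and twenty-three" as its own prompt, rather than reusing a standalone "twenty-three", is one way to keep the intonation natural across the seam.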
ADR (Automated Dialogue Replacement)
Re-recording dialogue in post-production to replace on-set audio that was unusable. Performed in a recording studio with the actor watching the original scene and matching lip movements. Standard practice in film and TV post-production.
Lip-Sync
Recording voice-over to match the mouth movements of an on-screen actor — required for dubbing foreign-language film and TV into another language. Different from "off-screen" voice-over where lip movement doesn't need matching.
M&E (Music and Effects)
Audio mix containing all elements of a film or TV soundtrack except dialogue. Required for dubbing into other languages — the localised dialogue track is mixed against the M&E to produce the final dubbed version.
Walla
Crowd background noise dubbed into a scene — restaurants, stadiums, schools, conferences. Recorded with multiple voice talents to create realistic ambient sound. Often improvised with general "background talk" rather than scripted.
ACX
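Audiobook Creation Exchange: the audiobook production and distribution marketplace run by Audible (an Amazon company), along with its technical submission standards. ACX-compliant audio is a 192 kbps or higher constant-bit-rate MP3 at 44.1 kHz, with RMS between -23 dB and -18 dB, peaks no higher than -3 dB, a noise floor no higher than -60 dB RMS, and room tone at the head and tail of each file.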
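As a rough illustration of the level limits quoted above (RMS -23 to -18 dB, peak -3 dB max), the following Python sketch measures a block of samples normalised to the range -1 to 1. It is a simplified check with hypothetical function names, not ACX's own validator, which may measure RMS differently.

```python
import math


def to_db(value):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(value) if value > 0 else float("-inf")


def measure_levels(samples):
    """Return (rms_db, peak_db) for samples normalised to [-1, 1]."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_db(rms), to_db(peak)


def meets_level_limits(samples):
    """Check the RMS window and peak ceiling quoted in the glossary entry."""
    rms_db, peak_db = measure_levels(samples)
    return -23.0 <= rms_db <= -18.0 and peak_db <= -3.0


if __name__ == "__main__":
    # One second of a 441 Hz test tone at 44.1 kHz, amplitude 0.15.
    tone = [0.15 * math.sin(2 * math.pi * 441 * n / 44100) for n in range(44100)]
    print(measure_levels(tone), meets_level_limits(tone))
```

A real check would decode the delivered MP3 first and also measure the noise floor during the room-tone sections.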
Finished Hour
One hour of completed, listener-ready audiobook content — not one hour of studio time. A skilled narrator produces 1.5-2.5 finished hours per studio day. Used as the standard pricing unit for audiobook work.
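The arithmetic behind a finished-hour quote can be sketched in Python as below. The daily output range comes from the figures above; the rate in the example is a made-up placeholder, not an actual price.

```python
def studio_days(finished_hours, low_per_day=1.5, high_per_day=2.5):
    """Return (fewest, most) studio days for a given number of finished hours."""
    return finished_hours / high_per_day, finished_hours / low_per_day


def narration_fee(finished_hours, rate_per_finished_hour):
    """Fee is charged on finished audio, not on time spent in the booth."""
    return finished_hours * rate_per_finished_hour


if __name__ == "__main__":
    fewest, most = studio_days(10)
    print(f"10 finished hours: {fewest:.1f} to {most:.1f} studio days")
    print(narration_fee(10, 250))  # 250 is a hypothetical rate
```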
Pickup (or Pickup Credit)
A small re-recording to fix an isolated issue — mispronounced word, awkward pacing on one line, missed phrase. Most voice-over services include 2-3 free pickup credits per project before charging for additional revisions.
Room Tone
A brief recording of the recording booth's ambient sound with no one speaking. Used in post-production to fill silent gaps without sudden dead-silence cuts. Required for ACX audiobook delivery: ACX asks for 0.5 to 1 second of room tone at the head of each file and 1 to 5 seconds at the tail.
Source-Connect
Industry-standard software for high-quality remote-direction voice-over recording sessions. The director, who can be anywhere in the world, listens to the talent in the Dubai studio in real time at broadcast quality and gives live direction.
ipDTL
Browser-based alternative to Source-Connect for remote-direction sessions. No software installation required for the directing end — runs in Chrome or Edge. Used widely in broadcast journalism for remote contributor audio.
LUFS (Loudness Units relative to Full Scale)
The modern standard unit for measuring perceived audio loudness. Broadcast TV targets vary by region: -23 LUFS for European broadcast (EBU R 128) and -24 LUFS for US broadcast (ATSC A/85, where the unit is written LKFS); streaming services set their own targets. Loudness metering replaces older peak-only measurements, which did not reflect perceived loudness.
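Because one loudness unit corresponds to one decibel of gain, moving a mix to a delivery target is simple arithmetic once its integrated loudness has been measured. The Python sketch below assumes the measurement has already been taken with a loudness meter; the dictionary keys and function name are hypothetical.

```python
# Targets quoted in the entry above.
TARGETS_LUFS = {"ebu_r128": -23.0, "atsc_a85": -24.0}


def gain_to_target(measured_lufs, target_lufs):
    """Return (gain_db, linear_factor) needed to hit the target loudness."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


if __name__ == "__main__":
    gain_db, factor = gain_to_target(-20.0, TARGETS_LUFS["ebu_r128"])
    print(f"apply {gain_db:+.1f} dB (multiply samples by {factor:.3f})")
```

Measuring the loudness itself requires the K-weighting and gating defined in ITU-R BS.1770, which this sketch does not attempt.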
True-Peak
The peak level of the reconstructed analogue waveform after digital-to-analogue conversion, including inter-sample peaks that standard sample-peak meters cannot see. Broadcast standards typically require a true-peak ceiling of -1 dBTP or lower (EBU R 128 sets -1 dBTP; ATSC A/85 recommends -2 dBTP) to prevent clipping in transmission chains.
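The Python sketch below shows how an inter-sample peak can hide from a sample-peak meter. It uses a tone at one quarter of the sample rate whose samples all land about 3 dB below the crest of the waveform, then estimates the true peak by 4x oversampling with plain sinc interpolation. This is a simplification for illustration: production meters follow ITU-R BS.1770 and use a designed FIR filter, and the function names are hypothetical.

```python
import math


def sinc(x):
    """Normalised sinc, the ideal reconstruction kernel."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sample_peak(samples):
    """Largest absolute sample value, as a basic peak meter reports it."""
    return max(abs(s) for s in samples)


def estimated_true_peak(samples, oversample=4):
    """Estimate the reconstructed waveform's peak by sinc interpolation.

    Only the middle half of the block is scanned, because a truncated
    sinc sum is least accurate near the block edges.
    """
    n = len(samples)
    lo, hi = n // 4, 3 * n // 4
    peak = 0.0
    for i in range(lo * oversample, hi * oversample):
        t = i / oversample
        value = sum(s * sinc(t - k) for k, s in enumerate(samples))
        peak = max(peak, abs(value))
    return peak


if __name__ == "__main__":
    # Full-scale tone at fs/4, phase-shifted so every sample misses the crest.
    tone = [math.sin(math.pi * k / 2 + math.pi / 4) for k in range(64)]
    print(round(sample_peak(tone), 3), round(estimated_true_peak(tone), 2))
```

The sample-peak reading sits near 0.707 (about -3 dBFS) while the interpolated estimate lands close to full scale, which is why a file that looks safe on a sample-peak meter can still clip downstream.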
MSA (Modern Standard Arabic)
The formal written and broadcast standard of the Arabic language, widely understood across the 22 Arab League countries but no one's native spoken dialect. Used for news, formal corporate communications, government, religious texts, and education.
Tashkeel
The diacritical marks added to Arabic text to indicate vowel sounds — necessary for precise pronunciation, particularly in religious content, formal MSA broadcasting, and educational material. Most Arabic texts are written without tashkeel; reading them correctly requires native fluency or marked text.
Tajweed
The rules of proper Quranic recitation — covering pronunciation, articulation points (makharij), elongation (mudood), nasalisation (ghunnah), and percussive consonants (qalqalah). Required for Quranic content; specialist talents are trained specifically in tajweed-compliant recitation.
Khaleeji
A pan-Gulf Arabic register that draws on features common across UAE, Saudi, Kuwaiti, Bahraini, Qatari, and Omani dialects. Used for cross-border GCC content where you want regional reach without specifying one country.