- Voice Cloning Recording Script
- Recording Setup Checklist
- Section 1 — Phoneme Coverage (All English Sounds)
- Section 2 — Prosody and Intonation
- Section 3 — Conversational and Casual
- Section 4 — Emotional Range
- Section 5 — Numbers, Dates, Technical Terms
- Section 6 — Sustained Narration (60-90 seconds)
- Section 7 — Multilingual Samples (Optional but Recommended)
- Recording Tips
Voice Cloning Recording Script
Recording Setup Checklist
- Quiet room, minimal reverb (your home studio with acoustic treatment is ideal)
- Consistent mic distance (~15-20 cm), slight off-axis to reduce plosives
- 48 kHz / 24-bit WAV (no compression)
- Pop filter on
- Record 10 seconds of room silence as a separate clip, to serve as a noise-floor reference
- Stay hydrated; avoid dairy and coffee beforehand
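Two items on the checklist above, the 48 kHz / 24-bit format and the silence clip, can be checked after the session with a short script. The following is a minimal sketch using only Python's standard `wave` module. The function names are illustrative, and it assumes plain PCM WAV files at 16-bit depth or deeper. Some recording tools write extended WAV headers that older Python versions of `wave` refuse to open, so treat a read error as a reason to inspect the file in your audio editor rather than as proof that the recording is bad.

```python
import math
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bit_depth, channels, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def meets_target(path, rate=48_000, bits=24):
    """True if the file matches the checklist format (48 kHz / 24-bit by default)."""
    actual_rate, actual_bits, _, _ = describe_wav(path)
    return actual_rate == rate and actual_bits == bits


def noise_floor_dbfs(path):
    """RMS level of a clip in dB relative to full scale (0 dBFS is the maximum).

    Intended for the 10-second silence clip. All channels are pooled together.
    """
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        if width < 2:
            # 8-bit WAV stores unsigned samples; this sketch does not handle that.
            raise ValueError("expected 16-bit or deeper PCM")
        raw = wav.readframes(wav.getnframes())

    # WAV samples are little-endian signed integers, `width` bytes each.
    samples = [
        int.from_bytes(raw[i:i + width], "little", signed=True)
        for i in range(0, len(raw), width)
    ]
    if not samples:
        raise ValueError("file contains no audio frames")

    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    full_scale = 2 ** (8 * width - 1)
    return 20 * math.log10(rms / full_scale)
```

A more negative result from `noise_floor_dbfs` means a quieter room. No pass/fail threshold is given here because the acceptable level depends on the cloning service; compare clips from different sessions against each other instead.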
Section 1 — Phoneme Coverage (All English Sounds)
Read each sentence clearly at a natural, conversational pace. Taken together, the sentences are designed to cover every English phoneme.
The quick brown fox jumps over the lazy dog near a huge vat of bubbling wax.
She sells sea shells on the sea shore, and the shells she sells are sea shells, I'm sure.
Please call Stella and ask her to bring these things with her from the store: six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
The beige hue on the waters of the loch impressed all, including the French queen, before she heard that symphony again, just as young Arthur wanted.
That red truck rushed through the moist, foggy path, kicking chunks of earth behind it with a thud.
Joe gazed at the azure sky, visualizing the journey ahead, then whispered a casual "yes" to the pleasurable thought.
Section 2 — Prosody and Intonation
These cover questions, exclamations, lists, and emphasis shifts.
Are you coming to the meeting tomorrow, or should I reschedule it to Thursday?
Wait — you actually finished the whole thing in one night? That's incredible!
I need three things: a new charger, a decent pair of headphones, and maybe some coffee.
It's not that I don't want to go. It's that I can't go. There's a difference.
Well, I suppose we could try it your way. But if it fails, don't say I didn't warn you.
So the question isn't whether it works — it's why it works so well.
Section 3 — Conversational and Casual
Read these as if you're talking to a colleague on a video call.
Yeah, so basically what happened was — the whole system went down around three, and nobody could figure out why until someone checked the logs.
Honestly, I think the second option makes more sense. But hey, I'm open to other ideas if you've got them.
Right, so let me just pull that up real quick... okay, here we go. So if you look at the numbers from last quarter...
I mean, it's a good product. No question. The issue is positioning — we're not telling the right story yet.
Cool, let me send that over after the call. Shouldn't take more than a few minutes.
Section 4 — Emotional Range
Label each one before recording so you can stay in the right register.
Confident / Assertive:
We've built something that solves a real problem, and the results speak for themselves. This is exactly where we need to be.
Warm / Encouraging:
You've done great work here. Seriously — take a moment to appreciate how far you've come. I'm impressed.
Concerned / Serious:
I want to be upfront with you — the timeline is tight, and if we don't address this now, we're going to run into real issues down the road.
Excited / Enthusiastic:
This is huge! I mean, do you realize what this means? We just unlocked an entirely new use case that nobody else is doing yet!
Calm / Explanatory:
So the way it works is actually quite straightforward. You upload the file, the system processes it in the background, and within a few seconds you get the result.
Frustrated / Impatient:
Look, we've been going back and forth on this for weeks. At some point, we just need to make a decision and move forward.
Section 5 — Numbers, Dates, Technical Terms
These train the model on content that is hard to pronounce from text alone: figures, dates, abbreviations, and spelled-out codes.
The total revenue for Q3 2025 was fourteen point seven million euros, up twelve percent year over year.
The meeting is scheduled for Wednesday, June eighteenth, at two thirty PM Central European Time.
Our API handles roughly forty-two thousand requests per second with a p99 latency of under two hundred milliseconds.
The serial number is Alpha-Bravo-Seven-Four-Two-Niner, and the firmware version is three point one point eight.
You can reach me at plus forty-nine, one-seven-one, five-five-five, zero-three-two-one.
Section 6 — Sustained Narration (60-90 seconds)
Read this as a single continuous passage. Keep your energy consistent from start to finish.
Let me walk you through how we approached this. About six months ago, we started noticing a pattern in the way our enterprise clients were using the platform. They weren't just uploading content and distributing it — they were building entire workflows around it. Training programs, onboarding sequences, compliance modules, the whole thing. And what struck us was that most of them were doing this manually. They'd record a video, upload it, tag it, assign it to a group, and then track completion in a spreadsheet somewhere. It was functional, but it was slow and error-prone. So we asked ourselves: what if the platform could handle all of that automatically? What if you could just drop a video in, and the system would generate chapters, suggest tags, create a quiz, and route it to the right audience — all without human intervention? That's essentially what we built. And the feedback has been overwhelmingly positive. Clients are saving hours per week, engagement is up, and completion rates have nearly doubled in some cases.
Section 7 — Multilingual Samples (Optional but Recommended)
If the cloning service supports multilingual voices, add short passages in your other languages.
German:
Guten Tag, mein Name ist Nic. Ich arbeite im Bereich Enterprise-Video und helfe Unternehmen dabei, ihre Kommunikation und Schulungen mit KI-gestützten Lösungen zu verbessern.
French:
Bonjour, je m'appelle Nic. Je travaille dans le domaine de la vidéo d'entreprise et j'aide les organisations à transformer leur communication interne grâce à l'intelligence artificielle.
Recording Tips
- Do two takes of each section — one careful, one natural
- Pause 2-3 seconds between sentences
- If you stumble, pause, then re-read the full sentence from the beginning
- Total recording time: aim for 20-30 minutes of clean audio
- Export individual sections as separate WAV files for easier processing
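Once the sections are exported as separate WAV files, the 20-30 minute target above can be checked by adding up the file durations. The sketch below assumes all exported files sit in one folder with a lowercase `.wav` extension; the function names and the folder layout are illustrative, not part of any cloning service's tooling. It measures total file length, so pauses and discarded takes still count unless you trim them before export.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def session_report(folder, min_minutes=20, max_minutes=30):
    """Sum the durations of every .wav file in `folder`.

    Returns (per_file_seconds, total_minutes, status), where status is
    "short", "ok", or "long" relative to the target range.
    """
    files = sorted(Path(folder).glob("*.wav"))
    per_file = {f.name: wav_seconds(f) for f in files}
    total_minutes = sum(per_file.values()) / 60

    if total_minutes < min_minutes:
        status = "short"
    elif total_minutes > max_minutes:
        status = "long"
    else:
        status = "ok"
    return per_file, total_minutes, status
```

Calling `session_report("exports")` on a hypothetical `exports` folder returns the per-file durations alongside the total, which makes it easy to spot a section that was exported empty or cut off early.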