Google Play currently offers authors auto-narration for free, generated from ebooks you sell (or provide a preview of) in their store. While Google does not recommend auto-narration by A.I. for fiction, I decided to give it a try for my novel Little Sister Song. This post isn’t a “how to create” auto-narration, or even a “why use” auto-narration, but addresses some problems and solutions as you edit and fine-tune your auto-narrated audiobook.
There are many voice options, although only one was suitable for my book (I needed a youngish American woman) – I chose “Michelle”, sped up to x1.20 because at normal speed the dialogue, in particular, seemed to drag. And while Michelle can’t give me whispering, screaming, sobbing, and other high emotion, the tone and inflection of the voice aren’t bad at all, and the audio quality is high.
But a few fixes were needed before I was satisfied that a listener would have a great experience with the audiobook.
The first important thing to consider is that a human actor can vary the main characters’ voices enough to make it obvious who’s speaking, and on the book page it’s visually clear when a new speaker starts. An A.I. narrator can’t do either of these things, or even make a clear distinction between narration and dialogue. So I added some he saids and she saids for clarity. An alternative is to have one speaker use the other’s name in the middle of a lengthy back-and-forth, just in case the listener has lost track. Don’t overdo this, as it’s not the way people naturally converse.
Below I talk about the main problems I found with Michelle’s narration, and how I fixed them. I’ve included short snippets of audio from my book to illustrate the points I’m making, but remember these are specific to the Michelle narrator voice. Other voices do use different inflections, so take these tips as a guide and figure out what works for your story and your chosen narrator.
Please note: I recorded the snippets with other software, and the sound quality is echoey and nowhere near as good as Google’s A.I., which you can more accurately judge by listening to this sample from Little Sister Song – or click Play below to hear a short sample from the middle of chapter 1:
Errors in pronunciation are probably going to annoy your listeners the most. Three common errors with the Google Play A.I. narrator are:
- homographs (read vs. read; project vs project) – even when you’d think it’s obvious from context
- invented words that the A.I. has to guess
- the infamous English schwa and other minor pronunciation irritations that native listeners will notice as “wrong”.
When the A.I. mispronounces a word, right click on it and “Edit pronunciation”. This almost always brings up a box showing the word spelled phonetically, along with options. For example, “read” will give: ɹɛd and ɹiːd. (TIP: List of phonetics.) You can listen to each and pick the correct one. Or, you can speak into the mic and record your own, which will be converted into phonetics for the narrator. Or, you can simply change the script (e.g. write “red” instead of “read”). This won’t alter your ebook text, by the way, so don’t worry about making weird changes to the audio script.
Here’s one of those OMG you’re such a perfectionist! examples that I found in my book:
After several tense minutes, Joy stopped at a T-intersection with no signposts.
Can you hear the error? A native English speaker doesn’t say “a” in this sentence, but uses schwa (ə), pronounced “uh”:
For the most part, Michelle does get this sort of thing right – but this one bugged me, so I fixed it.
To my surprise, Michelle fairly accurately pronounced the few French and Spanish words in my books without the need to fix the pronunciation: Répondez s’il vous plaît. Mijo. Cajones.
TIP: Once you’ve changed the pronunciation, the word gets a blue underline in the script. It may now become unsearchable in the search box. In the above example, searching my audio script for the phrase “Joy stopped at a T-intersection” now returns zero hits because I changed the pronunciation of “a”.
The A.I. has a problem with short words alone in a sentence, such as: “No.” Both the inflection and the quality of the sound are really off – it sounds robotic rather than “breathed” by a living voice, always with exactly the same flat inflection – perhaps because the A.I. has no context for the word? Listen to these examples, where Michelle says some one-word sentences, followed by the same word in a longer sentence:
I haven’t really found a solution to this, except in cases where it’s possible to run the word on to the next sentence using a semi-colon or slash (more about these tricks below) which in turn forces the A.I. to add some inflection to the first word. On the other hand, certain short phrases are said with the perfect natural emotional tone, exemplified here:
And don’t expect exclamations like “Hey!” to sound very… exclamatory. On the other hand, whenever Michelle reads the word “interested” or “interesting”, her pitch goes up and she sounds interested!
Michelle also clips the start of some short words that begin sentences. This is barely noticeable… until you notice it, and then you can’t stop noticing it. I mostly notice it with the word “He” – in this example, the three times she says “he” within a sentence are correct, but she says “ee” at the start of two sentences (underlined):
Indio thought of all the things he could say to prolong the fight. He was good with words that way. He used to talk circles around Harry, poking the hornet’s nest for fun when he could be bothered. Other times he just walked away.
There is a similar problem when words like why and where start the sentence – the “w” is clipped.
Michelle does some very odd things with initials and acronyms. Listen to how she reads this sentence:
“There’s a chance she’s on her way to Portland, or traveling through there.”
While I can see why this happened, it’s not what I wanted! I placed “or” in quotes to fix it. (TIP: It doesn’t matter that the quotes are already within speech quotes.)
“There’s a chance she’s on her way to Portland, “or” traveling through there.”
Michelle does a few other random things in a similar vein:
- She says “representative” when she reads “rep” (when I meant “reputation”).
- She says “county” for CO (commanding officer).
- She says “double A” for “AA” (Alcoholics Anonymous).
- If a sentence ends with a single capital letter and period, the A.I. narrator may think it’s an initial and will run that sentence on with the next one.
- She may get this sort of thing wrong: “a Class A misdemeanor” – and instead says: “uh class uh misdemeanor”
These are easy fixes (put the word/acronym in quotes or add extra periods or spaces) but occur unpredictably.
A huge barrier to a believable listening experience is the dialogue. And Google’s auto-narration has a problem here that I’d have thought was an easy fix (in terms of training the A.I. better). Instead, I had to manually fix it hundreds of times. Here’s the problem: when a piece of dialogue ends with an exclamation point or question mark, and the dialogue tag starts with a capital letter, Michelle thinks it’s two sentences, with a resulting nonsensical inflection. In this example she is getting it right, because the tag starts with a lower case word (she):
“Why are there so many kinds of bread?” she asked Jesse.
And here she is getting it wrong, because the tag starts with a capital letter (Wynter):
“Can’t we stay a bit longer?” Wynter said.
To fix the inflection, I had to add a comma to every single one of these instances. I did a search-and-replace, and then reassessed any that still sounded wrong when I listened back to the entire book, because of course sometimes the dialogue really is followed by a new sentence. (TIP: If your script uses smart quotes, they won’t be found if you use straight quotes in the search box.)
“Can’t we stay a bit longer?,” Wynter said.
You’ll notice the voice no longer goes up at the end of the question after applying this fix (because the question mark is being ignored), but in most cases this still sounds natural. Replacing the question mark with a comma (instead of just adding the comma) has almost the same effect, but if you search-and-replace you’ll lose question marks that you’d wanted to keep.
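If you keep a plain-text copy of your script, the comma fix above can be sketched in Python before you touch the editor. This is only an illustration under assumptions: the function names are made up for this post, it runs on your own text file rather than inside Google Play, and it matches both smart and straight closing quotes so the search-box problem from the TIP doesn’t bite. You’d still listen back, because some matches really are two sentences.

```python
import re

# A question mark or exclamation point, a closing quote (curly or straight),
# one space, then a capital letter: the pattern that gets read as two sentences.
TAG_PATTERN = re.compile(r'([?!])([”"]) (?=[A-Z])')


def add_tag_commas(script: str) -> str:
    """Insert a comma before the closing quote: ?” Wynter becomes ?,” Wynter."""
    return TAG_PATTERN.sub(r'\1,\2 ', script)


def list_candidates(script: str) -> list[str]:
    """Return each match with some surrounding text, for checking by ear later."""
    return [script[max(0, m.start() - 30): m.end() + 20]
            for m in TAG_PATTERN.finditer(script)]


sample = ('“Can’t we stay a bit longer?” Wynter said. '
          '“Why are there so many kinds of bread?” she asked Jesse.')
print(add_tag_commas(sample))
```

Only the first line of dialogue in the sample changes; the second is left alone because its tag starts with a lower-case word.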
Inflection: pauses and emphasis
There are several ways to improve the pauses (or lack of them) within the text, and each method slightly changes the inflection too. Try adding commas or replacing them with slashes, semi-colons, ellipses, or text dashes (or spaced hyphens). In longer sentences, these punctuation marks may make the A.I. narrator take a breath. This may or may not be appropriate, so it’s something you’ll need to play around with.
Look at this short piece of dialogue, where the speaker – Jesse – is introducing himself:
“I’m his brother, Jesse. You can talk to me.”
Michelle gets the inflection wrong and thinks the speaker is talking to someone called “Jesse”:
Removing the comma, or replacing it with a text dash, didn’t help in this case. To fix the inflection, I changed the audio script to this:
“I’m his brother / Jesse; You can talk to me.”
That slash creates a very short pause and (apparently randomly) changes the inflection, so now it sounds like Jesse is introducing himself. I also changed the period to a semi-colon to reduce the pause between sentences:
It turns out the slash is the best option for this phrase, too: “Maybe, maybe not.” A comma or semi-colon doesn’t slow down the narrator enough, but a slash is just right.
The pause between sentences is slightly shorter than the pause between paragraphs, as you’d expect. In my opinion, both pauses are a few milliseconds too long (even when I speed up the narrator to x1.20) and this is especially tedious when a character speaks several sentences of dialogue at once. (I have a few characters who always speak carefully, where the longer pauses sound fine.) Where the pauses are too long, I changed some of the periods to semi-colons or slashes. This does change the inflection, too, and it can work very well. Here’s an example with periods, as it appears in the ebook – there are just too many long pauses when Michelle reads it because the character (Jesse) is a fast-talker and quite excited here:
“This room is great. This house is a mansion. It’s huge. She has Zulu masks in the dining room, and a didgeridoo. Did you see that, Caleb? I’m dying to play it. You have to breathe in through your nose while you blow out with your mouth. It’s called circular breathing.”
By running some sentences together with semi-colons (and a colon), the dialogue sounds more natural:
“This room is great: This house is a mansion; It’s huge. She has Zulu masks in the dining room, and a didgeridoo; Did you see that, Caleb? I’m dying to play it; You have to breathe in through your nose while you blow out with your mouth; It’s called circular breathing.”
On the other hand, in this sentence (from book 2), the commas are pretty ineffective and I had to change them to slashes to slow Michelle down this time:
Now she slept with the drapes open and the lamp on, ignoring Rosa’s lights out order, and still when she closed her eyes she felt herself floating away.
With the slashes instead of commas:
Now she slept with the drapes open / and the lamp on, ignoring Rosa’s lights out order, / and still when she closed her eyes / she felt herself floating away.
Again, making these changes to the script does not affect the ebook in the Google Play store.
The audio script strips out all italics and bold. Yikes! But you can usually (doesn’t always work) add emphasis to a single word manually by using quotes (all-caps won’t work). This is how Michelle recites the following few sentences from the ebook:
For an instant she felt it was all a mistake. She wasn’t supposed to be here. Maybe she wasn’t here. Maybe she’d never left Arizona and this was all a dream.
The italics dropped out, so the emphasis is in the wrong place. But, first things first: those pauses between sentences are just too long – this internal monologue is supposed to “sound” nervous and breathless, not stilted and ponderous. I used semi-colons instead, and got this:
For an instant she felt it was all a mistake; She wasn’t supposed to be here; Maybe she wasn’t here; Maybe she’d never left Arizona and this was all a dream.
That flows more smoothly. Now to fix the emphasis: I’ve added quotes around two words – “wasn’t”, to replace the italics in the original, and also “all” – which was okay in the first example but went wrong in the second.
For an instant she felt it was all a mistake; She wasn’t supposed to be here; Maybe she “wasn’t” here; Maybe she’d never left Arizona and this was “all” a dream.
The takeaway here is that you’ll need to try different punctuation until you get the sound you want. After a while, you’ll start to instinctively know what changes you need to make to the audio script before you even listen to it.
TIP: While the A.I. appears to read ellipses correctly (with inflection indicating an unfinished sentence), a comma can have almost the same effect – but again, can change the inflection of other parts of the sentence. Be sure you actually do use the ellipsis symbol in your ebook (which gets converted to the audio script), not three dots. Three dots will be read as a period, so the inflection may be wrong.
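If your manuscript is full of three-dot ellipses, a small script can swap them for the ellipsis symbol before you upload the ebook. This is a minimal sketch, assuming you have the text as a plain string; the function name is hypothetical, and it also catches the spaced form (. . .), which you may or may not want.

```python
import re

ELLIPSIS = "\u2026"  # the single-character ellipsis symbol


def use_ellipsis_symbol(text: str) -> str:
    """Replace three dots, with or without single spaces between them."""
    return re.sub(r'\.\s?\.\s?\.', ELLIPSIS, text)


print(use_ellipsis_symbol("She doesn't seem to know much about... well, about the world."))
```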
My books are dialogue-heavy and my characters interrupt each other a lot! On the page, I use a text dash when someone is cut off, and what comes next goes in a new paragraph, since it’s a new speaker. This, of course, doesn’t work for an A.I. narrator because it creates a long pause in exactly the place you don’t want any pause at all. From the book:
“This afternoon we’ll attend Wynter’s emergency placement hearing, to make sure she’s placed with you. In the longer term—”
“Caleb, wait.” Joy had gone pale. “I don’t think… I thought you would take care of her.”
Michelle leaves a long pause between the paragraphs, whereas I want Joy to interrupt Caleb. So, here’s what the script looks like:
“This afternoon we’ll attend Wynter’s emergency placement hearing, to make sure she’s placed with you. In the longer term—” “Caleb, wait.” Joy had gone pale. “I thought “you” would take care of her.”
First, I removed the paragraph break. I also deleted Joy’s “I don’t think…” because no matter what I did, I couldn’t get the “uncertain” inflection right. (This was one of the very few instances where I made a more drastic change to the ebook than just adding he said, she said.) I also put quotes around “you” for the correct emphasis.
Another example, this one with two interruptions from a very impatient person. From the book:
“Where is he?”
“Clackamas County Jail, about half an hour south of the university. I can text you the address—”
“I’ll find it. Are you there now?”
“No, man, I gotta get to class. He made me swear not to call you but I couldn’t just—”
“I’ll deal with it. Thanks for calling.”
The narration isn’t working at all for this exchange:
After running on the paragraphs and adding some other punctuation to fix the inflection, the script looks and sounds like this:
“Where is he?”
“Clackamas County Jail about half an hour south of the university. I can text you the address?—”“I’ll find it; Are you there now?”
“No, man, I gotta get to class / He made me swear not to call you but I couldn’t just—”“I’ll deal with it; Thanks for calling.”
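The paragraph-joining part of this fix is mechanical enough to sketch in Python, assuming you have the script as a list of paragraphs in your own file. The function name is made up for illustration. It only runs an interrupted paragraph on to the next one; the punctuation tweaks for inflection still have to be done by ear.

```python
def join_interruptions(paragraphs: list[str]) -> list[str]:
    """Merge any paragraph ending in a cut-off dash with the one after it."""
    merged: list[str] = []
    for para in paragraphs:
        # A cut-off line ends with a dash and a closing quote (curly or straight).
        if merged and merged[-1].rstrip().endswith(("—”", '—"')):
            merged[-1] = merged[-1].rstrip() + para.lstrip()
        else:
            merged.append(para)
    return merged


exchange = [
    "“Where is he?”",
    "“I can text you the address—”",
    "“I’ll find it. Are you there now?”",
]
for line in join_interruptions(exchange):
    print(line)
```

The sketch joins the paragraphs with no space between the closing and opening quotes, as in the script above.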
I’m not talking about bulleted lists here, but a run-on list of items separated by commas. Michelle rushes through these lists comically fast, whether each item on the list is one word or several. Listen:
“They’re different brands and different flours. White, whole-wheat, grain, buckwheat, rye, sourdough.”
The simplest way to fix this is to use a colon to introduce the list (which you may already do, but I generally don’t use colons in fiction). I guess this tells Michelle a list is coming up, because it makes her slow down between each item:
“They’re different brands and different flours: white, whole-wheat, grain, buckwheat, rye, sourdough.”
Note that Michelle says the first part of the sentence faster (it’s slower in the first example, where it ends with a period). This still sounds too fast, though. Replacing the commas with slashes works best of all:
“They’re different brands and different flours: White / whole-wheat / grain / buckwheat / rye / sourdough.”
Here’s the difference between commas and slashes when the items in the list are phrases, not single words:
Wynter paused at the bottom, breathless with anticipation, to drink in the view. The moon, the trees with crooked trunks, the street lamps in the parking lot, the tall square buildings across the street with dozens and dozens of windows.
Replacing the commas with slashes makes the sentence easier to digest. In fact, I add slashes wherever a long sentence needs a slight pause, such as the additional one here between street and with.
Wynter paused at the bottom, breathless with anticipation, to drink in the view / The moon / the trees with crooked trunks / the street lamps in the parking lot / the tall square buildings across the street / with dozens and dozens of windows.
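For a list introduced by a colon, the comma-to-slash swap can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names, and it assumes every comma after the colon separates list items, which won’t be true of every sentence.

```python
def commas_to_slashes(list_text: str) -> str:
    """Turn 'a, b, c' into 'a / b / c' for a run-on list."""
    items = [item.strip() for item in list_text.split(",")]
    return " / ".join(item for item in items if item)


def slash_after_colon(sentence: str) -> str:
    """Apply the swap only to the part of the sentence after its first colon."""
    head, colon, tail = sentence.partition(":")
    if not colon:
        return sentence  # no colon found, so leave the sentence untouched
    return f"{head}: {commas_to_slashes(tail)}"


print(slash_after_colon(
    "They’re different brands and different flours: "
    "white, whole-wheat, grain, buckwheat, rye, sourdough."
))
```

Commas before the colon are left alone, so an opening clause keeps its own punctuation.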
Putting it all together
Here’s a complex paragraph from the last chapter, exactly as it appears in the ebook:
“You were strong enough to reject Momma once you found out the truth. What she was offering, that was never gonna be enough for you. Nowhere near what you deserve.” Pain hardened his expression and his voice went very quiet. “I understand why you were tempted to go to her. I would’ve gone, too. I don’t think they would—Caleb’s tied to his life here, and Jesse doesn’t care about her at all—but I would go.”
You’ll notice a few problems as you listen. Here’s the final audio script, where I’ve attempted to fix those problems:
Indio said: “You were strong enough to reject Momma / once you found out the truth. What she was offering, that was never gonna be enough for you; Nowhere near what you deserve.” Pain hardened his expression and his voice went very quiet. “I understand why you were tempted to go to her; I would’ve gone, too. I don’t think “they” would—Caleb’s tied to his life here, and Jesse doesn’t care about her at all—but “I” would go.”
Here’s what I changed:
- Added “Indio said” at the start, because this paragraph immediately follows Wynter’s dialogue – on the page it’s obvious this is a new speaker, but it’s not obvious in an audiobook.
- Added a slash after Momma to create a tiny pause.
- Changed two periods to semi-colons, to avoid too many long pauses.
- Emphasized “they” using quote marks, where the word was in italics in the ebook, and also “I” because Michelle was emphasizing “would” instead.
Note that although Indio’s voice goes “very quiet”, Michelle’s of course does not. Here’s the final audio:
There are parts of the original I do like a little better, so I had to compromise. And I could probably play around with the punctuation forever to find different ways for Michelle to “act” that piece of dialogue. But the important parts for ease of listening and clarity – the reduced pause between sentences and the emphasized words – are fixed.
Oddities & unfixables
If you have section breaks in your ebook (three asterisks or similar), they will be stripped out of the audio script. Even if you add them back in, they will disappear when you click Publish. This is a fairly major problem, because I haven’t found a way to force a longer pause between paragraphs. My section breaks indicate POV changes and time jumps, and the listener may experience a few seconds of confusion without them.
One of my characters is named Joy. Michelle often just can’t say her name correctly, especially when it starts a sentence. It sounds like “Jo” even though the phonetic spelling is correct. I haven’t been able to reliably fix this. You can hear it in the example below:
“Joy, please tell me—was she mistreated? Were you?”
I altered the script for this sentence to look like this:
Joy / please tell me; was she mis-treated? Were “you”?
By forcing a small pause with the slash, “Joy” comes out correctly – but this hasn’t always worked. (Additionally, the hyphen in mistreated makes Michelle enunciate the word better, and the quotes around “you” are to place the emphasis correctly.) Changing “Joy’s” to “Joyz” globally sometimes helped Michelle say that form of her name properly (instead of “Jo’s”). On a slightly different note, “Jesse’s” was being spoken as “Jessuz”, so I changed it globally to “Jesseez” and now it’s fine.
Michelle did a good job with the numerals in my book. I use HH:MM to format time, e.g. 3:25AM, which is read as “three twenty-five A.M.” I did change the way she said dollar amounts – she reads “$17.95” as “seventeen dollars and ninety-five cents” which is laborious. I changed the script to “17 95” where it was clear from the context that it was a dollar amount.
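If you want to find and rewrite dollar amounts in a plain-text copy of the script, a regular expression will do it. This is a rough sketch with a made-up function name, and it rewrites every amount it finds, so it only suits text where the context makes the dollars obvious each time.

```python
import re


def shorten_dollars(text: str) -> str:
    """Rewrite amounts like $17.95 as 17 95 (dollars and two-digit cents only)."""
    return re.sub(r'\$(\d+)\.(\d{2})\b', r'\1 \2', text)


print(shorten_dollars("The book cost $17.95 at the store."))
```

Whole-dollar amounts such as $5 are left as they are.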
You wouldn’t think an A.I. narrator’s voice would crack, but sometimes Michelle’s voice does come out… well, damaged. Listen to this:
“She doesn’t seem to know much about… well, about the world. Never tasted hot chocolate.”
Hear that weird crack on the word “know”? There are several instances like this in the book, and I didn’t worry about them – I tell myself they humanize Michelle. And the bizarre thing is, if I replace the ellipsis with a comma…
“She doesn’t seem to know much about, well, about the world. Never tasted hot chocolate.”
…the crack is gone! And the inflection on “the world” has changed. All this to reiterate that you’ll want to play around with the punctuation to get the sound you want, especially on important emotional phrases.
Having occasionally used text-to-speech and found those voices hard to listen to, I was pleasantly surprised by Google Play’s A.I. narration – at least with Michelle. I’m fascinated by A.I. and look forward to playing around with this system as I produce the other books in the series, as well as my science fiction novels, which will have completely different challenges to address. Overall, the audiobook is about 10 hours long (120K words) and took me about 25 hours to edit as I learned the process. Future books should go faster – perhaps only 20 to 30% in addition to the running time.
The full audiobook for Little Sister Song is available for purchase from Bookfunnel.