December 28, 2016

Using the Speech Synthesis API

Michael Lynch Game Development JavaScript, tts

For a recent project, we decided to try the browser's built-in speech support while we waited for the script to be finalized. We needed a way to estimate how long the dialog would run, and we wanted our writers to be able to quickly review the current script in the running app. It seemed like the perfect time to try out the Speech Synthesis API!

Hearing all the voices

The first thing I wanted to do was choose a voice. Each operating system has a variety of voices built in, so I wanted a way to hear each one. I built this simple script to do that.

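      // Note: getVoices() can return an empty array until the browser has
      // loaded its voices, so run this after the page has finished loading.
      var voices = window.speechSynthesis.getVoices();
      var idx = 0;
      function playVoice() {
        if (idx >= voices.length) {
          return; // every voice has been played
        }
        var say = 'Hi! My name is ' + voices[idx].name;
        console.log('[' + idx + '] ' + say);
        var msg = new window.SpeechSynthesisUtterance(say);
        msg.voice = voices[idx];
        window.speechSynthesis.speak(msg);
        idx++;
        // Queue the next voice in two seconds.
        setTimeout(playVoice, 2000);
      }
      playVoice();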

This works by speaking with a new voice every 2 seconds or so. Because utterances are queued, a voice may start slightly early or late; you can adjust the timeout if you want to prevent that. The script first grabs all the voices into an array, then keeps calling setTimeout until it has played every voice.
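One way to avoid the timing drift is to start each voice only when the previous utterance has finished, using the utterance's end event instead of a fixed timeout. The following is a minimal sketch, not the code we used in the project. The function name speakEachVoice is hypothetical, and the synthesizer and utterance constructor are passed in as parameters (an assumption made so the function can also be run outside a browser).

```javascript
// Speak one greeting per voice, starting the next only after the previous ends.
// `synth` is expected to behave like window.speechSynthesis and `Utterance`
// like window.SpeechSynthesisUtterance (both are parameters by assumption).
function speakEachVoice(synth, Utterance, voices, onDone) {
  var idx = 0;

  function next() {
    if (idx >= voices.length) {
      if (onDone) { onDone(); }
      return;
    }
    var voice = voices[idx];
    var msg = new Utterance('Hi! My name is ' + voice.name);
    msg.voice = voice;
    idx++;
    // The end event fires when this utterance finishes speaking; the
    // error handler keeps the chain moving if a voice fails to play.
    msg.onend = next;
    msg.onerror = next;
    synth.speak(msg);
  }

  next();
}

// In a browser, once the voice list has loaded:
// speakEachVoice(window.speechSynthesis, window.SpeechSynthesisUtterance,
//                window.speechSynthesis.getVoices());
```

With this approach there is no delay to tune, because each voice takes exactly as long as it needs.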

The first thing I noticed was that many of the 64 voices on my Mac are not English! A slight correction fixes this: the version below only plays voices whose language is US English (en-US). One thing to note is that the non-US voices might give an NPC some character. For instance, if a character is from another country but speaking English, you might use “Ting-ting.”

      var voices = window.speechSynthesis.getVoices();
      var idx = 0;
      function playVoice() {
        // Skip ahead to the next US English voice, including the very first one.
        while (idx < voices.length && voices[idx].lang !== 'en-US') {
          idx++;
        }
        if (idx >= voices.length) {
          return; // no more US English voices
        }
        var say = 'Hi! My name is ' + voices[idx].name;
        console.log('[' + idx + '] ' + say);
        var msg = new window.SpeechSynthesisUtterance(say);
        msg.voice = voices[idx];
        window.speechSynthesis.speak(msg);
        idx++;
        setTimeout(playVoice, 2000);
      }
      playVoice();
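The loop above matches 'en-US' exactly. If you would like to audition every English variant (en-GB, en-AU, and so on), a prefix match on the language tag is one option. This is only a sketch: the helper name voicesForLanguage is hypothetical, and the normalization of underscores to hyphens is a defensive assumption for platforms that report tags such as en_US.

```javascript
// Return only the voices whose language tag equals the prefix or starts
// with it followed by a hyphen. Tags are normalized first: lowercased,
// with underscores turned into hyphens (a defensive assumption).
function voicesForLanguage(voices, prefix) {
  var wanted = prefix.toLowerCase();
  return voices.filter(function (voice) {
    var lang = String(voice.lang || '').toLowerCase().replace(/_/g, '-');
    return lang === wanted || lang.indexOf(wanted + '-') === 0;
  });
}

// In a browser:
// var english = voicesForLanguage(window.speechSynthesis.getVoices(), 'en');
```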

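By the way, since I did not bother saving my timeout IDs, you can stop the voices by running these lines…

      // Browsers hand out increasing integer timeout IDs, so a fresh no-op
      // timeout gives an upper bound on every pending timeout ID.
      var highestTimeoutId = setTimeout(function() {}, 0);
      for (var i = 0; i <= highestTimeoutId; i++) {
        clearTimeout(i);
      }
      // Also drop anything that is already queued or speaking.
      window.speechSynthesis.cancel();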
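If you would rather not sweep every timeout ID on the page, one alternative is to keep the ID when you schedule the next voice. The sketch below wraps the loop in a small object with start and stop methods. The name createVoiceTour and its parameters are hypothetical, chosen for illustration; the synthesizer and utterance constructor are passed in so the code can be exercised outside a browser.

```javascript
// Play each voice on a timer, remembering the pending timeout so that
// stop() can cancel just that one timer.
function createVoiceTour(synth, Utterance, voices, delayMs) {
  var idx = 0;
  var timerId = null;

  function playNext() {
    if (idx >= voices.length) {
      timerId = null;
      return;
    }
    var msg = new Utterance('Hi! My name is ' + voices[idx].name);
    msg.voice = voices[idx];
    synth.speak(msg);
    idx++;
    timerId = setTimeout(playNext, delayMs);
  }

  return {
    start: playNext,
    stop: function () {
      if (timerId !== null) {
        clearTimeout(timerId);
        timerId = null;
      }
      synth.cancel(); // drops anything queued or currently speaking
    }
  };
}

// In a browser:
// var tour = createVoiceTour(window.speechSynthesis,
//     window.SpeechSynthesisUtterance, window.speechSynthesis.getVoices(), 2000);
// tour.start();
// tour.stop(); // whenever you have heard enough
```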

Choosing a voice

After listening to the voices a few times, I picked one. Well, actually I picked a few, since we have several characters to voice. The best way to look up a voice is by its name rather than by the index shown in the log, because a listener on another system may have a different set of voices in a different order. This is easily done with the following code.

      // filter() returns the matching voice objects; [0] takes the first one (undefined if none match).
      var selectedVoice = speechSynthesis.getVoices().filter(function(v) { return v.name === 'Boing'; })[0];
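Since a named voice may not exist on every system, it can help to fall back to any voice in the right language. Here is a small sketch of that idea. The helper name pickVoice and its parameters are hypothetical; returning null when nothing matches is a choice made here so that the utterance stays on the browser's default voice.

```javascript
// Look a voice up by name; if it is not installed, fall back to the first
// voice with the given language tag; otherwise return null.
function pickVoice(voices, preferredName, fallbackLang) {
  var byName = voices.filter(function (v) {
    return v.name === preferredName;
  })[0];
  if (byName) {
    return byName;
  }
  var byLang = voices.filter(function (v) {
    return v.lang === fallbackLang;
  })[0];
  return byLang || null;
}

// In a browser:
// msg.voice = pickVoice(window.speechSynthesis.getVoices(), 'Boing', 'en-US');
```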

Small modifications

After choosing a voice, you can make some modifications to it. You can adjust pitch, rate, and volume. Lowering the volume helped, and I also adjusted the pitch slightly.

      var msg = new window.SpeechSynthesisUtterance(say); // say is the text to speak
      msg.pitch = 0.9;  // 0 to 2, default 1
      msg.volume = 0.8; // 0 to 1, default 1
      window.speechSynthesis.speak(msg);
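The three properties have fixed ranges: volume runs from 0 to 1, rate from 0.1 to 10, and pitch from 0 to 2. If the values come from a script file or a settings screen, you may want to clamp them before applying them. The sketch below shows one way to do that; the helper name applyVoiceSettings and the settings object shape are assumptions made for illustration.

```javascript
// Allowed range for each adjustable utterance property.
var VOICE_RANGES = {
  volume: [0, 1],
  rate: [0.1, 10],
  pitch: [0, 2]
};

// Copy any numeric volume, rate, or pitch from `settings` onto `msg`,
// clamped to the allowed range. Properties not present are left untouched.
function applyVoiceSettings(msg, settings) {
  Object.keys(VOICE_RANGES).forEach(function (key) {
    var value = settings[key];
    if (typeof value === 'number' && !isNaN(value)) {
      var min = VOICE_RANGES[key][0];
      var max = VOICE_RANGES[key][1];
      msg[key] = Math.min(max, Math.max(min, value));
    }
  });
  return msg;
}

// In a browser:
// var msg = applyVoiceSettings(new window.SpeechSynthesisUtterance('Hello'),
//                              { pitch: 0.9, volume: 0.8 });
```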

Using the voice

Using the voice is no different from how I used the voices when looping earlier. Simply find the voice you want, store it for later, and then create an utterance and speak it. One other thing to note is that you can stop the speech with the cancel() method of speechSynthesis.

      var voices = window.speechSynthesis.getVoices();
      var characters = {};
      function setupCharacters() {
        var characters = {};

        var male1 = new window.SpeechSynthesisUtterance();
        male1.voice = voices[findVoice('Boing')];
        male1.volume = 0.9; // 0 to 1
        male1.rate = 0.9; // 0.1 to 10
        male1.pitch = 1;  // 0 to 2
        male1.lang = 'en-US';
        characters['male1'] = male1;

        var male2 = new window.SpeechSynthesisUtterance();
        male2.voice = voices[findVoice('Bruce')];
        male2.volume = 0.9; // 0 to 1
        male2.rate = 1.0;   // 0.1 to 10
        male2.pitch = 1;    // 0 to 2
        male2.lang = 'en-US';
        characters['male2'] = male2;

        var female1 = new window.SpeechSynthesisUtterance();
        female1.voice = voices[findVoice('Karen')];
        female1.volume = 0.8; // 0 to 1
        female1.rate = 1.0;   // 0.1 to 10
        female1.pitch = 1;    // 0 to 2
        female1.lang = 'en-US';
        characters['female1'] = female1;

        return characters;
      }

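      function findVoice(voiceName) {
        // Returns the index of the named voice, or -1 if it is not installed.
        // voices[-1] is undefined, so the utterance keeps the default voice in that case.
        return voices.findIndex(function (voice) {
          return voice.name === voiceName;
        });
      }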

      function stopDialog() {
        window.speechSynthesis.cancel();
      }

      function sayDialog(who, say) {
        if (typeof characters[who] !== 'undefined') {
          var msg = characters[who];
          msg.text = say;
          window.speechSynthesis.speak(msg);
        }
      }

      characters = setupCharacters();

      sayDialog('male1', 'While you were out the police came by, and they asked about your sister.  I told them I did not know where she was but they may have known I was lying.');
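One caveat with the setup above: in some browsers getVoices() returns an empty array until the voice list has loaded, and a voiceschanged event fires once it is ready. The following sketch waits for that event before running the setup. The helper name whenVoicesReady is hypothetical, and it assumes the synthesizer supports addEventListener, which may not hold in every browser.

```javascript
// Call `callback` with the voice list as soon as it is non-empty.
// If the voices are already loaded, the callback runs immediately.
function whenVoicesReady(synth, callback) {
  var voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices);
    return;
  }

  function onChange() {
    var loaded = synth.getVoices();
    if (loaded.length === 0) {
      return; // still empty, keep waiting for the next event
    }
    synth.removeEventListener('voiceschanged', onChange);
    callback(loaded);
  }

  synth.addEventListener('voiceschanged', onChange);
}

// In a browser, wrapping the setup from above:
// whenVoicesReady(window.speechSynthesis, function (loaded) {
//   voices = loaded;
//   characters = setupCharacters();
// });
```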

Tips

Some tips I found on the web for making your text better:

Using a comma or period will cause a small pause and sometimes change the emphasis of the prior word
Changing the order of words can subtly affect the pronunciation
Every 100 characters the speech will make a pause. A comma or period will reset the count
Using quotes around words can change their pronunciation
Putting words together, like GoPro, or joining them with a hyphen may improve the pronunciation
Letters on their own are spoken well, as in “A P I”, whereas “API” will be pronounced like ‘appy’
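The last tip is easy to automate, so that writers can keep typing “API” in the script. Here is a minimal sketch that spaces out a list of known acronyms before the text is spoken. The helper name spellOutAcronyms is hypothetical, and it assumes the acronyms contain only letters and digits (no regular expression special characters).

```javascript
// Replace each listed acronym with its letters separated by spaces,
// e.g. "API" becomes "A P I". Only whole words are replaced, so a longer
// word such as "APIs" is left alone.
function spellOutAcronyms(text, acronyms) {
  return acronyms.reduce(function (result, acronym) {
    var pattern = new RegExp('\\b' + acronym + '\\b', 'g');
    return result.replace(pattern, acronym.split('').join(' '));
  }, text);
}

// Example:
// spellOutAcronyms('The API is fun', ['API']) returns 'The A P I is fun'
```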