10.5 Programming a Talking Lion

Writing good speech synthesis and recognition software is hard. That’s why we’re going to take advantage of all the hard work Apple’s speech software engineers have poured into OS X. The engine can be accessed in a variety of ways, from the preferred Objective-C route to hooks for Perl, Python, Ruby, and other scripting languages. But the easiest way I have found to tinker, modify, and test on the fly is via AppleScript.

I’ll be the first to admit that I am not a big fan of AppleScript. Its attempt to turn script writing into a natural English sentence structure works only on a superficial level. It breaks down pretty quickly for any intermediate developer fluent in more elegant scripting languages like Ruby or Python. Even simple tasks like string manipulation turn out to be a real pain in AppleScript. That said, AppleScript trumps these other languages when it comes to effortless automation integration with other AppleScript-aware OS X applications. Bundled programs like iTunes, Mail, Safari, and Finder are fully scriptable, as are a number of third-party OS X programs like Skype, Microsoft Office, and the like. In the case of this project, Apple’s speech recognition server is also highly scriptable, and that’s what we’re going to call upon in this project to make the magic work.

While AppleScript can be written in any text editor, it should come as no surprise that it’s best hosted within the AppleScript Editor application, found in the Applications/Utilities folder. Launching the AppleScript Editor for the first time opens a blank, two-pane coding window. The top half of the window is used to enter code, while the bottom consists of three tabs for monitoring events, replies, and results of the executing script. The editor aids script writing by color-coding AppleScript syntax, but it doesn’t offer friendlier IDE features like code completion or on-the-fly compiling. Fortunately, scripts are typically short, so these omissions are not crippling.

AppleScript has its own vocabulary, keywords, and idioms. Learning AppleScript isn’t difficult, but it can get maddening at times when you have to massage the syntax just right to make the script do what you intended. For example, parsing a string for an email address is easy in most scripting languages. Not so in AppleScript. Partly due to its historical ties and partly due to the way AppleScript expects you to work, it’s complicated. So with regard to the code we will write for this project, you will just have to trust me and try to follow along. If you find AppleScript to your liking or want to see what else it can do to further extend the code for this project, review Apple’s online documentation for more information.[108]

Before writing the script, let’s think about what we want it to do. First, we want it to respond to a select group of spoken words or phrases and act on those commands accordingly. What commands should we elicit? For starters, how about having the script hit the URLs we exposed in some of our networked projects, like the Web-Enabled Light Switch or the Android Door Lock? While we’re at it, let’s make use of some of the bundled OS X applications like Mail and iTunes to check and read our unread email and play music we want to hear. Let’s also ask our house what time it is.

We need to initialize the SpeechRecognitionServer application and populate the set of words or phrases that we want it to listen to. Using a series of if/then statements, we can react to those recognized commands accordingly. For example, if we ask the computer to play music, we will call upon the iTunes application to take an inventory of music tracks in its library, sort these by artist and album, populate these as more words/phrases to interpret, and have the text-to-speech engine ask us which artist and album we want to listen to. Similarly, we can have our unread email read to us via a check mail command. Doing so will launch the Mail application, poll your preconfigured Mail accounts for new mail, check the inbox for unread messages, and perform a text-to-speech reading of unread sender names and message titles.

Now let’s take a closer look at the details of the script’s execution. Here’s the script in its entirety. Most of the syntax should be easy to follow, even if you are not familiar with AppleScript.

GivingYourHomeAVoice/osx-voice-automation.scpt
with timeout of 2629743 seconds
  set exitApp to "no"
  repeat while exitApp is "no"
    tell application "SpeechRecognitionServer"
      activate
      try
        set voiceResponse to listen for {"light on", "light off", ¬
          "unlock door", "play music", "pause music", ¬
          "unpause music", "stop music", "next track", ¬
          "raise volume", "lower volume", ¬
          "previous track", "check email", "time", "make a call", ¬
          "hang up", "quit app"} giving up after 2629743
      on error -- time out
        return
      end try
    end tell

    if voiceResponse is "light on" then
      -- open URL to turn on Light Switch
      open location "http://192.168.1.100:3344/command/on"
      say "The light is now on."

    else if voiceResponse is "light off" then
      -- open URL to turn off Light Switch
      open location "http://192.168.1.100:3344/command/off"
      say "The light is now off."

    else if voiceResponse is "unlock door" then
      -- open URL to unlock Android Door Lock
      open location "http://192.168.1.230:8000"
      say "Unlocking the door."

    else if voiceResponse is "play music" then
      tell application "iTunes"
        set musicList to {"Cancel"} as list
        set myList to (get artist of every track ¬
          of playlist 1) as list
        repeat with myItem in myList
          if musicList does not contain myItem then
            set musicList to musicList & myItem
          end if
        end repeat
      end tell

      say "Which artist would you like to listen to?"
      tell application "SpeechRecognitionServer"
        set theArtistListing to ¬
          (listen for musicList with prompt musicList)
      end tell
      if theArtistListing is not "Cancel" then
        say "Which of " & theArtistListing & ¬
          "'s albums would you like to listen to?"
        tell application "iTunes"
          tell source "Library"
            tell library playlist 1
              set uniqueAlbumList to {}
              set albumList to album of tracks ¬
                where artist is equal to theArtistListing

              repeat until albumList = {}
                if uniqueAlbumList does not contain ¬
                  (first item of albumList) then
                  copy (first item of albumList) to end of ¬
                    uniqueAlbumList
                end if
                set albumList to rest of albumList
              end repeat

              set theUniqueAlbumList to {"Cancel"} & uniqueAlbumList
              tell application "SpeechRecognitionServer"
                set theAlbum to (listen for theUniqueAlbumList ¬
                  with prompt theUniqueAlbumList)
              end tell
            end tell
            if theAlbum is not "Cancel" then
              if not ((name of playlists) contains "Current Album") then
                set theAlbumPlaylist to ¬
                  make new playlist with properties {name:"Current Album"}
              else
                set theAlbumPlaylist to playlist "Current Album"
                delete every track of theAlbumPlaylist
              end if
              tell library playlist 1 to duplicate ¬
                (every track whose album is theAlbum) to theAlbumPlaylist
              play theAlbumPlaylist
            else
              say "Canceling music selection"
            end if
          end tell
        end tell
      else
        say "Canceling music selection"
      end if

    else if voiceResponse is "pause music" or ¬
      voiceResponse is "unpause music" then
      tell application "iTunes"
        playpause
      end tell

    else if voiceResponse is "stop music" then
      tell application "iTunes"
        stop
      end tell

    else if voiceResponse is "next track" then
      tell application "iTunes"
        next track
      end tell

    else if voiceResponse is "previous track" then
      tell application "iTunes"
        previous track
      end tell

    -- Raise and lower volume routines courtesy of HexMonkey's post:
    -- http://forums.macrumors.com/showthread.php?t=144749
    else if voiceResponse is "raise volume" then
      set currentVolume to output volume of (get volume settings)
      set scaledVolume to round (currentVolume / (100 / 16))
      set scaledVolume to scaledVolume + 1
      if (scaledVolume > 16) then
        set scaledVolume to 16
      end if
      set newVolume to round (scaledVolume / 16 * 100)
      set volume output volume newVolume
    else if voiceResponse is "lower volume" then
      set currentVolume to output volume of (get volume settings)
      set scaledVolume to round (currentVolume / (100 / 16))
      set scaledVolume to scaledVolume - 1
      if (scaledVolume < 0) then
        set scaledVolume to 0
      end if
      set newVolume to round (scaledVolume / 16 * 100)
      set volume output volume newVolume

    else if voiceResponse is "check email" then
      tell application "Mail"
        activate
        check for new mail
        set unreadEmailCount to unread count in inbox
        if unreadEmailCount is equal to 0 then
          say "You have no unread messages in your Inbox."
        else if unreadEmailCount is equal to 1 then
          say "You have 1 unread message in your Inbox."
        else
          say "You have " & unreadEmailCount & ¬
            " unread messages in your Inbox."
        end if
        if unreadEmailCount is greater than 0 then
          say "Would you like me to read your unread email to you?"
          tell application "SpeechRecognitionServer"
            activate
            set voiceResponse to listen for {"yes", "no"} ¬
              giving up after 1 * minutes
          end tell
          if voiceResponse is "yes" then
            set allMessages to every message in inbox
            repeat with aMessage in allMessages
              if read status of aMessage is false then
                set theSender to sender of aMessage
                set {savedDelimiters, AppleScript's text item delimiters} ¬
                  to {AppleScript's text item delimiters, "<"}
                set senderName to first text item of theSender
                set AppleScript's text item delimiters ¬
                  to savedDelimiters
                say "From " & senderName
                say "Subject: " & subject of aMessage
                delay 1
              end if
            end repeat
          end if
        end if
      end tell

    else if voiceResponse is "time" then
      set current_time to (time string of (current date))
      set {savedDelimiters, AppleScript's text item delimiters} to ¬
        {AppleScript's text item delimiters, ":"}
      set hours to first text item of current_time
      set minutes to the second text item of current_time
      set AMPM to third text item of current_time
      set AMPM to text 3 thru 5 of AMPM
      set AppleScript's text item delimiters to savedDelimiters
      say "The time is " & hours & " " & minutes & AMPM
    --else if voiceResponse is "make a call" then
    --  tell application "Skype"
    --    -- A Skype API Security dialog will pop up the first
    --    -- time this script accesses Skype.
    --    -- Select "Allow this application to use Skype" for
    --    -- uninterrupted Skype API access.
    --    activate
    --    -- replace echo123 Skype Call Testing Service ID with
    --    -- phone number or your contact's Skype ID
    --    send command "CALL echo123" script name ¬
    --      "Place Skype Call"
    --  end tell
    --else if voiceResponse is "hang up" then
    --  tell application "Skype"
    --    quit
    --  end tell
    else if voiceResponse is "quit app" then
      set exitApp to "yes"
      say "Listening deactivated. Exiting application."
      delay 1
      do shell script "killall SpeechRecognitionServer"
    end if
  end repeat
end timeout

The first thing we should do to keep the script running continuously is wrap it in two constructs. The first is a with timeout...end timeout block that prevents the script’s Apple events from timing out. The timeout duration must be set in seconds; in this case, we’re going to run the script for one month (there are roughly 2.6 million seconds in an average month).

The second is a while loop that repeats until the exitApp variable is set to yes via the “quit app” voiceResponse, as shown toward the end of the code listing.

Next, initialize the SpeechRecognitionServer application and pass it a list of the keywords and phrases via the listen for command. We will keep the recognizer alive for a month so it can await incoming commands without the script having to be restarted when the listening duration times out. You can extend this month-long duration by changing the giving up after value.
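Before building out the full command set, it can help to confirm that speech recognition works at all on your Mac. This minimal sketch (the two phrases are arbitrary placeholders) listens for up to a minute and echoes back whichever word it hears:

```applescript
-- Minimal recognition test: say "hello" or "goodbye" into the
-- calibrated microphone within 60 seconds.
tell application "SpeechRecognitionServer"
  set heardPhrase to listen for {"hello", "goodbye"} giving up after 60
end tell
-- Speak the recognized phrase back to the user.
say "You said " & heardPhrase
```

If nothing is recognized before the giving up after deadline, the listen for call raises an error, which is why the full script wraps it in a try block.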

If the incoming phrase is interpreted as “light on,” we will open the default browser and direct it to the on URL of our web-enabled light switch. “Light off” will request the off URL from that project. We perform the same kind of open location URL call for the Android door lock project too.

Besides triggering URL calls via voice, we can also interact with AppleScript-able OS X applications like iTunes and Mail. In this code snippet, we do the following:

  1. Open iTunes.

  2. Create a selection list seeded with a “Cancel” entry so the user can back out.

  3. Retrieve the artist name of every track in the local iTunes library.

  4. Add each artist name to the selection list, skipping duplicates along the way.

  5. Pass the list of artist names to the speech recognition server via its listen for command.

  6. Ask the user to pick an artist to listen to. If the user responds with the name of an artist in the library, populate the speech recognizer with the name(s) of that artist’s album(s). Users can also exit the play music routine at this point by saying the word “Cancel.”

  7. If an artist has more than one album in the library, use the same type of procedure as the artist selection process to select the desired artist’s album. Otherwise, start playback of the album immediately.
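The duplicate-elimination idiom used in steps 3 and 4 can be tried on its own with a hardcoded list; the artist names here are purely illustrative:

```applescript
-- Sketch of the duplicate-elimination idiom: copy only unseen
-- items from sourceList into uniqueList. The names are made up.
set sourceList to {"Rush", "Yes", "Rush", "Genesis"}
set uniqueList to {}
repeat with anItem in sourceList
  -- contents of dereferences the loop variable before comparing
  if uniqueList does not contain (contents of anItem) then
    copy (contents of anItem) to end of uniqueList
  end if
end repeat
-- uniqueList is now {"Rush", "Yes", "Genesis"}
```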

The pause/unpause and stop music commands, along with the next and previous track commands, call iTunes’s similarly named methods.

The raise and lower volume commands capture the Mac’s current output volume and raise or lower it by the equivalent of a single press of the up or down volume key on the Mac’s keyboard. These commands are especially helpful when you need to adjust music playback volume hands-free.
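The scaling works by mapping the 0 to 100 output volume range onto the sixteen notches of the keyboard volume keys. Here is a condensed sketch of the raise half, with a worked example in the comments:

```applescript
-- Map the current 0-100 volume onto 16 notches, bump up one notch,
-- and map back. For example, a volume of 50 rounds to notch 8;
-- notch 9 maps back to a volume of 56.
set currentVolume to output volume of (get volume settings)
set notch to round (currentVolume / (100 / 16))
if notch < 16 then set notch to notch + 1
set volume output volume (round (notch / 16 * 100))
```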

This portion of the script expects that you have already configured your desired email accounts to work with OS X’s built-in Mail application. In the Mail snippet, we do this:

  1. Open Mail.

  2. Poll all configured mail servers for new, unread email messages.

  3. Count the number of unread mail messages in the unified inbox and speak that amount.

  4. If there are any unread messages, ask users if they would like to have their unread messages read to them.

  5. If the user answers yes, loop over the messages in the inbox and speak the sender name and subject line of each unread one. Otherwise, exit the routine.
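Step 5 leans on AppleScript’s text item delimiters to pull the display name out of a sender string. The name and address in this standalone sketch are made up:

```applescript
-- A Mail sender string looks like "Name <address>", so splitting
-- at "<" and taking the first text item yields the display name.
set theSender to "Jane Doe <jane@example.com>"
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "<"
set senderName to first text item of theSender -- "Jane Doe "
-- Always restore the saved delimiters when finished.
set AppleScript's text item delimiters to savedDelimiters
```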

This routine extracts the current time from AppleScript’s current date command. From there, we do this:

  1. Assign the current time to the string current_time.

  2. Set AppleScript’s text item delimiters property to a colon (saving the previous delimiters in the savedDelimiters variable so they can be restored afterward) and split the current_time string at each colon. This breaks the string apart into its constituent hour and minute values; the remainder of the string contains the a.m. or p.m. designation.

  3. Assign these time values to their appropriate variables (hours, minutes, AMPM) and speak them accordingly.
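The same delimiter trick drives the time parsing. This standalone sketch splits a hardcoded sample string rather than the live clock:

```applescript
-- Split "9:41:22 PM" at each colon: the text items are
-- "9", "41", and "22 PM". Characters 4 thru 5 of the last
-- item isolate the "PM" suffix.
set sampleTime to "9:41:22 PM"
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to ":"
set sampleHours to first text item of sampleTime
set sampleMinutes to second text item of sampleTime
set sampleSuffix to text 4 thru 5 of (third text item of sampleTime)
set AppleScript's text item delimiters to savedDelimiters
say "The time is " & sampleHours & " " & sampleMinutes & " " & sampleSuffix
```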

Uncomment these lines (remove the double-dash [--] characters that indicate a comment in AppleScript) if you have the Mac Skype client installed and want to place a hands-free call. Replace the echo123 Skype Call Testing Service ID with the phone number or Skype ID of the contact you want to call.

This command exits the script and ensures that the speech recognition server process is indeed killed by issuing a killall SpeechRecognitionServer command from the shell.

Once you have entered the script in the AppleScript Editor, save it and click the Compile button on the editor’s toolbar. If the script contains any typos or errors, it will fail to compile; clean up whatever problems occur until you attain a clean compile. Also make sure that your calibrated wireless headset is turned on and the input audio levels are properly set. Turn up the volume on your external speakers loud enough to hear the responses and music playback. Then click the Run button and get ready to talk.

Programming Your Home