10.5 Programming a Talking Lion
Writing good speech synthesis and recognition software is hard. That’s why we’re going to take advantage of all the hard work Apple’s speech software engineers have poured into OS X. The engine can be accessed in a variety of ways, from the preferred method of Objective-C to Perl, Python, Ruby, and other scripting-language hooks. But the easiest way I have found to tinker, modify, and test on the fly is via AppleScript.
I’ll be the first to admit that I am not a big fan of AppleScript. Its attempt to turn script writing into a natural English sentence structure works only on a superficial level. It breaks down pretty quickly for any intermediate developer fluent in more elegant scripting languages like Ruby or Python. Even simple tasks like string manipulation turn out to be a real pain in AppleScript. That said, AppleScript trumps these other languages when it comes to effortless automation integration with other AppleScript-aware OS X applications. Bundled programs like iTunes, Mail, Safari, and Finder are fully scriptable, as are a number of third-party OS X programs like Skype, Microsoft Office, and the like. In the case of this project, Apple’s speech recognition server is also highly scriptable, and that’s what we’re going to call upon in this project to make the magic work.
While AppleScript can be written using any text editor, it should come as no surprise that it’s best hosted within the AppleScript Editor application, which can be found in the Applications/Utilities folder. Launching the AppleScript Editor for the first time will open a blank, two-pane coding window. The top half of the window is used to enter code, while the bottom consists of three tabs for monitoring events, replies, and results of the executing script. The editor aids in writing scripts by color-coding AppleScript syntax, but it doesn’t offer friendlier IDE features like code completion or on-the-fly compiling. Fortunately, scripts are typically short, so these omissions are not crippling.
AppleScript has its own vocabulary, keywords, and idioms. Learning AppleScript isn’t difficult, but it can get maddening at times when you have to massage the syntax just right to make the script do what you intended. For example, parsing a string for an email address is easy in most scripting languages. Not so in AppleScript. Partly due to its historical ties and partly due to the way AppleScript expects you to work, it’s complicated. So with regard to the code we will write for this project, you will just have to trust me and try to follow along. If you find AppleScript to your liking or want to see what else it can do to further extend the code for this project, review Apple’s online documentation for more information.[108]
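To see how wide that gap is, here is a sketch of the email-address extraction task in Python, the kind of language the comparison above has in mind. (The regular expression and the sample string are my own illustration, not from the AppleScript project.)

```python
import re

def extract_email(text):
    """Pull the first email address out of a free-form string."""
    match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    return match.group(0) if match else None

# A few lines in Python; in AppleScript the same job means
# juggling text item delimiters by hand.
print(extract_email("John Doe <jdoe@example.com>"))  # jdoe@example.com
```

The same pattern-matching approach simply isn’t available in stock AppleScript, which is why its string handling feels so labored by comparison.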
Before writing the script, let’s think about what we want it to do. First, we want it to respond to a select group of spoken words or phrases and act on those commands accordingly. What commands should we elicit? For starters, how about having the script hit the URLs we exposed in some of our networked projects, like the Web-Enabled Light Switch or the Android Door Lock? While we’re at it, let’s make use of some of the bundled OS X applications like Mail and iTunes to check and read our unread email and play music we want to hear. Let’s also ask our house what time it is.
We need to initialize the SpeechRecognitionServer application and populate
the set of words or phrases that we want it to listen to. Using a
series of if/then statements, we can react to those recognized
commands accordingly. For example, if we ask the computer to play
music, we will call upon the iTunes application to take an
inventory of music tracks in its library, sort these by artist and
album, populate these as more words/phrases to interpret, and have
the text-to-speech engine ask us which artist and album we want to
listen to. Similarly, we can have our unread email read to us via a
check mail command. Doing so will launch the Mail application, poll
your preconfigured Mail accounts for new mail, check the inbox for
unread messages, and perform a text-to-speech reading of unread
sender names and message titles.
Now let’s take a closer look at the details of the script’s execution. Here’s the script in its entirety. Most of the syntax should be easy to follow, even if you are not familiar with AppleScript.
GivingYourHomeAVoice/osx-voice-automation.scpt

with timeout of 2629743 seconds
  set exitApp to "no"
  repeat while exitApp is "no"
    -- ①
    tell application "SpeechRecognitionServer"
      activate
      try
        set voiceResponse to listen for {"light on", "light off", ¬
          "unlock door", "play music", "pause music", ¬
          "unpause music", "stop music", "next track", ¬
          "raise volume", "lower volume", ¬
          "previous track", "check email", "time", "make a call", ¬
          "hang up", "quit app"} giving up after 2629743
      on error -- time out
        return
      end try
    end tell

    -- ②
    if voiceResponse is "light on" then
      -- open URL to turn on Light Switch
      open location "http://192.168.1.100:3344/command/on"
      say "The light is now on."

    else if voiceResponse is "light off" then
      -- open URL to turn off Light Switch
      open location "http://192.168.1.100:3344/command/off"
      say "The light is now off."

    else if voiceResponse is "unlock door" then
      -- open URL to unlock Android Door Lock
      open location "http://192.168.1.230:8000"
      say "Unlocking the door."

    -- ③
    else if voiceResponse is "play music" then
      tell application "iTunes"
        set musicList to {"Cancel"} as list
        set myList to (get artist of every track ¬
          of playlist 1) as list
        repeat with myItem in myList
          if musicList does not contain myItem then
            set musicList to musicList & myItem
          end if
        end repeat
      end tell

      say "Which artist would you like to listen to?"
      tell application "SpeechRecognitionServer"
        set theArtistListing to ¬
          (listen for musicList with prompt musicList)
      end tell
      if theArtistListing is not "Cancel" then
        say "Which of " & theArtistListing & ¬
          "'s albums would you like to listen to?"
        tell application "iTunes"
          tell source "Library"
            tell library playlist 1
              set uniqueAlbumList to {}
              set albumList to album of tracks ¬
                where artist is equal to theArtistListing

              repeat until albumList = {}
                if uniqueAlbumList does not contain ¬
                  (first item of albumList) then
                  copy (first item of albumList) to end of ¬
                    uniqueAlbumList
                end if
                set albumList to rest of albumList
              end repeat

              set theUniqueAlbumList to {"Cancel"} & uniqueAlbumList
              tell application "SpeechRecognitionServer"
                set theAlbum to (listen for the theUniqueAlbumList ¬
                  with prompt theUniqueAlbumList)
              end tell
            end tell
            if theAlbum is not "Cancel" then
              if not ((name of playlists) contains "Current Album") then
                set theAlbumPlaylist to ¬
                  make new playlist with properties {name:"Current Album"}
              else
                set theAlbumPlaylist to playlist "Current Album"
                delete every track of theAlbumPlaylist
              end if
              tell library playlist 1 to duplicate ¬
                (every track whose album is theAlbum) to theAlbumPlaylist
              play theAlbumPlaylist
            else
              say "Canceling music selection"
            end if
          end tell
        end tell
      else
        say "Canceling music selection"
      end if

    -- ④
    else if voiceResponse is "pause music" or ¬
      voiceResponse is "unpause music" then
      tell application "iTunes"
        playpause
      end tell

    else if voiceResponse is "stop music" then
      tell application "iTunes"
        stop
      end tell

    else if voiceResponse is "next track" then
      tell application "iTunes"
        next track
      end tell

    else if voiceResponse is "previous track" then
      tell application "iTunes"
        previous track
      end tell

    -- Raise and lower volume routines courtesy of HexMonkey's post:
    -- http://forums.macrumors.com/showthread.php?t=144749
    -- ⑤
    else if voiceResponse is "raise volume" then
      set currentVolume to output volume of (get volume settings)
      set scaledVolume to round (currentVolume / (100 / 16))
      set scaledVolume to scaledVolume + 1
      if (scaledVolume > 16) then
        set scaledVolume to 16
      end if
      set newVolume to round (scaledVolume / 16 * 100)
      set volume output volume newVolume

    else if voiceResponse is "lower volume" then
      set currentVolume to output volume of (get volume settings)
      set scaledVolume to round (currentVolume / (100 / 16))
      set scaledVolume to scaledVolume - 1
      if (scaledVolume < 0) then
        set scaledVolume to 0
      end if
      set newVolume to round (scaledVolume / 16 * 100)
      set volume output volume newVolume

    -- ⑥
    else if voiceResponse is "check email" then
      tell application "Mail"
        activate
        check for new mail
        set unreadEmailCount to unread count in inbox
        if unreadEmailCount is equal to 0 then
          say "You have no unread messages in your Inbox."
        else if unreadEmailCount is equal to 1 then
          say "You have 1 unread message in your Inbox."
        else
          say "You have " & unreadEmailCount & ¬
            " unread messages in your Inbox."
        end if
        if unreadEmailCount is greater than 0 then
          say "Would you like me to read your unread email to you?"
          tell application "SpeechRecognitionServer"
            activate
            set voiceResponse to listen for {"yes", "no"} ¬
              giving up after 1 * minutes
          end tell
          if voiceResponse is "yes" then
            set allMessages to every message in inbox
            repeat with aMessage in allMessages
              if read status of aMessage is false then
                set theSender to sender of aMessage
                set {savedDelimiters, AppleScript's text item delimiters} ¬
                  to {AppleScript's text item delimiters, "<"}
                set senderName to first text item of theSender
                set AppleScript's text item delimiters ¬
                  to savedDelimiters
                say "From " & senderName
                say "Subject: " & subject of aMessage
                delay 1
              end if
            end repeat
          end if
        end if
      end tell

    -- ⑦
    else if voiceResponse is "time" then
      set current_time to (time string of (current date))
      set {savedDelimiters, AppleScript's text item delimiters} to ¬
        {AppleScript's text item delimiters, ":"}
      set hours to first text item of current_time
      set minutes to the second text item of current_time
      set AMPM to third text item of current_time
      set AMPM to text 3 thru 5 of AMPM
      set AppleScript's text item delimiters to savedDelimiters
      say "The time is " & hours & " " & minutes & AMPM

    -- ⑧
    --else if voiceResponse is "make a call" then
    --  tell application "Skype"
    --    -- A Skype API Security dialog will pop up the first
    --    -- time this script accesses Skype. Select "Allow this
    --    -- application to use Skype" for uninterrupted Skype API access.
    --    activate
    --    -- Replace the echo123 Skype Call Testing Service ID with a
    --    -- phone number or your contact's Skype ID.
    --    send command "CALL echo123" script name ¬
    --      "Place Skype Call"
    --  end tell
    --else if voiceResponse is "hang up" then
    --  tell application "Skype"
    --    quit
    --  end tell

    -- ⑨
    else if voiceResponse is "quit app" then
      set exitApp to "yes"
      say "Listening deactivated. Exiting application."
      delay 1
      do shell script "killall SpeechRecognitionServer"
    end if
  end repeat
end timeout
|
① The first thing we should do to keep the script running continuously is wrap the script in two loops. The first is a with timeout ... end timeout block that prevents the script from timing out. The timeout duration must be set in seconds; in this case, we're going to run the script for one month (there are roughly 2.6 million seconds in an average month). The second is a repeat while loop that repeats until the exitApp variable is set to yes via the "quit app" voiceResponse, as shown toward the end of the code listing. Next, initialize the SpeechRecognitionServer and pass it an array of the key words and phrases via its listen for method. We will keep the recognizer alive for a month so it can await incoming commands without having to restart the script when the listening duration times out. You can extend this month-long duration by changing the giving up after value.
② If the incoming phrase is interpreted as "light on," we will open the default browser and direct it to the on URL of our web-enabled light switch. "Light off" will request the off URL from that project. We perform the same open location URL call for the Android door lock project too.
③ Besides triggering URL calls via voice, we can also interact with AppleScript-able OS X applications like iTunes and Mail. In this code snippet, we do the following:
- Open iTunes.
- Create a list array seeded with the word "Cancel."
- Populate that array with the artist name of every track in the local iTunes library, eliminating duplicate names along the way.
- Pass the array of artist names to the speech recognition server via its listen for method.
- Ask the user to pick an artist to listen to. If the user responds with the name of an artist in the library, populate the speech recognizer with the name(s) of that artist's album(s). Users can also exit the play music routine at this point by saying the word "Cancel."
- Use the same type of procedure as the artist selection process to select the desired artist's album, then copy that album's tracks to a "Current Album" playlist and start playback.
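The duplicate-elimination loop at the heart of this snippet is a common order-preserving de-duplication pattern. Here is the same logic sketched in Python for comparison (the artist names are invented sample data):

```python
def unique_in_order(items):
    """Order-preserving de-duplication, mirroring the script's
    'does not contain' repeat loop."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

# Every track contributes its artist; duplicates collapse to one entry.
artists = ["Beck", "Bjork", "Beck", "Cake", "Bjork"]
print(unique_in_order(artists))  # ['Beck', 'Bjork', 'Cake']
```

Preserving the original order matters here because the resulting list doubles as the spoken prompt, so the artists are announced in library order.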
④ The pause/unpause and stop music commands, along with the next and previous track commands, call iTunes's similarly named methods.
⑤ The raise and lower volume commands capture the Mac's current output volume and raise or lower it by the equivalent of a single press of the volume up or down key on the Mac's keyboard. These commands are especially helpful when you need to adjust music playback volume hands-free.
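The arithmetic behind this routine maps the 0–100 output volume onto the keyboard's sixteen volume steps, moves one step, clamps at the ends, and maps back. A quick Python sketch of the same math (the sample volume values are arbitrary):

```python
def step_volume(current_volume, direction):
    """Scale a 0-100 volume to the keyboard's 16 steps, move one
    step up (+1) or down (-1), clamp, and scale back to 0-100."""
    scaled = round(current_volume / (100 / 16))
    scaled = max(0, min(16, scaled + direction))
    return round(scaled / 16 * 100)

print(step_volume(50, +1))   # 56
print(step_volume(100, +1))  # 100 (already clamped at the top step)
```

Note that AppleScript's round and Python's round can differ on exact halves, so treat this as an illustration of the scaling idea rather than a bit-for-bit port.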
⑥ This portion of the script expects that you have already configured your desired email accounts to work with OS X's built-in Mail application. In the Mail snippet, we do this:
- Open Mail.
- Poll all configured mail servers for new, unread email messages.
- Count the number of unread messages in the unified inbox and speak that amount.
- If there are any unread messages, ask users if they would like to have their unread messages read to them.
- If the user answers yes, iterate over the inbox messages and speak the sender name and subject line of each unread one. Otherwise, exit the routine.
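The text item delimiters trick in this snippet splits a sender string like "Jane Doe <jane@example.com>" on the "<" character to keep only the display name. The equivalent in Python looks like this (the sample addresses are made up):

```python
def sender_name(sender):
    """Keep everything before the '<' of an RFC-style sender
    string, mirroring the text item delimiters split."""
    return sender.split("<")[0].strip()

print(sender_name("Jane Doe <jane@example.com>"))  # Jane Doe
```

If a sender field contains a bare address with no angle brackets, the split leaves the whole string intact, which is also what the AppleScript version speaks aloud.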
⑦ This routine extracts the current time from AppleScript's current date routine. From there, we do this:
- Assign the current time to the string current_time.
- Use AppleScript's text item delimiters (saving the original delimiters in the savedDelimiters variable) to split the current_time string on the ":" delimiter. This breaks the string apart into its constituent hour and minute values. The remainder of the string contains the a.m. or p.m. designation.
- Assign these time values to their appropriate variables (hours, minutes, AMPM) and speak them accordingly.
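The same split-on-colon logic is easy to follow in Python. This sketch assumes a 12-hour time string of the form the script expects, such as "3:07:42 PM":

```python
def speakable_time(time_string):
    """Split a time string like '3:07:42 PM' on ':' the way the
    script does, keeping hours, minutes, and the AM/PM suffix."""
    hours, minutes, rest = time_string.split(":")
    ampm = rest[2:5]  # skip the two seconds digits, keep ' PM'
    return "The time is " + hours + " " + minutes + ampm

print(speakable_time("3:07:42 PM"))  # The time is 3 07 PM
```

The rest[2:5] slice corresponds to the script's text 3 thru 5 of AMPM, which discards the seconds and keeps the space plus the AM/PM designation.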
⑧ Uncomment these lines (remove the double-dash [--] characters used to indicate a comment in AppleScript) if you have the Mac Skype client installed and you want to place a hands-free call. Replace the echo123 Skype call testing service ID with the phone number or Skype contact ID of your choice.
⑨ This command exits the script and ensures that the speech recognition server process is indeed killed by issuing a killall SpeechRecognitionServer command from the shell.
Once you have entered the script in the AppleScript editor, save it and click the Compile button on the editor’s toolbar. If the script contains any typos or errors, it will fail to compile. Clean up whatever problems occur and make sure you attain a clean compile. Also make sure that your calibrated wireless headset is turned on and the input audio levels are properly set. Turn up the volume on your external speakers loud enough to hear the responses and music playback. Then click the Run button and get ready to talk.