10.5 Programming a Talking Lion
Writing good speech synthesis and recognition software is hard. That’s why we’re going to take advantage of all the hard work Apple’s speech software engineers have poured into OS X. The engine can be accessed in a variety of ways, from the preferred method of Objective-C to Perl, Python, Ruby, and other scripting-language hooks. But the easiest way I have found to tinker, modify, and test on the fly is via AppleScript.
I’ll be the first to admit that I am not a big fan of AppleScript. Its attempt to turn script writing into a natural English sentence structure works only on a superficial level. It breaks down pretty quickly for any intermediate developer fluent in more elegant scripting languages like Ruby or Python. Even simple tasks like string manipulation turn out to be a real pain in AppleScript. That said, AppleScript trumps these other languages when it comes to effortless automation integration with other AppleScript-aware OS X applications. Bundled programs like iTunes, Mail, Safari, and Finder are fully scriptable, as are a number of third-party OS X programs like Skype, Microsoft Office, and the like. In the case of this project, Apple’s speech recognition server is also highly scriptable, and that’s what we’re going to call upon in this project to make the magic work.
While AppleScript can be written using any text editor, it should come as no surprise that it’s best hosted within the AppleScript Editor application, which can be found in the Applications/Utilities folder. Launching the AppleScript Editor for the first time will open a blank, two-pane coding window. The top half of the window is used to enter code, while the bottom consists of three tabs for monitoring events, replies, and results of the executing script. The editor aids in writing scripts by color-coding AppleScript syntax, but it doesn’t offer friendlier IDE features like code completion or on-the-fly compiling. Fortunately, scripts are typically short, so these omissions are not crippling.
AppleScript has its own vocabulary, keywords, and idioms. Learning AppleScript isn’t difficult, but it can get maddening at times when you have to massage the syntax just right to make the script do what you intended. For example, parsing a string for an email address is easy in most scripting languages. Not so in AppleScript. Partly due to its historical ties and partly due to the way AppleScript expects you to work, it’s complicated. So with regard to the code we will write for this project, you will just have to trust me and try to follow along. If you find AppleScript to your liking or want to see what else it can do to further extend the code for this project, review Apple’s online documentation for more information.[108]
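To see how wide that gap is, here is a sketch of the email-address extraction task in Python, the kind of language the comparison above has in mind. (The regular expression and the sample string are my own illustration, not from the AppleScript project.)

```python
import re

def extract_email(text):
    """Pull the first email address out of a free-form string."""
    match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    return match.group(0) if match else None

# A few lines in Python; in AppleScript the same job means
# juggling text item delimiters by hand.
print(extract_email("John Doe <jdoe@example.com>"))  # jdoe@example.com
```

The same pattern-matching approach simply isn’t available in stock AppleScript, which is why its string handling feels so labored by comparison.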
Before writing the script, let’s think about what we want it to do. First, we want it to respond to a select group of spoken words or phrases and act on those commands accordingly. What commands should we elicit? For starters, how about having the script hit the URLs we exposed in some of our networked projects, like the Web-Enabled Light Switch or the Android Door Lock? While we’re at it, let’s make use of some of the bundled OS X applications like Mail and iTunes to check and read our unread email and play music we want to hear. Let’s also ask our house what time it is.
We need to initialize the SpeechRecognitionServer application and populate
the set of words or phrases that we want it to listen to. Using a
series of if/then statements, we can react to those recognized
commands accordingly. For example, if we ask the computer to play
music, we will call upon the iTunes application to take an
inventory of music tracks in its library, sort these by artist and
album, populate these as more words/phrases to interpret, and have
the text-to-speech engine ask us which artist and album we want to
listen to. Similarly, we can have our unread email read to us via a
check mail command. Doing so will launch the Mail application, poll
your preconfigured Mail accounts for new mail, check the inbox for
unread messages, and perform a text-to-speech reading of unread
sender names and message titles.
Now let’s take a closer look at the details of the script’s execution. Here’s the script in its entirety. Most of the syntax should be easy to follow, even if you are not familiar with AppleScript.
GivingYourHomeAVoice/osx-voice-automation.scpt

with timeout of 2629743 seconds
  set exitApp to "no"
  repeat while exitApp is "no"
    -- ①
    tell application "SpeechRecognitionServer"
      activate
      try
        set voiceResponse to listen for {"light on", "light off", ¬
          "unlock door", "play music", "pause music", ¬
          "unpause music", "stop music", "next track", ¬
          "raise volume", "lower volume", ¬
          "previous track", "check email", "time", "make a call", ¬
          "hang up", "quit app"} giving up after 2629743
      on error -- time out
        return
      end try
    end tell

    -- ②
    if voiceResponse is "light on" then
      -- open URL to turn on Light Switch
      open location "http://192.168.1.100:3344/command/on"
      say "The light is now on."

    else if voiceResponse is "light off" then
      -- open URL to turn off Light Switch
      open location "http://192.168.1.100:3344/command/off"
      say "The light is now off."

    else if voiceResponse is "unlock door" then
      -- open URL to unlock Android Door Lock
      open location "http://192.168.1.230:8000"
      say "Unlocking the door."

    -- ③
    else if voiceResponse is "play music" then
      tell application "iTunes"
        set musicList to {"Cancel"} as list
        set myList to (get artist of every track ¬
          of playlist 1) as list
        repeat with myItem in myList
          if musicList does not contain myItem then
            set musicList to musicList & myItem
          end if
        end repeat
      end tell

      say "Which artist would you like to listen to?"
      tell application "SpeechRecognitionServer"
        set theArtistListing to ¬
          (listen for musicList with prompt musicList)
      end tell
      if theArtistListing is not "Cancel" then
        say "Which of " & theArtistListing & ¬
          "'s albums would you like to listen to?"
        tell application "iTunes"
          tell source "Library"
            tell library playlist 1
              set uniqueAlbumList to {}
              set albumList to album of tracks ¬
                where artist is equal to theArtistListing

              repeat until albumList = {}
                if uniqueAlbumList does not contain ¬
                  (first item of albumList) then
                  copy (first item of albumList) to end of ¬
                    uniqueAlbumList
                end if
                set albumList to rest of albumList
              end repeat

              set theUniqueAlbumList to {"Cancel"} & uniqueAlbumList
              tell application "SpeechRecognitionServer"
                set theAlbum to (listen for the theUniqueAlbumList ¬
                  with prompt theUniqueAlbumList)
              end tell
            end tell
            if theAlbum is not "Cancel" then
              if not ((name of playlists) contains "Current Album") then
                set theAlbumPlaylist to ¬
                  make new playlist with properties {name:"Current Album"}
              else
                set theAlbumPlaylist to playlist "Current Album"
                delete every track of theAlbumPlaylist
              end if
              tell library playlist 1 to duplicate ¬
                (every track whose album is theAlbum) to theAlbumPlaylist
              play theAlbumPlaylist
            else
              say "Canceling music selection"
            end if
          end tell
        end tell
      else
        say "Canceling music selection"
      end if

    -- ④
    else if voiceResponse is "pause music" or ¬
      voiceResponse is "unpause music" then
      tell application "iTunes"
        playpause
      end tell

    else if voiceResponse is "stop music" then
      tell application "iTunes"
        stop
      end tell

    else if voiceResponse is "next track" then
      tell application "iTunes"
        next track
      end tell

    else if voiceResponse is "previous track" then
      tell application "iTunes"
        previous track
      end tell

    -- Raise and lower volume routines courtesy of HexMonkey's post:
    -- http://forums.macrumors.com/showthread.php?t=144749
    -- ⑤
    else if voiceResponse is "raise volume" then
      set currentVolume to output volume of (get volume settings)
      set scaledVolume to round (currentVolume / (100 / 16))
      set scaledVolume to scaledVolume + 1
      if (scaledVolume > 16) then
        set scaledVolume to 16
      end if
      set newVolume to round (scaledVolume / 16 * 100)
      set volume output volume newVolume

    else if voiceResponse is "lower volume" then
      set currentVolume to output volume of (get volume settings)
      set scaledVolume to round (currentVolume / (100 / 16))
      set scaledVolume to scaledVolume - 1
      if (scaledVolume < 0) then
        set scaledVolume to 0
      end if
      set newVolume to round (scaledVolume / 16 * 100)
      set volume output volume newVolume

    -- ⑥
    else if voiceResponse is "check email" then
      tell application "Mail"
        activate
        check for new mail
        set unreadEmailCount to unread count in inbox
        if unreadEmailCount is equal to 0 then
          say "You have no unread messages in your Inbox."
        else if unreadEmailCount is equal to 1 then
          say "You have 1 unread message in your Inbox."
        else
          say "You have " & unreadEmailCount & ¬
            " unread messages in your Inbox."
        end if
        if unreadEmailCount is greater than 0 then
          say "Would you like me to read your unread email to you?"
          tell application "SpeechRecognitionServer"
            activate
            set voiceResponse to listen for {"yes", "no"} ¬
              giving up after 1 * minutes
          end tell
          if voiceResponse is "yes" then
            set allMessages to every message in inbox
            repeat with aMessage in allMessages
              if read status of aMessage is false then
                set theSender to sender of aMessage
                set {savedDelimiters, AppleScript's text item delimiters} ¬
                  to {AppleScript's text item delimiters, "<"}
                set senderName to first text item of theSender
                set AppleScript's text item delimiters ¬
                  to savedDelimiters
                say "From " & senderName
                say "Subject: " & subject of aMessage
                delay 1
              end if
            end repeat
          end if
        end if
      end tell

    -- ⑦
    else if voiceResponse is "time" then
      set current_time to (time string of (current date))
      set {savedDelimiters, AppleScript's text item delimiters} to ¬
        {AppleScript's text item delimiters, ":"}
      set hours to first text item of current_time
      set minutes to the second text item of current_time
      set AMPM to third text item of current_time
      set AMPM to text 3 thru 5 of AMPM
      set AppleScript's text item delimiters to savedDelimiters
      say "The time is " & hours & " " & minutes & AMPM

    -- ⑧
    --else if voiceResponse is "make a call" then
    --  tell application "Skype"
    --    -- A Skype API Security dialog will pop up the first
    --    -- time this script accesses Skype. Select "Allow this
    --    -- application to use Skype" for uninterrupted Skype API access.
    --    activate
    --    -- Replace the echo123 Skype Call Testing Service ID with a
    --    -- phone number or your contact's Skype ID.
    --    send command "CALL echo123" script name ¬
    --      "Place Skype Call"
    --  end tell
    --else if voiceResponse is "hang up" then
    --  tell application "Skype"
    --    quit
    --  end tell

    -- ⑨
    else if voiceResponse is "quit app" then
      set exitApp to "yes"
      say "Listening deactivated. Exiting application."
      delay 1
      do shell script "killall SpeechRecognitionServer"
    end if
  end repeat
end timeout
|
① The first thing we should do to keep the script running continuously is wrap the script in two loops. The first is a with timeout ... end timeout block that prevents the script from timing out. The timeout duration must be set in seconds; in this case, we're going to run the script for one month (there are roughly 2.6 million seconds in an average month). The second is a repeat while loop that repeats until the exitApp variable is set to yes via the "quit app" voiceResponse, as shown toward the end of the code listing. Next, initialize the SpeechRecognitionServer and pass it an array of the key words and phrases via its listen for method. We will keep the recognizer alive for a month so it can await incoming commands without having to restart the script when the listening duration times out. You can extend this month-long duration by changing the giving up after value.
② If the incoming phrase is interpreted as "light on," we will open the default browser and direct it to the on URL of our web-enabled light switch. "Light off" will request the off URL from that project. We perform the same open location URL call for the Android door lock project too.
③ Besides triggering URL calls via voice, we can also interact with AppleScript-able OS X applications like iTunes and Mail. In this code snippet, we do the following:
- Open iTunes.
- Create a list array seeded with the word "Cancel."
- Populate that array with the artist name of every track in the local iTunes library, eliminating duplicate names along the way.
- Pass the array of artist names to the speech recognition server via its listen for method.
- Ask the user to pick an artist to listen to. If the user responds with the name of an artist in the library, populate the speech recognizer with the name(s) of that artist's album(s). Users can also exit the play music routine at this point by saying the word "Cancel."
- Use the same type of procedure as the artist selection process to select the desired artist's album, then copy that album's tracks to a "Current Album" playlist and start playback.
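The duplicate-elimination loop at the heart of this snippet is a common order-preserving de-duplication pattern. Here is the same logic sketched in Python for comparison (the artist names are invented sample data):

```python
def unique_in_order(items):
    """Order-preserving de-duplication, mirroring the script's
    'does not contain' repeat loop."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

# Every track contributes its artist; duplicates collapse to one entry.
artists = ["Beck", "Bjork", "Beck", "Cake", "Bjork"]
print(unique_in_order(artists))  # ['Beck', 'Bjork', 'Cake']
```

Preserving the original order matters here because the resulting list doubles as the spoken prompt, so the artists are announced in library order.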
④ The pause/unpause and stop music commands, along with the next and previous track commands, call iTunes's similarly named methods.
⑤ The raise and lower volume commands capture the Mac's current output volume and raise or lower it by the equivalent of a single press of the volume up or down key on the Mac's keyboard. These commands are especially helpful when you need to adjust music playback volume hands-free.
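The arithmetic behind this routine maps the 0–100 output volume onto the keyboard's sixteen volume steps, moves one step, clamps at the ends, and maps back. A quick Python sketch of the same math (the sample volume values are arbitrary):

```python
def step_volume(current_volume, direction):
    """Scale a 0-100 volume to the keyboard's 16 steps, move one
    step up (+1) or down (-1), clamp, and scale back to 0-100."""
    scaled = round(current_volume / (100 / 16))
    scaled = max(0, min(16, scaled + direction))
    return round(scaled / 16 * 100)

print(step_volume(50, +1))   # 56
print(step_volume(100, +1))  # 100 (already clamped at the top step)
```

Note that AppleScript's round and Python's round can differ on exact halves, so treat this as an illustration of the scaling idea rather than a bit-for-bit port.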
⑥ This portion of the script expects that you have already configured your desired email accounts to work with OS X's built-in Mail application. In the Mail snippet, we do this:
- Open Mail.
- Poll all configured mail servers for new, unread email messages.
- Count the number of unread messages in the unified inbox and speak that amount.
- If there are any unread messages, ask users if they would like to have their unread messages read to them.
- If the user answers yes, iterate over the inbox messages and speak the sender name and subject line of each unread one. Otherwise, exit the routine.
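The text item delimiters trick in this snippet splits a sender string like "Jane Doe <jane@example.com>" on the "<" character to keep only the display name. The equivalent in Python looks like this (the sample addresses are made up):

```python
def sender_name(sender):
    """Keep everything before the '<' of an RFC-style sender
    string, mirroring the text item delimiters split."""
    return sender.split("<")[0].strip()

print(sender_name("Jane Doe <jane@example.com>"))  # Jane Doe
```

If a sender field contains a bare address with no angle brackets, the split leaves the whole string intact, which is also what the AppleScript version speaks aloud.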
⑦ This routine extracts the current time from AppleScript's current date routine. From there, we do this:
- Assign the current time to the string current_time.
- Use AppleScript's text item delimiters (saving the original delimiters in the savedDelimiters variable) to split the current_time string on the ":" delimiter. This breaks the string apart into its constituent hour and minute values. The remainder of the string contains the a.m. or p.m. designation.
- Assign these time values to their appropriate variables (hours, minutes, AMPM) and speak them accordingly.
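The same split-on-colon logic is easy to follow in Python. This sketch assumes a 12-hour time string of the form the script expects, such as "3:07:42 PM":

```python
def speakable_time(time_string):
    """Split a time string like '3:07:42 PM' on ':' the way the
    script does, keeping hours, minutes, and the AM/PM suffix."""
    hours, minutes, rest = time_string.split(":")
    ampm = rest[2:5]  # skip the two seconds digits, keep ' PM'
    return "The time is " + hours + " " + minutes + ampm

print(speakable_time("3:07:42 PM"))  # The time is 3 07 PM
```

The rest[2:5] slice corresponds to the script's text 3 thru 5 of AMPM, which discards the seconds and keeps the space plus the AM/PM designation.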
⑧ Uncomment these lines (remove the double-dash [--] characters used to indicate a comment in AppleScript) if you have the Mac Skype client installed and you want to place a hands-free call. Replace the echo123 Skype call testing service ID with the phone number or Skype contact ID of your choice.
⑨ This command exits the script and ensures that the speech recognition server process is indeed killed by issuing a killall SpeechRecognitionServer command from the shell.
Once you have entered the script in the AppleScript editor, save it and click the Compile button on the editor’s toolbar. If the script contains any typos or errors, it will fail to compile. Clean up whatever problems occur and make sure you attain a clean compile. Also make sure that your calibrated wireless headset is turned on and the input audio levels are properly set. Turn up the volume on your external speakers loud enough to hear the responses and music playback. Then click the Run button and get ready to talk.