Murmuring, Stuttering with Just a Pinch of Mumbling

There’s an awesome new standard for web developers that has been floating around the web for a short time, and it allows you to speak to your web apps and web sites. While the only documentation right now is an evolving specification, you can still dig in without too much trepidation and find yourself mesmerized by the sheer brilliance of modern technology…or find yourself chucking your old $10 Logitech headset in favor of something with a better microphone.

I am, of course, writing to you today about Google’s fancy new Web Speech API.

What is this new API, you ask? Well, for an extremely short description, it is a way for you to let people interact with your website or web app using simply their voice. It doesn’t end there, either, as it also allows your site to talk back through its speech synthesis interface. Sounds an awful lot like Siri, Cortana or Google Now, but of course it comes with none of their almost SI-like understanding of what you want from it. No, this is just a means of interacting with the person on the outside of all of the 1s and 0s…and what a simple method of interaction it is!
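For a taste of the talking-back half, here is a minimal sketch. The helper name say and the window-like win parameter are assumptions made for illustration (they let the function run outside a browser); speechSynthesis.speak and SpeechSynthesisUtterance are the names the specification uses.

```javascript
// Minimal sketch: speak a string through a window-like object.
// Returns false when speech synthesis is unavailable.
function say(win, text) {
    if (!win.speechSynthesis || !win.SpeechSynthesisUtterance) {
        return false;
    }
    var utterance = new win.SpeechSynthesisUtterance(text);
    win.speechSynthesis.speak(utterance);
    return true;
}

// In a browser you would call: say(window, 'Hello from the Web Speech API');
```

Passing the window in, rather than reaching for the global, is just a convenience for testing; calling the global objects directly works the same way.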

I implemented it recently within an AngularJS project that I have been working on called Project Voxie (more posts on that topic soon). I utilized Angular’s awesome service model to abstract a good chunk of the API workings away from the controller code:

Speech API Service

voxie.service('speechSVC', [
    '$window', 'speechMockSVC', function($window, speechMockSVC) {
        return function() {
            var speechRec, speechSVC;
            // Prefer the spec name, then Chrome's prefixed name, then the mock.
            speechRec = $window.SpeechRecognition || $window.webkitSpeechRecognition || speechMockSVC;
            speechSVC = new speechRec();
            speechSVC.continuous = true;
            speechSVC.interimResults = true;
            speechSVC.onResultCallback = angular.noop;
            speechSVC.onresult = function(event) {
                if (speechSVC.onResultCallback) {
                    return speechSVC.onResultCallback(event.results);
                }
            };
            speechSVC.onerror = function(event) {
                console.log(event.error + ": " + event.message);
                if (speechSVC.onResultCallback) {
                    return speechSVC.onResultCallback("Unfortunately, speech recognition has failed.");
                }
            };
            return speechSVC;
        };
    }
]);

// $window is injected here as well, since showError relies on it.
voxie.service('speechMockSVC', ['$window', function($window) {
    return function() {
        var NOT_SUPPORTED, showError, speechMock;
        NOT_SUPPORTED = "Your browser does not support speech recognition. If you wish to use this application, please upgrade to a browser with speech recognition support.";
        showError = function() {
            return $window.alert(NOT_SUPPORTED);
        };
        return speechMock = {
            start: function() {
                return showError();
            },
            stop: function() {
                return showError();
            }
        };
    };
}]);

I used the service to provide an extended public API to my controller, where I could specify a callback to run once I had some results from Google. I added support for the non-prefixed API (Chrome exposes webkitSpeechRecognition, but the spec calls it simply SpeechRecognition), and the same fallback chain gave me the ability to handle any browser that didn’t have either of the speech APIs loaded on the window.
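The fallback chain is easy to look at on its own, outside Angular. What follows is a minimal sketch, assuming a hypothetical helper named resolveSpeechRecognition that takes a window-like object so it can be exercised without a browser; FakeRecognition is a stand-in, not a real recognizer.

```javascript
// Hypothetical helper (not part of Project Voxie): pick the first available
// speech recognition constructor from a window-like object.
function resolveSpeechRecognition(win, fallback) {
    // The spec name comes first, then Chrome's prefixed name, then the fallback.
    return win.SpeechRecognition || win.webkitSpeechRecognition || fallback || null;
}

// Stand-in for a browser that only ships the prefixed constructor.
function FakeRecognition() {}
var prefixedOnly = { webkitSpeechRecognition: FakeRecognition };

console.log(resolveSpeechRecognition(prefixedOnly) === FakeRecognition); // true
console.log(resolveSpeechRecognition({}));                               // null
```

Returning null when nothing is available leaves it to the caller to decide what to do, whether that is an alert-style mock like the one in the service or simply hiding the microphone button.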

As you can see, it’s fairly simple. You tell it whether or not it should continue recording for an extended period of time (continuous), whether or not it should send you any fuzzy results that haven’t been confidently matched yet (interimResults), and wire up your success and error events. Once complete, you can use it within your controller:
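With interimResults switched on, each entry in event.results carries an isFinal flag, and its first element is the recognizer's top alternative with a transcript string. Here is a minimal sketch of separating the settled text from the still-fuzzy text; splitTranscripts and fakeResult are hypothetical names, and the sample data is hand-made rather than real recognizer output.

```javascript
// Hypothetical helper: split a results list into final and interim text.
// Each result is array-like; result[0] is the top alternative, and
// result.isFinal says whether the recognizer has settled on it.
function splitTranscripts(results) {
    var finalText = '';
    var interimText = '';
    for (var i = 0; i < results.length; i++) {
        var transcript = results[i][0].transcript;
        if (results[i].isFinal) {
            finalText += transcript;
        } else {
            interimText += transcript;
        }
    }
    return { finalText: finalText, interimText: interimText };
}

// Stand-in data shaped like event.results (not real recognizer output).
function fakeResult(transcript, isFinal) {
    var result = [{ transcript: transcript, confidence: 0.9 }];
    result.isFinal = isFinal;
    return result;
}

var sample = [fakeResult('open the ', true), fakeResult('settings', false)];
var split = splitTranscripts(sample);
console.log(split.finalText);   // "open the "
console.log(split.interimText); // "settings"
```

A page could render the interim text in a lighter color and swap it for the final text as results settle.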

Speech API Controller Integration

$scope.listen = function() {
    var listener;
    listener = speechSVC();
    listener.onResultCallback = discernCommand;
    return listener.start();
};


Here, discernCommand is simply a callback that joins the results array into a single string in order to be displayed on the page. The $scope.listen function is wired to a button as it was incredibly difficult to get it to work automatically on page load alongside Angular’s digest cycle. I’m not saying it can’t be done…but I’m not entirely sure that the two can play nicely together either. So for now, it’s wired to a button. When pressed, your callback is hooked into your service, and it starts recording. Pretty easy.
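The real discernCommand isn't shown in this post, so the following is only a hypothetical reconstruction of a callback in that spirit. It assumes a scope property named transcript for the displayed text, and it wraps the update in $apply because recognition events fire outside Angular's digest cycle. The fake scope at the bottom exists only so the sketch runs without Angular.

```javascript
// Hypothetical reconstruction of a discernCommand-style callback; the
// original implementation is not shown in this post.
function makeDiscernCommand(scope) {
    return function discernCommand(results) {
        var text;
        if (typeof results === 'string') {
            // The service's onerror handler passes a plain message string.
            text = results;
        } else {
            // Join the top alternative of every result into one string.
            text = Array.prototype.map.call(results, function(result) {
                return result[0].transcript;
            }).join('');
        }
        // Recognition events fire outside Angular, so trigger a digest.
        scope.$apply(function() {
            scope.transcript = text; // "transcript" is an assumed property name
        });
    };
}

// Stand-in scope so the sketch runs without Angular.
var fakeScope = { transcript: '', $apply: function(fn) { fn(); } };
var discern = makeDiscernCommand(fakeScope);
discern([[{ transcript: 'hello ' }], [{ transcript: 'world' }]]);
console.log(fakeScope.transcript); // hello world
```

Array.prototype.map.call is used instead of results.map because the list the browser hands back is array-like rather than a true array.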


What’s any write-up of a new technology without a bit of a review at the end? The API is fairly easy to use once you decipher the specs. It integrates fairly well with the well-known JS libraries (with a few idiosyncrasies, like the aforementioned auto-listening and Angular’s digest cycle) and is…well…almost accurate. This is the most heart-breaking portion of it all. When I said “contest”, it sent back “Comcast”. Blegh. Insult to injury with that match-up. But I am hopeful, as Google Now seems to be fairly accurate, and who knows, it may be due to my $10 Logitech headset. So, for now, I would stick to playing around with it until the results become more accurate and it becomes more widely supported amongst browsers.
