These past weeks have seen the Beta release of Cortana. The sidekick heroine of the Halo series represents Microsoft’s foray into the ever more sophisticated field of digital assistants. Cortana, like Apple’s Siri, is intended to support Windows users through a complex mix of AI and voice recognition software.
Virtual sidekicks were long confined to the realms of science fiction. Complex digital (personal) assistants such as Cortana, Master Chief’s companion in Halo, and Samantha in the recent film “Her” have always seemed far too complex to bring to reality. With each passing year and iteration, however, we take one step closer to making that fiction a reality.
What implications does this voice-activated future have for the field of UX – and more specifically its counterpart, User Interface (UI) design?
First things first: if you’re yet to watch the film “Her”, do it. From a UX perspective it is a fascinating watch. Set in 2025, it gives us a glimpse of what the near future may hold. Touchscreens already appear a thing of the past as digital assistants have taken over. On the daily commute everyone is seen and heard confined to their own little bubble, talking to their device – commanding it. Just about everything we use our phones for (email, news, calls) is now controlled by the assistant; audio has replaced touch and visuals. The man we follow, Theodore, only pulls out his mobile device when absolutely necessary – for the camera or for visuals.
This is a world entirely dependent on digital assistants. Devices are built around them, UIs are designed around them. Is this truly a glimpse into the future, or merely an artistic representation?
Is a voice-activated future feasible?
Conversations with Siri give the user control of their device
Her paints an artistic future, portraying an artificially intelligent (AI) digital assistant with thought and reasoning – a human without a body – a far cry from the technologies we have today. Indeed, having our very own Samantha is a little far-fetched even 11 years out; what we see before her arrival, less so.
Apple (Siri), Microsoft (Cortana/Kinect), and Google (Google Now) have made huge strides in the intuitiveness and ‘intelligence’ of this assistive software. Once the barrier of interpreting our voices was overcome, we saw an explosion in its applications. Google especially has shown how it can scan information such as emails and calendars to conveniently remind us of what we should be doing at this very moment. Siri allows users to perform tasks hands-free, and even to have some interesting (if slightly scripted) conversations. Kinect, like Siri, helps create a hands-free living room on the Xbox One with the ability to issue voice commands (with some great results). Cortana, soon to be integrated with Kinect, looks all the more interesting.
Google Now predicts the user’s behaviour through analysis of available user data
So feasible, yes. Huge gains have been made in the past few years, and they will only continue into the future. Though full artificial intelligence could be a way off for the time being, this two-way interaction with our devices creates emotional connections which can only improve the User Experience – producing memorable comedic moments, and creating a sense of trust through hearing the voice of another. Still, Operating System love will have to wait for now.
Roadblocks of the digitally assisted future
I feel the largest roadblock for the technology lies not with the technology itself, but with us. In the film Her we see commuters idly talking to their smartphones on the walk to work or on the subway. It is this social barrier, I feel, that is the hardest to overcome. I, like many others, don’t feel too comfortable walking down the street commanding my smartphone. Even less so sitting at home talking to my Xbox. Not to mention the privacy implications such ubiquitous voice activation could have. What happens to the incognito search? I’d rather not have the whole bus know what I’m looking at. Of course, the classic pub quiz may benefit here!
Crowded subway train with everyone talking to their device. Potential privacy issues?
Trust is a further issue. Though hearing a fellow human voice should increase trust, we still have the knowledge that we are talking to a corporate-controlled robot of sorts. Will users be willing to talk about private issues, not knowing where that information will end up? This could create key limitations to the UX and overall usefulness of such devices.
The barriers to a voice-activated world, then, may not be technological, but rather social. Society, however, has an uncanny ability to adapt.
The Effect on Devices
The growth of digital assistants and their underlying technologies will undoubtedly open up a whole swath of new options for mobile and wearable devices. The usefulness of smartwatches and devices like Google Glass has long been in question: how exactly do we control a device with little or no room for user input? Voice commands remove this constraint. Interfaces can do away with space-hogging input elements and move towards a more visual, content-rich environment.
This is helpful for Google Glass in particular. Inconvenient for touch controls, it is perfectly suited to voice commands – at the cost of looking slightly silly. Aside from mobile devices, static applications – around the home, for instance – could benefit hugely. We currently have voice activation for the Xbox One and Samsung Smart TVs, with Windows 8 soon to follow.
Recently we saw Google’s acquisition of Nest, a smart thermostat system which learns our daily habits and adjusts accordingly. This brings into play the possibility of Smart Homes: voice-controlled lights, heating, music, and whatever else you can think of – as well as audio feedback from those devices. With sound’s flexibility and compactness over the constraints of touch and gesture control, a plethora of opportunities will open up.
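To make the idea concrete, here is a minimal, purely hypothetical sketch of how a smart-home system might map recognised voice phrases to device actions. Every device name and command below is invented for illustration; a real system would sit behind a proper speech-recognition layer:

```python
# Hypothetical sketch: mapping recognised voice phrases to smart-home actions.
# All device names and commands here are invented for illustration only.

def handle_command(phrase):
    """Return a (device, action) pair for a recognised phrase, or None."""
    commands = {
        "lights on": ("lights", "on"),
        "lights off": ("lights", "off"),
        "heating up": ("thermostat", "increase"),
        "play music": ("speakers", "play"),
    }
    # Normalise the input so "Lights On " matches "lights on".
    return commands.get(phrase.lower().strip())

print(handle_command("Lights on"))   # ('lights', 'on')
print(handle_command("open pod bay doors"))  # None (unrecognised)
```

Even this toy version shows why voice suits the home: the interface is just a vocabulary, with no screen real estate consumed at all.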
The Effect on UI
Simple, tailor-made user interfaces are a possibility
As the primary input method moves from touches and gestures towards voice, less screen real estate needs to be consumed by controls and the like. This opens up more room for rich visual content. Yet even this visual content may see a decline: more advanced digital assistants can speak to us, reading out our emails, calendar reminders, the weather, and the news – and we can respond vocally to what we hear to delve deeper.
For UI this should mean a continuation of the minimalism trend. With fewer essential buttons and controls there is greater flexibility. Her’s interface designer, Geoff McFetridge, comments on this flexibility, noting that it allows interfaces to be tailored more to individual users thanks to the fewer design constraints imposed by essential inclusions.
“I do feel like the future Apple is going to be really different than Apple is now,” he says. For one, interface design won’t be standard, and it all won’t emanate from a single corporation, McFetridge thinks. “The horsepower of a single person is going to be so magnified in the future so this interface could be made by a 17-year-old.”
Each interface can be designed to reflect its user’s personality, rather than to reflect the device it is being used on.
All of this is of course fanciful thinking. Yesterday Siri presented me with a recipe for home baked cookies when I asked for the latest sports scores. Either Siri is more advanced than we think, and knows something I don’t, or we have a while to wait yet. I can’t see myself falling in love with Cortana any time soon, but the future looks exciting regardless.