According to the major software companies, voice-based UIs and conversational AI represent the next computing interface. After decades of lower-level approaches, including character-based, graphical, Web, and mobile interfaces, it is time for voice.
This is a huge advance in the way we interact with computers. Unlike menus, touchscreens, or mouse clicks, voice conversation is one of the most natural ways to communicate: it requires no learning curve, provided the listener is clever enough. It has some drawbacks, too.
Voice has been hailed as a ready-to-use interface more than once in the past, but now, thanks to the Voice User Interface (VUI), the promise may finally come true. Current technology offers both an AI back end able to cope with the meaning of each language and a hardware-driven ecosystem hungry for applications. In the voice paradigm, the situation is central to both sides of the conversation.
Amazon’s evangelist at Codemotion Milan 2019 was Andrea Muttoni, the Italian Senior Solutions Architect for Alexa, who is also a VUI technology evangelist. Andrea is also a musician: he enjoys composing music, another way to approach unfamiliar languages.
VUI: Situational Design is the master plan
A comparison between audio and video experiences can help us navigate the VUI world. We often assume that every programming pattern should please a human being, but this is not the case. Compare video and audio content: video tolerates uniformity, while audio needs variety. With Alexa, as with any other voice-processing device, you have to write for the ears, not for the eyes.
Voice makes it easy to build a mock conversation, a true script, that exploits the potential of the situation. Designers need to abstract away from low-level programming and focus on the situation: that’s why Muttoni prefers not to call this voice programming, but “situational design“, SitDes for short. That’s also why Andrea’s talk was entitled “Situational Design – a New Way to Design for Voice“.
Compared with old IVR designs, there is no nesting and no flowcharting: what a change! “SitDes is what we see every day”, says Andrea; “it’s not an obligation, just a proposal coming directly from the experience of the Alexa team”.
This change can be good or bad. SitDes demands crystal clarity. We humans sometimes ask imprecise questions, sometimes chain words and phrases that say the same thing, and sometimes tell jokes. A well-designed voice experience becomes a conversation only when it takes all of these elements into account.
Natural-born voice variations
The building block of this new approach is the situation card, which brings together four basic elements: utterance, situation, response, and prompt.
The utterance is what the user says; the situation is the context; the response is what Alexa says immediately; the prompt is what Alexa asks next.
One card describes one situation, while more cards can be combined to build a full storyboard.
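As an illustration only (the class and field names below are my own shorthand, not Amazon’s notation or any Alexa API), a situation card and a storyboard can be sketched as simple data structures:

```python
from dataclasses import dataclass

@dataclass
class SituationCard:
    """One card describes one situation; field names mirror the article's four elements."""
    utterance: str   # what the user says
    situation: str   # the context the conversation is in
    response: str    # what Alexa says immediately
    prompt: str      # what Alexa asks next

# A storyboard is just an ordered collection of cards.
card = SituationCard(
    utterance="Plan my trip to Milan",
    situation="new user, no saved trips",
    response="Sure, I can help you plan a trip to Milan.",
    prompt="When would you like to leave?",
)
storyboard = [card]
```

Writing cards first, before any code, keeps the focus on the conversation rather than on the implementation.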
More cards for the same situation can be created inside a single storyboard. These are called variations, and they prove very useful for achieving a natural conversation.
Variations are very interesting, as they model a truly conversational approach. The voice assistant can switch to a “courtesy mode” if the software detects that the person’s voice is more stressed than usual. Children can interact with a “child mode”; voices could even be modelled on the voice of a VIP or a sports champion. Multilingual systems are also being pursued.
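One hypothetical way to model this in code: keep several response variations for the same situation, keyed by a detected mode, and pick one at answer time. The mode labels here are illustrative assumptions, not an Amazon feature or API:

```python
import random

# Response variations for the same situation, keyed by detected mode.
# "default", "courtesy", and "child" are illustrative labels, not Alexa APIs.
variations = {
    "default":  ["Your order is on its way.",
                 "Good news: the order has shipped."],
    "courtesy": ["No need to worry, your order is already on its way."],
    "child":    ["Great news! Your package is coming soon!"],
}

def pick_response(mode: str) -> str:
    """Pick a variation for the detected mode, falling back to the defaults."""
    pool = variations.get(mode, variations["default"])
    return random.choice(pool)
```

Randomizing among equivalent phrasings is one simple way to give audio the variety that, as noted above, it needs.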
“We really believe that voice programming will be one of the next big things in VUI development”, states Muttoni. This wave has to be supported, so “Amazon is not charging anything for software development, to help build a strong community and many use cases”; for any success story, then, “many resources will be needed from the pool that Amazon sells regularly”.
The whole process is streamlined and easy to implement, also thanks to an online guide provided by Amazon. The four phases of Alexa skill development, namely design, build, test, and certification, are all free. Parts of a skill can be implemented as Lambda functions, thus exploiting the serverless model.
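As a minimal sketch of the serverless side, a bare-bones skill back end can be a single AWS Lambda handler that returns the JSON envelope Alexa expects. The greeting text is invented, and real skills would normally use the ASK SDK rather than building raw JSON, but the response/reprompt pair mirrors the card’s response and prompt:

```python
def lambda_handler(event, context):
    """Minimal AWS Lambda entry point for an Alexa skill.

    Returns Alexa's response envelope: an immediate answer (outputSpeech)
    plus a reprompt, echoing the situation card's response/prompt pair.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Welcome! I can help you plan a trip.",
            },
            "reprompt": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "Where would you like to go?",
                }
            },
            # Keep the session open so the user can answer the prompt.
            "shouldEndSession": False,
        },
    }
```

Because Lambda bills only per invocation, a skill like this costs nothing while idle, which fits the free development model described above.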
The Alexa trend also shows a significant increase in the variety of hardware devices. In this case, the verification process requires a small fee to pay the service company that performs the certification.
With most new devices embedded in mobile IoT hardware, a completely new family of services based on navigation APIs will soon hit the market. HERE Technologies, heir to Navteq’s and Nokia’s map development, will hold one of the keys to this new market. Car assistants are the second wave of voice-enabled mobile devices, after smartphones.
At Codemotion Milan 2019, Michael Palermo’s talk, “Integrating Location in Conversational UX“, helped developers understand and integrate map data and related location services. Before joining HERE, Michael evangelized the “smart home” at Amazon on the Alexa team. VUI and voice-enabled solutions need a different approach to localization, too.