Conversations with a digital assistant — a voice UI case study

Any design, when you look at it closely, is an exchange of intent. In a way, every good design is a conversational interface. The main difference between a conversational UI (such as a chatbot) and a GUI is this: with the former, the user takes part in a literal conversation using words and, in a voice UI, intonation, whereas the latter lets the user communicate intent by touching or clicking around.

In this case study, I am going to touch on one important question: ‘What constitutes a conversation?’

A lot of writing on the web talks about a variety of techniques to keep the conversation natural. In order to make it natural, we need to be able to analyze conversations between humans.

Throughout this case study, I will be focusing on the key elements that make a conversation natural through a variety of scenarios and some example statements. As an example, we will be looking at a coffee ordering flow with Siri to demonstrate how the user interacts with different devices in different scenarios to order coffee.

It is important to note that these are examples of design proposals and not how Siri works today.

A NOTE: I don't work for Apple and these are design concepts that happen to use Siri as an example. The intention here is not to redesign Siri, disregard Apple's design work or anything of that sort. This case study is a combination of conversation analysis and a design concept. 

A few scenarios where Siri could help with coffee ordering

There are a few scenarios where I think Siri could help with coffee ordering. I came up with the scenarios based on the following criteria –

Time of the day — late night, early in the morning etc.

Mood of the user — are they rushing, tired, sleepy etc.

Location — driving alone, at home getting ready for work, in a dorm room working on an assignment last minute, etc.

I picked the scenarios where coffee ordering (or ordering anything) would be easy, fast and delightful.

Scenario 1 : Tired traveller who needs coffee

A driver who has been on the road alone for a long time is tired and wants to take a coffee break. He also wants to use the restroom, so he thinks about stopping at the next Starbucks. Since this is one of the few Starbucks within many miles, he spends quite a bit of time in line to order his coffee before he can sit down with it.

Current experience

Siri could help the driver place an order while he is driving to the coffee place.

How Siri could help

Scenario 2 : Student in dorm room trying to pull an all-nighter.

A student working through an assignment the night before the deadline is bored and tells Siri how he feels (sleepy). As Siri works today, this scenario leads to Siri expressing care for the user, either by checking that he is not driving or by reassuring him that there is nothing wrong with feeling sleepy. Can Siri do more?

Scenario 3 : Groggy office goer who takes the subway to work.

A groggy office goer on his daily commute thinks about grabbing a cup of coffee on the way from the train station to his office. The coffee place has a long line and, as a result, he goes straight to the office without getting coffee.

Current experience

Siri could help him order a cup of coffee through the watch and pick it up as soon as he gets off the train.

How Siri could help

Scenario 4: The late night blogger who loves his smart coffee machine.

A late night blogger is excited about the event he is live blogging, but ends up drinking too much coffee from his “smart coffee maker”. Four hours later, he finds himself wide awake, struggling to fall asleep.

Current experience

Siri could help the blogger by suggesting a decaf so that he doesn’t consume too much caffeine.

How Siri could help

Conversational analysis — Coffee ordering

With the scenarios in place, the next step is to form statements that the assistant can take in as inputs. I have included statements that range from a broad, polite request to an indirect intent.

How were the statements chosen?

  1. For ideas on what people say when they step into a coffee shop, I recorded real conversations between people at coffee shops and observed the barista-customer relationship.
  2. I mapped what the user could say to the coffee ordering flow, keeping the advantages of conversational design in mind.


I went to two coffee shops and observed the interaction between barista and customer. Both are local coffee shops (not chains) with a lot of repeat customers. I chose them over a prominent chain because the barista-customer relationship is deeper there. Much like an assistant such as Siri, the barista knows the customers and their preferences well.

From my observation of barista-customer conversations, each statement can tell a lot about the conversation as a whole –

* The request and User’s intent
* User’s mood and/or situation
* A hint towards the direction the conversation may take

Analysis of the statements

Keeping the flow above in mind, let us analyse some of the statements. Why is this analysis important? Because the type of statement is a key factor in determining how the assistant can respond naturally.

“Can I have some coffee?”

In my observation, people used a variety of phrasings to open a request. Thus, it is important to support different kinds of initiation statements.

In the above statement, the request (‘Can I have’ + ‘coffee’) clearly tells us that the user needs coffee. The word ‘some’ guides the conversation towards an exchange of menu and choice (or towards presenting a menu in a multimodal interface).

In essence, this statement is a “Request” with some clear directions.

“I just need a latte.”

In this statement, the user’s state of mind is clearly reflected. The person really needs a coffee (it is not a nice-to-have) and wants it quickly. The word ‘just’ puts a clear emphasis on what they need. ‘a latte’ makes the declaration pretty concrete, as the user has specified the quantity (one) and the type of drink (a type of coffee) in just two short words.

In essence, this statement is a very clear directive.

“I’d like to have a hot latte; small, with non-fat milk”

In this statement, the user is very polite with the request, but the latter part of the request is extremely clear in its directions, leaving very little room for further customization. The ‘;’ signals what may be coming next, completing the adjacency pair of request and customization.

Depending on the tone, Siri should be able to tell if the user is going to customize the request.

In essence, this statement is a very fine-grained directive.
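One way to picture the analysis of these three statements is as a small rule-based classifier. The sketch below is only an illustration in Python and not how Siri works: the function name `analyze`, the keyword lists and the labels are all assumptions, made up to cover just the three example statements.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical vocabularies, chosen only to cover the three example statements.
DRINKS = ("latte", "coffee")
SLOTS = {
    "size": ("small", "medium", "large"),
    "milk": ("non-fat", "whole", "soy"),
    "temperature": ("hot", "iced"),
}


@dataclass
class Analysis:
    kind: str                       # "request", "directive", "detailed directive" or "unknown"
    drink: Optional[str]            # the drink that was named, if any
    urgent: bool                    # True when the wording stresses need ("just need")
    customizations: Dict[str, str]  # e.g. {"size": "small"}
    next_step: str                  # where the conversation is likely to go next


def analyze(statement: str) -> Analysis:
    """Classify an opening statement with simple keyword rules (an assumption)."""
    text = statement.lower().replace("’", "'")
    words = re.findall(r"[a-z'-]+", text)

    drink = next((d for d in DRINKS if d in words), None)
    customizations = {
        slot: option
        for slot, options in SLOTS.items()
        for option in options
        if option in words
    }
    urgent = "just" in words and "need" in words

    if drink is None:
        return Analysis("unknown", None, urgent, customizations, "ask what they want")
    if "some" in words or drink == "coffee":
        # A broad request: the drink is still open, so a menu exchange follows.
        return Analysis("request", drink, urgent, customizations, "offer menu")
    if customizations:
        # The drink and its details are already given.
        return Analysis("detailed directive", drink, urgent, customizations, "confirm order")
    # A concrete drink with no further details.
    return Analysis("directive", drink, urgent, customizations, "confirm order")
```

With these rules, “Can I have some coffee?” comes out as a request that leads to a menu, “I just need a latte.” as an urgent directive, and the third statement as a detailed directive with size, milk and temperature already filled in. A real assistant would also need tone and context, which keyword rules cannot capture.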

Full Conversation

With some of these statements, and the aforementioned coffee ordering flow, let’s stitch everything together into one full conversation.

What goes into a conversation?

The basics of conversation analysis give us a few building blocks:

Adjacency Pairs — greeting is met with a greeting, request is answered with an accept/reject, complaint is answered with apology/excuse/remedy and so on. Adjacency pairs are the most basic building block of a conversation.

Selection of next speaker — The next speaker is chosen through implicit or explicit addressing. Examples are “What would you like to do…”, “you can…” and “I need you to…”

Commit Tokens — People commit to requests by saying certain things like “Sure”, “Consider it done” and so on.

Advance Remedy — This goes with commit tokens. If something cannot be done right away, Siri builds the right expectation by providing advance remedy like “this will be done when..” or “before we get to that, I need..”
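These building blocks can be sketched as a small lookup table plus one response function. The Python below is a minimal illustration under stated assumptions, not Siri’s behaviour: the names `ADJACENCY_PAIRS`, `completes_pair` and `respond_to_request`, the required order slots and the reply wording are all hypothetical.

```python
# First pair part -> acceptable second pair parts, taken from the examples above.
ADJACENCY_PAIRS = {
    "greeting": ("greeting",),
    "request": ("accept", "reject"),
    "complaint": ("apology", "excuse", "remedy"),
}

# Phrases that signal the assistant has committed to the request.
COMMIT_TOKENS = ("Sure", "Consider it done")


def completes_pair(first: str, second: str) -> bool:
    """Return True if `second` is a valid answer to `first`."""
    return second in ADJACENCY_PAIRS.get(first, ())


def respond_to_request(order: dict, required=("drink", "size")) -> str:
    """Answer a coffee request with a commit token or an advance remedy.

    `required` is an assumed list of details needed before ordering.
    """
    missing = [slot for slot in required if slot not in order]
    if missing:
        # Advance remedy: set the expectation first, then hand the turn
        # back to the user with an explicit question (next-speaker selection).
        return (
            f"Before we get to that, I need to know the {missing[0]}. "
            "What would you like?"
        )
    # Nothing is missing, so commit to the request.
    return f"{COMMIT_TOKENS[0]}. One {order['size']} {order['drink']} coming up."
```

For an order that only names the drink, the function answers with the advance remedy and asks for the size. Once both details are present, it answers with a commit token.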

Coffee ordering on different devices


Author: Shankar
