Intelligent servants, eloquent parrots and conversational user interface

My colleague, a data scientist, completely unimpressed by the hype around ChatGPT, summed it up as, “In the end, it’s just an eloquent parrot”. So what’s so special about it?

The setting 

Let me start with the big picture. Human-computer interaction (HCI) relies on three primary metaphors or underlying models: the tool (device), the machine (robot), and the social actor (companion). Each of these models shapes user expectations and, in turn, the core affordance of the experience.

Adobe Photoshop or an iPad are examples of a tool model designed to be effective, easy to learn, transparent, and flexible. A browser is an example of a machine model, because it takes in a web address (input) and outputs a website. Siri or Alexa are examples of a companion model.

Tools are expected to be transparent in their operation and built to be manipulated, machines are valued for visibility of internal strategies and control over automated processes, and social agents are expected to anticipate user needs and be polite. (Politeness, by the way, is used to assure users that they are in charge of the intentionally unbalanced relationship, not to respect human etiquette.)

The road to companion development is paved with well-known experiments, such as Microsoft’s Clippy, which is the most annoying example, or Siri, which some say is the most widely used cooking timer.

The dreams of a virtual servant

The dream of a computer that accompanies the user and effortlessly anticipates and fulfills their wants and needs predates Google Assistant or Alexa.

Here’s a quote from Nicholas Negroponte in 1995:

“The best metaphor I can conceive of for a human-computer interface is that of a well-trained English butler. The ‘agent’ answers the phone, recognizes the callers, disturbs you when appropriate, and may even tell a white lie on your behalf. The same agent is well trained in timing, versed in finding the opportune moments, and respectful of idiosyncrasies. People who know the butler enjoy considerable advantage over a total stranger. That is just fine … Such ‘interface agents’ are buildable … It has become obvious that people want to delegate more functions and prefer to directly manipulate computers less. The idea is to build computer surrogates that possess a body of knowledge both about something (a process, a field of interest, a way of doing) and about you in relation to that something (your taste, your inclinations, your acquaintances).”

– Nicholas Negroponte, Being Digital

Here’s an even older vision of the future: Apple’s Knowledge Navigator from 1987. (Unlike Siri, it still had a face.)

What would Freud say about these dreams?

There is no effective product design (or psychotherapy) without understanding the user. So why are we so obsessed with it?

“The fantasy persists because it reflects deeper psychological needs, rooted in human infancy. The master whose unspoken needs are met so completely and seamlessly, the servant who meets those needs without being asked or expecting anything in return can be interpreted in psychoanalytic terms as a variation on the infantile fantasy of the perfect parent. And unlike a human mother, a virtual butler makes no emotional demands on us in return for his absolute loyalty. (…) The servant fantasy reflects a desire to have our needs met absolutely without the trouble of actively engaging with complex information, ambiguous real-world choices or emotionally demanding human beings.”

– Janet H. Murray, Inventing the Medium

What makes it seem human-like?

Let’s go back even further: in the 1960s, Joseph Weizenbaum developed ELIZA, one of the first natural language processing programs. It mimicked a psychotherapist of the Carl Rogers type (think lots of open-ended questions and mirroring back patient responses). It provided scripted responses at a level good enough to give its users the illusion of understanding, using a system of keywords, rules, and templates (but no real knowledge or large data sets).
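To make the "keywords, rules, and templates" idea concrete, here is a minimal sketch in the spirit of Weizenbaum's program. It is not his original script: the rules, the pronoun table, and the names `RULES`, `REFLECTIONS`, `reflect`, and `respond` are all invented for illustration. It only shows how little machinery is needed to mirror a sentence back as an open-ended question.

```python
import re

# Hypothetical pronoun swaps used to mirror the user's words back at them.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

# Hypothetical rules: (keyword pattern, response template).
# Rules are tried in order; the first match wins.
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.I), "Tell me more about your {0}."),
]

# Used when no keyword matches: a content-free prompt to keep talking.
FALLBACK = "Please, go on."


def reflect(fragment):
    """Swap first- and second-person words so the fragment can be echoed."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(utterance):
    """Return a scripted reply built from the first matching rule."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return FALLBACK


print(respond("I need a break from my job."))
print(respond("The weather is nice"))
```

Nothing here models the user or the topic. The reply is the user's own words with the pronouns flipped and a question wrapped around them, which is exactly why the perceived understanding is so striking.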

In the words of its inventor, “I did not realise … that an extremely brief exposure to a relatively simple computer program could induce strong delusional thinking in quite normal people.” [source]

Think about it: even when users knew there was no real understanding behind it, a simple rule-based prompter was enough to convey far more understanding, emotional involvement, or interest than was (and is) technically possible. The perceived responsiveness of the machine, rather than the size of the data set or the mansplanatory confidence, was all it took for users to anthropomorphise it to the point of treating the exchange as human-to-human interaction.

Numerous subsequent studies show that this type of reaction is automatic, inevitable, and more common than people realise. When you combine this with the cognitive bias of associative thinking – if it speaks language, it must have knowledge of the domain or user – it is hard not to believe that the Generative Pretrained Transformer genuinely understands and knows.

This brings us to 

Conversation as interface

Conversational interfaces use language, whether text or speech (at least for now). Language is a purely human and non-instinctive method of communicating ideas, emotions, and desires through voluntarily produced symbols (Credits: Edward Sapir). In a sense, language serves as an interface to a rich, dynamic, and multidimensional human experience.

A conversational interface, human-like as it is, has some limitations. Dialogue is not always the most effective way to convey information because it is linear and sequential. It also requires a certain level of redundancy to ensure reliable and understandable communication, which may not be practical in all scenarios. For these reasons, discovery (‘How do I learn what the app can do?’) and feedback (‘How do I know the app is doing the right thing?’) are usually only achievable through trial and error.

Manipulation using conversational interfaces is far more difficult than it appears, as you may have experienced yourself, or noticed from the growing number of prompt engineering sites and elaborate techniques for describing the desired end result.

Last but not least, designers are often reminded to incorporate non-volatile memory to supplement human short-term memory, as the brain is not optimised for abstract thinking or storing data and has certain cognitive limitations. For this reason, we take notes during a conversation, scroll through ChatGPT, or ask Siri to repeat instructions.

Consequently, the complexity of conversational interfaces is limited by the degree of built-in automation. Even then, complex tasks may still require complex user interfaces that language cannot provide efficiently.

So where are we now?

We are somewhere between the dream of a sentient assistant and a state-of-the-art natural language processing interface. How big is the gap? There is no simple answer to that.

Anticipating user needs is a hybrid of prediction and mind reading. Designing user experiences that rely on accurate prediction of user behaviour is notoriously difficult; anyone who has tried can attest to this. Data on WHAT the user is doing is a weak basis for understanding the WHY (motivations and context). Many of the use cases can be solved statistically, since humans act statistically rather than rationally (which is why machine learning often works), but user expectations are high.

Our inherent tendency to perceive conversational interfaces as more sentient than they are can be either a liability or a stepping stone. For now, “Parrots make great pets. They have more personality than goldfish.” (Chevy Chase)