Designing credibility into AI systems.

UCL Interaction Centre

My Role

Lead User Researcher

Designer

Front-end developer

Timeline

4 months

Context

UCL MSc Dissertation Project

The problem space.

AI conversational agents are increasingly used in financial settings, e.g. robo-advisors and customer-service chatbots. Companies deploying these agents expect their customers to use them, and for that to happen, users must first trust the chatbot. HCI research finds that trust, perceived credibility, and information-seeking go hand in hand: users who distrust a chatbot tend to find it less credible, and will seek further information to support their beliefs.

However, few studies have tested how users trust and rely on chatbot recommendations, especially in environments such as financial trading. Testing the impact of a credibility feature is valuable for setting the design direction of chatbot features, because such features can influence users' behaviour around the chatbot.

I formulated a research plan to test a credibility-centric feature in a chatbot recommender system: an A/B study conducted on a functional investment platform that allowed users to interact with a robo-advisor and make trades.

I aimed to evaluate:

How will users behave when presented with a chatbot interface embedded with credibility cues?

How can credibility cues be effectively designed into chatbot interfaces?

The findings.

Chatbots containing credibility cues were trusted less.

Users felt the chatbot violated their expectations. Many expected it to be more trustworthy because of the credibility cue, so when its recommendations did not pan out, users' trust dropped more sharply than it did for chatbots without credibility cues.

The solutions.

Remind users to adjust their expectations

Chatbot recommender systems should remind users that the chatbot's advice may not be 100% accurate. This allows users to adjust their expectations of the system.

Reduce menu diving for important information

Information we want users to see should be presented upfront. In a cognitively demanding interface, users may not have the capacity to hunt for hidden information.

The research and design process.

1 User Research

To develop an effective credibility cue, I first researched how the credibility heuristic could be evoked in users who viewed it.

Verification is commonly associated with credibility, as seen in the verified badges on many social media platforms. Using a confident tone to communicate product credibility and endorsing a product's message have also been found to increase users' perceived credibility of the product.

2 Iterative Design

The trading interface used to test my proposed credibility feature

The credibility cue used in the A/B test

Initial iterations

Prototype coded for pilot testing

The feature requirements set out by the research defined how the credibility cue was to be developed.

As this feature was to be tested on a pre-existing trading interface, I modified the interface's front-end code (Python, JS, HTML) to incorporate the feature. I also developed a back-end database to track user data: investment behaviours made on the interface, and interactions with the credibility cue.
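The event tracking described above can be sketched as a minimal logging layer. This is an illustrative reconstruction, not the dissertation's actual code: the table and event names (`events`, `cue_click`, `trade`) are assumptions.

```python
import sqlite3

def init_db(conn):
    # Hypothetical schema: one row per user action on the trading
    # interface or on the credibility cue (names are illustrative).
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            condition TEXT NOT NULL,   -- A/B arm: 'cue' or 'control'
            event_type TEXT NOT NULL,  -- e.g. 'trade', 'cue_click'
            detail TEXT,               -- e.g. ticker or message id
            ts DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)

def log_event(conn, user_id, condition, event_type, detail=None):
    # Record a single interaction for later quantitative analysis.
    conn.execute(
        "INSERT INTO events (user_id, condition, event_type, detail) "
        "VALUES (?, ?, ?, ?)",
        (user_id, condition, event_type, detail),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
init_db(conn)
log_event(conn, "p01", "cue", "cue_click", "endorsement_message")
log_event(conn, "p01", "cue", "trade", "BUY:ACME")
```

Logging both trades and cue clicks in one table makes it easy to later join a participant's information-seeking behaviour with their investment decisions.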

To keep the design flow efficient, initial iterations were developed as mock-ups before a final iteration was coded and pilot-tested for feature functionality and user feedback.

I conducted a pilot test with 5 users to check that data was accurately captured in the back-end and to assess design viability. Feedback suggested the credibility cue was lengthy and could be shortened, and that the message shown when the cue was clicked should be more specific, with information pointing to financial sources.

This feedback was integrated into the final design shown in the A/B test: I added a visual logo to the credibility cue and made the endorsement messaging content more specific.

The interactive messaging content used in the A/B test

3 User Testing & Insights

The user testing followed an A/B format. Participants were recruited through Prolific, a user-testing platform, and stratified random assignment ensured a fair distribution of participants across the two conditions. Once all users completed the study, I compiled user data from the database and analysed it quantitatively in Python and Excel, and qualitatively in FigJam.
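Stratified random assignment can be sketched as follows: within each stratum, participants are shuffled and then alternated between the two arms, so each arm receives a balanced mix. The stratum variable (`experience`) is an assumption for illustration, not necessarily the one used in the study.

```python
import random

def stratified_assign(participants, stratum_key, seed=0):
    # Group participants by stratum, shuffle within each stratum,
    # then alternate assignment so arms A and B stay balanced.
    rng = random.Random(seed)
    strata = {}
    for p in participants:
        strata.setdefault(p[stratum_key], []).append(p)
    assignment = {}
    for group in strata.values():
        rng.shuffle(group)
        for i, p in enumerate(group):
            assignment[p["id"]] = "A" if i % 2 == 0 else "B"
    return assignment

participants = [
    {"id": "p1", "experience": "novice"},
    {"id": "p2", "experience": "novice"},
    {"id": "p3", "experience": "experienced"},
    {"id": "p4", "experience": "experienced"},
]
groups = stratified_assign(participants, "experience")
```

Balancing within strata rather than over the whole pool prevents one arm from accumulating, say, all the experienced traders by chance.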

Thematic Analysis Insights

A thematic analysis of how the chatbot could earn more user trust revealed 4 key elements that should be incorporated into its design.

  1. Accuracy – users would trust chatbots that provided correct information.

  2. Interaction – users would trust chatbots that were more interactive.

  3. Understandability – users would trust comprehensible chatbots.

  4. Explainability – users would trust chatbots that explained the information they provided.

Statistical Analysis Insights

Correlational and t-test analyses revealed differences in user behaviour across the A/B interfaces. While the differences between the A/B groups were not statistically significant, the trends point towards:

  • When users have lower trust in AI/chatbots, they may seek more information from the credibility cue on the chatbot.

  • Users trusted the credibility-cue chatbot less – its recommendations were less likely to be followed, and overall trust in it was lower.
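The group comparison above can be illustrated with an independent-samples (Welch's) t-test. The trust ratings below are made-up placeholder values, not the study's actual data; they simply show the shape of the comparison between the credibility-cue arm and the control arm.

```python
from statistics import mean, variance
from math import sqrt

def welch_t(a, b):
    # Welch's t statistic for two independent samples with
    # (possibly) unequal variances.
    va, vb = variance(a), variance(b)
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(va / na + vb / nb)

# Hypothetical 5-point trust ratings (placeholder data only).
cue_trust = [3.1, 2.8, 3.4, 2.5, 3.0]      # credibility-cue chatbot
control_trust = [3.6, 3.2, 3.9, 3.1, 3.5]  # no-cue chatbot

t = welch_t(cue_trust, control_trust)
# A negative t here means the cue group reported lower mean trust.
```

In practice one would pair the statistic with its p-value (e.g. via `scipy.stats.ttest_ind` with `equal_var=False`); as the findings note, the observed differences did not reach statistical significance.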