Designing credibility into AI systems.

UCL Interaction Centre

Feature Testing

Feature Deployment

A/B Testing

Design Recommendations

My Role

Lead User Researcher

Designer

Front-end developer

Timeline

4 months

Context

UCL Interaction Centre Project

The problem space.

AI conversational agents are increasingly used in financial environments, e.g. robo-advisors and service chatbots. Companies deploying these agents expect their customers to use them, and for that to happen, users must first trust the chatbot. HCI research finds that trust, perceived credibility, and information-seeking go hand in hand: users who don't trust chatbots are likelier to find them less credible, and will seek further information to support their beliefs.

With the rise of generative AI in FinTech products, there is a risk of errors that FinTech organizations employing these models need to address: hallucinated or false responses given to their clients or users. This led me to ask whether FinTech interfaces could be designed to reassure the people using them, by establishing the surface credibility of these agents.

Testing the impact of a credibility feature is valuable for setting the design direction of chatbot/genAI interface features, because such features can influence users' behaviour around them.

From here, I formulated a research plan to test a credibility-centric feature in a chatbot recommender system. This involved an A/B study conducted on a functional investment platform that allowed users to interact with a robo-advisor and place trades.

I aimed to evaluate:

How will users behave when presented with a chatbot interface embedded with credibility cues?

How can credibility cues be effectively designed into chatbot interfaces?

The findings.

Chatbots containing credibility cues were trusted less.

Users felt the chatbot violated their expectations. Many expected the chatbot to be more trustworthy because of the credibility cue, so when the chatbot's recommendations did not pan out, trust decreased more sharply for these chatbots than for chatbots that had no credibility cues.

The solutions.

Remind users to adjust their expectations

Chatbot recommendation systems should remind users that advice produced by the chatbot may not be 100% accurate. This allows users to adjust their expectations of the system.

Reduce menu diving for important information

Information that we want users to view should be presented upfront. In a cognitively demanding interface, users might not have the capacity to search for hidden information.

The research and design process.

1 User Research

To develop an effective credibility cue, I first researched how the credibility heuristic could be evoked in users who viewed it.

Verification is commonly associated with credibility, as seen on many social media platforms. Using a confident tone to communicate product credibility and endorsing a product's message have also been found to increase users' perceived credibility of the product.

2 Iterative Design

The trading interface used to test my proposed credibility feature

Initial iterations

Prototype coded for pilot testing

The feature requirements set out by the research defined how the credibility cue was to be developed.

As this feature was to be tested on a pre-existing trading interface, I modified the front-end code structure (Python, JS, HTML) to incorporate the feature into the interface. I also developed a back-end database to track user data: investment behaviours made on the interface, and interactions with the credibility cue.
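To make the tracking setup concrete, here is a minimal sketch of the kind of back-end event logging this involved, assuming SQLite; the table and field names (events, participant_id, cue_click) are illustrative, not the platform's actual schema.

```python
# Minimal sketch of event logging for the study (illustrative names;
# the actual platform's schema and framework may differ).
import sqlite3
from datetime import datetime, timezone

DB_PATH = "study_events.db"  # hypothetical database file

def init_db() -> None:
    """Create an event table for trades and credibility-cue interactions."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS events (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                participant_id TEXT NOT NULL,
                condition TEXT NOT NULL,   -- 'cue' or 'control'
                event_type TEXT NOT NULL,  -- e.g. 'trade', 'cue_click'
                payload TEXT,              -- JSON details (amount, ticker, ...)
                timestamp TEXT NOT NULL
            )
            """
        )

def log_event(participant_id: str, condition: str,
              event_type: str, payload: str = "") -> None:
    """Record a single user action with a UTC timestamp."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "INSERT INTO events (participant_id, condition, event_type, "
            "payload, timestamp) VALUES (?, ?, ?, ?, ?)",
            (participant_id, condition, event_type, payload,
             datetime.now(timezone.utc).isoformat()),
        )
```

Logging both trades and cue clicks into one table made it straightforward to compare investment behaviour against cue engagement per participant later.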

To ensure an efficient design flow, initial iterations were first developed as mock-ups, before a final iteration was coded and pilot tested for feature functionality and user feedback.

I conducted a pilot test with 5 users to verify that data was accurately captured in the back end and to assess design viability. Feedback suggested that the credibility cue was lengthy and could be shortened, and that the message shown when the cue was clicked should be more specific, containing information about financial sources.

Feedback was integrated into the final design shown in the A/B test: I incorporated a visual logo into the credibility cue and specified the endorsement messaging content.

The credibility cue used in the A/B test

The interactive messaging content used in the A/B test

3 User Testing & Insights

The user testing flow followed an A/B testing format. Participants were recruited through a user-testing platform (Prolific) and assigned via stratified randomisation to ensure a fair and random distribution across the two conditions (see the sketch below). Once all users completed the study, I compiled user data from the database and ran quantitative analysis in Python and Excel, and qualitative analysis in FigJam.
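As an illustration of the assignment step, here is a minimal sketch of stratified randomisation in Python; the stratum key (trust_level) and the participant records are hypothetical, not the actual Prolific data.

```python
# Illustrative sketch of stratified random assignment to A/B conditions.
import random
from collections import defaultdict

def assign_conditions(participants, strata_key, seed=42):
    """Shuffle within each stratum, then alternate conditions so both
    groups receive a balanced share of every stratum."""
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    strata = defaultdict(list)
    for p in participants:
        strata[p[strata_key]].append(p)

    assignments = {}
    for group in strata.values():
        rng.shuffle(group)
        for i, p in enumerate(group):
            assignments[p["id"]] = "cue" if i % 2 == 0 else "control"
    return assignments

# Example usage with a hypothetical prior-trust stratum:
participants = [
    {"id": "p1", "trust_level": "low"},
    {"id": "p2", "trust_level": "low"},
    {"id": "p3", "trust_level": "high"},
    {"id": "p4", "trust_level": "high"},
]
print(assign_conditions(participants, "trust_level"))
```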

Thematic Analysis Insights

A thematic analysis of how the chatbot could earn greater user trust revealed four key elements that should be incorporated into its design.

  1. Accuracy – users would trust chatbots that provided correct information.

  2. Interaction – users would trust chatbots that were more interactive.

  3. Understandability – users would trust comprehensible chatbots.

  4. Explainability – users would trust chatbots that explained the information they provided.

Statistical Analysis Insights

Data insights about the differences in user behaviour across the A/B interfaces were revealed through correlational and t-test analyses (a sketch of these analyses follows the list below). While differences between the A/B groups were not statistically significant, the trends point towards:

  • When users have lower trust in AI/chatbots, they may seek more information from the credibility cue on the chatbot.

    • This means that to foster user trust in a chatbot system, the interface should allow users to interact with the chatbot and gain more information if they wish to.

  • However, few users interacted with the credibility cue, even when given the chance to.

    • This means that key information users need should be presented at the first opportunity, to quickly establish a simple form of credibility and trust.

  • Users trusted the credibility-cue chatbot less – its recommendations were less likely to be followed, and overall trust in it was lower.

    • This means that designing for credibility should carefully consider how it shapes users' trust and their expectations of what the chatbot can do.
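To make the analysis step concrete, below is a small sketch of the kinds of tests that produce these trends, using SciPy; all variable names and numbers are made-up illustrations, not the study's data.

```python
# Sketch of the analyses behind the trends above: an independent-samples
# t-test comparing trust across conditions, and a correlation between
# prior trust in AI and credibility-cue interactions. Illustrative data only.
from scipy import stats

# Hypothetical post-task trust ratings per participant
trust_cue = [3.1, 2.8, 3.4, 2.5, 3.0]      # cue condition
trust_control = [3.6, 3.9, 3.2, 3.8, 3.5]  # control condition

t_stat, p_value = stats.ttest_ind(trust_cue, trust_control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # a trend, not necessarily significant

# Hypothetical prior trust vs. number of cue interactions
prior_trust = [2.0, 4.5, 3.0, 1.5, 4.0]  # self-reported trust in AI
cue_clicks = [4, 0, 2, 5, 1]             # credibility-cue interactions logged

r, p_corr = stats.pearsonr(prior_trust, cue_clicks)
print(f"r = {r:.2f}, p = {p_corr:.3f}")  # lower prior trust, more info-seeking
```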

Reflections

Understand Technical Feasibility.

Developing and implementing both front-end and back-end code for the project taught me that there are limitations to how product requirements translate to code. Checking in with engineering teams before executing a product test can save valuable time.

Define KPIs Early.

I learned the importance of defining Key Performance Indicators early in the product development cycle, because this allows for appropriate databases to be developed for the user testing process.

Keep It Simple!

The research can be complex, but its defining aims should be kept simple. Hypotheses should be clear, easy to test, and translatable into real-world actions. I had to revise my hypotheses to ensure that they could be tested and that KPIs could be attributed to the differences between the A/B conditions.

Want to read the full project?