I put the Klarna chatbot through its paces

...and I was surprised by what I found

May 10, 2024

At the end of February, the buy-now-pay-later company Klarna made waves in the contact centre industry with a big announcement about their new chatbot. According to the Klarna press release, the bot is handling two-thirds of customer service requests, the equivalent work of 700 Customer Service Representatives.

Naturally, that big claim piqued my curiosity. So when it was time to buy some new clothes, I paid for them using Klarna and eagerly logged into the app to try out the bot.

Here’s what I found, including my personal grade for the bot's performance on a variety of questions.

Non-personal information retrieval: A

As I had hoped, the bot is very competent at providing non-personal information from an FAQ. One useful feature of a Large Language Model based bot is that it can go beyond answering the basic question and also provide answers to some possible follow-up questions. Responses can be quite verbose though, often stretching beyond one screen, so it can feel like hard work to read everything.

Personal information retrieval: A+

Perfect score here. The bot provides answers that are accurate and relevant to my personal situation. Again, a little wordy, but everything the bot provides is useful information.

Completing actions: B

The bot does not seem to be able to perform any actions on its own, like changes to my account. It gets round this by offering personalised links to existing self-service functionality in the app. The experience is not too clunky but it would be even easier to be able to complete the whole action through a conversational interface.

Human on request: A

The bot is very efficient at offering a human agent on request, and provides a useful indication of wait time. Some companies might choose to have the bot offer to attempt to answer the customer’s question first before passing them to a human.

Passing sensitive questions to a human: A+

I tried a number of sensitive or potentially controversial questions on the bot and each time it politely suggested that I chat with a human specialist. In general the bot seems robust at avoiding hallucinations and not answering questions that it shouldn't.

Adding value: C

The bot is not able to offer me advice on future purchases, although this is possibly company policy rather than a technical limitation.

Handling questions out of scope: A+

I was really impressed by this. As Klarna is a payments service, customers are likely to ask questions that can only be answered by the merchant that made the sale. The bot gives instructions on how to contact the right customer service team, and even offers an escalation option if I don't receive the support I expect from the merchant.

Parsing ambiguous requests: A+

The bot did not get confused when I used words that have a different meaning in another context, like “pay”. One big advantage of LLM bots is that they are parsing the whole question at once, not just trying to match individual keywords or phrases.

Parsing multiple intents: B-

The bot is able to pick up on multiple intents in the same question, but does not always answer these fully. For example, in the above screenshot I mention in passing that I am moving house but the bot does not instruct me on how to change my address in the Klarna app.

Context persistence: C (but grade A for smart design to get around this limitation)

The bot didn't seem to remember much between questions and provided a more generic answer if I asked a short follow-up question. However, there is some clever design going on here: at the start of the chat you are asked to click on a specific purchase or payment that you are asking about, and that fixes the context for the rest of the conversation, preventing the bot from getting confused. This means you need to manually switch context if you want to ask about another transaction, but at least the UI makes this relatively easy.
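For readers who want to picture the design, here is a minimal sketch of the "pick a transaction first" pattern. It is not Klarna's implementation, which I have no visibility into; the `Purchase` and `ChatSession` names, their fields, and the prompt format are all hypothetical. The idea is simply that the selected purchase is attached to every question, so the model does not need to remember earlier turns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Purchase:
    # Hypothetical record; the field names are illustrative only.
    order_id: str
    merchant: str
    amount_due: float


@dataclass
class ChatSession:
    # The purchase the customer clicked on at the start of the chat.
    selected: Optional[Purchase] = None

    def select_purchase(self, purchase: Purchase) -> None:
        # Switching context is an explicit user action, not something the bot infers.
        self.selected = purchase

    def build_prompt(self, question: str) -> str:
        # Refuse to answer until the context has been fixed.
        if self.selected is None:
            raise ValueError("Select a purchase before asking a question")
        p = self.selected
        context = f"Order {p.order_id} from {p.merchant}, amount due {p.amount_due:.2f}"
        # The same context is prepended to every question in the session.
        return f"Context: {context}\nCustomer question: {question}"
```

Because the context travels with each question, a short follow-up like "When is it due?" still refers to the right order, even if the bot keeps no memory of the previous turn.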

Speed: C

Some of the answers were noticeably slow to arrive: up to 10 seconds before the bot started typing, slower than we have become used to with ChatGPT. I don’t know for sure, but this might indicate some level of human-in-the-loop checking of answers before they get sent out.

Language and style: B

This is definitely an LLM, in that it comes across as helpful and keen to please, but also a little too verbose. Some of the answers took up more than a single screen on my phone and would have been easier to follow had they been shorter.

Poetry: Ungraded

I tried, but could not persuade the Klarna bot to write poetry.

Conclusion: no miracles, just smart design leading to a great experience

Overall, I was impressed by the Klarna bot. It is clearly putting the new features of LLM-based chatbots to good use and was generally simple and effective. For anyone looking to emulate this success, there are three important design lessons to take away:

The bot can only point customers towards actions that are already available as self-serve processes in the app. It makes these self-service choices more accessible, but for a bot like this to work effectively, companies need to have good self-service capabilities already set up.
The bot requires customers to set the context for the conversation up front. This helps to control the outputs and prevents the bot from getting confused.
In cases where the question is out of the bot’s scope, it provides clear and simple links to the place where the customer can get help, and gives the customer an option to feed back if the linked solution does not work.
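The three lessons above can be put together in a rough sketch of a routing function. This is an illustration under my own assumptions, not a description of how the Klarna bot works: the intent names, the `app://` links, and the `route` function are all made up for the example.

```python
# Hypothetical mapping from recognised intents to existing self-service screens.
SELF_SERVICE_LINKS = {
    "change_address": "app://settings/address",
    "extend_due_date": "app://payments/extend",
}

# Hypothetical intents that only the merchant can resolve.
MERCHANT_INTENTS = {"delivery_status", "return_item"}


def route(intent: str, has_context: bool, merchant: str = "the merchant") -> str:
    # Lesson 2: the customer sets the context before anything else happens.
    if not has_context:
        return "Please pick the purchase you are asking about."
    # Lesson 1: the bot links to a self-service flow rather than acting itself.
    if intent in SELF_SERVICE_LINKS:
        return f"You can do this yourself here: {SELF_SERVICE_LINKS[intent]}"
    # Lesson 3: out-of-scope questions get a clear pointer plus a way back.
    if intent in MERCHANT_INTENTS:
        return f"Please contact {merchant}. If that does not resolve it, reply 'escalate'."
    # Anything else is handed to a human.
    return "Let me connect you with a human agent."
```

Note that the first lesson shows up as a plain lookup table: if the self-service flow does not already exist in the app, there is nothing for the bot to link to.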
