Measures of Success: Our Top 10 Chatbot Evaluation Metrics

Written by Amy Wallace on Dec 23, 2021

As with any large project or undertaking, metrics are key to gauging performance. Are your current processes adequate, or is there room for improvement? Metrics are important to track for all aspects of a business, from sales, marketing and financial performance to overall customer service. When it comes to improving the automation of customer service, chatbot evaluation metrics are just as critical, not only for measuring progress but also for understanding how your chatbot is impacting the customer experience.

Here, we break down the top 10 chatbot evaluation metrics to have on your radar. 

These metrics fall under two categories: measuring your users and measuring the conversations that take place. As chatbots are in place to help your customers, a first-rate user experience is the ultimate aim. So, it makes sense to start by looking at your user base. What is the overall user experience like? How are users interacting with the chatbot? Because each of the following significantly impacts the user experience, it is important to analyze:

  1. Conversation Length: It is one thing for users to interact with the chatbot, but whether they remain engaged is another part of the story. What is the length of interaction between your chatbot and its users? If increased efficiency is one of your goals, this may be a key metric on which to concentrate. How to read it depends on the use case. For marketing chatbots focused on user engagement, longer conversations would be viewed as positive. Yet for support chatbots aimed at resolving issues and queries in a timely manner, a decrease in conversation length over time is favorable.
  2. Goal Completion Rate (GCR): How often are users receiving the information that they need? This metric tracks the percentage of users who reach the goal that the chatbot was designed to accomplish, whether that is scheduling an appointment, resolving an issue or clicking on a CTA button.
  3. Bounce Rate: Similar to users navigating away after viewing only one page on a website, this metric indicates how useful customers found the chatbot to be. Who is leaving, at what point in the conversation are they dropping off, and why? A session counts as a bounce when a customer ends it prematurely, for instance upon finding the chatbot not of use to them. If a large share of sessions end this way, the bounce rate is high.
  4. New Users: Growth in new users is a strong indication that your chatbot is engaging and performing well. During the COVID-19 pandemic, for instance, airline WestJet saw a 45X spike in new users and customer requests, with a slew of questions regarding cancellations and flying restrictions. Leveraging Netomi’s AI-powered chatbot, Juliet, the company was able to scale its support team and deflect tens of thousands of calls from human agents. 
  5. Return Users: Aside from new users, are existing customers finding the chatbot enjoyable and useful enough to come back?
  6. Customer Satisfaction: How useful or helpful was the chatbot, in the eyes of the customer? Sent immediately following an interaction, customer satisfaction (CSAT) surveys allow you to understand whether you're meeting customer expectations, enabling you to pinpoint where things go wrong in the customer journey and optimize the experience as necessary. Such customer service tools also help customers feel appreciated and assured that their voices and concerns are being heard.
  7. Escalation Rate: There are some situations in which a chatbot will need to escalate a conversation to a human agent. This could be for several reasons: the chatbot does not understand a user's input, it has not been trained on a specific topic, or a user explicitly asks for a human agent. If you are experiencing a high escalation rate, it might be worth conducting additional training and increasing the chatbot's scope to deliver a boost in performance.
  8. Out of Scope Accuracy or True Negative Rate: How can you improve the accuracy of your chatbot? In conversational AI, accuracy relates to how often the chatbot provides the correct response. Out-of-scope accuracy, on the other hand, measures how often the chatbot recognizes that a topic is one it has not been trained on and follows the appropriate behavior (e.g., escalating to a human agent or directing the user to another channel) rather than guessing a response.
  9. Message Volume: How many lines of chat go between the user and the chatbot? Examining the volume of messages sent back and forth helps you to determine how many questions your chatbot needs to be asked before it can provide users with the necessary information. In customer service, you want to reduce effort for your customers, so you can use this metric to identify topics that have high message volume and try to simplify the flow as much as possible.
  10. Number of Conversations: When a user is successfully served by a chatbot without any human intervention taking place, this is considered to be a completed chatbot conversation. Counting these conversations shows the end result, namely how many queries were resolved successfully.
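
To make a few of these metrics concrete, here is a minimal sketch of how they might be computed from conversation logs. It is an illustration only, not Netomi's implementation: the `Conversation` record and its field names are hypothetical, a "bounce" is assumed to mean a session with at most one user message and no goal reached, and CSAT is assumed to be the share of survey responses scoring 4 or 5 on a 5-point scale. Your own definitions may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversation:
    """Hypothetical log record for one chatbot session."""
    user_messages: int           # messages sent by the user
    bot_messages: int            # messages sent by the chatbot
    goal_completed: bool         # user reached the chatbot's intended goal
    escalated: bool              # conversation was handed to a human agent
    csat: Optional[int] = None   # 1-5 survey score, None if unanswered


def ratio(part: int, whole: int) -> float:
    """Return part / whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


def summarize(conversations: List[Conversation]) -> dict:
    """Compute a handful of the metrics above for a batch of sessions."""
    total = len(conversations)
    rated = [c.csat for c in conversations if c.csat is not None]
    return {
        "conversations": total,
        "goal_completion_rate": ratio(
            sum(c.goal_completed for c in conversations), total),
        "escalation_rate": ratio(
            sum(c.escalated for c in conversations), total),
        # Assumption: a bounce is a session with at most one user
        # message in which the goal was not reached.
        "bounce_rate": ratio(
            sum(c.user_messages <= 1 and not c.goal_completed
                for c in conversations), total),
        # Message volume, averaged per conversation.
        "avg_messages": ratio(
            sum(c.user_messages + c.bot_messages for c in conversations),
            total),
        # Assumption: CSAT is the share of answered surveys scoring 4 or 5.
        "csat": ratio(sum(score >= 4 for score in rated), len(rated)),
    }


def out_of_scope_accuracy(correctly_deflected: int,
                          wrongly_answered: int) -> float:
    """True negative rate over out-of-scope queries: TN / (TN + FP)."""
    return ratio(correctly_deflected, correctly_deflected + wrongly_answered)


# Made-up sample data, for illustration only.
sample = [
    Conversation(4, 4, goal_completed=True, escalated=False, csat=5),
    Conversation(1, 1, goal_completed=False, escalated=False),
    Conversation(6, 5, goal_completed=False, escalated=True, csat=2),
    Conversation(3, 3, goal_completed=True, escalated=False, csat=4),
]
report = summarize(sample)
```

With this made-up sample, two of the four sessions reach their goal, one is escalated and one bounces. Running the same summary over weekly or monthly batches is one way to see whether each rate is trending in the direction you want.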


Chatbot Evaluation Metrics: Pick a Keystone Metric

The above metrics are simply suggestions, and tracking all of them is not necessary. It is up to you to choose the one that matters most as your keystone metric, based on your business goals. This list also highlights the parts of the chatbot experience that might require some further finessing. Tracking several metrics, and continuing to monitor them over time, will help to paint a comprehensive picture of overall chatbot success.

Curious to dive into the world of chatbots? Join us for a free chatbot consultation to see Netomi in action. We’ll show you the out-of-the-box tools you can use to measure and optimize your AI chatbot over time, how you can scale your support across each and every channel, and more!