AI for CX 101: Conversational AI Metrics that Matter

Written by Shail Gupta on Aug 10, 2022

Metrics are critical for gauging the performance of both support teams and the technology solutions behind them. In our customer case studies, we frequently talk about milestones such as our customers reaching deflection rates of X%, yet what do these terms really mean? As you progress with your conversational AI journey, which key customer experience metrics can you expect to improve with AI? In this second installment of our ‘AI for CX 101’ series (Part 1 covers the initial steps in deploying an AI solution), we dive into the key AI-centered metrics to home in on, to help you hit your CX and AI targets with precision.

The Top 6 Conversational AI Metrics that Matter

1. Deflection Rate

Deflection rate refers to the percentage of customer support requests resolved by AI that would otherwise have been handled by agents. This includes both:

Full deflection – The customer receives a response from the AI, such as details on a hotel’s cancellation policies, and thus obtains a full resolution to their query (case closed!).

Assisted deflection – Although the AI is able to identify the scope and intent of the customer query, depending on the use case, it may not be authorized to respond with a complete answer. This is an instance in which the AI and human agents work together, in ‘co-pilot’ mode. For instance, when an item is being returned, the AI may not be authorized to cancel the order and issue a full refund. In this case, it may ask the customer additional questions, such as the date of purchase. Like an assistant conducting research before a presentation, the AI gathers key pieces of data so that the human support agent who takes over is fully prepared. So, even though it may not be capable of fully deflecting the customer query, the AI still moves the needle significantly, saving valuable agent time and ensuring that the agent is armed with full context.
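As a rough illustration of how these two kinds of deflection could be tallied, here is a minimal Python sketch. The outcome labels (`"full"`, `"assisted"`, `"agent_only"`) and the function name are hypothetical, chosen only for this example; they are not taken from any particular platform.

```python
from collections import Counter


def deflection_rates(outcomes):
    """Return full, assisted, and combined deflection rates as percentages.

    `outcomes` holds one label per support request. The labels "full",
    "assisted", and "agent_only" are assumptions made for this sketch.
    """
    total = len(outcomes)
    if total == 0:
        return {"full": 0.0, "assisted": 0.0, "combined": 0.0}
    counts = Counter(outcomes)
    full = 100.0 * counts["full"] / total
    assisted = 100.0 * counts["assisted"] / total
    # The combined rate counts both kinds of deflection, as described above.
    return {"full": full, "assisted": assisted, "combined": full + assisted}


# Illustrative data: 5 fully deflected, 3 assisted, 2 handled by agents alone.
sample = ["full"] * 5 + ["assisted"] * 3 + ["agent_only"] * 2
rates = deflection_rates(sample)
print(rates)
```

Reporting the full and assisted rates separately, alongside the combined figure, keeps it visible how much of the deflection still relies on an agent to finish the job.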

Why it matters:

The deflection rate indicates the impact that your AI solution is having on your customer experience function: on your support agents and, ultimately, on your customers.


2. Customer Satisfaction (CSAT) Score

CSAT score measures customers’ sentiment towards your product, service or a specific interaction. It is important to separate the CSAT into two parts, to give credit where credit is due: the CSAT score attributed to the AI should cover the AI-only use cases that took place without human intervention, while, if a human agent handled the ticket, the score should be attributed to them. What constitutes a satisfactory CSAT score? “While 60% is a good starting base for CSAT ranges to begin, a score near 80% is the ‘holy grail,’” Partho Nath, Netomi’s Head of Applied AI, noted.
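To make the AI-versus-human split concrete, here is a small Python sketch. It assumes a 1-to-5 survey scale on which ratings of 4 or 5 count as satisfied, which is a common convention but an assumption here; the `"ai"` and `"human"` handler labels are likewise hypothetical.

```python
def csat_by_handler(responses, satisfied_min=4):
    """Return the CSAT percentage for each handler.

    `responses` is a list of (handler, rating) pairs. The 1-5 scale and the
    satisfied_min threshold are assumptions made for this sketch.
    """
    totals, satisfied = {}, {}
    for handler, rating in responses:
        totals[handler] = totals.get(handler, 0) + 1
        if rating >= satisfied_min:
            satisfied[handler] = satisfied.get(handler, 0) + 1
    return {h: 100.0 * satisfied.get(h, 0) / n for h, n in totals.items()}


# Illustrative survey results: four AI-only tickets, two agent-handled tickets.
survey = [("ai", 5), ("ai", 4), ("ai", 2), ("ai", 5), ("human", 3), ("human", 5)]
scores = csat_by_handler(survey)
print(scores)
```

Keeping the two scores apart makes it possible to compare each against the 60% to 80% range quoted above without one masking the other.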

Why it matters:

Key to customer retention, the CSAT score can provide insights into where and when your company is at risk of losing customers. If a customer provides a negative CSAT score because their query was not sufficiently addressed, this is an opportunity for businesses to refine their workflow to include additional customer use cases, and identify areas for improvement. Another consideration is the context of the issue itself. For instance, is a customer providing a negative score because they don’t agree with a certain business policy or an item arrived damaged, or are they basing their feedback on the support interaction?

3. Classification Rate

This refers to the share of tickets understood by the AI: of all the conversations, chats or emails received, how many did the AI understand and could therefore attempt to respond to?

An AI that is “well-behaved” (in the words of Partho) will not attempt to answer each and every query that comes its way, potentially sending the customer down the wrong path. Rather, it will first make a judgment call on whether the conversation is one that it can confidently identify and map to one of the business workflows. Our Conversational AI Benchmarking Report revealed that Netomi has the highest accuracy in comparison to other AI platforms, meaning that the AI responds accurately and thus causes less user frustration than if it provided a response that was incorrect or irrelevant. Netomi was also found to have the highest out-of-scope accuracy, meaning that it understands which topics it has not been trained on and follows the appropriate behavior (such as escalating a customer query to a human agent or directing them to another channel).
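One simple way to picture this "well-behaved" judgment call is a confidence threshold: the AI only acts on a query when its top intent score is high enough, and escalates otherwise. The Python sketch below is an illustration under that assumption, not a description of how Netomi's platform works; the intent names, scores, and the 0.7 threshold are all hypothetical.

```python
def route(intent_scores, threshold=0.7):
    """Return the top-scoring intent if it clears the threshold, else None.

    None stands for "do not answer; escalate or redirect". The threshold
    value is an assumption made for this sketch.
    """
    if not intent_scores:
        return None
    intent, score = max(intent_scores.items(), key=lambda kv: kv[1])
    return intent if score >= threshold else None


def classification_rate(tickets, threshold=0.7):
    """Percentage of tickets the AI confidently mapped to a workflow."""
    if not tickets:
        return 0.0
    classified = sum(1 for t in tickets if route(t, threshold) is not None)
    return 100.0 * classified / len(tickets)


# Illustrative intent scores for four incoming tickets.
tickets = [
    {"order_status": 0.92, "refund": 0.05},  # confident match
    {"order_status": 0.40, "refund": 0.45},  # ambiguous, so escalate
    {"cancel_booking": 0.81},                # confident match
    {},                                      # nothing recognized, so escalate
]
rate = classification_rate(tickets)
print(rate)
```

Raising the threshold trades a lower classification rate for fewer wrong answers, which is the balance the paragraph above describes.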

Why it matters:

The classification rate shows how much of your ticket volume the AI can handle, and it indicates whether there are additional topics or use cases, ones the AI is not currently authorized to handle, that could be brought under its purview. For instance, perhaps the AI has not been trained on issuing refunds; if refunds are becoming a fairly common query from your customers, it may be time to consider bringing this use case into its scope.

4. Goal Completion Rate (GCR)

Users often exit a support conversation midway, perhaps because they do not have a specific detail of their query on hand, or because something else interrupts the task. Goal completion rate asks how many users successfully completed the journey, achieving the goals you originally set for your chatbot. For an airline, this could involve changing a flight; for a retailer, finding an order status.
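As a minimal sketch, the goal completion rate can be computed per goal as completed journeys divided by started journeys. The goal names and the session format below are hypothetical and used only for illustration.

```python
def goal_completion_rates(sessions):
    """Return the completion percentage for each goal.

    `sessions` is a list of (goal, completed) pairs, one per user journey
    that was started. This data shape is an assumption for the sketch.
    """
    started, completed = {}, {}
    for goal, done in sessions:
        started[goal] = started.get(goal, 0) + 1
        completed[goal] = completed.get(goal, 0) + (1 if done else 0)
    return {goal: 100.0 * completed[goal] / started[goal] for goal in started}


# Illustrative journeys: one user abandoned the flight-change flow.
sessions = [
    ("change_flight", True),
    ("change_flight", False),
    ("order_status", True),
    ("order_status", True),
    ("change_flight", True),
    ("change_flight", True),
]
gcr = goal_completion_rates(sessions)
print(gcr)
```

Breaking the rate down by goal points to the specific workflow where users drop off, rather than hiding it in a single overall number.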

Why it matters:

To provide an indication of whether a customer ended their journey successfully, Netomi’s AI platform asks for feedback. If a customer remained until the end of the workflow, we can safely assume that they have been successfully helped. Focusing on this metric can provide some surface-level insights, which can warrant further digging: why didn’t the user complete their support journey? Were there any moments of friction along the way? How can these be removed?

While not directly related to the performance of the AI itself, these final two metrics pertain to the capacity of support teams, and most importantly, the time it takes for the resolution of customer queries.

5. Average Resolution Time (ART), also known as Average Handling Time (AHT) 

This is the average time it takes an agent to resolve a customer conversation. Here, you can identify potential areas for improvement and reduce resolution time, for example by enhancing your company’s knowledge base so that customers can more easily use self-service options, or by removing any operational inefficiencies that keep support agents from performing their jobs. This also ties into smart conversational design, and reimagining the resolution paths for certain queries. By looking at the average resolution time per topic, you might be able to identify ways to streamline the journey. For instance, if a customer wishes to cancel a flight, they could be asked whether or not they purchased a refundable ticket, in order to cut down on the amount of text sent by the AI.
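Looking at the average resolution time per topic could be sketched as follows in Python. The topic names, timestamps, and ticket format are illustrative assumptions, not data from a real help desk.

```python
from datetime import datetime
from statistics import mean


def avg_resolution_minutes(tickets):
    """Return the average resolution time in minutes for each topic.

    `tickets` is a list of (topic, opened, resolved) tuples with ISO 8601
    timestamps. This data shape is an assumption made for the sketch.
    """
    by_topic = {}
    for topic, opened, resolved in tickets:
        elapsed = datetime.fromisoformat(resolved) - datetime.fromisoformat(opened)
        by_topic.setdefault(topic, []).append(elapsed.total_seconds() / 60)
    return {topic: mean(minutes) for topic, minutes in by_topic.items()}


# Illustrative tickets: two flight cancellations and one baggage query.
resolved_tickets = [
    ("cancel_flight", "2022-08-01T09:00:00", "2022-08-01T09:30:00"),
    ("cancel_flight", "2022-08-01T10:00:00", "2022-08-01T10:50:00"),
    ("baggage", "2022-08-01T11:00:00", "2022-08-01T11:10:00"),
]
art = avg_resolution_minutes(resolved_tickets)
print(art)
```

A topic whose average stands well above the others is a natural candidate for a redesigned resolution path.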

6. Average First Response Time (FRT) 

This refers to the amount of time that elapses between a customer raising a ticket and an agent first responding to it. How long does it take for a company to provide an initial response to a ticket? Research has found that the average first response time is 12h 10m, yet 75% of customers expect a response within five minutes! By tracking the average FRT, the customer service team can see how much it decreases once the chatbot is adding value.
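Here is a short Python sketch of how the average FRT, and the share of tickets answered within a five-minute target, might be computed. The timestamps and the ticket format are hypothetical; only the five-minute figure comes from the expectation cited above.

```python
from datetime import datetime, timedelta


def first_response_stats(tickets, target=timedelta(minutes=5)):
    """Return (average first response time, % of tickets within target).

    `tickets` is a list of (created, first_reply) ISO 8601 timestamp pairs,
    a data shape assumed for this sketch.
    """
    delays = [
        datetime.fromisoformat(first_reply) - datetime.fromisoformat(created)
        for created, first_reply in tickets
    ]
    if not delays:
        return timedelta(0), 0.0
    average = sum(delays, timedelta()) / len(delays)
    within = 100.0 * sum(1 for d in delays if d <= target) / len(delays)
    return average, within


# Illustrative tickets: three quick replies and one slower, 18-minute reply.
frt_tickets = [
    ("2022-08-01T09:00:00", "2022-08-01T09:00:30"),
    ("2022-08-01T22:00:00", "2022-08-01T22:01:30"),
    ("2022-08-01T10:00:00", "2022-08-01T10:18:00"),
    ("2022-08-01T11:00:00", "2022-08-01T11:04:00"),
]
average_frt, within_target = first_response_stats(frt_tickets)
print(average_frt, within_target)
```

Tracking the within-target percentage next to the average is useful because a single slow reply can pull the average past five minutes even when most customers were answered quickly.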

As AI is available 24/7, working around the clock and outside of standard working hours, it significantly lowers the FRT, making that five-minute window quite feasible.