4 Lessons Learned from Leading the Industry in Customer-Facing LLM-Powered AI Assistants

As businesses strive to enhance user experiences and streamline customer interactions, deploying advanced LLM assistants to automate customer interactions has great potential. It’s something we’ve learned a thing or two (or four) about as the vendor that’s launched the most LLM-powered AI assistants in the market to date. In this article, I’ll share four invaluable lessons we’ve learned as a business.

Lesson 1: Scope out response classification and agent escalation.

One of the initial decisions that significantly impacts the performance of a customer-facing LLM assistant is designing guardrails for what the AI Assistant should respond to, and how. To maximize efficiency, it is crucial to scope out the use case and establish clear guidelines for agent escalation upfront.

Businesses must decide whether they aim to automate responses, escalate queries to human agents, or find a balance between the two. While automation offers efficiency and speed, certain complex or sensitive queries may require human intervention.

When it comes to customer interactions, having a well-defined human escalation path ensures that issues are resolved promptly and to the customer’s satisfaction. Identifying scenarios where escalation is necessary helps in maintaining a positive customer experience, particularly in situations where the assistant might struggle to provide accurate responses.

Which responses will be generative vs. static?

Choosing between generative and static responses is another critical decision here. While you can have both generative and static responses within the same Assistant, it’s important to know the capabilities and limitations of both response types.

Generative responses allow for highly personalized answers to complex, multi-part questions, but because they are AI-generated, they carry a small risk of inaccuracy. Static responses, on the other hand, are pre-defined and give absolute certainty about the answer a customer will receive. We recommend our clients choose AI generation for most topics, since the answer is tailored to the customer's specific question, and rely on static responses for sensitive topics like policies and legal terms.

The balance between automation and human intervention, as well as generative and static responses, is a delicate one. Striking the right equilibrium is key to developing a customer-facing LLM assistant that enhances user experience and meets business objectives.
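As a concrete illustration, this kind of routing guardrail can be sketched in a few lines. Everything below is hypothetical: the topic names, the `classify_topic` stub, and the canned responses are illustrative placeholders, not Quiq's actual implementation.

```python
# Sketch: sensitive topics get pre-approved static responses;
# everything else may fall through to AI generation.
# All topic names and responses here are invented for illustration.

STATIC_RESPONSES = {
    "refund_policy": "Refunds are accepted within 30 days of purchase.",
    "legal_terms": "Please review our full terms of service on our website.",
}

SENSITIVE_TOPICS = set(STATIC_RESPONSES)

def classify_topic(utterance: str) -> str:
    """Stand-in for a real intent/topic classifier."""
    lowered = utterance.lower()
    if "refund" in lowered:
        return "refund_policy"
    if "terms" in lowered:
        return "legal_terms"
    return "general"

def route(utterance: str) -> tuple:
    """Return (response_type, response_text) for a customer utterance."""
    topic = classify_topic(utterance)
    if topic in SENSITIVE_TOPICS:
        # Sensitive topic: absolute certainty, use the pre-defined answer.
        return ("static", STATIC_RESPONSES[topic])
    # Otherwise, let the LLM draft a tailored answer (stubbed here).
    return ("generative", f"[LLM drafts a tailored answer to: {utterance!r}]")
```

In a real deployment the classifier and the generative branch would be model calls, but the decision structure stays the same: the guardrail runs before generation, not after.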

Lesson 2: Build in testing from the beginning.

Quiq’s AI Studio includes the ability to build test sets: compilations of customer utterances paired with the expected Assistant behavior. Building a test set early in the development process is fundamental to the success of a customer-facing LLM assistant. Test sets serve as the foundation for regression testing, allowing developers to identify and rectify issues promptly. The ability to replay events and experiment with prompts enables a more iterative and feedback-driven development process.

Building a test set early in the development phase helps in identifying potential issues that arise while refining the Assistant’s performance over time. It provides a benchmark for evaluating the accuracy and effectiveness of the Assistant’s responses across various scenarios.

The iterative nature of LLM development necessitates continuous testing. Regression testing allows developers to identify any regression in quality, ensuring that new updates or changes do not adversely affect the existing functionality. A solid test set contributes to the overall stability and reliability of the AI assistant.
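A test set of this kind can be as simple as utterance/expected-behavior pairs replayed against the assistant after every change. The sketch below is purely illustrative: `assistant_behavior` is a hypothetical stand-in for the deployed assistant, not Quiq's AI Studio API.

```python
# Sketch: a minimal regression test set. Each case pairs a customer
# utterance with the behavior label we expect the assistant to choose.

TEST_SET = [
    {"utterance": "Where is my order?", "expected": "order_status"},
    {"utterance": "I want to speak to a person", "expected": "escalate"},
    {"utterance": "Can I get a refund?", "expected": "refund_policy"},
]

def assistant_behavior(utterance: str) -> str:
    """Stand-in: returns the behavior label the assistant chose."""
    lowered = utterance.lower()
    if "order" in lowered:
        return "order_status"
    if "person" in lowered or "agent" in lowered:
        return "escalate"
    if "refund" in lowered:
        return "refund_policy"
    return "generative_answer"

def run_regression(test_set) -> list:
    """Return every case whose behavior no longer matches expectations."""
    return [case for case in test_set
            if assistant_behavior(case["utterance"]) != case["expected"]]
```

After any prompt or knowledge change, an empty result from `run_regression` means no existing behavior broke; a non-empty result pinpoints exactly which utterances regressed.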

Identifying and retesting elements

In the evolving landscape of customer interactions, it is essential to swiftly pinpoint and refine areas in the LLM assistant’s responses that require enhancements and adjustments.

Whether it’s a misclassification or an inability to understand certain queries, being able to identify and iterate on these areas ensures a more agile and responsive development process. Having a robust test set allows developers to verify that a new change doesn’t break any existing functionality.

Lesson 3: Know what you want to measure so you can build measurement tactics.

Understanding the outcomes you want to achieve is fundamental because it allows you to assess the assistant’s performance and make informed decisions for improvement. There are many different measures for success. Is the goal to resolve as many questions as possible? That implies agent escalation is available only as a last resort, which could in turn affect CSAT. Perhaps there is also a revenue-conversion goal: transfer to an agent when a customer indicates an interest in purchasing.

By defining success goals before building the AI, developers can make decisions during AI development that will optimize competing goals and build in measurements to monitor performance.

It is important to measure not only what the AI Assistant does, but also what it doesn’t do. For instance, it’s essential to gauge how well the LLM handles questions that are out of scope or touch on sensitive topics. Monitoring the number of instances where users trigger sensitive topic classifiers provides valuable insight into whether too many or too few utterances are being treated as sensitive.
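That kind of monitoring can be as simple as counting classifier triggers over logged conversations. A minimal sketch, with hypothetical topic names and log format (not Quiq's API):

```python
# Sketch: measure how often utterances trigger the sensitive-topic
# classifier, to spot over- or under-triggering in production.
from collections import Counter

# Hypothetical topic labels for illustration.
SENSITIVE_TOPICS = {"refund_policy", "legal_terms", "medical_advice"}

def sensitive_trigger_report(classified_events: list) -> dict:
    """classified_events: list of (utterance, topic) pairs from logs.
    Returns per-topic trigger counts and the overall trigger rate."""
    counts = Counter(topic for _, topic in classified_events
                     if topic in SENSITIVE_TOPICS)
    total = len(classified_events)
    rate = sum(counts.values()) / total if total else 0.0
    return {"per_topic": dict(counts), "trigger_rate": rate}
```

A trigger rate that climbs after a change may mean the classifier has become too aggressive; a rate near zero on a topic you know customers ask about suggests utterances are slipping past it.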

Lesson 4: Don’t try to solve the long tail edge cases at the expense of introducing hallucinations.

You definitely want to solve for the most common use cases first, since these are where the majority of your customer interactions will be and resolving their issues efficiently is the biggest win.

However, don’t neglect the long tail of edge cases.

These edge cases can encompass a variety of scenarios that may not be encountered frequently but are crucial to providing a comprehensive and satisfactory user experience.

For instance, in a hackathon or rapid development scenario, it’s feasible to build 80% of an assistant quickly. But the remaining 20%—often the most challenging part of refining an LLM-powered AI assistant—requires meticulous attention. This is especially true for the final 5%, which can be exceptionally difficult to fine-tune to prevent hallucinations.

Also recognize that there is a balance whereby you introduce more pathways for hallucinations if you solve for every possible answer. The art of AI Assistant development is knowing when to stop trying to answer more questions because doing so decreases the overall AI performance.

By the way, when you are troubleshooting, focus on injecting knowledge or refining existing knowledge t first instead of changing prompts. Balancing the use of prompts with a robust knowledge base ensures that the assistant provides accurate and contextually relevant responses. Plus, it’ll help you fill content gaps.

Final thoughts on building LLM-powered AI assistants.

As you can see, we’ve learned a lot at Quiq by deploying the most customer-facing LLM-powered AI assistants in the market. For the process to be successful, it demands a strategic approach that encompasses thoughtful response classification, meticulous test set development, and a clear focus on defining and measuring outcomes.

In the quest for efficiency and accuracy, businesses must remain vigilant against the potential pitfalls of hallucinations, carefully fine-tuning their AI assistants to provide reliable and precise responses.

Furthermore, recognizing the role of prompting and knowledge injection as complementary tools in the development process ensures that the AI assistant continues to learn and adapt without compromising its foundational knowledge base.

Guest post written by Mike Myer, CEO and founder of Quiq.

Quiq will be joining us at the Execs In The Know Customer Response Summit (CRS) on March 12-15, 2024. Learn more about CRS Tucson here.