Why Enterprise Conversational AI is Ripe for a New Leader

The Enterprise Conversational AI market is fundamentally broken. According to a 2023 survey by Forrester Consulting, about 75% of consumers said that chatbots cannot give them helpful answers, and 50% said they often feel frustrated in their interactions with chatbots. [Forbes 2023] On the other hand, demand is the highest it’s ever been, with the Generative AI market expected to reach $66.62 billion by the end of 2024 and to grow to $1.3 trillion by 2032. [Bloomberg 2023] Supply is also the highest it’s ever been, with around 67,200 AI companies recorded in 2024, more than double the number in 2017. [Tracxn 2024] Basic economic principles teach us that with supply and demand both this high, the frequency of transactions in this space should also be at an all-time high. Why is it, then, that when you interact with enterprise brands via self-service, you’re still talking to the same clunky chatbot or dialing random numbers to get through the IVR prompts? These market dynamics show that the space is ripe for a new leader, but one with a collection of talent and objectives unique to this market.

Conversational AI, often referred to as chatbots or virtual assistants, promised many industries that the exorbitant costs of human-led support would drop significantly through the automation of customer interactions. In the first generation of Natural Language applications, while trying to deliver on that promise, the industry took the problem it was trying to solve (human support is expensive and does not scale) and reinserted it right back into the solution. Building these semi-competent virtual assistants required so many data scientists, engineers, and designers that a return on investment was rarely, if ever, realized. Bank of America’s Erica is considered one of the top-tier virtual assistants in the financial services market, yet the bank leveraged a team of over 100 technologists and spent two years before initially launching it in 2018. [Tearsheet 2017] The Erica team has since grown to 200 people, and Erica is considered one of the rare success stories in this industry. The return on that investment is not public knowledge, but it’s understandable why other companies are apprehensive that investing hundreds of millions of dollars to build a virtual assistant will not net them a profitable return.

Before large language models (LLMs), the methods of training these virtual assistants were extremely data-intensive and far too cumbersome to scale to the breadth of customer interactions. It’s no accident that most chatbots in the market have buttons restricting you to their ‘happy paths’ instead of starting the conversation with “How can I help you?” LLMs have effectively changed how this market will function for the foreseeable future. Not only has the paradigm shifted in how we build these virtual assistants, but consumers’ expectations have risen as well. If you are an enterprise, it will not be long before your customers start to wonder why they can use ChatGPT to plan their vacations, create recipes based on the ingredients in their fridge, or write their high schooler’s term paper, yet still have to call their utility company to update their address.

Most enterprise IT teams are enamored with the idea of building virtual assistants themselves, most recently by leveraging foundational models from OpenAI or Anthropic. Time will tell how that turns out, but the data strongly suggests that buying software rather than building in-house is usually the safe bet. According to Mendix, 54% of projects exceed their original budgets by almost 200%, and organizations cancel 31% of projects. Additionally, 3Pillar Global reported that two out of every three projects built in-house fail. As we enter May, ChatGPT has been out for 1.5 years, and minimal progress has been made in releasing experiences for direct-to-consumer self-service. The companies that have released experiences are grabbing headlines for the wrong reasons (e.g., Air Canada). As a result, many enterprise executives are hesitant to rush forward amid this uncertainty in the market.

The latest trend in the enterprise world is leveraging Generative AI as a tool to assist agents. It seems ironic that the industry has collectively decided that, since we cannot trust these models to be direct-to-customer, the next best option is to let our agents filter out the harmful and misleading information these models can produce. It feels like the “lazy” solution to the real problem, which is that we know these models are not 100% accurate. A solid argument against this approach is that it limits the financial impact the solution can have. A quick thought exercise about how agents will use this tool confirms this perspective. Agents can ask the virtual assistant any question they might have while interacting with customers. Unfortunately, any time the agent is unsure of the answer, they are required to verify that the output generated by the assistant is correct. To be clear, the only time the agent doesn’t have to verify the answer is when they are absolutely certain the output is correct, in which case they wouldn’t have needed to ask the question in the first place. So if the agent only uses the virtual assistant when they are not certain of the answer, and they are then required to verify that answer against the source of truth every time, how much time is the agent really saving? To substantiate these claims, scholars from the Stanford Digital Economy Lab within Stanford HAI and the Massachusetts Institute of Technology studied the impact of generative AI deployed at scale in a customer service call center. They found that access to AI assistance increased agent productivity by only 14%, with the most significant impact on less experienced workers. [Stanford HAI 2023]

If there’s one thing I want you to take away from this blog, it is that waiting around for a model to get “good enough,” where hallucinations are so infrequent that enterprises will finally feel comfortable putting it in front of customers, is not a sound strategy. The companies creating these foundational models continue to release new versions, but the progress made toward lowering the number of hallucinations has been negligible at best. And in reality, we need these models to be near perfect before they can be put in front of customers. The reason is how we as a society are wired. We are absolutely fine with humans getting into cars and causing accidents every day, but everyone is in an uproar as soon as an autonomous vehicle causes one accident. Although the stakes are different, it remains true that people are not okay with computers or machines making mistakes. So if waiting for a foundational model to get “good enough” is not a sound strategy, what is the best path forward for these enterprises?

If you look at what the top-tier research papers tell us, LLMs need more reasoning in their outputs. To illustrate that assertion, I will use computer vision as an example of how these models lack reasoning. Computer vision models and LLMs have more in common than most people realize: both are deep neural networks with very large numbers of parameters, trained on vast amounts of data, which essentially allows them to recognize real-life patterns at scale. The issue is that these models behave much more like a “knee-jerk” reaction to an input than a process that develops a coherent, reasoned output. In computer vision, there are studies of how well these models are able to subitize compared to humans [CVPR’15, Zhang et al.]. Subitizing, a skill used to help elementary students understand numbers, is the ability to look at a small set of objects and instantly know how many there are without counting them. For example, if I showed you a die with two dots on it, you would instantly say there are two dots. You would not need to count the dots one by one, or use reasoning in any way, to conclude that there are two. However, if I were to give you a die with nine dots on it, chances are you wouldn’t be able to immediately recognize that there are nine without using some type of reasoning. You may do it very quickly, but you will more than likely group the dots into sections (e.g., four and five) and then use reasoning to add those numbers together. What researchers initially noticed is that computer vision performs pretty terribly at this exercise compared to humans. Humans can count essentially without limit if they are not bound by time. However, what was later realized is that if you add a “layer of logic” to the model, it is able to perform orders of magnitude better [CVPR’23, Xu et al.]. This “layer of logic” can be thought of as human-created reasoning that allows the model to perform at a much higher level than it would without it.

This idea of adding human-created reasoning to these models has recently taken form across numerous research papers in the NLP space [EMNLP’23, HK Poly., Feng et al]. The only issue is that this reasoning or logic looks very different for every use case, so it’s not something that can be generalized to these foundational models through the likes of prompt engineering.

Since foundational models alone are not going to solve direct-to-customer automation safely, the opportunity for a new category leader has presented itself. This leader must have the capabilities of a research lab, similar to universities or the Big Tech companies, to uniquely harness the power of these LLMs while providing the control and explainability brands need for customer experience quality and accuracy. Turning these capabilities into easy automation is not enough, however; the right leader will also help orchestrate automation seamlessly into journeys and touchpoints for effortless customer experiences. This rare combination of dedication to research and focus on the customer experience will unlock an entirely different paradigm of how we build, manage, and scale direct-to-customer automation. If designed correctly, this paradigm will transform our traditional views of deploying virtual assistants, reducing technical reliance and instead allowing business teams to act as supervisors of ‘AI building AI.’

While Big Tech has previously developed Conversational AI platforms for the enterprise, the majority of its efforts are moving towards building foundational models rather than applications. [Stanford HAI 2024] Most CXM application providers are integrating Generative AI to complement their core capabilities but are not reimagining how Conversational AI solutions are built. This creates an excellent opportunity for purpose-built solutions that can deliver on the original promise of reducing the need for human-led support and offer a unique design with control and ease for business operators (i.e., those currently training and managing live agents). Knowbl’s Autonomous AI Platform allows these business operators to build, design, manage, and control automated direct-to-customer experiences for the Fortune 500 Enterprise market, leveraging the latest research and development in the NLP domain. Our platform leverages LLMs in innovative ways that remove the difficulty and friction of creating virtual assistants, so that, for the first time in this industry, customers will find them usable and enjoyable.

Guest post, written by: Matt Taylor, Co-Founder and Chief Product Officer of Knowbl

Knowbl is leading the transformation of Enterprise Conversational AI with a modern approach to leveraging LLMs for unsurpassed Speed, Ease, and Scalability.

To learn more about “Enterprise Conversational AI” go to: www.knowbl.com