How to talk to someone who doesn't trust AI
If you’re like us, you spend a lot of time thinking about AI, both in your work and in how it can improve your personal life. That probably means you also think about what form factors AI products will take, and you probably have opinions about whether open LLMs are a good idea. If you’re like us, most of your coworkers and friends are also excited about Llama-3. In other words, if you’re like us, you live in a bit of a bubble.
At RunLLM, we now spend a lot of our time talking to people who don’t live in this bubble. These are almost all very technical people, but surprisingly, most of them don’t yet trust AI-powered tools as a category. They’re skeptical of how valuable the technology will be and whether LLMs fit into their workflows.
Naturally, we disagree! But to convince people otherwise, we have to understand why they think what they think. Not every reason applies to every person, and there are definitely more reasons than we cover here, but these are the biggest trends we’ve noticed.
“It’s a great demo, but it doesn’t actually work.” This is one of the most common objections, and there are some understandable reasons for the skepticism. Usually, this opinion is grounded in pre-LLM machine learning, where models were narrowly trained to accomplish a particular task rather than to be general-purpose. Relying on those priors, it seems unlikely that a single model, however much data it was trained on, could really speak coherently on a wide variety of topics. Most often, this also means the person hasn’t spent time pushing the bounds of the models themselves to understand what they are and aren’t capable of.
Counterargument: The first barrier is getting these folks to try an LLM. Any LLM will do, but ChatGPT is the obvious choice. Almost always, they’ll be pleasantly surprised by what it can do. Everything from reciting random general knowledge to telling you when you’re wrong gives them a sense of what LLMs can accomplish. The goal is to get them experimenting with what’s possible.
“Look, it can’t answer my question!” This is a common objection we hear from skeptics, especially when they’re evaluating our product. It’s usually meant as a gotcha. The skeptic is trying to prove that if the technology were all it’s cracked up to be, it should be able to handle every question it sees, no matter how obscure or vague. The logic goes: because it can’t answer this particular question I concocted, the whole thing must be empty hype.
Counterargument: This is a misunderstanding of the core technology. Of course, LLMs have limitations. Identifying one of them is not surprising, and we aren’t here to defend every mistake LLMs make. This is sort of like pointing at Salesforce and complaining, “It didn’t close my deal for me!”
In fact, when we build good LLM products, we account for these shortcomings and put guardrails on the application: an LLM that gracefully bows out of answering a question is much better than one that hallucinates. Nonetheless, at the end of the day, models are probabilistic tools that will occasionally make mistakes. Just as with a human, one mistake doesn’t erase the value of all the other work an LLM can do.
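To make “guardrails” concrete, here’s a minimal sketch of a bow-out check in a retrieval-augmented pipeline. Everything below, from the `retrieve` stub to the threshold value to `call_llm`, is a hypothetical stand-in we made up for illustration, not how any particular product (ours included) implements it:

```python
# A minimal sketch of a "graceful bow-out" guardrail for a RAG-style
# application. All names, data, and the threshold are illustrative
# assumptions, not any specific product's implementation.

SCORE_THRESHOLD = 0.75  # below this, declining beats guessing

def retrieve(question: str) -> list[tuple[str, float]]:
    """Hypothetical retriever: returns (snippet, relevance-score) pairs."""
    stub_index = {
        "How do I rotate my API key?":
            ("Settings > API Keys > Regenerate rotates your key.", 0.92),
        "Why is the sky green on Mars?":
            ("Our docs don't cover planetary science.", 0.22),
    }
    hit = stub_index.get(question)
    return [hit] if hit else []

def call_llm(question: str, context: str) -> str:
    """Hypothetical model call; a real system would hit an LLM API here."""
    return f"Based on the docs: {context}"

def answer(question: str) -> str:
    hits = retrieve(question)
    best_score = max((score for _, score in hits), default=0.0)
    if best_score < SCORE_THRESHOLD:
        # Gracefully bow out rather than let the model improvise.
        return "I don't have enough information to answer that confidently."
    context = "\n".join(snippet for snippet, _ in hits)
    return call_llm(question, context)

print(answer("How do I rotate my API key?"))    # answers from context
print(answer("Why is the sky green on Mars?"))  # bows out
```

The design choice worth noticing is that the decline path is explicit and cheap: the application decides not to answer before the model ever gets a chance to improvise.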
“Every response from a chatbot looks the same.” This critique usually comes later, and we have to admit there’s some validity to it. It’s easy to tell if an LLM gets the first president of the US wrong, but on more nuanced topics, it’s genuinely difficult to know whether the details in a response are real facts or hallucinated nonsense. This is exacerbated by the fact that most LLMs are extremely verbose. Every answer reads with the same confident fluency, so how are you supposed to know the difference?
Counterargument: This is a genuinely difficult objection to handle. Most LLMs and LLM-based products really do produce generic-seeming answers. Even when one answer is genuinely better than another, the difference is hard to spot, and some folks might not care. There’s no general-purpose solution to this problem: every AI-powered product needs to customize its answers based on its users’ priorities and give users control over what they see. The easiest thing to do is reduce verbosity and increase answer quality, but that’s easier said than done. It takes time, and you’ll also need to pair it with thoughtful product UX.
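As one hedged sketch of what “giving users control” might look like, here’s a per-user style preference threaded into the prompt. The preference names and the template below are invented for this example, not a real product’s configuration:

```python
# Illustrative sketch: letting users control answer style rather than
# shipping one-size-fits-all responses. The preference names and the
# prompt template are invented for this example.

STYLE_INSTRUCTIONS = {
    "concise": "Answer in at most three sentences. Skip the preamble.",
    "detailed": "Walk through the reasoning step by step, citing sources.",
}

def build_prompt(question: str, style: str = "concise") -> str:
    """Thread a per-user style preference into the prompt."""
    instruction = STYLE_INSTRUCTIONS.get(style, STYLE_INSTRUCTIONS["concise"])
    return f"{instruction}\n\nQuestion: {question}"

print(build_prompt("How do I rotate my API key?", style="detailed"))
```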
The objections we’ve covered here are the ones we hear most often, but as with Henry Ford’s apocryphal “faster horses,” they don’t quite get at the underlying desire. Today, evaluating LLMs and LLM products mostly means spot-checking answers to see if they’re correct.
Converting skeptics means moving from vibes-based evals to empirically measuring the value LLMs create. That means finding better ways to quantify model and application quality. This will be critical to converting the staunchest skeptics. After all, if we can’t quantify the impact of what we’re building, how do we expect to close deals?
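To illustrate what “empirical evaluation” can mean at its simplest, here’s a toy sketch: a fixed golden set of questions run through the system on every change and scored the same way each time. The keyword-based grading rule is deliberately crude and purely illustrative; a real eval would use a richer scoring method:

```python
# Toy sketch of moving from spot checks to a repeatable eval: run a fixed
# "golden set" of questions through the system and score every answer the
# same way on every release. The keyword grading rule is a deliberately
# crude placeholder for a real scoring method.

GOLDEN_SET = [
    {"question": "How do I rotate my API key?",
     "must_mention": ["settings", "regenerate"]},
    {"question": "What plans include SSO?",
     "must_mention": ["enterprise"]},
]

def grade(response: str, must_mention: list[str]) -> bool:
    """Pass if the response mentions every required term."""
    return all(term.lower() in response.lower() for term in must_mention)

def run_eval(answer_fn) -> float:
    """Return the fraction of golden-set questions answered acceptably."""
    passed = sum(
        grade(answer_fn(case["question"]), case["must_mention"])
        for case in GOLDEN_SET
    )
    return passed / len(GOLDEN_SET)

# Example: evaluate a stand-in answering function.
stub = lambda q: ("Go to Settings and click Regenerate."
                  if "API key" in q
                  else "SSO is available on the Enterprise plan.")
print(f"pass rate: {run_eval(stub):.0%}")
```

The point isn’t the scoring rule; it’s that the same question set gets the same treatment on every run, so quality becomes a number you can track rather than a vibe.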