As 2025 shapes up to be the year in which larger organizations finally get around to (thinking about maybe) adopting AI, trust is front and center. The relatively loud concerns about hallucinations from 2023 have quieted down, but the underlying fear that AI applications can’t be trusted to provide reliable, high-quality outputs has remained. With our focus at RunLLM on building an AI Support Engineer, that means we spend a significant amount of our time thinking about how we can optimize the process of building trust with customers. If our customers don’t trust us to represent their product well, they’re simply never going to use us in front of their customers.
“Trust” sounds conceptually simple — do you believe what the AI system is telling you or not? Unfortunately, it’s more complex than that. The most important nuance is that trust is highly contextual: You would trust the best chef in the world to tell you how to make a great lasagna, but you wouldn’t immediately trust their guidance on how to fix your car engine. In other words, just because someone has your trust in one area, that trust shouldn’t blindly be transferred elsewhere.
Because general-purpose LLMs have been trained on all the data available on the internet, we pretend that they know about everything. To an extent, that’s reasonable, but we’ve also learned that we should be a little skeptical of random, unvetted facts that LLMs give us. As these models are applied to more and more fields, we should shift our mindset to think of AI applications as specialists (like a world-class chef or a world-class mechanical engineer) rather than as generalists who know a little bit about everything.
That puts the burden on application developers to build trust with their customers. Trust today is difficult to build — because many users are skeptical of AI by default — and easy to lose. To illustrate the contextual nature of trust: an AI application for clinical medicine should (correctly) be dramatically more careful than an application optimized for vibe coding. Neither is better or worse, but the context requires a different application design.
That said, across almost any application area, there are a few guiding principles that we think everyone should be following. These are things we’ve learned from customer feedback over the past two years at RunLLM. The theme that unifies all of these principles is humility: you should build a system that’s humble enough to know what it doesn’t know and to learn quickly — in the same way that a good employee on your team would. So let’s dive into how you build trust.
First, know what you don’t know. It might sound silly, but we think that one of the most important features we have at RunLLM is our confidence in saying, “I don’t know.” Given how prevalent trust issues are, seeing an AI application say that it simply doesn’t have an answer or can’t do a task goes a long, long way towards building trust. Knowing your limits is a critical first step towards building trust.
Without getting too deep into the technical weeds, we believe you can’t rely on a single large LLM call — that simply leaves too much up to chance and increases the likelihood of an unreliable answer. Instead, we carefully analyze each data source we think might be relevant to a question; if we think none of them actually help solve the problem, we never give ourselves a chance to produce a misleading answer. That confidence in “I don’t know” answers has meant that many of our customers use them as signposts for areas that need better documentation.
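To make that concrete, here’s a minimal, hypothetical sketch of the idea: score every candidate source for relevance before generating anything, and abstain if nothing clears the bar. The function names, the keyword-overlap scorer, and the threshold are stand-ins for illustration, not our actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    content: str


def score_relevance(question: str, source: Source) -> float:
    """Stand-in scorer: keyword overlap between question and source.
    In a real system this would be an LLM call or a trained reranker."""
    q_terms = set(question.lower().split())
    s_terms = set(source.content.lower().split())
    return len(q_terms & s_terms) / max(len(q_terms), 1)


def answer_or_abstain(question: str, sources: list[Source], threshold: float = 0.4) -> dict:
    # Judge every candidate source independently before generating anything.
    relevant = [s for s in sources if score_relevance(question, s) >= threshold]
    if not relevant:
        # No source cleared the bar: refuse up front rather than letting the
        # model produce a plausible-sounding but unsupported answer.
        return {"status": "i_dont_know", "answer": None, "sources": []}
    # Only now would you call the LLM, grounded in the vetted sources.
    return {
        "status": "answered",
        "answer": f"(draft grounded in {len(relevant)} source(s))",
        "sources": [s.title for s in relevant],
    }


if __name__ == "__main__":
    docs = [Source("install guide", "how to install the CLI on macOS and Linux")]
    print(answer_or_abstain("how do I install the CLI on Linux?", docs))
    print(answer_or_abstain("why is my Kubernetes pod crash-looping?", docs))
```

The important design choice is that the abstention decision happens before generation, not after — the model never gets a chance to improvise an unsupported answer.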
Second, don’t make the same mistake twice. It’s okay to make a mistake — as humans, this is something we all do regularly. It only becomes a problem when a person makes the same mistake over and over again. The same is true for AI systems; it’s okay to get an answer wrong or do a task poorly the first time. But if you make the same mistake repeatedly, the feedback your customers give you goes to waste — you’re not getting any better.
While we generally felt that Devin, the AI software engineer, wasn’t ready for production use, one of the features we appreciated was its ability to incorporate feedback. When you correct Devin or ask it to make a stylistic change to the code it generated, it automatically identifies that as general guidance and adds it to a shared knowledge base. When it uses that shared knowledge in a future piece of work, it notifies you, so you know what principles it’s relying on and can see that it’s steadily getting better over time.
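As a rough illustration of that loop — not Devin’s or RunLLM’s implementation — the shape looks something like this. The in-memory list stands in for what would really be a persistent, shared knowledge base, and the exact-match topic lookup stands in for real retrieval.

```python
# Hypothetical sketch: corrections become reusable guidance, and future work
# tells the user which guidance it applied.
guidance_store: list[dict[str, str]] = []  # would be a persistent, shared store in practice


def record_correction(topic: str, guidance: str) -> None:
    """Turn a one-off user correction into reusable, general guidance."""
    guidance_store.append({"topic": topic, "guidance": guidance})


def applicable_guidance(topic: str) -> list[str]:
    """Naive topic match; a real system would retrieve relevant guidance semantically."""
    return [g["guidance"] for g in guidance_store if g["topic"] == topic]


def do_task(topic: str, task: str) -> str:
    rules = applicable_guidance(topic)
    # Surface the guidance being applied, so the user can see that their
    # earlier feedback is actually shaping the system's behavior.
    notice = f"[applying saved guidance: {rules}] " if rules else ""
    return notice + f"result for: {task}"


record_correction("python-style", "prefer explicit type hints on public functions")
print(do_task("python-style", "add a helper to parse the config file"))
```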
Finally, fail loudly. When you can’t do a task or when you make a mistake, you need to make sure your customers know. While it’s good to be able to say “I don’t know” and to learn from your mistakes, we’ve found that AI systems unlock demand that didn’t previously exist for most products. That means there’s a volume of data being generated that your users simply can’t be expected to sift through.
For example, RunLLM now handles thousands of conversations per week for some of our largest customers. Simply saying “I don’t know” isn’t enough if we don’t give our customers the opportunity to help us learn from those interactions. Instead of failing silently, RunLLM can email your support team, start a Slack thread, or open a support ticket so your team can help out. (And of course, we learn from the answers you give as well — so we don’t make the same mistake twice!)
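Here’s a hypothetical sketch of what “failing loudly” can look like in code: when the system’s confidence in its own answer is low, it routes the conversation to human channels rather than ending quietly. The confidence score, the threshold, and the stubbed Slack and ticketing calls are all assumptions for illustration, not RunLLM’s actual integrations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Conversation:
    id: str
    question: str
    confidence: float  # however your system scores its own answer


def notify_slack(convo: Conversation) -> None:
    # Stub: a real integration would post to a shared support channel.
    print(f"[slack] escalating {convo.id}: {convo.question}")


def open_ticket(convo: Conversation) -> None:
    # Stub: a real integration would create a ticket in your support system.
    print(f"[ticketing] opened ticket for {convo.id}")


ESCALATION_CHANNELS: list[Callable[[Conversation], None]] = [notify_slack, open_ticket]


def finish_conversation(convo: Conversation, threshold: float = 0.6) -> None:
    if convo.confidence >= threshold:
        print(f"[answered] {convo.id}")
        return
    # Fail loudly: make sure a human sees every conversation we couldn't handle,
    # so the team can answer it and the system can learn from that answer later.
    for channel in ESCALATION_CHANNELS:
        channel(convo)


finish_conversation(Conversation("c-123", "does the API support SSO?", confidence=0.35))
```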
As AI applications grow up, trust is going to be a critical unlock for getting into the enterprise. Trust is unfortunately multidimensional, which makes it difficult to build and easy to lose. While the principles above aren’t going to get you 100% of the way there — it still depends on the details of your application area — we’re confident that they’re a strong baseline for any AI application to build on top of.