We’ve discussed many times on this blog that we think the UX around AI applications is under-discussed but critical for long-term adoption and defensibility. The under-discussed aspect is starting to change: We’ve been hearing more and more people discuss how more people need to discuss the UX of AI applications. We love that everyone agrees with us, but most conversations (including our past posts!) have focused on saying that more needs to be done when it comes to the UX… without actually saying what needs to be done. Classic startup advice!
We’ve been struggling with this lack of specificity, but having grappled with it for a while, we’re starting to formulate a hypothesis: control and visibility are the two main aspects of UX that AI applications should be building towards. Before diving in, it’s worth saying that this is something that we currently don’t do well in RunLLM. It’s something that we’re just now starting to actively work on. Usually, our observations come from having tried things and observing what worked (and what didn’t). In this case, we’ve identified these as key missing pieces in how RunLLM works today — so we thought it would be worth sharing even though we haven’t nailed our solutions yet.
To be precise: by control, we mean the ability to give an AI system fine-grained instructions and trust that they will be executed faithfully; by visibility, we mean the ability to inspect the specific steps an AI system takes to solve a problem.
There are two reasons these principles matter. First, most people are (rightfully) skeptical of the outputs of AI applications. Being able to see what the system is doing along the way helps users understand its “thinking” and trust its outputs more readily. Second, AI systems will inevitably make mistakes, which can quickly erode trust if you’re not careful. Giving users the control to resolve issues and make proactive improvements is critical to maintaining that trust.
Case Study: Devin vs. Cursor
The benefits of control and visibility are best highlighted by the contrast between Cursor, which has become universally beloved, and Devin, where the early takes have been filled with frustration and laments about missed opportunities. Both products aim to make software engineers more productive by automating the tedious parts of the job. The main difference is the level of abstraction: Devin tries to tackle whole large tasks, while Cursor operates at a much finer granularity.
We’re not going to recap all the critiques of Devin here; the previous posts we linked do that in detail. On the surface, you might think Devin gives you visibility into what’s happening: when you log into the Devin console, you can see in real time which files Devin is reading and editing and the current state of its plan. For even moderately complex tasks, however, that can mean planned changes across tens of files, which is more than a person can grasp by skimming the screen. You end up without a full understanding of what’s going to be done, and, more critically, you often can’t tell what Devin missed (see the example about database schema changes in our last post). When Devin makes a mistake, giving it feedback kicks off a very long cycle of iteration and improvement, which reduces the sense of control. The one thing Devin does well is track the feedback you’ve given it and tell you when it’s applying those rules.
By focusing on a much finer granularity of task, Cursor changes this experience completely. By its very nature, Cursor doesn’t encourage you to give it large tasks; instead, you give it bite-sized updates to your codebase, which reduces information overload. It immediately tells you in plain text what changes it’s going to make and highlights which code snippets it’s going to update. You can accept each change individually or all of them at once. Cursor Rules also give you a sense of control, and again, Cursor tells you when it’s applying specific rules.
What this ultimately comes down to is how frustrated you get when each of these systems makes a mistake. With Devin, you can potentially figure out why, but you have to dig through a lot of data, and when you provide feedback, you’re not sure whether it will be addressed. With Cursor, the cycle time is short and the responsiveness is high, which gives users a much stronger sense of visibility and control.
How RunLLM stacks up
As we said above, we are constantly working to better align RunLLM’s UX with these principles. Our biggest gap is visibility. Under the hood, RunLLM does a lot of work (analyzing each question, reading documentation, classifying its relevance, and so on), but that work is totally opaque to both the user asking the question and the customers we serve. Without visibility into that process, it’s difficult for them to understand why we made the mistakes we did. Oftentimes, when we show customers our internal telemetry, they find that a stray piece of documentation was the culprit. RunLLM also processes every conversation for potential insights (topics, documentation issues, use cases), but the connections between conversations and the insights we generate are currently hidden as well.
We do a little better at control: when we get an answer wrong, the user can immediately correct us, and we make sure we don’t make the same mistake twice. This goes a long way toward reducing the immediate frustration of a wrong answer, but there’s a lot more we can do (e.g., giving our customers fine-grained control over which citations to include or which topics to discuss and avoid).
Why control and visibility matter
We’re harsh critics of ourselves, and we think we (along with everyone else) have a long way to go. Even Cursor, with all of its positive traits, could improve by giving users more visibility into and control over how it interacts with larger codebases, and by reducing the chance of Cursor Agent repeatedly making bad changes. That said, we’ve found it very useful to break down our problem space along these two axes, so much so that control and visibility are the two main driving factors behind how we’re thinking about the next revision of our product.
Ultimately, control and visibility are helpful because they frame how a customer manages your product. With the rise of AI work, buying an AI application is like hiring a team member. If you have someone on your team who runs off and does a bunch of work without giving you any visibility into what’s happening, and who doesn’t address your feedback, you’d be pretty frustrated.
On the other hand, if you have a coworker who tells you what’s going on, shows you how they did their work while you’re getting to know them, and addresses your feedback quickly, you’d be pretty happy. We’re still learning, but that’s the direction we’re headed in.