The promise of AI is obviously automation: completing the tedious tasks that humans don’t want to do or speeding up the tasks humans are naturally slower at. When built well, AI systems will both save people time and improve top-line metrics like revenue, customer satisfaction, job satisfaction, and NPS. It’s in this context that, over the last couple of weeks, we’ve been advocating for thinking about the types of work that AI products will do.
While we’re obviously bullish on the prospect of AI automating existing tedious work, there will absolutely be edge cases: knowledge that was never written down, something a person intuited, or an area where the model simply can’t reason over the data it has. In those cases, the AI system will need to fall back to a human.
Products that intend to operate in full autopilot mode without any affordances for falling back to a person will not handle these cases well, which is both dangerous for individual products and bad for the perception of AI in general. When a workflow has known edge cases, it’s imperative that there are well-defined integration points for a human to jump in. Otherwise, enterprises will be hesitant to hand workflows over to AI systems.
Why you should care
For one thing, it’s a critical part of how customers are going to evaluate LLM products. This might change a little over the next few years as we build better discipline in evaluations, but vibes-based evaluations aren’t going anywhere. Asking simple questions might be how a potential customer starts testing the product, but it’s certainly not going to be sufficient, and people will naturally poke at the corner cases to see how the product will respond. If the responses aren’t graceful, it’ll leave a bad taste in everyone’s mouth.
Even more importantly, it’s critical to the success of the system when it’s out in the world. At any reasonable scale, a customer probably isn’t going to monitor every piece of output your product generates; even if it’s only a few hundred pieces of text a week, inspecting every result would defeat the purpose of using the AI system in the first place. That means that failure cases will have to gracefully fall back to a person — if not, issues will accumulate over time and leave an unpleasant surprise for your customers to discover later.
What designing for humans means
So, what does it mean to design an AI product with human colleagues in mind? Determining the right path forward probably requires answering a few questions: (1) how do you detect that an edge case is happening? (2) is the workflow interactive between the AI and the human, or is it a handoff? (3) how can the human get up to speed as fast as possible? (4) how can the AI learn from how the person handled the situation?
This list isn’t meant to be comprehensive, but it’s where we’ve started, and there’s no single right answer to any of these questions. For example, detecting an edge case (e.g., an unanswerable question) is relatively straightforward for RunLLM, and support is a case where a person will likely want to take over when the assistant can’t answer. Providing a summary can help that person get up to speed quickly, but realistically, they’ll end up digging into the full exchange to understand how best to help the customer. Learning matters too, but it requires distilling what the right answer was and whether the assistant previously provided incorrect information.
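To make the shape of this concrete, here’s a minimal sketch of detection, handoff, and learning in Python. The names (CONFIDENCE_THRESHOLD, Handoff, answer_or_escalate) are hypothetical stand-ins for whatever signals and storage your system actually uses, not anyone’s real implementation:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: below this, the assistant declines to answer and
# escalates to a person instead of guessing.
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Handoff:
    conversation_id: str
    detected_reason: str                     # (1) how the edge case was detected
    summary: str                             # (3) gets the person up to speed quickly
    human_resolution: Optional[str] = None   # (4) filled in later so the AI can learn


def answer_or_escalate(question: str, draft_answer: str, confidence: float,
                       conversation_id: str, handoff_log: list) -> Optional[str]:
    """Return the assistant's answer, or None after handing off to a person (2)."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return draft_answer
    handoff_log.append(Handoff(
        conversation_id=conversation_id,
        detected_reason="answer confidence below threshold",
        summary=f"Customer asked: {question!r}; assistant confidence was {confidence:.2f}.",
    ))
    return None


def record_resolution(handoff_log: list, conversation_id: str, resolution: str) -> None:
    """Capture what the person actually did so it can feed back into the assistant."""
    for record in handoff_log:
        if record.conversation_id == conversation_id:
            record.human_resolution = resolution
```

The specific threshold isn’t the point; what matters is that the handoff carries a summary for the person and a slot to record what they did, so the next version of the assistant can get better.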
An AI-powered SDR will detect edge cases similarly, but to keep the conversation flowing, it might interact with its colleagues out-of-band (e.g., on Slack) to get the requisite information before constructing a response to the prospect. Since an AI SDR will handle a large volume of email, the person won’t want to read the whole exchange; they’ll simply want to know what information they need to provide.
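For the SDR case, the out-of-band ask could be as simple as a Slack message that names exactly what’s missing. This is a hypothetical sketch using Slack’s official Python SDK; how the channel is chosen and how the reply gets back into the drafting loop are left out:

```python
import os

from slack_sdk import WebClient  # Slack's official Python SDK

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])


def ask_colleague(channel: str, prospect: str, missing_info: str) -> None:
    """Ask a human for just the missing piece, without making them read the thread."""
    slack.chat_postMessage(
        channel=channel,
        text=(
            f"I'm drafting a reply to {prospect} and I'm missing one thing: {missing_info}. "
            "Reply in this thread and I'll fold it into the email."
        ),
    )
```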
Once you have these answers, it’s a question of finding the right integration points, which sounds boring but is incredibly important. From a product perspective, doing AI-powered work means meeting the rest of the company where it lives today: AI systems will need to send emails, start Slack threads, open Linear tickets, or write Notion documents. In reality, it’ll be a mix of all of these interaction points.
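One way to keep that mix manageable is to hide each tool behind the same small interface, so the escalation logic doesn’t care whether a handoff lands in Slack, Linear, or somewhere else. This is a sketch under that assumption; SlackThread, LinearTicket, and the create_issue wrapper are hypothetical stand-ins, not real client code:

```python
from typing import Protocol


class HandoffChannel(Protocol):
    """Anywhere a human colleague already works: email, Slack, Linear, Notion, etc."""

    def notify(self, title: str, summary: str) -> None: ...


class SlackThread:
    def __init__(self, client, channel: str):
        self.client, self.channel = client, channel

    def notify(self, title: str, summary: str) -> None:
        self.client.chat_postMessage(channel=self.channel, text=f"{title}\n{summary}")


class LinearTicket:
    def __init__(self, linear_api):
        self.linear_api = linear_api  # hypothetical wrapper around Linear's GraphQL API

    def notify(self, title: str, summary: str) -> None:
        self.linear_api.create_issue(title=title, description=summary)


def escalate(channels: list[HandoffChannel], title: str, summary: str) -> None:
    """Fan one handoff out to every tool the customer actually uses."""
    for channel in channels:
        channel.notify(title, summary)
```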
As with the web, it’ll take AI products a few years to figure out the right mechanisms, but in some ways, you don’t need to reinvent the wheel. To figure out where you should be focusing your time, look at the tools your customers are already using. Where do they spend the most time, and how can you save them time? That should at least point your roadmap in the right direction.
Realistically, this still isn’t going to please everyone. While most early adopters in the AI space are excited about improving the technology and providing constructive feedback, there are plenty of people who seem to go out of their way to nitpick every AI system. Given the limitations of the technology today, there might not be much you can do about this, other than shake your fist at the clouds. 🙂
Design and product management always come back to human-centric principles, and while AI changes the details of what a person needs to pay attention to, it doesn’t change much about how people want to interact. We like to joke that every B2B company ends up building a million integrations in one form or another, and for AI work products, that’s going to be truer than ever.