Over the last couple of years, we’ve written a few times about different approaches to AI product development. It’s been about a year since we wrote about Deep AI Work: the idea that domain-specific AI products would have an advantage over horizontal platforms that let you build any agent you could imagine. A year later, we feel like our claim in that post was accurate but actually too weak. Not only do domain-specific products have an advantage, there seems to be an almost linear return in value to increasing specificity. This extends to a level of specificity that we’re surprised to see.
What’s interesting about this is that it seems to apply at multiple levels of abstraction. We’ve heard discussion about whole companies that are focused on building workflows for very specific use cases, but we’ve also found in our own experience that it makes sense to build agents for specific tasks rather than agents that are trying to do many things at once.
Before we dive in, it’s worth saying that this is a positive argument for depth rather than a criticism of breadth. There are plenty of general-purpose agent platforms that are growing quickly, and there’s clearly value there too, but depth is where we’re seeing the most excitement and the most surprising ROI.
The returns to depth
While this is a little bit outside of our area of expertise, we’ve recently heard of multiple agents built to do highly specific enterprise tasks, such as SAP migrations or updates to code written in old-school languages (e.g., FORTRAN). These agents require a significant amount of per-customer customization, since they’re operating on highly calcified legacy systems, but once they work, they command an incredible amount of value (7-8 figure contracts) from their customers. These are the types of tasks that historically would have been consigned to long-term consulting contracts that are both slower than AI systems and exorbitantly expensive (8-9 figure contracts). The opportunity to automate these tasks is hugely valuable. To be honest, if you’d told us 18 months ago that startups would be succeeding with these workflows, we probably wouldn’t have believed you.
This might seem like an extremely esoteric set of examples — how many of us are going to build agents for highly-specific legacy enterprise workflows? What’s interesting is that the same principle has applied to our own experience in building RunLLM. We’re not in the business of SAP or FORTRAN migrations, but as a product principle, we’ve found that building agents tailored to specific tasks is a very productive approach.
Over the last year, as our product has matured and we’ve begun working with more teams (specifically engineering and SRE teams focused on alert triage), we’ve settled on an architecture with a shared platform for knowledge management, monitoring, and agent orchestration, with customizable agents built on top for each use case we enable. For example, we’ve found that there’s value in going more specific than one agent for SRE tasks: we get the best results from having a technical Q&A agent, an alert triage agent, a ticket analytics agent, and so on. Our vision for our product is increasingly a suite of agents, each of which is responsible for one precise task, with all the agents built on a shared knowledge base and orchestration layer.
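To make that shape concrete, here is a minimal sketch of a shared knowledge base and orchestrator with task-specific agents registered on top. This is an illustration only, not RunLLM's actual implementation: every name in it (`KnowledgeBase`, `Agent`, `Orchestrator`, the task labels, and the toy keyword search) is a hypothetical stand-in.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased word set; a toy stand-in for real retrieval."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class KnowledgeBase:
    """Shared store that every agent reads from (hypothetical, in-memory)."""
    documents: List[str] = field(default_factory=list)

    def search(self, query: str) -> List[str]:
        # A document matches if it shares at least one word with the query.
        query_tokens = _tokens(query)
        return [doc for doc in self.documents if query_tokens & _tokens(doc)]


@dataclass
class Agent:
    """One narrowly scoped agent responsible for exactly one task."""
    task: str
    handle: Callable[[str, KnowledgeBase], str]


class Orchestrator:
    """Routes each request to the single agent registered for its task."""

    def __init__(self, kb: KnowledgeBase) -> None:
        self.kb = kb
        self.agents: Dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        self.agents[agent.task] = agent

    def dispatch(self, task: str, request: str) -> str:
        agent = self.agents.get(task)
        if agent is None:
            # No catch-all agent: unknown tasks are rejected outright.
            raise KeyError(f"no agent registered for task {task!r}")
        return agent.handle(request, self.kb)


def answer_question(request: str, kb: KnowledgeBase) -> str:
    hits = kb.search(request)
    return f"qa: found {len(hits)} relevant document(s)"


def triage_alert(request: str, kb: KnowledgeBase) -> str:
    hits = kb.search(request)
    status = "known issue" if hits else "needs investigation"
    return f"triage: {status}"


if __name__ == "__main__":
    kb = KnowledgeBase(["Disk pressure on node pools is resolved by resizing the volume."])
    orchestrator = Orchestrator(kb)
    orchestrator.register(Agent("technical_qa", answer_question))
    orchestrator.register(Agent("alert_triage", triage_alert))
    print(orchestrator.dispatch("alert_triage", "disk pressure alert"))
```

The design choice worth noticing is that adding a new use case means registering one more small agent against the same knowledge base, rather than widening the responsibilities of an existing one.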
Why depth works
The natural question, of course, is why this pattern holds. What are we getting out of narrowly scoped agents that general-purpose platforms aren’t able to accomplish? Hindsight is always 20/20, but we’ll take a bit of credit for having identified the benefits of deep AI work a year back. Here’s what we think:
Depth = expertise. As a general economic principle, tasks that require more expertise are more valuable. The number of people you can hire to do complicated cloud infrastructure management is dramatically lower than the number of people you can hire to summarize meeting notes, so the latter is naturally much cheaper than the former. The same is true in AI: the number of agents you can trust to do complicated tasks is not very high, so they can command a premium. The level of complexity of these products of course means that the engineering investment in building them is higher, and correspondingly, the complexity means that getting to a level of trust where you can adopt the product is also more work. But if you can do it right, the returns are obvious.
Jobs to be done. Agents are everywhere (or at least the marketing is!), and it can be difficult for customers to know what specific problems your agent(s) can solve. Even saying you’re building an AI SDR or SRE doesn’t necessarily tell the customer exactly what you do. By getting into the specifics, you give your customers a clear understanding of what specific tasks you can do — one agent might customize and send email sequences to prospects who have spent time on your website, while another might scour LinkedIn for cold outbound signals. Both are valuable, but they’re different sets of functionality and should be scoped accordingly.
Guardrails are critical. As agents move into more complex tasks, putting the right guardrails on agent behavior matters more and more. Recent research has shown that individual users hate when AI systems say “I don’t know,” but accurate guardrails are critical to the success of an agent in the enterprise. Narrowing the scope of an agent so that it’s focused on a precise task allows the developers to put much clearer guardrails on that agent. For example, our technical Q&A agent has to prove that it has enough data to answer a question before it can start generating an answer. (And that proof process typically has 5-7 steps.) More broadly, an agent with strong guardrails is only allowed to do certain things, so if the requests or requirements break those boundaries, you can safely rule those actions out. That is essential to enterprise trust.
Workflows, workflows, workflows. While it’s fun to talk about the cool things that you can do with fancy AI functionality, there’s just as much — or likely more — value in integrating into the specific workflows that your customers follow. The more integration you can do, the more valuable your product will be and the stickier it will become over time. And while building integrations has gotten easier with AI coding agents, you still can’t boil the ocean. Picking and choosing where you integrate and how deeply you can connect to the relevant systems is hugely valuable. The better these integrations work, the happier your customers will be.
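Going back to the guardrails point above, one way to picture "prove you have enough data before answering" is as a chain of checks that must all pass before any generation happens. The sketch below is a simplified illustration with three checks, not the 5-7 step process our Q&A agent actually uses. The `Evidence` type, the relevance scores, the thresholds, and the `generate` callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Evidence:
    source: str   # where the snippet came from (hypothetical identifier)
    text: str
    score: float  # assumed relevance score in [0, 1]


# A check returns None when it passes, or a human-readable reason when it fails.
Check = Callable[[str, Sequence[Evidence]], Optional[str]]


def has_any_evidence(question: str, evidence: Sequence[Evidence]) -> Optional[str]:
    return None if evidence else "no supporting documents retrieved"


def min_relevance(threshold: float) -> Check:
    def check(question: str, evidence: Sequence[Evidence]) -> Optional[str]:
        best = max((e.score for e in evidence), default=0.0)
        if best < threshold:
            return f"best evidence score {best:.2f} is below {threshold:.2f}"
        return None
    return check


def min_sources(count: int) -> Check:
    def check(question: str, evidence: Sequence[Evidence]) -> Optional[str]:
        distinct = {e.source for e in evidence}
        if len(distinct) < count:
            return f"only {len(distinct)} distinct source(s), need {count}"
        return None
    return check


def guarded_answer(
    question: str,
    evidence: Sequence[Evidence],
    checks: Sequence[Check],
    generate: Callable[[str, Sequence[Evidence]], str],
) -> str:
    """Run every check in order; generate only if all of them pass."""
    for check in checks:
        reason = check(question, evidence)
        if reason is not None:
            # Abstain with an explanation instead of guessing.
            return f"I don't know: {reason}"
    return generate(question, evidence)


if __name__ == "__main__":
    checks = [has_any_evidence, min_relevance(0.6), min_sources(2)]
    evidence = [Evidence("docs/a", "snippet", 0.9), Evidence("docs/b", "snippet", 0.8)]
    print(guarded_answer("How do I resize a volume?", evidence, checks,
                         lambda q, ev: f"answer from {len(ev)} sources"))
```

Because the agent's scope is narrow, each check can be written against what "enough data" means for that one task, which is much harder to define for an agent that is supposed to do everything.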
Why this matters
Does this mean we should all only be buying agents that are focused on building as much depth as possible? Not quite. Everyone reading this probably uses an AI notetaker, and at ~$20/month per person, they all provide a phenomenal amount of value. Imagine hiring a person to follow you around to do the same task! There are undoubtedly valuable businesses to be built in widely applicable, comparatively lower-expertise products. (Interestingly, even our preferred notetaker, Fathom, has started to focus its positioning on going deeper in specific use cases around sales and marketing.)
Nonetheless, it’s clear that more generic products (notetakers, chat-with-X platforms, etc.) are tending towards commoditization, because the barrier to entry for building products like these is quite a bit lower than for some of the applications we described above. In the long run, the products that build deeper, more specific integrations will both be more defensible and earn their customers’ trust in a way that allows them to expand into adjacencies. That’s not the only way to build a scalable business over time, but it’s certainly the one that we’re the most excited about!