This may seem like a silly question to you — it sort of is! But it’s a question we get asked quite a bit nowadays, usually unintentionally and a little more circumspectly. What we actually hear is something more like, “There’s so much innovation going on in foundation models — how does RunLLM differentiate itself?”
We’ve found ourselves struggling to answer this question. Our struggle isn’t because we think there’s no differentiation in what RunLLM is building — it’s because we find that people are overly fixated on what the model itself is rather than how it’s used.
As we’ve said many times, we believe strongly that LLMs are incredibly powerful tools, and we think there’s a lot of value to be added and money to be made at the frontier model layer (though how many winners there will be and who they are is uncertain). But despite the rapid pace of innovation in frontier models, a frontier model isn’t an application — any more than a database is a SaaS application or an engine is a car.
Again, this might sound like a silly distinction to you, but it’s worth emphasizing. In all three cases, the core components (databases, engines, LLMs) are critical to the existence and success of the broader application, but a high-quality (database, engine, LLM) alone doesn’t make a complete application. Each core building block can be used for many different applications, and it’s the full application that ultimately delivers the most value to the end user.
There are a few implications to this framing. First, in the long term, AI applications in aggregate will generate more value than the models will. The models will of course be more valuable than any individual application, but it’s the packaging and integration of those models into usable products or existing workflows (the last mile) across many, many different applications that will dominate. This trend will be amplified by the fact that the frontier model layer isn’t going to be a winner-takes-all market — there will be many different options, and as models get better and better, there will be downward competitive pressure on costs.
Second, it has implications for how we as AI application builders spend our time. The core AI componentry is of course critical. There are good AI applications and bad ones, and much of the distinction comes from how you’re using LLMs. However, it’s also pretty clear that using LLMs well isn’t enough. Depending on what application you’re building, you will have to integrate into the tooling that your customers use day-to-day. AI applications are increasingly becoming synonymous with job titles, so you wouldn’t expect to hire an AI Support Engineer who told you they don’t answer tickets on Zendesk or an AI Software Engineer who doesn’t use GitHub. For example, with RunLLM, we prioritize integrating into all the channels via which businesses communicate with their customers — documentation sites, ticketing systems, Slack, etc. For each one of these surfaces, RunLLM’s support experience is designed to match user expectations.
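To make the integration point concrete, here’s a minimal sketch in Python of how an AI support application might abstract over the surfaces it serves. All of the names here are hypothetical illustrations, not a description of RunLLM’s actual architecture; the idea is simply that each channel delivers answers the way its users expect:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    channel: str


@dataclass
class Answer:
    text: str
    sources: list[str]


class SupportChannel(ABC):
    """One surface where customers ask questions (docs site, Zendesk, Slack, ...)."""

    @abstractmethod
    def receive(self) -> Question:
        """Pull the next incoming question from this channel's API."""

    @abstractmethod
    def respond(self, question: Question, answer: Answer) -> None:
        """Deliver the answer in the form users of this channel expect."""


class SlackChannel(SupportChannel):
    def receive(self) -> Question:
        # A real integration would consume events via the Slack API.
        raise NotImplementedError

    def respond(self, question: Question, answer: Answer) -> None:
        # Slack users expect a conversational, threaded reply with linked sources;
        # a ticketing channel would instead post a formal reply and update status.
        raise NotImplementedError
```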
Similarly, an AI SDR will have to integrate into an organization’s sales tooling, and an AI SRE will have to integrate into observability infrastructure. Once you adopt this job-based framing, it becomes exceedingly obvious that innovation in the model layer — while important — doesn’t change the outlook for applications. No matter how good an LLM is, it’s not going to integrate into your enterprise ecosystem effectively, both to find the right input data at the right time and to simplify workflows. Even more importantly, it becomes clear that each AI application doesn’t fulfill a single task but instead does a suite of tasks. RunLLM answers support tickets but also helps improve documentation and knowledge base entries and also generates insights about customer conversations. Each of those tasks is backed by many different LLM calls, and it’s the combination of system integration and the thoughtful application of those models that matters the most.
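To illustrate what a single task backed by many LLM calls can look like, here’s a rough sketch of one support-answer pipeline. Everything in it (the function names, the prompts, the fallback path) is a hypothetical illustration rather than RunLLM’s internals:

```python
def escalate_to_human(ticket):
    """Hypothetical fallback: hand the ticket to a human agent."""
    raise NotImplementedError


def format_for_channel(channel, text):
    """Hypothetical channel-specific formatter (Slack thread vs. Zendesk reply)."""
    return text


def answer_ticket(ticket, llm, search_index):
    """One user-visible task ("answer this ticket") backed by several LLM calls."""
    # 1. First LLM call: rewrite the raw ticket into focused search queries.
    queries = llm.complete(
        f"Extract search queries from this support ticket:\n{ticket.body}"
    )

    # 2. System integration: retrieve candidate context from indexed docs and KB.
    context = search_index.retrieve(queries, top_k=10)

    # 3. Second LLM call: draft an answer grounded only in the retrieved context.
    draft = llm.complete(
        f"Using only this context:\n{context}\n\nAnswer the question: {ticket.body}"
    )

    # 4. Third LLM call: verify the draft against its sources before it ships.
    verdict = llm.complete(
        f"Does this answer follow from the context? Reply yes or no.\n"
        f"Answer: {draft}\nContext: {context}"
    )
    if not verdict.strip().lower().startswith("yes"):
        return escalate_to_human(ticket)

    # 5. Formatting is channel-specific: a Slack thread reads differently from
    #    a Zendesk reply or a docs-site widget.
    return format_for_channel(ticket.channel, draft)
```

The point of the sketch is the shape, not the specifics: retrieval and channel formatting are system integration, while query rewriting, drafting, and verification are three separate model calls serving one user-visible task.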
Finally, the focus on the use of models themselves is going to change dramatically as the models improve. Paradoxically, better models are going to widen the gulf between good and bad applications. Smarter models of course mean that all applications are going to get better to some extent. But smarter models also mean that the capabilities of top-end AI applications are going to improve dramatically — less well-thought-out applications are going to look mediocre in comparison. Where is that difference going to come from? From the use of data (making sure your application knows what it needs to know) and from the use of models (compound AI systems are here to stay). The better you get at both of those things, the more mature your application is going to look. Ignoring them and relying on smarter models as a panacea means that you’re going to fall behind very quickly.
As always, we’re biased. We’re neck-deep in building an AI application, and we think a lot about what differentiates our product and how we best explain that to customers and investors. It’s an exercise we go through both in live conversations and through assets like our website. Even accounting for that bias, we’re very confident that o3 or o5 or any foreseeable model innovation isn’t going to suddenly remove the need for AI applications. That doesn’t mean that all applications are created equal: depending on how you use the tools at your disposal, you will see different results. But at the end of the day, the chassis of your car and the integrations in your AI application matter as much as the engine.