Pricing any product is hard, and we’re not experts by any stretch. We’re still in the early days of figuring out how to price RunLLM, but it’s something we’ve been thinking about a lot recently. We thought it would be useful to share how our thinking has evolved as we’ve spent more time with customers and how the dynamics of AI products change pricing principles.
In one sentence, the lesson we’ve learned is that seat-based pricing is a poor fit for AI tools, which should instead price by work done. Depending on your vantage point, this may be blindingly obvious or a fiery hot take, but either way, it’s by no means a new idea.
Historically, most work productivity tools derived their value from having your whole company integrated. Slack, Linear, Notion, and the like are useful as team collaboration tools, and the larger your team gets, the more product usage there will be. Usage might not scale linearly with each new team member, but there’s a clear correlation. Even if you don’t create more tasks or write more docs, giving every additional team member access to the product provides an obvious benefit.
AI-based productivity tools don’t follow the same pattern. Take a simple example like email — the volume of email we get (as the CEO at RunLLM and as a professor at UC Berkeley) is very different from the volume one of our engineers at RunLLM or a typical grad student gets. For an AI-based email responder to charge based on seats would be silly — there’s a lot more work being done if you’re writing 100 emails a day than if you’re writing 2.
That’s what makes something like Notion AI so frustrating. It feels silly to pay $10 per person per month when some people on the team, by virtue of their jobs, spend so much more time writing than others. (That and the fact that we don’t find Notion AI very useful...) This is different from Notion’s general pricing, because access to the company’s documents is valuable in and of itself, even if the user doesn’t spend much time writing. With AI, the value is in the automation that it’s providing — the work that it’s doing — and you want to pay accordingly.
That’s the general principle behind work-based pricing, and it’s becoming increasingly common among AI products. RunLLM prices based on the number of questions (substantively) answered. AI-based SDRs price based on meetings booked. Model providers themselves, of course, price based on tokens generated.
Consumption-based pricing isn’t new
What we’re calling work-based pricing is just a form of consumption-based pricing, which has been around since the beginning of the cloud for software (and, before that, since around 3000 BCE to price water used for irrigation in early Mesopotamia). Almost every service you use from the AWSs and GCPs of the world bills based on the number of compute-seconds (or hours) rented and the amount of data stored. As serverless systems have become more popular, pricing has become even more granular: you only pay for the resources you actually consume (as opposed to the resources that you rent).
Enterprises have historically been averse to purely consumption-based pricing, however, because it’s unpredictable and difficult to budget for. In fact, when cloud functions were first becoming popular, one of the major cloud providers told us that the biggest blocker to adoption was that they weren’t yet set up to give enterprises fixed-usage contracts for cloud functions — since they were relying on purely consumption-based billing, enterprises were unwilling to adopt the infrastructure.
Consumption-based pricing, especially at the infrastructure level, has also historically been difficult to implement. Counting the number of seconds a function runs may be relatively straightforward, but you need to track it reliably (through failures) across thousands of servers and hundreds of data centers. You’re also eating the cost of service spin-up and tear-down times, and those costs occur more frequently as you switch between different customers’ workloads. As such, until relatively recently, consumption-based pricing has mostly lived at the infrastructure layer — where teams have the expertise to solve these problems — while everyone else has stuck with seat-based pricing.
Why it’s the right fit for AI
AI products should be treated differently. We gave a quick summary above explaining why work-based pricing is valuable in the context of AI products, but it’s worth dwelling on.
Well-built AI products will be able to accelerate companies’ productivity by generating high-quality, human-equivalent results in a fraction of the time and for a fraction of the cost. What that means is that the tedious things people don’t like doing — responding to the 100th “Hello, world” support ticket, answering repetitive emails, finding high-quality sales leads — will be done by AI. The more tedious things that get automated, the more time is being freed up for people to do high-value work.
That has two consequences. First, you’re no longer paying for the number of people with access but instead for the amount of work done. Giving another team member visibility into what your AI SDR is doing may be convenient, but it doesn’t change the output of the product. Second, you can realistically incorporate whether the work was successfully done into your pricing model. AI agents (just like humans!) will sometimes get things wrong, and you can take that into account when determining how much value the product has added.
To use an example we know well, support teams measure their productivity based on the number of tickets handled and how quickly they are handled; RunLLM does that work, cheaper and often more accurately, so we charge based on the number of questions we answer. But if we get an irrelevant question, like “Who won the Napoleonic wars?” or if we can’t find the data to answer a relevant question, we’re not adding any value, so we don’t charge our customers.
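To make that concrete, here’s a minimal sketch of what metering “work done” might look like, assuming a hypothetical outcome label attached to each question. The names and categories below are our invention for illustration, not RunLLM’s actual implementation:

```python
from enum import Enum

class Outcome(Enum):
    # Hypothetical outcome labels, for illustration only.
    ANSWERED = "answered"      # relevant question, grounded answer delivered
    IRRELEVANT = "irrelevant"  # e.g., "Who won the Napoleonic wars?"
    NO_DATA = "no_data"        # relevant question, but no source data found

def billable_units(outcomes: list[Outcome]) -> int:
    """Count only substantively answered questions toward the bill;
    irrelevant or unanswerable questions added no value, so they're free."""
    return sum(1 for o in outcomes if o is Outcome.ANSWERED)

# A month with 3 answered questions, 1 irrelevant, 1 unanswerable:
usage = [Outcome.ANSWERED, Outcome.ANSWERED, Outcome.IRRELEVANT,
         Outcome.NO_DATA, Outcome.ANSWERED]
print(billable_units(usage))  # 3
```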
The more we talk about it, the more this sounds like the way a consulting business works. Generally, that’s a good thing. Unlike traditional software, AI can generate coherent end-to-end work, which is exactly what you’d expect a consultancy to do. If you buy into the hype around agents, then you’ll probably believe this means you’ll have an army of AI agents running around doing all your busy work for you. Whether or not that’s the case in every domain remains to be seen, but for the ones where AI’s shown the ability to work well — support, sales, documentation, etc. — it’s already reality.
Challenges in work-based pricing
Switching to work-based pricing is of course not a panacea. Charging customers based on the work that’s been done by your product introduces edge cases that a straightforward seat-based pricing model avoids.
The most obvious one is capturing what “work done” means. Consider an AI-based SDR service. Do you charge based on meetings booked, meetings held, or meetings converted? You can find consultancies that do all three, where charging for more success means a higher cost per-unit. There’s no right answer here, but the major challenge is that buyers are more skeptical of AI today. A mistake a human makes is easily remedied: “Sorry, we’ll make sure it doesn’t happen again!” A mistake an AI makes is examined more closely, and we’ve found that we have to go out of our way to convince folks that AIs can learn the same way (sometimes faster, even!) that humans do. At scale, these agents will have to operate autonomously, so you’ll need to earn customers’ trust that your product will behave the way it’s supposed to.
As we mentioned above, dealing with enterprise budgets for consumption-based pricing can also be tricky, but this is an easier problem to solve. The general principle we’ve seen — and what we do at RunLLM — is a tiered, usage-based model. The customer pays a certain amount up-front for their expected usage and then pays per-unit for any overage. This is, again, a pretty standard model for how usage-based pricing has been done historically, so there’s no secret sauce here.
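As a concrete (and entirely hypothetical) example of how that tiered model turns usage into an invoice, here’s a short sketch; the tier values and rates are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    base_fee: float      # fixed up-front commitment, in dollars
    included_units: int  # units of work (e.g., questions answered) covered
    overage_rate: float  # per-unit price beyond the included block

def monthly_invoice(tier: Tier, units_used: int) -> float:
    """The base fee is owed regardless of usage; only consumption
    beyond the included block is billed per-unit as overage."""
    overage = max(0, units_used - tier.included_units)
    return tier.base_fee + overage * tier.overage_rate

# Illustrative numbers only -- not real prices.
starter = Tier(base_fee=1000.0, included_units=500, overage_rate=2.50)
print(monthly_invoice(starter, 450))  # 1000.0 -- under the commitment
print(monthly_invoice(starter, 700))  # 1500.0 -- 200 units of overage
```

This keeps the budget predictable for the enterprise (the base fee is a known floor) while still tying the marginal dollar to the marginal unit of work.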
The most interesting challenge is understanding what the work is actually worth to users. The cloud infrastructure examples we shared above are relatively low-margin businesses that operate at ridiculously high volume. The price you pay for the marginal second of a GPU is some function of what that GPU costs. Delivering work, on the other hand, is a high-margin, low-volume business. That means prices per-unit of work are going to be much higher, and businesses might have some early sticker shock. That said, we believe pricing based on the value of work — roughly correlated with what you’d pay a person to do it — is the right direction.
Again, there are no right answers here, but we’re already seeing customer expectations change as they come to understand that they’re paying not for compute cycles but for high-quality work. Nonetheless, there’s a long way to go in getting this message out into the market.
Exceptions to the rule
There are always exceptions to every rule. In this case, the most glaring exceptions are the two products that set off the generative AI revolution: ChatGPT and GitHub Copilot. Both charge a fixed seat-based price rather than pricing based on usage.
There are two reasons this works. First, predicting usage of both these tools is extremely difficult, so pricing based on something like tokens used would create a negative incentive — you don’t know ahead of time how much you’ll pay, so you might worry about running up your costs if you’re not careful. More importantly, quantifying the “work” done in this case is even more difficult than in the cases we described above. How does ChatGPT measure whether it’s completed your task, or how does GitHub Copilot determine whether its autocomplete suggestion was valuable? In both cases, you’re at the whim of the user’s feedback, which is noisy and can be gamed.
Generalizing from these two examples, the relatively low cost and generic nature of the tasks in both cases means that seat-based pricing will likely continue to work in the short run. We’ll likely see the same trend for other general-purpose products, but as the market matures, we’d expect a product like Copilot in particular to move towards completing tasks more holistically and charging for the work done accordingly.
As much as the AI market has changed in the last 18 months, we’re still incredibly early. Every truly AI-native business is still working to understand its customer dynamics, which means that everything from pricing to margins to volume-based discounts is still being figured out. We’re in the same boat as everyone else here, and we certainly don’t think we have the answers, but we’ve been thinking about this topic a lot in recent weeks because we’re in the process of redoing our pricing model for RunLLM.
There’s a whole other sub-topic here around implementing usage-based billing and whether it’s worth paying for the many services that have popped up to automate this process. We haven’t yet formulated our opinions — and it’s not particularly specific to AI products — so we’ll put a pin in that for now.
Whatever the concrete mechanics end up being, we’re convinced that work-based pricing is the direction that AI is headed, certainly for enterprises and maybe for consumer technology as well. Perhaps AI is the breakthrough we needed for ubiquitous micro-transactions on the internet?