The potential end of LLM scaling laws has been a hot topic. While there’s plenty of technical discussion, what everyone is really wondering is why GPT-5 hasn’t yet been released and what in the world Anthropic is doing with the naming scheme for the latest Claude releases. Given the rapid rate of LLM innovation we saw over the last couple of years, these are valid questions: why has progress seemingly slowed so dramatically in recent months? We’ll leave the technical prognostication to others, but we believe it doesn’t matter whether LLMs get better.
Let’s imagine a world for a second where LLM technology was frozen in time. Whether it’s for technical reasons or regulatory reasons or something else altogether, let’s pretend that we know for a fact that we will never get an LLM that’s better than the current state of the art (GPT-4o, o1 preview, and Claude 3.5). It’s painfully obvious that in this world there is a huge amount of progress to be made simply from building more applications with the existing models and increasing distribution and adoption. What that means is that AI adoption isn’t reliant on having better LLMs, regardless of how quickly models improve. It’s determined more by the ability of AI applications to scale distribution and adoption than by improvements in the LLMs themselves.
The main reason this is the case is that quality is not the primary barrier to adoption. The two main limits on adoption today are the lack of high-quality applications to automate tasks and the lack of willingness to adopt high-quality applications when they do exist.
In the first case, there are tons of rote tasks that each of us does today that can and should be automated. The lack of adoption here is primarily because high-quality applications in critical domains don’t exist yet. Plenty of tools have tried to build automated email responders or meeting schedulers; none of them are good enough yet (sorry, Superhuman). In some cases, the technology exists but requires significant human handholding to be successful.
In the second category, we frequently see organizational resistance to adopting high-quality AI solutions. One common concern is a fear of automation and job loss, but as we’ve discussed previously, we don’t think job loss is a major concern. In other cases, there’s an insistence on building solutions in-house, which brings us to the classic build-vs-buy challenge that enterprise software companies face. We’re biased, but we generally believe that build decisions can often be wild goose chases that delay time-to-value, though they can of course turn out to be incredibly valuable in some circumstances. Whatever the outcome of the build-vs-buy debate may be, it’s clear that the existing technology and applications can provide huge productivity wins without any groundbreaking innovation.
This point is reinforced by the fact that many of the most successful AI products don’t rely on AI innovation alone. As with any product, UX is probably the most important thing when it comes to adoption of AI. Our favorite example is Cursor, which has gotten an incredible amount of attention and adoption — and is probably one of the most beloved AI products on the market today. The fascinating thing about Cursor is that there is relatively little AI innovation in the product: The AI features in Cursor are primarily wrappers around existing LLMs like GPT-4o and Claude 3.5. What Cursor has done incredibly well is build a seamless UX that allows developers to ask for AI assistance without breaking out of their workflow. There’s less technical depth to this than the custom models that back GitHub Copilot, but the UX is what has enabled Cursor to deliver a better experience.
While an IDE is a relatively narrow example, that same principle can be applied to a variety of other applications. Taking the existing technology and applying it in a clever way that’s natively designed to surface the power of LLMs is possible in a variety of domains — and can lead to huge productivity wins. By way of contrast, there are countless examples of products that have tried to bolt an AI-powered UX on as an afterthought to an existing interface; our least favorite is probably Notion, where we’ve almost never intentionally activated Notion AI, but the interface pops up every few seconds. In a tool designed for writing text, the power of existing LLMs could be integrated more effectively, but the poorly designed UX has hindered adoption.
If — and it’s still a big if! — an application is thoughtfully designed for its domain, the wins with the existing technology are blindingly obvious. We’ll focus on RunLLM since we’re obviously very familiar with our customers. We’ve found time and time again that whatever a prospect’s expectations are when they come to us, they are consistently pleasantly surprised by how quickly they’re able to make themselves productive. In our case, the wins are saved time, increased adoption, and generated insights from customer conversations. We’ve now gathered enough data across different types of companies to be confident that the existing technology will make every company doing technical support more productive. Having talked to founders working on other applications, it’s clear that this isn’t a one-off example.
All of this is to say that simply applying the existing technology more broadly and increasing distribution is enough for us to see huge wins in productivity from AI. We firmly believe that distribution and adoption matter more than improved technology. If and when the technology does improve, it won’t solve distribution and adoption by magic; the same work will need to be done independent of how much LLMs improve.
None of this is to say that LLMs won’t get better. In fact, we think they probably will — even if it isn’t at the same rate that they have been improving for the last few years. Improved models will enable new functionality and even unlock certain use cases where existing models aren’t able to meet user requirements.
We’re not here to forecast whether and when those improvements will happen, and frankly, we don’t think it’s a very useful exercise. We know beyond a shadow of a doubt that there’s plenty of time and money being poured into making improvements happen sooner. If you’re not working on one of those problems, it’s probably a better use of your time in the interim to be thinking about how to improve your UX and distribution rather than guessing when GPT-5 will come out. And if you’re worried about the AI bubble bursting, building applications that deliver concrete value is almost certainly more likely to mitigate the trough of disillusionment than simply having shinier toys to play with.