A couple notes:
At RunLLM, we’re building out Support Engineers for popular open-source projects. You can find the beta of the tool here — we’d love to hear your feedback. Feel free to reach out or respond directly to this email.
We’ll be off next week around the 4th of July holiday. We’ll have a short post on supporting open-source communities, and we’ll be back the week after!
If you’re in the bubble we live in (and you probably are if you’re reading this blog), you probably spend a lot of time thinking about where AI can be applied more broadly and what the existing limitations are. But if you zoom out, we’re now almost 3 years into the post-ChatGPT era (3 PGPT, if this were an episode of Andor), and plenty of people less invested in AI than us seem to have a sense that AI applications should have changed everything about how we operate on a daily basis. And yet, the technology world largely seems to be the same as it was way back in 2022 or even 2019. Why is that?
The short answer is specificity. We’re lacking specificity at every level of this conversation, but three things in particular stand out to us: (1) “changing everything” doesn’t really mean anything; (2) specificity is required for success, and we haven’t yet figured out the best UX to maximize specificity; and (3) narrowly-scoped (i.e., specific) applications are already succeeding.
What should actually change?
We acknowledge that “why hasn’t AI changed everything” is a bit of a strawman. Most people aren’t expecting the singularity to have happened or for us to already be living in a Star Trek-like utopia. What we find in talking to friends and family who don’t spend all their time on AI, however, is that their expectations aren’t clearly defined either. The feeling is more that AI hasn’t really changed anything. Many of them use ChatGPT casually, and while it’s a fun novelty to get custom answers, that doesn’t really increase GDP or lead to scientific breakthroughs.
The immediate promise is of course reducing the amount of manual work that we do, which allows us to focus on higher-value things. That type of micro-improvement probably doesn’t lead to any major changes in our lifestyles, but it does have a cumulative impact. A simple example is the podcast interview we refer to below from Dario Amodei — we knew what he said but couldn’t find the reference, so we set o3 to run in the background while we wrote other parts of this post. It saved us 10 minutes of Googling and reading different interview transcripts, which allowed us to write this post faster (which meant there was time to squeeze in a workout!).
Longer term, we will of course likely see bigger changes as we automate more and more tedious tasks in our daily lives, but there’s going to be a frog-in-boiling-water effect. Three years ago, the fact that a text-in, text-out model could answer any question you posed to it was novel. Today, we’re annoyed if o3 doesn’t find the exact thing we were looking for. Those improvements have been the result of countless incremental steps, and the same will be true for the next few years. In other words, things have changed faster than we realized, and while not everything about our lives is different, LLMs have enabled a huge amount of time savings if nothing else.
Specific inputs lead to useful outputs
The ads that we see on TV for AI agents and assistants often start with prompts that are functionally useless: “Find me an Airbnb in New York!” When are you going? How many people are going? How much can you spend on it? Do you need to stay anywhere particular in New York? Are there any restrictions you have? Somehow — mostly via the magic of video editing — you get exactly the result you were looking for.
In reality, you need to give anyone (whether human or AI!) who’s helping you with a task sufficient instructions to be helpful. Giving a human the instructions above would lead to an equally useless result. Because of how the internet has evolved, we’ve been trained to send short snippets of text and expect useful responses rather than fleshing out our thoughts, but the best results from LLMs often come when you provide them the most context. (On a related note, we often start by prompting an LLM to ask us all the questions that might be helpful to solve a problem before we actually ask it to solve it. The more specific the better.)
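As a rough illustration, here’s a minimal sketch of that two-step pattern using the OpenAI Python SDK. The model name and prompt wording are stand-ins of our own, not specific recommendations:

```python
# Step 1: have the model ask the questions; step 2: answer them and
# attach everything as context before asking for the actual result.
from openai import OpenAI

client = OpenAI()
task = "Help me plan a group trip to New York."

questions = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Before solving this task, list every question you'd "
                   f"need answered to do it well. Task: {task}",
    }],
).choices[0].message.content
print(questions)

# You answer the questions; the specifics become part of the prompt.
context = input("Your answers: ")
result = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Task: {task}\nContext: {context}\nNow solve the task.",
    }],
).choices[0].message.content
print(result)
```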
A major missing piece here has nothing to do with AI: We haven’t figured out how to encourage people to provide sufficient context. There are some silly hacks, like Deep Research always asking a set of questions before proceeding, but this can be annoying too because if you provide enough specificity up front, the questions feel manufactured. Voice inputs (with text outputs) will help here as well, as it’s easier for us to talk out loud than to type in detail. But generally, there’s more for us to do to make the UX better: Don’t rely only on text areas for input, use context clues to infer helpful details, and surface assumptions that users can then correct (a sketch of that last idea is below). When we collectively figure out the best UX to encourage specificity, that will make existing models dramatically more useful.
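To make “surface assumptions” concrete, here’s a hedged sketch of that idea, again with the OpenAI SDK; the JSON prompt and the specific assumption fields are our own invention:

```python
import json
from openai import OpenAI

client = OpenAI()
request = "Find me an Airbnb in New York!"

# Ask the model to state what it would otherwise silently assume.
raw = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Return a JSON object mapping each detail you'd have to "
                   "assume about this request (dates, party size, budget, "
                   f"neighborhood) to your best guess. Request: {request}",
    }],
).choices[0].message.content

# Show each guess to the user and let them override it before acting.
assumptions = json.loads(raw)
for detail, guess in assumptions.items():
    correction = input(f"{detail} (assumed: {guess}), correct if wrong: ")
    if correction:
        assumptions[detail] = correction
```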
Narrowly-scoped applications work
Despite the general feeling we’re arguing against, we’re also firmly convinced there are many areas where AI applications are actually providing a lot of value. The theme among all of them is that they are narrowly scoped to solving a specific problem rather than trying to boil the ocean (i.e., a general agent to do tasks for you).
The contrast is obvious to us given our perch. Just this week, we did a routine check-in with a customer who told us that RunLLM changed the way they work. They were previously considering leaving their job because they spent all their time answering questions rather than working to make customers successful with their product. Using RunLLM as their first line of defense freed up hours per day and significantly increased their job satisfaction.
On the other end of the spectrum, we tried Manus recently to do research to find an Airbnb for a group trip (with the right amount of detail, for the record!), and the results were useless. It gave us a PDF with a list of “House in [city]: $NNN per night” without any links, details about the places, or explanation of why it chose each one. This isn’t a criticism of Manus — the right application here would be built to surface options in the same way that Airbnb does but with custom filters generated via a combination of boolean filters and model evaluation. Unfortunately, Manus isn’t built to be this narrowly scoped.
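For what it’s worth, here’s a hedged sketch of what that filter-then-rank shape could look like; the Listing type and the scoring heuristic are hypothetical stand-ins (a real version would call a model instead), not anything from Airbnb’s or Manus’s actual systems:

```python
from dataclasses import dataclass

@dataclass
class Listing:
    url: str
    city: str
    nightly_price: int
    sleeps: int
    description: str

def boolean_filter(listings, city, max_price, party_size):
    # Hard constraints first: cheap, deterministic, and easy to debug.
    return [
        l for l in listings
        if l.city == city and l.nightly_price <= max_price and l.sleeps >= party_size
    ]

def score(listing, preferences):
    # Stand-in for a model call that rates how well free-text preferences
    # ("walkable, quiet street, near good coffee") match the description.
    # A crude word-overlap heuristic keeps the sketch self-contained.
    wanted = set(preferences.lower().split())
    return len(wanted & set(listing.description.lower().split()))

def shortlist(listings, city, max_price, party_size, preferences, k=5):
    candidates = boolean_filter(listings, city, max_price, party_size)
    ranked = sorted(candidates, key=lambda l: score(l, preferences), reverse=True)
    return ranked[:k]  # every result keeps its URL, unlike a flat PDF
```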
One common version of the overarching criticism is summed up by a question Lex Fridman asked Dario Amodei: If LLMs have been trained on the sum total of human knowledge, how come they haven’t invented anything new yet? The answer Amodei gives is illustrative of what’s missing: Operating in a sandbox limits a system’s ability to experiment and iterate, which in turn means that an LLM in isolation can’t generate new results. An agent trained to do this would be integrated into the right infrastructure to run experiments, evaluate results, and update hypotheses. By definition, this would require a narrow scope per domain.
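That loop is easy to describe even if each step is hard to build. A toy sketch, where every hook is a hypothetical domain-specific function, which is exactly why the scope has to be narrow:

```python
# Each hook (propose, run, evaluate, revise) is domain-specific: wet-lab
# robotics, a simulator, a CI system. None of this fits in a sandbox.
def research_loop(hypothesis, propose, run, evaluate, revise, max_rounds=10):
    for _ in range(max_rounds):
        experiment = propose(hypothesis)      # LLM drafts the next test
        results = run(experiment)             # real infrastructure executes it
        if evaluate(hypothesis, results):     # did the data hold up?
            return hypothesis
        hypothesis = revise(hypothesis, results)  # update and go again
    return hypothesis
```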
As with any hype cycle, there’s some substance and some snake oil. We’ll be the first to tell you that there’s plenty of nonsense out on the market today that isn’t adding any value and is probably a waste of your time.
The reality with AI is that there are plenty of small things that have already been improved that we’ve become inured to — success by a thousand paper cuts. The bigger advancements have come in narrow (and usually, less sexy) areas like customer support, finance, and security. As more and more thoughtful applications are built, we’ll see more and more examples of this kind of success, but building good applications takes time.
This is not all that different from Sam Altman’s claim that AGI will actually be a set of hyper-competent agents that behave like virtual coworkers or employees. Whether you want to call it AGI, AI changing everything, or just the natural evolution of technology, the outcome is the same. The promise of AI hasn’t gone anywhere, but the fun part is that we’ll probably uncover it bit by bit rather than all at once.