AI news roundup: You see what you want to see
Rorschach tests, enterprise vs. consumer tech, hype cycles, and more
Every major AI announcement — like last week’s news about GPT-4o and the updates to Gemini 1.5 — is a Rorschach test. Everyone sees in it what they want to see — everything from the doom of all AI startups to the advent of the singularity. As usual, we’re somewhere in between. The general sentiment in our conversations after the latest round of updates has been excitement about the possibilities and — of course — frustration with the controversies.
We wouldn’t be a proper AI blog in 2024 if we didn’t have an opinion, so we’ll start there.
From a technology perspective, the changes are genuinely impressive. Seamless multimodality is not something we would’ve expected to be possible even a year ago, and shipping that while cutting cost and latency in half is a pretty huge improvement. It’s particularly impressive given that it’s the second time they’ve cut costs in six months. Whether OpenAI is eating costs to gain market share remains an open question, but the performance gains and the consistency of the cuts make us think there’s substance there.
Gemini’s improvements, unsurprisingly, followed a similar path. What’s obvious from Google’s announcements is that their control over the underlying operating system gives them a distinct advantage in building cohesive applications. Asking text-based questions of your photos, for example, is something OpenAI has the capability to do but lacks the integration points to build effectively.
For both companies, these changes show a ton of innovation in the underlying technology and in research teams’ ability to control how models behave. They also seem to imply that OpenAI’s lead is shrinking — at least relative to Google.
OpenAI’s voice controversy is of course also top of mind. To us, it seems like an unnecessary own goal. Again, the improvements they made are genuinely impressive! If they had packaged those changes in a less grandiose way — without trying to recreate Her (by their own admission) — they could’ve avoided much of this blowback. It’s perhaps expected, given their mission, that they’re trying to imbue the models with their own vision of humanity, but letting the technology be technology would’ve been a better bet. In our opinion, it’s the tone and style of the demo that was really off-putting.
We’ve gotten plenty of questions in the last week about whether we’re worried about GPT-4o and how we plan to establish our defensibility in light of model advancements like these. These questions were a little confusing for us at first — why would OpenAI adding native multimodality change a model’s ability to answer detailed technical questions about products it’s never been exposed to?
We’ve learned two things from these conversations. First, the hype around AI has fully made it into the general zeitgeist — we’re getting questions about AI from relatives who’ve never thought about it before. Second, “AI news” in general is far too broad a category, and when most people see impressive consumer news, they assume it applies to enterprises as well.
Our (perhaps hot?) take is that GPT-4o will have a relatively small impact on most B2B AI applications. Reduced cost and improved latency are valuable, but they don’t change the types of interactions customers are having. The analogy that comes to mind is Moore’s law — chips getting faster and more efficient didn’t kill software innovation but in fact helped accelerate it.
Enterprises might benefit from having easy access to non-text modalities, but the real benefit for most enterprises comes from access to private and proprietary data. These latest model updates have no impact on those types of applications. Realistically, you’re probably not going to be having a detailed conversation with Sky¹ about your Kubernetes cluster or your sales pipeline anytime soon.
The consumer world is different; we’re no experts on building consumer tech, but we can imagine a ton of use cases that will immediately benefit from seamless multimodality. Personally, we find speaking to generate inputs and reading (or skimming) to consume results to be a particularly powerful combination. The obvious question is whether tools like Rabbit and Humane have real differentiation anymore or whether they’re commodified GPT-wrapper boxes. (Especially if OpenAI makes its own devices! The latest rumors are that Humane is looking for a buyer.)
Our last observation is about the AI market at large. Just when we thought the hype couldn’t get stronger, things have accelerated in the last couple of months. Between these announcements, Llama 3, the latest flurry of crazy funding rounds, and ongoing investment activity, it’s safe to say that we’re back. Theoretically, the hype will die down (or the bubble will burst) at some point, but there are no signs that it’ll be any time soon.
Even still, there are more open questions than ever in our minds, for both the businesses and the technology.
At the bottom of the stack, we’re curious about things like the investment (time and money) and advancements required to make these changes real. Was support for multimodality a year of work or 3 months? Will these techniques keep scaling, or are we starting to see asymptotic behavior (evidenced, perhaps, by clustering of LMSys Elo scores)? Should we read into the fact that multiple models have started to catch up to GPT-4 but none, including OpenAI’s own efforts, have really surpassed it in the last 14 months? Will there be a GPU bust as availability increases?
Moving up the stack, it’s not clear how these changes affect AI application builders. Will they commodify the core models, or will they further differentiate offerings? Will individual models become more capable (not yet!), or will high-quality applications still be compositions of multiple models? Will there be more opportunities for model specialization? (We think so!)
Across the market, we’re starting to see more focus on quality, but most teams still don’t know what that means — do you have an evaluation framework, or do you rely on spot checks? What is good enough? How do startups scale in a fashion that ensures defensibility?
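For teams asking that first question, here’s a minimal sketch of what the difference looks like in practice. Everything in it is hypothetical: the answer_question call stands in for whatever model or pipeline you’re testing, and the keyword rubric is a placeholder for a real grading scheme, not any particular framework’s API.

```python
# Minimal evaluation-harness sketch (hypothetical; not tied to any specific framework).
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    must_mention: list[str]  # simple keyword rubric standing in for a real grader


def answer_question(question: str) -> str:
    # Placeholder for a call to the model or application pipeline under test.
    return "Stub answer that mentions latency and cost."


def grade(case: EvalCase, answer: str) -> float:
    # Fraction of required keywords that appear in the answer.
    hits = sum(kw.lower() in answer.lower() for kw in case.must_mention)
    return hits / len(case.must_mention)


def run_eval(cases: list[EvalCase]) -> float:
    # Average score across the whole suite, run on every change instead of ad hoc.
    return sum(grade(c, answer_question(c.question)) for c in cases) / len(cases)


if __name__ == "__main__":
    suite = [
        EvalCase("What changed in the latest model release?", ["latency", "cost"]),
        EvalCase("Does this affect our private-data integrations?", ["private", "data"]),
    ]
    print(f"Average rubric score: {run_eval(suite):.2f}")
```

The point isn’t the scoring logic; it’s that the suite is versioned, repeatable, and run on every change — that’s what separates an evaluation framework from one-off spot checks.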
Regardless of what you think the answers are, the technology’s changing fast, and we’re probably in for a crazy second half of 2024. Adaptability is key. Make sure you’re prepared for whatever comes down the road — chances are it’s not what you expect.
¹ The AI voice allegedly inspired by Scarlett Johansson, not Joey’s research group. Incidentally, Joey’s research group, the Sky Computing lab, is working on how to deploy Kubernetes across clouds — but there’s no voice assistant for that (yet).