We’re roughly halfway through the year and in the doldrums of a relatively slow summer news cycle — at least as far as AI goes. We thought it’d be a good time to check in on our predictions from January. The first half of the post is a summary of the broader themes and of how our thinking’s changed. There’s a more detailed review, prediction-by-prediction, down below.
We were right about the staying power of commercial LLMs. One of the main themes in our predictions was our confidence that the lead the commercial, proprietary model builders had was too large to be quickly surmounted. Perhaps that was low-hanging fruit in retrospect, given the timescale on which these models were built. Nonetheless, there’s never really been a serious challenger to the state-of-the-art models from OpenAI, Anthropic, and Google. The former two have been chasing each other with this year’s model updates, while Google’s been making steady progress. All signs point to those trends continuing.
We very much underestimated open LLMs. We were relatively bearish on open-source LLMs. We thought there would be investment but relatively little progress towards catching the proprietary models. Broadly, this was wrong. Llama 3 is really good, and we use it in our production inference pipeline at RunLLM. We’ve written about this at length previously, but tier 1 LLM quality seems to have hit an asymptote, and the open-source models are catching up quickly. Whether or not they end up fully catching the proprietary models doesn’t seem to matter at this point, as open LLMs offer a very attractive price/performance trade-off as-is.
We were right about hype but not always right about where. We had a lot of predictions that reflected the fact that there would be serious hype around LLMs: the GPTs App Store, funding rounds, and large model releases. There’s been plenty of funding (even more than we thought there would be) and too many model releases to count, but some of the hype we latched on to didn’t materialize. Most notably, the GPTs App Store seems to have been a complete flop, and our predictions about reduced costs were directionally correct but perhaps overly eager. There’s obviously still time, but it feels like the hype will trend downwards if anything.
We weren’t thinking enough about regulation. What seems like a glaring omission in retrospect is that we didn’t discuss regulation at all. In an early draft of the original predictions post, we had something about the Biden administration taking no further action on AI in 2024; we don’t remember exactly why we cut it, but the political pressures seem to have played out the way we expected. On the other hand, SB 1047 in California is both strongly opinionated and quite worrying, and in hindsight it seems obvious that a state like California would step in to fill the void sooner or later. The regulatory battle is going to be an important narrative to keep an eye on for the rest of the year.
Now onto the actual predictions!
OpenAI and commercial LLMs
OpenAI will not release GPT-5. 50% So far so good! We’ll see if and when this changes, but we’ve heard relatively little about GPT-5 timelines this year.
GPT-4 per-token costs will come down by at least 5x in 2024. 70% 5x might have been optimistic, but GPT-4o is already 2x cheaper, so 5x is not out of reach.
GPT-4 (or GPT-5 if released) will be at the top of the LMSys Leaderboard at the end of 2024. 90% We didn’t quite account for incremental model releases like 4o, but this has held true.
Amazon and Google combined will have less enterprise LLM usage than OpenAI. 70% This is a difficult prediction to evaluate. There’s no hard data, but AWS doesn’t appear to have made significant headway, and Google is still trailing OpenAI and Anthropic.
At least two GPTs on the OpenAI app store will generate $100K in revenue. 50% This was a miss. As far as we can tell, the GPTs App Store really hasn’t found much traction at all.
Open-source LLMs
Llama 3 will be released in 2024. 90% ✅
At least 3 open-source LLM companies will raise funding rounds of $100MM or more. 70% ✅ (Mistral, Cohere)
There will be a new open-source LLM release from an established technology company. (Meta + Llama 3 do not count.) 50% ✅ (Snowflake, Databricks)
No open-source LLM will be within 5% of the quality (ELO) of the top commercial model on the LMSys Leaderboard. 90% So far, so good, with Gemma being the closest model to GPT-4o.
One open-source foundation model company that has raised at least $50MM as of 12/2023 will close shop or be acquired. 50% This was a bold prediction, and probably one that was too specific in its claim about open source. We can declare a moral victory with Microsoft’s functional acquisition of Inflection AI, but the full prediction is still up in the air.
No open-source LLM company will reach $20MM in revenue. 70% This is an area where we were far too bearish. Knowing what we know now, Mistral (whose open-source credibility is now in question) might have already surpassed this in 2023.
Other Predictions
There will be fewer dollars invested in AI companies in 2024 than in 2023. 50% This is an area where we were too pessimistic — the hype cycle has continued and hasn’t shown any signs of abating yet.
At least one US government agency will be involved in the building of a publicly available LLM. 50% SB 1047 alludes to this with its allocated funding. However, we don’t think anyone wants SB 1047 to pass in its current form.
Per-token fine-tuning costs charged by third-party services will come down by at least 5x. 90%
Per-token fine-tuning costs charged by third-party services will come down by at least 10x. 70% These predictions feel directionally correct, but it’s been hard to find concrete data on this. We’ll look for more towards the end of the year, but open-source LLMs are pretty cheap!
Llama 3 will have multimodal capabilities. 70% ❌ This was a miss: multimodality has continued to be popular, but Llama 3 was probably too far along to have it added. (Llama 3 vision is coming, though; does that count?)
Anthropic will release a model with multimodal capabilities. 70% Same as the above.