This is our last post of the year, as we’re taking a holiday break for the next couple of weeks. We’ll be back on January 9th with our next post, but we might have some cross-posts for you over the holiday break.
2024 has been a wild year in AI. We’re sure you’ve already seen plenty of general recaps over the last couple of weeks, so we won’t repeat much of that. Personally, we have a lot more data than we did 12 months ago on how AI technology and AI applications are maturing — both from our own experience at RunLLM and from seeing how our expectations played out.
So, get ready for a self-critical, data-driven, end-of-year post where we:
grade ourselves on our 2024 predictions (grades are due at Cal!)
share some of our most popular and our favorite posts from the year
share 3 key lessons we’ve learned this year.
We will wait until January before making predictions for 2025.
Lastly, we very much appreciate your support. We’ve more than doubled our subscriber count (1,106 → 2,398) this year, and we’ve had over 100k total views. Our favorite interactions are when you all reach out — we’d always love to hear from you, whether it’s to share feedback or just say hi!
Alright, let’s dive in.
Looking back on our predictions
As usual, making predictions is hard, and our accuracy could’ve been better (we’ll let you decide what the curve will be!). If it weren’t for Gemini making a late push to be the best model on the LMSys leaderboard, our accuracy would’ve been directionally correct, if overconfident. Alas, we’ll do better next year!
Before we dive into the specifics, here are a couple of high-level observations about our predictions:
We were overly pessimistic about open-source LLMs, which mostly got a lot better this year. We correspondingly made some pessimistic predictions about open-source LLM startups, when really we should’ve been thinking about foundation model startups more broadly (think Inflection, Character AI). We were generally right that proprietary models would maintain their advantage, though the advantage (especially OpenAI’s) wasn’t as solid as we expected.
We made some predictions about inference cost reductions, which were directionally correct but not in the way we expected. We were thinking systems optimizations would bring inference costs down, but models like GPT-4o were smaller models that achieved the same performance as previous generation larger models. This relates to the test-time compute trend, which we’ll have more on soon.
Likelihood | Accuracy
--- | ---
50% | 40% (2/5)
70% | 57% (4/7)
90% | 50% (2/4)
Unresolved | 2
OpenAI and commercial LLMs
OpenAI will not release GPT-5. 50% ✅
GPT-4 per-token costs will come down by at least 5x in 2024. 70% ❌ While we were directionally correct, we missed on the ratios. Input token costs came down by 4x and output token costs came down by 3x.
GPT-4 (or GPT-5 if released) will be at the top of the LMSys Leaderboard at the end of 2024. 90% ❌ The latest version of GPT-4 stayed at the top of the leaderboard for most of the year, but the competition from Claude and Gemini was fierce, and the latest Gemini release has taken over as of this writing.
Amazon and Google combined will have less enterprise LLM usage than OpenAI. 70% ❓ We’re honestly not sure about this one; the bet was that OpenAI would maintain its model advantage, which it has — but by less than we expected. AWS isn’t breaking out Bedrock revenue, for example, so we’re left guessing at best. Our hunch is that this is right, but we’ll stay humble.
At least two GPTs on the OpenAI app store will generate $100K in revenue. 50% ❓(❌) We weren’t able to find any data about GPTs app store revenue, and while OpenAI touted reaching 3 million GPTs earlier this year, we think it’s relatively unlikely that any of them are generating significant revenue. We overestimated adoption.
Open-source LLMs
Llama 3 will be released in 2024. 90% ✅
At least 3 open-source LLM companies will raise funding rounds of $100MM or more. 70% ✅
There will be a new open-source LLM release from an established technology company. (Meta + Llama 3 do not count.) 50% ✅ While the definition of open-source LLMs is still contested, we’ll count the open-weight Gemma models from Google as a win, and DBRX fits as well.
No open-source LLM will be within 5% of the quality (ELO) of the top commercial model on the LMSys Leaderboard. 90% ✅ As of this writing, Llama 3.1 is the highest-rated open-source LLM (1269 ELO), which is about 8% lower than the latest Gemini model (1377 ELO).
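The gap above is quick to sanity-check. A minimal sketch, using the ELO figures quoted above (as of this writing):

```python
# Relative gap between the top open-source and top commercial models
# on the LMSys leaderboard, using the ELO scores quoted above.
llama_31_elo = 1269  # highest-rated open-source LLM (Llama 3.1)
gemini_elo = 1377    # top commercial model (latest Gemini)

gap = (gemini_elo - llama_31_elo) / gemini_elo
print(f"Relative gap: {gap:.1%}")  # roughly 7.8% — about 8%, well outside 5%
```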
One open-source foundation model company that has raised at least $50MM as of 12/2023 will close shop or be acquired. 30% ❌ This was an outside chance, and while we did see (strangely structured) acquisitions of foundation model companies like Inflection AI, open-source LLM companies have survived.
No open-source LLM company will reach $20MM in revenue. 70% ❌ Simply put, we were overly negative on the prospects of open-source LLMs. Mistral has blown well past this number. That said, definitions are murky: Is Mistral really still open-source? Do Databricks and Google count? We’ll count this prediction as a general loss.
Other Predictions
There will be fewer dollars invested in AI companies in 2024 than in 2023. 50% ❌ We were curious whether there would be a retreat in investment while investors waited for AI to deliver on its promise. We thought it was a coin flip, but that wasn’t the case: AI investment through Q3 already surpassed 2023 investment.
At least one US government agency will be involved in the building of a publicly available LLM. 50% ❌ Outside of SB 1047 in California, there has generally been much less government attention on AI than we expected in 2024. Perhaps in 2025, the government will partner with X.ai.
Per-token fine-tuning costs from third-party services will come down by at least 5x (90% likelihood) and at least 10x (70% likelihood). ❌ ❌ Fine-tuning hasn’t taken off as much as we expected, and while you can now fine-tune GPT-4, GPT-3.5 fine-tuning prices have held constant. Third-party services (e.g., Fireworks AI, Together AI) have also kept pricing relatively constant. Wins have come from new models rather than reduced costs for existing models.
Llama 3 will have multimodal capabilities. 70% ✅
Anthropic will release a model with multimodal capabilities. 70% ✅ Being generous to ourselves, we’re giving ourselves a win here because Claude does support image inputs, but it’s borderline. We would’ve expected to see Anthropic release an image model this year, but it seems like they’ve been laser-focused on improving the UX around Claude.
Top posts of 2024
Our top 5 posts of 2024 (ranked by views) are an interesting bunch. Our top post for this year wasn’t one that we expected — it was one that we wrote as an afterthought. Posts 2-5 were ones that we expected to generate more interest, but none of them came close to being as popular as our top post. Funny how that works!
Throw more AI at your problems (11.3k views). Compound AI has become a popular buzzword, and it’s clear that using more than one LLM is the best way to build great AI applications.
You can build a moat with AI (6.3k views). As powerful as LLMs are, everything comes down to the data — you will build the best AI applications by understanding your data.
A theory of the AI market (3.5k views). A bigger picture post about where we see opportunity and over-investment in AI.
Your AI strategy is a waste of time (2.6k views). We’ll admit this is a clickbait-y title, but we’ve also seen a lot of handwringing over AI with comparatively little action. We think we need less talking and more doing.
An introduction to evaluating LLMs (2.5k views). LLM evals were a hot topic early in the year, but everyone’s settled back into vibes-based evals.
Lessons Learned in 2024
We’ve done our best to share our thoughts with you as we’ve been learning, but it’s always helpful to step back and summarize. Here are the top 3 things we learned in 2024.
AI companies have to focus on delivering value. You can read our many posts on the AI market from this past year, but it’s crystal clear to us that you have to focus on a concrete business win from AI — saving money, increasing revenue, and so on. Less tangible wins — like developer productivity or better performance — simply aren’t going to cut it. Our bias is that this means selling into job functions, which we believe we’ll see much more of in 2025.
Customer trust matters more than anything. This is obviously true for any product, but AI skepticism is still high while customer requirements are murky. We spent the early part of 2024 trying to build evaluation tools to make trust-building more empirical, but we’ve realized the best strategy is to get customers’ hands on the product and to let them go through their own process (i.e., vibes). Evaluation frameworks are too difficult to understand, and forcing the issue spooks customers. To be fair, we’ve also rarely looked at a TPC-C benchmark when deciding what database to use.
AI UX is critical, and unsolved. We’ve spent a lot of time innovating on new AI features in 2024, and we’ve found every time that we need to match AI innovation with UX innovation. What’s worked in the past doesn’t necessarily apply with AI, so you need to be thinking, listening, and learning as quickly as you can. Good AI with a bad UX isn’t going to get you anywhere. A short corollary to this is that you should be thinking about where your AI’s limits are and how it can hand off seamlessly to human coworkers.
That’s it for us in 2024, folks! Thanks again for reading and for your support. We’ll be back in a couple weeks with a look ahead to 2025. Happy holidays!