Last fall, we wrote that OpenAI is too cheap to beat. To date, that's still our most popular post, with over 30k views on Substack. With a title like that, it generated the strong opinions you'd expect, both agreeing and disagreeing with us. The gist of that post is that the cost-performance tradeoff OpenAI was offering at the time was as close to optimal as you were going to get.
This is an awesome post, well thought out. And you are spot on: as I dig deep and productionize small or specialized language models for automating workflows, I clearly see that you do not need a large LLM for everything. I have two questions: 1. I am curious about the claim that Claude 3 Opus is 3x more expensive than GPT-4. Can you point to any data or source behind that? And 2. You compare the scenarios of RAG and fine-tuning. Are you looking into or evaluating merging models?
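For anyone who wants to sanity-check a cost claim like this themselves: the ratio depends on whose list prices you use and on the input/output token mix of your workload. Here is a minimal sketch of a blended-cost comparison; the per-million-token prices below are placeholder assumptions for illustration, not the figures behind the post's claim.

```python
# Placeholder per-million-token prices (assumed, not sourced from the post).
PRICES = {
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}

def blended_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, given its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a RAG-style request with a large prompt and a short answer.
opus = blended_cost("claude-3-opus", input_tokens=4000, output_tokens=500)
gpt4 = blended_cost("gpt-4-turbo", input_tokens=4000, output_tokens=500)
print(f"Opus: ${opus:.4f}  GPT-4 Turbo: ${gpt4:.4f}  ratio: {opus / gpt4:.1f}x")
```

Under these assumed prices the ratio shifts with the workload: input-heavy requests narrow it, output-heavy requests widen it.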
Elo being a coin flip really makes me think we don't know how to compare models in general. When you average over many tasks that are somewhat saturating, as with most AI tasks, the signal saturates.
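To put numbers on the coin-flip point: under the standard Elo model, the expected win probability implied by a rating gap is 1 / (1 + 10^(-gap/400)), so small leaderboard gaps really are close to even odds. A quick sketch:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Small rating gaps are close to a coin flip:
for gap in (0, 25, 50, 100):
    print(f"{gap:>3}-point gap -> {elo_win_prob(1200 + gap, 1200):.1%} win rate")
# 0 -> 50.0%, 25 -> 53.6%, 50 -> 57.1%, 100 -> 64.0%
```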
What do you mean by saturating?
Two decent LLM answers to an English question are just so similar most of the time. Usually only specific questions expose weaknesses.
This also relates to what we were talking about at the end of the post w.r.t. Elo asymptoting. Getting from 1000 to 1100 Elo is probably a function of meeting the baseline expectation. Getting from 1150 to 1200 is probably the cream of the crop differentiating itself.
Ah, yeah, 100%. The "it looks like an LLM" problem.
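To make the asymptote concrete: if judges can only tell two models apart on a small fraction of prompts and the rest are effectively coin flips, the observed win rate gets pinned near 50%, which caps the Elo gap the leaderboard can ever show. A toy sketch, where the separable-prompt fractions and the 90% win rate on those prompts are purely assumed:

```python
import math

def implied_elo_gap(win_prob: float) -> float:
    """Invert the Elo expected-score formula: the gap implied by a win rate."""
    return 400.0 * math.log10(win_prob / (1.0 - win_prob))

# Assumption: judges tell the models apart only on a fraction of prompts;
# the stronger model wins 90% of those, and the rest are coin flips.
for separable in (0.05, 0.10, 0.25):
    p = 0.5 * (1 - separable) + 0.9 * separable
    print(f"{separable:.0%} separable prompts -> win rate {p:.1%}, "
          f"max Elo gap ~{implied_elo_gap(p):.0f}")
# 5% -> 52.0% win rate, gap ~14; 10% -> 54.0%, ~28; 25% -> 60.0%, ~70
```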