The chips in most devices — microwaves, kettles, even your Apple Watch — aren’t general-purpose CPUs. They are application-specific integrated circuits (ASICs). These chips are smaller & less powerful than the CPU in your laptop or phone, and they’re capable of doing only one thing, like running a timer or connecting to a cell network. Thankfully, your refrigerator doesn’t need to run a web browser or Slack (yet!), so this lack of flexibility is perfectly reasonable. In exchange for this specificity, these chips consume less energy and are cheaper to manufacture.
The same pattern will emerge with LLMs. Rather than one all-powerful LLM, we’ll see a proliferation of smaller, application-specific LLMs.
Current bleeding-edge LLMs are extremely good — and will get better — at general-purpose tasks. Models like GPT-4 can answer questions about historical counterfactuals, generate accurate code snippets, or create recipes from scratch. They’re extremely general-purpose because they’ve been trained on a large portion of the text on the internet. Correspondingly, they’re incredibly expensive to build and quite slow (and costly) to use. Generating a simple answer with GPT-4 can take tens of seconds, and as our team at RunLLM has learned, it’s very easy to run up bills of tens or even hundreds of dollars for relatively simple tasks.
In the same way you don’t need the latest Apple Silicon M-series chip in your fridge, you don’t need the most powerful LLM in the world to automate simple tasks. For the sake of this post, we’ll call the GPT-style, general-purpose models GLMs (general language models). Following the nomenclature from chip design, we’ll call the emerging category of smaller, domain-specific models ASLMs (application-specific language models). We won’t spend much time on GLMs here because they’ve been discussed to death; instead, we’ll focus on the what, why, & how of ASLMs.
What is an ASLM?
Conceptually, it’s quite simple: an ASLM is a model that doesn’t need the generality and flexibility of a GPT-style GLM and can instead be optimized for a single, specific task. There are many examples of ASLMs already out there — models trained from scratch like BloombergGPT and fine-tunes of existing models like Gorilla both count. The most well-known example is probably Codex, the model behind GitHub Copilot. We believe we will see many more ASLMs in the coming months, as more teams learn how to build effective fine-tuned models — both on top of OpenAI’s base models and on open-source LLMs. It’s early, but we’ve heard rumors that some of the leading fine-tuning services (e.g., Together AI) are doing quite well — this reinforces our hypothesis around the emergence of more ASLMs. (We’re working on a suite of ASLMs ourselves, so stay tuned!)
Why ASLMs over GLMs?
This is perhaps an obvious point, but it’s important. Continuing with the ASIC analogy, to compensate for the lack of generality, ASLMs should be smaller, cheaper, and faster than a GLM would be. They can achieve GLM-level (or better!) performance on the task they’re optimized for, and users don’t need to incur the dollar cost or performance hit of the larger models. For example, a fine-tuning run on OpenAI to customize GPT-3.5 can be as cheap as a few dollars, and inference costs roughly 2-3x less than GPT-4’s. As applications mature, ASLM builders will create the classic virtuous cycle around better data and improving models.
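To make the cost gap concrete, here’s a back-of-the-envelope comparison in Python. The per-1K-token prices are illustrative placeholders (roughly in line with 2023-era list prices, not current numbers), and the token counts for a “simple task” are made up; swap in real figures before drawing conclusions.

```python
# Back-of-the-envelope cost comparison: fine-tuned GPT-3.5 vs. GPT-4.
# All prices are illustrative placeholders (USD per 1K tokens, roughly
# 2023-era list prices); check current pricing before trusting the output.

PRICES = {
    "gpt-4":              {"input": 0.03,  "output": 0.06},
    "gpt-3.5-fine-tuned": {"input": 0.012, "output": 0.016},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request for the given model."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# A hypothetical "simple task": a 1,500-token prompt and a 500-token answer.
for model in PRICES:
    cost = request_cost(model, input_tokens=1500, output_tokens=500)
    print(f"{model}: ${cost:.4f} per request, ${cost * 10_000:.2f} per 10k requests")
```

With those placeholder prices, the fine-tuned model works out to roughly 3x cheaper per request, consistent with the 2-3x figure above.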
How do we build ASLMs?
While there are examples of ASLMs trained from scratch, the easiest path forward for most teams will be to fine-tune an existing language model. Currently, the cheapest and easiest way to do that is to use OpenAI’s fine-tuning APIs, but we hope that changes soon. OpenAI is likely still charging a (large!) margin on these models, and this is where the open-source community can shine.
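As a rough sketch of that path, the snippet below uses OpenAI’s Python SDK to upload a small chat-formatted JSONL dataset and kick off a fine-tuning job on gpt-3.5-turbo. The file name, system prompt, and training example are hypothetical stand-ins; treat this as a minimal outline, not a production pipeline.

```python
# Minimal sketch: fine-tuning gpt-3.5-turbo into a domain-specific model
# via OpenAI's fine-tuning API (openai Python SDK v1.x).
# The dataset file and its contents are hypothetical examples.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each training example is a short chat transcript for the target task.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer questions about the Acme API."},
        {"role": "user", "content": "How do I rotate an API key?"},
        {"role": "assistant", "content": "Call POST /v1/keys/rotate with your current key ID."},
    ]},
    # ... hundreds more task-specific examples ...
]

# Write the examples to a JSONL file, one example per line.
with open("support_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the dataset and start the fine-tuning job.
training_file = client.files.create(
    file=open("support_examples.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print("Fine-tuning job started:", job.id)
```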
In all honesty, this is where the ASLM argument is weakest today. Yes, a fine-tuned GPT-3.5 is cheaper than GPT-4, but not by the 10x or 100x margin that should be possible. For ASLMs to take off, the base LLMs they’re built on need to get smaller, faster, and cheaper.
We’ve been beating this drum for a few weeks now: We strongly believe open-source models should focus on getting smaller & cheaper to run. This will allow them to be the base for others to build & run ASLMs for much, much cheaper than OpenAI currently allows.
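For the open-source route, parameter-efficient fine-tuning is one plausible way to build an ASLM cheaply today. The sketch below wraps a base model in LoRA adapters using Hugging Face’s transformers and peft libraries; the model name and hyperparameters are illustrative choices under that assumption, not a recommendation.

```python
# Rough sketch: turning a small open-source base model into an ASLM with
# LoRA adapters (parameter-efficient fine-tuning) via Hugging Face peft.
# The base model name and hyperparameters below are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "mistralai/Mistral-7B-v0.1"  # any small open-source base works
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapter matrices instead of all base weights,
# which is what keeps the fine-tune cheap to run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, train on your task-specific dataset with a standard
# transformers.Trainer loop, then serve the adapter alongside the base model.
```

Because only the adapter weights are trained, a run like this can often fit on a single GPU, which is exactly the cost profile ASLMs need.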
If this plays out as we suggest, open-source LLMs will flourish. Each open-source model will create its own niche, and the community will have a clear incentive to provide feedback and enable iteration.
If you’re working on reducing LLM size and increasing efficiency, drop us a line! We’d be quite interested in learning more.