The classic debate between open-source and proprietary software is relevant yet again with LLMs. Proprietary models like GPT and Claude are significantly easier to use and have quality in their favor, but skeptics of data centralization and enterprises with stringent data security standards, as always, prefer open-source models. So, who will win?
Open-source LLMs are getting better, but the race against proprietary model providers is likely impossible to win — OpenAI, Google, and the like have a nearly insurmountable advantage. Instead, open-source models can set themselves up for success by becoming smaller and cheaper at the same quality, enabling users to fine-tune and customize them more easily. Let’s dive into why.
The open-source LLM ecosystem has grown significantly this year. HuggingFace, the poster child of open-source machine learning, recently raised money at a $4B valuation, and companies like Mistral, Cohere, and Mosaic (now part of Databricks) continue to push the boundaries of open-source LLMs — not to mention Meta AI. These companies are culturally committed to the success of open-source models and have built brands around pushing the open-source community forward rather than maintaining proprietary technology.
As a result, open-source models have made impressive strides this year. Joey’s group at Berkeley built the Vicuna model earlier this year, which by some measures is still the top-quality OSS model, and more recent efforts like Llama 2 and Mistral-7B reflect significant investments of time, money, and talent as well. These models are very high quality in the grand scheme of things.
Despite these advances, proprietary models have a huge lead. According to the LMSys leaderboard, no open-source model has bested the quality of GPT-3.5, and the gap between GPT-4 and GPT-3.5 is quite large.
Looking ahead, Anthropic CEO Dario Amodei recently predicted that we will have models with $10B+ of investment by 2025. That’s a mind-boggling number (more than 10% of the rumored enterprise value of OpenAI!), and it reflects the overwhelming resources the leading LLM providers have at hand and their belief in their ability to continue improving models with those resources. The internet has already proven once that gathering user data is a virtuous cycle that’s difficult to beat.
If that prediction turns out to be even 10% true, a $1B model will be something most companies can only dream of building. So what hope does the open-source community have?
If the goal is to build general-purpose models that match OpenAI’s quality, the answer is not much. If anything, the gap between open and closed models will only grow.
Realistically, that’s not the only path forward for OSS models. As the LLM space matures, there will likely be a suite of models that don’t need the general-purpose reasoning capabilities of GPT-4 (or better). There are plenty of tasks that can be accomplished by fine-tuning a smaller, lighter-weight LLM on some domain-specific examples. This is where open-source models will thrive. Of course, enterprises with strict data security requirements might also adopt open-source LLMs, but given that most enterprises are already moving to the cloud, we believe Azure, GCP, and AWS (likely in that order) are best suited to solve security and privacy challenges.
Of course, OpenAI already allows fine-tuning GPT-3.5 at a surprisingly low cost (more on this soon!), but open-source models are better suited to fight the cost battle than the quality battle. Our prediction is that in the near future, many open-source models will switch from increasing in size to maintaining quality while reducing size and increasing efficiency. If models can provide GPT-3.5-level capabilities while being an order of magnitude smaller and cheaper, that will trigger a wave of fine-tuning and customization throughout the industry.
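To make the "order of magnitude smaller and cheaper" point concrete, here’s a back-of-envelope sketch of how parameter count translates into serving memory, which is a major driver of inference cost. The parameter counts and precisions below are illustrative assumptions (a 7B open model vs. a hypothetical 175B-scale proprietary model), not vendor-published figures:

```python
def weight_memory_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough GPU memory needed just to hold the model weights.

    bytes_per_param: 2 for fp16/bf16, 1 for int8 quantization.
    (Ignores KV cache, activations, and serving overhead.)
    """
    # params (billions) * 1e9 * bytes / 1e9 bytes-per-GB == billions * bytes
    return n_params_billions * bytes_per_param

# A 7B open-source model (e.g. Mistral-7B) in fp16:
small = weight_memory_gb(7)      # 14 GB -> fits on a single high-end GPU

# A hypothetical 175B-parameter proprietary-scale model in fp16:
large = weight_memory_gb(175)    # 350 GB -> requires a multi-GPU cluster

print(f"{small} GB vs {large} GB ({large / small:.0f}x gap in weights alone)")
```

Even before quantization or other tricks, the smaller model fits on hardware that many teams already own, which is exactly what makes widespread fine-tuning and customization economically plausible.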
Coming from Berkeley, we’re big fans of open source. However, open-source models are at a significant disadvantage to proprietary models due to the sheer scale of investment. Still, there’s a critical path forward: fine-tuned open-source models that excel at particular tasks by sacrificing general-purpose reasoning and broad multi-domain knowledge. In this world, models will need to get smaller and more efficient rather than larger and more general.