Qwen 3’s release last week is a clear sign that open-source LLMs are here to stay and will push the bounds of model capabilities. There’s been plenty of analysis of the model itself and its score on evaluations — so as usual, we’ll stay away from reacting to the models themselves. Looking at everything that’s been written so far, it’s safe to say that these are high-quality models.
Qwen being the second model — after DeepSeek — to move the needle with an open-weights release this year leaves us with plenty to think about at the intersection of LLMs, product building, and business decisions. Let’s dive in!
Open models are a boon for application builders
We said this after the DeepSeek R1 release earlier this year, but it’s worth repeating: The release of state-of-the-art LLMs with open weights is a boon for application builders. While the leading models — at least according to Chatbot Arena Elo — are still proprietary, we now have credible alternatives to those leading models that are open-weight and achieve comparable quality. That has two critical downstream benefits for application builders.
First, it applies downward pressure to the whole market. Qwen 3’s 600M model is now comparable in quality to other 3B parameter models, and Qwen3-235B has a similar Elo to o1. As the models themselves become more efficient, inference gets cheaper, which encourages all model providers to build smaller and more efficient models. As we’ll discuss below, this also creates opportunities for more innovation in inference systems, which further increases the potential for efficient inference. All that means that we, as application builders, can throw more AI at our problems for less and less cost. Our hypothesis is that more AI will make applications better and better.
Second, this increases the deployment flexibility for startups. As an early-stage startup, we regularly get beaten up by large companies about whether they can give us access to their proprietary data. We’re quickly approaching a state — perhaps we’ve already reached it — where we can credibly construct a version of our product that uses open-source models exclusively, so we can run in any public cloud account that our customers provide us. That’s a huge unlock in our ability to distribute our product.
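To make that concrete, here’s a minimal sketch of what that flexibility can look like in application code, assuming the open-weight model is served behind an OpenAI-compatible endpoint (e.g., via vLLM or SGLang) inside the customer’s cloud. The endpoint URL, environment variable, and model names below are illustrative, not our actual configuration.

```python
# Minimal sketch: the same application code can target either a proprietary
# hosted API or an open-weight model self-hosted in a customer's cloud account.
# The base_url, env var, and model names are illustrative placeholders.
import os
from openai import OpenAI

if os.environ.get("DEPLOYMENT_MODE") == "customer_cloud":
    # Open-weight model served behind an OpenAI-compatible endpoint
    # (e.g., vLLM or SGLang) inside the customer's own VPC.
    client = OpenAI(base_url="http://llm.internal.customer.cloud/v1", api_key="not-needed")
    model = "Qwen/Qwen3-235B-A22B"
else:
    # Default SaaS deployment using a proprietary hosted model.
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    model = "gpt-4o"

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Summarize our deployment options."}],
)
print(response.choices[0].message.content)
```

The point is that nothing about the application logic has to change; only the endpoint and the model name do.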
Decoupling innovation in systems and AI
One of the most exciting implications of more open models being released is that we can decouple innovation in the models themselves from innovation in how those models are served. The increasing popularity of projects like vLLM and SGLang demonstrates exactly this: These projects are able to make LLM inference dramatically more efficient without having to change the models themselves. While having control over the full stack always has value for infrastructure, distributed innovation typically means that everyone moves faster.
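As a rough illustration of that decoupling, the sketch below uses vLLM’s offline API: the serving layer’s optimizations apply to whichever open-weight checkpoint you point it at, and the model identifier here is just an example.

```python
# Sketch of the decoupling in practice: vLLM's serving optimizations
# (paged attention, continuous batching, etc.) apply to whichever open-weight
# model you load; swapping models is just a matter of changing the identifier.
from vllm import LLM, SamplingParams

# Any supported open-weight checkpoint works here; none of the
# serving-layer code changes when the model does.
llm = LLM(model="Qwen/Qwen3-0.6B")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain continuous batching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```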
Much of the innovation at the serving layer is also happening in open source — even companies like Nvidia are now releasing open-source serving frameworks — which further strengthens our belief that we’ll soon be able to deploy applications like RunLLM efficiently in a variety of environments. Today, that primarily means in customers’ cloud accounts, but it could eventually extend to more security-sensitive environments (e.g., on-prem or air-gapped infrastructure) as well.
American open-source AI is behind
It hasn’t escaped anyone’s notice that this is now the second impressive open-source LLM to come from a Chinese company, which adds to the general sense of disappointment that came from the underwhelming Llama 4 release last month. The fact that Meta is still the only company in the West that is building credible open-source models — and with a strange and restrictive license at that — is disappointing. All the other players have switched away from open models or quietly dropped their model-building efforts.
While the open nature of these models reduces security concerns, it’s generally bad for the ecosystem if models built in the US are closed-only. The obvious concern is that models taken off the shelf from potentially hostile governments may not have been exposed to a diversity of data. Longer term, we want to get into the habit of building models out in the open — in a way that promotes community enforcement of standards, oversight, and awareness. Open models from American companies will also help promote collaboration with researchers without concerns about interference from foreign governments.
On a related note, our friend Nathan Lambert wrote this week about the perceived and real perils of adopting open-weight models from China.
More options are good (if you know how to use them)
For all the reasons we outlined above, having more options for high-quality, reliable language models is a good thing. However, every model release now comes with a plethora of configurations — different sizes, base models, instruction-tuned models, reasoning-enabled models, sparse and dense models, and so on. What that means is that as an application builder, you have to know how to pick the right model for your task. Blindly picking a model and dumping it into your application probably won’t have the intended effects.
We’ve discussed evaluation frameworks quite a bit previously, and we’re increasingly convinced that general-purpose evaluation frameworks are only useful as directional guides. For a more specific sense of how a new model will improve your application, you need an evaluation framework tailored to that application. We’ve been procrastinating on building this at RunLLM because so much of our performance is dependent on the data customers give us, but we’ve finally decided to bite the bullet. If you’re trying to keep up with everything going on with new model releases, you’re probably going to need to do the same thing.
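For what it’s worth, an application-specific evaluation harness doesn’t have to be elaborate. Here’s a minimal sketch of the shape it might take; the helpers and scoring rule are hypothetical placeholders, not RunLLM’s actual implementation.

```python
# Minimal sketch of an application-specific evaluation harness: run a
# candidate model over cases drawn from your own product data and score
# the answers against what your application actually needs. The helpers
# (load_support_cases, call_qwen3, call_gpt4o) are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class EvalCase:
    question: str   # a real user question from your product's logs
    reference: str  # a vetted answer, or the key facts it must contain

def grade_answer(candidate: str, reference: str) -> float:
    # Stand-in scorer: fraction of reference keywords the answer covers.
    # In practice this would be an LLM judge or domain-specific checks.
    keywords = set(reference.lower().split())
    hits = sum(1 for w in keywords if w in candidate.lower())
    return hits / max(len(keywords), 1)

def evaluate_model(answer_with, cases: list[EvalCase]) -> float:
    scores = [grade_answer(answer_with(c.question), c.reference) for c in cases]
    return sum(scores) / len(scores)

# Usage: compare a new open-weight model against the incumbent on *your* data.
# cases = load_support_cases("customer_tickets.jsonl")   # hypothetical loader
# print(evaluate_model(lambda q: call_qwen3(q), cases))  # hypothetical model calls
# print(evaluate_model(lambda q: call_gpt4o(q), cases))
```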
There may be a day when new model releases become like iPhone launches — something we all get excited about but are ultimately disappointed by because the incremental improvements are somewhat meaningless. (We’ve already seen that with launches like Llama 4.) For now, there’s still a lot to learn from every incremental model release, especially when those releases enable new classes of use cases that weren’t possible before.
Open models are at the forefront of what excites us, and Qwen 3 shows us that DeepSeek wasn’t a blip on the radar. Open-weight frontier models (with permissive licenses) are here to stay, and we need to work them into our view of the world. That means more open-weight models — ideally from US companies! — and more flexibility for application builders. We’re particularly excited about the rumors that OpenAI may finally be living up to its name and jumping into this arena!