Joey and our co-founder, Chenggang, gave a talk at Data Council last week titled AGI is already here, but it’s not what you think. The theme of the talk was closely aligned with what we talk about around here, and we were originally hoping to share the talk recording with you all this week, but the recordings aren’t available yet. We still think the core ideas are worth sharing in the meantime. When our friends at Data Council post the recordings, we’ll of course share them.
We’ve previously made the argument that we’ve already achieved AGI. We won’t repeat the whole argument here, but to briefly recap, let’s look at artificial intelligence and how it generalizes. It’s safe to say that we’ve had narrow artificial intelligence — models that are very adept at solving a certain task as well as (or better than) humans — since the advent of deep learning, if not earlier. These models were good at everything from image recognition to playing Go to many aspects of driving cars.
Of course, the key thing is that there was a separate model that did each task without being able to generalize to other tasks. The advent of transformer-based models was when generality began to enter the picture, and by the time ChatGPT was released (and with the subsequent addition of native multimodality), it was very clear that we had general models that can do everything from answering questions to summarizing documents to analyzing images. In other words, we have artificial intelligence that’s generally applicable — AGI.
So… why haven’t we solved all of humanity’s problems and achieved a Star Trek-like 25th century utopia? It turns out that AGI on its own isn’t enough — in fact, it’s only just the beginning. In order for AGI to solve real problems for us, we need to do a lot more than train general-purpose models that are capable of many things. What we need to do is to specialize them to particular tasks.
This might be counterintuitive: Wasn’t the whole unlock we discussed going from specific models to general intelligence? It certainly was — but general capability doesn’t mean that the same models should be applied to every task. Humans also have general capabilities, and that generality is hugely beneficial, but most of us specialize in certain areas. Someone who’s trained as a doctor could later train to become a software engineer, but they will of course have to learn a new set of skills. Put another way, the fact that we’ve achieved generality is critical, but the general-purpose skills are building blocks that allow us to specialize more effectively.
How do we use these building blocks? Turns out that’s what we’ve been talking about on this blog for 18+ months now: use data effectively, specialize models, and decompose problems into manageable chunks.
Data use is probably the most important and the least exciting of the three problems. We’ve long been skeptical that long context windows are going to solve all problems in AI. Until we have a different model for attention than what was introduced by the transformer architecture, it simply doesn’t make sense to stuff massive amounts of data into a single LLM call. That means you need to be thoughtful about what data you retrieve, when you retrieve it, and how you best use it. In our experience, that means doing the work up front — waiting until query time to do a simple vector search isn’t going to cut it. You need to be reading, analyzing, annotating, and indexing your data in a fashion that’s optimized for your application, and you need to tailor your search process accordingly. We’ve spent many engineer-years just building data pipelines at RunLLM.
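To make the “do the work up front” idea concrete, here’s a minimal sketch in Python. The helper names and the keyword-overlap ranking are purely illustrative (this isn’t RunLLM’s actual pipeline) — the point is that chunks get analyzed and annotated at ingest time, so retrieval can rank on structure instead of leaning entirely on a query-time vector search:

```python
from dataclasses import dataclass, field

# Ingest-time enrichment (illustrative): instead of embedding raw chunks and
# hoping a query-time vector search finds them, analyze and annotate each
# document up front so retrieval can use structure, not just similarity.

@dataclass
class IndexedChunk:
    text: str
    source: str
    title: str
    keywords: set = field(default_factory=set)

def extract_keywords(text: str) -> set:
    # Placeholder: in practice this might be an LLM call or a domain-specific
    # tagger that pulls out product names, error codes, API identifiers, etc.
    return {tok.strip(".,()`").lower() for tok in text.split() if len(tok) > 6}

def ingest(doc_text: str, source: str, title: str) -> list:
    # Chunk along natural boundaries (paragraphs here) and annotate each chunk
    # with metadata the retriever can filter and rank on later.
    return [
        IndexedChunk(text=p.strip(), source=source, title=title,
                     keywords=extract_keywords(p))
        for p in doc_text.split("\n\n") if p.strip()
    ]

def retrieve(index: list, query: str, top_k: int = 3) -> list:
    # Toy ranking: keyword overlap stands in for the lexical and metadata
    # signals you'd combine with vector similarity in a real hybrid retriever.
    q_terms = extract_keywords(query)
    return sorted(index, key=lambda c: len(c.keywords & q_terms), reverse=True)[:top_k]
```

In a real system, the annotation step would often itself be an LLM call, and the toy overlap score would be combined with vector similarity and application-specific filters — but the work happens before the query ever arrives.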
Model specialization via post-training has gone through many cycles over the last two years. Fine-tuning was all the rage in 2023 but has lost some of its shine as companies have realized that it takes more than a random dataset thrown together. However, when done well, specialization is still critical because it allows your model to recognize patterns that weren’t available or visible in the general-purpose training sets used for LLMs. For example, we find that fine-tuning is a critical aspect of RunLLM’s ability to recognize and handle the complex vocabulary that comes with highly technical questions. Both fine-tuning and RLHF fall into this bucket and are critical tools.
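As a rough illustration (not RunLLM’s actual data or setup), a supervised fine-tuning dataset for this kind of vocabulary problem is often just a JSONL file of chat-formatted examples curated from real, resolved interactions. The product name, error code, and file name below are made up:

```python
import json

# Illustrative fine-tuning dataset prep: chat-formatted JSONL examples curated
# from resolved support interactions so the model learns domain vocabulary
# (product names, error codes, internal jargon). "AcmeDB", ERR_WAL_GAP, and
# the file name are all hypothetical.

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support assistant for AcmeDB."},
            {"role": "user", "content": "Why does `acmedb restore` fail with ERR_WAL_GAP?"},
            {"role": "assistant", "content": "ERR_WAL_GAP means a write-ahead log segment "
                                             "is missing from the backup set. Re-run the "
                                             "backup with continuous archiving enabled."},
        ]
    },
    # ... hundreds to thousands more, curated and reviewed — not thrown together
]

with open("finetune_train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

The format is the easy part; the hard part is curating examples that actually cover the vocabulary and answer patterns you want the model to internalize.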
Problem decomposition (which is synonymous with compound AI) is the area that’s most popular right now. If you want to be buzzy, this is how you construct ‘agentic’ systems — you have a set of intelligent calls, each reasoning over a narrower component of a problem, that together construct a more complex solution. This has been critical to our architecture at RunLLM, and it’s an increasingly common approach among most high-quality AI systems that we see on the market today. Decomposition not only helps address LLMs’ helpfulness problem, it also helps improve latency and reduce costs because you’re not waiting for one giant reasoning model call to solve all your problems. Decomposition also of course encourages specialization because you can customize models that are experts at very narrow tasks, similar to how humans develop expertise for these tasks through experience.
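Here’s a minimal sketch of what that decomposition can look like in code. The `call_llm` wrapper, the `retrieve` parameter, and the four-step breakdown are hypothetical, but the shape — several narrow calls, each of which could be served by a smaller or specialized model, plus a cheap validation step — is the general pattern:

```python
# Illustrative decomposition: several narrow calls instead of one giant prompt.
# `call_llm` is a hypothetical wrapper around whatever model provider you use;
# each step could be served by a smaller, cheaper, or fine-tuned model.

def call_llm(instructions: str, user_input: str) -> str:
    raise NotImplementedError("wire this up to your model provider")

def answer_support_question(question: str, retrieve) -> str:
    # Step 1: classify the question so later steps know what they're handling.
    category = call_llm(
        "Classify the question as 'how-to', 'bug', or 'billing'. Reply with one word.",
        question,
    )

    # Step 2: retrieve only the documentation relevant to that category.
    context = "\n\n".join(chunk.text for chunk in retrieve(question, category))

    # Step 3: draft an answer grounded in the retrieved context.
    draft = call_llm(
        f"Answer this {category} question using only the context below.\n\n{context}",
        question,
    )

    # Step 4: a separate, cheap validation call checks the draft against the
    # context before anything reaches the user.
    verdict = call_llm(
        "Does the answer follow from the context? Reply 'yes' or 'no'.\n\n"
        f"Context:\n{context}\n\nAnswer:\n{draft}",
        question,
    )
    return draft if verdict.strip().lower().startswith("yes") else "Escalating to a human."
```

Each step is small enough to evaluate, swap out, or specialize on its own, which is exactly what a single monolithic prompt doesn’t give you.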
The full talk covers how RunLLM operationalizes each of these aspects, which we’ll skip here to avoid going on for too long. However, in building production-ready AI systems, there’s one more lesson we’ve found critical beyond all the AI-oriented work discussed above: you still have to do all the boring software engineering and integration aspects of the job.
As much as we like to focus on the cool new AI systems and techniques, if those techniques aren’t integrated into customers’ tooling and workflows in a way that makes them easily consumable, you aren’t going to get very far with your product. Not only do you need to integrate your product into all the right tools, you also need to make sure that you give users control over and visibility into what your system is doing. Without good software and UX discipline, the coolest AI in the world will still fail.
All of this serves to emphasize something that we’ve repeated many times: AI adoption is still very early in the grand scheme of things. The rate of innovation and adoption is exciting — the market has matured much faster than we would have expected this year — but we’re all still very much in the process of figuring out how to build the best AI applications and what customers are looking for when they adopt those applications. Achieving AGI is what unlocked the ability to build these applications, but we still have a lot of hard work to do.
We’ve achieved AGI, but that’s step one of many.