How to build a data moat
Why thoughtful product design increases defensibility
A few weeks ago, we established that all AI agent moats are fundamentally driven by the ability to gather more data. Whether you are building an application that is easy to adopt or solving a deeply challenging technical problem, the fundamental question is the same: Can you gather enough proprietary data to make your agent demonstrably better than any generic alternative?
Importantly, the kind of data that you gather matters. In some cases (e.g., Cursor), you’re going to gather a large volume of relatively broadly applicable data – what code changes were accepted or rejected – that allows you to train better models which improve the product for all your customers. In other cases (e.g., RunLLM), the data (experience) you gather will make you stickier within each individual customer over time.
However, simply having an opportunity to create a moat isn’t the same as executing on it. Business history is littered with companies that had an advantage on paper but failed to operationalize it — or worse, squandered it. To avoid that, we need to look at the actual mechanics of how you build a data moat for an AI agent.
Data Moats are Loops
You should think about a data moat as a loop. We saw the first version of this moat play out in consumer products like search and social media. Google started by building a better product, which garnered orders of magnitude more usage than the competition. That usage generated data which helped improve the product, which in turn led to more usage. By the time the competition was able to catch up on quality, it was too late.
Interestingly, there aren’t many examples of this pattern in the previous generation of enterprise SaaS companies – the obvious exception being a pure data product like ZoomInfo. That’s where we’re seeing the most interesting changes. In the AI era, the framework is very similar, but the focus has shifted from product quality to agent quality, and the audience has expanded to, well, everyone in the world.
When most people think about improving agent quality, they immediately think of model training. This is certainly one path: We’ve seen Cursor do this recently with their Composer model, which they claim is faster and more accurate for code completion than generic LLMs. This works well when the data you’re gathering is generally applicable. Whether you’re writing Python for work at Salesforce or at home for a side project, the correctness of your code and how well it executes the task you prompted are largely similar. That means that training a better model once improves the product for every single one of your customers in one fell swoop. But while model training can be immensely valuable, it isn’t the only way to close the loop.
At RunLLM, we see that the feedback our agent gathers — both passively and actively — during the process of debugging an incident is a moat in itself. That feedback comes in many forms for an AI SRE: Human actions and feedback matter, but so do the agent’s internal findings themselves. For example, we might learn through repeated exposure that one particular dashboard is a very strong indicator of a service failure we’re debugging. That feedback can absolutely be used to train an expert foundation model over time, but its more immediate value is in guiding the agent’s decision-making in real-time. If the pattern’s repeated 3 times in a row, it should be at the very top of our hypothesis list the 4th time around.
You should be thinking about how to gather data for the long loop (model training) and the short loop (empirical guidance) simultaneously. Given the volume of data that model training requires and the cost and latency of a fine-tuning run, you will be forced to wait weeks or months for improvement if you rely solely on training. In fact, a well-implemented short loop makes your periodic training runs more effective, because an agent that improves in the short loop produces higher-quality, more precise feedback.
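To make the short loop concrete, here is a minimal Python sketch of the kind of empirical guidance described above: promote a hypothesis once it has repeatedly turned out to be the root cause. The class name, threshold, and example strings are hypothetical illustrations of the idea, not RunLLM’s actual implementation.

```python
from collections import Counter

class ShortLoopMemory:
    """Toy short-loop memory: count how often a hypothesis turned out to be
    the real root cause, and promote it in future incidents. No model
    training involved; this is purely empirical guidance."""

    def __init__(self, promote_after: int = 3):
        self.confirmations = Counter()  # hypothesis -> confirmed root-cause count
        self.promote_after = promote_after

    def record_outcome(self, hypothesis: str, was_root_cause: bool) -> None:
        # Passive feedback: the agent observes the outcome of its own investigation.
        if was_root_cause:
            self.confirmations[hypothesis] += 1

    def rank(self, hypotheses: list[str]) -> list[str]:
        # Repeatedly confirmed hypotheses float to the top; the rest keep
        # their original order (Python's sort is stable).
        return sorted(
            hypotheses,
            key=lambda h: self.confirmations[h] >= self.promote_after,
            reverse=True,
        )

memory = ShortLoopMemory()
for _ in range(3):
    memory.record_outcome("latency spike on the payments dashboard", was_root_cause=True)

# On the fourth incident, the repeatedly confirmed hypothesis leads the list.
print(memory.rank(["stale config push", "latency spike on the payments dashboard"]))
```

Nothing here requires a training run: the “model” is just a counter that steers the agent’s next investigation, which is exactly why the short loop can compound so much faster than the long one.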
Loop Density: Frequency and Trust
Defining your loop is step one. Step two is understanding its density — the frequency and reliability with which that loop occurs.
Consider the slide creation paradox we mentioned in the last post. Why hasn’t an AI slide tool established the same data moat as a coding assistant – or even achieved similar quality? At first glance, coding sounds like a much harder problem than making slides – sure, it’s text only, but the number of degrees of freedom is much higher.
The delta comes down to the unit of work. With Cursor, the unit of work is a few lines of code; with a slide creator, the unit of work has historically been a deck. (It doesn’t have to be this way, but that’s a topic for another post!) The unit of work determines both how frequently the task happens and the cost of failure. If a coding agent in 2023 gave you a bad code suggestion, you rejected it and wrote the code by hand – annoying, but not the end of the world. You’d try again once you finished this task and moved on to the next one, and if the agent was right, say, 3 out of 5 times, that was probably good enough.
On the other hand, generating a slide is an end-to-end task where the cost of being wrong is high. In the early days of AI slide generators, the first result was often so bad that it felt like it would be easier to do it from scratch rather than trying to fix what was created. That generated plenty of signal about what was bad but relatively little about what was good since users would lose trust and break the loop.
Cursor’s unfair advantage was its ability to be good enough to maintain user trust in the early days while gathering massive volumes of data over the course of a couple of years. They earned the right to build a better model later because they had the frequency of use from day one.
How to Scale Your Loop
If data is your moat, your job is to maximize the amount of high-fidelity data you gather. Relying solely on explicit human feedback (like “accept” or “reject” buttons) is great when possible but is often too slow, depending on the kind of agent you’re building. That means that you need to think about creative approaches to maximize the data that you gather.
At RunLLM, we maximize our loop by generating multiple hypotheses for every issue. For each hypothesis, we check an internal catalog to see which observability tools are relevant, then trigger a separate investigation for each one. Without getting too deep into the weeds of our specific implementation, this means that for every incident, we not only increase the chance of identifying the correct root cause quickly but also gather lots of data about what was not relevant.
This extra data is incredibly valuable. Tactically, it helps us understand things like which systems are likely to be associated with symptoms, which data streams are prone to anomalies, and how to refine our hypothesis formulation for the next issue. More broadly, this negative signal is incredibly cheap for us to gather. An engineer is likely going to be lazy about giving precise feedback, but our agent can teach itself both negative and positive signal by observing the outcomes of its own actions.
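Here is a simplified Python sketch of that fan-out pattern: each hypothesis is mapped to the observability tools worth consulting, every investigation runs, and every outcome, relevant or not, is recorded. The catalog, tool names, and investigate stub are hypothetical stand-ins, not a description of RunLLM’s internal systems.

```python
from dataclasses import dataclass

@dataclass
class InvestigationResult:
    hypothesis: str
    tool: str        # e.g., "metrics", "logs", "traces"
    relevant: bool   # did this tool surface evidence for the hypothesis?

# Hypothetical catalog mapping hypotheses to the observability tools worth
# consulting; a real system would derive this from configuration or metadata.
TOOL_CATALOG = {
    "connection pool exhaustion": ["metrics", "logs"],
    "bad deploy": ["traces", "logs"],
}

def investigate(hypothesis: str, tool: str) -> bool:
    # Stand-in for a real per-tool investigation (queries, anomaly checks, etc.).
    # Here we pretend only the metrics check pans out, so both signal types appear.
    return tool == "metrics"

def fan_out(hypotheses: list[str]) -> list[InvestigationResult]:
    results = []
    for hypothesis in hypotheses:
        for tool in TOOL_CATALOG.get(hypothesis, []):
            relevant = investigate(hypothesis, tool)
            # Record every outcome: hits guide the current incident, while
            # misses become cheap negative signal for future ones.
            results.append(InvestigationResult(hypothesis, tool, relevant))
    return results

results = fan_out(["connection pool exhaustion", "bad deploy"])
negative_signal = [r for r in results if not r.relevant]
print(f"{len(results)} investigations, {len(negative_signal)} pieces of negative signal")
```

The point of the sketch is the bookkeeping, not the investigations themselves: by logging misses as diligently as hits, the agent generates feedback on every incident without asking a human to do anything.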
While there’s no simple answer, here are some of the things you should be asking yourself as you think about your own data loops:
Frequency: Does this event happen a thousand times a day (like a code edit) or once a year (like a rocket launch)?
Feedback: Does feedback about what was good or bad come naturally from using the app, or does a person have to go out of their way to tell you whether you were right or wrong?
Ground Truth: Will you be able to easily determine what the ultimate correct answer was, or will that take specific expertise that’s rare and expensive?
As a rough hypothesis, we think you need frequency and either feedback or ground truth. Gathering lots of data without a way to analyze it and turn it into progress is unlikely to be useful.
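As a back-of-the-envelope way to apply that rule of thumb, here is a tiny Python helper that encodes it: a loop is worth betting on only if it has frequency and at least one of natural feedback or cheap ground truth. The numeric threshold is arbitrary and purely illustrative.

```python
def loop_is_viable(events_per_day: float,
                   natural_feedback: bool,
                   cheap_ground_truth: bool,
                   min_events_per_day: float = 100.0) -> bool:
    # Rule of thumb from above: frequency AND (feedback OR ground truth).
    # The 100-events/day cutoff is an arbitrary illustrative threshold.
    frequent_enough = events_per_day >= min_events_per_day
    return frequent_enough and (natural_feedback or cheap_ground_truth)

# Code edits: thousands of events per day with built-in accept/reject feedback.
print(loop_is_viable(5_000, natural_feedback=True, cheap_ground_truth=False))   # True
# Rocket launches: excellent ground truth, but far too infrequent.
print(loop_is_viable(0.01, natural_feedback=False, cheap_ground_truth=True))    # False
```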
Wrapping Up
No business is foolproof. Whatever moat you think you have, you will have to defend it tooth and nail – especially in AI markets with unbelievable amounts of competition. As the technology evolves, the opportunities for disruption will only increase. However, a moat built on process data and experience data is the hardest to recreate. In the same way that no one truly recreated Facebook’s social graph in the 2010s, it is unlikely that anyone will easily recreate the data gathering model that Cursor has established.
Realistically, something will come along to create the next opportunity for disruption. No one was able to recreate Facebook’s social graph, but by the 2020s, the social graph stopped mattering – TikTok and Instagram’s algorithms became the stars of the show. But while you have the moat, you have to make the most of it. As long as agent quality remains the dominant factor in enterprise decision-making, your data loop is your strongest defense.