In the 6 or so months we’ve been writing this blog, we’ve alluded to what we’re building at RunLLM from time to time — we’re sure a few of you have looked at our (often outdated, until recently!) website — but we haven’t taken the time to explain what we’re building and why. This is mainly because we’ve been working out kinks in the product and iterating on its form factor with early users. We’re (finally) ready to start taking the wraps off.
Briefly, RunLLM is a domain-specific, AI-powered assistant for developer-first tools. You can see RunLLM in action live in the MotherDuck and RisingWave community Slack channels, among others. We're in beta, but if you'd like access, please reach out!
Okay, let’s dive in!
RunLLM is a custom assistant for developer-first tools that can generate code, answer conceptual questions, and help with debugging. Using fine-tuned LLMs and cutting-edge data augmentation and retrieval techniques, RunLLM learns from data like documentation, guides, and community discussions to help developers navigate your product and its APIs. You can integrate RunLLM via our Slack and Discord bots or our web widget (and there's more to come beyond chat!). You can see a quick demo of the RunLLM admin UI here:
When you think of LLM-powered developer assistants, your mind probably jumps to GitHub Copilot or RAG + GPT-4-based solutions. Our approach is fundamentally different — rather than relying on generic LLMs and search techniques, we use fine-tuning to build a narrowly tailored expert on your product. If you were lucky enough to be at Data Council this week, you heard Joey talk about some of what we're working on in his keynote with DJ Patil. We'll share that video when it's available.
RunLLM is built (and trained!) to be an expert on your product. For each assistant, we fine-tune a custom LLM that learns the ins and outs of your API and how you intend users to use it. This approach allows us to optimize the assistant narrowly for your use cases and has a few key benefits:
Expertise without hallucination: Because of its built-in expertise, our fine-tuned LLM can identify which data sources are relevant to a question more accurately than a generic RAG process. This allows us to confidently ground our responses in your data sources and to decline to answer questions we don't have the information for (we sketch the general pattern after this list).
Efficiency via fine-tuning: Training an expert LLM on your API enables us to use smaller base models — in turn, this means we can generate higher-quality results at lower latency and cost.
Tight feedback loops: Because each assistant has a custom LLM and search index, we’re able to establish a tight feedback loop. Our data sources and models are updated frequently to reflect the feedback — positive and negative — that we’re getting. Having a custom LLM means it can improve to match your preferences over time.
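To make the grounding point above concrete, here's a minimal sketch of the general "retrieve, check relevance, then answer or abstain" pattern. This is not RunLLM's implementation — the documents, threshold, and toy bag-of-words scorer are all illustrative assumptions — but it shows why an assistant that knows what it doesn't know can refuse to answer instead of hallucinating.

```python
# Illustrative sketch only, NOT RunLLM internals: retrieve candidate passages,
# score their relevance, and abstain when nothing clears a threshold.
import math
from collections import Counter

DOCS = [
    "To create a table in MotherDuck, run CREATE TABLE my_table (...) from the DuckDB CLI.",
    "RisingWave materialized views are kept up to date incrementally as new events arrive.",
]

RELEVANCE_THRESHOLD = 0.25  # assumed cutoff; a real system would tune this


def _vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer(question: str) -> str:
    q_vec = _vectorize(question)
    scored = sorted(((_cosine(q_vec, _vectorize(d)), d) for d in DOCS), reverse=True)
    best_score, best_doc = scored[0]
    if best_score < RELEVANCE_THRESHOLD:
        # Abstain instead of hallucinating: no source is close enough to ground an answer.
        return "I don't have enough information in the docs to answer that."
    # In a real assistant, best_doc would be passed to the LLM as grounding context.
    return f"Based on the docs: {best_doc}"


print(answer("How do I create a table in MotherDuck?"))   # answers, grounded in DOCS[0]
print(answer("What's the weather like today?"))            # abstains
```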
Simplicity and ease of use are paramount to us, so we've made the process of setting up a new assistant as easy as possible — upload documentation & guides, trigger a fine-tuning job, and integrate RunLLM wherever you work (documentation widget, Slack, Discord, etc.).
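For a rough feel of what a fine-tuning job consumes in general — this is a hypothetical sketch, not RunLLM's actual pipeline — documentation snippets typically get turned into instruction-tuning records in JSONL. The product name, questions, and file paths below are made up; in practice the question/answer pairs would be generated or curated rather than hard-coded.

```python
# Hypothetical sketch, not RunLLM's pipeline: convert documentation snippets into
# the chat-style JSONL training records that most fine-tuning jobs expect.
import json

doc_snippets = [
    {
        "question": "How do I install the CLI?",
        "passage": "Install the CLI with `pip install example-cli`, then run `example init`.",
    },
    {
        "question": "How do I authenticate?",
        "passage": "Run `example login` and paste the API key from your account settings page.",
    },
]

with open("finetune_data.jsonl", "w") as f:
    for snippet in doc_snippets:
        record = {
            "messages": [
                {"role": "system", "content": "You are a support assistant for the Example product."},
                {"role": "user", "content": snippet["question"]},
                {"role": "assistant", "content": snippet["passage"]},
            ]
        }
        f.write(json.dumps(record) + "\n")

print("Wrote", len(doc_snippets), "training examples to finetune_data.jsonl")
```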
The kernel of the idea for RunLLM came from the research group Joey leads in the Sky Lab at UC Berkeley. RunLLM builds on this work, pairing a fine-tuned LLM with a powerful search index and a RAG pipeline optimized for technical products. The result is an assistant that works across most technical knowledge sources and can handle a wider variety of questions.
Building RunLLM over the last 6 months has taught us a ton about building LLM applications. Many of our posts have been informed by our experiences building and sharing the product with customers, but there are a lot more lessons we've learned — everything from the tactical (don't build your application on a single LLM call) to the bigger picture (developers are probably still the best audience for AI tools … for now). We'll keep sharing our lessons here and pulling back the curtain on some of the internals of RunLLM, so stay tuned!
RunLLM is still in private beta, but we're incredibly excited about where we're headed. The feedback we've gotten from users indicates that our response quality is noticeably higher than that of generic RAG-based solutions, and we have a lot more planned — different output modalities, insights into the questions your users are asking, and suggestions for improving your documentation. You can see RunLLM in action in some Slack communities: MotherDuck, RisingWave, and SkyPilot — more coming soon!
If you’re interested in getting access to RunLLM, please reach out!