Gorilla: An LLM for Massive APIs
Last week, we interviewed Shishir Patil and Tianjun Zhang about Gorilla, an LLM fine-tuned to generate code using massive APIs. (Disclosure: Shishir and Tianjun are also Joey’s PhD students.) You should, of course, go watch the full interview here, but here are our most interesting takeaways from the conversation.
Fine-tuning with retrieval in mind outperforms basic RAG. Shishir and Tianjun shared a really interesting concept they developed for Gorilla called retrieval-aware fine-tuning, which effectively merges model fine-tuning and retrieval-augmented generation (RAG). The fine-tuning process teaches the model to read documentation more effectively; at inference time, they use traditional retrieval to feed the relevant API documentation into the model. However, since retrieval can be noisy, the model itself decides at inference time whether the retrieved API is actually relevant to the question; if it isn't, it can ask the retriever to try again. Without fine-tuning, a model like GPT achieves 50-60% accuracy, but with fine-tuning the model is better able to discern relevant documentation, pushing accuracy closer to 90%.
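To make the loop concrete, here's a minimal sketch of that inference-time flow. The `retriever` and `gorilla` objects, their methods, and the prompt format are all assumptions for illustration, not Gorilla's actual interfaces:

```python
# Sketch of retrieval-aware inference: retrieve docs, let the fine-tuned
# model judge relevance, and retry retrieval when the doc is rejected.

def answer_with_retrieval(question: str, retriever, gorilla, max_retries: int = 3) -> str:
    excluded: list[str] = []  # docs the model has already rejected
    for _ in range(max_retries):
        doc = retriever.top_doc(question, exclude=excluded)  # hypothetical retriever API
        prompt = (
            "Use this API documentation to answer the question.\n"
            f"Documentation:\n{doc.text}\n\n"
            f"Question: {question}"
        )
        response = gorilla.generate(prompt)  # hypothetical model API
        # The fine-tuned model is trained to flag irrelevant documentation
        # rather than hallucinate an API call against it.
        if "IRRELEVANT_DOC" not in response:
            return response
        excluded.append(doc.id)  # ask the retriever to try again without this doc
    return gorilla.generate(f"Question: {question}")  # fall back to zero-shot
```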
Retrieval is still critical to ensure freshness. Given the high cost of training LLMs, the model itself can't be updated very frequently, while APIs change all the time. Going back to the previous point, though, the model now has to contend with whether the retriever returned updated API documentation or irrelevant documentation. Impressively, Gorilla is able to generate a reasonably accurate confidence score for the likelihood that the documentation has been updated, allowing it to stay current without requiring expensive training runs.
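One way such a freshness check could look in practice is sketched below; the scoring prompt and the `gorilla.generate` call are illustrative assumptions on our part, not Gorilla's published mechanism:

```python
# Hedged sketch: ask the model how likely the retrieved doc reflects an
# API update relative to what it memorized during fine-tuning.

def doc_freshness_confidence(doc_text: str, gorilla) -> float:
    prompt = (
        "Compare this API documentation against the version you were "
        "trained on. On a scale of 0 to 1, how likely is it that this "
        f"documentation has been updated?\n\nDocumentation:\n{doc_text}"
    )
    return float(gorilla.generate(prompt).strip())  # hypothetical model API

# A high score tells the system to trust the retrieved (newer) documentation
# over the model's memorized version when generating the API call.
```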
There’s a lot of nuance to fine-tuning. This is a bit of a catch-all, but there were a number of interesting takeaways from the conversation:
Full supervised fine-tuning outperforms LoRA. We covered the Low-Rank Adaptation (LoRA) technique last week; it's been incredibly popular for fine-tuning, but Shishir and Tianjun found that full fine-tuning, where all of the weights are updated, outperforms LoRA, though it is of course more expensive.
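For readers who haven't set up the two approaches side by side, here's what the difference looks like with Hugging Face Transformers and PEFT. The tooling and hyperparameter values are our choices for illustration; the interview doesn't specify Gorilla's exact training stack:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Option 1: full supervised fine-tuning -- every weight receives gradients.
# More expensive in memory and compute, but the better performer per the interview.
full_ft_model = base  # train as-is; all parameters are trainable by default

# Option 2: LoRA -- freeze the base model and train small low-rank adapters.
lora_config = LoraConfig(
    r=16,                                 # adapter rank (illustrative value)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(base, lora_config)
lora_model.print_trainable_parameters()   # typically <1% of the full parameter count
```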
The costs aren’t quite as ridiculous as we imagined. The fine-tuning process varies based on which model you’re training, but fine-tuning a Gorilla model takes roughly 8 A100s running for less than 10 hours. The cost of a single fine-tuning run (assuming nothing goes wrong, of course!) is under $1k.
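For back-of-envelope intuition (our arithmetic, not figures from the interview): 8 A100s for 10 hours is 80 GPU-hours, and at an assumed cloud rate of a few dollars per A100-hour that lands in the low hundreds of dollars, so the sub-$1k figure holds even with headroom for a failed run or two.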
Fine-tuning requires data engineering. What doesn’t boil down to data engineering nowadays?! Jokes aside, the process of getting the API documentation, generating instruction labels, and collating the data is as much about the right formats as it is about the training process itself.
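To give a flavor of what that data engineering produces, here's a hedged sketch of what a single training record might look like once the API docs have been scraped and instruction labels generated. The field names and file format are illustrative, not Gorilla's published schema:

```python
import json

# One training example: an instruction (typically generated by
# self-instruct-style prompting), the documentation the model should learn
# to read, and the target API call grounded in that documentation.
example = {
    "instruction": "Load a pretrained image classifier and run inference on a photo.",
    "api_documentation": "torchvision.models.resnet50(weights=...): ...",
    "output": "model = torchvision.models.resnet50(weights='IMAGENET1K_V2')",
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")  # one JSON object per line, a common SFT format
```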
As always, there are many more interesting ideas in the full interview, so you should check it out. Let us know who else you’d like us to interview!