Train Agent Skills Like Neural Networks (Without Touching the Weights)
Ever wonder if you could "train" an AI agent to get better at doing things, without actually updating its underlying model weights? That's exactly what Microsoft's new open-source project, SkillOpt, is doing.
It's a clever twist on agent optimization. Instead of fine-tuning a language model, you optimize the sequence of skills or tools the agent uses to complete a task. Think of it as gradient descent for your agent's tool chain.
What It Does
SkillOpt takes a pretrained agent (like a language model with access to tools) and frames its skill selection process as a trainable policy. You define a set of possible skills or API calls the agent can make. Then, instead of backpropagating into the model weights, SkillOpt learns which skills to chain together, in what order, and when to call them to maximize performance on a specific task.
In practice, this means you can take a general-purpose agent and "specialize" it for your domain just by optimizing its skill usage. The model weights stay frozen. The way the agent uses its skills gets smarter.
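To make "a trainable policy over skills" concrete, here is a toy sketch. It is not SkillOpt's code or API; the skill names, the target chain, and the reward function are all made up for illustration, and the update rule is a plain REINFORCE-style policy gradient over a small table of logits, which is one common way to train a discrete policy. The point is only that the numbers being learned live outside the language model.

```python
import math
import random

# Hypothetical skill names and a toy "correct" chain, for illustration only.
SKILLS = ["search_web", "extract_text", "summarize"]
TARGET = ["search_web", "extract_text", "summarize"]


def softmax(logits):
    """Turn a row of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(chain):
    """Toy reward: fraction of positions that match the target chain."""
    return sum(a == b for a, b in zip(chain, TARGET)) / len(TARGET)


def train(steps=3000, lr=0.1, seed=0):
    """REINFORCE-style updates on per-position skill logits.

    The logits table is the only thing that changes; no model weights exist here.
    """
    rng = random.Random(seed)
    logits = [[0.0] * len(SKILLS) for _ in TARGET]
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        picks = [rng.choices(range(len(SKILLS)), weights=p)[0] for p in probs]
        r = reward([SKILLS[i] for i in picks])
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        for pos, choice in enumerate(picks):
            for k in range(len(SKILLS)):
                # Gradient of log-probability of the sampled skill w.r.t. logit k.
                grad = (1.0 if k == choice else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad
    return logits


def greedy_chain(logits):
    """Read off the most likely skill at each position."""
    return [SKILLS[row.index(max(row))] for row in logits]


if __name__ == "__main__":
    print(greedy_chain(train()))
```

In a real setup the reward would come from actually running the agent on the task, which is far noisier and more expensive than this toy function, so treat the step count and learning rate here as placeholders.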
Why It's Cool
No fine-tuning nightmares. You don't need a GPU cluster, and you don't have to worry about catastrophic forgetting. SkillOpt treats skill selection like a reinforcement learning problem, but it's surprisingly sample-efficient.
Interpretable by design. Since the optimization is over discrete tool calls, you can actually see what the agent learned to do. It's not a black box weight update. It's a clear sequence of "first call A, then call B, then use the result for C."
Drastic performance gains. Early benchmarks show that optimizing skill sequences can match or beat fine-tuned agents on complex tasks like web navigation, data extraction, and multi-step reasoning. All without ever updating a single weight.
Works with any LM. Since you're not touching model weights, you can swap out the underlying model without retraining. Use GPT-4 today, try a smaller open model tomorrow, and keep the skill policy.
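The interpretability and model-swapping points can be pictured with a small sketch. Everything here is hypothetical (the skill functions, the `run_chain` helper, and the two stub models are stand-ins, not part of SkillOpt): the learned policy is just a readable list of skill names, so it can be saved as JSON and replayed against whichever model you plug in.

```python
import json
from typing import Callable, Dict, List

Model = Callable[[str], str]         # stand-in for any text-in, text-out LM
Skill = Callable[[str, Model], str]  # a skill transforms the state, optionally via the model


def search_web(state: str, model: Model) -> str:
    # Placeholder: a real skill would call a search API here.
    return f"results for: {state}"


def summarize(state: str, model: Model) -> str:
    # This skill delegates to whatever model was passed in.
    return model(f"Summarize: {state}")


SKILLS: Dict[str, Skill] = {"search_web": search_web, "summarize": summarize}


def run_chain(chain: List[str], task: str, model: Model) -> str:
    """Apply each named skill in order, threading the state through."""
    state = task
    for name in chain:
        state = SKILLS[name](state, model)
    return state


# The "learned" policy is plain data you can read, diff, and store.
policy_json = json.dumps(["search_web", "summarize"])
chain = json.loads(policy_json)


def big_model(prompt: str) -> str:
    return f"[big] {prompt}"      # stub in place of a large hosted model


def small_model(prompt: str) -> str:
    return f"[small] {prompt}"    # stub in place of a smaller open model


print(run_chain(chain, "agentic AI papers", big_model))
print(run_chain(chain, "agentic AI papers", small_model))
```

The same chain runs against both stubs unchanged. Whether a policy tuned with one real model transfers well to another is an empirical question this sketch does not answer.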
How to Try It
Clone the repo and install it:
git clone https://github.com/microsoft/SkillOpt
cd SkillOpt
pip install -r requirements.txt
Then take a look at the examples folder. There's a quickstart notebook that walks you through defining skills for a web agent and running the optimization loop. It runs on CPU just fine for small experiments.
from skillopt import SkillOptimizer

optimizer = SkillOptimizer(
    model="gpt-4",  # or any compatible API
    skills=["search_web", "extract_text", "summarize", "navigate_to"],
)
optimizer.optimize(task="find the latest paper on agentic AI and summarize it")
The library handles the rest: exploration, reward estimation, and policy up