Skip to content

collinear-ai/spider

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

422 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spider

Lightweight on/off-policy distillation engine with a single client interface. Runnable in minimal lines of code.

spider supports two types of jobs:

  • Off-policy distillation, which is to create a dataset with model rollouts to conduct SFT.

    • This script demonstrates how to generate a single-turn instruction dataset with custom processors to create prompt variations.
    • This script demonstrates how to generate a multi-turn user-simulated trajectory dataset, where an LLM is configured to play the role of a user to ask follow-up questions.
    • This script demonstrates how to generated a tool-enabled trajectory dataset calling tools automatically parsed from real-world MCP servers, where the agent can directly update external databases in a sandbox with a user-simulation model in the loop.
  • On-policy distillation, which is to create a training run with online supervision from a teacher model.

    • This script demonstrates how to train on-policy with any teacher model with a different tokenizer, ensuring the correct chat template is used by both models.
    • This script demonstrates how to train on-policy with a specified set of tools so that the teacher can supervise the student's tool executions for multiple turns directly in a sandbox.
    • This script demonstrates how to train on-policy for SWE-agent tasks, executing tool trajectories in concurrent docker environments following standard agent scaffolds.

Highlighted features of the engine includes:

  • Plug-and-play with any tool definitions and custom pre/post-filtering functions. The user only needs to pass a fixed template of tool and filter definitions to the client. The client will recursively parse and package referenced modules, and the server will spin up a sandbox with dependencies to run the generation.

  • On-policy distillation with any chosen model. The backend will realign tokenization differences between student and teacher models to ensure the KL divergence loss is correct.

Install

# install & launch a server
pip install -e .[server]
python -m uvicorn server.app:app --host 0.0.0.0 --port 9000
# install a client
pip install -e .[client] # (available on cpu machines)

Python client API

The following snippet showcases a complete job cycle for a distillation job.

For complete examples across on/off-policy distillation scenarios, see /scripts (which references config files in /config). Each script is responsible for an independent large-scale distillation run.

from spider.client import SpiderClient
from spider.config import AppConfig

def pre_prcess_row(row) -> Dict[str, ANy]: # custom transform of inputs
  return row

def post_process_row(row) -> Dict[str, Any]: # custom transform of outputs
  return row # or None, if unwanted

def tool_call(arg): # custom tool that will execute in sandbox
  return ""

TOOL_SCHEMA = {}

config = AppConfig.load("config/generate_tool_calls.yaml") # define rollout hyperparams
env = ("HF_TOKEN", "OPENAI_API_KEY") # register env variables (auto fetched from the os environment)

with SpiderClient(
  config=config, 
  env=env,
  pre_processor=pre_process_row,
  post_processor=post_process_row
) as client:
    client.add_tool( # add tool
      description="",
      json_schema=TOOL_SCHEMA,
      func=tool_call,
    )

    submission = client.submit_job()
    job_id = submission["job_id"]

    # pool_job streams the distillation process back to client
    status = client.poll_job(job_id, interval=5.0, wait_for_completion=True)

    if status["status"] == "completed":
        client.download_result(job_id, destionation="./artifacts/result.json") # return full data with metadata, optionally upload to HF

Config Snapshot

The following is the config file for an off-policy distillation job, enabling multi-turn user simulations.

server:
  base_url: http://127.0.0.1:9000
job:
  model: 
    provider: vllm
    name: "openai/gpt-oss-120b"
    parameters:
      tensor_parallel_size: 8
      gpu_memory_utilization: 0.85
  source:
    dataset: "hotpotqa/hotpot_qa"
    config_name: "fullwiki"
    split: "train"
    max_examples: 100
    multi_turn: true
    user_simulation_prompt: You are a user prompt generator.
    user_model:
      name: "gpt-5-nano-2025-08-07"
      provider: openai
  generation:
    max_turns: 4
    parameters:
      temperature: 0.7
      top_p: 0.9
      max_tokens: 16384
  output:
    mode: "upload_hf"
    hf:
      repo_id: collinear-ai/spider-openqa-hotpot-gptoss-samples
      private: true

The following is the config file for an on-policy distillation job.

server:
  base_url: http://127.0.0.1:9000
  request_timeout: 120
job:
  model:
    provider: tinker
    name: "Qwen/Qwen3-8B"
  source:
    dataset: nvidia/Nemotron-RL-knowledge-web_search-mcqa
    split: train[0:128]
  generation:
    on_policy: true
    parameters:
      top_p: 0.9
      max_tokens: 32768
      tool_choice: auto
    on_policy_options:
      teacher: moonshotai/Kimi-K2-Thinking
      learning_rate: 1e-7
      groups_per_batch: 64
      lora_rank: 16
      num_substeps: 1
      kl_penalty_coef: 1.0
      kl_discount_factor: 0.0
      loss_fn: importance_sampling
      save_every: 20
  output:
    mode: upload_hf
    hf:
      repo_id: collinear-ai/spider-on-policy-tool-search-qwen-teacher-kimi-k2
      repo_type: model

About

Streamline on-policy/off-policy distillation workflows in a few lines of code

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages