Feature: --workspace-scope so conductor honors scoped applyTo in .github/instructions/ #231

@franklixuefei

Problem summary

Conductor v0.1.17's --workspace-instructions discovers .github/instructions/**/*.instructions.md but filters them through _is_always_on_instructions_file, which loads ONLY files with applyTo: "**" (src/conductor/config/instructions.py:145-166). Files with any scoped applyTo glob — **/*.cs, tests/**, src/**, services/foo/**, etc. — are silently skipped.

GitHub Copilot's documented applyTo semantics are per-path scoping: a file with applyTo: "**/*.cs" applies whenever any C# file is involved; one with applyTo: "services/foo/**" applies to that service's tree. These are correct uses of the convention, not misuse. Conductor's all-or-nothing filter doesn't honor them: it treats anything other than the literal "**" as "skip unconditionally," which excludes the bulk of real-world per-area instructions.

The gap is structural to the filter and applies to any repo that uses applyTo for scoping — single-service flat repos with per-language or per-area scoping, library repos splitting src/ vs examples/ vs docs/, and monorepos with per-service scoping. Monorepos are the most extreme manifestation because they typically have the most instruction files and the strictest service boundaries, but they're not the only affected case.

The conductor team explicitly anticipated the simpler form of this gap in #168:

"Some users may misunderstand the convention and create .github/instructions/foo.instructions.md without applyTo expecting it to be always-on. The conductor design correctly reflects the convention's documented semantic; if user feedback shows widespread misunderstanding, a future PR can add an explicit --include-unscoped-instructions flag."

This issue is the user-feedback signal — with a stronger ask. The legitimate case is not misunderstanding (which --include-unscoped-instructions would address) — it's correct use of scoped applyTo where conductor has no path-aware mechanism to honor it for in-scope files. The proposed fix is --workspace-scope <path> (preserves applyTo fidelity); --include-unscoped-instructions (the simpler form #168 named) is documented as a minimum interim alternative.

Evidence (ground-truthed against upstream/main @ efa520f, conductor v0.1.17)

Where the filter lives

src/conductor/config/instructions.py:

# L177
CONVENTIONS: list[Convention] = [
    ConventionFile("AGENTS.md"),
    ConventionFile(".github/copilot-instructions.md"),
    ConventionFile("CLAUDE.md"),
    ConventionDirectory(
        path=".github/instructions",
        pattern="*.instructions.md",
        include_file=_is_always_on_instructions_file,   # ← the filter
        recursive=True,
    ),
]

# L145
def _is_always_on_instructions_file(path: Path) -> bool:
    fm = _parse_frontmatter(path)
    if fm is None:
        return False
    return fm.get("applyTo") == "**"   # L166 — exact-equality "**" only

Real-world data (Azure Chaos Studio)

Truth-grounded inventory of .github/instructions/*.instructions.md files in a real Azure repo. The repo happens to be a monorepo (5 service pillars: ARM Gateway, Backend, Action Providers, Agent Services, Portal Extension), but most of the applyTo globs are language- or area-scoped rather than service-scoped — i.e., the same pattern that appears in single-service repos:

| File | applyTo | Loaded by conductor today? |
|---|---|---|
| arm-rpc-guidelines.instructions.md | `services/GW/**;services/BE/**` | No |
| csharp-coding-standards.instructions.md | `**/*.cs` | No |
| eng_ms.instructions.md | `/docs/eng.ms/*.*; /docs/eng.ms/**/*.*` | No |
| engineering-standards.instructions.md | `**/*.cs,**/*.csproj,**/Directory.Packages.props,**/global.json,**/*.bicep,**/*.md` | No |
| poc-warning.instructions.md | (uses non-standard `glob:` field, no applyTo) | No |
| portal-ux-patterns.instructions.md | `**/portal-extension/**,**/portal-ux/**,**/ux-accelerator/**` | No |
| specs-arm-api.instructions.md | `docs/specs/arm-api-published/*.*; docs/specs/arm-api-published/**/*.*` | No |
| testing-standards.instructions.md | `**/*Tests.cs,**/*Tests/**,**/*UnitTests*/**,**/*HermeticTests*/**` | No |
| workload-identity.instructions.md | `**/kubectl/*.yaml,**/kubectl/*.yml,**/identity.bicep,**/ev2/**` | No |

Total: 52 KB / ~13K tokens. 0 / 9 files loaded today. All 9 have substantive description frontmatter and contain real engineering guidance (ARM RPC patterns, async/await rules, testing categories, AKS workload identity setup, etc.).

These globs are not misuse of applyTo — they correctly express the convention's per-path scoping. Promoting them all to applyTo: "**" would falsify the convention's intent and inject (e.g.) AKS workload-identity guidance into agents working on portal-extension code, or kubectl YAML rules into agents writing eng.ms docs.

Repro

mkdir repro && cd repro
git init -q
mkdir -p .github/instructions src tests

# Convention file — gets loaded (control)
echo "Repo-wide rule." > .github/copilot-instructions.md

# Scoped instructions — none get loaded, despite all using applyTo correctly
cat > .github/instructions/csharp.instructions.md <<'EOF'
---
description: C# coding standards
applyTo: "**/*.cs"
---
Use file-scoped namespaces.
EOF

cat > .github/instructions/testing.instructions.md <<'EOF'
---
description: Test-specific patterns
applyTo: "tests/**"
---
Tests must use the Arrange-Act-Assert pattern.
EOF

# Create a trivial workflow
cat > wf.yaml <<'EOF'
workflow:
  name: repro
  entry_point: agent_a
agents:
  - name: agent_a
    prompt: "Echo your workspace_instructions."
    routes:
      - to: $end
EOF

conductor run --workspace-instructions wf.yaml

Expected: agent prompt's <workspace_instructions> block contains content from csharp.instructions.md and testing.instructions.md when working in a context that touches .cs files or tests/ paths.
Actual: block contains only copilot-instructions.md. Both scoped files are silently skipped, with no caller-side way to opt them in based on actual scope.

Note that this is a single-service flat repo, not a monorepo. The same gap manifests there because **/*.cs and tests/** are legitimate scoped uses of applyTo — not just services/foo/**-style monorepo paths.

Proposed fixes

Listed in order of recommendation. Option B (--workspace-scope) is the architecturally correct fix and the recommendation of this issue. Option A (--include-unscoped-instructions) is a simpler interim that #168 already named; it ships faster but surrenders scoping fidelity. Option A alone is not the long-term answer.

Option B — --workspace-scope <glob> (repeatable; recommended)

Add a repeatable CLI flag declaring "this agent run is operating in these scope(s)." Filter becomes: include if applyTo == "**" OR if applyTo's glob intersects any --workspace-scope glob.

conductor run --workspace-instructions \
  --workspace-scope 'services/api/**' \
  --workspace-scope '**/*.cs' \
  wf.yaml
  • Implementation: new repeatable CLI flag, a glob-intersection helper, and a pass-through to a scope-aware predicate. Exact glob-to-glob intersection is decidable (plain globs describe regular languages) but fiddly to implement; a conservative "shares-any-literal-prefix-or-overlaps" approximation is sufficient.
  • Pros:
    • Preserves convention semantics — applyTo continues to mean "applies to matching paths"; conductor honors it, just per-run-scope instead of per-Chat-file.
    • Preamble bloat scales with actual scope, not with total instruction inventory.
    • Convention-agnostic and forward-compatible — when .cursor/rules/, .clinerules/, or any future convention with scoping frontmatter is registered, the same flag works with zero caller changes.
    • Workflows that know their scope (octane epic-level Files Affected, skyship stack/fix-review CWD, pr-orchestrator's target_branch diff) can pass it; workflows that don't (open-ended planning) pass nothing and fall back to current strict behavior.
  • Cons: larger surface than Option A; glob-intersection semantics need careful spec; callers must compute and pass scope (acceptable — they already know it for the workflows where it matters).
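
A rough sketch of the "shares-any-literal-prefix-or-overlaps" approximation from the Implementation bullet, assuming scopes are passed as globs. `may_overlap` and its helpers are hypothetical names, not conductor code. The check is a necessary condition only: any path matched by both globs must start with both literal prefixes and end with both literal suffixes, so incompatible prefixes or suffixes rule an overlap out, while compatible ones do not prove one. It therefore errs toward loading an extra file rather than dropping a relevant one.

```python
def _literal_prefix(glob: str) -> str:
    """Text before the first wildcard character (*, ?, [)."""
    hits = [i for i in (glob.find(c) for c in "*?[") if i != -1]
    return glob[: min(hits)] if hits else glob


def _literal_suffix(glob: str) -> str:
    """Text after the last wildcard character (*, ?, ])."""
    last = max(glob.rfind(c) for c in "*?]")
    return glob[last + 1:]  # whole glob when no wildcard is present


def may_overlap(a: str, b: str) -> bool:
    """Conservative test: False means the two globs cannot share a path."""
    pa, pb = _literal_prefix(a), _literal_prefix(b)
    sa, sb = _literal_suffix(a), _literal_suffix(b)
    prefixes_ok = pa.startswith(pb) or pb.startswith(pa)
    suffixes_ok = sa.endswith(sb) or sb.endswith(sa)
    return prefixes_ok and suffixes_ok


def should_load(apply_to_globs: list[str], scope_globs: list[str]) -> bool:
    """Hypothetical Option B predicate: always-on, or possibly in scope."""
    if "**" in apply_to_globs:
        return True
    return any(may_overlap(g, s) for g in apply_to_globs for s in scope_globs)


print(may_overlap("services/GW/**", "services/api/**"))   # False
print(may_overlap("**/*.cs", "services/api/**"))          # True
print(may_overlap("**/kubectl/*.yaml", "**/*.cs"))        # False
```

The second case shows the over-inclusion tradeoff: `**/*.cs` has no literal prefix, so it is treated as possibly overlapping any directory-scoped glob.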

Flag composition. --workspace-scope should implicitly enable discovery (i.e., implies --workspace-instructions). Declaring scope without enabling discovery would be a no-op, so requiring both flags would be friction without benefit. Callers may still pass --workspace-instructions explicitly for clarity; it's idempotent. This mirrors GitHub Copilot Chat's mental model — the scope IS the input; filtering is automatic.

| Flags | Behavior |
|---|---|
| (none) | No workspace discovery |
| `--workspace-instructions` | Discover; load `applyTo: "**"` files only (today's behavior) |
| `--workspace-scope <p>` | Discover; load `applyTo: "**"` OR globs intersecting `<p>` |
| `--workspace-instructions --workspace-scope <p>` | Same as `--workspace-scope <p>` alone (idempotent) |
| `--workspace-instructions --include-unscoped-instructions` (Option A) | Discover; load every .instructions.md regardless of applyTo |
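
The flag composition above could be resolved along these lines. This is a minimal sketch, not conductor's CLI code: `LoadMode` and `resolve_load_mode` are hypothetical names. Two combinations are not covered by the table and are assumptions here: `--include-unscoped-instructions` takes precedence over scope when both are given, and on its own it does not enable discovery.

```python
from __future__ import annotations

from enum import Enum


class LoadMode(Enum):
    NO_DISCOVERY = "no-discovery"      # no flags
    ALWAYS_ON_ONLY = "always-on-only"  # applyTo: "**" only (today's behavior)
    SCOPED = "scoped"                  # applyTo: "**" plus globs intersecting scope
    EVERYTHING = "everything"          # Option A: every parsed .instructions.md


def resolve_load_mode(
    workspace_instructions: bool,
    workspace_scopes: list[str],
    include_unscoped: bool = False,
) -> LoadMode:
    # --workspace-scope implies --workspace-instructions.
    discover = workspace_instructions or bool(workspace_scopes)
    if not discover:
        return LoadMode.NO_DISCOVERY
    if include_unscoped:  # assumption: Option A wins over scope
        return LoadMode.EVERYTHING
    if workspace_scopes:
        return LoadMode.SCOPED
    return LoadMode.ALWAYS_ON_ONLY


# Passing both flags resolves the same way as passing the scope alone.
print(resolve_load_mode(True, ["services/api/**"]))
print(resolve_load_mode(False, ["services/api/**"]))
```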

Option A — --include-unscoped-instructions (minimum interim, anticipated by #168)

Add a CLI boolean. When set, the convention's include_file predicate is overridden to "include any file whose frontmatter parses, regardless of applyTo value." Files with no frontmatter are still skipped (per current behavior).

conductor run --workspace-instructions --include-unscoped-instructions wf.yaml
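
To make the difference between today's predicate and Option A's concrete, here is a minimal sketch. `parse_frontmatter` is a simplified stand-in (flat `key: value` lines only) for conductor's `_parse_frontmatter`, whose real implementation is not shown in this issue; both predicates take file text rather than a `Path` so the example is self-contained.

```python
from __future__ import annotations


def parse_frontmatter(text: str) -> dict[str, str] | None:
    """Simplified stand-in: return flat key/value pairs, or None if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return None  # opening delimiter without a closing one


def is_always_on(text: str) -> bool:
    """Today's behavior: only the literal applyTo "**" passes."""
    fm = parse_frontmatter(text)
    return fm is not None and fm.get("applyTo") == "**"


def include_any_parsed(text: str) -> bool:
    """Option A: any file whose frontmatter parses, regardless of applyTo."""
    return parse_frontmatter(text) is not None


csharp = '---\ndescription: C# coding standards\napplyTo: "**/*.cs"\n---\nUse file-scoped namespaces.\n'
print(is_always_on(csharp))        # False
print(include_any_parsed(csharp))  # True
```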

Rejected alternative — "always include all, let the agent self-filter." Bundles all instruction content into every prompt and relies on LLM-side filtering. Unreliable, expensive, and surrenders the convention's scoping semantic in caller-invisible ways. Option A is the principled "include everything" mechanism (with a flag the user opts into); LLM self-filtering is not.

How Option B compares to the existing --instructions <path> flag

The repeatable --instructions flag (cli/app.py:326-329, instructions.py:410-424) already exists and loads files unconditionally (no applyTo filter). It is the escape hatch — "I know exactly which files I want; ignore the convention." Option B is the principled mechanism — "honor the convention; here's the context it needs."

| Axis | `--instructions <path>` (today) | `--workspace-scope <glob>` (Option B) |
|---|---|---|
| What you declare | Specific files to load | What scope your agent run is in |
| Who decides relevance | Caller (must enumerate files) | Conductor (consults applyTo per file) |
| Convention semantics | Bypassed: files loaded unconditionally | Honored: applyTo glob respected for matching scope |
| Forward compat with new conventions | None: every new convention requires caller-side changes to know its layout | Convention-agnostic: declare scope once, conductor consults every registered convention |
| Per-call cardinality | N flags (one per file) | Small number (one per service/area touched) |
| Caller knowledge required | File inventory + per-file relevance | The scope the work is in (typically 1–2 globs) |

They are complementary, not redundant: --instructions remains as the escape hatch for "force-load these exact files"; Option B fills the missing primitive "tell conductor where the agent is operating so it can honor applyTo correctly."

Note: a caller-side workaround that globs .github/instructions/*.instructions.md and emits one --instructions <path> per file does work today (zero-conductor-change). But it reimplements conductor's own discovery walk in caller code, bypasses applyTo semantics, has the same all-or-nothing tradeoff as Option A, and won't generalize when conductor registers .cursor/rules/ or other conventions. It should not become the long-term pattern.
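
For reference, the caller-side workaround described in the note looks roughly like this. It is a sketch: `instructions_flags` is a hypothetical helper name, and only the `--instructions <path>` flag itself comes from conductor. It shows why the approach is all-or-nothing: nothing in it consults applyTo.

```python
from __future__ import annotations

from pathlib import Path


def instructions_flags(repo_root: Path) -> list[str]:
    """Emit one --instructions <path> pair per .instructions.md file.

    Every file is force-loaded; applyTo is never read.
    """
    instructions_dir = repo_root / ".github" / "instructions"
    flags: list[str] = []
    # A missing directory simply yields no matches.
    for path in sorted(instructions_dir.glob("*.instructions.md")):
        flags.extend(["--instructions", str(path)])
    return flags


# Usage: cmd = ["conductor", "run", *instructions_flags(Path.cwd()), "wf.yaml"]
```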

Integration sketches (concrete consumers for Option B)

To show that --workspace-scope isn't an abstract proposal: every Microsoft scenario package that launches conductor today already has the scope information it needs. Below are concrete integrations for azure-core/octane's scenarios. The same pattern applies to skyship's launchers, ATS-Copilot's pr-reviewer, and any other scope-aware caller.

Semantic recap

In these sketches, --workspace-scope receives the concrete file paths or directories the agent will operate on rather than hand-written globs. Conductor path-matches each instruction file's applyTo against that list and loads matches. Callers pass what they already know: their changeset, their target file, their epic's Files Affected list. No new user input required.

Sketches below pass both --workspace-instructions and --workspace-scope for clarity. Per the Flag composition table above, --workspace-scope alone is equivalent — --workspace-instructions is idempotent when scope is set.

Sketch 1 — pr-orchestrator (scope derived from existing git changeset)

Timing. pr-orchestrator operates on changes the user has already made locally (or on an already-open PR in later phases). At the moment run-phases.py invokes conductor, git diff target_branch..HEAD always returns a non-empty changeset — Phase 1 reads the dev-branch diff before the PR exists, Phases 2–5 read the PR's diff after creation. The agent never creates a changeset from nothing; it operates on what's already there. So scope is derivable from git state, pre-launch.

Today's central launcher (artifacts/scenarios/pr-orchestrator/skills/deterministic-scripts/scripts/run-phases.py):

cmd = ["conductor", "run", str(workflow_path)]
# ... inputs, --log-file, --no-interactive ...
cmd.append("--workspace-instructions")
cmd.extend(conductor_flags)

With Option B — add one helper and one loop:

import subprocess
from pathlib import Path


def _derive_workspace_scopes(repo_root: Path, target_branch: str) -> list[str]:
    """Return changed file paths for conductor's --workspace-scope.

    Uses `git diff --name-only <merge-base>..HEAD` so conductor's applyTo
    filter loads only the .instructions.md files whose scope intersects
    the PR's actual changeset. Returns [] on any git failure; conductor
    then falls back to applyTo: "**" only (current behavior).
    """
    try:
        merge_base = subprocess.run(
            ["git", "merge-base", f"origin/{target_branch}", "HEAD"],
            cwd=repo_root, capture_output=True, text=True, check=True,
        ).stdout.strip()
        result = subprocess.run(
            ["git", "diff", "--name-only", f"{merge_base}..HEAD"],
            cwd=repo_root, capture_output=True, text=True, check=True,
        )
        return [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    except (subprocess.CalledProcessError, OSError):
        # OSError covers a missing git executable or an invalid repo_root.
        return []

# In run_conductor():
cmd.append("--workspace-instructions")
for path in _derive_workspace_scopes(repo_root, target_branch):
    cmd.extend(["--workspace-scope", path])
cmd.extend(conductor_flags)

Worked example. A user has already modified 5 files under services/GW/ locally and invokes octane-pr-orchestrator. The launcher computes the diff pre-launch and emits:

conductor run --workspace-instructions \
  --workspace-scope services/GW/src/Controllers/CapabilityController.cs \
  --workspace-scope services/GW/src/Repositories/CapabilityRepository.cs \
  --workspace-scope services/GW/tests/CapabilityControllerTests.cs \
  --workspace-scope services/GW/src/Mappers/CapabilityMapper.cs \
  --workspace-scope services/GW/src/Models/Capability.cs \
  ...

Conductor's applyTo-aware filter then loads 4 of the 9 instruction files:

| File | applyTo | Loaded? | Reasoning |
|---|---|---|---|
| arm-rpc-guidelines | `services/GW/**;services/BE/**` | ✅ | matches services/GW/src/Controllers/... |
| csharp-coding-standards | `**/*.cs` | ✅ | matches every .cs file |
| engineering-standards | `**/*.cs,...,**/*.md` | ✅ | matches every .cs file |
| testing-standards | `**/*Tests.cs,**/*Tests/**,...` | ✅ | matches CapabilityControllerTests.cs |
| portal-ux-patterns | `**/portal-extension/**,...` | ❌ | no overlap |
| workload-identity | `**/kubectl/*.yaml,...` | ❌ | no overlap |
| specs-arm-api | `docs/specs/arm-api-published/*.*` | ❌ | no overlap |
| eng_ms | `/docs/eng.ms/*.*` | ❌ | no overlap |
| poc-warning | (no applyTo) | ❌ | skipped (would need --include-unscoped-instructions) |

Preamble grows by ~30 KB (4 relevant files), not 52 KB (all 9). Agents working on portal-extension PRs stay clean. Agents working on the GW service get exactly the standards they need.
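
The matching in this worked example can be reproduced with a small path-against-glob check. This is a sketch under stated assumptions, not conductor's implementation: `glob_to_regex` and `applies_to_scope` are hypothetical names, applyTo values are split on both `,` and `;` (both separators appear in the Chaos Studio inventory), a leading `/` is ignored, `**/` spans zero or more directories, and `*` stays within one path segment.

```python
from __future__ import annotations

import re


def glob_to_regex(glob: str) -> re.Pattern[str]:
    """Translate a path glob into a regex for use with fullmatch()."""
    out: list[str] = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif glob.startswith("**", i):
            out.append(r".*")         # anything, across separators
            i += 2
        elif glob[i] == "*":
            out.append(r"[^/]*")      # anything within one segment
            i += 1
        elif glob[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out))


def applies_to_scope(apply_to: str | None, scope_paths: list[str]) -> bool:
    """True if applyTo is "**" or any of its globs matches any scope path."""
    if apply_to is None:
        return False
    globs = [g.strip().lstrip("/") for g in re.split(r"[;,]", apply_to) if g.strip()]
    if "**" in globs:
        return True
    return any(
        glob_to_regex(g).fullmatch(p.lstrip("/")) for g in globs for p in scope_paths
    )


# Inventory from the Chaos Studio table; None marks the file with no applyTo.
INVENTORY = {
    "arm-rpc-guidelines": "services/GW/**;services/BE/**",
    "csharp-coding-standards": "**/*.cs",
    "eng_ms": "/docs/eng.ms/*.*; /docs/eng.ms/**/*.*",
    "engineering-standards": "**/*.cs,**/*.csproj,**/Directory.Packages.props,**/global.json,**/*.bicep,**/*.md",
    "poc-warning": None,
    "portal-ux-patterns": "**/portal-extension/**,**/portal-ux/**,**/ux-accelerator/**",
    "specs-arm-api": "docs/specs/arm-api-published/*.*; docs/specs/arm-api-published/**/*.*",
    "testing-standards": "**/*Tests.cs,**/*Tests/**,**/*UnitTests*/**,**/*HermeticTests*/**",
    "workload-identity": "**/kubectl/*.yaml,**/kubectl/*.yml,**/identity.bicep,**/ev2/**",
}
CHANGED = [
    "services/GW/src/Controllers/CapabilityController.cs",
    "services/GW/src/Repositories/CapabilityRepository.cs",
    "services/GW/tests/CapabilityControllerTests.cs",
    "services/GW/src/Mappers/CapabilityMapper.cs",
    "services/GW/src/Models/Capability.cs",
]

loaded = [name for name, glob in INVENTORY.items() if applies_to_scope(glob, CHANGED)]
print(loaded)
# ['arm-rpc-guidelines', 'csharp-coding-standards', 'engineering-standards', 'testing-standards']
```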

Sketch 2 — octane-workflow-implement (scope declared in the plan doc)

Timing — different from Sketch 1. Unlike pr-orchestrator, the implement agent is creating the files; they don't exist in git when conductor launches. But the plan document (authored by octane-workflow-plan in a prior step) declares which files each epic will touch, in a Files Affected section:

## Epic 2 — Add validation middleware

**Files Affected:**
- `src/api/middleware/validation.ts` (new)
- `src/api/routes/users.ts` (modify)
- `tests/api/middleware/validation.test.ts` (new)
- `docs/api/validation.md` (new)

**Tasks:**
1. Define `ValidationConfig` type
2. ...

The launcher reads this section pre-launch and emits one --workspace-scope per declared path:

import re
from pathlib import Path


def _derive_implement_scopes(plan_path: Path, epic_id: str) -> list[str]:
    """Read the plan doc, locate the specified epic's `Files Affected` section,
    return the listed file paths for conductor's --workspace-scope.

    Returns [] if the plan can't be read, or if the epic or its Files Affected
    section can't be found; conductor then falls back to applyTo: "**" only.
    """
    try:
        plan = plan_path.read_text(encoding="utf-8")
    except OSError:
        # Missing or unreadable plan file: degrade to no scope.
        return []
    epic_pattern = rf"##\s+Epic\s+{re.escape(epic_id)}\b.*?(?=##\s+Epic|\Z)"
    epic_match = re.search(epic_pattern, plan, re.DOTALL)
    if not epic_match:
        return []
    files_section = re.search(
        r"\*\*Files Affected:\*\*\s*\n((?:\s*-\s+.*\n?)+)",
        epic_match.group(0),
    )
    if not files_section:
        return []
    return [
        re.sub(r"\s*\(.*\)\s*$", "", line.strip("- ").strip()).strip("`")
        for line in files_section.group(1).splitlines()
        if line.strip().startswith("-")
    ]

# In the implement launcher:
cmd.append("--workspace-instructions")
for path in _derive_implement_scopes(plan_path, epic_id):
    cmd.extend(["--workspace-scope", path])
cmd.extend(conductor_flags)

For the example epic above, this produces:

conductor run --workspace-instructions \
  --workspace-scope src/api/middleware/validation.ts \
  --workspace-scope src/api/routes/users.ts \
  --workspace-scope tests/api/middleware/validation.test.ts \
  --workspace-scope docs/api/validation.md \
  implement.yaml --input epic=2

The scope here is human-authored intent from the plan, not git output. This is also the natural enforcement point for the plan's "scope containment" discipline that implement.yaml's review prompt already articulates: the implement agent operates within instructions scoped to the files it's allowed to touch.

What if the agent extends beyond declared scope mid-run? Two observations: (1) most applyTo globs are broad (**/*.ts, tests/**, etc.) so they continue to match nearby files even if the agent adds one; only narrow service-prefix globs would miss. (2) Pre-launch scope is a property of all conductor instruction-loading mechanisms — --workspace-instructions, --instructions, and workflow.instructions: are all computed once at launch. --workspace-scope inherits this constraint; it doesn't introduce it. Dynamic per-step reload would be a separate feature, out of scope here.

End-to-end: what happens when a user invokes /octane-workflow-implement

Unlike pr-orchestrator, octane-implement is SKILL.md-driven (no central Python launcher). The user invokes the slash command, and a Copilot CLI agent reads SKILL.md and executes the steps. So the integration is at the SKILL.md instruction level plus a small helper script in the skill's scripts/ directory.

SKILL.md Quick Reference change (2 lines before → 4 lines after):

- conductor --silent run --workspace-instructions assets/implement.yaml \
-   --input plan="feature.plan.md" --input epic="EPIC-001" --web-bg
+ # Pre-launch: derive scope from the plan doc's `Files Affected` section(s)
+ $scopeArgs = & python scripts/derive-implement-scope.py feature.plan.md EPIC-001
+ conductor --silent run --workspace-instructions $scopeArgs assets/implement.yaml \
+   --input plan="feature.plan.md" --input epic="EPIC-001" --web-bg

$scopeArgs expands to multiple arguments in PowerShell, provided the helper prints each token on its own line; bash uses array expansion or xargs. The helper script (scripts/derive-implement-scope.py, ~30 LoC) is the standalone form of the helper function shown earlier: for each file in the epic's Files Affected section it prints --workspace-scope and the path to stdout.

Runtime timeline:

| Step | Who | What |
|---|---|---|
| 1 | User | Types `/octane-workflow-implement feature.plan.md EPIC-001` |
| 2 | Copilot CLI agent | Reads octane-workflow-implement/SKILL.md, follows Quick Reference |
| 3 | Copilot CLI agent | Runs `python scripts/derive-implement-scope.py feature.plan.md EPIC-001`; captures stdout |
| 4 | Helper script | Parses plan doc, locates epic, extracts Files Affected paths, prints `--workspace-scope <path>` per file |
| 5 | Copilot CLI agent | Invokes `conductor run --workspace-instructions --workspace-scope <path> ... assets/implement.yaml --input plan=... --input epic=EPIC-001 --web-bg` |
| 6 | Conductor | Walks CONVENTIONS per today, but now tests each scoped applyTo glob against the --workspace-scope paths; includes matches |
| 7 | Conductor | Prepends `<workspace_instructions>` block (with scoped matches) to every agent prompt in the workflow |
| 8 | Conductor agents (epic_selector, coder, epic_reviewer, committer, plan_reviewer, fixer) | Implement the epic with full awareness of repo-specific standards for the files in scope |

Fallback behavior — strictly additive. If the helper script fails for any reason — plan path doesn't exist, plan lacks Files Affected sections, epic not found, etc. — it prints nothing, the conductor invocation has no --workspace-scope args, and conductor falls back to today's applyTo: "**"-only behavior. If the user omits epic (run all incomplete epics), the script unions scope across every epic's Files Affected — coarser than per-epic, but still better than today. No regression in any case.
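
The "union across every epic" fallback could look like the sketch below. It assumes the plan format shown in Sketch 2 (a `**Files Affected:**` label followed by a bullet list, with optional backticks and a trailing parenthetical). `union_scopes` and `format_scope_args` are hypothetical names, and emitting one token per line is an assumption about how the caller consumes stdout.

```python
from __future__ import annotations

import re

# A "**Files Affected:**" label followed by one or more bullet lines.
_FILES_SECTION = re.compile(
    r"\*\*Files Affected:\*\*[ \t]*\n((?:[ \t]*-[ \t]+.*\n?)+)"
)
# "- `path` (note)" with the backticks and the parenthetical both optional.
_BULLET = re.compile(r"\s*-\s+`?([^`(]+?)`?\s*(?:\(.*\))?\s*$")


def union_scopes(plan_text: str) -> list[str]:
    """Collect paths from every Files Affected section, first-seen order."""
    seen: dict[str, None] = {}
    for section in _FILES_SECTION.finditer(plan_text):
        for line in section.group(1).splitlines():
            match = _BULLET.match(line)
            if match:
                seen.setdefault(match.group(1), None)
    return list(seen)


def format_scope_args(paths: list[str]) -> list[str]:
    """Flatten paths into --workspace-scope tokens, one token per element."""
    return [token for path in paths for token in ("--workspace-scope", path)]


PLAN = """\
## Epic 1 - Setup

**Files Affected:**
- `src/a.ts` (new)
- `src/b.ts` (modify)

## Epic 2 - Validation

**Files Affected:**
- `src/b.ts` (modify)
- tests/b.test.ts (new)
"""

print(union_scopes(PLAN))  # ['src/a.ts', 'src/b.ts', 'tests/b.test.ts']
print("\n".join(format_scope_args(union_scopes(PLAN))))
```

A plan with no Files Affected sections yields an empty list, so the script prints nothing and conductor sees no --workspace-scope arguments.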

Sketch 3 — open-discovery workflows (no scope, intentional)

For workflows whose job is to identify scope (e.g. octane-workflow-plan, open-ended research), the launcher passes no --workspace-scope. Conductor falls back to applyTo: "**" only. This is the right semantic — planning agents should NOT see per-service standards until the plan crystallizes which areas are affected.

Map across other launchers

| Scenario | Scope source | Pattern |
|---|---|---|
| pr-orchestrator | `git diff --name-only` vs target_branch | Sketch 1 |
| octane-workflow-implement | Epic's Files Affected from plan doc | Sketch 2 |
| octane-workflow-plan | None (open discovery) | Sketch 3 |
| doc-reviewer | workflow.input.doc_path | Single `--workspace-scope $doc_path` |
| gatekeeper / replay | Same as pr-orchestrator (diff-based) | Sketch 1 |
| self-serve-bugfix | Bug target path/area (existing input) | One or more `--workspace-scope` per bug-target path |
| skyship stack/fix-review | Branch being processed, CWD | One `--workspace-scope` per changed-file or working dir |
| ATS-Copilot pr-reviewer | PR diff (when CWD-matches-PR check passes) | Sketch 1 |

Two integration principles

  1. Scope is information callers already have. None of these integrations require new user input — every workflow either has a changeset, an explicit path input, or genuinely doesn't have scope. Callers pass what they know.
  2. Empty scope is a valid degraded mode = today's behavior. Adoption is incremental: each launcher opts in at its own pace without affecting others.

SKILL.md guidance for manual invocations

For scenarios where the user invokes conductor run directly from SKILL.md docs, the docs add a single line: "if your workflow targets a specific area, add --workspace-scope <path> so conductor loads scoped instructions for that area." Users whose repo has no scoped instructions ignore it; users with scoped instructions adopt it as needed.

Affected users

Anyone using --workspace-instructions against a repo whose .github/instructions/ files use scoped applyTo globs. This is not monorepo-specific — affected categories include single-service flat repos (**/*.cs, tests/**), library repos (src/ vs examples/ vs docs/), framework-scoped repos (**/*.tsx, scripts/**), and monorepos with per-service scoping (the most acute case).

Direct evidence: Azure Chaos Studio repo (9 files, 0 loaded today; 6/9 globs are language- or area-scoped and would appear identically in non-monorepo repos). Copilot Chat's docs explicitly recommend applyTo for per-area scoping in any repo, so this pattern is conventional, not anomalous.
