---
theme: default
title: Agentic Engineering Workshop
info: First full Slidev implementation artifact synthesized from the reconciled workshop plan and drafted component artifacts.
colorSchema: dark
highlighter: shiki
lineNumbers: true
mdc: true
monaco: true
aspectRatio: 16/9
fonts:
  sans: Inter
  mono: JetBrains Mono
---

Agentic Engineering Workshop

Context Engineering + Harness Engineering + Agentic Workflows

  • For developers moving from autocomplete to reliable agent systems
  • Live demos: context before/after, Speckit SDD, MAF HR onboarding
  • Format: Slidev, dark theme, Mermaid-ready, GIF fallbacks where needed

[Visual: GitHub-dark cover with the three pillars color-coded]


Slido Poll 1 — Where are you today?

  • I only use autocomplete
  • I use chat regularly
  • I have tried agent mode a little
  • I use AI agents weekly
  • I build custom workflows already

[Visual: Slido join QR code + room code]


The Evolution Ladder

```mermaid
flowchart LR
  A[Manual coding] --> B[Autocomplete]
  B --> C[Chat]
  C --> D[Agent mode]
  D --> E[Multi-agent]
  E --> F[Agentic workflows]
```
  • Autocomplete predicts the next line
  • Agent mode tries to complete the job
  • This workshop is about climbing from useful assistance to reliable systems

Why this matters now

  • 20M+ Copilot users; 77K+ enterprise organizations
  • Copilot now writes ~46% of active developers' code
  • 84% of professional developers use AI tools daily
  • Adoption is no longer the bottleneck; reliability is
  • Code is cheap now; quality is still expensive once you include tests, review, and explainability

[Visual: staircase infographic with a moving “you are here” marker]


What is Agentic Engineering?

  • Agent = software that lets an LLM run tools in a loop to achieve a goal
  • Coding agent = that loop plus repo access, code edits, and execution
  • Agentic engineering = disciplined, reviewable engineering amplified by that loop
  • Context Engineering = give the agent the right knowledge
  • Harness Engineering = keep the loop on track with constraints, feedback, and cleanup
  • Agentic Workflows = apply both to real delivery loops in code or business operations

Vibe coding vs agentic engineering

| Vibe coding | Agentic engineering |
| --- | --- |
| Prompt into existence | Direct a tool-using loop toward a goal |
| One-shot output | Loop: research → plan → execute → verify |
| Prototype-grade confidence | Reviewable specs, tests, and notes |
| "Seems fine" | "I reviewed it, ran it, and can explain it" |
| Hard to maintain | Easy to rerun, review, and improve |

The conceptual framework

```mermaid
flowchart TD
  CE[Context Engineering] --> AW[Agentic Workflows]
  HE[Harness Engineering] --> AW
  BO[Built-in orchestration] --> AW
  CO[Custom orchestration] --> AW
  AW --> ENG[Engineering delivery]
  AW --> BIZ[Business workflows]
```
  • Better context improves decisions
  • Better harnesses improve reliability
  • Orchestration decides how work moves

Glossary checkpoint

| Term | Plain-English meaning |
| --- | --- |
| Agent | An LLM using tools in a loop to achieve a goal |
| Vibe coding | Prototype-first prompting with light review rigor |
| Context window | The model's working memory for this turn |
| Skill | A reusable training manual loaded on relevance |
| Instruction | Always-on guidance or policy |
| MCP server | A tool/capability provider for the agent |
| Subagent | A specialist worker with its own context window |
| HITL | Human-in-the-loop approval or review checkpoint |
| Cognitive debt | Working code the team can no longer confidently explain |
| Agentic workflow | A repeatable process executed by agents |

Context Engineering: the working definition

  • Anything outside the agent's context effectively does not exist
  • Context Engineering = curate, structure, and inject the right knowledge at the right time
  • Best context is often a reusable proof: starter repos, specs, successful diffs, screenshots
  • Hoard things you know how to do so agents can recombine working examples instead of guessing from theory

Context anti-patterns

| Anti-pattern | Failure mode | Better move |
| --- | --- | --- |
| Context dump | token waste, noisy output | progressive disclosure |
| Context starvation | generic, convention-breaking output | repo instructions |
| No reusable proofs | agent guesses from theory | hoard examples + starter repos |
| Stale context | agents follow old patterns | freshness checks + reviews |
| Tool sprawl | capabilities crowd out reasoning | add MCP intentionally |

Copilot's layered context model

```mermaid
flowchart TB
  S[System + safety] --> R[Repo instructions]
  R --> P[Path instructions]
  P --> A[Selected agent / prompt]
  A --> K[Skills loaded on relevance]
  K --> C[Current files + diagnostics + terminal]
  C --> LLM[Model context window]
```
  • Skills are lazy-loaded expertise
  • MCP adds capabilities through connected tools

Context precedence and runtime behavior

  • Personal / user instructions > repository instructions > organization rules
  • On-demand: prompts, agents, skills
  • Automatic: current file, open tabs, diagnostics, terminal output
  • Compaction kicks in near a full context window; older turns are summarized
  • Practical rule: use skills for knowledge, MCP for capabilities

Demo 1 — Same prompt, weak context

  • Prompt: “Create a user registration endpoint with password hashing and validation.”
  • No project instructions
  • Typical failure modes:
    • generic naming and folder placement
    • weak hashing or missing validation
    • no ProblemDetails / inconsistent error handling

[GIF: Context Engineering before/after — generic output]


Demo 1 — Same prompt, upgraded context

  • Add repo instructions: BCrypt, Clean Architecture, ProblemDetails, FluentValidation, CancellationToken
  • Re-run the exact same prompt
  • Output now follows house style, safer defaults, and reviewable conventions
  • Teaching beat: same prompt, same model, different context

[GIF: Context Engineering before/after — improved output + fallback screenshot]
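As a concrete sketch, the upgraded repo instructions could look like this (an illustrative fragment built from the bullets above, not the demo's actual file):

```markdown
<!-- .github/copilot-instructions.md (illustrative excerpt) -->
# API conventions
- Hash passwords with BCrypt; never roll custom hashing.
- Follow Clean Architecture layering: endpoint → application service → domain.
- Return errors as RFC 7807 ProblemDetails, never raw strings.
- Validate request DTOs with FluentValidation.
- Every async endpoint accepts and forwards a CancellationToken.
```

A file this small is enough to move the output from generic to house style.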


Slido Poll 2 — Which context injection method surprised you?

  • repository instructions
  • path-specific instructions
  • skills
  • MCP servers
  • editor / terminal context

[Visual: Slido QR code + short debrief prompt]


Token economics: skills vs MCP

| Use it for | Skills | MCP |
| --- | --- | --- |
| Purpose | expertise / procedure | external capabilities / tools |
| Loading model | lazy | eager |
| Scale signal | ~1,000 skills ≈ 15K tokens | 10 tools ≈ 50K tokens |
| Best use | "How should it do this?" | "What can it access or act on?" |
  • Top workshop MCP examples: GitHub, Context7, Playwright, Chroma, Filesystem
  • Rule of thumb: prefer skills by default; add MCP intentionally

Harness Engineering: why it exists

  • Writing code is cheap now; correctness, reviewability, and changeability still cost real effort
  • Great prompts are not enough for repeated, high-stakes work
  • Harness Engineering = the scaffolding around the agent:
    • context sources
    • architectural constraints
    • feedback loops
    • entropy management
  • When the agent struggles, inspect the environment before blaming the model

The three harness components

| Component | What it does | Concrete example |
| --- | --- | --- |
| Context | Gives the agent the right map | repo docs, specs, logs, browser state |
| Constraints | Enforces safe boundaries | layering tests, lint rules, approval gates |
| Garbage collection | Reduces drift and entropy | cleanup agents, stale-doc sweeps, dead-code checks |

Planner → Generator → Evaluator beats self-critique

  • Separate planning, generation, and evaluation when quality matters
  • First run the tests before editing; it loads repo reality into the loop
  • Use red/green TDD when you need a real regression guard
  • Use deterministic validators where you can
  • Janitorial / cleanup agents are part of the system, not an afterthought
  • Reliability comes from feedback loops, not hero prompts

[Visual: Planner → Generator → Evaluator + cleanup loop]


Copilot customizations mapped to the harness

  • Context: .github/copilot-instructions.md, *.instructions.md, skills, prompts
  • Constraints: tests, linters, CI, hooks, approval modes
  • Garbage collection: review agents, scheduled cleanup, drift detection, doc freshness checks
  • Recommended order:
    1. context first
    2. constraints second
    3. cleanup automation last
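For example, a path-scoped instruction file under the Context layer might look like this (a sketch; the `applyTo` frontmatter follows VS Code's `*.instructions.md` convention, and the rules shown are illustrative):

```markdown
<!-- .github/instructions/tests.instructions.md -->
---
applyTo: "tests/**/*.cs"
---
- Use xUnit with one Arrange-Act-Assert block per test.
- Prefer builder helpers over inline object construction in fixtures.
- Never delete a failing test to make the suite pass; fix or flag it.
```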

Harness maturity scorecard

  • Score each 0–2:
    1. Repository knowledge
    2. Documentation structure
    3. Architectural constraints
    4. Application legibility
    5. Feedback loops
    6. CI/CD gates
  • ≤5 = early stage
  • 6–8 = developing
  • 9–12 = mature

[Visual: one-page scorecard handout preview]


Break — 10 minutes

  • Resume at: [insert local time]
  • Drop questions in chat while we reset demo windows
  • Next up: GitHub Copilot deep dive, SDD, and custom orchestration

The Copilot ecosystem

| Product | Interface | Best for |
| --- | --- | --- |
| Copilot in VS Code | IDE | interactive coding, visual edits |
| Copilot CLI | terminal | automation, shell-heavy tasks |
| Copilot SDK | application code | embed Copilot into custom apps |
| Copilot in GitHub.com | web | PRs, issues, review |
| Copilot Mobile | phone | quick review and Q&A |
| Copilot Coding Agent | GitHub Actions | issue-to-PR automation |
  • Learn the model once; apply it across surfaces

Copilot customization primitives

| Primitive | Purpose | Analogy |
| --- | --- | --- |
| Instructions | always-on rules | house rules on the wall |
| Prompts | reusable task templates | fill-in-the-blank form |
| Agents | persona + tools + workflow | specialist teammate |
| Skills | lazy-loaded expertise | training manual |
| Plugins | installable bundles | app install |
| Hooks | lifecycle triggers | automated security camera |

Choosing the right primitive

  • Use instructions when the rule should apply every time
  • Use a prompt when the task is repeatable but user-triggered
  • Use an agent when persona, tool access, or orchestration matters
  • Use a skill when domain knowledge should load only on demand
  • Use a plugin when multiple components must ship together
  • Use hooks when governance should happen automatically
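As a sketch of the prompt primitive, a reusable task template might look like this (file location, frontmatter, and the `${input:…}` placeholder follow VS Code's `.prompt.md` convention; the task content is illustrative):

```markdown
<!-- .github/prompts/add-endpoint.prompt.md -->
---
mode: agent
description: Scaffold a new Minimal API endpoint with validation and tests
---
Add a new endpoint for ${input:resource}. Follow the repo's Clean
Architecture layering, add a FluentValidation validator, and write
xUnit tests before marking the task done.
```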

CLI vs VS Code: same goal, different runtime

| Runtime | Native context | Best at |
| --- | --- | --- |
| VS Code | open files, diagnostics, editor UI | inspect, edit, review |
| Copilot CLI | shell, filesystem, session workspace | automate, script, batch, analyze |
  • The right question is not “Which runtime wins?”
  • The right question is “Which runtime fits this task shape?”

Portability rules across CLI and VS Code

| Primitive | Portability |
| --- | --- |
| Skills | ✅ best option for cross-runtime sharing |
| Instructions | ⚠️ content shareable, delivery differs |
| Hooks | ⚠️ mostly shareable, payload details differ |
| Agents | ❌ runtime-specific wrappers required |
| Prompts / toolsets | ❌ VS Code-centric |
| Plugins | ↔ same concept, built per runtime |
  • Write expertise as skills if you want it to survive runtime changes

Skills: the most portable primitive

  • Same folder-based artifact in both runtimes: skills/<name>/SKILL.md
  • Lazy-loaded on relevance, not always-on
  • Great for reusable procedures: commits, release flows, framework playbooks
  • Keep SKILL.md lean; push detail into references/, scripts/, assets/

[Visual: skill folder anatomy]
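A minimal sketch of that anatomy (the skill name and reference files are hypothetical):

```text
skills/release-flow/
├── SKILL.md          # lean entry point: name, description, core steps
├── references/       # deep detail, read only when the task needs it
│   └── versioning.md
├── scripts/
│   └── tag-release.ps1
└── assets/
```

Keeping `SKILL.md` short is what preserves the lazy-loading benefit: the agent reads the summary cheaply and only descends into `references/` on demand.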


Ralph v2: one case study, all six primitives

| Primitive | Ralph v2 usage |
| --- | --- |
| Instructions | shared workflow logic |
| Agents | 6 roles × runtime variants |
| Skills | planning, signals, knowledge, session ops |
| Plugins | runtime-scoped bundles |
| Hooks | session lifecycle automation |
| Prompts | VS Code orchestration shortcuts |
  • Ralph matters because it proves the primitives compose

Ralph v2 as context + harness engineering

  • 6-agent flow: Orchestrator → Planner → Questioner → Executor → Reviewer → Librarian
  • Shared instructions provide durable workflow logic
  • Skills inject context only when needed
  • A deterministic state machine provides ownership boundaries and quality gates
  • Session artifacts make the workflow legible and auditable

[Visual: role diagram + state strip]


Ralph v2 runtime variants

  • VS Code variants use @AgentName and agents: references
  • CLI variants dispatch with task("ResolvedAgentName", ...)
  • Both variants embed the same shared instruction files at build time
  • Result: one behavior source of truth, two thin runtime wrappers
  • .plugin-managed keeps CLI distribution plugin-owned

Built-in agents in Copilot CLI

| Agent | Best for | Avoid when |
| --- | --- | --- |
| explore | codebase research | you already know the exact file |
| task | tests, builds, installs | you need every log line inline |
| general-purpose | complex multi-step work | a narrower worker will do |
| code-review | high-signal bug/security review | you want style feedback |
  • Start fresh sessions with git status or git log -5 so the agent sees recent reality
  • Use Git as both context loader and safety net: commits, branches, reflog, diff review

VS Code capabilities + the blended workflow

  • VS Code shines when visual editing, diagnostics, and inline review matter
  • Key built-ins to name: Agent Mode, Fleet Mode, Code Review, Sessions View, Handoffs
  • CLI shines when the task is shell-heavy, automation-heavy, or batch-oriented
  • Expert practice is blended:
    • VS Code = where you see
    • CLI = where you orchestrate

Pragmatic adoption ladder

  • Beginner: use built-in agent mode for bounded tasks
  • Intermediate: add repo instructions, a small skill catalog, maybe one hook
  • Advanced: build custom agents, package plugins, and mine session history
  • Earn complexity from repeated pain, not from excitement

Slido Poll 3 — Which primitive will you try first?

  • instructions
  • skills
  • prompts
  • custom agents
  • hooks / plugins

[Visual: Slido QR code + “why this one?” follow-up]


Built-in orchestration: the agent loop

```mermaid
flowchart LR
  P[Prompt] --> PL[Plan]
  PL --> T[Tool call]
  T --> E[Execute]
  E --> O[Observe]
  O --> I[Iterate]
  I --> T
```
  • Built-in agents are not one-shot generators
  • This is the literal meaning of "agents run tools in a loop to achieve a goal"
  • They already research, act, inspect, and retry

Copilot built-in orchestration modes

| Mode | What it gives you |
| --- | --- |
| Interactive | human approval on each step |
| Autopilot | end-to-end agent execution |
| Plan mode | review the plan before action |
| Fleet mode | parallel subagent execution |
  • Start interactive; graduate to more autonomy as trust rises
  • Use fleet/subagents when work is independent or likely to burn root context; don't split tasks just because you can

Claude Code comparison: same loop, different defaults

| Copilot | Claude Code |
| --- | --- |
| GitHub ecosystem + multi-surface | terminal-first heavy agenting |
| Fleet + GitHub-native workflows | custom subagents, hooks, checkpointing |
| flat subscription tiers | per-plan / per-token style tradeoffs |
| great daily velocity | great deep terminal workflows |
  • Many advanced users combine both instead of choosing one forever

When built-in orchestration is enough

  • A single codebase, standard tools, and a human close to the loop
  • Great for test generation, refactors, bug fixes, docs, and code review
  • Move to custom orchestration when you need:
    • external business systems
    • durable queues / retries / state
    • multi-provider routing
    • custom approval branches

Why Spec-Driven Development matters

  • Fast AI code generation is useful, but vague prompting creates ambiguity and rework
  • SDD makes the spec the source of truth for the agent
  • Desired flow: idea → spec → plan → tasks → implementation → verification
  • This is the discipline layer between “wow” and “reliable”

What SDD changes

  • Specs become durable context for the agent
  • Plans, tasks, and tests become the harness around execution
  • Human review happens at phase boundaries
  • Benefits: consistency, auditability, easier maintenance when requirements change

Speckit teaching workflow: six reviewable phases

| Step | Command | Reviewable artifact |
| --- | --- | --- |
| 1 | `constitution` | principles / constraints |
| 2 | `specify` | requirements + edge cases |
| 3 | `plan` | architecture + data model |
| 4 | `tasks` | dependency-ordered work |
| 5 | `implement` | code per approved tasks |
| 6 | `analyze` | consistency / gap report |
  • Teaching simplification: real-world Speckit also surfaces clarify, checklist, and taskstoissues beyond these six phases

Live demo setup: safest path and caveats

  • Safest live surface: VS Code Copilot Chat
  • In Copilot CLI, treat Speckit as custom-agent invocation, not guaranteed slash-command parity
  • Prefer the pinned init flow over an unknown global install
```shell
uvx --from git+https://github.com/github/spec-kit.git@v0.5.0 specify init --here --ai copilot --script ps --no-git --force
```
  • CLI fallback paths: /agent → choose speckit.specify, or copilot --agent=speckit.specify --prompt "..."

Demo part 1 — Constitution + Specify

  • Scenario: task-management REST API with ASP.NET Core Minimal APIs
  • Constitution locks stack and non-negotiables:
    • C# 14, .NET 9, xUnit, EF Core, SQLite
  • Specify captures real requirements:
    • CRUD
    • status filter
    • due date must be future
    • soft delete
    • 404 behavior

[GIF: Speckit constitution + specify walkthrough]


Demo part 2 — Plan + Tasks

  • plan turns the spec into architecture, data model, contracts, and quickstart
  • tasks breaks work into atomic, ordered steps
  • Example tasks:
    1. scaffold project
    2. create Task entity + DbContext
    3. add DTOs and mapping
    4. implement CRUD + status filter
    5. add validation and tests
  • This is the key review gate before code generation

Demo part 3 — Implement + Verify

  • First run the existing tests so the agent sees repo reality before editing
  • implement works task-by-task instead of taking one speculative leap
  • Use red/green TDD for new behavior: fail first, then fix
  • Verification remains explicit:
```shell
dotnet build && dotnet test
```
  • Live tip: show only the first 2–3 tasks; keep the rest as recorded backup

Speckit in context: OpenSpec is the safer fallback

  • Speckit = GitHub's general-purpose SDD toolkit
  • OpenSpec = this workspace's production SDD implementation
  • Same philosophy: specs → plan → tasks → implement → validate
  • Workshop fallback path:
    • @openspec-explore (optional)
    • @openspec-propose
    • @openspec-apply-change
  • Teaching line: same shape, safer local fallback

Built-in vs custom orchestration

| Built-in orchestration | Custom orchestration |
| --- | --- |
| already inside coding agents | you design the workflow graph |
| fast to adopt | more setup effort |
| great for dev tasks in one environment | great for cross-system, durable workflows |
| limited provider / control boundaries | full routing, approval, and integration control |
  • Built-in is for using agents well
  • Custom is for building agent systems

Microsoft Agent Framework at a glance

  • Unified .NET framework for building AI agents and multi-agent workflows
  • Key building blocks:
    • agents
    • workflow graphs
    • custom executors / middleware
  • Patterns: sequential, concurrent, conditional, deterministic
  • Enterprise-friendly: telemetry, type safety, CI/CD, provider flexibility

MAF architecture patterns

```mermaid
flowchart LR
  A[Agent node] --> B[Agent node]
  A --> C[Parallel agent]
  B --> D[Validator]
  C --> D
  D --> E{Human approval?}
  E -->|yes| F[Next step]
  E -->|no| G[Exception queue]
```
  • Mix LLM-backed nodes with deterministic nodes
  • Sequence, parallelism, conditions, and approvals are first-class

Provider flexibility: Copilot SDK, Claude, Azure OpenAI

| Option | Setup | Best for |
| --- | --- | --- |
| Copilot SDK | Copilot CLI + subscription | teams already living in GitHub Copilot |
| Claude SDK | API key + Anthropic billing | Claude-first reasoning workflows |
| Azure/OpenAI direct | endpoint + API key | full control, residency, cost tuning |
  • Core message: the workflow code can stay the same while the provider changes

Code walkthrough: one AsAIAgent() surface

```csharp
// Same AsAIAgent() surface, three different providers behind it
AIAgent copilot = copilotClient.AsAIAgent(instructions: "...");
AIAgent claude = anthropicClient.AsAIAgent(model: "claude-sonnet-4-6", instructions: "...");
AIAgent azure = openAiClient.GetChatClient("gpt-4o-mini").AsAIAgent(instructions: "...");
```
  • Same orchestration API
  • Different model/provider behind the node
  • Swap the brain without rewriting the workflow

Demo 3 architecture — HR onboarding workflow

```mermaid
flowchart TD
  F[New-hire form / HRIS event] --> I[Intake Agent\nCopilot SDK]
  I --> P[Provisioning Agent\nClaude]
  P --> N[Notification Agent\nAzure OpenAI]
  N --> V[Deterministic Validator]
  V --> H{Human approval}
  H -->|approve| O[Checklist + email draft + tasks]
  H -->|reject / missing data| Q[Exception queue]
```
  • 3 providers, 1 deterministic node, 1 explicit HITL checkpoint

Demo 3 walkthrough — where the human approves

  • Intake extracts role, department, manager, start date, location
  • Provisioning creates equipment, access, and training checklist
  • Notification drafts welcome email and orientation schedule
  • Validator checks completeness and policy violations
  • Human approves before tickets, messages, or calendar events are created

[GIF: HR onboarding workflow run + MP4 fallback]


What is an agentic workflow?

  • A structured, repeatable, agent-executed process for a specific domain
  • It combines:
    • context = domain knowledge
    • harness = constraints + checks + cleanup
    • orchestration = how work moves
  • Same pattern works in software, HR, sales, marketing, and ops
  • If the result ships but nobody can explain it later, the workflow is creating cognitive debt

The R-P-E-V loop

```mermaid
flowchart LR
  R[Research] --> P[Plan]
  P --> E[Execute]
  E --> V[Verify]
  V --> R
```
  • Research gives context
  • Plan shapes the harness
  • Execute uses built-in or custom orchestration
  • Verify closes the loop
  • After verify, ask for a walkthrough, diagram, or interactive explanation so understanding compounds too

Ralph v2 as a software engineering workflow

  • Requirement → discovery → plan → execute → review → knowledge extraction → iterate
  • 6 agents split responsibilities instead of overloading one giant worker
  • Context Engineering: skills, instructions, session knowledge
  • Harness Engineering: signal protocols, review checklists, atomic commits
  • Orchestration: deterministic delegation with isolated contexts

Ralph state machine + knowledge loop

```mermaid
flowchart LR
  A[Initialize] --> B[Planning]
  B --> C[Batching]
  C --> D[Execute batch]
  D --> E[Review batch]
  E --> F[Knowledge extract]
  F --> G[Iteration review]
  G --> H[Complete]
  G --> I[Critique]
  I --> B
```
  • Knowledge is promoted after review, not dumped into the system by default

Engineering use cases: Module 9 anchors

  • 3 live GIF anchors
    • multi-service auth refactoring
    • CI/CD diagnosis and fix
    • test coverage automation
  • Reference earlier modules instead of replaying them
    • spec-driven development
    • Ralph v2 / multi-agent orchestration
  • P1/P2 backups
    • documentation validation, browser testing, migrations, plugins, diagrams, atomic commits

[Visual: use-case map by impact vs demoability]


Use case — Multi-service auth refactoring

  • Problem: security logic drifts across services
  • Agentic pattern: discover auth seams → refactor shared patterns → verify across services
  • Context: repo auth conventions, service maps, framework docs
  • Harness: contract tests, auth regression suite, CI gates
  • Monday morning takeaway: AI is safest when the repo already teaches the target pattern

[GIF: multi-file auth refactor, ~15s]


Use case — CI/CD diagnosis and fix

  • Problem: failed pipelines block delivery and create waiting time
  • Agentic pattern: read logs → patch workflow/script → rerun until green
  • Context: workflow YAML, command conventions, failure history
  • Harness: required checks, rerun loop, diff review, branch protection
  • Monday morning takeaway: agents are debuggers and operators, not just code generators

[GIF: red ❌ to green ✅ pipeline, ~20s]


Use case — Test coverage automation

  • Problem: teams know the gaps but rarely schedule the work
  • Agentic pattern: find weakly tested modules → generate tests → fix failures → report delta
  • Context: test style, fixtures, examples of good assertions
  • Harness: coverage threshold, failing-test loop, reviewer checks for meaningful tests
  • Monday morning takeaway: harnesses turn “write tests” into a measurable outcome

[GIF: coverage jump highlight, ~15s]


More engineering cases to steal from

| Beginner / quick win | Intermediate / team value | Advanced / system value |
| --- | --- | --- |
| atomic Git commits | issue-to-PR feature delivery | spec-driven development |
| architecture diagrams | docs validation | Ralph v2 orchestration |
| visual slide decks | browser testing | legacy migration |
| plugin scaffolding | database query automation | cross-service refactors |

Non-engineering workflows: same pattern, new domain

| Discipline | Engineering example | Non-engineering equivalent |
| --- | --- | --- |
| Context Engineering | repo docs, tickets, logs | policies, templates, CRM schemas |
| Harness Engineering | tests, CI, linters | approvals, compliance, audit trails |
| Orchestration | agent loops, subagents | handoffs, queues, event-driven workflows |
  • The domain changes; the mental model does not

Non-engineering use case — Google Workspace executive ops

  • One operator asks an agent to prepare tomorrow's schedule, draft replies, refresh a sheet, and organize files
  • Context: executive preferences, calendar norms, Drive taxonomy, email tone
  • Harness: dry-run mode, approval before send/edit, narrow permissions, action logs
  • Best implementation path: coding agent + skills/tools
  • Why it matters: easiest example of repurposing a coding agent for business work

Non-engineering use cases — Lead processing + RFP automation

  • B2B SaaS lead processing
    • ingest → enrich → score → route → update CRM
    • harness = confidence thresholds, dedupe, manager override, audit trail
  • RFP response automation
    • retrieve approved content → draft → validate → review → approve
    • harness = citations, legal review, locked final approval
  • These are the moments where teams often graduate from prompting to orchestration

Non-engineering use case — HR onboarding is the selected demo

  • Universally relatable; no deep domain setup required
  • Clear agent boundaries: intake, provisioning, notification, validation
  • Shows the full workshop thesis in one compact flow:
    • context = role templates, HR policies, comms templates
    • harness = completeness checks + approvals
    • orchestration = workflow graph across specialists
  • Compact enough to explain and credible enough to matter

When skills are enough vs when orchestration must grow up

| Coding agent + skills is enough when... | Custom orchestration is better when... |
| --- | --- |
| a human operator is present | the workflow runs repeatedly or semi-autonomously |
| actions are reversible | actions affect money, access, or compliance |
| one agent can call a few tools in sequence | many handoffs, queues, or approval branches exist |
| you need a fast pilot | you need durable state and governance |
  • Copilot Studio is a strong middle layer between prompting and a full custom framework

Slido Poll 4 — Which use case matters most to you?

  • codebase refactoring
  • CI/CD and testing
  • documentation / knowledge work
  • internal business workflow automation
  • customer / support operations

[Visual: Slido QR code + “what would you pilot first?”]


Context anti-patterns and the fix

| Anti-pattern | What goes wrong | Better pattern |
| --- | --- | --- |
| giant instruction file | token waste, quality drops | progressive disclosure |
| no instructions | generic output | minimal repo rules |
| stale docs | wrong implementation choices | freshness checks |
| random tool additions | active context gets crowded | intentional capability design |

Harness anti-patterns and the fix

| Anti-pattern | What goes wrong | Better pattern |
| --- | --- | --- |
| YOLO coding | entropy explosion | tests + review gates |
| blind trust in output | subtle bugs, security issues | diff review + verification |
| reviewer disrespect | teammates do the first real review | review it yourself + include evidence |
| cognitive debt | future changes slow because nobody understands it | walkthroughs + diagrams + interactive explanations |
| no cleanup loop | drift accumulates | scheduled garbage collection |

Enterprise governance essentials

| Area | What to say briefly |
| --- | --- |
| Access control | manage licenses and features by team / repo |
| Policy | disable risky modes where needed |
| Privacy | org code and prompts should stay out of training on business tiers |
| Audit | log settings changes and agent activity |
| Cost | watch premium requests, dashboards, and overage risk |
| Compliance | map to SOC / ISO / trust-center evidence |
  • Governance model: start interactive, graduate to autonomy as trust rises
  • Team rule: no unreviewed AI PRs — reviewers should never do the first meaningful review of agent-generated code or PR text

Learning roadmap — Context Engineering resources

  • GitHub docs: custom agents configuration
  • VS Code blog: “Context is all you need”
  • Community catalogs: awesome-copilot, Skills Hub, SkillsMP
  • Internal follow-up: publish one repo instruction file and one reusable skill

[Visual: QR code block for curated resource list]


Learning roadmap — Harness + orchestration resources

  • OpenAI: Harness Engineering
  • Anthropic: harness design for long-running apps
  • Martin Fowler: harness engineering analysis
  • Microsoft Agent Framework quickstart + samples
  • GitHub Copilot SDK and Claude Agent SDK docs

[Visual: QR code block for the second resource list]


Take-home scorecard + adoption ladder

  • Use the 6-dimension maturity scorecard to assess your current environment
  • Then act at the smallest useful level:
    • this week: add repository instructions and hoard one working example
    • this month: add one skill, one guardrail, and ask for one walkthrough after a significant change
    • this quarter: build one repeatable workflow
  • Goal: progressive leverage, not instant platform engineering

Slido Poll 5 — What is your #1 takeaway?

  • open text → word cloud
  • Use this as the closing reflection, not as a quiz

[Visual: Slido word-cloud prompt]


Closing call to action

  • Reliable agent outcomes are designed, not wished into existence
  • Start with context
  • Add harness
  • Choose the lightest orchestration that fits the job
  • Then make one workflow real in your own domain

[Visual: Thank-you slide with contact / follow-up placeholders]


Appendix — optional advanced material

  • Safe to skip live if timing is tight
  • Keep these as backup or Q&A slides
  • Focus: Copilot CLI session topology and feedback loops

CLI session topology: the two-layer model

```mermaid
flowchart TB
  W[Layer 1: Working directory\nGit repo, code, tests, .github/hooks] --> S[Layer 2: Session workspace\n~/.copilot/session-state/<uuid>/]
  S --> F[files/ for plans, outputs, shared artifacts]
  S --> E[events.jsonl for canonical event history]
```
  • Never confuse product code with runtime/session artifacts
  • Subagents are workers inside one shared session container

Session directory anatomy

```text
~/.copilot/session-state/<uuid>/
├── workspace.yaml
├── events.jsonl
├── checkpoints/
├── rewind-snapshots/
├── files/
│   ├── plan.md
│   ├── research/
│   └── shared artifacts
└── session.db
```
  • events.jsonl is the ground truth
  • files/ is the right place for session-scoped deliverables like this deck

Session lifecycle + /chronicle feedback loop

```mermaid
flowchart LR
  C[Create session] --> A[Active work]
  A --> P[Checkpoint / compact]
  P --> R[Resume / rewind]
  R --> S[Shutdown]
  S --> CH[/chronicle analytics]
  CH --> I[Better instructions and workflows]
  I --> C
```
  • Files are canonical; searchable catalogs are derived views
  • /chronicle standup, tips, improve, and reindex turn session history into better future sessions