---
theme: default
title: Agentic Engineering Workshop
info: |
  First full Slidev implementation artifact synthesized from the reconciled workshop plan
  and drafted component artifacts.
colorSchema: dark
highlighter: shiki
lineNumbers: true
mdc: true
monaco: true
aspectRatio: 16/9
fonts:
  sans: Inter
  mono: JetBrains Mono
---
Agentic Engineering Workshop
Context Engineering + Harness Engineering → Agentic Workflows
For developers moving from autocomplete to reliable agent systems
Live demos: context before/after, Speckit SDD, MAF HR onboarding
Format: Slidev, dark theme, Mermaid-ready, GIF fallbacks where needed
[Visual: GitHub-dark cover with the three pillars color-coded]
Slido Poll 1 — Where are you today?
I only use autocomplete
I use chat regularly
I have tried agent mode a little
I use AI agents weekly
I build custom workflows already
[Visual: Slido join QR code + room code]
```mermaid
flowchart LR
  A[Manual coding] --> B[Autocomplete]
  B --> C[Chat]
  C --> D[Agent mode]
  D --> E[Multi-agent]
  E --> F[Agentic workflows]
```
Autocomplete predicts the next line
Agent mode tries to complete the job
This workshop is about climbing from useful assistance to reliable systems
20M+ Copilot users; 77K+ enterprise organizations
Copilot now writes ~46% of active developers' code
84% of professional developers use AI tools daily
Adoption is no longer the bottleneck; reliability is
Code is cheap now; quality is still expensive once you include tests, review, and explainability
[Visual: staircase infographic with a moving “you are here” marker]
What is Agentic Engineering?
Agent = software that lets an LLM run tools in a loop to achieve a goal
Coding agent = that loop plus repo access, code edits, and execution
Agentic engineering = disciplined, reviewable engineering amplified by that loop
Context Engineering = give the agent the right knowledge
Harness Engineering = keep the loop on track with constraints, feedback, and cleanup
Agentic Workflows = apply both to real delivery loops in code or business operations
Vibe coding vs agentic engineering
| Vibe coding | Agentic engineering |
| --- | --- |
| Prompt into existence | Direct a tool-using loop toward a goal |
| One-shot output | Loop: research → plan → execute → verify |
| Prototype-grade confidence | Reviewable specs, tests, and notes |
| "Seems fine" | "I reviewed it, ran it, and can explain it" |
| Hard to maintain | Easy to rerun, review, and improve |
```mermaid
flowchart TD
  CE[Context Engineering] --> AW[Agentic Workflows]
  HE[Harness Engineering] --> AW
  BO[Built-in orchestration] --> AW
  CO[Custom orchestration] --> AW
  AW --> ENG[Engineering delivery]
  AW --> BIZ[Business workflows]
```
Better context improves decisions
Better harnesses improve reliability
Orchestration decides how work moves
| Term | Plain-English meaning |
| --- | --- |
| Agent | An LLM using tools in a loop to achieve a goal |
| Vibe coding | Prototype-first prompting with light review rigor |
| Context window | The model's working memory for this turn |
| Skill | A reusable training manual loaded on relevance |
| Instruction | Always-on guidance or policy |
| MCP server | A tool/capability provider for the agent |
| Subagent | A specialist worker with its own context window |
| HITL | Human-in-the-loop approval or review checkpoint |
| Cognitive debt | Working code the team can no longer confidently explain |
| Agentic workflow | A repeatable process executed by agents |
Context Engineering: the working definition
Anything outside the agent's context effectively does not exist
Context Engineering = curate, structure, and inject the right knowledge at the right time
Best context is often a reusable proof: starter repos, specs, successful diffs, screenshots
Hoard things you know how to do so agents can recombine working examples instead of guessing from theory
| Anti-pattern | Failure mode | Better move |
| --- | --- | --- |
| Context dump | token waste, noisy output | progressive disclosure |
| Context starvation | generic, convention-breaking output | repo instructions |
| No reusable proofs | agent guesses from theory | hoard examples + starter repos |
| Stale context | agents follow old patterns | freshness checks + reviews |
| Tool sprawl | capabilities crowd out reasoning | add MCP intentionally |
Copilot's layered context model
```mermaid
flowchart TB
  S[System + safety] --> R[Repo instructions]
  R --> P[Path instructions]
  P --> A[Selected agent / prompt]
  A --> K[Skills loaded on relevance]
  K --> C[Current files + diagnostics + terminal]
  C --> LLM[Model context window]
```
Skills are lazy-loaded expertise
MCP adds capabilities through connected tools
Context precedence and runtime behavior
Personal / user instructions > repository instructions > organization rules
On-demand: prompts, agents, skills
Automatic: current file, open tabs, diagnostics, terminal output
Compaction kicks in near a full context window; older turns are summarized
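The compaction rule above can be sketched in a few lines — a toy model only, assuming a simple word-count stand-in for tokens and a fake summarizer; names and thresholds are illustrative, not Copilot's actual algorithm:

```python
# Sketch of history compaction under a token budget. The summarizer is a
# stand-in for an LLM-written summary; budget/keep_recent are illustrative.

def compact_history(turns: list[str], budget: int = 50, keep_recent: int = 2) -> list[str]:
    def tokens(text: str) -> int:
        return len(text.split())  # crude proxy for a real tokenizer

    def summarize(text: str) -> str:
        # Stand-in for a model-written summary of older turns.
        return " ".join(text.split()[:3]) + " ..."

    # While over budget, fold the two oldest entries into one short summary,
    # always keeping the most recent turns verbatim.
    while sum(map(tokens, turns)) > budget and len(turns) > keep_recent + 1:
        turns = [summarize(turns[0] + " " + turns[1])] + turns[2:]
    return turns

history = ["early chatter " * 15, "more chatter " * 15,
           "fix the login bug", "added a regression test"]
compacted = compact_history(history)
```

The key property to notice: older turns lose detail first, and the most recent turns survive verbatim.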
Practical rule: use skills for knowledge, MCP for capabilities
Demo 1 — Same prompt, weak context
Prompt: “Create a user registration endpoint with password hashing and validation.”
No project instructions
Typical failure modes:
generic naming and folder placement
weak hashing or missing validation
no ProblemDetails / inconsistent error handling
[GIF: Context Engineering before/after — generic output]
Demo 1 — Same prompt, upgraded context
Add repo instructions: BCrypt, Clean Architecture, ProblemDetails, FluentValidation, CancellationToken
Re-run the exact same prompt
Output now follows house style, safer defaults, and reviewable conventions
Teaching beat: same prompt, same model, different context
[GIF: Context Engineering before/after — improved output + fallback screenshot]
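What the upgraded repo instructions might look like — a minimal sketch; the bullets above name the real conventions, but this file's wording and layout are illustrative, not the workshop's actual artifact:

```markdown
<!-- .github/copilot-instructions.md (illustrative excerpt) -->
# Project conventions
- Hash passwords with BCrypt; never store plaintext or use MD5/SHA1.
- Follow Clean Architecture: endpoints -> application services -> domain.
- Return RFC 7807 ProblemDetails for all error responses.
- Validate request DTOs with FluentValidation.
- Accept a CancellationToken in every async handler.
```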
Slido Poll 2 — Which context injection method surprised you?
repository instructions
path-specific instructions
skills
MCP servers
editor / terminal context
[Visual: Slido QR code + short debrief prompt]
Token economics: skills vs MCP
| Use it for | Skills | MCP |
| --- | --- | --- |
| Purpose | expertise / procedure | external capabilities / tools |
| Loading model | lazy | eager |
| Scale signal | ~1,000 skills ≈ 15K tokens | 10 tools ≈ 50K tokens |
| Best use | "How should it do this?" | "What can it access or act on?" |
Top workshop MCP examples: GitHub, Context7, Playwright, Chroma, Filesystem
Rule of thumb: prefer skills by default; add MCP intentionally
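The scale signal above is just arithmetic; spelling it out makes the asymmetry obvious. The per-item figures are the workshop's rough estimates, not measured values:

```python
# Back-of-envelope token math behind the "scale signal" row.
SKILL_METADATA_TOKENS = 15      # a lazy skill costs ~15 tokens of metadata up front
MCP_TOOL_SCHEMA_TOKENS = 5_000  # an eager MCP tool schema costs ~5K tokens up front

skills_cost = 1_000 * SKILL_METADATA_TOKENS  # ~1,000 skills
mcp_cost = 10 * MCP_TOOL_SCHEMA_TOKENS       # just 10 tools

print(skills_cost, mcp_cost)  # 15000 50000
```

A thousand skills cost less standing context than ten eagerly-loaded tools — which is why the rule of thumb prefers skills by default.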
Harness Engineering: why it exists
Writing code is cheap now; correctness, reviewability, and changeability still cost real effort
Great prompts are not enough for repeated, high-stakes work
Harness Engineering = the scaffolding around the agent:
context sources
architectural constraints
feedback loops
entropy management
When the agent struggles, inspect the environment before blaming the model
The three harness components
| Component | What it does | Concrete example |
| --- | --- | --- |
| Context | Gives the agent the right map | repo docs, specs, logs, browser state |
| Constraints | Enforces safe boundaries | layering tests, lint rules, approval gates |
| Garbage collection | Reduces drift and entropy | cleanup agents, stale-doc sweeps, dead-code checks |
Planner → Generator → Evaluator beats self-critique
Separate planning, generation, and evaluation when quality matters
Run the existing tests before editing; the results load repo reality into the loop
Use red/green TDD when you need a real regression guard
Use deterministic validators where you can
Janitorial / cleanup agents are part of the system, not an afterthought
Reliability comes from feedback loops, not hero prompts
[Visual: Planner → Generator → Evaluator + cleanup loop]
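The separation of roles can be sketched as a tiny pipeline — all functions here are illustrative stand-ins for LLM-backed or tool-backed steps, not a real agent API:

```python
# Minimal planner -> generator -> evaluator sketch with a retry loop.
# The evaluator is deterministic (think: tests/linters), not self-critique.

def planner(goal: str) -> list[str]:
    return [f"write function for: {goal}", f"write test for: {goal}"]

def generator(step: str) -> str:
    # Stand-in for code generation; returns a labeled artifact.
    return f"artifact({step})"

def evaluator(artifact: str) -> bool:
    # Deterministic validator in place of the same model grading itself.
    return artifact.startswith("artifact(")

def run(goal: str, max_retries: int = 2) -> list[str]:
    accepted = []
    for step in planner(goal):
        for _ in range(max_retries + 1):
            candidate = generator(step)
            if evaluator(candidate):  # only validated work moves forward
                accepted.append(candidate)
                break
    return accepted

results = run("parse ISO dates")
```

The design point is structural: the evaluator is a different component with different failure modes than the generator, which is what makes the loop a feedback loop rather than a hero prompt.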
Copilot customizations mapped to the harness
Context: .github/copilot-instructions.md, *.instructions.md, skills, prompts
Constraints: tests, linters, CI, hooks, approval modes
Garbage collection: review agents, scheduled cleanup, drift detection, doc freshness checks
Recommended order:
context first
constraints second
cleanup automation last
Harness maturity scorecard
Score each 0–2:
Repository knowledge
Documentation structure
Architectural constraints
Application legibility
Feedback loops
CI/CD gates
≤5 = early stage
6–8 = developing
9–12 = mature
[Visual: one-page scorecard handout preview]
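The scoring rule fits in a few lines — dimension names and bucket labels come from the slide; the function shape is illustrative:

```python
# 6-dimension harness maturity scorecard: each dimension scored 0-2.

DIMENSIONS = [
    "repository knowledge", "documentation structure", "architectural constraints",
    "application legibility", "feedback loops", "ci/cd gates",
]

def maturity(scores: dict[str, int]) -> str:
    assert set(scores) == set(DIMENSIONS)
    assert all(0 <= s <= 2 for s in scores.values())
    total = sum(scores.values())  # 0..12
    if total <= 5:
        return "early stage"
    if total <= 8:
        return "developing"
    return "mature"

example = {d: 1 for d in DIMENSIONS}  # six 1s -> total 6
print(maturity(example))  # developing
```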
Resume at: [insert local time]
Drop questions in chat while we reset demo windows
Next up: GitHub Copilot deep dive, SDD, and custom orchestration
| Product | Interface | Best for |
| --- | --- | --- |
| Copilot in VS Code | IDE | interactive coding, visual edits |
| Copilot CLI | terminal | automation, shell-heavy tasks |
| Copilot SDK | application code | embed Copilot into custom apps |
| Copilot in GitHub.com | web | PRs, issues, review |
| Copilot Mobile | phone | quick review and Q&A |
| Copilot Coding Agent | GitHub Actions | issue-to-PR automation |
Learn the model once; apply it across surfaces
Copilot customization primitives
| Primitive | Purpose | Analogy |
| --- | --- | --- |
| Instructions | always-on rules | house rules on the wall |
| Prompts | reusable task templates | fill-in-the-blank form |
| Agents | persona + tools + workflow | specialist teammate |
| Skills | lazy-loaded expertise | training manual |
| Plugins | installable bundles | app install |
| Hooks | lifecycle triggers | automated security camera |
Choosing the right primitive
Use instructions when the rule should apply every time
Use a prompt when the task is repeatable but user-triggered
Use an agent when persona, tool access, or orchestration matters
Use a skill when domain knowledge should load only on demand
Use a plugin when multiple components must ship together
Use hooks when governance should happen automatically
CLI vs VS Code: same goal, different runtime
| Runtime | Native context | Best at |
| --- | --- | --- |
| VS Code | open files, diagnostics, editor UI | inspect, edit, review |
| Copilot CLI | shell, filesystem, session workspace | automate, script, batch, analyze |
The right question is not “Which runtime wins?”
The right question is “Which runtime fits this task shape?”
Portability rules across CLI and VS Code
| Primitive | Portability |
| --- | --- |
| Skills | ✅ best option for cross-runtime sharing |
| Instructions | ⚠️ content shareable, delivery differs |
| Hooks | ⚠️ mostly shareable, payload details differ |
| Agents | ❌ runtime-specific wrappers required |
| Prompts / toolsets | ❌ VS Code-centric |
| Plugins | ↔ same concept, built per runtime |
Write expertise as skills if you want it to survive runtime changes
Skills: the most portable primitive
Same folder-based artifact in both runtimes: skills/<name>/SKILL.md
Lazy-loaded on relevance, not always-on
Great for reusable procedures: commits, release flows, framework playbooks
Keep SKILL.md lean; push detail into references/, scripts/, assets/
[Visual: skill folder anatomy]
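A skeleton of the artifact shape — folder name, frontmatter wording, and step contents here are hypothetical; only the `skills/<name>/SKILL.md` layout and the "keep it lean, push detail into subfolders" rule come from the slide:

```markdown
<!-- skills/release-flow/SKILL.md (illustrative skeleton) -->
---
name: release-flow
description: How this repo cuts and verifies a release
---
# Release flow
1. Bump the version and changelog (see references/versioning.md).
2. Run scripts/preflight.sh and fix anything red.
3. Tag, push, and verify the pipeline before announcing.
```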
Ralph v2: one case study, all six primitives
| Primitive | Ralph v2 usage |
| --- | --- |
| Instructions | shared workflow logic |
| Agents | 6 roles × runtime variants |
| Skills | planning, signals, knowledge, session ops |
| Plugins | runtime-scoped bundles |
| Hooks | session lifecycle automation |
| Prompts | VS Code orchestration shortcuts |
Ralph matters because it proves the primitives compose
Ralph v2 as context + harness engineering
6-agent flow: Orchestrator → Planner → Questioner → Executor → Reviewer → Librarian
Shared instructions provide durable workflow logic
Skills inject context only when needed
A deterministic state machine provides ownership boundaries and quality gates
Session artifacts make the workflow legible and auditable
[Visual: role diagram + state strip]
Ralph v2 runtime variants
VS Code variants use @AgentName and agents: references
CLI variants dispatch with task("ResolvedAgentName", ...)
Both variants embed the same shared instruction files at build time
Result: one behavior source of truth, two thin runtime wrappers
.plugin-managed keeps CLI distribution plugin-owned
Built-in agents in Copilot CLI
| Agent | Best for | Avoid when |
| --- | --- | --- |
| explore | codebase research | you already know the exact file |
| task | tests, builds, installs | you need every log line inline |
| general-purpose | complex multi-step work | a narrower worker will do |
| code-review | high-signal bug/security review | you want style feedback |
Start fresh sessions with `git status` or `git log -5` so the agent sees recent reality
Use Git as both context loader and safety net: commits, branches, reflog, diff review
VS Code capabilities + the blended workflow
VS Code shines when visual editing, diagnostics, and inline review matter
Key built-ins to name: Agent Mode, Fleet Mode, Code Review, Sessions View, Handoffs
CLI shines when the task is shell-heavy, automation-heavy, or batch-oriented
Expert practice is blended:
VS Code = where you see
CLI = where you orchestrate
Pragmatic adoption ladder
Beginner : use built-in agent mode for bounded tasks
Intermediate : add repo instructions, a small skill catalog, maybe one hook
Advanced : build custom agents, package plugins, and mine session history
Earn complexity from repeated pain, not from excitement
Slido Poll 3 — Which primitive will you try first?
instructions
skills
prompts
custom agents
hooks / plugins
[Visual: Slido QR code + “why this one?” follow-up]
Built-in orchestration: the agent loop
```mermaid
flowchart LR
  P[Prompt] --> PL[Plan]
  PL --> T[Tool call]
  T --> E[Execute]
  E --> O[Observe]
  O --> I[Iterate]
  I --> T
```
Built-in agents are not one-shot generators
This is the literal meaning of "agents run tools in a loop to achieve a goal"
They already research, act, inspect, and retry
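The "tools in a loop" definition can be taken literally — a toy loop, where the tool registry and stopping rule are illustrative, not a real Copilot API:

```python
# Toy agent loop: plan (pick a tool), execute, observe, repeat until done.

def agent_loop(goal, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):
        # "Plan": pick the first tool whose done-check says work remains.
        step = next((name for name, t in tools.items()
                     if not t["done"](observations)), None)
        if step is None:
            return observations  # goal reached
        observations.append(tools[step]["run"](goal))  # execute + observe
    return observations

tools = {
    "research": {"run": lambda g: f"notes on {g}",
                 "done": lambda obs: any("notes" in o for o in obs)},
    "edit":     {"run": lambda g: f"patch for {g}",
                 "done": lambda obs: any("patch" in o for o in obs)},
}
result = agent_loop("flaky test", tools)
```

Even this toy version researches before it edits and stops itself when the observations say the goal is met — the same shape the built-in agents run at full scale.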
Copilot built-in orchestration modes
| Mode | What it gives you |
| --- | --- |
| Interactive | human approval on each step |
| Autopilot | end-to-end agent execution |
| Plan mode | review the plan before action |
| Fleet mode | parallel subagent execution |
Start interactive; graduate to more autonomy as trust rises
Use fleet/subagents when work is independent or likely to burn root context; don't split tasks just because you can
Claude Code comparison: same loop, different defaults
| Copilot | Claude Code |
| --- | --- |
| GitHub ecosystem + multi-surface | terminal-first heavy agenting |
| Fleet + GitHub-native workflows | custom subagents, hooks, checkpointing |
| flat subscription tiers | per-plan / per-token style tradeoffs |
| great daily velocity | great deep terminal workflows |
Many advanced users combine both instead of choosing one forever
When built-in orchestration is enough
A single codebase, standard tools, and a human close to the loop
Great for test generation, refactors, bug fixes, docs, and code review
Move to custom orchestration when you need:
external business systems
durable queues / retries / state
multi-provider routing
custom approval branches
Why Spec-Driven Development matters
Fast AI code generation is useful, but vague prompting creates ambiguity and rework
SDD makes the spec the source of truth for the agent
Desired flow: idea → spec → plan → tasks → implementation → verification
This is the discipline layer between “wow” and “reliable”
Specs become durable context for the agent
Plans, tasks, and tests become the harness around execution
Human review happens at phase boundaries
Benefits: consistency, auditability, easier maintenance when requirements change
Speckit teaching workflow: six reviewable phases
| Step | Command | Reviewable artifact |
| --- | --- | --- |
| 1 | constitution | principles / constraints |
| 2 | specify | requirements + edge cases |
| 3 | plan | architecture + data model |
| 4 | tasks | dependency-ordered work |
| 5 | implement | code per approved tasks |
| 6 | analyze | consistency / gap report |
Teaching simplification: today's Speckit also surfaces clarify, checklist, and taskstoissues commands
Live demo setup: safest path and caveats
Safest live surface: VS Code Copilot Chat
In Copilot CLI, treat Speckit as custom-agent invocation, not guaranteed slash-command parity
Prefer the pinned init flow over an unknown global install
```shell
uvx --from git+https://github.com/github/spec-kit.git@v0.5.0 specify init --here --ai copilot --script ps --no-git --force
```
CLI fallback paths: /agent → choose speckit.specify, or copilot --agent=speckit.specify --prompt "..."
Demo part 1 — Constitution + Specify
Scenario: task-management REST API with ASP.NET Core Minimal APIs
Constitution locks stack and non-negotiables:
C# 14, .NET 9, xUnit, EF Core, SQLite
Specify captures real requirements:
CRUD
status filter
due date must be future
soft delete
404 behavior
[GIF: Speckit constitution + specify walkthrough]
Demo part 2 — Plan + Tasks
plan turns the spec into architecture, data model, contracts, and quickstart
tasks breaks work into atomic, ordered steps
Example tasks:
scaffold project
create Task entity + DbContext
add DTOs and mapping
implement CRUD + status filter
add validation and tests
This is the key review gate before code generation
Demo part 3 — Implement + Verify
First run the existing tests so the agent sees repo reality before editing
implement works task-by-task instead of taking one speculative leap
Use red/green TDD for new behavior: fail first, then fix
Verification remains explicit:
dotnet build && dotnet test
Live tip: show only the first 2–3 tasks; keep the rest as recorded backup
Speckit in context: OpenSpec is the safer fallback
Speckit = GitHub's general-purpose SDD toolkit
OpenSpec = this workspace's production SDD implementation
Same philosophy: specs → plan → tasks → implement → validate
Workshop fallback path:
@openspec-explore (optional)
@openspec-propose
@openspec-apply-change
Teaching line: same shape, safer local fallback
Built-in vs custom orchestration
| Built-in orchestration | Custom orchestration |
| --- | --- |
| already inside coding agents | you design the workflow graph |
| fast to adopt | more setup effort |
| great for dev tasks in one environment | great for cross-system, durable workflows |
| limited provider / control boundaries | full routing, approval, and integration control |
Built-in is for using agents well
Custom is for building agent systems
Microsoft Agent Framework at a glance
Unified .NET framework for building AI agents and multi-agent workflows
Key building blocks:
agents
workflow graphs
custom executors / middleware
Patterns: sequential, concurrent, conditional, deterministic
Enterprise-friendly: telemetry, type safety, CI/CD, provider flexibility
MAF architecture patterns
```mermaid
flowchart LR
  A[Agent node] --> B[Agent node]
  A --> C[Parallel agent]
  B --> D[Validator]
  C --> D
  D --> E{Human approval?}
  E -->|yes| F[Next step]
  E -->|no| G[Exception queue]
```
Mix LLM-backed nodes with deterministic nodes
Sequence, parallelism, conditions, and approvals are first-class
Provider flexibility: Copilot SDK, Claude, Azure OpenAI
| Option | Setup | Best for |
| --- | --- | --- |
| Copilot SDK | Copilot CLI + subscription | teams already living in GitHub Copilot |
| Claude SDK | API key + Anthropic billing | Claude-first reasoning workflows |
| Azure/OpenAI direct | endpoint + API key | full control, residency, cost tuning |
Core message: the workflow code can stay the same while the provider changes
Code walkthrough: one AsAIAgent() surface
```csharp
AIAgent copilot = copilotClient.AsAIAgent(instructions: "...");
AIAgent claude = anthropicClient.AsAIAgent(model: "claude-sonnet-4-6", instructions: "...");
AIAgent azure = openAiClient.GetChatClient("gpt-4o-mini").AsAIAgent(instructions: "...");
```
Same orchestration API
Different model/provider behind the node
Swap the brain without rewriting the workflow
Demo 3 architecture — HR onboarding workflow
```mermaid
flowchart TD
  F[New-hire form / HRIS event] --> I[Intake Agent\nCopilot SDK]
  I --> P[Provisioning Agent\nClaude]
  P --> N[Notification Agent\nAzure OpenAI]
  N --> V[Deterministic Validator]
  V --> H{Human approval}
  H -->|approve| O[Checklist + email draft + tasks]
  H -->|reject / missing data| Q[Exception queue]
```
3 providers, 1 deterministic node, 1 explicit HITL checkpoint
Demo 3 walkthrough — where the human approves
Intake extracts role, department, manager, start date, location
Provisioning creates equipment, access, and training checklist
Notification drafts welcome email and orientation schedule
Validator checks completeness and policy violations
Human approves before tickets, messages, or calendar events are created
[GIF: HR onboarding workflow run + MP4 fallback]
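The validator-plus-approval tail of the flow can be sketched in a few lines — field names and the `approve` callback are illustrative assumptions, not the demo's actual code:

```python
# Deterministic completeness check followed by an explicit HITL gate.
REQUIRED = {"role", "department", "manager", "start_date", "location"}

def validate(record: dict) -> list[str]:
    # Deterministic: no LLM involved, so the check is auditable and repeatable.
    present = {k for k, v in record.items() if v}
    return sorted(REQUIRED - present)

def gate(record: dict, approve) -> str:
    missing = validate(record)
    if missing:
        return f"exception queue: missing {', '.join(missing)}"
    # Nothing is dispatched (tickets, email, calendar) until a human approves.
    return "dispatched" if approve(record) else "exception queue: rejected"

ok = {"role": "SWE", "department": "Eng", "manager": "Ada",
      "start_date": "2025-03-01", "location": "Remote"}
print(gate(ok, approve=lambda r: True))   # dispatched
print(gate({"role": "SWE"}, approve=lambda r: True))
```

The design choice to note: the validator rejects deterministically before the human ever sees the record, so approval time is spent only on complete, policy-clean cases.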
What is an agentic workflow?
A structured, repeatable, agent-executed process for a specific domain
It combines:
context = domain knowledge
harness = constraints + checks + cleanup
orchestration = how work moves
Same pattern works in software, HR, sales, marketing, and ops
If the result ships but nobody can explain it later, the workflow is creating cognitive debt
```mermaid
flowchart LR
  R[Research] --> P[Plan]
  P --> E[Execute]
  E --> V[Verify]
  V --> R
```
Research gives context
Plan shapes the harness
Execute uses built-in or custom orchestration
Verify closes the loop
After verify, ask for a walkthrough, diagram, or interactive explanation so understanding compounds too
Ralph v2 as a software engineering workflow
Requirement → discovery → plan → execute → review → knowledge extraction → iterate
6 agents split responsibilities instead of overloading one giant worker
Context Engineering: skills, instructions, session knowledge
Harness Engineering: signal protocols, review checklists, atomic commits
Orchestration: deterministic delegation with isolated contexts
Ralph state machine + knowledge loop
```mermaid
flowchart LR
  A[Initialize] --> B[Planning]
  B --> C[Batching]
  C --> D[Execute batch]
  D --> E[Review batch]
  E --> F[Knowledge extract]
  F --> G[Iteration review]
  G --> H[Complete]
  G --> I[Critique]
  I --> B
```
Knowledge is promoted after review, not dumped into the system by default
Engineering use cases: Module 9 anchors
3 live GIF anchors
multi-service auth refactoring
CI/CD diagnosis and fix
test coverage automation
Reference earlier modules instead of replaying them
spec-driven development
Ralph v2 / multi-agent orchestration
P1/P2 backups
documentation validation, browser testing, migrations, plugins, diagrams, atomic commits
[Visual: use-case map by impact vs demoability]
Use case — Multi-service auth refactoring
Problem: security logic drifts across services
Agentic pattern: discover auth seams → refactor shared patterns → verify across services
Context: repo auth conventions, service maps, framework docs
Harness: contract tests, auth regression suite, CI gates
Monday morning takeaway: AI is safest when the repo already teaches the target pattern
[GIF: multi-file auth refactor, ~15s]
Use case — CI/CD diagnosis and fix
Problem: failed pipelines block delivery and create waiting time
Agentic pattern: read logs → patch workflow/script → rerun until green
Context: workflow YAML, command conventions, failure history
Harness: required checks, rerun loop, diff review, branch protection
Monday morning takeaway: agents are debuggers and operators, not just code generators
[GIF: red ❌ to green ✅ pipeline, ~20s]
Use case — Test coverage automation
Problem: teams know the gaps but rarely schedule the work
Agentic pattern: find weakly tested modules → generate tests → fix failures → report delta
Context: test style, fixtures, examples of good assertions
Harness: coverage threshold, failing-test loop, reviewer checks for meaningful tests
Monday morning takeaway: harnesses turn “write tests” into a measurable outcome
[GIF: coverage jump highlight, ~15s]
More engineering cases to steal from
| Beginner / quick win | Intermediate / team value | Advanced / system value |
| --- | --- | --- |
| atomic Git commits | issue-to-PR feature delivery | spec-driven development |
| architecture diagrams | docs validation | Ralph v2 orchestration |
| visual slide decks | browser testing | legacy migration |
| plugin scaffolding | database query automation | cross-service refactors |
Non-engineering workflows: same pattern, new domain
| Discipline | Engineering example | Non-engineering equivalent |
| --- | --- | --- |
| Context Engineering | repo docs, tickets, logs | policies, templates, CRM schemas |
| Harness Engineering | tests, CI, linters | approvals, compliance, audit trails |
| Orchestration | agent loops, subagents | handoffs, queues, event-driven workflows |
The domain changes; the mental model does not
Non-engineering use case — Google Workspace executive ops
One operator asks an agent to prepare tomorrow's schedule, draft replies, refresh a sheet, and organize files
Context: executive preferences, calendar norms, Drive taxonomy, email tone
Harness: dry-run mode, approval before send/edit, narrow permissions, action logs
Best implementation path: coding agent + skills/tools
Why it matters: easiest example of repurposing a coding agent for business work
Non-engineering use cases — Lead processing + RFP automation
B2B SaaS lead processing
ingest → enrich → score → route → update CRM
harness = confidence thresholds, dedupe, manager override, audit trail
RFP response automation
retrieve approved content → draft → validate → review → approve
harness = citations, legal review, locked final approval
These are the moments where teams often graduate from prompting to orchestration
Non-engineering use case — HR onboarding is the selected demo
Universally relatable; no deep domain setup required
Clear agent boundaries: intake, provisioning, notification, validation
Shows the full workshop thesis in one compact flow:
context = role templates, HR policies, comms templates
harness = completeness checks + approvals
orchestration = workflow graph across specialists
Compact enough to explain and credible enough to matter
When skills are enough vs when orchestration must grow up
| Coding agent + skills is enough when... | Custom orchestration is better when... |
| --- | --- |
| a human operator is present | the workflow runs repeatedly or semi-autonomously |
| actions are reversible | actions affect money, access, or compliance |
| one agent can call a few tools in sequence | many handoffs, queues, or approval branches exist |
| you need a fast pilot | you need durable state and governance |
Copilot Studio is a strong middle layer between prompting and a full custom framework
Slido Poll 4 — Which use case matters most to you?
codebase refactoring
CI/CD and testing
documentation / knowledge work
internal business workflow automation
customer / support operations
[Visual: Slido QR code + “what would you pilot first?”]
Context anti-patterns and the fix
| Anti-pattern | What goes wrong | Better pattern |
| --- | --- | --- |
| giant instruction file | token waste, quality drops | progressive disclosure |
| no instructions | generic output | minimal repo rules |
| stale docs | wrong implementation choices | freshness checks |
| random tool additions | active context gets crowded | intentional capability design |
Harness anti-patterns and the fix
| Anti-pattern | What goes wrong | Better pattern |
| --- | --- | --- |
| YOLO coding | entropy explosion | tests + review gates |
| blind trust in output | subtle bugs, security issues | diff review + verification |
| reviewer disrespect | teammates do the first real review | review it yourself + include evidence |
| cognitive debt | future changes slow because nobody understands it | walkthroughs + diagrams + interactive explanations |
| no cleanup loop | drift accumulates | scheduled garbage collection |
Enterprise governance essentials
| Area | What to say briefly |
| --- | --- |
| Access control | manage licenses and features by team / repo |
| Policy | disable risky modes where needed |
| Privacy | org code and prompts should stay out of training on business tiers |
| Audit | log settings changes and agent activity |
| Cost | watch premium requests, dashboards, and overage risk |
| Compliance | map to SOC / ISO / trust-center evidence |
Governance model: start interactive, graduate to autonomy as trust rises
Team rule: no unreviewed AI PRs — reviewers should never do the first meaningful review of agent-generated code or PR text
Learning roadmap — Context Engineering resources
GitHub docs: custom agents configuration
VS Code blog: “Context is all you need”
Community catalogs: awesome-copilot, Skills Hub, SkillsMP
Internal follow-up: publish one repo instruction file and one reusable skill
[Visual: QR code block for curated resource list]
Learning roadmap — Harness + orchestration resources
OpenAI: Harness Engineering
Anthropic: harness design for long-running apps
Martin Fowler: harness engineering analysis
Microsoft Agent Framework quickstart + samples
GitHub Copilot SDK and Claude Agent SDK docs
[Visual: QR code block for the second resource list]
Take-home scorecard + adoption ladder
Use the 6-dimension maturity scorecard to assess your current environment
Then act at the smallest useful level:
this week: add repository instructions and hoard one working example
this month: add one skill, one guardrail, and ask for one walkthrough after a significant change
this quarter: build one repeatable workflow
Goal: progressive leverage, not instant platform engineering
Slido Poll 5 — What is your #1 takeaway?
open text → word cloud
Use this as the closing reflection, not as a quiz
[Visual: Slido word-cloud prompt]
Reliable agent outcomes are designed, not wished into existence
Start with context
Add harness
Choose the lightest orchestration that fits the job
Then make one workflow real in your own domain
[Visual: Thank-you slide with contact / follow-up placeholders]
Appendix — optional advanced material
Safe to skip live if timing is tight
Keep these as backup or Q&A slides
Focus: Copilot CLI session topology and feedback loops
CLI session topology: the two-layer model
```mermaid
flowchart TB
  W[Layer 1: Working directory\nGit repo, code, tests, .github/hooks] --> S[Layer 2: Session workspace\n~/.copilot/session-state/<uuid>/]
  S --> F[files/ for plans, outputs, shared artifacts]
  S --> E[events.jsonl for canonical event history]
```
Never confuse product code with runtime/session artifacts
Subagents are workers inside one shared session container
Session directory anatomy
```text
~/.copilot/session-state/<uuid>/
├── workspace.yaml
├── events.jsonl
├── checkpoints/
├── rewind-snapshots/
├── files/
│   ├── plan.md
│   ├── research/
│   └── shared artifacts
└── session.db
```
events.jsonl is the ground truth
files/ is the right place for session-scoped deliverables like this deck
Session lifecycle + /chronicle feedback loop
```mermaid
flowchart LR
  C[Create session] --> A[Active work]
  A --> P[Checkpoint / compact]
  P --> R[Resume / rewind]
  R --> S[Shutdown]
  S --> CH[/chronicle analytics]
  CH --> I[Better instructions and workflows]
  I --> C
```
Files are canonical; searchable catalogs are derived views
/chronicle standup, tips, improve, and reindex turn session history into better future sessions