---
theme: default
title: Agentic Engineering Workshop
info: |
  First full Slidev implementation artifact synthesized from the reconciled workshop plan
  and drafted component artifacts.
colorSchema: dark
highlighter: shiki
lineNumbers: true
mdc: true
monaco: true
aspectRatio: 16/9
fonts:
  sans: Inter
  mono: JetBrains Mono
---
Agentic Engineering Workshop
Context Engineering + Harness Engineering → Agentic Workflows
For developers moving from autocomplete to reliable agent systems
Live demos: context before/after, Speckit SDD, MAF HR onboarding
Format: Slidev, dark theme, Mermaid-ready, GIF fallbacks where needed
[Visual: GitHub-dark cover with the three pillars color-coded]
Slido Poll 1 — Where are you today?
I only use autocomplete
I use chat regularly
I have tried agent mode a little
I use AI agents weekly
I build custom workflows already
[Visual: Slido join QR code + room code]
```mermaid
flowchart LR
  A[Manual coding] --> B[Autocomplete]
  B --> C[Chat]
  C --> D[Agent mode]
  D --> E[Multi-agent]
  E --> F[Agentic workflows]
```
Autocomplete predicts the next line
Agent mode tries to complete the job
This workshop is about climbing from useful assistance to reliable systems
20M+ Copilot users; 77K+ enterprise organizations
Copilot now writes ~46% of active developers' code
84% of professional developers use AI tools daily
Adoption is no longer the bottleneck; reliability is
Code is cheap now; quality is still expensive once you include tests, review, and explainability
[Visual: staircase infographic with a moving “you are here” marker]
What is Agentic Engineering?
Agent = software that lets an LLM run tools in a loop to achieve a goal
Coding agent = that loop plus repo access, code edits, and execution
Agentic engineering = disciplined, reviewable engineering amplified by that loop
Context Engineering = give the agent the right knowledge
Harness Engineering = keep the loop on track with constraints, feedback, and cleanup
Agentic Workflows = apply both to real delivery loops in code or business operations
Vibe coding vs agentic engineering
| Vibe coding | Agentic engineering |
| --- | --- |
| Prompt into existence | Direct a tool-using loop toward a goal |
| One-shot output | Loop: research → plan → execute → verify |
| Prototype-grade confidence | Reviewable specs, tests, and notes |
| "Seems fine" | "I reviewed it, ran it, and can explain it" |
| Hard to maintain | Easy to rerun, review, and improve |
```mermaid
flowchart TD
  CE[Context Engineering] --> AW[Agentic Workflows]
  HE[Harness Engineering] --> AW
  BO[Built-in orchestration] --> AW
  CO[Custom orchestration] --> AW
  AW --> ENG[Engineering delivery]
  AW --> BIZ[Business workflows]
```
Better context improves decisions
Better harnesses improve reliability
Orchestration decides how work moves
| Term | Plain-English meaning |
| --- | --- |
| Agent | An LLM using tools in a loop to achieve a goal |
| Vibe coding | Prototype-first prompting with light review rigor |
| Context window | The model's working memory for this turn |
| Skill | A reusable training manual loaded on relevance |
| Instruction | Always-on guidance or policy |
| MCP server | A tool/capability provider for the agent |
| Subagent | A specialist worker with its own context window |
| HITL | Human-in-the-loop approval or review checkpoint |
| Cognitive debt | Working code the team can no longer confidently explain |
| Agentic workflow | A repeatable process executed by agents |
Context Engineering: the working definition
Anything outside the agent's context effectively does not exist
Context Engineering = curate, structure, and inject the right knowledge at the right time
Best context is often a reusable proof: starter repos, specs, successful diffs, screenshots
Hoard things you know how to do so agents can recombine working examples instead of guessing from theory
| Anti-pattern | Failure mode | Better move |
| --- | --- | --- |
| Context dump | token waste, noisy output | progressive disclosure |
| Context starvation | generic, convention-breaking output | repo instructions |
| No reusable proofs | agent guesses from theory | hoard examples + starter repos |
| Stale context | agents follow old patterns | freshness checks + reviews |
| Tool sprawl | capabilities crowd out reasoning | add MCP intentionally |
Copilot's layered context model
```mermaid
flowchart TB
  S[System + safety] --> R[Repo instructions]
  R --> P[Path instructions]
  P --> A[Selected agent / prompt]
  A --> K[Skills loaded on relevance]
  K --> C[Current files + diagnostics + terminal]
  C --> LLM[Model context window]
```
Skills are lazy-loaded expertise
MCP adds capabilities through connected tools
Context precedence and runtime behavior
Personal / user instructions > repository instructions > organization rules
On-demand: prompts, agents, skills
Automatic: current file, open tabs, diagnostics, terminal output
Compaction kicks in near a full context window; older turns are summarized
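The compaction rule above can be sketched in a few lines — a toy model only, assuming a simple word-count stand-in for tokens and a fake summarizer; names and thresholds are illustrative, not Copilot's actual algorithm:

```python
# Sketch of history compaction under a token budget. The summarizer is a
# stand-in for an LLM-written summary; budget/keep_recent are illustrative.

def compact_history(turns: list[str], budget: int = 50, keep_recent: int = 2) -> list[str]:
    def tokens(text: str) -> int:
        return len(text.split())  # crude proxy for a real tokenizer

    def summarize(text: str) -> str:
        # Stand-in for a model-written summary of older turns.
        return " ".join(text.split()[:3]) + " ..."

    # While over budget, fold the two oldest entries into one short summary,
    # always keeping the most recent turns verbatim.
    while sum(map(tokens, turns)) > budget and len(turns) > keep_recent + 1:
        turns = [summarize(turns[0] + " " + turns[1])] + turns[2:]
    return turns

history = ["early chatter " * 15, "more chatter " * 15,
           "fix the login bug", "added a regression test"]
compacted = compact_history(history)
```

The key property to notice: older turns lose detail first, and the most recent turns survive verbatim.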
Practical rule: use skills for knowledge, MCP for capabilities
Demo 1 — Same prompt, weak context
Prompt: “Create a user registration endpoint with password hashing and validation.”
No project instructions
Typical failure modes:
generic naming and folder placement
weak hashing or missing validation
no ProblemDetails / inconsistent error handling
[GIF: Context Engineering before/after — generic output]
Demo 1 — Same prompt, upgraded context
Add repo instructions: BCrypt, Clean Architecture, ProblemDetails, FluentValidation, CancellationToken
Re-run the exact same prompt
Output now follows house style, safer defaults, and reviewable conventions
Teaching beat: same prompt, same model, different context
[GIF: Context Engineering before/after — improved output + fallback screenshot]
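What the upgraded repo instructions might look like — a minimal sketch; the bullets above name the real conventions, but this file's wording and layout are illustrative, not the workshop's actual artifact:

```markdown
<!-- .github/copilot-instructions.md (illustrative excerpt) -->
# Project conventions
- Hash passwords with BCrypt; never store plaintext or use MD5/SHA1.
- Follow Clean Architecture: endpoints -> application services -> domain.
- Return RFC 7807 ProblemDetails for all error responses.
- Validate request DTOs with FluentValidation.
- Accept a CancellationToken in every async handler.
```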
Slido Poll 2 — Which context injection method surprised you?
repository instructions
path-specific instructions
skills
MCP servers
editor / terminal context
[Visual: Slido QR code + short debrief prompt]
Token economics: skills vs MCP
| Use it for | Skills | MCP |
| --- | --- | --- |
| Purpose | expertise / procedure | external capabilities / tools |
| Loading model | lazy | eager |
| Scale signal | ~1,000 skills ≈ 15K tokens | 10 tools ≈ 50K tokens |
| Best use | "How should it do this?" | "What can it access or act on?" |
Top workshop MCP examples: GitHub, Context7, Playwright, Chroma, Filesystem
Rule of thumb: prefer skills by default; add MCP intentionally
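The scale signal above is just arithmetic; spelling it out makes the asymmetry obvious. The per-item figures are the workshop's rough estimates, not measured values:

```python
# Back-of-envelope token math behind the "scale signal" row.
SKILL_METADATA_TOKENS = 15      # a lazy skill costs ~15 tokens of metadata up front
MCP_TOOL_SCHEMA_TOKENS = 5_000  # an eager MCP tool schema costs ~5K tokens up front

skills_cost = 1_000 * SKILL_METADATA_TOKENS  # ~1,000 skills
mcp_cost = 10 * MCP_TOOL_SCHEMA_TOKENS       # just 10 tools

print(skills_cost, mcp_cost)  # 15000 50000
```

A thousand skills cost less standing context than ten eagerly-loaded tools — which is why the rule of thumb prefers skills by default.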
Harness Engineering: why it exists
Writing code is cheap now; correctness, reviewability, and changeability still cost real effort
Great prompts are not enough for repeated, high-stakes work
Harness Engineering = the scaffolding around the agent:
context sources
architectural constraints
feedback loops
entropy management
When the agent struggles, inspect the environment before blaming the model
The three harness components
| Component | What it does | Concrete example |
| --- | --- | --- |
| Context | Gives the agent the right map | repo docs, specs, logs, browser state |
| Constraints | Enforces safe boundaries | layering tests, lint rules, approval gates |
| Garbage collection | Reduces drift and entropy | cleanup agents, stale-doc sweeps, dead-code checks |
Planner → Generator → Evaluator beats self-critique
Separate planning, generation, and evaluation when quality matters
Run the existing tests before editing; the results load repo reality into the loop
Use red/green TDD when you need a real regression guard
Use deterministic validators where you can
Janitorial / cleanup agents are part of the system, not an afterthought
Reliability comes from feedback loops, not hero prompts
[Visual: Planner → Generator → Evaluator + cleanup loop]
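The separation of roles can be sketched as a tiny pipeline — all functions here are illustrative stand-ins for LLM-backed or tool-backed steps, not a real agent API:

```python
# Minimal planner -> generator -> evaluator sketch with a retry loop.
# The evaluator is deterministic (think: tests/linters), not self-critique.

def planner(goal: str) -> list[str]:
    return [f"write function for: {goal}", f"write test for: {goal}"]

def generator(step: str) -> str:
    # Stand-in for code generation; returns a labeled artifact.
    return f"artifact({step})"

def evaluator(artifact: str) -> bool:
    # Deterministic validator in place of the same model grading itself.
    return artifact.startswith("artifact(")

def run(goal: str, max_retries: int = 2) -> list[str]:
    accepted = []
    for step in planner(goal):
        for _ in range(max_retries + 1):
            candidate = generator(step)
            if evaluator(candidate):  # only validated work moves forward
                accepted.append(candidate)
                break
    return accepted

results = run("parse ISO dates")
```

The design point is structural: the evaluator is a different component with different failure modes than the generator, which is what makes the loop a feedback loop rather than a hero prompt.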
Copilot customizations mapped to the harness
Context: .github/copilot-instructions.md, *.instructions.md, skills, prompts
Constraints: tests, linters, CI, hooks, approval modes
Garbage collection: review agents, scheduled cleanup, drift detection, doc freshness checks
Recommended order:
context first
constraints second
cleanup automation last
Harness maturity scorecard
Score each 0–2:
Repository knowledge
Documentation structure
Architectural constraints
Application legibility
Feedback loops
CI/CD gates
≤5 = early stage
6–8 = developing
9–12 = mature
[Visual: one-page scorecard handout preview]
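The scoring rule fits in a few lines — dimension names and bucket labels come from the slide; the function shape is illustrative:

```python
# 6-dimension harness maturity scorecard: each dimension scored 0-2.

DIMENSIONS = [
    "repository knowledge", "documentation structure", "architectural constraints",
    "application legibility", "feedback loops", "ci/cd gates",
]

def maturity(scores: dict[str, int]) -> str:
    assert set(scores) == set(DIMENSIONS)
    assert all(0 <= s <= 2 for s in scores.values())
    total = sum(scores.values())  # 0..12
    if total <= 5:
        return "early stage"
    if total <= 8:
        return "developing"
    return "mature"

example = {d: 1 for d in DIMENSIONS}  # six 1s -> total 6
print(maturity(example))  # developing
```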
Resume at: [insert local time]
Drop questions in chat while we reset demo windows
Next up: GitHub Copilot deep dive, SDD, and custom orchestration
| Product | Interface | Best for |
| --- | --- | --- |
| Copilot in VS Code | IDE | interactive coding, visual edits |
| Copilot CLI | terminal | automation, shell-heavy tasks |
| Copilot SDK | application code | embed Copilot into custom apps |
| Copilot in GitHub.com | web | PRs, issues, review |
| Copilot Mobile | phone | quick review and Q&A |
| Copilot Coding Agent | GitHub Actions | issue-to-PR automation |
Learn the model once; apply it across surfaces
Copilot customization primitives
| Primitive | Purpose | Analogy |
| --- | --- | --- |
| Instructions | always-on rules | house rules on the wall |
| Prompts | reusable task templates | fill-in-the-blank form |
| Agents | persona + tools + workflow | specialist teammate |
| Skills | lazy-loaded expertise | training manual |
| Plugins | installable bundles | app install |
| Hooks | lifecycle triggers | automated security camera |
Choosing the right primitive
Use instructions when the rule should apply every time
Use a prompt when the task is repeatable but user-triggered
Use an agent when persona, tool access, or orchestration matters
Use a skill when domain knowledge should load only on demand
Use a plugin when multiple components must ship together
Use hooks when governance should happen automatically
CLI vs VS Code: same goal, different runtime
| Runtime | Native context | Best at |
| --- | --- | --- |
| VS Code | open files, diagnostics, editor UI | inspect, edit, review |
| Copilot CLI | shell, filesystem, session workspace | automate, script, batch, analyze |
The right question is not “Which runtime wins?”
The right question is “Which runtime fits this task shape?”
Portability rules across CLI and VS Code
| Primitive | Portability |
| --- | --- |
| Skills | ✅ best option for cross-runtime sharing |
| Instructions | ⚠️ content shareable, delivery differs |
| Hooks | ⚠️ mostly shareable, payload details differ |
| Agents | ❌ runtime-specific wrappers required |
| Prompts / toolsets | ❌ VS Code-centric |
| Plugins | ↔ same concept, built per runtime |
Write expertise as skills if you want it to survive runtime changes
Skills: the most portable primitive
Same folder-based artifact in both runtimes: skills/<name>/SKILL.md
Lazy-loaded on relevance, not always-on
Great for reusable procedures: commits, release flows, framework playbooks
Keep SKILL.md lean; push detail into references/, scripts/, assets/
[Visual: skill folder anatomy]
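A skeleton of the artifact shape — folder name, frontmatter wording, and step contents here are hypothetical; only the `skills/<name>/SKILL.md` layout and the "keep it lean, push detail into subfolders" rule come from the slide:

```markdown
<!-- skills/release-flow/SKILL.md (illustrative skeleton) -->
---
name: release-flow
description: How this repo cuts and verifies a release
---
# Release flow
1. Bump the version and changelog (see references/versioning.md).
2. Run scripts/preflight.sh and fix anything red.
3. Tag, push, and verify the pipeline before announcing.
```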
Ralph v2: one case study, all six primitives
| Primitive | Ralph v2 usage |
| --- | --- |
| Instructions | shared workflow logic |
| Agents | 6 roles × runtime variants |
| Skills | planning, signals, knowledge, session ops |
| Plugins | runtime-scoped bundles |
| Hooks | session lifecycle automation |
| Prompts | VS Code orchestration shortcuts |
Ralph matters because it proves the primitives compose
Ralph v2 as context + harness engineering
6-agent flow: Orchestrator → Planner → Questioner → Executor → Reviewer → Librarian
Shared instructions provide durable workflow logic
Skills inject context only when needed
A deterministic state machine provides ownership boundaries and quality gates
Session artifacts make the workflow legible and auditable
[Visual: role diagram + state strip]
Ralph v2 runtime variants
VS Code variants use @AgentName and agents: references
CLI variants dispatch with task("ResolvedAgentName", ...)
Both variants embed the same shared instruction files at build time
Result: one behavior source of truth, two thin runtime wrappers
.plugin-managed keeps CLI distribution plugin-owned
Built-in agents in Copilot CLI
| Agent | Best for | Avoid when |
| --- | --- | --- |
| explore | codebase research | you already know the exact file |
| task | tests, builds, installs | you need every log line inline |
| general-purpose | complex multi-step work | a narrower worker will do |
| code-review | high-signal bug/security review | you want style feedback |
Start fresh sessions with `git status` or `git log -5` so the agent sees recent reality
Use Git as both context loader and safety net: commits, branches, reflog, diff review
VS Code capabilities + the blended workflow
VS Code shines when visual editing, diagnostics, and inline review matter
Key built-ins to name: Agent Mode, Fleet Mode, Code Review, Sessions View, Handoffs
CLI shines when the task is shell-heavy, automation-heavy, or batch-oriented
Expert practice is blended:
VS Code = where you see
CLI = where you orchestrate
Pragmatic adoption ladder
Beginner : use built-in agent mode for bounded tasks
Intermediate : add repo instructions, a small skill catalog, maybe one hook
Advanced : build custom agents, package plugins, and mine session history
Earn complexity from repeated pain, not from excitement
Slido Poll 3 — Which primitive will you try first?
instructions
skills
prompts
custom agents
hooks / plugins
[Visual: Slido QR code + “why this one?” follow-up]
Built-in orchestration: the agent loop
```mermaid
flowchart LR
  P[Prompt] --> PL[Plan]
  PL --> T[Tool call]
  T --> E[Execute]
  E --> O[Observe]
  O --> I[Iterate]
  I --> T
```
Built-in agents are not one-shot generators
This is the literal meaning of "agents run tools in a loop to achieve a goal"
They already research, act, inspect, and retry
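The "tools in a loop" definition can be taken literally — a toy loop, where the tool registry and stopping rule are illustrative, not a real Copilot API:

```python
# Toy agent loop: plan (pick a tool), execute, observe, repeat until done.

def agent_loop(goal, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):
        # "Plan": pick the first tool whose done-check says work remains.
        step = next((name for name, t in tools.items()
                     if not t["done"](observations)), None)
        if step is None:
            return observations  # goal reached
        observations.append(tools[step]["run"](goal))  # execute + observe
    return observations

tools = {
    "research": {"run": lambda g: f"notes on {g}",
                 "done": lambda obs: any("notes" in o for o in obs)},
    "edit":     {"run": lambda g: f"patch for {g}",
                 "done": lambda obs: any("patch" in o for o in obs)},
}
result = agent_loop("flaky test", tools)
```

Even this toy version researches before it edits and stops itself when the observations say the goal is met — the same shape the built-in agents run at full scale.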
Copilot built-in orchestration modes
| Mode | What it gives you |
| --- | --- |
| Interactive | human approval on each step |
| Autopilot | end-to-end agent execution |
| Plan mode | review the plan before action |
| Fleet mode | parallel subagent execution |
Start interactive; graduate to more autonomy as trust rises
Use fleet/subagents when work is independent or likely to burn root context; don't split tasks just because you can
Claude Code comparison: same loop, different defaults
| Copilot | Claude Code |
| --- | --- |
| GitHub ecosystem + multi-surface | terminal-first heavy agenting |
| Fleet + GitHub-native workflows | custom subagents, hooks, checkpointing |
| flat subscription tiers | per-plan / per-token style tradeoffs |
| great daily velocity | great deep terminal workflows |
Many advanced users combine both instead of choosing one forever
When built-in orchestration is enough
A single codebase, standard tools, and a human close to the loop
Great for test generation, refactors, bug fixes, docs, and code review
Move to custom orchestration when you need:
external business systems
durable queues / retries / state
multi-provider routing
custom approval branches
Why Spec-Driven Development matters
Fast AI code generation is useful, but vague prompting creates ambiguity and rework
SDD makes the spec the source of truth for the agent
Desired flow: idea → spec → plan → tasks → implementation → verification
This is the discipline layer between “wow” and “reliable”
Specs become durable context for the agent
Plans, tasks, and tests become the harness around execution
Human review happens at phase boundaries
Benefits: consistency, auditability, easier maintenance when requirements change
Speckit teaching workflow: six reviewable phases
| Step | Command | Reviewable artifact |
| --- | --- | --- |
| 1 | constitution | principles / constraints |
| 2 | specify | requirements + edge cases |
| 3 | plan | architecture + data model |
| 4 | tasks | dependency-ordered work |
| 5 | implement | code per approved tasks |
| 6 | analyze | consistency / gap report |
Teaching simplification: today's Speckit also surfaces clarify, checklist, and taskstoissues commands
Live demo setup: safest path and caveats
Safest live surface: VS Code Copilot Chat
In Copilot CLI, treat Speckit as custom-agent invocation, not guaranteed slash-command parity
Prefer the pinned init flow over an unknown global install
```shell
uvx --from git+https://github.com/github/spec-kit.git@v0.5.0 specify init --here --ai copilot --script ps --no-git --force
```
CLI fallback paths: /agent → choose speckit.specify, or copilot --agent=speckit.specify --prompt "..."
Demo part 1 — Constitution + Specify
Scenario: task-management REST API with ASP.NET Core Minimal APIs
Constitution locks stack and non-negotiables:
C# 14, .NET 9, xUnit, EF Core, SQLite
Specify captures real requirements:
CRUD
status filter
due date must be future
soft delete
404 behavior
[GIF: Speckit constitution + specify walkthrough]
Demo part 2 — Plan + Tasks
plan turns the spec into architecture, data model, contracts, and quickstart
tasks breaks work into atomic, ordered steps
Example tasks:
scaffold project
create Task entity + DbContext
add DTOs and mapping
implement CRUD + status filter
add validation and tests
This is the key review gate before code generation
Demo part 3 — Implement + Verify
First run the existing tests so the agent sees repo reality before editing
implement works task-by-task instead of taking one speculative leap
Use red/green TDD for new behavior: fail first, then fix
Verification remains explicit:
dotnet build && dotnet test
Live tip: show only the first 2–3 tasks; keep the rest as recorded backup
Speckit in context: OpenSpec is the safer fallback
Speckit = GitHub's general-purpose SDD toolkit
OpenSpec = this workspace's production SDD implementation
Same philosophy: specs → plan → tasks → implement → validate
Workshop fallback path:
@openspec-explore (optional)
@openspec-propose
@openspec-apply-change
Teaching line: same shape, safer local fallback
Built-in vs custom orchestration
| Built-in orchestration | Custom orchestration |
| --- | --- |
| already inside coding agents | you design the workflow graph |
| fast to adopt | more setup effort |
| great for dev tasks in one environment | great for cross-system, durable workflows |
| limited provider / control boundaries | full routing, approval, and integration control |
Built-in is for using agents well
Custom is for building agent systems
Microsoft Agent Framework at a glance
Unified .NET framework for building AI agents and multi-agent workflows
Key building blocks:
agents
workflow graphs
custom executors / middleware
Patterns: sequential, concurrent, conditional, deterministic
Enterprise-friendly: telemetry, type safety, CI/CD, provider flexibility
MAF architecture patterns
```mermaid
flowchart LR
  A[Agent node] --> B[Agent node]
  A --> C[Parallel agent]
  B --> D[Validator]
  C --> D
  D --> E{Human approval?}
  E -->|yes| F[Next step]
  E -->|no| G[Exception queue]
```
Mix LLM-backed nodes with deterministic nodes
Sequence, parallelism, conditions, and approvals are first-class
Provider flexibility: Copilot SDK, Claude, Azure OpenAI
| Option | Setup | Best for |
| --- | --- | --- |
| Copilot SDK | Copilot CLI + subscription | teams already living in GitHub Copilot |
| Claude SDK | API key + Anthropic billing | Claude-first reasoning workflows |
| Azure/OpenAI direct | endpoint + API key | full control, residency, cost tuning |
Core message: the workflow code can stay the same while the provider changes
Code walkthrough: one AsAIAgent() surface
```csharp
AIAgent copilot = copilotClient.AsAIAgent(instructions: "...");
AIAgent claude = anthropicClient.AsAIAgent(model: "claude-sonnet-4-6", instructions: "...");
AIAgent azure = openAiClient.GetChatClient("gpt-4o-mini").AsAIAgent(instructions: "...");
```
Same orchestration API
Different model/provider behind the node
Swap the brain without rewriting the workflow
Demo 3 architecture — HR onboarding workflow
```mermaid
flowchart TD
  F[New-hire form / HRIS event] --> I[Intake Agent\nCopilot SDK]
  I --> P[Provisioning Agent\nClaude]
  P --> N[Notification Agent\nAzure OpenAI]
  N --> V[Deterministic Validator]
  V --> H{Human approval}
  H -->|approve| O[Checklist + email draft + tasks]
  H -->|reject / missing data| Q[Exception queue]
```
3 providers, 1 deterministic node, 1 explicit HITL checkpoint
Demo 3 walkthrough — where the human approves
Intake extracts role, department, manager, start date, location
Provisioning creates equipment, access, and training checklist
Notification drafts welcome email and orientation schedule
Validator checks completeness and policy violations
Human approves before tickets, messages, or calendar events are created
[GIF: HR onboarding workflow run + MP4 fallback]
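The validator-plus-approval tail of the flow can be sketched in a few lines — field names and the `approve` callback are illustrative assumptions, not the demo's actual code:

```python
# Deterministic completeness check followed by an explicit HITL gate.
REQUIRED = {"role", "department", "manager", "start_date", "location"}

def validate(record: dict) -> list[str]:
    # Deterministic: no LLM involved, so the check is auditable and repeatable.
    present = {k for k, v in record.items() if v}
    return sorted(REQUIRED - present)

def gate(record: dict, approve) -> str:
    missing = validate(record)
    if missing:
        return f"exception queue: missing {', '.join(missing)}"
    # Nothing is dispatched (tickets, email, calendar) until a human approves.
    return "dispatched" if approve(record) else "exception queue: rejected"

ok = {"role": "SWE", "department": "Eng", "manager": "Ada",
      "start_date": "2025-03-01", "location": "Remote"}
print(gate(ok, approve=lambda r: True))   # dispatched
print(gate({"role": "SWE"}, approve=lambda r: True))
```

The design choice to note: the validator rejects deterministically before the human ever sees the record, so approval time is spent only on complete, policy-clean cases.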
What is an agentic workflow?
A structured, repeatable, agent-executed process for a specific domain
It combines:
context = domain knowledge
harness = constraints + checks + cleanup
orchestration = how work moves
Same pattern works in software, HR, sales, marketing, and ops
If the result ships but nobody can explain it later, the workflow is creating cognitive debt
```mermaid
flowchart LR
  R[Research] --> P[Plan]
  P --> E[Execute]
  E --> V[Verify]
  V --> R
```
Research gives context
Plan shapes the harness
Execute uses built-in or custom orchestration
Verify closes the loop
After verify, ask for a walkthrough, diagram, or interactive explanation so understanding compounds too
Ralph v2 as a software engineering workflow
Requirement → discovery → plan → execute → review → knowledge extraction → iterate
6 agents split responsibilities instead of overloading one giant worker
Context Engineering: skills, instructions, session knowledge
Harness Engineering: signal protocols, review checklists, atomic commits
Orchestration: deterministic delegation with isolated contexts
Ralph state machine + knowledge loop
```mermaid
flowchart LR
  A[Initialize] --> B[Planning]
  B --> C[Batching]
  C --> D[Execute batch]
  D --> E[Review batch]
  E --> F[Knowledge extract]
  F --> G[Iteration review]
  G --> H[Complete]
  G --> I[Critique]
  I --> B
```
Knowledge is promoted after review, not dumped into the system by default
Engineering use cases: Module 9 anchors
3 live GIF anchors
multi-service auth refactoring
CI/CD diagnosis and fix
test coverage automation
Reference earlier modules instead of replaying them
spec-driven development
Ralph v2 / multi-agent orchestration
P1/P2 backups
documentation validation, browser testing, migrations, plugins, diagrams, atomic commits
[Visual: use-case map by impact vs demoability]
Use case — Multi-service auth refactoring
Problem: security logic drifts across services
Agentic pattern: discover auth seams → refactor shared patterns → verify across services
Context: repo auth conventions, service maps, framework docs
Harness: contract tests, auth regression suite, CI gates
Monday morning takeaway: AI is safest when the repo already teaches the target pattern
[GIF: multi-file auth refactor, ~15s]
Use case — CI/CD diagnosis and fix
Problem: failed pipelines block delivery and create waiting time
Agentic pattern: read logs → patch workflow/script → rerun until green
Context: workflow YAML, command conventions, failure history
Harness: required checks, rerun loop, diff review, branch protection
Monday morning takeaway: agents are debuggers and operators, not just code generators
[GIF: red ❌ to green ✅ pipeline, ~20s]
Use case — Test coverage automation
Problem: teams know the gaps but rarely schedule the work
Agentic pattern: find weakly tested modules → generate tests → fix failures → report delta
Context: test style, fixtures, examples of good assertions
Harness: coverage threshold, failing-test loop, reviewer checks for meaningful tests
Monday morning takeaway: harnesses turn “write tests” into a measurable outcome
[GIF: coverage jump highlight, ~15s]
More engineering cases to steal from
| Beginner / quick win | Intermediate / team value | Advanced / system value |
| --- | --- | --- |
| atomic Git commits | issue-to-PR feature delivery | spec-driven development |
| architecture diagrams | docs validation | Ralph v2 orchestration |
| visual slide decks | browser testing | legacy migration |
| plugin scaffolding | database query automation | cross-service refactors |
Non-engineering workflows: same pattern, new domain
| Discipline | Engineering example | Non-engineering equivalent |
| --- | --- | --- |
| Context Engineering | repo docs, tickets, logs | policies, templates, CRM schemas |
| Harness Engineering | tests, CI, linters | approvals, compliance, audit trails |
| Orchestration | agent loops, subagents | handoffs, queues, event-driven workflows |
The domain changes; the mental model does not
Non-engineering use case — Google Workspace executive ops
One operator asks an agent to prepare tomorrow's schedule, draft replies, refresh a sheet, and organize files
Context: executive preferences, calendar norms, Drive taxonomy, email tone
Harness: dry-run mode, approval before send/edit, narrow permissions, action logs
Best implementation path: coding agent + skills/tools
Why it matters: easiest example of repurposing a coding agent for business work
Non-engineering use cases — Lead processing + RFP automation
B2B SaaS lead processing
ingest → enrich → score → route → update CRM
harness = confidence thresholds, dedupe, manager override, audit trail
RFP response automation
retrieve approved content → draft → validate → review → approve
harness = citations, legal review, locked final approval
These are the moments where teams often graduate from prompting to orchestration
Non-engineering use case — HR onboarding is the selected demo
Universally relatable; no deep domain setup required
Clear agent boundaries: intake, provisioning, notification, validation
Shows the full workshop thesis in one compact flow:
context = role templates, HR policies, comms templates
harness = completeness checks + approvals
orchestration = workflow graph across specialists
Compact enough to explain and credible enough to matter
When skills are enough vs when orchestration must grow up
| Coding agent + skills is enough when... | Custom orchestration is better when... |
| --- | --- |
| a human operator is present | the workflow runs repeatedly or semi-autonomously |
| actions are reversible | actions affect money, access, or compliance |
| one agent can call a few tools in sequence | many handoffs, queues, or approval branches exist |
| you need a fast pilot | you need durable state and governance |
Copilot Studio is a strong middle layer between prompting and a full custom framework
Slido Poll 4 — Which use case matters most to you?
codebase refactoring
CI/CD and testing
documentation / knowledge work
internal business workflow automation
customer / support operations
[Visual: Slido QR code + “what would you pilot first?”]
Context anti-patterns and the fix
| Anti-pattern | What goes wrong | Better pattern |
| --- | --- | --- |
| giant instruction file | token waste, quality drops | progressive disclosure |
| no instructions | generic output | minimal repo rules |
| stale docs | wrong implementation choices | freshness checks |
| random tool additions | active context gets crowded | intentional capability design |
Harness anti-patterns and the fix
| Anti-pattern | What goes wrong | Better pattern |
| --- | --- | --- |
| YOLO coding | entropy explosion | tests + review gates |
| blind trust in output | subtle bugs, security issues | diff review + verification |
| reviewer disrespect | teammates do the first real review | review it yourself + include evidence |
| cognitive debt | future changes slow because nobody understands it | walkthroughs + diagrams + interactive explanations |
| no cleanup loop | drift accumulates | scheduled garbage collection |
Enterprise governance essentials
| Area | What to say briefly |
| --- | --- |
| Access control | manage licenses and features by team / repo |
| Policy | disable risky modes where needed |
| Privacy | org code and prompts should stay out of training on business tiers |
| Audit | log settings changes and agent activity |
| Cost | watch premium requests, dashboards, and overage risk |
| Compliance | map to SOC / ISO / trust-center evidence |
Governance model: start interactive, graduate to autonomy as trust rises
Team rule: no unreviewed AI PRs — reviewers should never do the first meaningful review of agent-generated code or PR text
Learning roadmap — Context Engineering resources
GitHub docs: custom agents configuration
VS Code blog: “Context is all you need”
Community catalogs: awesome-copilot, Skills Hub, SkillsMP
Internal follow-up: publish one repo instruction file and one reusable skill
[Visual: QR code block for curated resource list]
Learning roadmap — Harness + orchestration resources
OpenAI: Harness Engineering
Anthropic: harness design for long-running apps
Martin Fowler: harness engineering analysis
Microsoft Agent Framework quickstart + samples
GitHub Copilot SDK and Claude Agent SDK docs
[Visual: QR code block for the second resource list]
Take-home scorecard + adoption ladder
Use the 6-dimension maturity scorecard to assess your current environment
Then act at the smallest useful level:
this week: add repository instructions and hoard one working example
this month: add one skill, one guardrail, and ask for one walkthrough after a significant change
this quarter: build one repeatable workflow
Goal: progressive leverage, not instant platform engineering
Slido Poll 5 — What is your #1 takeaway?
open text → word cloud
Use this as the closing reflection, not as a quiz
[Visual: Slido word-cloud prompt]
Reliable agent outcomes are designed, not wished into existence
Start with context
Add harness
Choose the lightest orchestration that fits the job
Then make one workflow real in your own domain
[Visual: Thank-you slide with contact / follow-up placeholders]
Appendix — optional advanced material
Safe to skip live if timing is tight
Keep these as backup or Q&A slides
Focus: Copilot CLI session topology and feedback loops
CLI session topology: the two-layer model
```mermaid
flowchart TB
  W[Layer 1: Working directory\nGit repo, code, tests, .github/hooks] --> S[Layer 2: Session workspace\n~/.copilot/session-state/<uuid>/]
  S --> F[files/ for plans, outputs, shared artifacts]
  S --> E[events.jsonl for canonical event history]
```
Never confuse product code with runtime/session artifacts
Subagents are workers inside one shared session container
Session directory anatomy
```text
~/.copilot/session-state/<uuid>/
├── workspace.yaml
├── events.jsonl
├── checkpoints/
├── rewind-snapshots/
├── files/
│   ├── plan.md
│   ├── research/
│   └── shared artifacts
└── session.db
```
events.jsonl is the ground truth
files/ is the right place for session-scoped deliverables like this deck
Session lifecycle + /chronicle feedback loop
```mermaid
flowchart LR
  C[Create session] --> A[Active work]
  A --> P[Checkpoint / compact]
  P --> R[Resume / rewind]
  R --> S[Shutdown]
  S --> CH[/chronicle analytics]
  CH --> I[Better instructions and workflows]
  I --> C
```
Files are canonical; searchable catalogs are derived views
/chronicle standup, tips, improve, and reindex turn session history into better future sessions