This artifact is designed for Module 9 of the workshop. Use it to decide what to show live, what to turn into short GIFs, and what to keep as slide cards or backup examples.
Recommended delivery mix
- Lead with 3 anchor GIFs that make Context Engineering + Harness Engineering visible in under a minute.
- Go deeper on 4-7 one-pager cases depending on audience maturity.
- Keep the remaining cases as poll options, appendix cards, or Q&A material.
Provenance
- Use cases 1-10 are grounded in this workspace's agents, skills, and engineering workflows.
- Use cases 11-16 are validated external examples from GitHub, Microsoft ISE, Xebia, and related industry writeups.
Priority legend
- P0 = anchor story for the workshop; ideal for live walkthrough or GIF.
- P1 = strong slide card / narrated example.
- P2 = quick win, appendix, or audience-dependent backup.
| # | Priority | Use case | Why it matters | Agentic pattern | Customization / tooling | Guardrails / harness | Demo format |
|---|---|---|---|---|---|---|---|
| 1 | P0 | 🎬 Multi-service auth refactoring | Security drift across services is expensive and risky. | Discover auth seams -> refactor shared patterns -> verify across services. | Repo-wide instructions, path-specific service rules, GitHub/filesystem tools, Context7 docs. | Security tests, contract tests, cross-service CI gates. | 15s GIF / live |
| 2 | P0 | 🎬 CI/CD workflow diagnosis & fix | Failed pipelines block delivery and frustrate teams. | Read logs -> patch workflow/script -> rerun until green. | GitHub MCP, Actions logs, workflow files, terminal context. | Check runs, rerun loop, branch protection, diff review. | 20s GIF / live |
| 3 | P0 | 🎬 Test coverage automation | Teams know the gaps but rarely prioritize them. | Find untested modules -> generate tests -> fix failures -> report delta. | Test conventions, existing patterns, coverage tooling, Context7 docs. | Coverage threshold gate, test runner feedback, reviewer checks. | 15s GIF / live |
| 4 | P0 | Spec-driven development (specs -> tasks -> code) | Requirements drift is one of the biggest causes of churn. | Specify -> plan -> task -> implement -> verify. | Speckit or OpenSpec skills, constitutions, specs, task plans. | Cross-ref validation, acceptance criteria, build + test verification. | Live demo / highlight reel |
| 5 | P0 | Multi-agent orchestration (Ralph v2) | Complex work needs specialization, memory, and structured review. | Orchestrator delegates to planner/executor/reviewer/librarian with iteration loops. | Custom agents, instructions, skills, session artifacts, knowledge base. | State machine, signal protocols, review checkpoints, promotion gates. | Slide walkthrough |
| 6 | P0 | Issue-to-PR feature delivery | Teams want help with the whole dev loop, not just snippets. | Read issue -> plan -> implement -> test -> open PR -> review -> iterate. | Repo instructions, GitHub MCP, task agents, code-review agent. | Check runs, PR templates, test suite, required review. | Narrated walkthrough |
| 7 | P1 | Documentation validation | Broken setup docs waste onboarding time and erode trust. | Follow docs as a new user -> fail -> patch docs/config -> rerun. | README/setup guides, filesystem tools, CLI, Playwright if UI is involved. | Executable setup validation, smoke tests, reproducible environment scripts. | Short screen capture |
| 8 | P1 | Ad-hoc browser testing for Blazor apps | Interactive UI regressions are hard to catch from code alone. | Launch app -> drive browser -> assert behavior -> capture evidence. | Playwright tooling, Blazor testing skill, app logs, screenshots. | Assertions, screenshots, repeatable test scripts, log review. | Live or GIF |
| 9 | P1 | End-to-end feature implementation | Story-to-prototype handoffs slow teams down. | Parse story -> scaffold backend + frontend -> write tests -> ship prototype. | Design specs, Context7, GitHub/filesystem tools, repo docs. | Code review, coverage checks, staged verification. | Case study |
| 10 | P1 | Legacy migration (Node.js -> TypeScript) | Modernization programs stall without disciplined incrementalism. | Inventory targets -> migrate incrementally -> typecheck -> fix CI. | TS config, migration guide, framework docs, repo conventions. | Type checker, incremental CI, rollback points. | Timelapse / slide card |
| 11 | P1 | Build a CLI plugin for Copilot | Teams need reusable workflows, not repeated prompt craft. | Scaffold plugin -> bundle -> install -> reuse. | copilot-plugin-creator, publish scripts, plugin manifest patterns. | Plugin schema validation, body-size limits, reinstall verification. | Slide walkthrough |
| 12 | P1 | Create architecture diagrams from code | Architecture is often implicit and hard to explain. | Inspect code -> synthesize components -> render diagram. | mermaid-creator skill, repo structure, component docs. | Diagram rendering/validation, review for correctness. | Quick live win |
| 13 | P2 | Generate atomic Git commits | Many teams lose history quality even when code is good. | Analyze diff -> group changes -> draft conventional commit. | git-atomic-commit, scope constitution, git metadata. | Commit format validation, human diff review. | 60s quick win |
| 14 | P2 | Scaffold a Chrome extension | MV3 setup is repetitive and easy to get wrong. | Scaffold manifest + scripts -> wire features -> load and test. | chrome-extension skill, browser tooling, template patterns. | Manifest validation, smoke test in browser, extension constraints. | Live snippet |
| 15 | P2 | Database query automation | Schema exploration and ops queries are still slow for many teams. | Connect -> inspect schema -> query -> explain results. | mssql-cli skill, connection-string parsing, SQL tooling. | Read-only discipline, query review, environment separation. | Controlled demo |
| 16 | P2 | Visual slide decks from technical content | Engineers struggle to turn notes into clear visuals. | Outline -> generate slides -> validate presentation. | visual-explainer / frontend-skill, source notes, design prompts. | HTML validation, accessibility review, fact-checking. | Backstage example |
Anchor GIF candidates
These are the three strongest visual stories because they make the workshop thesis visible fast: better context produces better actions, and better harnesses produce safer outcomes.
| GIF candidate | What the audience sees | Context Engineering mapping | Harness Engineering mapping | Recording notes |
|---|---|---|---|---|
| 1. Multi-file auth refactoring | Prompt -> agent edits 4+ files/services -> tests pass. | Repo auth conventions, service-specific instructions, architecture map, security docs. | Contract tests, auth regression suite, cross-service CI gates. | Capture with ScreenToGif; show multiple tabs changing and end on green tests. |
| 2. CI/CD diagnosis & fix | Red pipeline -> log investigation -> YAML/script fix -> green rerun. | Workflow YAML, terminal logs, failure history, repo command guide. | Required check runs, rerun loop, branch protection, explicit diff review. | Keep the red-to-green arc visible; OBS MP4 backup is useful here. |
| 3. Test coverage automation | "Generate tests" -> new test files appear -> coverage jumps. | Existing test style, naming rules, fixtures, module behavior docs. | Coverage threshold, failing-test loop, reviewer check for meaningful assertions. | End on the coverage percentage change; trim to under 20 seconds. |
Recording workflow
- Primary recorder: ScreenToGif on Windows.
- Backup: OBS Studio for MP4, then convert/select highlights.
- Terminal-only variant: asciinema + agg.
- Post-process: trim to <20s, add captions, optimize for slide size.
Use case 1: Multi-service auth refactoring
- Problem: Legacy authentication logic drifts across services. Security fixes become repetitive, slow, and easy to miss.
- Agentic pattern: The agent researches current auth flows, proposes the target pattern, applies coordinated edits across services, and validates the change set.
- Customization / tooling:
- Context Engineering: repo-wide auth conventions, service maps, path-specific instructions, framework/API docs.
- Tooling: GitHub/filesystem access, terminal test runner, code review support.
- Guardrails:
- Harness Engineering: auth regression tests, token/claims contract tests, cross-service CI, reviewer sign-off for security-sensitive changes.
- Demoability: Excellent. This is the best visual demo because the audience sees a coordinated, cross-cutting change rather than a single-file edit.
- Takeaway: The win is not "AI writes auth." The win is "AI can safely execute a coordinated refactor when the repo already teaches it the rules."
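The guardrail side of this case can be made concrete. The sketch below is a hypothetical harness check, not part of any real repo: the service layout and the deprecated helper name are invented for illustration. It fails the change set if any service still references the old auth call, which is the kind of regression gate that lets the refactor run across many files safely.

```python
from pathlib import Path

# Hypothetical deprecated call the refactor is supposed to remove.
DEPRECATED_PATTERN = "legacy_auth.validate_token"

def find_deprecated_auth(root: Path) -> list[str]:
    """Return source files that still reference the deprecated auth helper."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if DEPRECATED_PATTERN in path.read_text(encoding="utf-8"):
            hits.append(str(path.relative_to(root)))
    return hits

def check_refactor_complete(root: Path) -> None:
    """CI-style gate: non-zero exit while any service lags the new pattern."""
    hits = find_deprecated_auth(root)
    if hits:
        raise SystemExit(f"Deprecated auth usage remains in: {hits}")
```

Wired into CI, this turns "did the agent touch every service?" from a review question into a mechanical check.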
Use case 2: CI/CD workflow diagnosis & fix
- Problem: Pipeline failures create waiting time, context switching, and unclear ownership.
- Agentic pattern: The agent reads logs, identifies the likely root cause, patches workflow or script files, and reruns until the signal turns green.
- Customization / tooling:
- Context Engineering: workflow files, command conventions, job ownership notes, historical failure patterns.
- Tooling: GitHub MCP, Actions logs, terminal access, YAML/script editing.
- Guardrails:
- Harness Engineering: required check runs, rerun loop, branch protection, human review of CI config diffs.
- Demoability: Excellent. It has a clean red-to-green story and is relatable to nearly every engineering audience.
- Takeaway: Agents are not only code writers; they are effective debuggers and operators when they can observe real system feedback.
Use case 3: Test coverage automation
- Problem: Teams know where the testing gaps are, but those gaps rarely get scheduled.
- Agentic pattern: The agent finds untested or weakly tested modules, generates tests in the house style, runs the suite, fixes failures, and reports the coverage delta.
- Customization / tooling:
- Context Engineering: test naming conventions, fixture patterns, examples of good assertions, module behavior notes.
- Tooling: test runner, coverage tooling, docs lookup when framework APIs are unclear.
- Guardrails:
- Harness Engineering: coverage thresholds, failing tests block merge, reviewer checks for meaningful assertions over shallow snapshots.
- Demoability: Excellent. It is quantifiable, visually obvious, and easy to map to day-to-day team value.
- Takeaway: Harness engineering turns an open-ended request like "write tests" into a measurable outcome the team can trust.
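The "measurable outcome" framing reduces to a small gate the harness runs after the agent's test-generation pass. The threshold values below are illustrative, not a recommendation; teams would pick numbers that match their own baseline.

```python
def coverage_gate(before: float, after: float,
                  min_total: float = 80.0, min_delta: float = 0.0) -> str:
    """Fail the run if total coverage is below threshold or the change made it worse."""
    delta = after - before
    if after < min_total:
        raise SystemExit(f"coverage {after:.1f}% below required {min_total:.1f}%")
    if delta < min_delta:
        raise SystemExit(f"coverage dropped by {-delta:.1f} points")
    return f"coverage {before:.1f}% -> {after:.1f}% (+{delta:.1f})"
```

The returned string is exactly the coverage-delta line the GIF should end on.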
Use case 4: Spec-driven development (specs -> tasks -> code)
- Problem: Prompt-only development loses requirements, edge cases, and shared alignment.
- Agentic pattern: Start from a constitution/spec, break the work into tasks, implement in sequence, and verify against the spec instead of vague intent.
- Customization / tooling:
- Context Engineering: constitutions, user stories, acceptance criteria, data models, domain vocabulary.
- Tooling: Speckit CLI or the workspace's OpenSpec skills.
- Guardrails:
- Harness Engineering: cross-reference validation, task decomposition, acceptance tests, build + test verification.
- Demoability: High. Works well as a 10-12 minute live walkthrough or a condensed highlight reel.
- Takeaway: In agentic engineering, the spec is structured context and the task/test chain is structured harness.
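The cross-reference validation guardrail is easy to sketch: every requirement in the spec must be claimed by at least one task, and no task may cite a requirement the spec does not contain. The REQ/T identifiers are invented for illustration; Speckit and OpenSpec have their own formats, and this sketch only shows the shape of the check.

```python
def validate_cross_refs(spec_reqs: set[str], tasks: dict[str, set[str]]) -> list[str]:
    """Return human-readable problems; an empty list means the plan covers the spec."""
    problems = []
    covered = set().union(*tasks.values()) if tasks else set()
    # Every spec requirement needs at least one implementing task.
    for req in sorted(spec_reqs - covered):
        problems.append(f"requirement {req} has no implementing task")
    # No task may point at a requirement the spec does not define.
    for task, refs in tasks.items():
        for ref in sorted(refs - spec_reqs):
            problems.append(f"task {task} references unknown {ref}")
    return problems
```

Run before implementation starts, this catches drift while it is still a planning problem rather than a code-review problem.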
Use case 5: Multi-agent orchestration (Ralph v2)
- Problem: A single generalist agent struggles when work requires planning, questioning, execution, review, and knowledge capture at the same time.
- Agentic pattern: A deterministic orchestrator delegates to specialized agents, isolates context windows, and moves the work through explicit states.
- Customization / tooling:
- Context Engineering: agent-specific skills, instructions, session state, project knowledge base, reusable planning artifacts.
- Tooling: Ralph agent fleet, session folders, knowledge promotion pipeline.
- Guardrails:
- Harness Engineering: state machine transitions, mailbox/signal protocols, review checkpoints, atomic commits, knowledge promotion gates.
- Demoability: Medium as a live demo, high as an architecture walkthrough. Best used as the "what scaling looks like" example.
- Takeaway: Orchestration is what lets agentic engineering scale beyond a single clever prompt.
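The state-machine guardrail is the most slide-friendly part to make concrete. The sketch below models the flow loosely on the description above, not on Ralph's actual implementation: states and transition names are assumptions. What matters is that transitions outside the allowed map are rejected, which is what keeps a multi-agent run on rails.

```python
# Allowed transitions for an orchestrated session; review can bounce work
# back to execution, and only reviewed work reaches knowledge promotion.
TRANSITIONS = {
    "plan": {"execute"},
    "execute": {"review"},
    "review": {"execute", "promote"},   # reviewer may request another iteration
    "promote": {"done"},
}

class Session:
    def __init__(self):
        self.state = "plan"
        self.history = ["plan"]

    def advance(self, next_state: str) -> None:
        """Reject any transition the orchestrator has not explicitly allowed."""
        allowed = TRANSITIONS.get(self.state, set())
        if next_state not in allowed:
            raise ValueError(f"illegal transition {self.state} -> {next_state}")
        self.state = next_state
        self.history.append(next_state)
```

The `history` list doubles as the audit trail for the session artifacts mentioned above.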
Use case 6: Issue-to-PR feature delivery
- Problem: Engineering teams want help with the whole delivery loop, not just with local code generation.
- Agentic pattern: The agent reads the issue, proposes a plan, implements the change, runs verification, opens a PR, and iterates on review comments.
- Customization / tooling:
- Context Engineering: repo instructions, coding standards, architecture docs, PR templates, issue conventions.
- Tooling: GitHub MCP, task agents, code-review agent, existing CI workflows.
- Guardrails:
- Harness Engineering: required checks, test suite, reviewer approval, commit/changelog conventions.
- Demoability: Good as a narrated workflow or screenshot sequence from issue -> diff -> PR -> checks.
- Takeaway: The real productivity lift comes when the repository teaches the agent how your team actually ships software.
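The harness half of this loop is mechanical and worth naming on the slide: a PR merges only when every required check is green and a human has approved. A minimal sketch of that branch-protection logic (the check names are illustrative):

```python
def can_merge(checks: dict[str, str], approvals: int,
              required: tuple[str, ...] = ("build", "tests"),
              min_approvals: int = 1) -> bool:
    """Mirror branch-protection logic: required checks green + human approval."""
    checks_green = all(checks.get(name) == "success" for name in required)
    return checks_green and approvals >= min_approvals
```

The agent can iterate freely inside this gate; nothing it produces ships without the same evidence a human contribution would need.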
Use case 7: Documentation validation
- Problem: Onboarding and setup docs quietly decay, then cost teams time every week.
- Agentic pattern: The agent behaves like a first-time contributor, follows the guide step by step, records where it fails, patches the docs or missing config, and reruns.
- Customization / tooling:
- Context Engineering: README, onboarding guide, environment assumptions, sample env files, setup scripts.
- Tooling: CLI, filesystem edits, browser automation when docs include UI steps.
- Guardrails:
- Harness Engineering: executable setup validation in CI, smoke tests, reproducible environments, review for user clarity.
- Demoability: Medium as a live demo, high as a slide card because the pain point is universal.
- Takeaway: One of the most practical engineering uses for agents is improving the developer experience itself.
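The "agent as first-time contributor" loop largely reduces to executing the document. The sketch below pulls fenced shell blocks out of a Markdown guide and runs them step by step, stopping at the first failure; the fence parsing is deliberately naive and the function names are this sketch's own, not any existing tool's API.

```python
import re
import subprocess

def shell_blocks(markdown: str) -> list[str]:
    """Naively extract ```bash / ```sh fenced blocks from a Markdown doc."""
    return re.findall(r"```(?:bash|sh)\n(.*?)```", markdown, flags=re.DOTALL)

def validate_docs(markdown: str) -> tuple[bool, str]:
    """Run each documented command in order; report the first one that fails."""
    for block in shell_blocks(markdown):
        for cmd in filter(None, (line.strip() for line in block.splitlines())):
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            if result.returncode != 0:
                return False, f"failed at: {cmd}"
    return True, "all documented steps ran cleanly"
```

Put this in CI and the setup guide stops decaying silently: the first stale step fails the build with its exact command.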
If the audience is earlier in their journey, these are easier to explain than Ralph or greenfield feature delivery:
- Generate atomic Git commits: tiny scope, immediate daily value, easy to map to team conventions.
- Create architecture diagrams from code: fast payoff, visually strong, low prerequisite knowledge.
- Build a CLI plugin for Copilot: strong example of turning a repeated prompt pattern into a reusable team asset.
Suggested session flow
- Start with a visual win: show the 3 GIFs quickly.
- Name the mechanism: explain that the visible difference comes from Context Engineering + Harness Engineering.
- Move to structured practice: use Spec-driven development as the clean bridge from theory to implementation.
- Finish with scale: use Ralph v2 and Issue-to-PR delivery to show what happens when these ideas become a system.
That sequence keeps the workshop practical for beginners while still giving advanced attendees a credible path toward deeper agentic workflows.