flightcheck: port BL-014 network validator + Workday REST diagnostic + contributor guardrails#94
Conversation
…y-auth scope gate
Port of ess-preflight-validator commit 9ed2055 (PowerShell
Test-NetworkConnectivity.ps1 + Export-FirewallRequirements.ps1) into the
Python FlightCheck framework as in-runner checks.
What's new
----------
- NET-001 / NET-002 / NET-003 checkpoints for Workday / ServiceNow / SAP
SuccessFactors outbound TCP+HTTPS reachability. Catches missing
firewall allow-list entries, SSL inspection, and proxy interference
WEEKS before deployment cutover, when corporate firewall change
requests still have time to land.
- New `--scope network` and standalone `--export-firewall-requirements`
modes on the FlightCheck CLI.
- Vendored `required-endpoints.json` (Workday + ServiceNow + SAP only;
Microsoft endpoints intentionally excluded, with links to Microsoft's
authoritative allowlist docs from the JSON).
Architecture notes
------------------
- New `_requires_microsoft_auth(scope)` gate in `cli.py` mirrors the
existing PVA scope gate so `--scope network` and
`--export-firewall-requirements` skip Dataverse / Graph / Power
Platform Admin auth entirely. Otherwise the new transport-only check
would still trigger an interactive MSAL prompt before any probe ran,
defeating the shift-left value of BL-014.
- The check uses an injectable `TcpProber` / `HttpsProber` protocol;
production uses stdlib `socket.create_connection` + `requests` HEAD,
tests substitute deterministic fakes. `responses` and `respx` can't
fake TCP socket refusal, so a probe-shaped abstraction is the
cleanest seam.
- Concurrent probes via `ThreadPoolExecutor` (default 8) so a single
silently-dropped packet doesn't serialize the whole check behind 5s
timeouts.
- Integration selection via `runner.config['network']`:
network.integrations: ["Workday", "ServiceNow", ...] # opt-in
network.servicenow_instance: "<instance>" # required
# for SN
No `workday_tenant` knob (Workday hostnames are data-center based,
not tenant-prefixed, per the JSON's `_hostingNote`).
- The cardinal rule (every external API call needs a validated /
validatable / documented mock) does NOT bind transport-level probes
that don't consume vendor API response contracts. A new "Vendor
TCP/HTTPS reachability" row in the API tier registry documents this
exception explicitly so future readers don't have to re-derive it.
Testing
-------
- 22 new tests across the 6 failure branches (REFUSED, TIMEOUT,
DNS_FAILURE, TLS_ERROR, HTTP_5XX, HTTP_4xx-still-reachable) plus
selection / config edge cases for `network.py`.
- Golden-file render tests for `firewall_export.py` against a fixed
timestamp.
- `test_cli_lazy_auth.py` pins `_requires_microsoft_auth` per-scope to
prevent future scopes from silently regressing the no-auth path.
Verified
--------
- `pytest tests/ -q` → 125 passed, 8 skipped (was 103 baseline; +22 new)
- `cli.py --export-firewall-requirements` writes markdown with no MS auth
- `cli.py --scope network` runs with "Skipping Microsoft auth (not
required for --scope network)" banner; probes execute end-to-end
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…erence Port of ess-preflight-validator commits 70065b9 (SSO test flow template) and 5eb19bc / 44ac58d (Test-WorkdayRESTEndpoints.ps1). Why a standalone diagnostic instead of an in-runner FlightCheck check -------------------------------------------------------------------- Workday REST endpoints accept only OAuth 2.0 bearer tokens, and obtaining one requires a customer-registered Workday OAuth API Client. That's the same chicken-and-egg problem documented in `tests/fixtures/cassettes/INDEX.md` under "Workday WQL config-validation pattern" — validating the customer set up Workday correctly is hard to automate because the validator itself needs the same kind of setup. The source PowerShell repo accepts this and ships an interactive customer-run script that prompts for the API Client credentials and opens a browser for the Authorization Code flow. We do the same here in Python so the runner's cassette-backed tier system stays intact: this diagnostic lives outside the FlightCheck runner, and the runner surfaces it via a single `WD-REST-MANUAL` `NotConfigured` checkpoint in `checks/workday.py` that fires ONLY when Workday is configured for the agent (so non-Workday customers see no noise). What's new ---------- - `scripts/diagnostics/test_workday_rest_endpoints.py` — Python OAuth Authorization Code diagnostic for all 9 ESS Workday REST connector actions. Same checkpoint IDs as the source PowerShell (`WD-REST-AUTH` / `WD-REST-ME` / `WD-REST-001`..`WD-REST-008`). Default UX is paste-the-redirect-URL (works with the conventional `https://localhost:8888/callback` redirect URI without shipping a self-signed TLS cert). Optional `--listen` flag spins up an HTTP loopback server when the redirect URI is `http://localhost` — for customers who registered an HTTP redirect URI specifically. - `scripts/diagnostics/README.md` — full operator-facing setup guide. - `scripts/flightcheck/checks/workday.py` adds the `WD-REST-MANUAL` checklist entry surfacing the diagnostic to operators. - `src/reference/workday-sso-test-flow/` — vendored Power Automate flow template that tests the OAuthUser Entra SSO connection via the `Get_Workers` SOAP operation. Reference content; not deployed by any script. - API tier registry note in the existing Workday WQL/REST row documenting that the runner deliberately doesn't automate these 9 endpoints and points readers at the standalone diagnostic. Secrets and PII hygiene ----------------------- - OAuth client secret, authorization code, access token, refresh token are NEVER logged. Token-endpoint error responses are reduced to status + error class so they can't leak the code or secret on failure. Mirrors the WS-Security UsernameToken redaction discipline in `checks/workday.py:_soap_call` (which has a CodeQL exemption row for the same reason — see commit 3ead8d7). - `state` parameter generated via `secrets.token_urlsafe(24)` and verified on the callback. Rejects pasted URLs whose state doesn't match (catches CSRF + stale-paste mistakes). - GetWorkerMe response PII (descriptor, primaryWorkEmail, businessTitle, primarySupervisoryOrganization.descriptor, raw WID) is redacted in the JSON output by default. `--include-pii` opt-in for in-tenant debugging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Port of ess-preflight-validator commit 2ac16e9, adapted to this repo's conventions. What's new ---------- - `.github/PULL_REQUEST_TEMPLATE.md`: new "Pre-PR Checklist (REQUIRED)" section before the existing checklist. - `CONTRIBUTING.md`: same checklist as a top-level subsection, cross-linked from / to the PR template. Translations from the source's PowerShell-shaped checklist: - "Rebased on latest `master`" → "Rebased on latest `main`" - "Script runs without errors" → "`pytest tests/ -q` clean and `cli.py --help` parses" - "`-OutputPath` defaults to `Desktop\\ESS-Reports\\…`" → "Output paths default to `workspace/flightcheck/…`" - "Suite integration — new tests are wired into `Invoke-*ValidationSuite.ps1` as opt-in switches" → "FlightCheck integration — new checks are wired into `cli.py` `SCOPE_MAP` and `FULL_SCOPE`; tests live under `tests/flightcheck/checks/`" - New line: "API tier registry honored — new external API calls reference the tier in `tests/fixtures/cassettes/INDEX.md`" (specific to this repo's cassette-tier discipline; not present in the source). Why "rebased on latest `main`" matters: stale branches can silently delete files added after the branch was cut. The source's callout text is preserved (with the right repo's naming). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
|
||
| def test_includes_all_endpoint_hosts(self, catalog_path: Path, tmp_path: Path) -> None: | ||
| text = _render(catalog_path, tmp_path) | ||
| assert "wd2-impl-services1.workday.com" in text |
| def test_includes_all_endpoint_hosts(self, catalog_path: Path, tmp_path: Path) -> None: | ||
| text = _render(catalog_path, tmp_path) | ||
| assert "wd2-impl-services1.workday.com" in text | ||
| assert "wd5.myworkday.com" in text |
| ) -> None: | ||
| text = _render(catalog_path, tmp_path, | ||
| config={"network": {"servicenow_instance": "contoso"}}) | ||
| assert "contoso.service-now.com" in text |
|
|
||
| # Confirm placeholder substitution happened. | ||
| sn_hosts = [call[0] for call in tcp.calls] | ||
| assert "contoso.service-now.com" in sn_hosts |
nehaoss
left a comment
There was a problem hiding this comment.
Nice work on the network validator port. The OAuth flow, PII redaction, auth gating, firewall export, and test coverage are well-structured. Found one counting bug in the HTTPS failure aggregation.
| warning_count = sum( | ||
| 1 for _, t, h in probe_results | ||
| if t.status == ProbeStatus.REACHABLE | ||
| and h.status in (ProbeStatus.HTTP_5XX, ProbeStatus.TLS_ERROR) |
There was a problem hiding this comment.
Bug: Incomplete HTTPS failure mode counting causes false 'Passed' status
warning_count only counts HTTP_5XX and TLS_ERROR HTTPS outcomes, but _RequestsHttpsProber.probe() can also return TIMEOUT (from RequestsTimeout, line ~147) and REFUSED (from RequestsConnectionError, line ~150).
When TCP succeeds but HTTPS returns either of these, the endpoint isn't counted in any of the three aggregation categories (reachable_count, warning_count, or failed_count), so the status falls through to Passed. The detail text does print [WARN] for these cases (line ~324), creating contradictory output.
Suggested fix:
and h.status in (ProbeStatus.HTTP_5XX, ProbeStatus.TLS_ERROR, ProbeStatus.TIMEOUT, ProbeStatus.REFUSED)
nehaoss
left a comment
There was a problem hiding this comment.
Large PR (2603 additions across 16 files) — porting a network validator + Workday REST diagnostic + contributor guardrails. Comments:
-
Workday REST diagnostic standalone design: The rationale for placing this outside FlightCheck (chicken-and-egg OAuth problem) is well documented in the README. The decision to treat HTTP 400/422 as PASS for write endpoints is pragmatic — endpoint reachability is the goal, not business logic.
-
Network check (
etwork.py): At 407 lines this is a substantial new check. Please confirm it's covered by the 381-line test file (\ est_network.py) — ratio looks healthy but verify all branches are exercised. -
equired-endpoints.json\ config: Externalizing the endpoint list as JSON is a good separation of concerns. Makes it easy for customers to audit/extend without reading Python. -
*\ est_cli_lazy_auth.py* (85 additions): Good to test that the CLI doesn't eagerly authenticate when running checks that don't need it — performance and UX improvement.
-
Scope concern: 16 files across diagnostics, flightcheck checks, tests, fixtures, reference docs, and PR template changes. Consider whether this could be split into 2-3 smaller PRs for easier review (e.g., network validator separate from Workday REST diagnostic separate from contributor guardrails). If not, consider a more detailed PR description mapping each file to its purpose.
-
PR template changes overlap with PR #105: Both PRs modify .github/PULL_REQUEST_TEMPLATE.md. Whoever merges second will have a conflict — coordinate merge order.
Description
Port of 5 commits from
JohnQuinn-Dev/ess-preflight-validator(since 2026-05-01) into this repo's Python FlightCheck framework.Source commits → ported as:
9ed2055checks/network.py,checks/firewall_export.py,config/required-endpoints.json, lazy-auth gate incli.py, new "Vendor TCP/HTTPS reachability" tier-registry row5eb19bc+44ac58dscripts/diagnostics/test_workday_rest_endpoints.py(standalone Python OAuth diagnostic) +WD-REST-MANUALchecklist entry inchecks/workday.py+ tier-registry note70065b9src/reference/workday-sso-test-flow/with adapted README2ac16e9.github/PULL_REQUEST_TEMPLATE.md+CONTRIBUTING.md, translated to this repo's conventionsRelated issue
Refs the BL-014 backlog item from the source repo (no equivalent issue tracked here yet — happy to file one if maintainers want it for changelog purposes).
Type of change
Testing
New test modules:
tests/flightcheck/checks/test_network.py(15 tests covering the 6 failure modes + selection/config edge cases)tests/flightcheck/checks/test_firewall_export.py(7 tests, golden-file markdown render)tests/flightcheck/test_cli_lazy_auth.py(10 tests pinning per-scope auth requirement)Smoke-tested in a temp working dir with a minimal
.local/config.json:cli.py --export-firewall-requirements→ writesworkspace/flightcheck/firewall-requirements.md, no Microsoft auth attemptedcli.py --scope network→ runs withSkipping Microsoft auth (not required for --scope network)banner, executes probesdiagnostics/test_workday_rest_endpoints.py --help→ parses cleanlyFull E2E run of the Workday REST diagnostic requires a customer Workday tenant + an OAuth API Client, which is the documented design — see
scripts/diagnostics/README.md.Key design decisions
After the rubber-duck pass caught 10 issues in the initial plan, the final design:
tests/fixtures/cassettes/INDEX.md's "Workday WQL config-validation pattern" section ("doesn't escape the chicken-and-egg"). The source repo didn't solve this either — it accepts the customer doing the OAuth API Client registration as a prerequisite. Faithful port = same interactive UX, just in Python._requires_microsoft_auth(scope). Without it,--scope networkwould trigger an interactive MSAL prompt before any probe ran — defeating the shift-left value. Mirrors the existing PVA scope gate atcli.py:162(which is the precedent).TcpProber/HttpsProberprotocol, notresponses/respx. Those libraries only mock HTTP-level traffic and can't fakesocket.create_connectionrefusal. A protocol abstraction is the cleanest testable seam.requests(nothttpx).httpxis NOT insolutions/ess-maker-skills/scripts/requirements.txt— verified before coding.requestsis, so we stay on it.--listenflag activates a stdlibhttp.serverloopback only when--redirect-uriishttp://localhost...(advanced customers who registered an HTTP redirect URI).WD-REST-MANUALis conditional. Only fires when_workday_flowswas already detected — non-Workday customers see no noise.--include-piiopt-in.Pre-PR Checklist (REQUIRED)
main— branched from887dd41, no rebase neededpytest tests/ -qclean;cli.py --helpparsesworkspace/flightcheck/; write tests in the Workday REST diagnostic are opt-in via--include-write-testsnetworkscope wired intoSCOPE_MAPandFULL_SCOPE; tests undertests/flightcheck/checks/; newtest_cli_lazy_auth.pypins the scope-to-auth mappingtests/fixtures/cassettes/INDEX.md, newscripts/diagnostics/README.md, newsrc/reference/workday-sso-test-flow/README.md,cli.pymodule docstring scopes listChecklist
Notes for reviewer
cli.py; commit 2 is the largest by line count (mostly the diagnostic + its README); commit 3 is small docs-only.WD-REST-MANUALNotConfiguredcheckpoint will show up in any FlightCheck run scope that includes Workday — by design (it's a deployment checklist item). Let me know if you'd rather scope it tighter (e.g. only--scope publishing).CHANGELOG.md/VERSION.txthousekeeping from commit9ed2055is out of scope (this repo isn't versioned the same way; no top-level CHANGELOG). Happy to add one if maintainers want it.ESS-Validator-Session.ps1interactive integration-selection menu is replaced by the config-drivennetwork.integrations/network.servicenow_instancekeys in.local/config.json— fits this repo's non-interactive CLI model.