Skip to content

feat: auto-tiered sync, parquet sidecar cache, incremental extraction#127

Open
BeArchiTek wants to merge 5 commits into
mainfrom
feat/parquet-sidecar-cache
Open

feat: auto-tiered sync, parquet sidecar cache, incremental extraction#127
BeArchiTek wants to merge 5 commits into
mainfrom
feat/parquet-sidecar-cache

Conversation

@BeArchiTek
Copy link
Copy Markdown
Contributor

@BeArchiTek BeArchiTek commented May 21, 2026

  • New Features

    • Incremental extraction with cursor-based change tracking (TIMESTAMP, PAGE_TOKEN, INFRAHUB_DIFF tiers).
    • Auto-computed write-order tiers from schema mappings; order field is now optional.
    • Cached plan generation (diff) and apply operations for repeatable syncs.
    • Rowcount guardrails and parallel tier-by-tier syncing via --parallel flag.
    • New CLI flags: --concurrent-load, --full-extract, --allow-rowcount-drop, --continue-on-error, --run-id.
  • Documentation

    • Updated CLI reference, cache layout, and incremental extraction guides.
    • Simplified README; updated config references for auto-tiered execution.
  • Chores

    • Updated example configs to use auto-computed tiers.
    • Added .infrahub-sync-cache/ to .gitignore.

Three layered features on top of the recently-landed ty migration, plus a CLI defaults flip and a bench harness. One commit so the history is easy to review and revert as a unit.

What's in here

1. Auto-tiered sync order

  • infrahub_sync/dependency_graph.py: build_dependency_graph, compute_tiers (with optional-edge cycle breaking), flatten_tiers.
  • SyncConfig.compute_order() falls back to auto-tiered when order: is omitted.
  • infrahub-sync sync --parallel (now default on) runs tier-by-tier via Potenda.sync_in_tiers; warns and falls back to serial when order: is set explicitly.
  • All examples/*/config.yml have their order: lists commented out with an "uncomment to override" note.

2. Parquet sidecar cache + apply workflow

  • infrahub_sync/cache/ scaffold: atomic Parquet writes (PLAN_SCHEMA, ERRORS_SCHEMA, per-resource write_resource_side with _extract_ts / _source_id / _tombstone), JSON sidecars (cursors, rowcounts, run.json, schema-sub-hash), cross-process pipeline_lock, and a RowcountGuardrail that rejects >50% drops without --allow-rowcount-drop.
  • Potenda writes side A/B parquet snapshots after each load and serializes the diff into plan.parquet.
  • infrahub-sync diff wraps in pipeline_lock and persists run.json + plan.parquet.
  • New infrahub-sync apply --run-id <id> re-applies a cached plan; refuses on schema-subhash mismatch.

3. Concurrent loads + sync writes cache artifacts

  • Potenda loads source and destination concurrently by default (--concurrent-load). Disable for non-thread-safe custom adapters.
  • infrahub-sync sync (serial and --parallel) writes plan.parquet + run.json + baseline rowcounts on success.
  • New --continue-on-error logs and skips peer relationships whose identifiers are missing instead of aborting.

4. Incremental (changed-since) extraction

  • DiffSyncMixin adapter contract: cursor_tier_for, list_changed_since, list_existing_ids. Default tier NONE.
  • Potenda.load_one_side picks incremental vs full based on a schema-subhash gate, a RunCounterFile-driven periodic full-resync cadence (default every 10 runs, configurable via incremental.full_resync_every), and --full-extract.
  • Adapters wired:
    • NetBox / Nautobot: TIMESTAMP via endpoint.filter(last_updated__gte=…). Nautobot gracefully falls back to a full extract when an endpoint rejects the filter (predicate checks status_code == 400 and parses the response JSON, not the message wording).
    • Infrahub: TIMESTAMP via GraphQL node_metadata__updated_at__after.

5. CLI defaults flip

--parallel, --full-extract, and--concurrent-load default to True.
A stock infrahub-sync sync runs tier-by-tier on the auto-computed dep graph and re-extracts every resource each run.

The cursor-driven warm path is opt-in via --no-full-extract.

Bench numbers (nautobot-v2 demo dataset, interfaces dropped, ~14 kinds)

Scenario Cold Warm
baseline (serial, no concurrent) 463.8s 154.3s
--parallel + --concurrent-load 437.5s 132.3s
--no-full-extract (cursor-driven warm) 434.5s 6.3s

Reproducer: scripts/bench-clean-nautobot.sh rebuilds Infrahub between scenarios so each cold+warm pair sees a fresh destination.

Docs

  • docs/docs/guides/run.mdx — diff/apply/sync flag tables updated.
  • docs/docs/reference/cli.mdx — regenerated.
  • docs/docs/reference/cache-layout.mdx — new.
  • docs/docs/reference/incremental-extraction.mdx — new.
  • examples/nautobot-v2_to_infrahub/config.yml refreshed.

Test plan

  • CI: ruff + ty pass across the 3.10/3.11/3.12/3.13 matrix.
  • Local: uv run invoke lint exits 0.
  • Local: uv run pytest -o "addopts=" tests/ -q → 93 passed (13 known fails on tests/adapters/test_{netbox,nautobot}_incremental.py due to optional pynetbox/pynautobot not in the basic dev env).
  • Local: uv run infrahub-sync list --directory examples/ lists all projects.
  • Local: uv run infrahub-sync --help, diff --help, sync --help, apply --help exit 0.
  • Local bench reproduces the table above on a fresh Infrahub + the demo Nautobot.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added incremental extraction support with cursor-based change detection for improved sync performance
    • Implemented auto-tiered execution that automatically derives write order from schema relationships
    • Added parallel tier-by-tier sync orchestration with --parallel flag
    • Introduced per-run caching with Parquet snapshots for plan reuse and audit trails
    • Added --concurrent-load flag to speed up dual-side data loading
    • Added --full-extract / --no-full-extract flags to control extraction mode
    • Added rowcount guardrails (--allow-rowcount-drop) to prevent accidental data loss
    • Added --continue-on-error flag to tolerate missing peer identifiers
    • New apply command to replay cached sync plans
  • Documentation

    • Enhanced CLI reference with new command options and parameters
    • Added guide for incremental extraction configuration and behavior
    • Added cache layout documentation describing run state persistence
    • Updated configuration reference explaining auto-tiered execution
    • Updated example configurations to use auto-tiering
  • Improvements

    • Improved error messages for peer identifier issues
    • Added schema compatibility validation for cached plan application

Review Change Stack

Adds three layered features to `infrahub-sync` (auto-tiering →
parquet/sidecar cache → cursor-driven incremental) plus a CLI
defaults flip and a bench harness, all on a single commit toward main
now that `chore/migrate-to-ty` is merged.

## 1. Auto-tiered sync order

* `infrahub_sync/dependency_graph.py`: `build_dependency_graph`,
  `compute_tiers` (with optional-edge cycle breaking), `flatten_tiers`.
* `SyncConfig.compute_order()` falls back to auto-tiered when `order:`
  is omitted from `config.yml`.
* `infrahub-sync sync --parallel` (now default on) runs tier-by-tier
  via `Potenda.sync_in_tiers`; warns and falls back to serial when
  `order:` is set explicitly.
* All `examples/*/config.yml` have their `order:` lists commented out
  with an "uncomment to override" note — auto-tier is the default path.

## 2. Parquet sidecar cache + apply workflow

* `infrahub_sync/cache/` scaffold: run_id allocation, atomic Parquet
  writes (`PLAN_SCHEMA`, `ERRORS_SCHEMA`, per-resource
  `write_resource_side` with `_extract_ts` / `_source_id` /
  `_tombstone`), JSON sidecars (cursors, rowcounts, run.json,
  schema-sub-hash), cross-process `pipeline_lock`, and a
  `RowcountGuardrail` that rejects >50 % drops without
  `--allow-rowcount-drop`.
* `Potenda` writes side A/B parquet snapshots after each load and
  serializes the diff into `plan.parquet`.
* `infrahub-sync diff` wraps in `pipeline_lock` and persists run.json
  + plan.parquet.
* New `infrahub-sync apply --run-id <id>` re-applies a cached plan;
  refuses on schema-subhash mismatch.

## 3. Concurrent loads + sync writes cache artifacts

* `Potenda` loads source and destination concurrently by default
  (`--concurrent-load`). Disable for non-thread-safe custom adapters.
* `infrahub-sync sync` (serial and `--parallel`) writes plan.parquet
  + run.json + baseline rowcounts on success.
* New `--continue-on-error` logs and skips peer relationships whose
  identifiers are missing instead of aborting.

## 4. Incremental (changed-since) extraction

* `DiffSyncMixin` adapter contract: `cursor_tier_for`,
  `list_changed_since`, `list_existing_ids`. Default tier is NONE.
* `Potenda.load_one_side` chooses incremental vs full based on a
  schema-subhash gate, a `RunCounterFile`-driven periodic full-resync
  cadence (default every 10 runs, configurable via
  `incremental.full_resync_every`), and the `--full-extract` CLI flag.
* `infrahub_sync/cache/incremental.py`: `previous_successful_run_dir`,
  `should_use_incremental`, `hydrate_from_parquet` (replays prior
  side snapshots into the adapter store), and per-side cursor sidecars
  (`load_cursors` / `persist_cursors`).
* Adapters wired:
  - NetBox / Nautobot: TIMESTAMP via pynetbox/pynautobot
    `endpoint.filter(last_updated__gte=…)`. Nautobot gracefully falls
    back to a full extract when an endpoint rejects the filter — the
    fallback predicate checks `status_code == 400` and parses the
    response JSON, not the message wording.
  - Infrahub: TIMESTAMP via GraphQL
    `node_metadata__updated_at__after`.
* Snapshots include identifier fields (DiffSync needs them on
  hydrate); destination re-snapshots post-sync.

## 5. CLI defaults flip

`--parallel` and `--full-extract` now default to True; `--concurrent-load`
already did. A stock `infrahub-sync sync` runs tier-by-tier on the
auto-computed dep graph and re-extracts every resource each run; the
cursor-driven warm path is opt-in via `--no-full-extract`.

## 6. Bench

`scripts/bench-clean-nautobot.sh` rebuilds Infrahub between scenarios so
each cold+warm pair sees a fresh destination. Recent results on the
nautobot-v2 demo dataset (interfaces dropped, ~14 kinds):

| Scenario                                    |  Cold |   Warm |
| ------------------------------------------- | -----: | -----: |
| baseline (serial, no concurrent)            | 463.8s | 154.3s |
| --parallel + --concurrent-load              | 437.5s | 132.3s |
| --no-full-extract (cursor-driven warm)      | 434.5s |   6.3s |

`tasks/bench.py` provides an `invoke bench.run` task that compares the
auto-tier × concurrent-load matrix; pass `--scenarios` to subset it.

## Docs

* `docs/docs/guides/run.mdx` — diff/apply/sync flag tables updated.
* `docs/docs/reference/cli.mdx` — regenerated from Typer.
* `docs/docs/reference/cache-layout.mdx` — new: on-disk layout +
  apply / `--allow-rowcount-drop` semantics.
* `docs/docs/reference/incremental-extraction.mdx` — new: activation
  conditions, supported backends, soft-delete cadence,
  `--no-full-extract` opt-in.
* `examples/nautobot-v2_to_infrahub/config.yml` refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@BeArchiTek BeArchiTek added type/breaking-change Breaking Changes type/feature New feature or request labels May 21, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 763f205e-b416-4d74-9e92-676a1fc2f4cc

📥 Commits

Reviewing files that changed from the base of the PR and between f8734bc and b25942a.

📒 Files selected for processing (14)
  • .gitignore
  • examples/infrahub_to_peering-manager/config.yml
  • examples/peering-manager_to_infrahub/config.yml
  • examples/peeringdb_to_infrahub/config.yml
  • infrahub_sync/__init__.py
  • infrahub_sync/adapters/nautobot.py
  • infrahub_sync/adapters/netbox.py
  • infrahub_sync/cache/paths.py
  • infrahub_sync/cache/sidecars.py
  • infrahub_sync/cli.py
  • infrahub_sync/dependency_graph.py
  • infrahub_sync/potenda/__init__.py
  • infrahub_sync/utils.py
  • tests/cache/test_locks.py
✅ Files skipped from review due to trivial changes (2)
  • examples/infrahub_to_peering-manager/config.yml
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/cache/test_locks.py
  • infrahub_sync/cache/paths.py
  • infrahub_sync/dependency_graph.py
  • infrahub_sync/cache/sidecars.py
  • infrahub_sync/adapters/netbox.py
  • infrahub_sync/adapters/nautobot.py
  • infrahub_sync/cli.py

Walkthrough

Adds incremental extraction with cursor tiers, Parquet-backed cache (snapshots, plan, sidecars), auto-tiered write-order computation, and CLI run locking/status with apply-from-plan support. Updates adapters (Infrahub, NetBox, Nautobot) for incremental methods and error handling. Extends Potenda to orchestrate cached loads, concurrent loading, plan serialization/application, rowcount guardrails, and tiered sync. Documentation, examples, sidebar, scripts, tasks, and comprehensive tests are included.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 21, 2026

Deploying infrahub-sync with  Cloudflare Pages  Cloudflare Pages

Latest commit: b25942a
Status: ✅  Deploy successful!
Preview URL: https://da409ad2.infrahub-sync.pages.dev
Branch Preview URL: https://feat-parquet-sidecar-cache.infrahub-sync.pages.dev

View logs

`tests/adapters/test_nautobot_incremental.py:80` imports `pynautobot`
inside the fallback regression test. pynautobot is an optional extra,
not in `--extra dev`, so ty's import resolution fails in CI even
though the test is gated and skipped at runtime when the lib is missing.
Add the same `# ty: ignore[unresolved-import]  # optional dep, see
pyproject extras` annotation already used on the adapter file's top
import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_potenda_parallel.py (1)

85-93: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strengthen tier-boundary assertion to require full tier-0 completion.

Line 90 currently only proves that one Tag/Role happened before Device. The docstring/comment says no Device call should occur before the previous tier is done. Assert both Tag and Role were seen before any Device.

Suggested assertion update
-    seen_tier0 = False
+    seen_tier0: set[str] = set()
     for _, kind in by_time:
         if kind == "Device":
-            assert seen_tier0
+            assert seen_tier0 == {"Tag", "Role"}
         if kind in {"Tag", "Role"}:
-            seen_tier0 = True
+            seen_tier0.add(kind)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_potenda_parallel.py` around lines 85 - 93, The current loop in
tests/test_potenda_parallel.py only checks that some Tag/Role occurred before a
Device by using seen_tier0; update the assertion to require that both a Tag and
a Role have completed before any Device call. Replace the boolean seen_tier0
with two flags (e.g., seen_tag and seen_role) and in the for-loop over by_time
(sorted dst.calls) set these flags when kind == "Tag" or kind == "Role", and
change the Device check to assert seen_tag and seen_role are True before
allowing a Device entry.
🧹 Nitpick comments (4)
tests/cache/test_sidecars.py (1)

37-46: ⚡ Quick win

Assert all persisted run metadata fields you set.

This test writes mode and summary["diff_rows"] but doesn’t verify them after reload, so a partial persistence regression could pass unnoticed.

Proposed test delta
 def test_run_file_records_status(tmp_path: Path) -> None:
     f = RunFile.load_or_default(tmp_path / "run.json")
     f.status = "dry-run"
     f.mode = "diff"
     f.summary = {"resources": 17, "diff_rows": 42}
     f.save()
     g = RunFile.load_or_default(tmp_path / "run.json")
     assert g.status == "dry-run"
+    assert g.mode == "diff"
     assert g.summary["resources"] == 17
+    assert g.summary["diff_rows"] == 42
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cache/test_sidecars.py` around lines 37 - 46, The test
test_run_file_records_status sets RunFile fields status, mode, and summary
(including summary["diff_rows"]) but only asserts status and
summary["resources"]; update the test to assert that mode was persisted and that
summary["diff_rows"] equals 42 after reloading via RunFile.load_or_default so
mode and the nested summary key are verified alongside status and resources.
tests/adapters/test_nautobot_incremental.py (1)

30-31: ⚡ Quick win

Scope the _create_nautobot_client patch to avoid cross-test leakage.

Direct class-level assignment in _make_adapter can leak globally across tests. Prefer patch.object(...) (or pytest monkeypatch) scoped to adapter construction.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/adapters/test_nautobot_incremental.py` around lines 30 - 31, The test
currently assigns NautobotAdapter._create_nautobot_client at class level which
can leak the stub across tests; instead, scope the patch to adapter construction
by using patch.object(NautobotAdapter, "_create_nautobot_client", ...) or
pytest's monkeypatch.setattr inside the test or the _make_adapter helper so the
MagicMock is applied only for the duration of creating the NautobotAdapter
instance (reference NautobotAdapter._create_nautobot_client and the
_make_adapter/_make_adapter-like helper used to return the adapter).
infrahub_sync/cache/sidecars.py (1)

46-103: ⚡ Quick win

Add concise docstrings to the remaining public sidecar classes.

RowcountsFile, RunFile, and SchemaHashFile are public and currently undocumented.

📝 Suggested update
 `@dataclass`
 class RowcountsFile:
+    """Stores per-resource row counts for a run."""
     path: Path
     counts: dict[str, int] = field(default_factory=dict)

@@
 `@dataclass`
 class RunFile:
+    """Stores run lifecycle metadata (status/mode/summary/timestamps)."""
     path: Path
@@
 `@dataclass`
 class SchemaHashFile:
+    """Stores the schema sub-hash used to validate cache compatibility."""
     path: Path
     value: str = ""

As per coding guidelines, infrahub_sync/**/*.py: Public functions and classes require concise docstrings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrahub_sync/cache/sidecars.py` around lines 46 - 103, Add concise
docstrings to the public sidecar classes RowcountsFile, RunFile, and
SchemaHashFile by placing a short one- or two-sentence description immediately
under each class definition that explains the purpose of the class and its
important attributes/methods (e.g., RowcountsFile: stores resource counts and
provides load_or_default, set, get, save; RunFile: tracks run metadata and uses
KEYS with load_or_default and save; SchemaHashFile: stores schema hash with load
and save). Keep each docstring brief, in imperative/descriptive form, and
mention any return semantics where relevant (e.g., load_or_default may return a
default instance when the path does not exist).
infrahub_sync/potenda/__init__.py (1)

265-274: 💤 Low value

Second future's exception may be swallowed if both fail.

When using wait([src_fut, dst_fut], return_when=FIRST_EXCEPTION), if both futures fail, the loop at lines 273-274 will re-raise the first failure from src_fut.result() but never check dst_fut.result(). This is typically fine since we want to surface at least one error, but the second error's details are lost.

This is acceptable behavior for a fail-fast strategy, but consider logging the second exception if it exists, so operators can see all failures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrahub_sync/potenda/__init__.py` around lines 265 - 274, The current use of
ThreadPoolExecutor with wait(..., return_when=FIRST_EXCEPTION) may re-raise only
the first future's exception when iterating over src_fut.result() and
dst_fut.result(), losing the second failure's details; update the block that
submits self.source_load and self.destination_load (src_fut and dst_fut) to call
.result() for each future inside a try/except that re-raises the first exception
but also captures and logs (using the module logger) any secondary exception
from the other future so both errors are preserved; keep the fail-fast behavior
(FIRST_EXCEPTION) but ensure destination_load/source_load exceptions are logged
if they occur after the first raised error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/docs/reference/config.mdx`:
- Around line 39-41: The docs currently imply --parallel is opt-in; update the
infrahub-sync sync docs to state that --parallel is enabled by default and
describe how to disable it (mention the flag name --no-parallel or the CLI
equivalent), and clarify that the engine processes one tier at a time per
destination by narrowing top_level but runs multiple tiers in parallel across
destinations when enabled; update the sentence starting with "Set `--parallel`
on `infrahub-sync sync`" to reflect default-on behavior and include the disable
example and short note about the default semantics.

In `@examples/infrahub_to_peering-manager/config.yml`:
- Around line 16-25: The example config contains an embedded API token (the
literal token on the config line referenced in the comment) that must be
redacted; replace the hard-coded token with a placeholder or environment
variable reference (e.g. INFRAHUB_API_TOKEN or a clearly marked
<REDACTED_TOKEN>) and update any corresponding example usage/comments to
instruct using an env var, and scan the rest of examples for any other
credential-like literals to remove or replace similarly.

In `@examples/netbox_to_infrahub/config.yml`:
- Line 25: Replace the real-looking API token value for the token key in the
example config with a redacted placeholder (e.g. REPLACE_WITH_API_TOKEN or
<REDACTED>) so no credential-like secrets are shipped in examples; locate the
token: entry in the examples/netbox_to_infrahub/config.yml and update its value
accordingly.

In `@examples/peeringdb_to_infrahub/config.yml`:
- Around line 24-27: Redact the sensitive token value in the example config
(replace the literal token string under the token key with a placeholder like
"<REDACTED_TOKEN>" or an instruction to set via env) and fix the typo in the
config key from dyffsync_flags to diffsync_flags so the example uses the correct
setting name (ensure any values under diffsync_flags remain valid YAML/list
format).

In `@infrahub_sync/adapters/netbox.py`:
- Around line 69-75: The _resolve_endpoint function currently lets a bare
AttributeError bubble up for bad mapping strings; wrap the loop that calls
getattr in a try/except AttributeError and raise a specific config validation
error (e.g., ConfigValidationError or ConfigError — create it if it doesn't
exist) with a clear message that includes the offending mapping value and chain
the original exception (raise ... from e) so callers get both context and the
underlying error; ensure you only catch AttributeError (don't use a broad except
Exception:).

In `@infrahub_sync/cache/__init__.py`:
- Around line 21-31: compute_schema_subhash() currently builds payload only from
schema_mapping names/identifiers/field names, which omits mapping-critical
inputs; update the payload construction (the payload variable used in
compute_schema_subhash) to serialize full mapping semantics for each entry in
config.schema_mapping — include mapping definitions, mapping type/flags (e.g.,
field mapping vs reference vs static), per-field mapping details, filters,
transforms and any other mapping-specific options — ensure deterministic
ordering (sort lists/keys) before hashing so changes to mapping, filters or
transforms will change the sub-hash and prevent stale plan/cursor reuse.

In `@infrahub_sync/cache/cursors.py`:
- Around line 20-35: Add concise class docstrings to the public classes
CursorTier and CursorState: for CursorTier, document that it is an IntEnum
representing cache cursor tiers and briefly list/describe the enum members
(NONE, PAGE_TOKEN, TIMESTAMP, INFRAHUB_DIFF); for CursorState, document that it
is a frozen dataclass holding a CursorTier and an optional value, describe the
meaning of the fields (tier and value) and note the __post_init__ constraint
that non-NONE tiers require a non-None value. Place the docstrings immediately
under each class declaration.

In `@infrahub_sync/cache/guardrails.py`:
- Around line 22-29: Add concise docstrings to the public class
RowcountGuardrail and its public method check describing their purpose,
parameters, and behavior: document that RowcountGuardrail tracks previous row
counts, what drop_threshold and allow_drop control, and that triggered records
store resources that exceeded the threshold; for check(resource, *, current)
document parameters (resource: str, current: int), what constitutes a trigger
(percentage drop > drop_threshold), side effects (appends to triggered, raises
or blocks drops depending on allow_drop), and return type (None). Place the
docstrings immediately above the class definition and the check method using
standard triple-quote format.

In `@infrahub_sync/cache/paths.py`:
- Around line 18-21: Validate that sync_name and run_id are single safe path
segments before joining: use pathlib.PurePath(sync_name).parts (and same for
run_id) and reject values where len(parts) != 1 or parts[0] in (".","..") or
os.path.isabs(sync_name) (and run_id). On validation failure raise ValueError
(or a clear custom exception) instead of joining; then join only the validated
segment to the base (INFRAHUB_SYNC_CACHE_DIR or
Path.cwd()/".infrahub-sync-cache"). Apply the same checks where run_id is joined
(the other block mentioned around lines 34-36).

In `@infrahub_sync/cli.py`:
- Around line 260-265: When catching ValueError from ptd.load_both_sides() in
the serial path, set run_file.status = "failed" and persist the run file before
calling print_error_and_abort so run.json no longer incorrectly shows "running";
i.e., in the except block assign run_file.status = "failed" and call the run
file's save/persist method (e.g., run_file.save() or run_file.write()) then call
print_error_and_abort(str(exc)).

In `@tests/adapters/test_nautobot_incremental.py`:
- Line 54: Remove the unused type-ignore annotations on the assignment lines
that set adapter.client.dcim.devices = fake_endpoint (and the equivalent
assignments with fake_endpoint elsewhere in the same test file); delete the
trailing "# ty: ignore[unresolved-attribute]" comments from those statements so
the linter no longer reports unused suppressions, then run the test/lint suite
to confirm nothing else breaks.

In `@tests/cache/test_concurrent_load.py`:
- Around line 50-53: The current check uses only src_start < dst_end which can
be true for sequential runs; instead compute both start and end timestamps for
each adapter (use next(... for label == "start") and next(... for label ==
"end") on src.events and dst.events to get src_start, src_end, dst_start,
dst_end) and assert interval overlap by asserting src_start < dst_end and
dst_start < src_end so the time ranges actually intersect.

In `@tests/cache/test_locks.py`:
- Around line 23-37: The test test_pipeline_lock_excludes_concurrent_run is
using time.sleep to wait for the child process to acquire the lock, which is
flaky; change it to coordinate with the child via a multiprocessing.Event
(create an Event in the parent, pass it into _hold_lock, have _hold_lock set the
Event once it holds the lock), wait on that Event with a bounded timeout before
asserting pipeline_lock("p1", timeout=0.05) raises Timeout, and ensure the child
process is always cleaned up in a finally block (join with timeout and terminate
if still alive) so teardown is bounded.

In `@tests/test_dependency_graph.py`:
- Around line 106-110: The test currently asserts completeness against cfg.order
(loop over cfg.order), which is skipped when order is omitted; change the
assertion to use the schema-mapping names instead: compute the expected name set
from cfg.schema_mapping (e.g., its keys) and then assert each name from that set
appears in flat = set(flatten_tiers(tiers)), replacing the loop that references
cfg.order so the test catches tiering regressions even when order is absent.

---

Outside diff comments:
In `@tests/test_potenda_parallel.py`:
- Around line 85-93: The current loop in tests/test_potenda_parallel.py only
checks that some Tag/Role occurred before a Device by using seen_tier0; update
the assertion to require that both a Tag and a Role have completed before any
Device call. Replace the boolean seen_tier0 with two flags (e.g., seen_tag and
seen_role) and in the for-loop over by_time (sorted dst.calls) set these flags
when kind == "Tag" or kind == "Role", and change the Device check to assert
seen_tag and seen_role are True before allowing a Device entry.

---

Nitpick comments:
In `@infrahub_sync/cache/sidecars.py`:
- Around line 46-103: Add concise docstrings to the public sidecar classes
RowcountsFile, RunFile, and SchemaHashFile by placing a short one- or
two-sentence description immediately under each class definition that explains
the purpose of the class and its important attributes/methods (e.g.,
RowcountsFile: stores resource counts and provides load_or_default, set, get,
save; RunFile: tracks run metadata and uses KEYS with load_or_default and save;
SchemaHashFile: stores schema hash with load and save). Keep each docstring
brief, in imperative/descriptive form, and mention any return semantics where
relevant (e.g., load_or_default may return a default instance when the path does
not exist).

In `@infrahub_sync/potenda/__init__.py`:
- Around line 265-274: The current use of ThreadPoolExecutor with wait(...,
return_when=FIRST_EXCEPTION) may re-raise only the first future's exception when
iterating over src_fut.result() and dst_fut.result(), losing the second
failure's details; update the block that submits self.source_load and
self.destination_load (src_fut and dst_fut) to call .result() for each future
inside a try/except that re-raises the first exception but also captures and
logs (using the module logger) any secondary exception from the other future so
both errors are preserved; keep the fail-fast behavior (FIRST_EXCEPTION) but
ensure destination_load/source_load exceptions are logged if they occur after
the first raised error.

In `@tests/adapters/test_nautobot_incremental.py`:
- Around line 30-31: The test currently assigns
NautobotAdapter._create_nautobot_client at class level which can leak the stub
across tests; instead, scope the patch to adapter construction by using
patch.object(NautobotAdapter, "_create_nautobot_client", ...) or pytest's
monkeypatch.setattr inside the test or the _make_adapter helper so the MagicMock
is applied only for the duration of creating the NautobotAdapter instance
(reference NautobotAdapter._create_nautobot_client and the
_make_adapter/_make_adapter-like helper used to return the adapter).

In `@tests/cache/test_sidecars.py`:
- Around line 37-46: The test test_run_file_records_status sets RunFile fields
status, mode, and summary (including summary["diff_rows"]) but only asserts
status and summary["resources"]; update the test to assert that mode was
persisted and that summary["diff_rows"] equals 42 after reloading via
RunFile.load_or_default so mode and the nested summary key are verified
alongside status and resources.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8ce9723b-ed4c-4e2e-8ff6-35672a1ed9d2

📥 Commits

Reviewing files that changed from the base of the PR and between 4b70958 and c8b4aec.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (73)
  • .gitignore
  • .vale/styles/spelling-exceptions.txt
  • docs/docs/guides/run.mdx
  • docs/docs/readme.mdx
  • docs/docs/reference/cache-layout.mdx
  • docs/docs/reference/cli.mdx
  • docs/docs/reference/config.mdx
  • docs/docs/reference/incremental-extraction.mdx
  • docs/sidebars.ts
  • examples/aci_to_infrahub/config.yml
  • examples/custom_adapter/config.yml
  • examples/device42_to_infrahub/config.yml
  • examples/infrahub_to_peering-manager/config.yml
  • examples/ipfabric_to_infrahub/config.yml
  • examples/librenms_to_infrahub/config.yml
  • examples/nautobot-v1_to_infrahub/config.yml
  • examples/nautobot-v2_to_infrahub/config.yml
  • examples/netbox_to_infrahub/config.yml
  • examples/observium_to_infrahub/config.yml
  • examples/peering-manager_to_infrahub/config.yml
  • examples/peeringdb_to_infrahub/config.yml
  • examples/prometheus_to_infrahub (node_exporter)/config.yml
  • examples/slurpit_to_infrahub/config.yml
  • infrahub_sync/__init__.py
  • infrahub_sync/adapters/infrahub.py
  • infrahub_sync/adapters/nautobot.py
  • infrahub_sync/adapters/netbox.py
  • infrahub_sync/cache/__init__.py
  • infrahub_sync/cache/cursors.py
  • infrahub_sync/cache/guardrails.py
  • infrahub_sync/cache/incremental.py
  • infrahub_sync/cache/locks.py
  • infrahub_sync/cache/parquet_io.py
  • infrahub_sync/cache/paths.py
  • infrahub_sync/cache/sidecars.py
  • infrahub_sync/cli.py
  • infrahub_sync/dependency_graph.py
  • infrahub_sync/potenda/__init__.py
  • infrahub_sync/utils.py
  • pyproject.toml
  • scripts/bench-clean-nautobot.sh
  • scripts/bench-incremental-only.sh
  • tasks/__init__.py
  • tasks/bench.py
  • tests/adapters/test_infrahub_incremental.py
  • tests/adapters/test_infrahub_peer_identifier.py
  • tests/adapters/test_nautobot_incremental.py
  • tests/adapters/test_netbox_incremental.py
  • tests/cache/__init__.py
  • tests/cache/test_apply_plan.py
  • tests/cache/test_cli_sync_cache.py
  • tests/cache/test_concurrent_load.py
  • tests/cache/test_cursors.py
  • tests/cache/test_guardrails.py
  • tests/cache/test_incremental_engine.py
  • tests/cache/test_incremental_helpers.py
  • tests/cache/test_locks.py
  • tests/cache/test_parquet_io.py
  • tests/cache/test_paths.py
  • tests/cache/test_plan_serialization.py
  • tests/cache/test_potenda_cache_hook.py
  • tests/cache/test_run_counter.py
  • tests/cache/test_schema_subhash_persist.py
  • tests/cache/test_sidecars.py
  • tests/cache/test_sync_cache_flow.py
  • tests/test_cli_full_extract.py
  • tests/test_cli_parallel.py
  • tests/test_dependency_graph.py
  • tests/test_diffsync_mixin_contract.py
  • tests/test_get_potenda_top_level.py
  • tests/test_potenda_parallel.py
  • tests/test_potenda_tiers.py
  • tests/test_sync_config_order.py
💤 Files with no reviewable changes (1)
  • .vale/styles/spelling-exceptions.txt

Comment thread docs/docs/reference/config.mdx Outdated
Comment thread examples/infrahub_to_peering-manager/config.yml
Comment thread examples/netbox_to_infrahub/config.yml Outdated
Comment thread examples/peeringdb_to_infrahub/config.yml
Comment thread infrahub_sync/adapters/netbox.py
Comment thread infrahub_sync/cli.py
Comment thread tests/adapters/test_nautobot_incremental.py
Comment thread tests/cache/test_concurrent_load.py Outdated
Comment thread tests/cache/test_locks.py
Comment thread tests/test_dependency_graph.py Outdated
Verified each of the 14 inline comments. Fixes (11) + skipped (3, with
reasoning below):

## Correctness / security

- `infrahub_sync/cache/__init__.py`: `compute_schema_subhash` now hashes
  the full mapping semantics — per-resource `mapping`, per-field
  `mapping`/`reference`/`static`, plus `filters` and `transforms`. The
  previous payload only captured names + identifiers, so editing a
  filter or transform would not bust the cache and a stale plan could
  be reused.
- `infrahub_sync/cache/paths.py`: validate `sync_name` and `run_id` are
  single relative path segments (reject absolute paths and `..`).
  Closes a path-traversal risk in `infrahub-sync apply --run-id <id>`
  where the operator-supplied id is joined into a filesystem path.
- `infrahub_sync/cli.py`: serial sync path now sets
  `run_file.status = "failed"` and saves before calling
  `print_error_and_abort()` when `ptd.load_both_sides()` raises. The
  parallel path already did this; the serial fork was leaking
  `status: "running"` in run.json after a failed load.

## Robustness

- `infrahub_sync/adapters/{netbox,nautobot}.py`: `_resolve_endpoint`
  now wraps the `getattr` walk so a bad `schema_mapping.mapping`
  string surfaces as `ValueError("Invalid <vendor> mapping path …")`
  instead of a bare `AttributeError`.
- `examples/netbox_to_infrahub/config.yml`: revert the token to the
  placeholder — my squash inadvertently restored an older real-looking
  token. The placeholder matches main.

## Tests

- `tests/cache/test_concurrent_load.py`: the original
  `assert src_start < dst_end` was trivially true for sequential
  execution. Tightened to `assert max(starts) < min(ends)` so the
  test actually proves overlap.
- `tests/test_dependency_graph.py`: the completeness assertion looped
  over `cfg.order`, which is empty when the example opts into
  auto-tiering — silently a no-op. Replaced with a loop over
  `cfg.schema_mapping` kinds so tiering regressions get caught.

## Docs / style

- `docs/docs/reference/config.mdx`: document `--parallel` as on by
  default and mention `--no-parallel`.
- `infrahub_sync/cache/cursors.py`: add concise class docstrings on
  `CursorTier` and `CursorState`.
- `infrahub_sync/cache/guardrails.py`: add docstrings on
  `RowcountGuardrail` and `.check()`.

## Skipped

- Pre-existing token at `examples/infrahub_to_peering-manager/config.yml:14`
  and `examples/peeringdb_to_infrahub/config.yml:12`, plus the
  `dyffsync_flags` typo at `peeringdb_to_infrahub/config.yml:22`: all
  pre-existing on main, outside this PR's diff. Worth fixing in a
  dedicated redaction sweep, not bundled here.
- `tests/cache/test_locks.py:37` sleep-based synchronization: the
  flake hypothesis is real but the test passes consistently and the
  multiprocessing.Event refactor adds complexity without an observed
  failure to justify it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
infrahub_sync/adapters/netbox.py (1)

168-171: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

NetBox model_loader lacks ValidationError handling present in NautobotAdapter.

The Nautobot adapter's model_loader handles ValidationError with continue_on_error, but this NetBox implementation does not. This inconsistency means --continue-on-error won't work the same way for NetBox sources.

Suggested fix to add consistent error handling
         # Create model instances after filtering and transforming
         for data in self._records_to_diffsync(element=element, model=model, raw_records=raw_records):
-            item = model(**data)
-            self.add(item)
+            continue_on_error = getattr(self, "continue_on_error", False)
+            try:
+                item = model(**data)
+            except ValidationError as exc:
+                if not continue_on_error:
+                    raise
+                logger.warning(
+                    "Skipping %s[%s]: cannot build DiffSync model "
+                    "(likely a required peer was skipped earlier). Pydantic errors: %s",
+                    model_name,
+                    data.get("local_id"),
+                    exc.errors(include_url=False),
+                )
+                continue
+            self.add(item)

Also add the import at the top:

from pydantic import ValidationError
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrahub_sync/adapters/netbox.py` around lines 168 - 171, The NetBox
adapter's model_loader loop currently constructs model(**data) and
self.add(item) without catching pydantic ValidationError like the
NautobotAdapter does; update model_loader to import pydantic.ValidationError and
wrap the model(**data)/self.add(item) in a try/except that catches
ValidationError and, when continue_on_error is true, logs the error and
continues, otherwise re-raises the exception, mirroring the behavior implemented
in NautobotAdapter's model_loader.
🧹 Nitpick comments (1)
examples/netbox_to_infrahub/config.yml (1)

25-25: 💤 Low value

Consider using a more explicit placeholder for the API token.

While the all-a string is not a real token, it still resembles a valid NetBox API token format and may trigger secret scanners. A more explicit placeholder would be clearer for users.

Suggested change
-    token: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
+    token: "${NETBOX_TOKEN}"  # or: "<REDACTED_NETBOX_TOKEN>"

As per coding guidelines: examples/**/*.{yaml,yml}: Example configurations must be minimal, accurate, and redacted of sensitive information.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/netbox_to_infrahub/config.yml` at line 25, The example uses a
realistic-looking API token string for the token key which can trigger secret
scanners; replace the current value for the token field with an explicit,
clearly redacted placeholder (e.g., "REDACTED_TOKEN" or "<NETBOX_API_TOKEN>") so
the token key remains visible but no plausible secret is present and update any
nearby comment to indicate this is a placeholder for the NetBox API token.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@infrahub_sync/adapters/netbox.py`:
- Around line 168-171: The NetBox adapter's model_loader loop currently
constructs model(**data) and self.add(item) without catching pydantic
ValidationError like the NautobotAdapter does; update model_loader to import
pydantic.ValidationError and wrap the model(**data)/self.add(item) in a
try/except that catches ValidationError and, when continue_on_error is true,
logs the error and continues, otherwise re-raises the exception, mirroring the
behavior implemented in NautobotAdapter's model_loader.

---

Nitpick comments:
In `@examples/netbox_to_infrahub/config.yml`:
- Line 25: The example uses a realistic-looking API token string for the token
key which can trigger secret scanners; replace the current value for the token
field with an explicit, clearly redacted placeholder (e.g., "REDACTED_TOKEN" or
"<NETBOX_API_TOKEN>") so the token key remains visible but no plausible secret
is present and update any nearby comment to indicate this is a placeholder for
the NetBox API token.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3589ad84-c977-4812-b7e9-89778352a758

📥 Commits

Reviewing files that changed from the base of the PR and between c8b4aec and f8734bc.

📒 Files selected for processing (12)
  • docs/docs/reference/config.mdx
  • examples/netbox_to_infrahub/config.yml
  • infrahub_sync/adapters/nautobot.py
  • infrahub_sync/adapters/netbox.py
  • infrahub_sync/cache/__init__.py
  • infrahub_sync/cache/cursors.py
  • infrahub_sync/cache/guardrails.py
  • infrahub_sync/cache/paths.py
  • infrahub_sync/cli.py
  • tests/adapters/test_nautobot_incremental.py
  • tests/cache/test_concurrent_load.py
  • tests/test_dependency_graph.py
✅ Files skipped from review due to trivial changes (1)
  • docs/docs/reference/config.mdx

@BeArchiTek BeArchiTek requested a review from minitriga May 28, 2026 13:53
Copy link
Copy Markdown
Contributor

@minitriga minitriga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a high-quality PR with a well-considered architecture. The bench numbers are compelling. A few issues worth addressing before merging — see inline comments. The main asks: fix the double filter_records call in netbox.py, collapse the two compute_tiers calls, guard the empty-snapshot hydration case, replace assert type-narrowing with explicit if/raise, and confirm the .vale removals are intentional.

Comment thread infrahub_sync/adapters/netbox.py Outdated
logger.info("%s: Loading %d/%d %s", self.type, len(filtered_objs), total, resource_name)
# Transform records
transformed_objs = model.transform_records(records=filtered_objs, schema_mapping=element)
filtered_count = len(model.filter_records(records=raw_records, schema_mapping=element))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double filter_records call. filter_records is called here to derive a log count, then _records_to_diffsync (line 169) calls it a second time internally via the same filter_records → transform_records pipeline. Every record gets filtered twice per load.

Consider computing the filtered list once and passing it to _records_to_diffsync (or adding an optional pre-filtered path), or simply omitting the log count here and relying on the count logged inside _records_to_diffsync.

as specified in the schema mapping, and loads the processed data into the adapter.
"""
# Retrieve schema mapping for this model
for element in self.config.schema_mapping:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asymmetric logging vs NetBox. The original model_loader logged Loading X/Y (filtered/total); the NetBox path still does this. Nautobot's refactored path only logs the total (or nothing for the source path). Minor inconsistency but can confuse operators comparing adapter output during a real sync run.

Comment thread infrahub_sync/utils.py Outdated
if not explicit_order:
from infrahub_sync.dependency_graph import compute_tiers

tiers, _dropped = compute_tiers(sync_instance.schema_mapping)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute_tiers called twice on the hot path. compute_order() (called on line 238) internally calls compute_tiers again when order is empty. This PR therefore runs two topological sorts of the same graph on every invocation.

Consider having compute_order() return the tiers alongside the flat list, or passing the already-computed tiers result from here into a leaner version of compute_order that skips the sort.

{
"_extract_ts": pa.array([], type=pa.timestamp("ns", tz="UTC")),
"_source_id": pa.array([], type=pa.string()),
"_tombstone": pa.array([], type=pa.bool_()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty snapshot loses resource columns. When rows is empty, the written Parquet file has only the three internal columns (_extract_ts, _source_id, _tombstone). When hydrate_from_parquet later reads this file, cols = [c for c in table.column_names if c not in SNAPSHOT_INTERNAL_COLUMNS] will be [], so every add_row call receives {} — pydantic validation will fail on any required identifier field.

This matters concretely when an adapter returns zero destination records on a first diff (common for a fresh Infrahub), then a subsequent warm run tries to hydrate side B from that empty snapshot. Suggest writing an empty table with the correct per-resource schema, or skipping snapshot hydration when the file has zero rows and falling back to model_loader.

Comment thread infrahub_sync/utils.py Outdated
concurrent_load=concurrent_load,
)
ptd.run_id = rid # expose for CLI logging
ptd.cache_root = rdir.parent # .infrahub-sync-cache/<sync_name>/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-construction attribute mutation. run_id, cache_root, _schema_subhash, and force_full_extract are all set on ptd after the constructor returns — the object is in a partially-initialised state between Potenda(...) and the last assignment on line 284. If Potenda.__init__ ever calls into a method that reads any of these (or a subclass overrides __init__), it will see the None/empty defaults declared in potenda/__init__.py rather than the intended values.

These four fields are already declared in Potenda.__init__ (lines 55–57 and friends); consider promoting them to constructor parameters so the object is fully valid on construction.

Comment thread infrahub_sync/cli.py Outdated
sync_instance = get_instance(name=name, config_file=config_file, directory=directory)
if not sync_instance:
print_error_and_abort("Failed to load sync instance.")
assert sync_instance is not None # type-narrowing; print_error_and_abort raises
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert for type-narrowing in production code. assert statements are eliminated at python -O (optimised runtime). If the runner ever invokes with -O, this becomes a silent None-dereference on the next line. Same pattern on lines 144, 216, 241, 308, and 318.

Replace with an explicit check:

if sync_instance is None:
    raise RuntimeError("sync_instance is unexpectedly None")

Or, since print_error_and_abort already raises typer.Exit, annotate its return type as NoReturn so the type checker narrows without any runtime guard needed.

Comment thread infrahub_sync/adapters/nautobot.py Outdated
return field in str(exc)
if isinstance(payload, dict) and field in payload:
return True
return field in str(exc)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overly broad fallback in _is_unknown_filter_error. The final return field in str(exc) fires when JSON parsing failed, but str(exc) can include the field name in unrelated contexts (log messages, repr of nested objects, HTTP body fragments). The JSON-key check on line 46 is reliable and specific; the string-match fallback may produce false positives for a 400 that happens to mention last_updated__gte for a different reason.

Consider tightening: only use the JSON check, or require both status_code == 400 and at least one of the field-in-JSON conditions to be satisfied before treating it as an unknown-filter error.

optional = _collect_optional_edges(schema_mapping)
dropped: list[tuple[str, str]] = []

for _ in range(_MAX_CYCLE_BREAK_ATTEMPTS):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycle-breaking loop drops one edge per iteration. With 50 max attempts, each breaking only one optional edge and re-running a full topological sort, this is O(n_cycles × sort_cost) in the worst case. For real schema_mappings (tens of kinds) this is fine, but the budget is invisible from the error message.

A simpler and faster approach: collect all optional edges appearing in any reported cycle in a single pass, drop them all, and retry once. That reduces it to at most 2 sorts regardless of cycle count.

location (e.g., an NFS mount used by a fleet of runners).
"""
_require_safe_segment(sync_name, "sync_name")
base = os.environ.get("INFRAHUB_SYNC_CACHE_DIR")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INFRAHUB_SYNC_CACHE_DIR env-var not sanitised. cache_root_for calls _require_safe_segment on sync_name (line 33) but not on the INFRAHUB_SYNC_CACHE_DIR value itself. A misconfigured or malicious env var like INFRAHUB_SYNC_CACHE_DIR=/etc would be accepted and used as the base. The risk is low in practice (only an operator can set env vars), but it's worth documenting or adding a validation note.

Comment thread infrahub_sync/cache/sidecars.py Outdated
if not path.exists():
return cls(path=path)
data = json.loads(path.read_text(encoding="utf-8"))
return cls(path=path, **{k: data.get(k) for k in cls.KEYS if data.get(k) is not None})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_or_default silently drops keys whose stored value is None. The if data.get(k) is not None guard means a serialised {"status": null} is treated identically to a missing key and falls back to the dataclass default ("pending"). For finished_at this is intentional. For status, it would silently reset a stored null to "pending", which could mask a corrupted run file.

Consider using k in data instead of data.get(k) is not None and letting None be a valid stored value for fields that allow it.

Comment thread tasks/bench.py
context: Context,
name: str = "from-netbox",
directory: str = "examples/",
csv_out: str = "bench-results.csv",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bench artefacts written to the repo root by default. bench-results.csv (line 75) and .bench-filtered-config.yml (line 113) both default to repo_root / <name>. Neither is in .gitignore. Anyone running invoke bench.run will get a dirty worktree. Consider defaulting csv_out to a temp directory or at least adding both paths to .gitignore.

Containerlab
content_type
convert_query_response
coroutine
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional spelling-exception removals? This patch removes diffsync, Typer, Prefect, Slurp'it, cron, cutover, and declaratively from the Vale spelling exceptions. diffsync and Typer are still used in the codebase and docs. If these were removed to silence unrelated Vale noise, that's fine — but if it was accidental collateral, it will cause doc linting failures.

@BeArchiTek BeArchiTek self-assigned this May 29, 2026
Benoit Kohler and others added 2 commits May 29, 2026 13:39
Redact credential-like API tokens in the peering example configs (including
the unflagged peering-manager_to_infrahub sibling carrying the same token) to
the placeholder convention. Correct the dyffsync_flags -> diffsync_flags typo
in peeringdb_to_infrahub, which silently dropped SKIP_UNMATCHED_SRC.

Replace the sleep-based synchronisation in test_pipeline_lock_excludes_concurrent_run
with a multiprocessing.Event signalled once the child holds the lock, plus a
bounded finally teardown, to remove a CI race.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- adapters: filter records once per load (drop double filter_records) and log
  filtered/total symmetrically across NetBox and Nautobot
- nautobot: tighten _is_unknown_filter_error so a non-JSON 400 only matches when
  the body names the field AND mentions a filter, avoiding false positives
- cli: drop asserts used for type-narrowing (stripped under python -O); raise
  explicitly when run_dir is unexpectedly missing
- dependency_graph: break all optional edges found in cycles in one pass instead
  of one edge per topological sort
- cache/paths: expand and reject '..' traversal in INFRAHUB_SYNC_CACHE_DIR
- cache/sidecars: RunFile.load_or_default keeps a stored null instead of silently
  resetting it to the dataclass default
- engine: compute order and tiers in a single topological pass; pass run_id,
  cache_root and schema_subhash to Potenda.__init__ so the object is valid on
  construction rather than mutated afterwards
- gitignore bench artifacts; restore still-used Vale spelling exceptions

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/breaking-change Breaking Changes type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants