Add file adapter for syncing from local CSV exports #131

Open
pcDamasceno wants to merge 3 commits into opsmill:main from pcDamasceno:csv-adapter

Conversation


@pcDamasceno pcDamasceno commented May 31, 2026

Summary by CodeRabbit

  • New Features

    • One-way file adapter to ingest CSVs into Infrahub with configurable mappings, identifier rules, delimiters, encodings, list handling and reference resolution.
  • Documentation

    • Comprehensive docs covering configuration, mapping semantics, supported/planned formats, examples, and common errors.
  • Examples

    • Added a sample file-to-Infrahub configuration demonstrating mappings and processing order.
  • Tests

    • New tests covering CSV parsing, mapping behavior, references, and error cases.

Contributor

coderabbitai Bot commented May 31, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a592dc25-f299-495d-9c45-3c37cf972882

📥 Commits

Reviewing files that changed from the base of the PR and between 51c9627 and 6aa2d5f.

📒 Files selected for processing (2)
  • infrahub_sync/adapters/file.py
  • tests/adapters/test_file.py

Walkthrough

This pull request introduces a new read-only file adapter for Infrahub that loads CSV exports into DiffSync models. The implementation includes a CSV reader with configurable delimiters and encoding, a FileAdapter class that resolves file paths relative to a configured base directory and reads records via registered format readers, and record conversion logic that derives local identifiers and applies schema field mappings including static values, direct scalar mappings, delimited list splitting, and reference field resolution. A FileModel base class marks the adapter as source-only. Comprehensive tests verify CSV parsing, end-to-end model loading with identifier and reference resolution, and error handling for unsupported formats and missing files. Documentation and example configuration demonstrate the adapter's capabilities and usage.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a file adapter for CSV synchronization, which is the primary focus of all modifications across documentation, configuration examples, implementation, and tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
tests/adapters/test_file.py (1)

87-100: ⚡ Quick win

Add an explicit empty-page CSV test for the reader/adapter path.

Please add a test for an empty page case (e.g., header-only CSV and/or fully empty CSV) to validate the no-record behavior explicitly.

As per coding guidelines: "Unit tests for utils and adapter edge cases (timeouts, 401/403, empty pages); parametrized tests for config parsing; keep tests atomic and single-purpose".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/adapters/test_file.py` around lines 87 - 100, Add a new atomic unit
test in tests/adapters/test_file.py that verifies read_csv returns an empty list
for empty-page CSVs: create a tmp_path file for two cases (fully empty file and
header-only file like "name,description\n"), call read_csv(path, settings={})
for each, and assert the result is [] to explicitly cover the no-record edge
case for the reader/adapter; reference the existing test helpers and the
read_csv function when locating where to add the new test(s).
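
The empty-input check requested above could look roughly like the sketch below. The `read_csv` defined here is only a stand-in built on `csv.DictReader`, assumed to behave like the PR's reader (same `path` and `settings` arguments, with `delimiter` and `encoding` keys); the real function in `infrahub_sync/adapters/file.py` may differ. In the actual test suite this would more naturally be a parametrized pytest test using the `tmp_path` fixture.

```python
import csv
import tempfile
from pathlib import Path


def read_csv(path, settings):
    """Stand-in reader (assumption): returns one dict per data row."""
    delimiter = settings.get("delimiter", ",")
    encoding = settings.get("encoding", "utf-8")
    with Path(path).open(newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def check_empty_inputs(directory):
    """Both a fully empty file and a header-only file should yield no records."""
    cases = (("empty.csv", ""), ("header_only.csv", "name,description\n"))
    for filename, content in cases:
        path = Path(directory) / filename
        path.write_text(content, encoding="utf-8")
        assert read_csv(path, settings={}) == []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        check_empty_inputs(tmp)
```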
examples/file_to_infrahub/config.yml (1)

7-12: ⚡ Quick win

Use a local relative directory and trim default CSV options in the example.

This example is less portable than necessary. Using examples/file_to_infrahub/data ties it to repo-root execution, and delimiter/encoding repeat defaults.

Proposed cleanup
 source:
   name: file
   settings:
-    # Base directory used to resolve relative file paths below.
-    # Defaults to this config's directory when omitted.
-    directory: "examples/file_to_infrahub/data"
-    # CSV options (applied to all .csv files)
-    delimiter: ","
-    encoding: "utf-8"
+    # Base directory used to resolve relative file paths below.
+    directory: "data"

As per coding guidelines: "examples/**/*.{yaml,yml}: Example configurations must be minimal, accurate, and redacted of sensitive information".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/file_to_infrahub/config.yml` around lines 7 - 12, The example config
uses a repo-root-dependent directory and repeats default CSV options; change the
directory key to a local relative value (e.g., "directory: ./data" or remove the
directory key entirely to use the config's directory) and remove the redundant
CSV option keys "delimiter" and "encoding" so the example is minimal and relies
on defaults.
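
To make the portability concern concrete, here is a minimal sketch of one way a base directory could be resolved. It assumes that an omitted `directory` falls back to the config file's directory (as the example's comment states) and, as a further assumption not confirmed by the PR, that relative values are resolved against that same directory rather than the current working directory. The helper name `resolve_base_directory` is hypothetical.

```python
from pathlib import Path


def resolve_base_directory(config_path, directory=None):
    """Return the directory that relative CSV paths are resolved against.

    Assumption: relative `directory` values are anchored at the config
    file's own directory, so the result does not depend on the CWD.
    """
    config_dir = Path(config_path).resolve().parent
    if directory is None:
        # Key omitted: fall back to the config's directory.
        return config_dir
    candidate = Path(directory)
    if candidate.is_absolute():
        return candidate
    return (config_dir / candidate).resolve()
```

Under these assumptions, `directory: "data"` and `directory: "./data"` resolve to the same location next to `config.yml`, regardless of where the command is run.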
infrahub_sync/adapters/file.py (1)

4-4: ⚡ Quick win

Align logging with repo convention (or migrate to structlog project-wide).

infrahub_sync/adapters/file.py uses stdlib logging (import logging / logging.getLogger(__name__)), and the other adapters under infrahub_sync/adapters/ also use stdlib logging (no structlog usage found). If structlog is a required standard, this file should be updated as part of a broader convention change.

The config.source.name.title() == self.type.title() gating matches sibling adapters, so that concern is resolved.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrahub_sync/adapters/file.py` at line 4, This file currently imports the
stdlib logging but doesn't follow the repo's logger pattern; add a module-level
logger (e.g., logger = logging.getLogger(__name__)) and replace any direct
logging.* calls with logger.* to match the other adapters, or if the repo
standard is to use structlog, replace the import with structlog and create a
logger via structlog.get_logger() and update all logger usages accordingly
(ensure you update the module-level logger symbol so other functions/classes in
this file use the same logger).
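
For reference, the stdlib module-level logger pattern the comment describes looks like the following. The `load_rows` function is a hypothetical placeholder used only to show a call site; it is not part of the adapter.

```python
import logging

# One logger per module, named after the module, shared by everything in the file.
logger = logging.getLogger(__name__)


def load_rows(rows):
    """Hypothetical loader: keep rows with a name and log how many were skipped."""
    kept = [row for row in rows if row.get("name")]
    skipped = len(rows) - len(kept)
    if skipped:
        # Use the module logger rather than calling logging.warning() directly.
        logger.warning("Skipped %d row(s) without a name", skipped)
    return kept
```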
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b518736a-d408-4931-9c59-5c8f8e065a33

📥 Commits

Reviewing files that changed from the base of the PR and between 4b70958 and 5a2816f.

⛔ Files ignored due to path filters (2)
  • examples/file_to_infrahub/data/devices.csv is excluded by !**/*.csv
  • examples/file_to_infrahub/data/organizations.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
  • docs/docs/adapters/file.mdx
  • docs/sidebars.ts
  • examples/file_to_infrahub/config.yml
  • infrahub_sync/adapters/file.py
  • tests/adapters/test_file.py

  - Simplify example config: drop repo-root-bound directory, redundant
    delimiter/encoding defaults; prefix mappings with data/
  Flat-file sources naturally produce duplicate identifiers when a model
  is derived from a foreign-key column (e.g. one LocationSite per unique
  'location' value across many device rows). Previously the second
  occurrence raised ObjectAlreadyExists and aborted the load. Now the
  first occurrence wins and a summary count is logged.
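
The "first occurrence wins" behavior described in this commit message can be sketched as follows. This is an illustration under assumptions, not the adapter's code: the function name `load_unique` and the use of a plain dict keyed by one identifier column are hypothetical, and the real adapter works with DiffSync models rather than raw rows.

```python
import logging

logger = logging.getLogger(__name__)


def load_unique(records, id_field):
    """Keep the first record per identifier and count the skipped duplicates."""
    seen = {}
    duplicates = 0
    for record in records:
        key = record[id_field]
        if key in seen:
            # A later row with the same identifier is ignored, not an error.
            duplicates += 1
            continue
        seen[key] = record
    if duplicates:
        # A single summary line instead of one message per duplicate.
        logger.info(
            "Skipped %d duplicate record(s) for identifier field %r",
            duplicates,
            id_field,
        )
    return list(seen.values()), duplicates
```

For device rows that share a `location` value, this yields one entry per unique location, taken from the first row that mentions it.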