Add file adapter for syncing from local CSV exports#131
Conversation
|
Warning Review limit reached
More reviews will be available in 33 minutes and 46 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis pull request introduces a new read-only file adapter for Infrahub that loads CSV exports into DiffSync models. The implementation includes a CSV reader with configurable delimiters and encoding, a FileAdapter class that resolves file paths relative to a configured base directory and reads records via registered format readers, and record conversion logic that derives local identifiers and applies schema field mappings including static values, direct scalar mappings, delimited list splitting, and reference field resolution. A FileModel base class marks the adapter as source-only. Comprehensive tests verify CSV parsing, end-to-end model loading with identifier and reference resolution, and error handling for unsupported formats and missing files. Documentation and example configuration demonstrate the adapter's capabilities and usage. 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tests/adapters/test_file.py (1)
87-100: ⚡ Quick winAdd an explicit empty-page CSV test for the reader/adapter path.
Please add a test for an empty page case (e.g., header-only CSV and/or fully empty CSV) to validate the no-record behavior explicitly.
As per coding guidelines: "Unit tests for utils and adapter edge cases (timeouts, 401/403, empty pages); parametrized tests for config parsing; keep tests atomic and single-purpose".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/adapters/test_file.py` around lines 87 - 100, Add a new atomic unit test in tests/adapters/test_file.py that verifies read_csv returns an empty list for empty-page CSVs: create a tmp_path file for two cases (fully empty file and header-only file like "name,description\n"), call read_csv(path, settings={}) for each, and assert the result is [] to explicitly cover the no-record edge case for the reader/adapter; reference the existing test helpers and the read_csv function when locating where to add the new test(s).examples/file_to_infrahub/config.yml (1)
7-12: ⚡ Quick winUse a local relative
directoryand trim default CSV options in the example.This example is less portable than necessary. Using
examples/file_to_infrahub/dataties it to repo-root execution, anddelimiter/encodingrepeat defaults.Proposed cleanup
source: name: file settings: - # Base directory used to resolve relative file paths below. - # Defaults to this config's directory when omitted. - directory: "examples/file_to_infrahub/data" - # CSV options (applied to all .csv files) - delimiter: "," - encoding: "utf-8" + # Base directory used to resolve relative file paths below. + directory: "data"As per coding guidelines: "examples/**/*.{yaml,yml}: Example configurations must be minimal, accurate, and redacted of sensitive information".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/file_to_infrahub/config.yml` around lines 7 - 12, The example config uses a repo-root-dependent directory and repeats default CSV options; change the directory key to a local relative value (e.g., "directory: ./data" or remove the directory key entirely to use the config's directory) and remove the redundant CSV option keys "delimiter" and "encoding" so the example is minimal and relies on defaults.infrahub_sync/adapters/file.py (1)
4-4: ⚡ Quick winAlign logging with repo convention (or migrate to
structlogproject-wide).
infrahub_sync/adapters/file.pyuses stdliblogging(import logging/logging.getLogger(__name__)), and the other adapters underinfrahub_sync/adapters/also use stdliblogging(nostructlogusage found). Ifstructlogis a required standard, this file should be updated as part of a broader convention change.The
config.source.name.title() == self.type.title()gating matches sibling adapters, so that concern is resolved.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@infrahub_sync/adapters/file.py` at line 4, This file currently imports the stdlib logging but doesn't follow the repo's logger pattern; add a module-level logger (e.g., logger = logging.getLogger(__name__)) and replace any direct logging.* calls with logger.* to match the other adapters, or if the repo standard is to use structlog, replace the import with structlog and create a logger via structlog.get_logger() and update all logger usages accordingly (ensure you update the module-level logger symbol so other functions/classes in this file use the same logger).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@examples/file_to_infrahub/config.yml`:
- Around line 7-12: The example config uses a repo-root-dependent directory and
repeats default CSV options; change the directory key to a local relative value
(e.g., "directory: ./data" or remove the directory key entirely to use the
config's directory) and remove the redundant CSV option keys "delimiter" and
"encoding" so the example is minimal and relies on defaults.
In `@infrahub_sync/adapters/file.py`:
- Line 4: This file currently imports the stdlib logging but doesn't follow the
repo's logger pattern; add a module-level logger (e.g., logger =
logging.getLogger(__name__)) and replace any direct logging.* calls with
logger.* to match the other adapters, or if the repo standard is to use
structlog, replace the import with structlog and create a logger via
structlog.get_logger() and update all logger usages accordingly (ensure you
update the module-level logger symbol so other functions/classes in this file
use the same logger).
In `@tests/adapters/test_file.py`:
- Around line 87-100: Add a new atomic unit test in tests/adapters/test_file.py
that verifies read_csv returns an empty list for empty-page CSVs: create a
tmp_path file for two cases (fully empty file and header-only file like
"name,description\n"), call read_csv(path, settings={}) for each, and assert the
result is [] to explicitly cover the no-record edge case for the reader/adapter;
reference the existing test helpers and the read_csv function when locating
where to add the new test(s).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b518736a-d408-4931-9c59-5c8f8e065a33
⛔ Files ignored due to path filters (2)
examples/file_to_infrahub/data/devices.csvis excluded by!**/*.csvexamples/file_to_infrahub/data/organizations.csvis excluded by!**/*.csv
📒 Files selected for processing (5)
docs/docs/adapters/file.mdxdocs/sidebars.tsexamples/file_to_infrahub/config.ymlinfrahub_sync/adapters/file.pytests/adapters/test_file.py
- Simplify example config: drop repo-root-bound directory, redundant
delimiter/encoding defaults; prefix mappings with data/
Flat-file sources naturally produce duplicate identifiers when a model is derived from a foreign-key column (e.g. one LocationSite per unique 'location' value across many device rows). Previously the second occurrence raised ObjectAlreadyExists and aborted the load. Now the first occurrence wins and a summary count is logged.
Summary by CodeRabbit
New Features
Documentation
Examples
Tests