[BugFix] br_ans_beneficiario #1450

Merged
tricktx merged 16 commits into main from fix/br_ans
Mar 30, 2026

Conversation

Contributor

@tricktx tricktx commented Feb 25, 2026

PR description:

  • As detailed in this issue, there are incorrect partitions caused by an encoding problem in the code.
  • I am pushing a "brute-force" fix for the error, which directly affects users. After this fix, I will refactor the pipeline so that equipe_dados can roll back retroactive data with a single parameter.
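
To illustrate how an encoding mismatch of this kind can silently corrupt values, here is a minimal sketch. It assumes the source bytes are UTF-8 and were being decoded as latin1, which is what the change from encoding="latin1" to encoding="utf-8" in this PR suggests. The sample string is hypothetical and not taken from the ANS files.

```python
# Hypothetical value for illustration only; not taken from the ANS data.
original = "São Paulo"

# Bytes as a UTF-8 encoded source file would store them.
raw = original.encode("utf-8")

# latin1 maps every byte to a character, so decoding never raises:
# the multi-byte UTF-8 sequence for "ã" becomes two unrelated characters.
wrong = raw.decode("latin1")

# Decoding with the matching codec recovers the original text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Because the wrong codec does not raise an error, any value derived from the mis-decoded text (a partition key, for example) would be written under the corrupted name without warning.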

Summary by CodeRabbit

  • New Features

    • Added optional year-based filtering parameter for dataset selection.
  • Improvements

    • Enabled automatic daily refresh schedule for pipeline execution.
    • Optimized memory usage during data file processing.
    • Updated character encoding to UTF-8 for improved data compatibility.
    • Enhanced logging for validation operations.
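
The memory-related change appears to rely on calling gc.collect() after each file is handled. A minimal sketch of that pattern follows; process_one_at_a_time and fake_load are hypothetical names used only for illustration and do not come from the pipeline code.

```python
import gc
from typing import Callable, Iterable, List


def process_one_at_a_time(
    paths: Iterable[str], load: Callable[[str], List[int]]
) -> int:
    """Load each file in turn, keeping only one payload alive at a time."""
    total_rows = 0
    for path in paths:
        rows = load(path)        # large object lives only for this iteration
        total_rows += len(rows)
        del rows                 # drop the reference before loading the next file
        gc.collect()             # ask the collector to reclaim unreachable cycles now
    return total_rows


def fake_load(path: str) -> List[int]:
    """Hypothetical stand-in for reading a file; returns 1000 dummy rows."""
    return list(range(1000))


print(process_one_at_a_time(["a.csv", "b.csv"], fake_load))
```

In CPython most objects are freed by reference counting as soon as the last reference is dropped, so the explicit gc.collect() mainly helps when the discarded objects take part in reference cycles.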

@tricktx tricktx self-assigned this Mar 9, 2026

coderabbitai Bot commented Mar 26, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fd02c4b-110e-4d97-b088-7b9a340ad529

📥 Commits

Reviewing files that changed from the base of the PR and between 1183f22 and 716e449.

📒 Files selected for processing (4)
  • pipelines/datasets/br_ans_beneficiario/flows.py
  • pipelines/datasets/br_ans_beneficiario/schedules.py
  • pipelines/datasets/br_ans_beneficiario/tasks.py
  • pipelines/datasets/br_ans_beneficiario/utils.py

📝 Walkthrough

This PR refactors the ANS beneficiary dataset pipeline flow to support optional year-based file filtering. Changes include introducing a year parameter, restructuring conditional file download logic with explicit branching and merging, updating task signatures, changing date format handling, adding garbage collection calls, modifying CSV encoding, enabling the daily schedule, and updating code ownership.
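
As a rough sketch of what optional year-based filtering could look like, the snippet below keeps every file when no year is given and otherwise keeps only the files whose name contains that year. The function name filter_files_by_year and the file names are assumptions made for illustration; the actual files_to_download() implementation is not shown in this conversation.

```python
from typing import List, Optional


def filter_files_by_year(files: List[str], year: Optional[int] = None) -> List[str]:
    """Return all files when year is None, else only names containing the year."""
    if year is None:
        return list(files)
    # Plain substring match: adequate for the hypothetical names below,
    # but it would also match a year that appears elsewhere in a name.
    return [name for name in files if str(year) in name]


# Hypothetical file names, not the real ANS naming scheme.
files = [
    "beneficiarios_2023_12.zip",
    "beneficiarios_2024_01.zip",
    "beneficiarios_2024_02.zip",
]

print(filter_files_by_year(files, 2024))
print(filter_files_by_year(files))
```

Defaulting the parameter to None keeps the unfiltered behaviour for scheduled runs, while a manual run can pass a year to reprocess only that subset.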

Changes

| Cohort / File(s) | Summary |
|---|---|
| Flow Control & Scheduling (pipelines/datasets/br_ans_beneficiario/flows.py, pipelines/datasets/br_ans_beneficiario/schedules.py) | Added optional year Parameter to flow; refactored control flow to branch download targets via files_to_download() with conditional year filtering, then merge results; changed date format from "%Y-%m-%d" to "%Y-%m"; replaced check_condition and check_if_update_date_is_today with check_if_data_is_outdated only; enabled daily schedule assignment; updated code_owners from "equipe_pipelines" to "trick"; removed target parameter from schedule's parameter_defaults. |
| Task Signature & Logic Updates (pipelines/datasets/br_ans_beneficiario/tasks.py) | Updated check_condition() signature from two parameters to one; updated files_to_download() signature to accept year parameter with conditional filtering by filename; modified get_file_max_date() to return "%Y-%m" format instead of first day of month; added gc.collect() call in crawler_ans() after processing each file; added redundant os.makedirs() call in crawler_ans(); enhanced is_empty() with logging. |
| Utility & Encoding Changes (pipelines/datasets/br_ans_beneficiario/utils.py) | Added gc.collect() call in download/unzip branch after file deletion; changed CSV decoding in parquet_partition() from encoding="latin1" to encoding="utf-8". |
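
The switch from "%Y-%m-%d" to "%Y-%m" means dates are compared at month granularity. A minimal sketch of such a comparison is shown below; to_year_month and is_outdated are hypothetical helpers, and the real signature of check_if_data_is_outdated is not shown in this conversation.

```python
from datetime import datetime


def to_year_month(value: datetime) -> str:
    """Format a datetime at month granularity, e.g. 2026-03."""
    return value.strftime("%Y-%m")


def is_outdated(source_max: str, table_max: str) -> bool:
    """Return True when the source has a newer month than the stored table.

    Zero-padded "%Y-%m" strings sort chronologically, so a plain string
    comparison is enough here.
    """
    return source_max > table_max


source = to_year_month(datetime(2026, 3, 30))
print(source)
print(is_outdated(source, "2026-02"))
print(is_outdated(source, "2026-03"))
```

With month-level strings, two dates inside the same month compare as equal, so a run later in the month would not be flagged as outdated once that month has been loaded.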

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • [BugFix] br_ans_beneficiario #1468: Directly conflicts with this PR's schedule assignment change—the referenced PR removed/commented out the same schedule assignment that this PR enables.

Suggested reviewers

  • folhesgabriel

Poem

🐰 The ANS pipeline hops with cheer,

A year parameter now so dear,

Files merge and split with branching grace,

GC collects through time and space,

UTF-8 text now takes the lead,

This dataset feeds the daily need! 🌙

@tricktx tricktx marked this pull request as ready for review March 30, 2026 22:11
@tricktx tricktx merged commit a47889e into main Mar 30, 2026
6 of 7 checks passed
@tricktx tricktx deleted the fix/br_ans branch March 30, 2026 22:12
2 participants