Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 18, 2025

Fix List column sampling when min_length > 32

  • Add test to reproduce the issue with min_length=33
  • Fix the max calculation in List._sample_unchecked to handle min_length > 32
  • Run tests to verify the fix
  • Run code review
  • Run security check
  • Fix mypy errors in test_list.py

Summary

Fixed a bug where sampling would fail with ValueError: low >= high for List columns when min_length > 32 and max_length is not specified.

Changes made:

  1. Modified dataframely/columns/list.py: Updated List._sample_unchecked method to calculate default_max = max(32, min_len) ensuring the max parameter is always greater than min when max_length is not specified
  2. Added test in tests/column_types/test_list.py: Parametrized test covering min_length values of 0, 10, 33, and 100
  3. Fixed mypy errors: Added cast import and used it to properly type the result of list.len().min()

Test results:

  • All 23 list-specific tests pass
  • All 609 column type tests pass
  • All 311 sampling-related tests pass
  • Pre-commit checks pass (including mypy)
  • Code review: No issues found
  • Security scan: No vulnerabilities found
Original prompt

This section details on the original issue you should resolve

<issue_title>Sampling fails for List column with min_length>32</issue_title>
<issue_description>The following test fails with ValueError: low >= high.

import dataframely as dy
from dataframely.testing import create_schema

create_schema("test", {"a": dy.List(dy.Int64(), min_length=33)}).sample()

The problem is probably a wrong setting of the max argument in the following line:

dataframely/columns/list.py
</issue_description>

<agent_instructions>Make sure to first reproduce the issue with a failing test in the correct place in the codebase.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: jjurm <6285777+jjurm@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix sampling issue for List column with min_length greater than 32 Fix List sampling when min_length > 32 Dec 18, 2025
Copilot AI requested a review from jjurm December 18, 2025 09:41
@codecov
Copy link

codecov bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b813ac9) to head (e1116b3).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #248   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3066      3068    +2     
=========================================
+ Hits          3066      3068    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jjurm jjurm changed the title Fix List sampling when min_length > 32 fix: Sampling for List columns with min_length > 32 Dec 18, 2025
@github-actions github-actions bot added the fix label Dec 18, 2025
@jjurm
Copy link
Contributor

jjurm commented Dec 18, 2025

@copilot Run pixi run pre-commit run and fix mypy errors

Co-authored-by: jjurm <6285777+jjurm@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot Run pixi run pre-commit run and fix mypy errors

Fixed in e1116b3. Added cast import and used it to properly type the result of list.len().min() to resolve the mypy operator type errors.

@jjurm jjurm marked this pull request as ready for review December 18, 2025 12:25
@jjurm jjurm merged commit 1eedd87 into main Dec 19, 2025
48 of 59 checks passed
@jjurm jjurm deleted the copilot/fix-sampling-list-column-error branch December 19, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sampling fails for List column with min_length>32

3 participants