#851 Make '_corrupt_records' a nullable field. #823

Merged
yruslan merged 1 commit into master from feature/821-corrupt_records_nullable_array on Feb 20, 2026

Conversation

@yruslan (Collaborator) commented Feb 20, 2026

Summary by CodeRabbit

  • Schema Updates

    • The corrupt-fields array in the generated Spark schema is now nullable instead of non-nullable; nested fields inside that array remain unchanged. This may affect schema validation and handling of corrupt records.
  • Tests

    • Updated test expectations to reflect the new nullable corrupt-fields schema configuration.

@coderabbitai bot (Contributor) commented Feb 20, 2026

Walkthrough

The top-level _corrupt_fields array in the generated Spark schema is now nullable (nullable = true) when generateCorruptFields is enabled; corresponding test expectations were updated to match.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Schema change & tests: spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala, spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test41CorruptFieldsSpec.scala, spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala | Top-level _corrupt_fields ArrayType now created with nullable = true (was false); updated test expectations to reflect the new nullability. Inner StructType fields remain unchanged. |
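
As a rough illustration of the change summarized above, the sketch below builds a Spark `StructField` for `_corrupt_fields` with `nullable = true` on the column and `containsNull = false` on the array type. It is not the code from CobolSchema.scala: the inner struct fields (`field_name`, `raw_value`), the `ID` column, and the object name are assumptions made only for this example.

```scala
import org.apache.spark.sql.types._

object CorruptFieldsSchemaSketch {
  // Hypothetical inner fields, used only for illustration;
  // the real struct is defined in CobolSchema.scala.
  val corruptFieldStruct: StructType = StructType(Seq(
    StructField("field_name", StringType, nullable = false),
    StructField("raw_value", BinaryType, nullable = false)
  ))

  // nullable = true: the array column itself may be null.
  // containsNull = false: a non-null array never holds null elements.
  val corruptFieldsColumn: StructField = StructField(
    "_corrupt_fields",
    ArrayType(corruptFieldStruct, containsNull = false),
    nullable = true
  )

  def main(args: Array[String]): Unit = {
    // "ID" is a placeholder for the columns derived from a copybook.
    val schema = StructType(Seq(
      StructField("ID", IntegerType, nullable = true),
      corruptFieldsColumn
    ))
    // Prints the schema tree, including both nullability flags.
    schema.printTreeString()
  }
}
```

The two flags are independent: `nullable` belongs to the `StructField`, while `containsNull` belongs to the `ArrayType`, so changing one leaves the other as it was.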

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through schemas, soft and light,
Found a field that needed night,
Now _corrupt_fields may rest and be,
Nullable, calm — a hop, whee! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title references '_corrupt_records' but the actual changes modify '_corrupt_fields', indicating a mismatch between the stated objective and the implemented changes. | Update the PR title to accurately reflect the changes: 'Make _corrupt_fields a nullable field' or verify that the wrong field was modified. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai bot (Contributor) left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)

132-141: ⚠️ Potential issue | 🟠 Major

Update CobolSchemaSpec.scala test expectation at line 309: change _corrupt_fields: array (nullable = false) to nullable = true.

The schema change is correct. Records without corrupt fields still serialize as empty arrays ([]) rather than null, which is consistent with the two flags being independent: nullable = true means the array itself may be null, while containsNull = false means array elements are never null.

However, CobolSchemaSpec.scala line 309 still asserts the old nullable = false for the array field. This test will fail with the current code changes and must be updated. Test41CorruptFieldsSpec.scala already has the correct expectation with nullable = true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala`
around lines 132 - 141, Update the CobolSchemaSpec.scala test expectation for
the corrupt fields array to match the schema change: change the assertion that
`_corrupt_fields: array (nullable = false)` to `_corrupt_fields: array (nullable
= true)` so it reflects that the array field (Constants.corruptFieldsField) is
nullable while its elements remain non-null (containsNull = false); ensure the
test text or expected schema string used in the spec (the assertion referencing
Constants.corruptFieldsField / `_corrupt_fields`) is updated accordingly to
avoid the mismatch with CobolSchema.generateCorruptFields behavior.

@yruslan yruslan force-pushed the feature/821-corrupt_records_nullable_array branch from c217345 to 01b76d1 Compare February 20, 2026 13:37
@github-actions

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.45% 🍏

There is no coverage information present for the Files changed

@github-actions

JaCoCo code coverage report - 'spark-cobol'

Overall Project 82.39% 🍏
Files changed 100% 🍏

File Coverage
CobolSchema.scala 95.51% 🍏

@coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

872-887: Optional: add nullability assertion to the builder test.

The test verifies the field name and that its type isInstanceOf[ArrayType], but doesn't assert nullable == true. Adding this assertion would close the coverage gap introduced by this PR.

✏️ Suggested addition
  assert(sparkSchema.fields(1).name == "_corrupt_fields")
  assert(sparkSchema.fields(1).dataType.isInstanceOf[ArrayType])
+ assert(sparkSchema.fields(1).nullable == true)
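
A slightly stricter variant of the suggested assertion could check both flags at once and look the column up by name instead of by position. The sketch below is only an illustration under assumptions: the helper `isNullableArrayOfNonNull` and the object name are hypothetical, and the hand-built `sparkSchema` (with a simplified string element type) stands in for the schema the builder would produce.

```scala
import org.apache.spark.sql.types._

object CorruptFieldsAssertionSketch {
  // Hypothetical helper: true when the named column is an array that
  // may itself be null but whose elements may not be null.
  def isNullableArrayOfNonNull(schema: StructType, name: String): Boolean =
    schema.fields.find(_.name == name).exists { field =>
      field.dataType match {
        case ArrayType(_, containsNull) => field.nullable && !containsNull
        case _                          => false
      }
    }

  def main(args: Array[String]): Unit = {
    // Stand-in schema; in the real test this would come from the builder.
    // The element type is simplified to StringType for brevity.
    val sparkSchema = StructType(Seq(
      StructField("ID", IntegerType, nullable = true),
      StructField("_corrupt_fields", ArrayType(StringType, containsNull = false), nullable = true)
    ))

    assert(isNullableArrayOfNonNull(sparkSchema, "_corrupt_fields"))
    assert(!isNullableArrayOfNonNull(sparkSchema, "ID"))
  }
}
```

Looking the field up by name keeps the check valid if the column order changes, at the cost of no longer asserting the position that `sparkSchema.fields(1)` implies.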
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala`
around lines 872 - 887, The test "create schema with corrupt fields using
builder" lacks an assertion that the generated _corrupt_fields column is
nullable; update the test to assert nullability by calling
CobolSchema.builder(parsedCopybook).withGenerateCorruptFields(true).build().getSparkSchema
and verify that the field named "_corrupt_fields" (sparkSchema.fields(1)) has
nullable == true (e.g., add an assertion on sparkSchema.fields(1).nullable).
This uses the existing symbols CobolSchema.builder, withGenerateCorruptFields,
getSparkSchema and the "_corrupt_fields" field.

@yruslan yruslan merged commit 904c05f into master Feb 20, 2026
7 checks passed
@yruslan yruslan deleted the feature/821-corrupt_records_nullable_array branch February 20, 2026 13:47