#851 Make '_corrupt_records' a nullable field. by yruslan · Pull Request #823 · AbsaOSS/cobrix

yruslan · 2026-02-20T13:03:44Z

Summary by CodeRabbit

Schema Updates
- The corrupt-fields array in the generated Spark schema is now nullable instead of non-nullable; nested fields inside that array remain unchanged. This may affect schema validation and handling of corrupt records.
Tests
- Updated test expectations to reflect the new nullable corrupt-fields schema configuration.
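
As a rough illustration of why the nullability change may affect schema validation: Spark's `StructField` and `ArrayType` are case classes, so a strict equality check between an expected and an actual schema is sensitive to the `nullable` flag. The sketch below uses only `org.apache.spark.sql.types`. The `ID` column, the `field_name` element field, and the `schemaWith` helper are hypothetical placeholders, not the actual Cobrix layout.

```scala
import org.apache.spark.sql.types._

// Hypothetical element struct; the real inner fields are defined by Cobrix
// and are not changed by this PR.
val element = StructType(Seq(StructField("field_name", StringType, nullable = false)))

// Hypothetical helper that builds a schema differing only in the array's nullability.
def schemaWith(arrayNullable: Boolean): StructType = StructType(Seq(
  StructField("ID", IntegerType, nullable = true),
  StructField("_corrupt_fields", ArrayType(element, containsNull = false), nullable = arrayNullable)
))

val before = schemaWith(arrayNullable = false)
val after  = schemaWith(arrayNullable = true)

// Strict equality takes the top-level nullable flag into account.
val strictlyEqual = before == after

// A looser check on names and data types ignores top-level nullability.
val sameShape = before.fields.map(f => (f.name, f.dataType))
  .sameElements(after.fields.map(f => (f.name, f.dataType)))
```

Downstream code that compares schemas strictly would likely see the two versions as different, while a name-and-type comparison would not.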

coderabbitai · 2026-02-20T13:04:04Z

Walkthrough

The top-level _corrupt_fields array in the generated Spark schema is now nullable (nullable = true) when generateCorruptFields is enabled; corresponding test expectations were updated to match.

Changes

Cohort / File(s)	Summary
Schema change & tests `spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala`, `spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test41CorruptFieldsSpec.scala`, `spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala`	Top-level `_corrupt_fields` ArrayType now created with `nullable = true` (was `false`); updated test expectations to reflect the new nullability. Inner StructType fields remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Make '_corrupt_fields' array nullable otherwise it breaks some workflows #821: Same objective — make the top-level _corrupt_fields array nullable.

Possibly related PRs

#723 Add the ability to generate _corrupt_fields column containing information on fields that Cobrix failed to decode #820: Introduced the _corrupt_fields/corrupt-fields schema feature that this PR tweaks.

Poem

🐰 I hopped through schemas, soft and light,
Found a field that needed night,
Now _corrupt_fields may rest and be,
Nullable, calm — a hop, whee! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title references '_corrupt_records' but the actual changes modify '_corrupt_fields', indicating a mismatch between the stated objective and the implemented changes.	Update the PR title to accurately reflect the changes: 'Make _corrupt_fields a nullable field' or verify that the wrong field was modified.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)
132-141: ⚠️ Potential issue | 🟠 Major

Update CobolSchemaSpec.scala test expectation at line 309: change _corrupt_fields: array (nullable = false) to nullable = true.

The schema change is correct. Serializing empty arrays ([]) rather than null is consistent with the distinction between nullable = true (the array itself can be null) and containsNull = false (array elements are never null).

However, CobolSchemaSpec.scala line 309 still asserts the old nullable = false for the array field. This test will fail with the current code changes and must be updated. Test41CorruptFieldsSpec.scala already has the correct expectation with nullable = true.
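
The difference between the two flags can be sketched with Spark's type API alone. This is a minimal illustration, not the code in `CobolSchema.scala`; the `entryType` struct and its `field_name` field are hypothetical stand-ins for the real corrupt-field entry.

```scala
import org.apache.spark.sql.types._

// Hypothetical element struct standing in for the real corrupt-field entry.
val entryType = StructType(Seq(StructField("field_name", StringType, nullable = false)))

val corruptFields = StructField(
  "_corrupt_fields",
  ArrayType(entryType, containsNull = false), // elements of the array are never null
  nullable = true                             // the column value itself may be null
)

val arrayType = corruptFields.dataType.asInstanceOf[ArrayType]

// treeString renders the schema in the same format the spec's expected string uses,
// showing `nullable` on the field line and `containsNull` on the element line.
println(StructType(Seq(corruptFields)).treeString)
```

`nullable` lives on the `StructField`, while `containsNull` lives on the `ArrayType`, which is why changing one leaves the other untouched.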
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala`
around lines 132 - 141, Update the CobolSchemaSpec.scala test expectation for
the corrupt fields array to match the schema change: change the assertion that
`_corrupt_fields: array (nullable = false)` to `_corrupt_fields: array (nullable
= true)` so it reflects that the array field (Constants.corruptFieldsField) is
nullable while its elements remain non-null (containsNull = false); ensure the
test text or expected schema string used in the spec (the assertion referencing
Constants.corruptFieldsField / `_corrupt_fields`) is updated accordingly to
avoid the mismatch with CobolSchema.generateCorruptFields behavior.

github-actions · 2026-02-20T13:39:40Z

JaCoCo code coverage report - 'cobol-parser'

Overall Project	91.45%	🍏

There is no coverage information present for the Files changed

github-actions · 2026-02-20T13:39:42Z

JaCoCo code coverage report - 'spark-cobol'

Overall Project	82.39%	🍏
Files changed	100%	🍏

File	Coverage
CobolSchema.scala	95.51%	🍏

coderabbitai

🧹 Nitpick comments (1)

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

872-887: Optional: add nullability assertion to the builder test.

The test verifies the field name and that its type isInstanceOf[ArrayType], but doesn't assert nullable == true. Adding this assertion would close the coverage gap introduced by this PR.

✏️ Suggested addition

  assert(sparkSchema.fields(1).name == "_corrupt_fields")
  assert(sparkSchema.fields(1).dataType.isInstanceOf[ArrayType])
+ assert(sparkSchema.fields(1).nullable == true)
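
One possible way to cover the name, type, and both nullability flags in a single check is a pattern match on the field. The sketch below is an assumption-laden stand-in: `sparkSchema` is built by hand here rather than obtained from the builder in the test, and the `A` column and `field_name` element field are hypothetical.

```scala
import org.apache.spark.sql.types._

// Hand-built stand-in for the schema the builder returns in the test.
val sparkSchema = StructType(Seq(
  StructField("A", StringType, nullable = true), // hypothetical data column
  StructField(
    "_corrupt_fields",
    ArrayType(StructType(Seq(StructField("field_name", StringType, nullable = false))), containsNull = false),
    nullable = true
  )
))

// One pattern covers the field name, the array-of-struct type,
// containsNull = false and nullable = true.
val matchesExpectation = sparkSchema.fields(1) match {
  case StructField("_corrupt_fields", ArrayType(_: StructType, false), true, _) => true
  case _ => false
}

assert(matchesExpectation)
```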

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala`
around lines 872 - 887, The test "create schema with corrupt fields using
builder" lacks an assertion that the generated _corrupt_fields column is
nullable; update the test to assert nullability by calling
CobolSchema.builder(parsedCopybook).withGenerateCorruptFields(true).build().getSparkSchema
and verify that the field named "_corrupt_fields" (sparkSchema.fields(1)) has
nullable == true (e.g., add an assertion on sparkSchema.fields(1).nullable).
This uses the existing symbols CobolSchema.builder, withGenerateCorruptFields,
getSparkSchema and the "_corrupt_fields" field.

coderabbitai bot reviewed Feb 20, 2026

Make '_corrupt_records' a nullable field.

01b76d1

yruslan force-pushed the feature/821-corrupt_records_nullable_array branch from c217345 to 01b76d1 Compare February 20, 2026 13:37

coderabbitai bot reviewed Feb 20, 2026

yruslan merged commit 904c05f into master Feb 20, 2026
7 checks passed

yruslan deleted the feature/821-corrupt_records_nullable_array branch February 20, 2026 13:47
