v9.7.0 by johngrimes · Pull Request #2622 · aehrc/pathling

johngrimes · 2026-05-19T09:53:43Z

No description provided.

Introduces let() binding in SqlFunctions to materialise non-deterministic Spark column expressions exactly once per row, and applies it throughout the fhirpath and sql packages to prevent TraceExpression side effects from firing multiple times where the same operand appears in both branches of a when() expression. Adds a RepeatedSqlEvaluation checkstyle rule (RegexpMultiline) to catch accidental duplicate SQL evaluation at compile time, scoped to the fhirpath and sql package trees. Includes regression tests for all fixed evaluation paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@throws

- Fix join() to use the lambda-bound parameter instead of getValue(), preventing duplicate evaluation of non-deterministic operands, and add a single-fire regression test with a string-array dataset. - Replace nullif(c, array()) in normaliseNull() with let() + size() check to avoid relying on element-type equality, which fails for MapType array elements in ANSI mode. - Document the 400-character false-negative trade-off in the RepeatedSqlEvaluation checkstyle rule comment. - Add @throws AnalysisException to SqlFunctions.let() Javadoc for the aggregate/window constraint. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix "Two" → "Three" design constraints count in checkstyle comment - Remove incorrect cdUnit as let() lambda parameter example; clarify that leftR/rightR are short local variables, not lambda parameters - Update PathlingContext Javadoc: add toBoolean() to list of affected helpers and change "post-3.0" to "Spark 3.0+" for consistency - Update SqlFunctions class Javadoc to mention union alongside deduplication - Add testNormaliseNull() and nullArray() case to testSingular() in DefaultRepresentationTest to cover semantic correctness after rewrite - Add trace-count regression guards for: convertToDateTime, convertToTime, and IMPLIES right-operand in TraceFunctionTest Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Backtick-quoted code references and removed bare semicolons from explanatory comments in ColumnRepresentation and TraceFunctionTest to avoid triggering java:S125. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extend the let() materialisation pattern to five more sites where a Column parameter was referenced multiple times in a single Spark SQL expression tree, causing nondeterministic expressions (e.g. trace()) to fire once per reference instead of once per row. Fixed sites: - ArrayElementWiseColumnEquality.performArrayComparison() - QuantityComparator.wrap() - TemporalComparator.implementWithSql() - ReferenceValue.validateTypeFormat() - ValidationLogic.validateConversionToBoolean() Also adds a binary let(Column, Column, BinaryOperator<Column>) overload to SqlFunctions to reduce verbosity when materialising two operands. Each fix is covered by a new trace-count regression test that wraps the input column in TraceExpression and asserts exactly one fire per row. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

`sanitiseRow` only handled nested `Row` values but fell through for `scala.collection.Seq` values (how Spark represents array fields), so synthetic fields like `_fid` and null-valued fields leaked into the JSON output whenever a FHIRPath expression returned a type containing an array of structs (e.g. `CodeableConcept.coding`). Adds a new branch that iterates over `Seq` elements, recursively sanitises any `Row` elements, and updates the parent field's `ArrayType` elementType to the sanitised element schema so that `Row.json()` positional mapping remains correct. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Pre-size the ArrayList with the known sequence length, remove a redundant what-comment, and extract the shared coding row fixture into a helper to eliminate copy-paste between two test classes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Locks in that sanitiseRow correctly renders JSON for an array of structs where elements differ in which fields are null, and therefore have different post-sanitisation schemas.

Added suppressions for newly reported CVEs across core libraries, server, and site scopes following contextual impact assessment. All suppressed findings are either not bundled in the distribution or have unreachable vulnerable code paths. Upgraded mermaid from 11.12.2 to 11.15.0 via package.json override to fix four MEDIUM CVEs (CSS/HTML injection and DoS in diagram rendering) in the deployed static site. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@afterall

Three unit tests (SqlQueryResultStreamerTest, ViewRegistrationServiceTest, LibraryReferenceResolverTest.CanonicalReferences) called spark.stop() in @afterall, which destroyed the JVM-wide SparkContext and caused ViewDefinitionSearchTest and ViewDefinitionCreateTest to fail intermittently depending on test execution order. Converted all three tests to @SpringBootUnitTest so they receive the shared SparkSession via Spring injection, consistent with every other Spark-dependent test in the server module. The manually created sessions and @afterall teardowns are removed entirely. Closes #2615 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

BulkSubmitProviderTest was installing a Mockito mock as the active Spring SecurityContext in @beforeeach but never clearing it. Under JUnit 5 parallel execution the mock leaked onto adjacent threads, causing SearchProviderAuthTest to inherit a mock context in which setAuthentication() is a no-op, so checkHasAuthority() would throw "Token not present". Closes #2617. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Hadoop Path.toUri() drops the empty authority on file:// paths built via new Path(parent, child), yielding file:/path. Files discovered later via fs.listFiles + fs.makeQualified preserve the empty authority and come back as file:///path. UrlAllowlist's string-prefix match then rejects the downloaded file URLs against the staging-directory prefix, failing the import with an AccessDeniedError after the bulk export has already completed. Build the prefix via fs.getFileStatus so both sides use the same canonical URI form. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The shared static warehouse @tempdir is cleaned in @AfterEach, but Spark's catalog cache and Delta's global DeltaLog cache still hold references to the deleted tables. The next test rebuilds the warehouse from test fixtures, but isDeltaTable returns false against the stale log, so the import falls through to an ERROR_IF_EXISTS write that collides with the freshly-copied directory and fails with DELTA_PATH_EXISTS. Clear both caches before deleting files so cleanup restores both the on-disk and in-memory state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The test runs under the integration-test profile with PNP credentials configured and auth enabled, but its requests were missing the Authorization header. The pre-existing 401 was hidden by an earlier PNP allowlist bug; with that fix in place, the auth interlock now rejects the request before the poisoning scenario can exercise. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The integration tests configured pathling.bulk-submit.allowable-sources to bare http://localhost via @TestPropertySource. The URI-aware UrlAllowlist resolves that prefix to effective port 80 and no longer matches the dynamic http://localhost:{wireMockPort} the tests actually use. Move the property into @DynamicPropertySource so it picks up the WireMock port at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The test relied on the WebTestClient default response timeout of 5 s, which is shorter than the cold-start latency of the first POST against a freshly started Spring Boot context with a Delta-backed warehouse. Match the 60 s timeout already used by the sibling integration tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: Resolve intermittent test failures caused by shared state leaks

Bumps the org.hl7.fhir.r4 and org.hl7.fhir.utilities overrides from 6.9.6 to 6.9.7 in the core library POM to address CVE-2026-45367 (ReDoS via FHIRPath matches()/replaceMatches()). Adds explicit tomcat-embed-core/el/websocket overrides at 10.1.55 in the server POM to address five Tomcat CVEs (CVE-2026-41293 CRITICAL, CVE-2026-43512 CRITICAL, CVE-2026-41284 HIGH, CVE-2026-42498 HIGH, CVE-2026-43513 HIGH) introduced by Spring Boot 3.5.14 bundling 10.1.54. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extract switch-based field dispatch and the array-of-struct branch into separate helper methods so sanitiseRow itself stays simple and the type-dispatch chain is expressed as a Java 21 switch expression. Resolves SonarCloud java:S3776 and java:S6880 findings on SingleInstanceEvaluator.

Previously FhirViewShareableComplianceTest only ran via the sof-compliance-test surefire execution, which disables exclusions and sets testFailureIgnore=true so the SoF compliance report can capture every result without failing the build. As a side effect, regressions in supported features were not detected by normal CI runs. The default test execution now picks up the compliance suite with the maintained exclusion list, so a failure in a supported case breaks the build. The unconstrained report run moves behind an opt-in sofComplianceReport profile, activated by the release workflow so the compliance report continues to be produced. %rowIndex cases are added to the exclusion set because that feature is not yet supported and is tracked separately.

The shareable compliance suite previously ran only in a report-only execution that ignored test failures, so regressions in features Pathling supports could land unnoticed. The suite now runs in the default build with the maintained exclusion list, failing the build on any regression in a supported case. The report-only run moves to an opt-in profile activated by the release workflow, so the SoF compliance report continues to be produced. %rowIndex cases are added to the exclusion list because that feature is not yet supported and is tracked separately.

When a repeat directive's traversal followed a path whose runtime value was null (e.g. multi-path repeat where one branch produced no value at certain nodes), the extractor returned a null array and Spark's Concat propagated the null upward, producing wrong results. Wrapping the extractor's output in Coalesce(_, []) keeps the typed empty array in place of nulls and lets the surrounding Concat assemble the projection correctly. Resolves the previously failing repeat compliance case "multi-path repeat inside forEach" in the expanded SQL on FHIR v2 test suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When a repeat directive recursively traversed past the encoder's maxNestingLevel, Pathling's FHIRPath evaluator continued resolving the path against HAPI definitions while the Catalyst schema no longer had the field. The FIELD_NOT_FOUND fallback emitted an untyped empty array, which crashed StructProduct with "NullType cannot be cast to StructType" whenever a sibling typed array combined with the empty result. The expected element type is now derived from the repeat's projection clause (declared sqlType, FHIR type, or materialised column type, wrapped in ArrayType for collection columns) and threaded through transformTree. When the root traversal hits FIELD_NOT_FOUND, the fallback emits a typed empty array matching the declared column shape, so downstream StructProduct sees a consistent element type and combines correctly. Resolves the previously failing repeat compliance cases (repeat inside repeat, triple-nested repeat, repeat with forEach with repeat) in the expanded SQL on FHIR v2 test suite. The symmetric forEach-past-cap case is tracked separately and excluded from the regression suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…d-empty Add regression tests covering three type-mapping bugs silently fixed by routing through FhirPathType (BASE64BINARY, DECIMAL, INSTANT). Add a behavioural test for the typed-empty fallback path in transformTree. Improve comments in RepeatSelection, Expressions, and ProjectedColumn to capture non-obvious intent. Update design.md to reflect as-built decisions (getSqlType on existing records, FhirPathType delegation). Archive the completed change and sync the repeat-directive delta spec to the main specs directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add regression cases for EqualityOperator.handleNonEquivalentTypes covering both left and right traced operands. The branch is currently correct because isEmpty() references its operand once, but the asymmetry with handleEquivalentTypes (which already let()-wraps both sides) means a future refactor that adds a second operand reference would silently re-introduce issue #2594. The new assertions document the single-fire invariant at the FHIRPath surface.

Eliminate trace() entry duplication in ColumnRepresentation

fix: Ensure repeat directive implementation passes expanded test coverage

Extend sanitiseRow to recurse into arrays of structs

piotrszul and others added 19 commits May 11, 2026 20:00

chore: Update core version to 9.7.0-SNAPSHOT

c825197

fix: Suppress SonarCloud false-positive commented-code warnings

d006b2c

Backtick-quoted code references and removed bare semicolons from explanatory comments in ColumnRepresentation and TraceFunctionTest to avoid triggering java:S125. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: Add scan-trace-duplicates slash command for issue #2594

4f9468e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: Cover heterogeneous null patterns across array-of-struct elements

dc550b8

Locks in that sanitiseRow correctly renders JSON for an array of structs where elements differ in which fields are null, and therefore have different post-sanitisation schemas.

Merge pull request #2616 from aehrc/issue/2615

6569c8c

fix: Resolve intermittent test failures caused by shared state leaks

johngrimes self-assigned this May 19, 2026

johngrimes added the release Pull request that represents a new release label May 19, 2026

johngrimes added this to Pathling May 19, 2026

github-project-automation Bot moved this to Backlog in Pathling May 19, 2026

johngrimes moved this from Backlog to In progress in Pathling May 19, 2026

piotrszul deployed to maven-central May 19, 2026 12:02 — with GitHub Actions Active

johngrimes and others added 4 commits May 21, 2026 12:06

piotrszul and others added 5 commits May 22, 2026 18:03

refactor: Scope safeExtractor to mapChildren where it is used

83f93dd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge pull request #2608 from aehrc/issue/2594

b22ed71

Eliminate trace() entry duplication in ColumnRepresentation

johngrimes had a problem deploying to maven-central May 22, 2026 18:20 — with GitHub Actions Failure

Merge pull request #2624 from aehrc/issue/2619

2be22d9

fix: Ensure repeat directive implementation passes expanded test coverage

johngrimes had a problem deploying to maven-central May 22, 2026 18:24 — with GitHub Actions Failure

Merge pull request #2599 from aehrc/issue/2592

4abfd46

Extend sanitiseRow to recurse into arrays of structs

johngrimes had a problem deploying to maven-central May 22, 2026 18:25 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v9.7.0#2622

v9.7.0#2622
johngrimes wants to merge 31 commits into
mainfrom
release/9.7.0

johngrimes commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

johngrimes commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants