Skip to content

Resolve job directory file system from warehouse URI#2613

Open
johngrimes wants to merge 14 commits into
mainfrom
issue/2612
Open

Resolve job directory file system from warehouse URI#2613
johngrimes wants to merge 14 commits into
mainfrom
issue/2612

Conversation

@johngrimes
Copy link
Copy Markdown
Member

Fixes #2612.

The async job cleanup path used Hadoop's default file system (FileSystem.get(conf)) to delete per-job directories, which failed with Wrong FS: s3a://..., expected: file:/// whenever the warehouse was on a non-default scheme such as s3a://. The deletion now goes through the existing JobDirectoryFileSystem so the file system is resolved from the warehouse URI.

Verified locally against s3a://pathling-demo that a job directory is now removed cleanly. Also confirmed that the prior code path reproduces the exact stack trace from the issue.

piotrszul and others added 2 commits May 11, 2026 20:00
The async job cleanup path used Hadoop's default file system to delete
per-job directories, which failed with "Wrong FS" when the warehouse
was on a non-default scheme such as s3a://. Route the deletion through
the existing JobDirectoryFileSystem so the file system is resolved
from the warehouse URI.

Fixes #2612.
@johngrimes johngrimes changed the title fix: Resolve job directory file system from warehouse URI Resolve job directory file system from warehouse URI May 13, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in Pathling May 13, 2026
@johngrimes johngrimes moved this from Backlog to In progress in Pathling May 13, 2026
johngrimes and others added 12 commits May 13, 2026 15:56
The unit-test Spring context does not component-scan the server's io
package, so the JobDirectoryFileSystem dependency required by the test
jobProvider bean could not be resolved. This caused every unit test
importing FhirServerTestConfiguration to fail context loading.

Provide an explicit JobDirectoryFileSystem bean in the test
configuration, built from the running SparkSession and the existing
warehouse and database name properties.
Added suppressions for newly reported CVEs across core libraries, server,
and site scopes following contextual impact assessment. All suppressed
findings are either not bundled in the distribution or have unreachable
vulnerable code paths.

Upgraded mermaid from 11.12.2 to 11.15.0 via package.json override to
fix four MEDIUM CVEs (CSS/HTML injection and DoS in diagram rendering)
in the deployed static site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three unit tests (SqlQueryResultStreamerTest, ViewRegistrationServiceTest,
LibraryReferenceResolverTest.CanonicalReferences) called spark.stop() in
@afterall, which destroyed the JVM-wide SparkContext and caused
ViewDefinitionSearchTest and ViewDefinitionCreateTest to fail intermittently
depending on test execution order.

Converted all three tests to @SpringBootUnitTest so they receive the shared
SparkSession via Spring injection, consistent with every other Spark-dependent
test in the server module. The manually created sessions and @afterall
teardowns are removed entirely.

Closes #2615

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BulkSubmitProviderTest was installing a Mockito mock as the active Spring
SecurityContext in @beforeeach but never clearing it. Under JUnit 5 parallel
execution the mock leaked onto adjacent threads, causing
SearchProviderAuthTest to inherit a mock context in which setAuthentication()
is a no-op, so checkHasAuthority() would throw "Token not present".

Closes #2617.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hadoop Path.toUri() drops the empty authority on file:// paths built via
new Path(parent, child), yielding file:/path. Files discovered later via
fs.listFiles + fs.makeQualified preserve the empty authority and come
back as file:///path. UrlAllowlist's string-prefix match then rejects
the downloaded file URLs against the staging-directory prefix, failing
the import with an AccessDeniedError after the bulk export has already
completed. Build the prefix via fs.getFileStatus so both sides use the
same canonical URI form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The shared static warehouse @tempdir is cleaned in @AfterEach, but
Spark's catalog cache and Delta's global DeltaLog cache still hold
references to the deleted tables. The next test rebuilds the warehouse
from test fixtures, but isDeltaTable returns false against the stale
log, so the import falls through to an ERROR_IF_EXISTS write that
collides with the freshly-copied directory and fails with
DELTA_PATH_EXISTS. Clear both caches before deleting files so cleanup
restores both the on-disk and in-memory state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test runs under the integration-test profile with PNP credentials
configured and auth enabled, but its requests were missing the
Authorization header. The pre-existing 401 was hidden by an earlier
PNP allowlist bug; with that fix in place, the auth interlock now
rejects the request before the poisoning scenario can exercise.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The integration tests configured pathling.bulk-submit.allowable-sources
to bare http://localhost via @TestPropertySource. The URI-aware
UrlAllowlist resolves that prefix to effective port 80 and no longer
matches the dynamic http://localhost:{wireMockPort} the tests
actually use. Move the property into @DynamicPropertySource so it
picks up the WireMock port at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test relied on the WebTestClient default response timeout of 5 s,
which is shorter than the cold-start latency of the first POST against a
freshly started Spring Boot context with a Delta-backed warehouse. Match
the 60 s timeout already used by the sibling integration tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix: Resolve intermittent test failures caused by shared state leaks
@sonarqubecloud
Copy link
Copy Markdown

@sonarqubecloud
Copy link
Copy Markdown

@johngrimes johngrimes marked this pull request as ready for review May 21, 2026 10:19
@johngrimes johngrimes requested a review from piotrszul May 21, 2026 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

$import with s3 bucket as source and warehouse url fails with java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: file:///

2 participants