[Security] Disable external entity processing in XML upload to prevent XXE#4119
Open
zhoujinsong wants to merge 1 commit intoapache:masterfrom
Open
[Security] Disable external entity processing in XML upload to prevent XXE#4119zhoujinsong wants to merge 1 commit intoapache:masterfrom
zhoujinsong wants to merge 1 commit intoapache:masterfrom
Conversation
…t XXE When uploading XML configuration files (e.g. core-site.xml, hdfs-site.xml), the uploaded bytes are parsed by Hadoop's Configuration.addResource(). Although the current classpath includes Woodstox (which does not expand external entities by default), this implicit protection is fragile and can silently break if dependencies change. This patch explicitly disables external entity processing using a hardened XMLInputFactory before delegating to Hadoop Configuration, ensuring XXE protection regardless of the underlying XML parser implementation. Changes: - Pre-validate the XML stream with XMLInputFactory configured to: - IS_SUPPORTING_EXTERNAL_ENTITIES = false - SUPPORT_DTD = false - FEATURE_SECURE_PROCESSING = true - Switch to Configuration(false) to avoid loading default Hadoop configs
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
When users upload XML configuration files (e.g.
core-site.xml,hdfs-site.xml) via the AMS dashboard, the uploaded bytes are parsed byHadoop Configuration.addResource(). Although the current classpath includes Woodstox (which does not expand external entities by default), this implicit protection is fragile — it can silently break if the dependency is excluded due to a version conflict in the future.This patch adds explicit XXE protection at the Amoro layer before delegating to Hadoop
Configuration, ensuring the security guarantee holds regardless of the underlying XML parser implementation on the classpath.Why are the changes needed?
Without explicit protection, a malicious user could upload a crafted XML file containing an external entity reference like:
If the XXE implicit protection were ever lost (e.g. Woodstox excluded), this could allow:
How was this patch tested?
Manual testing: uploaded a well-formed XML file (accepted) and an XML file with an external entity reference (rejected with error response).
Does this PR introduce any user-facing change?
No. Legitimate Hadoop XML configuration files (
core-site.xml,hdfs-site.xml,hive-site.xml) do not use external entities. Valid files continue to upload successfully.