
[Security] Disable external entity processing in XML upload to prevent XXE#4119

Open
zhoujinsong wants to merge 1 commit into apache:master from zhoujinsong:fix/xxe-xml-upload


@zhoujinsong
Contributor

What changes were proposed in this pull request?

When users upload XML configuration files (e.g. core-site.xml, hdfs-site.xml) via the AMS dashboard, the uploaded bytes are parsed by Hadoop's Configuration.addResource(). Although the current classpath includes Woodstox (which does not expand external entities by default), this implicit protection is fragile: it can silently break if the dependency is later excluded, for example to resolve a version conflict.

This patch adds explicit XXE protection at the Amoro layer before delegating to Hadoop Configuration, ensuring the security guarantee holds regardless of the underlying XML parser implementation on the classpath.

Why are the changes needed?

Without explicit protection, a malicious user could upload a crafted XML file containing an external entity reference like:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<configuration>&xxe;</configuration>

If the implicit XXE protection were ever lost (e.g. Woodstox excluded from the classpath), parsing such a file could allow:

  • Arbitrary local file read from the AMS server
  • SSRF (Server-Side Request Forgery) via external URLs in entity references

How was this patch tested?

Manual testing: uploaded a well-formed XML file (accepted) and an XML file with an external entity reference (rejected with error response).

Does this PR introduce any user-facing change?

No. Legitimate Hadoop XML configuration files (core-site.xml, hdfs-site.xml, hive-site.xml) do not use external entities. Valid files continue to upload successfully.

…t XXE

When uploading XML configuration files (e.g. core-site.xml, hdfs-site.xml),
the uploaded bytes are parsed by Hadoop's Configuration.addResource().
Although the current classpath includes Woodstox (which does not expand
external entities by default), this implicit protection is fragile and can
silently break if dependencies change.

This patch explicitly disables external entity processing using a
hardened XMLInputFactory before delegating to Hadoop Configuration,
ensuring XXE protection regardless of the underlying XML parser
implementation.

Changes:
- Pre-validate the XML stream with XMLInputFactory configured to:
  - IS_SUPPORTING_EXTERNAL_ENTITIES = false
  - SUPPORT_DTD = false
  - FEATURE_SECURE_PROCESSING = true
- Switch to Configuration(false) to avoid loading default Hadoop configs
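
The pre-validation step listed above could look roughly like the following. This is a minimal sketch using only JDK StAX classes, not the code from this patch: the class name `XxePrecheck` and the methods `hardenedFactory` and `isSafeXml` are hypothetical, rejecting every DOCTYPE event is an extra assumption made here, and `FEATURE_SECURE_PROCESSING` is set inside a try/catch because not every StAX implementation is guaranteed to recognize that property.

```java
import java.io.ByteArrayInputStream;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; names are illustrative and not taken from the patch.
final class XxePrecheck {

    // Builds a StAX factory with DTDs and external entities switched off.
    static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            factory.setProperty(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (IllegalArgumentException ignored) {
            // Assumption: some StAX implementations do not support this property.
        }
        return factory;
    }

    // Returns true only if the bytes parse cleanly and contain no DOCTYPE.
    static boolean isSafeXml(byte[] content) {
        XMLStreamReader reader = null;
        try {
            reader = hardenedFactory()
                    .createXMLStreamReader(new ByteArrayInputStream(content));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.DTD) {
                    return false; // sketch-level choice: refuse any DOCTYPE
                }
            }
            return true;
        } catch (XMLStreamException e) {
            return false; // malformed XML or an undeclared entity reference
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do on close failure
                }
            }
        }
    }
}
```

In this sketch, a caller would run `XxePrecheck.isSafeXml(uploadedBytes)` first and only hand the bytes to a `Configuration(false)` instance via `addResource()` when it returns true. Walking the whole stream up front costs a second parse, but it keeps the check independent of whichever parser Hadoop picks up from the classpath.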
github-actions bot added the module:ams-server (Ams server module) label on Mar 11, 2026