
[Proposal] New Linux telemetry categories: Anonymous File Activity an…#199

Open
Aegrah wants to merge 1 commit into tsale:main from Aegrah:elastic-defend-new-contributions-memfd-entropy

Aegrah commented May 18, 2026

EDR Telemetry Pull Request

Contribution Details

This PR proposes two new Linux telemetry categories with four sub-categories in total, plus supporting evidence and a telemetry generator for each category. It is a sibling to #198: that PR scores Elastic Defend against the existing Linux taxonomy, while this PR proposes additions to the taxonomy itself, so the two changes can be reviewed independently.

The new sub-categories are:

| Parent (Telemetry Feature Category) | Sub-Category | Elastic Defend status | Notes |
|---|---|---|---|
| Anonymous File Activity | Memory-Backed File Creation | Yes | Default since Elastic Defend 9.1.4. Backed by memfd_create(2) telemetry. |
| File Metadata | File Entropy | Via EnablingTelemetry | Requires linux.advanced.events.populate_file_data: true (added 9.3+). |
| File Metadata | File Header Bytes | Via EnablingTelemetry | Same toggle as File Entropy. |
| File Metadata | File Size | Via EnablingTelemetry | Same toggle as File Entropy & File Header Bytes. |

All other vendors are marked Pending Response. Following the precedent from #130 (Win32 API Telemetry) and #150 (macOS EDR Categories and Sub Categories), other contributors can fill those in as part of the normal review.

Telemetry Validation

  • Official documentation (links inline below)
  • Screenshots attached
  • Sanitized logs provided
  • Private documentation (will share confidentially)

Type of Contribution

  • Adding telemetry information for an existing EDR product
  • Adding a new EDR product that meets eligibility criteria
  • Proposing new event categories/sub-categories
  • Documentation improvement
  • Tool enhancement (generator scripts for the two new categories)

Validation Details

EDR Product Information

  • EDR Product Name: Elastic Defend
  • EDR Version: 9.1.4+ (Memory-Backed File Creation, via memfd_create), 9.3.0+ (File Entropy / File Header Bytes / File Size)
  • Operating System(s) Tested: Linux (per the Elastic support matrix)

Testing Methodology

Each new sub-category is exercised by a dedicated generator added under Tools/Telemetry-Generator/Linux/complex/:

  • memfd_create_exec.py calls memfd_create(2) with a representative set of flag combinations (MFD_CLOEXEC, MFD_ALLOW_SEALING, MFD_NOEXEC_SEAL, MFD_EXEC) and stages an execution from /proc/self/fd/<fd> so process.Ext.memfd.flag.* and the memfd-backed exec path can be observed.
  • file_metadata.py writes a small set of fixtures with known magic-byte headers (random/no-header, repeating ASCII, ELF, ZIP, XZ, GZIP, PDF) and then modifies/renames them so EDRs that compute file.Ext.entropy and file.Ext.header_bytes on file events have deterministic input.

Both are wired into lnx_telem_gen.py as MemfdCreate and FileMetadata events.
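A similar standard-library sketch of the create, modify, and rename sequence behind file_metadata.py. The fixture names, the 4096-byte size, and the `build_fixtures` / `write_fixtures` helpers are illustrative assumptions, not the PR's generator; only the magic-byte signatures themselves are standard.

```python
import os
from pathlib import Path

# Standard magic-byte signatures; an empty header means "random, no magic".
MAGIC_HEADERS = {
    "random.bin": b"",
    "sample.elf": b"\x7fELF",
    "sample.zip": b"PK\x03\x04",
    "sample.xz": b"\xfd7zXZ\x00",
    "sample.gz": b"\x1f\x8b",
    "sample.pdf": b"%PDF-",
}


def build_fixtures(size=4096):
    """Map fixture name -> bytes: repeating ASCII, plus magic header + random tail."""
    fixtures = {"repeat.txt": b"A" * size}  # repeating ASCII: entropy 0.0
    for name, magic in MAGIC_HEADERS.items():
        fixtures[name] = magic + os.urandom(size - len(magic))
    return fixtures


def write_fixtures(directory, size=4096):
    """Create, append to, and rename each fixture; return the final paths."""
    final_paths = []
    for name, body in build_fixtures(size).items():
        path = Path(directory) / name
        path.write_bytes(body)           # file creation
        with path.open("ab") as fh:      # file modification
            fh.write(b"\n")
        final_path = path.with_name(name + ".renamed")
        path.rename(final_path)          # file rename
        final_paths.append(final_path)
    return final_paths
```

Random tails after each header keep entropy high for the typed fixtures, while `repeat.txt` gives a near-zero baseline, so the three events per file carry predictable entropy and header values.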


Proposed taxonomy additions

1. Anonymous File Activity → Memory-Backed File Creation

This sub-category covers the creation of anonymous, memory-backed files via memfd_create(2) (man page), the primary Linux primitive for creating a file-like handle that can be executed with fexecve(3), sealed, or shared without ever being linked into a visible filesystem. It underpins many modern fileless loaders on Linux (Bash bashfd loaders, ELF reflective loaders, several public LKM loaders, etc.).

Why it deserves its own sub-category rather than being folded into an existing one:

  • Folding it into File Creation is misleading — the resulting file handle is anonymous, never traverses a filesystem, and is invisible to file-creation telemetry that hooks open(2) / openat(2) family syscalls.
  • Folding it into Process Creation is also misleading: memfd_create() itself executes nothing. Execution happens later, when something execveat()s the resulting fd, and that exec already produces a normal Process Creation event.
  • Treating it as its own primitive makes the data source visible to the scoring matrix, surfaces vendor coverage gaps, and lets defenders compare like-for-like fileless-loader visibility across products.

Elastic Defend evidence

  • Elastic Defend version: 9.1.4+
  • Telemetry path: enabled by default; no advanced setting required.
  • Event source: event.action: memfd_create on process events.

Fields exposed:

  • process.executable, process.command_line (source process)
  • event.action (memfd_create)
  • process.Ext.memfd.name (the user-supplied memfd label)
  • process.Ext.memfd.flag.allow_seal
  • process.Ext.memfd.flag.doexec
  • process.Ext.memfd.flag.exec
  • process.Ext.memfd.flag.hugetlb
  • process.Ext.memfd.flag.noexec_seal
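The `process.Ext.memfd.flag.*` fields above relate to the memfd_create(2) flag bits. Below is a minimal decoder for the raw bitmask, using values from `<linux/memfd.h>`. How Elastic Defend derives each boolean (in particular `doexec`, which is not itself a syscall flag) is not documented in this PR, so the sketch reports kernel flag names only and any field-to-bit mapping is an assumption.

```python
# Flag bits from <linux/memfd.h>.
MEMFD_FLAG_BITS = {
    "MFD_CLOEXEC": 0x0001,
    "MFD_ALLOW_SEALING": 0x0002,
    "MFD_HUGETLB": 0x0004,
    "MFD_NOEXEC_SEAL": 0x0008,
    "MFD_EXEC": 0x0010,
}
KNOWN_MASK = sum(MEMFD_FLAG_BITS.values())  # bits are disjoint, so sum == OR


def decode_memfd_flags(flags):
    """Return the kernel flag names set in a raw memfd_create() flags value."""
    names = [name for name, bit in MEMFD_FLAG_BITS.items() if flags & bit]
    leftover = flags & ~KNOWN_MASK
    if leftover:
        # With MFD_HUGETLB, high bits can encode a huge-page size; this
        # sketch does not decode them and reports them verbatim.
        names.append(f"OTHER(0x{leftover:x})")
    return names
```

For example, `decode_memfd_flags(0x3)` returns `["MFD_CLOEXEC", "MFD_ALLOW_SEALING"]`.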

KQL query used

event.action: "memfd_create" and host.os.type: "linux"

Sanitised raw events (multiple invocations across three fileless loaders, perl, and python3):

| process.executable | event.action | process.command_line | process.Ext.memfd.name | …flag.allow_seal | …flag.doexec | …flag.exec | …flag.hugetlb | …flag.noexec_seal |
|---|---|---|---|---|---|---|---|---|
| /home/ruben_groenewoud/fileless_loader/loader3 | memfd_create | ./loader3 ./hello.ko | memfd:payload | false | true | false | false | false |
| /home/ruben_groenewoud/fileless_loader/loader2 | memfd_create | ./loader2 ./hello.ko | memfd:payload | false | true | false | false | false |
| /home/ruben_groenewoud/fileless_loader/loader1 | memfd_create | ./loader1 ./hello.ko | memfd:payload | false | true | false | false | false |
| /usr/bin/perl | memfd_create | perl | memfd: | false | true | false | false | false |
| /usr/bin/python3.12 | memfd_create | python3 | memfd: | false | true | false | false | false |
| /usr/bin/python3.12 | memfd_create | python3 output.py | memfd: | false | true | false | false | false |

A note on memfd-backed execution: the screenshot shows the syscall surface only. When the memfd-backed object is actually executed (e.g. execveat(fd, "", argv, envp, AT_EMPTY_PATH)), Elastic Defend emits a normal Process Creation event with the executable path resolved through /proc/<pid>/fd/<n>. That is already covered by the existing Process Creation sub-category, so this PR proposes a single sub-category (Memory-Backed File Creation) and intentionally does not add a separate Anonymous File Execution row, to keep the proposal tight. Happy to split it out if reviewers prefer that.

2. File Metadata → File Entropy, File Header Bytes, and File Size

These three fields are file enrichment, not distinct system actions, so they shouldn't be merged into File Creation / File Modification scoring (a vendor that emits a file event without entropy/header/size is still doing the system-action half of the work). They sit cleanly under their own File Metadata parent. The naming mirrors macOS, which already has a File Metadata parent category (MD5 / SHA-256 / Fuzzy Hash) introduced in #150. The existing Linux Hash Algorithms parent is intentionally left untouched in this PR; whether to migrate it under File Metadata later is a separate decision for the maintainer.

Why each field is worth its own row:

  • File Entropy: Shannon entropy is the cheapest packed/encrypted-payload heuristic available on the file event itself. Without it, defenders must either re-read the file (often impossible once it has been deleted) or wait for a downstream sandbox.
  • File Header Bytes: the leading bytes of the file are sufficient to reconstruct the magic/signature and detect MIME-vs-extension mismatches at ingest time, again without needing the file to still exist on disk.
  • File Size: basic, but one of the few fields that can tie a delivered payload back to a specific dropper without a hash. Its lower scoring weight reflects how widely it is already collected.
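The entropy heuristic can be made concrete with a short sketch. It computes Shannon entropy in bits per byte, so results fall in the same 0.0 to 8.0 range as `file.Ext.entropy`. Whether Elastic Defend computes it over the whole file or over a sample is not stated here, so treat this as an approximation.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    # H = -sum(p * log2(p)) over the observed byte-value frequencies.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())
```

Repeating ASCII such as `b"A" * 100` scores 0.0, while every byte value appearing equally often (`bytes(range(256))`) scores 8.0. Compressed or encrypted payloads land near the top of that range.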

Elastic Defend evidence

  • Elastic Defend version: 9.3+ for File Entropy / File Header Bytes / File Size.
  • Telemetry path:
    • File Entropy, File Header Bytes & File Size are opt-in via the integration policy advanced setting linux.advanced.events.populate_file_data: true (default false). This is exactly the situation the project's Via EnablingTelemetry (🎚️) status was designed for, so all three rows use that value.
  • Documentation tooltip from the Elastic Defend integration policy (see the screenshot below):

    "Enable collection of entropy and header bytes on file events. Default: false." — version badge 9.3+.


Fields exposed (on file events):

  • file.Ext.entropy (Shannon entropy 0.0–8.0) — opt-in
  • file.Ext.header_bytes (hex-encoded leading bytes; sufficient to reconstruct file magic / signature) — opt-in
  • file.size (size of the file in bytes) — opt-in
  • file.path, file.name, file.extension (already part of the standard file.* ECS fields)
  • event.action: creation, rename, etc.
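Because `file.Ext.header_bytes` is hex-encoded, magic detection and the extension-mismatch check mentioned above can run on the string directly. A minimal sketch; the prefix table and the expected-extension sets are illustrative assumptions, not exhaustive. Prefix matching also tolerates truncated or odd-length hex such as the XZ sample in the events below.

```python
from typing import Optional

# Hex-encoded magic prefixes for the fixture types used in this PR.
MAGIC_PREFIXES = {
    "7f454c46": "elf",      # \x7fELF
    "504b0304": "zip",      # PK\x03\x04
    "fd377a585a00": "xz",   # \xfd7zXZ\x00
    "1f8b": "gzip",
    "25504446": "pdf",      # %PDF
}

# Extensions treated as consistent with each type (assumed, illustrative).
EXPECTED_EXTENSIONS = {
    "elf": {"", "so", "ko", "o", "bin"},
    "zip": {"zip", "jar", "docx", "xlsx"},
    "xz": {"xz", "txz"},
    "gzip": {"gz", "tgz"},
    "pdf": {"pdf"},
}


def detect_type(header_hex: str) -> Optional[str]:
    """Match the start of a hex-encoded header against known magic prefixes."""
    normalized = header_hex.replace(" ", "").lower()
    for prefix, kind in MAGIC_PREFIXES.items():
        if normalized.startswith(prefix):
            return kind
    return None


def extension_mismatch(extension: Optional[str], header_hex: str) -> bool:
    """True when the magic is recognised but the extension is unexpected for it."""
    kind = detect_type(header_hex)
    if kind is None:
        return False  # unknown magic: no verdict either way
    return (extension or "").lower() not in EXPECTED_EXTENSIONS[kind]
```

An ELF header on a file named `invoice.pdf` is flagged, while the apt `pkgcache.bin` header (unknown magic) produces no verdict.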

KQL queries used

event.action: ("creation" or "rename") and host.os.type: "linux" and file.Ext.entropy: *
event.action: ("creation" or "rename") and host.os.type: "linux" and file.Ext.header_bytes: *
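The `file.Ext.entropy: *` clauses above only require the field to exist. Here is a sketch of an equivalent filter plus an entropy cut-off in Python. It runs over three rows copied from the sanitised events below and one hypothetical row without entropy. The 7.5 threshold is an assumed value, not an Elastic default, and the host filter is omitted because the rows already come from the Linux query.

```python
# Rows copied from the sanitised events; the last row is hypothetical and
# stands in for an event where the opt-in enrichment was not populated.
SAMPLE_EVENTS = [
    {"event.action": "creation", "file.path": "/var/log/apt/eipp.log.xz", "file.Ext.entropy": 7.812},
    {"event.action": "rename", "file.path": "/var/cache/apt/pkgcache.bin", "file.Ext.entropy": 2.095},
    {"event.action": "rename", "file.path": "/run/systemd/journal/streams/9:9209563", "file.Ext.entropy": 5.385},
    {"event.action": "rename", "file.path": "/tmp/hypothetical-no-entropy"},
]

HIGH_ENTROPY_THRESHOLD = 7.5  # assumed cut-off, not an Elastic default


def high_entropy_paths(events, threshold=HIGH_ENTROPY_THRESHOLD):
    """Return file.path for creation/rename events whose entropy meets the threshold."""
    hits = []
    for event in events:
        if event.get("event.action") not in ("creation", "rename"):
            continue
        entropy = event.get("file.Ext.entropy")
        if entropy is None:  # mirrors `file.Ext.entropy: *` (field must exist)
            continue
        if entropy >= threshold:
            hits.append(event["file.path"])
    return hits
```

The only hit is `eipp.log.xz`, a legitimately compressed apt log. This shows why entropy is usually paired with header bytes: the `fd377a585a00` XZ magic explains the high score.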

Sanitised raw events (creation + rename, mixed file types):

| process.executable | event.action | file.name | file.path | file.Ext.entropy | file.Ext.header_bytes | file.extension | file.size |
|---|---|---|---|---|---|---|---|
| /usr/bin/apt-get | creation | eipp.log.xz | /var/log/apt/eipp.log.xz | 7.812 | fd377a585a000000a4e6d6b446d2b021c01 | xz | 0 |
| /usr/bin/apt-get | rename | pkgcache.bin | /var/cache/apt/pkgcache.bin | 2.095 | dc76fe9810000000a8021c2c40385018 | bin | 64,108,026 |
| /usr/bin/apt-get | rename | srcpkgcache.bin | /var/cache/apt/srcpkgcache.bin | 2.05 | dc76fe9810000000a8021c2c40385018 | bin | 64,108,766 |
| /usr/lib/systemd/systemd-journald | rename | 9:9209563 | /run/systemd/journal/streams/9:9209563 | 5.385 | 232054660973206973207072697 6... | (null) | 208 |
| /usr/lib/systemd/systemd-journald | rename | 9:9211516 | /run/systemd/journal/streams/9:9211516 | 5.437 | 232054660973206973207072697 6... | (null) | 207 |
| /usr/bin/python3.12 | rename | pkgcache.bin | /var/lib/ubuntu-advantage/apt-esm/var/cache/apt/pkgcache.bin | 1.485 | dc76fe9810000000a8021c2c40385018 | bin | 2,377,928 |
| /usr/bin/python3.12 | rename | srcpkgcache.bin | /var/lib/ubuntu-advantage/apt-esm/var/cache/apt/srcpkgcache.bin | 1.409 | dc76fe9810000000a8021c2c40385018 | bin | 2,377,834 |

(See screenshot 3 attached to this PR. The file.size column in that screenshot is the same field this PR proposes to track as File Size — present on every row, including the rows where the entropy/header-bytes opt-in fields are populated and the rows where they are not.)


Files changed

  • EDR_telem_linux.json — adds four new rows. Existing vendor column ordering is preserved (no convert.py round-trip applied, so no diff churn).
  • partially_value_explanations_linux.json — adds four matching stub rows so the file shape stays in sync with the data file (len() parity is maintained: 34 rows in each).
  • Tools/compare.py — adds scoring weights to LINUX_CATEGORIES_VALUED:
    • "Memory-Backed File Creation": 1
    • "File Entropy": 0.5
    • "File Header Bytes": 0.5
    • "File Size": 0.2
    • Memory-Backed File Creation is weighted on par with other primary execution-primitive sub-categories such as Driver Load and Process Tampering. File Entropy and File Header Bytes get half-weight to reflect that they are opt-in enrichment fields rather than mandatory event data. File Size is intentionally lower at 0.2 (matching Agent Start) since it's basic enrichment most vendors will already collect. Happy to adjust any of these during review.
  • Tools/Telemetry-Generator/Linux/complex/memfd_create_exec.py — new generator.
  • Tools/Telemetry-Generator/Linux/complex/file_metadata.py — new generator.
  • Tools/Telemetry-Generator/Linux/lnx_telem_gen.py — registers MemfdCreate and FileMetadata as available event functions.
  • Tools/Telemetry-Generator/Linux/LINUX_TELEMETRY_GENERATOR_GUIDE.md — documents the two new event names.
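For reference, here are the four weights listed above as a Python mapping, plus the most a vendor could gain from the new rows if every row earned its full weight. How compare.py scores partial values such as Via EnablingTelemetry is not covered here, so the total is only an upper bound under that assumption.

```python
# The four entries this PR adds to LINUX_CATEGORIES_VALUED in Tools/compare.py.
NEW_LINUX_WEIGHTS = {
    "Memory-Backed File Creation": 1,
    "File Entropy": 0.5,
    "File Header Bytes": 0.5,
    "File Size": 0.2,
}

# Upper bound on the added score, assuming each row earns its full weight.
max_added = sum(NEW_LINUX_WEIGHTS.values())

if __name__ == "__main__":
    print(f"maximum added weight: {max_added:.1f}")
```

The total of 2.2 shows the scale of the change: the memfd row alone accounts for almost half of it, and the three opt-in enrichment rows together add 1.2.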

Files intentionally not modified

  • mitre_att&ck_mappings.json: this file currently only contains Windows sub-categories (e.g. Driver Loaded rather than the Linux Driver Load, Process Tampering Activity rather than Process Tampering) and was not updated as part of #150 (macOS EDR Categories and Sub Categories) either. Happy to extend it as part of a follow-up if maintainers want a Linux MITRE mapping pass.

Validation steps run locally

  • python -c "import json; json.load(open('EDR_telem_linux.json'))" — JSON parses cleanly, 34 rows.
  • partially_value_explanations_linux.json row count matches EDR_telem_linux.json row count.
  • python -m py_compile on Tools/compare.py, both new generators, and lnx_telem_gen.py — clean.
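The row-count parity check can be scripted. This minimal sketch assumes both JSON files are top-level lists of row objects, in the same order, keyed by a `Sub-Category` field. That schema is an assumption about the repository, and `check_parity` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_parity(data_path, explanations_path, key="Sub-Category"):
    """Return the row count if both files line up row by row; raise otherwise."""
    data = json.loads(Path(data_path).read_text(encoding="utf-8"))
    notes = json.loads(Path(explanations_path).read_text(encoding="utf-8"))
    if len(data) != len(notes):
        raise ValueError(f"row count mismatch: {len(data)} != {len(notes)}")
    # Same length is necessary but not sufficient: rows must also align.
    for i, (row, note) in enumerate(zip(data, notes)):
        if row.get(key) != note.get(key):
            raise ValueError(f"row {i}: {row.get(key)!r} != {note.get(key)!r}")
    return len(data)
```

Run against EDR_telem_linux.json and partially_value_explanations_linux.json, it should return 34 if the parity reported above holds under the assumed schema.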

Additional Notes

This proposal intentionally stays small (4 rows + scoring weights + generators). The bigger taxonomy questions — whether Hash Algorithms should be folded under File Metadata long-term, and whether memfd-backed execution deserves its own row separate from Process Creation — are flagged in the relevant sections above and can be settled during review without blocking the addition of the underlying telemetry.

Sibling PR: #198 ("[Update] EDR Telemetry for Linux - Elastic Defend") covers the scoring updates for the four already-existing categories Elastic Defend now satisfies (DNS Query, Driver Load, Process Access, Process Tampering). This PR is intentionally kept orthogonal to that one so the existing-category scoring update can be reviewed and merged independently of the taxonomy expansion.
