[Proposal] New Linux telemetry categories: Anonymous File Activity an…#199
Open
Aegrah wants to merge 1 commit into
EDR Telemetry Pull Request
Contribution Details
This PR proposes two new Linux telemetry categories with four sub-categories in total, plus the evidence and a generator for each. It is a sibling to #198: that PR scores Elastic Defend against the existing Linux taxonomy, while this PR proposes additions to the taxonomy itself, so the two changes can be reviewed cleanly and independently.
The new sub-categories are:
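| Telemetry Feature Category | Sub-Category | Elastic Defend | Notes |
| --- | --- | --- | --- |
| Anonymous File Activity | Memory-Backed File Creation | Yes | memfd_create(2) telemetry. |
| File Metadata | File Entropy | Via EnablingTelemetry | linux.advanced.events.populate_file_data: true (added 9.3+). |
| File Metadata | File Header Bytes | Via EnablingTelemetry | |
| File Metadata | File Size | Via EnablingTelemetry | |

All other vendors are marked Pending Response. Following the precedent from #130 (Win32 API Telemetry) and #150 (macOS EDR Categories and Sub Categories), other contributors can fill those in as part of the normal review.
Telemetry Validation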
Type of Contribution
Validation Details
EDR Product Information
memfd_create), 9.3.0+ (File Entropy / File Header Bytes / File Size)
Testing Methodology
Each new sub-category is exercised by a dedicated generator added under Tools/Telemetry-Generator/Linux/complex/:
- memfd_create_exec.py calls memfd_create(2) with a representative set of flag combinations (MFD_CLOEXEC, MFD_ALLOW_SEALING, MFD_NOEXEC_SEAL, MFD_EXEC) and stages an execution from /proc/self/fd/<fd> so that process.Ext.memfd.flag.* and the memfd-backed exec path can be observed.
- file_metadata.py writes a small set of fixtures with known magic-byte headers (random/no-header, repeating ASCII, ELF, ZIP, XZ, GZIP, PDF) and then modifies and renames them, so that EDRs which compute file.Ext.entropy and file.Ext.header_bytes on file events have deterministic input.
Both are wired into lnx_telem_gen.py as MemfdCreate and FileMetadata events.
Proposed taxonomy additions
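As a point of reference for the memfd proposal that follows, here is a minimal sketch of the syscall path described under Testing Methodology: create a memfd, fill it, and resolve its /proc/self/fd/<fd> path. It is not the code in memfd_create_exec.py; the helper names create_memfd and proc_path are hypothetical, only the two flags exposed by Python's os module are used, and the sketch deliberately stops short of the exec step.

```python
import os
from typing import Optional


def create_memfd(name: str, payload: bytes, flags: Optional[int] = None) -> int:
    """Create an anonymous memory-backed file and write payload into it.

    Hypothetical helper for illustration; requires Linux and Python 3.8+.
    """
    if flags is None:
        flags = os.MFD_CLOEXEC  # same default that os.memfd_create uses
    fd = os.memfd_create(name, flags)
    view = memoryview(payload)
    while view:  # os.write may write fewer bytes than requested
        written = os.write(fd, view)
        view = view[written:]
    return fd


def proc_path(fd: int) -> str:
    """Path through which this process can reopen (or exec) the memfd."""
    return f"/proc/self/fd/{fd}"


if __name__ == "__main__":
    fd = create_memfd(
        "demo-label",  # arbitrary label chosen for this sketch
        b"hello from memfd\n",
        os.MFD_CLOEXEC | os.MFD_ALLOW_SEALING,
    )
    # The symlink target carries a "memfd:" prefix followed by the label.
    print(proc_path(fd), "->", os.readlink(proc_path(fd)))
    with open(proc_path(fd), "rb") as handle:
        print(handle.read())
    os.close(fd)
```

An EDR that tracks this sub-category would be expected to report the memfd_create call itself, independent of whether anything is later executed from the descriptor.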
1. Anonymous File Activity → Memory-Backed File Creation
This sub-category covers the creation of anonymous, memory-backed files via memfd_create(2) (man page), the primary Linux primitive for creating a file-like handle that can be executed with fexecve(3), sealed, or shared without ever hitting the filesystem. It is the substrate behind most modern fileless loaders on Linux (Bash bashfd loaders, ELF reflective loaders, several public LKM loaders, etc.).
Why it deserves its own sub-category rather than being folded into an existing one:
- File Creation is misleading: the resulting file handle is anonymous, never traverses a filesystem, and is invisible to file-creation telemetry that hooks the open(2)/openat(2) family of syscalls.
- Process Creation is also misleading: memfd_create() itself does not execute anything. Execution happens later, when something calls execveat() on the resulting fd, and at that point a normal Process Creation event is already emitted.
Elastic Defend evidence
event.action: memfd_create on process events.
Fields exposed:
- process.executable, process.command_line (source process)
- event.action (memfd_create)
- process.Ext.memfd.name (the user-supplied memfd label)
- process.Ext.memfd.flag.allow_seal
- process.Ext.memfd.flag.doexec
- process.Ext.memfd.flag.exec
- process.Ext.memfd.flag.hugetlb
- process.Ext.memfd.flag.noexec_seal
KQL query used
Sanitised raw event (multiple invocations across fileless loaders, perl and python3):
A note on memfd-backed execution: the screenshot shows the syscall surface only. When the memfd-backed object is actually executed (e.g. execveat(fd, "", argv, envp, AT_EMPTY_PATH)), Elastic Defend emits a normal Process Creation event with the executable path resolved through /proc/<pid>/fd/<n>. That is already covered by the existing Process Creation sub-category, so this PR proposes a single sub-category (Memory-Backed File Creation) and intentionally does not add a separate Anonymous File Execution row, to keep the proposal tight. Happy to split it out if reviewers prefer that.
2. File Metadata → File Entropy, File Header Bytes, and File Size
These three fields are file enrichment, not distinct system actions, so they shouldn't be merged into File Creation / File Modification scoring (a vendor that emits a file event without entropy/header/size is still doing the system-action half of the work). They sit cleanly under their own File Metadata parent. The naming mirrors macOS, which already has a File Metadata parent category (MD5 / SHA-256 / Fuzzy Hash) introduced in #150. The existing Linux Hash Algorithms parent is intentionally left untouched in this PR; whether to migrate it under File Metadata later is a separate decision for the maintainer.
Why each field is worth its own row:
Elastic Defend evidence
File Entropy / File Header Bytes / File Size.
File Entropy, File Header Bytes and File Size are opt-in via the integration policy advanced setting linux.advanced.events.populate_file_data: true (default false). This is exactly the situation the project's Via EnablingTelemetry (🎚️) status was designed for, so all three rows use that value.
Fields exposed (on file events):
- file.Ext.entropy (Shannon entropy 0.0–8.0), opt-in
- file.Ext.header_bytes (hex-encoded leading bytes; sufficient to reconstruct file magic / signature), opt-in
- file.size (size of the file in bytes), opt-in
- file.path, file.name, file.extension (already part of the standard file.* ECS fields)
- event.action: creation, rename, etc.
KQL queries used
Sanitised raw events (creation + rename, mixed file types):
(See screenshot 3 attached to this PR. The file.size column in that screenshot is the same field this PR proposes to track as File Size; it is present on every row, including the rows where the entropy/header-bytes opt-in fields are populated and the rows where they are not.)
Files changed
- EDR_telem_linux.json: adds four new rows. Existing vendor column ordering is preserved (no convert.py round-trip applied, so no diff churn).
- partially_value_explanations_linux.json: adds four matching stub rows so the file shape stays in sync with the data file (len() parity is maintained: 34 rows in each).
- Tools/compare.py: adds scoring weights to LINUX_CATEGORIES_VALUED:
  "Memory-Backed File Creation": 1
  "File Entropy": 0.5
  "File Header Bytes": 0.5
  "File Size": 0.2
  Memory-Backed File Creation is weighted on par with other primary execution-primitive sub-categories such as Driver Load and Process Tampering. File Entropy and File Header Bytes get half-weight to reflect that they are opt-in enrichment fields rather than mandatory event data. File Size is intentionally lower at 0.2 (matching Agent Start) since it's basic enrichment most vendors will already collect. Happy to adjust any of these during review.
- Tools/Telemetry-Generator/Linux/complex/memfd_create_exec.py: new generator.
- Tools/Telemetry-Generator/Linux/complex/file_metadata.py: new generator.
- Tools/Telemetry-Generator/Linux/lnx_telem_gen.py: registers MemfdCreate and FileMetadata as available event functions.
- Tools/Telemetry-Generator/Linux/LINUX_TELEMETRY_GENERATOR_GUIDE.md: documents the two new event names.
Files intentionally not modified
- mitre_att&ck_mappings.json: this file currently only contains Windows sub-categories (e.g. Driver Loaded rather than the Linux Driver Load, Process Tampering Activity rather than Process Tampering) and was not updated as part of macOS EDR Categories and Sub Categories #150 either. Happy to extend it as part of a follow-up if maintainers want a Linux MITRE mapping pass.
Validation steps run locally
- python -c "import json; json.load(open('EDR_telem_linux.json'))": JSON parses cleanly, 34 rows.
- partially_value_explanations_linux.json row count matches EDR_telem_linux.json row count.
- python -m py_compile on Tools/compare.py, both new generators, and lnx_telem_gen.py: clean.
Additional Notes
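For reviewers who want to repeat the row-count parity check from the validation steps, a minimal sketch follows. It assumes both JSON files are top-level arrays with one object per row, which is what the len() parity wording suggests but is not confirmed here; count_rows and files_in_sync are hypothetical helper names, not part of the repository.

```python
import json


def count_rows(json_text: str) -> int:
    """Return the number of rows in a JSON document.

    Assumes the document is a top-level array (one element per row).
    """
    rows = json.loads(json_text)
    if not isinstance(rows, list):
        raise ValueError("expected a top-level JSON array")
    return len(rows)


def files_in_sync(data_text: str, explanations_text: str) -> bool:
    """True when the data file and the explanations file have equal row counts."""
    return count_rows(data_text) == count_rows(explanations_text)


if __name__ == "__main__":
    # File names are taken from the PR description; run from the repo root.
    with open("EDR_telem_linux.json", encoding="utf-8") as fh:
        data_text = fh.read()
    with open("partially_value_explanations_linux.json", encoding="utf-8") as fh:
        explanations_text = fh.read()
    print("rows:", count_rows(data_text))
    print("in sync:", files_in_sync(data_text, explanations_text))
```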
This proposal intentionally stays small (4 rows + scoring weights + generators). The bigger taxonomy questions (whether Hash Algorithms should be folded under File Metadata long-term, and whether memfd-backed execution deserves its own row separate from Process Creation) are flagged in the relevant sections above and can be settled during review without blocking the addition of the underlying telemetry.
Sibling PR: #198 ("[Update] EDR Telemetry for Linux - Elastic Defend") covers the scoring updates for the four already-existing categories Elastic Defend now satisfies (DNS Query, Driver Load, Process Access, Process Tampering). This PR is intentionally kept orthogonal to that one so the existing-category scoring update can be reviewed and merged independently of the taxonomy expansion.
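As a footnote to the File Metadata proposal, the two opt-in enrichment values can be illustrated with a short sketch: Shannon entropy over byte frequencies (0.0 to 8.0 bits per byte) and a hex encoding of the leading bytes. This is an illustration of the general technique only, not Elastic Defend's implementation; the function names and the 16-byte header length are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def header_bytes_hex(data: bytes, length: int = 16) -> str:
    """Hex-encode the first `length` bytes (length is an assumed default)."""
    return data[:length].hex()


if __name__ == "__main__":
    samples = {
        "repeating ASCII": b"A" * 1024,
        "ELF magic": b"\x7fELF" + bytes(60),
        "ZIP magic": b"PK\x03\x04" + bytes(60),
        "all byte values": bytes(range(256)),
    }
    for label, blob in samples.items():
        print(label, round(shannon_entropy(blob), 3), header_bytes_hex(blob, 4))
```

Fixtures with fixed headers and predictable entropy, like the ones file_metadata.py is described as writing, make it straightforward to compare a vendor's reported values against values computed this way.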