[XM Cyber][Entity Inventory] Add Entity Inventory data stream#19550
muskan-agarwal26 wants to merge 8 commits into
…integrations into datastream-entity_inventory
| ).do_request().as(refResp, | ||
| refResp.Body.decode_json().as(ro, | ||
| { | ||
| "events": [{"message": "retry"}], |
🟠 HIGH data_stream/entity_inventory/.../cel.yml.hbs:114
Retry-path event becomes phantom pipeline_error doc
On a 401/419 the CEL program emits {"events": [{"message": "retry"}], ...}. The ingest pipeline has no drop for a message of 'retry' — only for 'Refresh token successful' and 'Refresh token expired, forcing re-auth', which the CEL never produces. So rename_message_to_event_original moves retry into event.original, then json_event_original_into_xm_cyber_entity_inventory tries to parse the bare string retry as JSON, fails, and the pipeline-level on_failure block sets event.kind: pipeline_error and tags the doc preserve_original_event. Every token-refresh cycle therefore lands one phantom error document in logs-xm_cyber.entity_inventory-*.
Recommendation:
Either stop emitting an event on the retry path (mutate cursor only and rely on want_more) or, if a marker event is wanted, drop it in the pipeline. The CEL-only fix:
: (resp.StatusCode == 401 || resp.StatusCode == 419) ?
    post_request(
        base + "/api/refresh-token",
        "application/json",
        {"refreshToken": tok.refresh}.encode_json()
    ).with(
        {"Header": {"Content-Type": ["application/json"]}}
    ).do_request().as(refResp,
        refResp.Body.decode_json().as(ro,
            {
                "events": [],
                "cursor": (refResp.StatusCode == 200) ?
                    {
                        "access_token": ro.accessToken,
                        "refresh_token": ro.refreshToken,
                        "need_reauth": false,
                    }
                :
                    {
                        "access_token": "",
                        "refresh_token": "",
                        "need_reauth": true,
                    },
                "want_more": true,
            }
        )
    )
🤖 AI-Generated Review | Vera Review Bot
⚠️ Automated review — verify suggestions before applying.
This review is only half correct; the incorrect half is the assumption that emitting a cursor alongside an empty events array will have any effect.
The correct fix is either to update the drop processor to match the retry message, or to add a Filebeat drop to the agent template and remove the drop processor.
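For the second option, the Filebeat-side drop might look like the fragment below. This is a minimal sketch, not the package's actual template: it assumes the retry marker reaches the Beats processor chain as a top-level `message` field with the value `retry`, and where the block sits in `cel.yml.hbs` is left open.

```yaml
# Sketch only: drop the CEL retry marker before it is shipped.
# Assumes the marker event carries message == "retry" at this point.
processors:
  - drop_event:
      when:
        equals:
          message: retry
```

With a drop like this in the agent template, the ingest pipeline never sees the marker, so the existing drop processors can be removed rather than retargeted.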
| - drop: | ||
| description: Drops CEL informational token-refresh success messages from the ingest stream. | ||
| tag: drop_cel_refresh_token_success | ||
| if: ctx.message instanceof String && ctx.message == 'Refresh token successful' |
🟡 MEDIUM data_stream/entity_inventory/.../default.yml:17
Dead drop processors for refresh-token messages
Both drop_cel_refresh_token_success (message == 'Refresh token successful') and drop_cel_refresh_token_force_reauth (message == 'Refresh token expired, forcing re-auth') match strings the CEL program never produces. The CEL emits 'retry' on the token-refresh path; nothing in this package emits the two literal phrases the drops are guarding against. They are inert filters.
Recommendation:
Remove the stale drops, or replace them with one that matches what this CEL actually emits if the retry event is kept:
- drop:
    description: Drops CEL token-refresh retry signal events.
    tag: drop_cel_refresh_token_retry
    if: ctx.message instanceof String && ctx.message == 'retry'
🤖 AI-Generated Review | Vera Review Bot
⚠️ Automated review — verify suggestions before applying.
| @@ -0,0 +1,3 @@ | |||
| {"id":"11405078888731052442","accessKeyCreationDate":"Unknown","podIP":"","ec2PublicIpAddress":"","agentVersion":{"major":1,"minor":55,"patch":2},"agentVersionStr":"1.55.2","arch":"Amd64","cmId":"0000","connectionCounter":160,"customProperties":{"snifferStatus":"Active","snifferStatusChangeable":true,"domainWorkgroup":{"type":"workgroup","data":"workgroup"},"ouComputer":"workgroup","subnetInfo":"172.0.0.0/24","macAddresses":["00:50:56:3D:0A:93"],"ouUser":"workgroup","labels":[{"label":"spooler"},{"label":"device_without_edr"}],"snifferStatusConfiguration":"ForcedEnabled","custom_labels":[{"label":"testLabel"},{"label":" sn : azure Identity : sn "},{"label":"Azure Identity"},{"label":"2test_vmazure virtual machine_azure_test"},{"label":"1azureCRazure Container registry_test"}],"hardwareInfo":{"totalRamMb":"2047","cpuProcessorType":"Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz","cpuCoreCount":1,"cpuCount":1,"cpuManufacturer":"GenuineIntel","cpuSpeedMhz":2095,"systemManufacturer":"VMware, Inc.","systemModel":"VMware Virtual 
Platform"}},"customerId":"fda93183-19f4-447d-bd49-83633329ee37","disabled":false,"disabledChangedAt":"2025-12-01T05:31:01.103Z","disabledReason":"revivedByCmNodeMgr","firstSeen":"2024-08-07T12:18:26.093Z","hasUpdateAvailable":false,"installationId":"00000000-0000-0000-0000-000000000001","ipv4":[{"data":[192,168,1,203],"type":"Buffer"}],"ipv4Num":[2885681155],"ipv4Str":["192.0.2.0"],"ipv6":[{"data":[253,170,63,62,245,208,0,1,168,44,1,25,76,65,80,190],"type":"Buffer"}],"ipv6Str":["fe80::1c2c:5b3a:97df:13f1"],"lastConnectionTime":"2026-05-03T08:40:06.399Z","lastDisconnectionReason":"Keepalive","lastRebootTime":"2025-05-15T09:33:32.000Z","lastStatusChange":"2026-05-03T08:40:06.399Z","latestPossibleAgentVersion":{"major":1,"minor":55,"patch":2},"latestPossibleAgentVersionStr":"1.55.2","name":"172-0-0-3","nameUppercase":"172-0-0-3","notIncludedInAttacks":false,"os":{"version":{"build":0,"major":10,"minor":0,"patch":18363},"servicePack":{"build":0,"major":0,"minor":0,"patch":0},"distributionName":"","distributionVersion":"","name":"Windows 10 ver 1909"},"osType":"Windows","productType":"Workstation","remoteAddress":"199.203.99.104","securityFlags":["hasSession","hasCachedCredentials"],"status":"active","timeToReviveAt":"2026-05-08T00:00:00Z","type":"agent","typeDisplayName":"Device","hasMatchingSID":false,"lastUpdatedAt":"2026-05-03T09:06:44.280Z","securityFlagsForDisplay":[{"key":"examplekey"}],"southOwner":"south-owner-1","domainName":"workgroup","labels":[{"id":"testLabel","type":"custom"},{"id":" sn : azure Identity : sn ","type":"custom"},{"id":"Azure Identity","type":"custom"},{"id":"2test_vmazure virtual machine_azure_test","type":"custom"},{"id":"1azureCRazure Container registry_test","type":"custom"},{"id":"!@$TEST:2))","type":"custom"},{"id":" sn : Access Token : sn ","type":"custom"},{"id":"Email Service","type":"custom"},{"id":"shirel-device","type":"custom"},{"id":"felix 
test","type":"custom"}],"machineId":"ca722442-9a91-849d-0fdd-438e7a0701f1","agentType":"Service","category":"enterprise","xmLabels":[{"id":"Spooler server"},{"id":"Public IP"},{"id":"Device without EDR"}],"importedLabels":["SN Name : 172-0-0-3","SN Created : 2026-04-14 05:15:00"],"entityDetails":{"name":"172-0-0-3","id":"11405078888731052442","isAsset":true,"subType":"windows","subTypeDisplayName":"Device"},"accountId":"702947630755","arn":"arn:aws:ssm:us-east-2:702947630755:parameter/EC2Rescue/Passwords/i-0d056ac1b7c822c92","displayName":"/EC2Rescue/Passwords/i-0d056ac1b7c822c92","entityType":"agent","region":"us-east-2","ruleDisplayName":"702947630755 / /EC2Rescue/Passwords/i-0d056ac1b7c822c92","ssmParameterDataType":"text","ssmParameterDescription":"New local Administrator password for instance i-0d056ac1b7c822c92","ssmParameterKeyId":"alias/aws/ssm","ssmParameterLastModifiedDate":"2021-07-28T08:11:54.200Z","ssmParameterLastModifiedUser":"arn:aws:sts::702947630755:assumed-role/AmazonSSMRoleForInstancesQuickSetup/i-0d056ac1b7c822c92","ssmParameterName":"/EC2Rescue/Passwords/i-0d056ac1b7c822c92","ssmParameterTier":"Standard","ssmParameterType":"SecureString","ssmParameterVersion":1,"useType":"Storage","xmProviderAccount":"xm-test3","xmUpdateTime":"2026-05-05T21:05:15.079Z","accountName":"xm-test3","organizationId":"o-wvjziar78j","awsTags":[{"Key":"aws:cloudformation:stack-id","Value":"arn:aws:cloudformation:us-east-1:908522078858:stack/StackSet-crowdstrike-SensorManagement-9fb10f6b-9dc3-4c3c-a078-dcec6bde4487/3493fc10-2bf9-11f0-a92a-0affd5d0d7df"},{"Key":"aws:cloudformation:stack-name","Value":"StackSet-crowdstrike-SensorManagement-9fb10f6b-9dc3-4c3c-a078-dcec6bde4487"},{"Key":"aws:cloudformation:logical-id","Value":"CrowdStrikeSensorManagementFalconCredentialsSecret"}],"secretKmsKeyId":"alias/tenant-secret-kms-local","secretDescription":"Falcon API credentials used by the 1-Click sensor management orchestrator.","tagsStr":["aws:cloudformation:stack-id: 
arn:aws:cloudformation:us-east-1:908522078858:stack/StackSet-crowdstrike-SensorManagement-9fb10f6b-9dc3-4c3c-a078-dcec6bde4487/3493fc10-2bf9-11f0-a92a-0affd5d0d7df","aws:cloudformation:stack-name: StackSet-crowdstrike-SensorManagement-9fb10f6b-9dc3-4c3c-a078-dcec6bde4487","aws:cloudformation:logical-id: CrowdStrikeSensorManagementFalconCredentialsSecret"],"kmsKeyAliases":["alias/aws/secretsmanager","alias/example"],"kmsKeyCreationDate":"2024-12-05T15:14:24.368Z","kmsKeyDescription":"","kmsKeyManager":"CUSTOMER","kmsKeyOrigin":"AWS_KMS","kmsKeyState":"Enabled","kmsKeyUsage":"ENCRYPT_DECRYPT"} | |||
🟡 MEDIUM data_stream/entity_inventory/.../test-entity-inventory.log:1
Real-looking customer data in pipeline test fixture
The first event contains values that are not synthetic per the anonymize-logs conventions: real-shaped AWS account IDs (702947630755, 908522078858), a real-looking public IPv4 (remoteAddress: 199.203.99.104), a real-looking principal email inside an SSM parameter assumed-role ARN (...zur@xmcyber.com), and a real-looking CloudFormation stack ID (StackSet-crowdstrike-SensorManagement-9fb10f6b-9dc3-4c3c-a078-dcec6bde4487/3493fc10-2bf9-11f0-a92a-0affd5d0d7df). Pipeline test fixtures are committed and visible in the public repo.
Recommendation:
Replace with placeholders the anonymize-logs skill specifies — RFC 5737 IPs, example.com domains, synthetic UUIDs, and 123456789012-style AWS IDs:
{"id":"11405078888731052442","remoteAddress":"203.0.113.50","accountId":"123456789012","arn":"arn:aws:ssm:us-east-2:123456789012:parameter/EC2Rescue/Passwords/i-0d056ac1b7c822c92","ssmParameterLastModifiedUser":"arn:aws:sts::123456789012:assumed-role/AmazonSSMRoleForInstancesQuickSetup/i-0d056ac1b7c822c92","ruleDisplayName":"123456789012 / /EC2Rescue/Passwords/i-0d056ac1b7c822c92"}
🤖 AI-Generated Review | Vera Review Bot
⚠️ Automated review — verify suggestions before applying.
| } | ||
| `}} | ||
|
|
||
| # Page 2 — fetched via nextLink cursor=page2 (more specific match, must precede page 1) |
I'm not a fan of using rule order to restrict returned pages; it's brittle and not specified anywhere. We can use the query_params in this case AFAICS; if the cursor param is set to null, we can enforce an absent cursor. Then we can arbitrarily order the rules and so make them reader-friendly, in expectation order.
| - append: | ||
| tag: append_preserve_on_collector_error | ||
| field: tags | ||
| value: preserve_original_event | ||
| allow_duplicates: false | ||
| if: ctx.error?.message != null |
Also add a
- set:
    tag: set_pipeline_error_to_event_kind
    field: event.kind
    value: pipeline_error
    if: ctx.error?.message != null
| "by": "_count", | ||
| "direction": "desc" | ||
| }, | ||
| "title": "Included in Attacks" |
This is the opposite of line 33. Either rename it here to reflect the actual semantics (small change) or, preferably, keep the name and either invert the field's semantics and name or add a field holding the negation.
chemamartinez
left a comment
It would be great to map as many entity fields as possible to ECS entity fields (https://www.elastic.co/docs/reference/ecs/ecs-entity).
Based on sample logs from pipeline and system tests, the fields that should be mapped are:
{host|user}.entity.lifecycle.last_activity (depending on the entity type, device or user)
user.entity.attributes.mfa_enabled
Regarding relationship fields, I don't see any data that we could map, but I'd check whether relationship data can be obtained from the API with any extra parameters or endpoints.
Also, if the data stream collects entity inventory, it is important to set event.kind = asset for further entity workflows. Currently its value is state which makes less sense in my opinion (https://www.elastic.co/docs/reference/ecs/ecs-allowed-values-event-kind).
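As a sketch, that change could be a single `set` processor in the ingest pipeline, written in the style of the processors quoted elsewhere in this review. The tag name is hypothetical, and this assumes the pipeline currently sets `event.kind` to `state` with a similar processor that this one would replace.

```yaml
# Sketch only: mark entity inventory documents as assets.
# The tag name is illustrative, not taken from the PR.
- set:
    tag: set_event_kind_to_asset
    field: event.kind
    value: asset
```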
Proposed commit message
Checklist
changelog.yml file.
How to test this PR locally
To test the XM Cyber package:
Related issues
Screenshots
Implementation Details
Default Config Values:
interval: 24h
page_size: 1000