diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..a10eb35
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,259 @@
+# AGENTS.md
+
+## Purpose
+
+This file captures the project-specific behavior and working rules for humans and coding agents in this repository.
+It should track the code in `main.py`, not stale assumptions from earlier iterations.
+
+## Project scope
+
+- This is an OpenCTI external-import connector for Double Extortion Platform (DEP) announcements.
+- The connector authenticates against DEP AWS Cognito, fetches announcement records from the DEP REST API, converts them to STIX 2.1, and sends bundles to OpenCTI with `update=True`.
+- The connector scope is `incident,identity,indicator`.
+- The implementation is concentrated in a single runtime file: `main.py`.
+
+## Runtime and configuration truths
+
+- Config is loaded from `OPENCTI_CONFIG_FILE` if set; otherwise from `config.yml` next to `main.py`.
+- Environment variables override YAML values through `pycti.get_config_variable`.
+- `DEP_CLIENT_ID` is required at startup even though `config.yml.sample` leaves it blank. Missing it raises `ValueError`.
+- The runtime loop is infinite: `run()` executes one cycle, then sleeps for `CONNECTOR_RUN_INTERVAL`.
+- Local Docker Compose mounts `./config.yml` into `/app/config.yml` for the `dep-connector` service.
+- The local stack pins OpenCTI services to `6.8.13`; the connector manifest declares support for OpenCTI `>= 6.8.13`.
+- The container image runs `python main.py` as the non-root `app` user on Python 3.12.
+
+## DEP fetch behavior
+
+- Authentication uses AWS Cognito `InitiateAuth` with `USER_PASSWORD_AUTH`.
+- The connector expects `AuthenticationResult.IdToken` from the login response and uses it as the DEP API `Authorization` header.
+- DEP fetches always send:
+  - `ts`
+  - `te`
+  - `dset`
+  - `full=true`
+- `extended=true` is sent only when `DEP_EXTENDED_RESULTS=true`.
+- `DEP_DSET` defaults to `ext`, so the connector can query alternate DEP datasets when required.
+
+## State management
+
+- The connector stores only one state key in OpenCTI worker state: `last_run`.
+- First run window: `now - DEP_LOOKBACK_DAYS`.
+- Subsequent run window: `last_run - DEP_OVERLAP_HOURS`.
+- Invalid or non-string `last_run` values are ignored with a warning.
+- State is persisted only after the processing loop finishes: `{"last_run": end.isoformat()}`.
+- The overlap window is intentional and should be preserved to catch late DEP updates.
+
+## Input parsing and normalization
+
+- DEP records are parsed through a frozen Pydantic dataclass: `LeakRecord(extra="allow")`.
+- Unknown DEP fields are tolerated and ignored unless explicitly mapped.
+- `annLink` is repaired for a known scrape bug:
+  - `https//...` -> `https://...`
+  - `http//...` -> `http://...`
+- `site` and `victimDomain` are stripped; empty strings become `None`.
+- `sector`, `actor`, and `country` are whitespace-normalized; empty strings, `n/a`, and `none` become `None`.
+- Indicator domain extraction prefers `victimDomain`, then falls back to `site`.
+- Domain normalization uses `urlsplit`, extracts the hostname, and lowercases it.
+- `annDescription` is URL-decoded with `urllib.parse.unquote` before the incident is created.
+
+## Filtering rules
+
+- Whole DEP items are skipped only when `DEP_SKIP_EMPTY_VICTIM=true` and `victim` is empty, `n/a`, or `none`.
+- Invalid DEP payload entries are skipped with warnings; they should not abort the whole fetch cycle.
+- Low-quality actor values are filtered from intrusion-set creation:
+  - `unknown`
+  - `unk`
+  - `anonymous`
+  - `unattributed`
+  - `undisclosed`
+  - `not disclosed`
+  - `not-disclosed`
+  - `ransomware group`
+  - `ransomware gang`
+  - `threat actor`
+  - `attacker`
+
+## STIX authoring conventions
+
+- Every emitted object and relationship is authored by the same identity:
+  - name: `DigIntLab`
+  - type: `Identity`
+  - identity_class: `organization`
+  - contact: `https://doubleextortion.com/`
+- Every emitted object and relationship carries the label `DigIntLab`.
+- Confidence is consistently taken from `DEP_CONFIDENCE`.
+- Bundles are deduplicated by STIX ID before sending to OpenCTI.
+- Prefer deterministic IDs for DEP-derived entities and relationships to keep re-imports idempotent.
+
+## Data model mappings
+
+### Incident
+
+- One incident is created per DEP announcement.
+- The incident is always created, even when no victim identity is created.
+- Deterministic incident ID is based on normalized DEP `hashid`:
+  - `incident--uuid5(NAMESPACE_URL, "dep-announcement:")`
+- Incident name format:
+  - `DEP announcement - `
+  - fallback to `victimDomain`
+  - fallback to `Unknown Victim`
+- `created` is derived from the DEP `date` at `00:00:00Z`.
+- Incident custom properties:
+  - `incident_type: cybercrime`
+  - `first_seen`
+  - `dep_actor` when present
+  - `dep_country` when present
+- Incident labels always include `DigIntLab`, plus one label per announcement type:
+  - `dep:announcement-type:`
+- Incident external reference prefers `annLink`; if absent, it falls back to `site`.
+- `annTitle` is attached as the external reference description when present.
+
+### Victim
+
+- Victim is modeled as `Identity` with `identity_class="organization"`.
+- No victim identity is created when `victim` is missing.
+- Deterministic victim ID uses `pycti.Identity.generate_id(victim_name, identity_class="organization")`.
+- Victim external references may include:
+  - DEP announcement URL with source `dep`
+  - victim site URL with source `victim-site`
+- Victim description is only used for fallback enrichment:
+  - `Industry sector: ` when a sector exists but sector identities are disabled
+  - `Reported revenue: ` when revenue is present
+
+### Sector
+
+- Sector is modeled as `Identity` with `identity_class="class"`.
+- Sector is created only when:
+  - `DEP_CREATE_SECTOR_IDENTITIES=true`
+  - sector is present
+  - victim identity exists
+- Sector IDs are deterministic and based on the lowercased sector value.
+
+### Actor
+
+- DEP `actor` is modeled as `IntrusionSet`, not `ThreatActor`.
+- Rationale: DEP actor values are usually operational labels, not strong real-world identity claims.
+- Intrusion sets are created only when:
+  - `DEP_CREATE_INTRUSION_SETS=true`
+  - actor is present
+  - actor is not in the low-quality filter list
+- Deterministic intrusion-set ID:
+  - `intrusion-set--uuid5(NAMESPACE_URL, "dep-actor:")`
+
+### Country
+
+- Country is modeled as `Location`.
+- Country locations are created only when:
+  - `DEP_CREATE_COUNTRY_LOCATIONS=true`
+  - country is present
+  - victim identity exists
+- Deterministic country location ID:
+  - `location--uuid5(NAMESPACE_URL, "dep-country:")`
+- Always set both:
+  - `name=`
+  - `country=`
+- Preserve the OpenCTI-specific custom property:
+  - `x_opencti_location_type: Country`
+
+### Indicators
+
+- Indicator creation is optional and controlled by:
+  - `DEP_ENABLE_SITE_INDICATOR`
+  - `DEP_ENABLE_HASH_INDICATOR`
+- Site/domain indicator:
+  - created from normalized `victimDomain` or `site`
+  - pattern: `[domain-name:value = '']`
+- Hash indicator:
+  - created from normalized `hashid`
+  - supported hash lengths:
+    - `32` -> `MD5`
+    - `40` -> `SHA-1`
+    - `64` -> `SHA-256`
+  - pattern: `[file:hashes.'' = '']`
+- Indicator IDs are deterministic because they are generated from the STIX pattern.
+- Indicator `valid_from` uses current UTC processing time, so timestamps are not deterministic even though IDs are.
+- Indicators are linked to incidents, not to victims.
+
+## Relationships emitted
+
+- `incident -> victim` with `targets`
+- `victim -> sector` with `part-of`
+- `incident -> intrusion-set` with `attributed-to`
+- `victim -> country` with `located-at`
+- `intrusion-set -> sector` with `targets`
+- `intrusion-set -> country` with `targets`
+- `sector -> country` with `related-to`
+- `indicator -> incident` with `indicates`
+
+These links are created automatically when both related objects exist. There are no extra compatibility flags for the cross-entity links.
+
+## Feature flags and important knobs
+
+- Boolean feature flags:
+  - `DEP_EXTENDED_RESULTS`
+  - `DEP_ENABLE_SITE_INDICATOR`
+  - `DEP_ENABLE_HASH_INDICATOR`
+  - `DEP_SKIP_EMPTY_VICTIM`
+  - `DEP_CREATE_SECTOR_IDENTITIES`
+  - `DEP_CREATE_INTRUSION_SETS`
+  - `DEP_CREATE_COUNTRY_LOCATIONS`
+- Important non-boolean knobs:
+  - `DEP_DSET`
+  - `DEP_LOOKBACK_DAYS`
+  - `DEP_OVERLAP_HOURS`
+  - `DEP_CONFIDENCE`
+  - `DEP_LOGIN_ENDPOINT`
+  - `DEP_API_ENDPOINT`
+  - `DEP_API_KEY`
+  - `DEP_USERNAME`
+  - `DEP_PASSWORD`
+  - `DEP_CLIENT_ID`
+
+## Coding conventions for this repo
+
+- Keep IDs deterministic for DEP-derived entities.
+- Preserve the current object model unless the user explicitly asks for a schema change.
+- Prefer normalization helpers and central filters over ad-hoc string cleanup.
+- Keep optional enrichment behind the existing feature flags.
+- Do not reintroduce removed compatibility flags for cross-entity relationships.
+- If you change modeling, update `README.md`, `config.yml.sample`, and `AGENTS.md` together.
+- If you touch incident or indicator generation, verify idempotency assumptions still hold under `update=True`.
+
+## Validation and local workflow
+
+- When developing or changing code, testing is required before considering the work complete.
+- In this repository, Docker is required for meaningful runtime validation.
+- Install dependencies:
+  - `task install`
+- Format code:
+  - `task format`
+- Check formatting only:
+  - `task format-check`
+- Run lint:
+  - `task lint`
+- Run type checks:
+  - `task type-check`
+- Main quality gate:
+  - `task check`
+- Additional syntax check:
+  - `python -m compileall main.py`
+- Docker-based runtime validation can be satisfied by either:
+  - building and running the connector image directly
+  - using `docker compose up` with the local stack when broader integration checks are needed
+- Never start the connector before the OpenCTI API/platform is ready and reachable.
+- During Docker-based validation, wait for OpenCTI readiness first, then start the connector.
+
+`task check` is the canonical combined gate from `Taskfile.yml` because it runs format check, lint, and mypy.
+
+There is a `task test` target, but there is currently no first-party test suite in this repository. Do not assume automated test coverage exists.
+For code changes, do not stop at static checks alone; perform Docker-based runtime validation as well.
+
+## File map
+
+- Connector runtime and STIX mapping: `main.py`
+- Sample connector config: `config.yml.sample`
+- Local development stack: `docker-compose.yml`
+- Runtime image definition: `Dockerfile`
+- User-facing docs: `README.md`
+- Marketplace metadata: `__metadata__/connector_manifest.json`
+- Task automation: `Taskfile.yml`
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..f02324a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+see @AGENTS.md
diff --git a/README.md b/README.md
index 765f73e..e099908 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ The Double Extortion connector ingests ransomware and data-leak announcements pu
 - Creates **Organization** identities for victims.
 - Optionally materializes **Intrusion Sets** from DEP actor names.
 - Optionally materializes **Country** locations and links victims to them.
-- Automatically links intrusion sets to sectors and sectors to countries when those entities are created.
+- Automatically links intrusion sets to sectors, intrusion sets to countries, and sectors to countries when those entities are created.
 - Generates optional **Indicators** for advertised victim domains and leak hash identifiers.
 - Adds announcement-type labels to incidents (for example `dep:announcement-type:pii`).
 - Supports querying different Double Extortion Platform datasets via `DEP_DSET`.
@@ -73,7 +73,7 @@ DEP `actor` values are modeled as STIX `IntrusionSet` objects instead of `Threat
 - DEP actor strings usually represent campaign/operator labels, not high-confidence real-world identities.
 - `IntrusionSet` is a safer semantic fit for recurring malicious activity clusters.
 - This avoids over-claiming attribution when source data quality is limited.
-- It supports incident and targeting analysis directly through `attributed-to` (incident -> intrusion set) and `targets` (intrusion set -> sector).
+- It supports incident and targeting analysis directly through `attributed-to` (incident -> intrusion set) and `targets` links from intrusion sets to sectors and countries.
 
 A `ThreatActor` model can be adopted later if the feed includes stronger attribution context (persona, role, motivation, sophistication).
 
@@ -107,7 +107,7 @@ docker run --rm \
 - The API occasionally URL-encodes announcement descriptions. The connector automatically decodes the description before sending it to OpenCTI.
 - DEP actor and country values can be materialized as entities using `DEP_CREATE_INTRUSION_SETS` and `DEP_CREATE_COUNTRY_LOCATIONS`.
 - DEP actor and country values are also stored in incident custom properties (`dep_actor`, `dep_country`) for source traceability.
-- Cross-entity links are automatic: intrusion set -> sector (`targets`) and sector -> country (`related-to`) when both entities are present.
+- Cross-entity links are automatic: intrusion set -> sector (`targets`), intrusion set -> country (`targets`), and sector -> country (`related-to`) when both entities are present.
 - Generic low-quality actor values (for example `unknown`, `anonymous`, `ransomware group`) are ignored for intrusion-set creation.
 - To reload the connector code in the platform, run: `docker compose build dep-connector; docker compose up -d dep-connector; docker compose logs -f dep-connector`
diff --git a/docker-compose.yml b/docker-compose.yml
index e2debde..d55b7df 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -274,7 +274,8 @@ services:
       context: .
       dockerfile: Dockerfile
     depends_on:
-      - opencti
+      opencti:
+        condition: service_healthy
     volumes:
       - ./config.yml:/app/config.yml:ro
diff --git a/main.py b/main.py
index 0852d16..9d567c8 100644
--- a/main.py
+++ b/main.py
@@ -589,6 +589,12 @@ def _build_optional_entities(
                     "targets", intrusion_set.id, sector_identity.id
                 )
             )
+        if intrusion_set and country_location:
+            objects.append(
+                self._build_relationship(
+                    "targets", intrusion_set.id, country_location.id
+                )
+            )
         if sector_identity and country_location:
             objects.append(
                 self._build_relationship(