Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -6,4 +6,5 @@ __pycache__
.uv_cache
.venv
config.yml
dep-api-spec.json
scripts/
46 changes: 36 additions & 10 deletions AGENTS.md
@@ -32,15 +32,31 @@ It should track the code in `main.py`, not stale assumptions from earlier iterat
- `dset`
- `full=true`
- `extended=true` is sent only when `DEP_EXTENDED_RESULTS=true`.
- `DEP_DSET` defaults to `ext`, so the connector can query alternate DEP datasets when required.
- `DEP_DATASETS` defaults to `["ext"]`.
- Supported official dataset values are:
- `ext`
- `prv`
- `nws`
- `vnd`
- `dds`
- `frm`
- The connector also accepts long dataset aliases and normalizes them to the API codes:
- `extortion` -> `ext`
- `privacy` -> `prv`
- `opennews` or `news` -> `nws`
- `vandalism` -> `vnd`
- `ddos` -> `dds`
- `forum` -> `frm`
- The DEP API accepts one `dset` value per request, so the connector loops over configured datasets and issues one request per dataset.
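
The dataset rules above can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the names `normalize_datasets` and `build_request_params` are hypothetical, the comma-separated input format follows the `DEP_DATASETS` description in `DOCKERHUB.md`, and rejecting unknown values and dropping repeated entries are assumptions. Query parameters other than `dset`, `full`, and `extended` are omitted.

```python
from typing import Dict, List, Optional

OFFICIAL_DATASETS = ("ext", "prv", "nws", "vnd", "dds", "frm")
DATASET_ALIASES = {
    "extortion": "ext",
    "privacy": "prv",
    "opennews": "nws",
    "news": "nws",
    "vandalism": "vnd",
    "ddos": "dds",
    "forum": "frm",
}


def normalize_datasets(raw: Optional[str]) -> List[str]:
    """Turn a comma-separated setting into a list of official API codes."""
    codes: List[str] = []
    for token in (raw or "").split(","):
        name = token.strip().lower()
        if not name:
            continue
        code = DATASET_ALIASES.get(name, name)
        if code not in OFFICIAL_DATASETS:
            # Assumption: unknown values are rejected rather than skipped.
            raise ValueError(f"unsupported DEP dataset: {token.strip()!r}")
        if code not in codes:
            # Assumption: repeated entries are sent only once.
            codes.append(code)
    # Fall back to the documented default when nothing was configured.
    return codes or ["ext"]


def build_request_params(datasets: List[str], extended: bool) -> List[Dict[str, str]]:
    """Build one parameter set per dataset, since the API takes one dset per request."""
    requests_params = []
    for dset in datasets:
        params = {"dset": dset, "full": "true"}
        if extended:
            params["extended"] = "true"
        requests_params.append(params)
    return requests_params
```

For example, `normalize_datasets("ddos, vandalism")` yields `["dds", "vnd"]`, which in turn produces two parameter sets and therefore two requests.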

## State management

- The connector stores only one state key in OpenCTI worker state: `last_run`.
- The connector stores one per-dataset state map in OpenCTI worker state: `last_run_by_dataset`.
- First run window: `now - DEP_LOOKBACK_DAYS`.
- Subsequent run window: `last_run - DEP_OVERLAP_HOURS`.
- Invalid or non-string `last_run` values are ignored with a warning.
- State is persisted only after the processing loop finishes: `{"last_run": end.isoformat()}`.
- Subsequent run window per dataset: `last_run_by_dataset[dataset] - DEP_OVERLAP_HOURS`.
- Invalid per-dataset `last_run` values are ignored with a warning.
- State is persisted independently per dataset after that dataset finishes processing: `{"last_run_by_dataset": {"ext": "...", "dds": "..."}}`.
- Adding a new dataset later starts that dataset from the full lookback window because it has no existing entry in `last_run_by_dataset`.
- The overlap window is intentional and should be preserved to catch late DEP updates.
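
A minimal sketch of the per-dataset window logic described above, under stated assumptions: the function names `window_start` and `record_run` are hypothetical, timestamps are ISO 8601 strings as produced by `isoformat()`, and naive timestamps are treated as UTC. The real connector may differ in how it logs and where it persists state.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

log = logging.getLogger("dep-state-sketch")


def window_start(
    state: Optional[Dict[str, Any]],
    dataset: str,
    now: datetime,
    lookback_days: int,
    overlap_hours: int,
) -> datetime:
    """Return the start of the query window for one dataset."""
    by_dataset = (state or {}).get("last_run_by_dataset")
    raw = by_dataset.get(dataset) if isinstance(by_dataset, dict) else None
    if raw is not None:
        try:
            if not isinstance(raw, str):
                raise ValueError("last_run is not a string")
            last_run = datetime.fromisoformat(raw)
        except ValueError:
            # Invalid entries are ignored with a warning.
            log.warning("Ignoring invalid last_run for dataset %s: %r", dataset, raw)
        else:
            if last_run.tzinfo is None:
                # Assumption: naive timestamps are interpreted as UTC.
                last_run = last_run.replace(tzinfo=timezone.utc)
            return last_run - timedelta(hours=overlap_hours)
    # First run for this dataset (or unusable state): full lookback window.
    return now - timedelta(days=lookback_days)


def record_run(state: Optional[Dict[str, Any]], dataset: str, end: datetime) -> Dict[str, Any]:
    """Return a copy of the state with only this dataset's entry advanced."""
    updated = dict(state or {})
    existing = updated.get("last_run_by_dataset")
    by_dataset = dict(existing) if isinstance(existing, dict) else {}
    by_dataset[dataset] = end.isoformat()
    updated["last_run_by_dataset"] = by_dataset
    return updated
```

Because `record_run` touches a single key, a dataset added later has no entry in `last_run_by_dataset` and `window_start` falls through to the lookback branch for it.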

## Input parsing and normalization
@@ -81,6 +97,9 @@ It should track the code in `main.py`, not stale assumptions from earlier iterat
- identity_class: `organization`
- contact: `https://doubleextortion.com/`
- Every emitted object and relationship created from DEP content carries the label `DigIntLab`.
- DEP-derived objects and relationships also carry:
- `dep:dataset:<dataset code>` when the source dataset is known
- `dep:announcement-type:<lowercased enum value>` when announcement types are present
- Confidence is consistently taken from `DEP_CONFIDENCE`.
- Bundles are deduplicated by STIX ID before sending to OpenCTI.
- Prefer deterministic IDs for DEP-derived entities and relationships to keep re-imports idempotent.
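
The labeling and deduplication rules above could look roughly like this. It is a sketch only: `build_labels` and `dedupe_by_id` are hypothetical names, objects are modeled as plain dictionaries with an `id` key, and both the label order and the choice to keep the first occurrence of a repeated STIX ID are assumptions.

```python
from typing import Any, Dict, Iterable, List, Optional

BASE_LABEL = "DigIntLab"


def build_labels(dataset: Optional[str], announcement_types: Optional[Iterable[Any]]) -> List[str]:
    """Build the label list for a DEP-derived object or relationship."""
    labels = [BASE_LABEL]
    if dataset:
        # Only added when the source dataset is known.
        labels.append(f"dep:dataset:{dataset}")
    for ann_type in announcement_types or ():
        value = str(ann_type).strip().lower()
        if not value:
            continue
        label = f"dep:announcement-type:{value}"
        if label not in labels:
            labels.append(label)
    return labels


def dedupe_by_id(objects: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop objects whose STIX ID was already seen, preserving order."""
    seen: Dict[str, Dict[str, Any]] = {}
    for obj in objects:
        # Assumption: the first object with a given ID wins.
        seen.setdefault(obj["id"], obj)
    return list(seen.values())
```

Deduplication by ID only works as intended when IDs are deterministic, which is why the two rules are listed together.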
@@ -108,8 +127,9 @@ It should track the code in `main.py`, not stale assumptions from earlier iterat
- Report custom properties (when present):
- `dep_actor`
- `dep_country`
- Report labels always include `DigIntLab`, plus one label per announcement type:
- Report labels always include `DigIntLab`, plus any applicable:
- `dep:announcement-type:<lowercased enum value>`
- `dep:dataset:<dataset code>`
- Report external reference prefers `annLink`; if absent, it falls back to `site`.
- `annTitle` is attached as the external reference description when present.
- `object_refs` contains all objects in the bundle (author identity, victim, indicators, intrusion set, country, sector, and all relationships between them).
@@ -130,8 +150,9 @@ It should track the code in `main.py`, not stale assumptions from earlier iterat
- `first_seen`
- `dep_actor` when present
- `dep_country` when present
- Incident labels always include `DigIntLab`, plus one label per announcement type:
- Incident labels always include `DigIntLab`, plus any applicable:
- `dep:announcement-type:<lowercased enum value>`
- `dep:dataset:<dataset code>`
- Incident external reference prefers `annLink`; if absent, it falls back to `site`.
- `annTitle` is attached as the external reference description when present.
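
The external reference fallback described for reports and incidents can be sketched as below. The function name `build_external_reference` and the `source_name` value are hypothetical, the DEP item is modeled as a plain dictionary, and whether `site` is normalized into a full URL before use is not covered here.

```python
from typing import Any, Dict, Optional


def build_external_reference(
    item: Dict[str, Any], source_name: str = "DEP"
) -> Optional[Dict[str, str]]:
    """Prefer annLink, fall back to site, and attach annTitle as description."""

    def clean(key: str) -> str:
        value = item.get(key)
        return value.strip() if isinstance(value, str) else ""

    url = clean("annLink") or clean("site")
    if not url:
        # Assumption: no reference is emitted when neither field is usable.
        return None
    reference = {"source_name": source_name, "url": url}
    title = clean("annTitle")
    if title:
        reference["description"] = title
    return reference
```

With this shape, an item that has only `site` still gets a reference, and `annTitle` never appears without a URL to describe.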

@@ -242,7 +263,7 @@ These links are created automatically when both related objects exist. There are
- `DEP_CREATE_COUNTRY_LOCATIONS`
- Important non-boolean knobs:
- `DEP_PRIMARY_OBJECT` (default: `report`; valid values: `report`, `incident`)
- `DEP_DSET`
- `DEP_DATASETS`
- `DEP_LOOKBACK_DAYS`
- `DEP_OVERLAP_HOURS`
- `DEP_CONFIDENCE`
@@ -287,8 +308,13 @@ These links are created automatically when both related objects exist. There are

Use `task format check type-check` for complete local checks before considering code changes done.

There is a `task test` target, but there is currently no first-party test suite in this repository. Do not assume automated test coverage exists.
For code changes, do not stop at static checks alone; perform Docker-based runtime validation as well.
There is a first-party pytest suite in `tests/`, and `task test` runs it.
Current automated coverage focuses on:

- dataset parsing and official dataset validation against `dep-api-spec.json`
- connector runtime helpers, run-window behavior, and item-processing behavior
- STIX conversion, deterministic IDs, labels, and normalization helpers
For code changes, do not stop at static checks alone; perform Docker-based runtime validation as well.

## File map

66 changes: 35 additions & 31 deletions DOCKERHUB.md
@@ -14,8 +14,8 @@ An [OpenCTI](https://github.com/OpenCTI-Platform/OpenCTI) external-import connec
- Creates **Organization** identities for victims
- Optionally creates **Sector** identities, **Intrusion Sets**, and **Country** locations
- Optionally generates **Indicators** for victim domains and leak hash identifiers
- Adds announcement-type labels such as `dep:announcement-type:pii`
- Maintains connector state with an overlap window to catch late DEP updates
- Adds announcement-type and dataset labels such as `dep:announcement-type:pii` and `dep:dataset:ext`
- Maintains per-dataset connector state with an overlap window to catch late DEP updates

---

@@ -54,38 +54,38 @@ The connector loads configuration from `OPENCTI_CONFIG_FILE` when set, otherwise

### Required

| Environment variable | Description |
| -------------------- | ----------- |
| `OPENCTI_URL` | URL of your OpenCTI platform |
| `OPENCTI_TOKEN` | OpenCTI API token |
| `CONNECTOR_ID` | Unique connector identifier |
| `CONNECTOR_TYPE` | Connector type, typically `EXTERNAL_IMPORT` |
| `CONNECTOR_NAME` | Connector display name |
| `CONNECTOR_SCOPE` | Connector scope, typically `report,incident,identity,indicator` |
| `DEP_USERNAME` | DEP portal username |
| `DEP_PASSWORD` | DEP portal password |
| `DEP_API_KEY` | API key issued by DEP |
| `DEP_CLIENT_ID` | AWS Cognito App Client ID |
| Environment variable | Description |
| -------------------- | --------------------------------------------------------------- |
| `OPENCTI_URL` | URL of your OpenCTI platform |
| `OPENCTI_TOKEN` | OpenCTI API token |
| `CONNECTOR_ID` | Unique connector identifier |
| `CONNECTOR_TYPE` | Connector type, typically `EXTERNAL_IMPORT` |
| `CONNECTOR_NAME` | Connector display name |
| `CONNECTOR_SCOPE` | Connector scope, typically `report,incident,identity,indicator` |
| `DEP_USERNAME` | DEP portal username |
| `DEP_PASSWORD` | DEP portal password |
| `DEP_API_KEY` | API key issued by DEP |
| `DEP_CLIENT_ID` | AWS Cognito App Client ID |

### Optional

| Environment variable | Default | Description |
| -------------------- | ------- | ----------- |
| `CONNECTOR_RUN_INTERVAL` | `3600` | Polling interval in seconds |
| `DEP_CONFIDENCE` | `70` | Confidence score on generated STIX objects |
| `DEP_LOOKBACK_DAYS` | `7` | Days to look back on first run |
| `DEP_OVERLAP_HOURS` | `72` | Overlap hours from previous run to catch late updates |
| `DEP_DSET` | `ext` | DEP dataset to query |
| `DEP_PRIMARY_OBJECT` | `report` | Primary STIX object to emit: `report` or `incident` |
| `DEP_EXTENDED_RESULTS` | `true` | Request extended DEP results |
| `DEP_ENABLE_SITE_INDICATOR` | `true` | Create a domain indicator per victim |
| `DEP_ENABLE_HASH_INDICATOR` | `true` | Create a hash indicator when a hash is provided |
| `DEP_SKIP_EMPTY_VICTIM` | `true` | Skip items where victim name is empty, `n/a`, or `none` |
| `DEP_CREATE_SECTOR_IDENTITIES` | `true` | Create sector identities and link victims with `part-of` |
| `DEP_CREATE_INTRUSION_SETS` | `true` | Create intrusion sets from DEP actor values |
| `DEP_CREATE_COUNTRY_LOCATIONS` | `true` | Create country locations and link victims with `located-at` |
| `DEP_LOGIN_ENDPOINT` | `https://cognito-idp.eu-west-1.amazonaws.com/` | Cognito login endpoint |
| `DEP_API_ENDPOINT` | `https://api.eu-ep1.doubleextortion.com/v1/dbtr/privlist` | DEP REST endpoint |
| Environment variable | Default | Description |
| ------------------------------ | --------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CONNECTOR_RUN_INTERVAL` | `3600` | Polling interval in seconds |
| `DEP_CONFIDENCE` | `70` | Confidence score on generated STIX objects |
| `DEP_LOOKBACK_DAYS` | `7` | Days to look back on first run |
| `DEP_OVERLAP_HOURS` | `72` | Overlap hours from previous run to catch late updates |
| `DEP_DATASETS` | `ext` | DEP datasets to query. Accepts comma-separated short API codes (`ext`, `prv`, `nws`, `vnd`, `dds`, `frm`) or long aliases such as `extortion`, `privacy`, `opennews`/`news`, `vandalism`, `ddos`, and `forum`. |
| `DEP_PRIMARY_OBJECT` | `report` | Primary STIX object to emit: `report` or `incident` |
| `DEP_EXTENDED_RESULTS` | `true` | Request extended DEP results |
| `DEP_ENABLE_SITE_INDICATOR` | `true` | Create a domain indicator per victim |
| `DEP_ENABLE_HASH_INDICATOR` | `true` | Create a hash indicator when a hash is provided |
| `DEP_SKIP_EMPTY_VICTIM` | `true` | Skip items where victim name is empty, `n/a`, or `none` |
| `DEP_CREATE_SECTOR_IDENTITIES` | `true` | Create sector identities and link victims with `part-of` |
| `DEP_CREATE_INTRUSION_SETS` | `true` | Create intrusion sets from DEP actor values |
| `DEP_CREATE_COUNTRY_LOCATIONS` | `true` | Create country locations and link victims with `located-at` |
| `DEP_LOGIN_ENDPOINT` | `https://cognito-idp.eu-west-1.amazonaws.com/` | Cognito login endpoint |
| `DEP_API_ENDPOINT` | `https://api.eu-ep1.doubleextortion.com/v1/dbtr/privlist` | DEP REST endpoint |

---

@@ -108,6 +108,10 @@ dep-connector:
- DEP_CLIENT_ID=${DEP_CLIENT_ID}
```

When multiple datasets are configured, the connector loops over them and issues one DEP API request per dataset. Dataset aliases are normalized to the short API codes before the request is sent, for example `ddos -> dds` and `vandalism -> vnd`.

State is tracked per dataset, so adding a new dataset later starts that dataset from the normal lookback window instead of inheriting the already-advanced state of the previously configured datasets.

---

## Links