
Architecture

Janus follows this pipeline: ingest source data, normalize it into a shared event stream, run analyzers, and write a versioned analysis bundle.

System Shape

The live entrypoint is janus.py. In the normal operator workflow, janus-cli prepares Docker mounts and invokes the Python CLI inside the container; the Python runtime is not usually executed directly on the host.

The main boundaries are:

  • ./Config mounted read-only as /config
  • ./out mounted read/write as /data/out
  • CLI path inputs for --events, merge, and multi-analyze must resolve under out/
  • TLS verification enabled by default; verify_tls: false or --insecure are explicit escape hatches

Docker wrapper networking

janus-cli does not run Python on the host by default: it builds a standard docker run with bind mounts (./out mounted at /data/out, ./Config mounted at /config) and passes the Janus subcommand (run, pull, analyze, and so on) to the image entrypoint.

Loopback in config is container loopback. Any URL in Config/janus.yml that uses 127.0.0.1, localhost, or ::1 resolves inside the Janus container. If the real service binds on the host (typical for lab teamservers, local Ghostwriter, or Mythic), connections fail with connection refused unless you change networking or the URL.

Controls (applied to docker run before volume mounts):

| Mechanism | Purpose |
| --- | --- |
| `janus-cli --docker-network <mode>` | Global (before subcommand) or per-command flag; maps to `docker run --network`. |
| `janus-cli --docker-add-host <host:ip>` | Appends `--add-host` (e.g. `host.docker.internal:host-gateway` on Linux bridge). |
| `docker.network_mode` / `docker.run_extra` in janus.yml | Persistent operator settings for “weird enclaves”; see Config/janus.example.yml. |
| `JANUS_DOCKER_RUN_EXTRA` | Space-separated extra docker run tokens; lowest precedence for `--network` versus CLI/config. |

Precedence for --network: CLI --docker-network beats docker.network_mode, which replaces any --network coming from JANUS_DOCKER_RUN_EXTRA when network_mode is set.
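The precedence rule above can be sketched as a small resolver. This is a hedged illustration, not Janus's actual wrapper code; the function name and argument names are assumptions.

```python
# Illustrative sketch of the documented --network precedence:
# CLI --docker-network > docker.network_mode > JANUS_DOCKER_RUN_EXTRA.
import shlex

def resolve_network(cli_flag=None, config_network_mode=None, run_extra_env=""):
    """Return the --network tokens to append to docker run, if any."""
    extra_tokens = shlex.split(run_extra_env)
    if cli_flag:
        # CLI flag wins outright
        return ["--network", cli_flag]
    if config_network_mode:
        # config replaces any --network supplied via the env tokens
        return ["--network", config_network_mode]
    # lowest precedence: whatever the env variable supplied
    if "--network" in extra_tokens:
        i = extra_tokens.index("--network")
        return extra_tokens[i:i + 2]
    return []
```

Under this sketch, `resolve_network("host", "bridge", "--network none")` yields host networking regardless of config or environment.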

Platforms: With Docker Engine on Linux, network_mode: host makes container loopback match the host (useful with unchanged https://127.0.0.1:... endpoints). On Docker Desktop for macOS/Windows, prefer host.docker.internal (or similar) in the API URL instead of relying on host networking.

The Cobalt Strike REST client may print a short stderr hint when a connection error targets loopback and the process appears to run inside a container. Operator playbook and caveats: FAQ — Cobalt Strike REST and janus-cli + Docker.
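A hint like the one described above could be derived roughly as follows. This is an illustrative sketch, not the client's real logic; to keep it self-contained, the container check is passed in as a flag rather than probed from the filesystem.

```python
# Hypothetical loopback-inside-container hint; Janus's actual detection
# logic may differ.
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}

def loopback_hint(url, in_container):
    """Return a short hint string when a loopback URL is used in a container."""
    host = urlparse(url).hostname or ""
    if in_container and host in LOOPBACK_HOSTS:
        return ("connection refused targeting {h}: inside the container, "
                "loopback is the container itself; consider --docker-network "
                "host or host.docker.internal".format(h=host))
    return None
```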

Pipeline

Janus has two execution modes:

  • Ingest mode: pull or load source telemetry, normalize it, and write a run directory
  • Analysis mode: read an existing normalized dataset and produce analyzer JSON plus optional HTML

The pipeline is:

  1. Ingest source telemetry
  2. Normalize source-specific records into task and result events
  3. Persist events.ndjson plus bundle.json
  4. Run analyzers over the normalized events
  5. Generate a self-contained HTML report from analyzer output
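The five steps above can be sketched as a chain of stages. Every name in this skeleton is hypothetical; it only illustrates the shape of the flow, not Janus's real internals.

```python
# Illustrative pipeline skeleton; all function and variable names are
# assumptions, not Janus's actual code.
def normalize(source, raw_records):
    # step 2: the parser owns source quirks and emits the shared event stream
    return [dict(record, source=source) for record in raw_records]

def run_pipeline(source, raw_records, analyzers):
    events = normalize(source, raw_records)                  # steps 1-2
    bundle = {"source": source, "event_count": len(events)}  # step 3 (metadata)
    analysis = {name: fn(events) for name, fn in analyzers.items()}  # step 4
    return bundle, analysis  # step 5 would render these into an HTML report
```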

This separation matters operationally:

  • Parsers own extraction and source quirks
  • The event stream is the analysis contract
  • Analyzers are mostly source-agnostic and rely on the normalized stream plus optional behavior-registry hints
  • merge and multi-analyze operate on persisted normalized datasets, not on raw source exports

Execution Responsibilities

| Command | What janus-cli does | What janus does |
| --- | --- | --- |
| `pull` | Resolve source, config, and target ID; invoke the matching pull workflow | Fetch telemetry and write the initial run directory |
| `pull --source cobaltstrike` | Resolve Cobalt Strike REST endpoint and auth from flags/config; invoke the Cobalt Strike REST ingest path in the container | Log in to the teamserver REST API, list/fetch tasks, normalize into out/complete |
| `run --source cobaltstrike` | Reuse the Cobalt Strike REST ingest path, then run analyzers and HTML generation | End-to-end Cobalt Strike workflow with the same pull ergonomics as other sources |
| `analyze` | Resolve the latest run or user-specified events file | Run one analyzer or the full set against normalized events |
| `report` | Resolve the latest analysis directory | Generate HTML from analyzer output |
| `run` | Chain source pull, analyze, and report | Execute the full pipeline |
| `merge` / `multi-analyze` | Expand input paths or patterns | Merge normalized runs, then optionally run the multi-op analyzer set and report |
| `status` / `config` / `version` | Inspect local state | n/a |

Source Coverage

| Source family | Current normalized source values | What it provides | Janus strength |
| --- | --- | --- | --- |
| Mythic | mythic, mythic-partial | Commands, responses, callback metadata, lifecycle state | Best fidelity for failure, retry, duration, and callback-health analysis |
| Ghostwriter | ghostwriter | Oplog chronology, command text, output, project/reporting context | Strong for workflow and timing analysis; weaker for failure-centric analysis |
| Cobalt Strike REST | cobaltstrike-rest | Teamserver REST API tasks (/api/v1/tasks, task detail, auth) — see Cobalt Strike REST API | Supported Cobalt Strike path: task + output in one API, suitable for automation without SSH file surgery |

Cobalt Strike automation note: The teamserver REST API is the practical way to pull structured tasks and beacon output for Janus. Alternatives such as SSH access to the teamserver host and copying fragmented on-disk artifacts are operationally heavier and do not match Janus’s normalized task/result model as directly.

Janus normalizes every source to the same two event types, but not every source can populate the same fields with the same fidelity.

Persisted Contract

The real cross-component contract is the on-disk bundle:

  • events.ndjson: normalized events, sorted by timestamp
  • bundle.json: run metadata and provenance
  • analyzer JSON files: per-analyzer output
  • report.html: optional self-contained report

write_ndjson() validates each event before writing. write_bundle() adds run metadata such as:

  • analysis_version
  • analysis_timestamp
  • janus_version

That means the persisted contract is slightly stricter than the in-memory dataclasses enforce, but requires fewer fields than the old documentation implied.
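A minimal sketch of the metadata stamping described above, assuming a simple signature for write_bundle() (the real parameters are not documented here, so these are illustrative):

```python
# Hedged sketch of bundle metadata stamping; parameter names and default
# values are assumptions, not Janus's real API.
import datetime
import json

def write_bundle(path, run_metadata, analysis_version="1", janus_version="0.0.0"):
    bundle = dict(run_metadata)
    bundle["analysis_version"] = analysis_version
    bundle["analysis_timestamp"] = datetime.datetime.now(
        datetime.timezone.utc).isoformat()
    bundle["janus_version"] = janus_version
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(bundle, fh, indent=2, sort_keys=True)
    return bundle
```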

Event Model

The event model should be read in three layers:

  • Schema-required: fields enforced before NDJSON is written
  • Common parser-populated: fields emitted by current first-party parsers, but not hard-validated
  • Source-specific optional: enrichments only some parsers can provide

Many analyzers join events by (operation_id, task_id), not by task_id alone.
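The composite-key join can be shown with a small sketch. This is illustrative, not an analyzer's real code, but it demonstrates why task_id alone is not a safe join key across operations:

```python
# Join result events to their task events on (operation_id, task_id).
def join_events(events):
    tasks = {(e["operation_id"], e["task_id"]): e
             for e in events if e["event_type"] == "task"}
    pairs = []
    for e in events:
        if e["event_type"] == "result":
            key = (e["operation_id"], e["task_id"])
            pairs.append((tasks.get(key), e))  # None when no matching task
    return pairs
```

A result from a different operation with the same task_id deliberately fails to join, which is the behavior the composite key is there to guarantee.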

Task Event

Represents operator intent.

Schema-required

| Field | Meaning |
| --- | --- |
| task_id | Task identifier within the operation |
| command_name | Normalized command name |
| timestamp | ISO 8601 submit time |

Common parser-populated

These are emitted by current first-party parsers and should be treated as the practical shared shape, even though validate_events() does not currently enforce them.

| Field | Typical status | Meaning |
| --- | --- | --- |
| event_type | always present | Always task |
| source | always present | Parser/source identifier such as mythic, ghostwriter, mythic-partial, or cobaltstrike-rest |
| operation_id | always present today | Operation or project identifier; may be remapped during merge |
| callback_id | always present today | Callback/session identifier; may be synthetic or 0 when unavailable |
| callback_display_id | usually present | Human-facing callback label; may be copied from callback_id or defaulted |
| display_id | parser-dependent | Human-facing task label; Mythic-native in Mythic, synthetic/defaulted elsewhere |
| tool_name | always present today | Tool or source-side module name |
| arguments_raw | always present today | Raw argument payload, possibly empty |

Source-specific optional enrichments

| Field | Emitted by | Meaning |
| --- | --- | --- |
| processing_timestamp | Mythic, synthesized in partial Mythic | When the agent picked up the task |
| callback_sleep_info | Mythic, Cobalt Strike REST | Callback sleep interval used for duration heuristics |
| issued_command_name | Mythic | Actual executed command when attribution rewrites command_name |
| parent_task_id | Mythic | Parent task for subtask lineage |
| orphaned_subtask | Mythic | True when parent_task_id could not be resolved |
| c2_task_id | Ghostwriter, Cobalt Strike REST | Source-side cross-link identifier today; CS REST stores the string taskId here for traceability |

Important caveats:

  • Ghostwriter does not populate display_id, processing_timestamp, callback_sleep_info, issued_command_name, parent_task_id, or orphaned_subtask
  • Partial Mythic synthesizes some timing fields to preserve analyzer compatibility, so those values are less trustworthy than full Mythic pull data
  • Cobalt Strike REST hashes string taskId to an int task_id, sets callback_id from bid, and may populate callback_sleep_info from beacon metadata when the REST API exposes sleep settings
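The string-to-integer taskId mapping mentioned in the last caveat could look roughly like this. The exact hashing scheme Janus uses is not documented here, so this is an assumption; the point is only that the mapping must be deterministic so joins stay stable across runs.

```python
# Hypothetical stable string-to-int task_id mapping (scheme is an
# assumption, not Janus's documented implementation).
import hashlib

def task_id_from_string(task_id_str, bits=63):
    digest = hashlib.sha256(task_id_str.encode("utf-8")).digest()
    # take the first 8 bytes and mask to a fixed positive width
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)
```

The original string taskId would still be preserved in c2_task_id for traceability, as the table above notes.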

Result Event

Represents tool output or task outcome.

Schema-required

| Field | Meaning |
| --- | --- |
| task_id | Task the result belongs to |
| status | success, error, or unknown |
| timestamp | Completion, last-response, or fallback time |
| output_text | Concatenated output text; may be empty |

Common parser-populated

| Field | Typical status | Meaning |
| --- | --- | --- |
| event_type | always present | Always result |
| source | always present | Parser/source identifier |
| operation_id | always present today | Operation or project identifier |

Source-specific optional enrichments

| Field | Emitted by | Meaning |
| --- | --- | --- |
| dispatch_failed | Mythic, partial Mythic | Task failed before reaching the agent |
| terminal_inferred_error | Mythic | Janus promoted a terminal unknown to error |

Important caveats:

  • Ghostwriter currently emits status: unknown for all results because the source does not expose a reliable success/error signal
  • Cobalt Strike REST maps API taskStatus and error/result payloads to success / error / unknown; operator and acknowledgement text are merged into output_text
  • output_text is required by validation, but may be intentionally empty after output_rule=errors_only

Retention controls and NDJSON content

Janus applies two independent retention controls after normalization and before writing events.ndjson. Both are set in Config/janus.yml (top-level keys) or overridden with CLI flags. CLI always takes precedence over config; default for both is all.

output_rule — result output retention

| Value | Behavior |
| --- | --- |
| all | Keep all output_text verbatim (default) |
| errors_only | Clear output_text on success results; error and unknown output kept |
| none | Clear output_text on all results regardless of status |

When output is cleared, the affected event retains an output_retained field recording the policy and derived features (output_present, output_length, output_line_count) so downstream consumers can distinguish missing output from genuinely empty output. This marker is written even if the original output_text was already empty.

Analyzers that rely on successful output, especially av-tracker on ps, need output_rule: all.
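The policy described above can be sketched as a filter over result events. The marker structure here is an assumption (the doc names the fields but not their layout), and the function name is illustrative:

```python
# Hedged sketch of output_rule filtering; field names follow the doc,
# the marker layout and function name are assumptions.
def apply_output_rule(event, rule):
    if event.get("event_type") != "result" or rule == "all":
        return event
    text = event.get("output_text", "")
    clear = (rule == "none") or (
        rule == "errors_only" and event.get("status") == "success")
    # marker is written even when output_text was already empty, so
    # consumers can tell "filtered" from "genuinely empty"
    event["output_retained"] = {
        "rule": rule,
        "output_present": bool(text),
        "output_length": len(text),
        "output_line_count": len(text.splitlines()),
    }
    if clear:
        event["output_text"] = ""
    return event
```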

arguments_rule — task argument retention

| Value | Behavior |
| --- | --- |
| all | Keep arguments_raw verbatim (default) |
| drop | Clear arguments_raw; only command_name and metadata are retained |
| hash | Replace arguments_raw with a SHA-256 digest (arguments_digest) for correlation without content recovery |
| features_only | Replace arguments_raw with derived features: arguments_length, arguments_token_count, arguments_shape, and arguments_entropy |

When arguments are filtered, the affected event retains an arguments_retained field recording the applied policy. drop and hash also preserve arguments_length; features_only preserves derived fields such as arguments_present, arguments_length, arguments_shape, and arguments_entropy. These markers are written even if the original arguments_raw was already empty. Analyzers that depend on raw arguments (parameter-entropy, argument-position-profile, tool-dump, command-retry-success) produce reduced or empty output under non-all policies.
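The four policies above can be sketched as follows. The persisted field names follow the doc; the shape heuristic and entropy calculation are assumptions about how such features might be derived:

```python
# Hedged sketch of arguments_rule filtering; shape fingerprint and
# entropy formula are assumptions, not Janus's documented derivations.
import hashlib
import math
from collections import Counter

def shannon_entropy(s):
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def apply_arguments_rule(event, rule):
    raw = event.get("arguments_raw", "")
    if rule == "all" or event.get("event_type") != "task":
        return event
    event["arguments_retained"] = rule       # marker written even when raw is empty
    event["arguments_length"] = len(raw)     # preserved under drop, hash, features_only
    if rule == "hash":
        event["arguments_digest"] = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    elif rule == "features_only":
        event["arguments_present"] = bool(raw)
        event["arguments_token_count"] = len(raw.split())
        event["arguments_entropy"] = round(shannon_entropy(raw), 3)
        event["arguments_shape"] = "".join(   # truncated char-class fingerprint
            "a" if ch.isalpha() else "d" if ch.isdigit() else "s"
            for ch in raw[:16])
    event["arguments_raw"] = ""
    return event
```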

Common effects

  • bundle.json records the resolved output_rule and arguments_rule as canonical rule IDs
  • merged datasets may record output_rule: mixed and/or arguments_rule: mixed plus observed_*_rules arrays when inputs used different policies
  • Ghostwriter still writes a full raw_export.json; retention filtering only applies to normalized NDJSON
  • merge and multi-analyze do not re-filter existing NDJSON; apply the desired policy before merging

Source-Specific Normalization Notes

Mythic

Mythic has the richest normalization path:

  • prefers submitted timestamps for task time and processed timestamps for result time
  • rewrites some commands for better attribution
  • preserves subtask lineage when possible
  • can mark dispatch failures separately from agent-side execution failures
  • can promote terminal unknown results to inferred error

Dispatcher parent tasks are skipped when the real execution is represented by a child task.

Ghostwriter

Ghostwriter preserves chronology well but has weaker execution semantics:

  • command parsing is mostly text splitting
  • callback IDs are parsed from entry description text when present
  • result status is conservatively unknown
  • source-side entry IDs are currently stored in c2_task_id

Cobalt Strike REST

Live ingest uses the teamserver REST server (authenticate with POST /api/auth/login, then GET /api/v1/tasks and per-task GET /api/v1/tasks/{taskId}). Janus maps each task to one TaskEvent and one merged ResultEvent (acknowledgements, result chunks, and errors in output_text). String task IDs are normalized to integer task_id for analyzer joins. Use ./janus-cli pull --source cobaltstrike or ./janus-cli run --source cobaltstrike with config keys under cobaltstrike: such as rest_endpoint, username, password, optional api_token, and duration_ms.

When using janus-cli, ensure rest_endpoint is reachable from inside the Janus container (routable IP/DNS, host networking, or host.docker.internal). A host-only https://127.0.0.1:50443 works on the host but often fails in the default bridge network; see Docker wrapper networking and the FAQ.
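The documented request sequence (login, task list, per-task detail) can be laid out as a plan. This sketch performs no network I/O; header names and payload shapes are assumptions, and only the endpoint paths come from the text above:

```python
# Illustrative request plan for the documented CS REST flow; the
# Authorization header format is an assumption.
def plan_ingest_requests(rest_endpoint, task_ids=(), api_token=None):
    base = rest_endpoint.rstrip("/")
    headers = {"Authorization": "Bearer " + api_token} if api_token else {}
    plan = [("POST", base + "/api/auth/login", {})]
    plan.append(("GET", base + "/api/v1/tasks", headers))
    for tid in task_ids:
        plan.append(("GET", base + "/api/v1/tasks/" + str(tid), headers))
    return plan
```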

Analyzer Registries

Two registry layers shape analysis behavior:

  • Core/analyzer_registry.py: analyzer names, output filenames, and run sets
  • Core/analyzer_behavior_registry.py: source-aware heuristics consumed by selected analyzers

The behavior registry is advisory. It exists to keep source-specific semantics from leaking into every analyzer implementation.

Multi-Operation Analysis

merge and multi-analyze combine normalized runs into one dataset. multi-analyze then runs the registry-defined multi-operation analyzer set.

Important merge behavior:

  • each source run keeps its own operation_id namespace
  • if an input run has a missing, invalid, or duplicate operation_id, Janus remaps it during merge
  • remap details are recorded in merged bundle.json
  • analyzers then join on the merged (operation_id, task_id) pairs

This design lets Janus compare patterns across engagements without changing the single-operation event model.
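The remap behavior can be sketched with a toy merge. The suffixing scheme used here for duplicates is an assumption (Janus's actual remap format is not documented in this section); what matters is that remaps are deterministic and recorded:

```python
# Illustrative operation_id remap during merge; the "-N" suffix scheme
# is an assumption, not Janus's documented format.
def merge_runs(runs):
    merged, remaps, seen = [], {}, set()
    for run in runs:
        op = run.get("operation_id") or "unknown"  # missing id gets a placeholder
        new_op, n = op, 1
        while new_op in seen:                      # duplicate id: remap it
            n += 1
            new_op = "{0}-{1}".format(op, n)
        seen.add(new_op)
        if new_op != op:
            remaps[op] = new_op
        for event in run["events"]:
            merged.append(dict(event, operation_id=new_op))
    return merged, remaps  # remaps would be recorded in the merged bundle.json
```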

Fields That Likely Need To Move

The field most likely to move or split is c2_task_id.

Today it has incompatible meanings:

  • Ghostwriter uses it as a source-side task or entry identifier
  • Cobalt Strike REST uses it for the opaque string taskId

That makes it a poor shared-model field. The cleaner long-term direction is one of:

  • split it into a dedicated shared field with a single meaning, such as source_task_ref
  • move source-specific extras into a nested metadata object

By contrast, the other optional enrichments are reasonable to keep in the shared model because analyzers or reporting consume them:

  • processing_timestamp
  • callback_sleep_info
  • issued_command_name
  • parent_task_id
  • orphaned_subtask
  • dispatch_failed
  • terminal_inferred_error

Privacy

Janus does not use LLMs for analysis, summarization, or report generation. All analysis runs locally from the telemetry you ingest.

Janus does not send normalized operation data to external AI or SaaS analysis services. Network access is limited to the source systems you explicitly configure for data collection (Mythic, Ghostwriter, or Cobalt Strike REST endpoints).

What Janus Stores Today

Every Janus run can produce the following artifacts under out/:

| Artifact | Contains | Affected by retention controls |
| --- | --- | --- |
| events.ndjson | Normalized task and result events with timestamps, command names, arguments, output text, and identifiers | Yes — output_rule and arguments_rule filter sensitive fields before write |
| bundle.json | Run metadata: source, operation identifiers, counts, resolved retention settings, Janus version | No — metadata only; no raw telemetry |
| Analyzer JSON files | Per-analyzer output: aggregated statistics, findings, and detail rows that may include arguments_raw or output_text excerpts | Indirectly — analyzers receive post-policy events, so their output reflects the retention state |
| report.html | Self-contained HTML report rendering analyzer output | Indirectly — the report displays a retention banner and adjusts formatting when content is filtered |
| Ghostwriter raw_export.json | Full Ghostwriter oplog export as received from the API | No — retention controls do not filter raw_export.json; it is the unmodified source snapshot |

Privacy Boundaries

Sensitive fields by nature:

  • arguments_raw on task events — may contain file paths, hostnames, credentials, shellcode, Kerberos tickets, or other target-specific payloads
  • output_text on result events — may contain process listings, directory contents, command output, and other data from the target environment
  • Source-linked identifiers (operation_id, callback_id, task_id) — operational context that is useful for analysis but ties events to specific engagements

What Janus does locally:

  • Connects to configured source systems over HTTPS (TLS verified by default) to pull telemetry
  • Normalizes source-specific records into a shared event model
  • Applies retention policy (filters arguments_raw and output_text before persistence)
  • Runs analyzers against the normalized events
  • Generates a self-contained HTML report
  • Writes all artifacts to the local out/ directory

What Janus does not do:

  • Send normalized or raw data to any external service, cloud endpoint, or LLM provider
  • Phone home, report usage, or check for updates
  • Persist credentials to disk (API tokens and passwords are read from config or environment at runtime)

Retention Controls

Janus provides two independent, operator-controlled retention policies applied after normalization and before events.ndjson is written. See Retention controls and NDJSON content for the full policy table.

Resolution precedence: CLI flag > config file > default (all).

Where the policy is recorded: bundle.json is the authoritative privacy record for a Janus run. It includes both output_rule and arguments_rule as resolved canonical strings. For merged datasets, these fields may be mixed, with observed_output_rules and observed_arguments_rules listing the exact policies present. Downstream consumers and the HTML report read this metadata first to understand the retention state.

When filtered content is detected in events: Task and result events include arguments_retained or output_retained fields when a non-default policy affected that event. These fields let analyzers distinguish "empty because the operator sent no arguments" from "empty because retention policy removed the content." Derived features (arguments_present, arguments_length, arguments_shape, arguments_entropy, output_present, output_length, output_line_count) and digests (arguments_digest) are persisted alongside the retention marker depending on the policy. For output_rule: errors_only, Janus also stamps result events with output_retained: errors_only even when the row keeps its visible error text, so downstream tools can still detect the active policy from events.ndjson alone.

Artifact-Specific Caveats

  • Ghostwriter raw_export.json is not filtered by any retention policy. It is a verbatim snapshot of the API response. If the raw export contains sensitive data, the operator is responsible for managing or deleting it.
  • merge and multi-analyze do not re-filter existing NDJSON. Apply the desired retention policy during the original ingest to ensure merged datasets inherit the correct filtering. If inputs were created under different policies, Janus marks the merged bundle as mixed rather than pretending one policy applied everywhere.
  • Analyzer JSON output may embed arguments_raw or output_text values in detail rows, findings, or examples when running under output_rule: all / arguments_rule: all. Under stricter policies, those fields are already empty or replaced in the source events, so analyzer output inherits the same reduction.
  • report.html displays a retention policy banner in the report header when non-default or mixed policies are active. Table cells that would normally show raw arguments instead display contextual placeholders (e.g., "redacted", hash prefix, or shape summary).

Analyzer Compatibility

Not all analyzers produce meaningful output under every retention policy. The table below summarizes the dependency and degradation behavior:

| Analyzer | Depends on arguments_raw | Depends on output_text | Behavior under restricted policy |
| --- | --- | --- | --- |
| summary-visualization | No | No | Full output |
| command-failure-summary | Detail rows only | Error messages | Core metrics intact; detail rows show empty args/output |
| command-retry-success | Yes (argument diff) | No | Cannot detect argument tuning between retries; sequence detection still works |
| command-duration | Detail rows only | No | Core metrics intact; detail rows show empty args |
| outlier-context | No | No | Full output |
| callback-health | Detail rows only | No | Core metrics intact; detail rows show empty args |
| av-tracker | No | Yes (successful ps output) | Cannot detect AV/EDR executables without output_rule: all |
| dwell-time | Detail rows only | No | Core metrics intact; context rows show empty args |
| parameter-entropy | Yes (full analysis) | No | Produces empty or severely limited findings |
| argument-position-profile | Yes (full analysis) | No | Produces empty or severely limited findings |
| tool-dump | Yes (matching + dumps) | No | Match accuracy degraded; dump content empty |

When analyzers detect that events were persisted under a restrictive or mixed retention policy, they include privacy_warnings in their metadata section describing the specific limitation. For single-operation analysis, Janus derives that state from bundle.json; event-level markers remain as provenance inside events.ndjson.
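Deriving that state from bundle.json could look like the sketch below. The warning strings and helper name are illustrative; only the output_rule and arguments_rule keys come from this document:

```python
# Hedged sketch of privacy_warnings derivation from bundle metadata;
# warning text and function name are assumptions.
def privacy_warnings(bundle):
    warnings = []
    if bundle.get("output_rule", "all") != "all":
        warnings.append(
            "output_text filtered by retention policy ({0}); "
            "output-dependent findings may be reduced".format(bundle["output_rule"]))
    if bundle.get("arguments_rule", "all") != "all":
        warnings.append(
            "arguments_raw filtered by retention policy ({0}); "
            "argument-dependent findings may be reduced".format(bundle["arguments_rule"]))
    return warnings
```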

Operator Responsibilities

Janus provides the retention controls; the operator is responsible for:

  • Choosing the right policy for the engagement's data handling requirements before running ingest
  • Managing artifact lifecycle — Janus does not enforce time-based retention, automatic deletion, or encryption at rest
  • Handling Ghostwriter raw_export.json — this file is outside the retention policy scope
  • Protecting Config/janus.yml — this file may contain API tokens and endpoint URLs; it is mounted read-only in the container but lives on the host filesystem
  • Reviewing bundle.json — this file records the exact retention policy that was applied and can be used for compliance or audit purposes

Output Summary

Janus writes deterministic, versioned analysis artifacts so runs can be replayed, diffed, merged, and consumed by downstream tooling. The architecture is intentionally biased toward:

  • source-specific parsing at the edge
  • a narrow normalized event contract in the middle
  • source-aware but mostly source-agnostic analyzers on top

The main current architectural debt is not the parser boundary. It is the gap between documented field requirements and enforced schema, plus the overloaded meaning of c2_task_id.