Skip to content

Lading in Python#1907

Open
StephenWakely wants to merge 5 commits into
mainfrom
stephen/lading-py
Open

Lading in Python#1907
StephenWakely wants to merge 5 commits into
mainfrom
stephen/lading-py

Conversation

@StephenWakely

Copy link
Copy Markdown
Contributor

What does this PR do?

A brief description of the change being made with this pull request.

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant
for this Pull Request.

Additional Notes

Anything else we should know when reviewing?

StephenWakely and others added 2 commits June 11, 2026 09:07
…gstatsd-py

Adds a Python port of lading under lading_py/ that uses the dogstatsd-py
library for all metric emission, enabling direct testing of that client
under realistic load patterns.

All supporting lading capabilities are preserved: Prometheus and expvar
telemetry collection from a running Datadog Agent, JSONL/Parquet capture
output, passive Prometheus exporter, HTTP blackhole, /proc observer,
and the full lading YAML config schema.

The Dockerfile is updated from the Rust multi-stage build to a two-stage
Python build, dropping build time from several minutes to ~30 seconds.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@StephenWakely StephenWakely requested a review from a team as a code owner June 11, 2026 10:56
@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented Jun 11, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 3 Pipeline jobs failed

Config Validation | Validate Example Configs   View in Datadog   GitHub Actions

Continuous integration | Integration Tests   View in Datadog   GitHub Actions

Changelog Check | changelog-check   View in Datadog   GitHub Actions

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d38228d | Docs | Datadog PR Page | Give us feedback!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a995ae4fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lading/src/bin/lading.rs
}

fn main() -> Result<(), Error> {
panic!("Rust is forbidden.");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove the unconditional panic from the Rust entry point

When anyone runs the existing Rust executable, including cargo run --bin lading, installed lading, or smoke tests that still exercise the Rust binary, main now panics before parsing the CLI or running any workload. The crate still builds, so this becomes a runtime break of the primary lading binary rather than a compile-time failure.

Useful? React with 👍 / 👎.

Comment on lines +74 to +75
if kind == "counter":
self._registry.increment(name, int(value), merged)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Record Prometheus counters as deltas, not absolute totals

For Prometheus target metrics, counters are cumulative values reported by the target, but this code adds each scraped absolute value into the registry. Since Accumulator later differences registry counters between flushes, a stable counter value like http_requests_total 100 will be emitted as another 100 every scrape/flush instead of 0, and increasing counters are overcounted by repeatedly re-adding their full total. This affects configs using target_metrics[].prometheus with any counter metric.

Useful? React with 👍 / 👎.

self._gauges: dict[str, Gauge] = {}

def _sync(self) -> None:
counters, gauges, _ = self._lading_registry.snapshot()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Export counters from the Prometheus telemetry endpoint

When telemetry.prometheus.addr is configured, the exporter snapshots both counters and gauges but only registers gauges, so core lading metrics such as bytes_written and packets_sent never appear on /metrics. This leaves the advertised passive Prometheus exporter without the main throughput counters users need to monitor a run in real time.

Useful? React with 👍 / 👎.

Replaces --config (required) with --config-path (default: /etc/lading/lading.yaml)
and adds all flags the Rust binary exposes: run/config-check subcommands,
--no-target, --target-pid, --capture-path/format/flush-seconds,
--prometheus-addr, --experiment-duration-seconds/infinite,
--warmup-duration-seconds, --global-labels, and compat stubs for
unsupported flags. LADING_CONFIG env var is also honoured.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5fc947b95a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


async def run(self, signals) -> None:
while not signals.shutdown.is_set():
await asyncio.sleep(self._flush_seconds)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wake the accumulator on shutdown

When capture output is enabled and shutdown is requested just after this sleep starts, inner_main sets the shutdown event and then awaits the accumulator task, but the task cannot observe the event until the full flush_seconds delay has elapsed. With the default 60-second flush interval (or any larger configured value), short runs and SIGTERM shutdowns hang for that extra interval before the final flush, effectively ignoring the configured shutdown delay.

Useful? React with 👍 / 👎.

Comment on lines +229 to +231
if config.telemetry and config.telemetry.prometheus_addr:
exp = PrometheusExporter(registry, config.telemetry.prometheus_addr)
tasks.append(asyncio.create_task(exp.run(signals), name="prometheus_exporter"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Start the exporter for prometheus socket configs

When users configure telemetry.prometheus_socket.path or pass --prometheus-path, the override code stores that value, but this startup path only checks prometheus_addr, so no Prometheus exporter task is created for socket-only configurations. That makes the advertised --prometheus-path endpoint silently unavailable unless a TCP address is also configured.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5fc947b95a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


async def run(self, signals) -> None:
while not signals.shutdown.is_set():
await asyncio.sleep(self._flush_seconds)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wake the accumulator on shutdown

When capture output is enabled and shutdown is requested just after this sleep starts, inner_main sets the shutdown event and then awaits the accumulator task, but the task cannot observe the event until the full flush_seconds delay has elapsed. With the default 60-second flush interval (or any larger configured value), short runs and SIGTERM shutdowns hang for that extra interval before the final flush, effectively ignoring the configured shutdown delay.

Useful? React with 👍 / 👎.

Comment on lines +229 to +231
if config.telemetry and config.telemetry.prometheus_addr:
exp = PrometheusExporter(registry, config.telemetry.prometheus_addr)
tasks.append(asyncio.create_task(exp.run(signals), name="prometheus_exporter"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Start the exporter for prometheus socket configs

When users configure telemetry.prometheus_socket.path or pass --prometheus-path, the override code stores that value, but this startup path only checks prometheus_addr, so no Prometheus exporter task is created for socket-only configurations. That makes the advertised --prometheus-path endpoint silently unavailable unless a TCP address is also configured.

Useful? React with 👍 / 👎.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be339ac27d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +30 to +32
class ConfRange(BaseModel):
inclusive: InclusiveRange | None = None
exclusive: ExclusiveRange | None = None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve existing constant ranges

Configs that use the existing lading ConfRange form constant (for example for contexts, tags_per_msg, or multivalue_count) are accepted by Pydantic as extra data but leave both inclusive and exclusive unset; the generator then crashes when lo/hi dereference self.exclusive. This breaks compatibility for valid lading YAMLs that rely on constant ranges instead of spelling identical inclusive min/max values.

Useful? React with 👍 / 👎.

Comment on lines +78 to +85
now = time.monotonic()
elapsed = now - self._last
self._tokens = min(self._rate, self._tokens + elapsed * self._rate)
self._last = now
if self._tokens >= n:
self._tokens -= n
return
wait = (n - self._tokens) / self._rate

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow buckets to accumulate one full block

When bytes_per_second is lower than the estimated size of a generated block, acquire() can never finish because tokens are capped at self._rate, which is always less than n; the send thread then sleeps forever and does not observe shutdown. This affects low-rate configs or large multi-value/event blocks, causing the run to hang instead of emitting at the configured slow rate.

Useful? React with 👍 / 👎.

Comment on lines +22 to +23
def _sync(self) -> None:
counters, gauges, _ = self._lading_registry.snapshot()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid draining histograms from the exporter

Serving the passive Prometheus endpoint calls Registry.snapshot(), and that method clears histogram samples, but the exporter ignores the returned histograms. With telemetry.prometheus enabled alongside capture or histogram/summary target metrics, each /metrics scrape can silently discard those samples before the accumulator writes them, leaving gaps in captured histogram data.

Useful? React with 👍 / 👎.

aiohttp fails at parse time when receiving zstd-compressed requests
without this backport of Python 3.14's compression.zstd module.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d38228daee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +302 to +303
if config.experiment_duration_secs > 0:
await asyncio.sleep(config.experiment_duration_secs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wake fixed-duration runs on shutdown

When a SIGTERM/SIGINT arrives during a configured experiment duration, the signal handler only sets signals.shutdown, but this branch is still awaiting asyncio.sleep(config.experiment_duration_secs). As a result, a normal 60-second (or longer) run keeps the main task asleep until the full duration elapses before it starts draining tasks, so shutdown requests are not honored promptly and --max-shutdown-delay cannot take effect for fixed-duration runs.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant