Add DSQL loader operations reference#176

Draft
amaksimo wants to merge 1 commit into main from improve-dsql-loader-docs
@amaksimo
Contributor

Summary

The current DSQL skill documents the loader at a basic invocation level (~30 lines in connectivity-tools.md) but does not cover the operational knowledge that determines whether a load succeeds and at what rate. This PR adds a dedicated references/data-loading.md with the operational details an agent needs when planning or diagnosing a load.

New file: plugins/databases-on-aws/skills/dsql/references/data-loading.md

Topics covered:

  • Fresh-vs-warm partition behavior — DSQL tables start on a single partition; sustained writes are required to drive splits; a fresh table absorbs roughly 3-4K rec/s from a single client. This is DSQL-side behavior, so any client-side tuning advice without it leads to misdiagnosis.
  • Resume and retry mechanics: --manifest-dir, --resume-job-id, --keep-manifest. Includes the Amazon Linux 2023 /tmp tmpfs gotcha that silently makes resume impossible across an OOM kill, SIGKILL, or reboot.
  • --on-conflict do-nothing semantics — when it is safe to use and a common pitfall (duplicate-PK rows in the source).
  • Schema inference caveats — when --dry-run should be used to surface the inferred schema first.
  • Index count → throughput — the 1 + num_indexes write cost and when to defer index creation with CREATE INDEX ASYNC.
  • Diagnostic decision tree — five symptom→cause mappings for slow or unexpected loads.
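
For the /tmp tmpfs item in the list above, one way to check where a manifest directory would live is to look up its mount in /proc/mounts. The following is a minimal sketch, assuming a Linux host and the standard /proc/mounts field layout (device, mount point, filesystem type). The function name `filesystem_type` is hypothetical and not part of the loader.

```python
import os


def filesystem_type(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated, with the mount point in the
    second field and the filesystem type in the third.
    """
    target = os.path.normpath(path) + "/"
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a path prefix of the target;
        # on ties, the later line wins because it was mounted on top.
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

A caller could read /proc/mounts, pass `os.path.realpath()` of the directory intended for --manifest-dir, and treat a result of `"tmpfs"` as a sign that the manifest will not survive a reboot.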

The existing loader entry in connectivity-tools.md keeps its install/quickstart content and now links out to the new reference. SKILL.md gets a one-entry index update.
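
To make the 1 + num_indexes write cost from the topic list concrete, the arithmetic can be sketched as below. This is illustrative only: the function names are hypothetical, and the assumption that row throughput falls linearly with the per-row write count is a simplification made here, not a measured DSQL characteristic.

```python
def writes_per_row(num_indexes: int) -> int:
    """Writes incurred per inserted row: one for the table plus one per index."""
    if num_indexes < 0:
        raise ValueError("num_indexes must be non-negative")
    return 1 + num_indexes


def estimated_rows_per_second(write_budget_per_second: float, num_indexes: int) -> float:
    """Rough row rate, ASSUMING a fixed write budget shared evenly across writes."""
    return write_budget_per_second / writes_per_row(num_indexes)
```

Under that assumption, a table with three indexes costs four writes per row, so a hypothetical budget of 4000 writes per second would yield about 1000 rows per second. Deferring index creation with CREATE INDEX ASYNC removes those extra writes from the load itself.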

Test plan

  • Render the new markdown locally; verify all internal links resolve (scaling-guide.md, development-guide.md, connectivity-tools.md).
  • Confirm CREATE INDEX ASYNC references match existing usage in the skill.
  • Confirm no project-specific or vendor-specific content snuck in (the content was distilled from a separate cost-estimator project; only DSQL/loader knowledge that applies to any user was kept).
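
The link check in the first test-plan item could be scripted along these lines. This is a minimal sketch, assuming inline Markdown links of the form `[text](target)` and targets relative to the file's own directory; `broken_links` is a hypothetical helper, not something that exists in the repository.

```python
import re
from pathlib import Path

# Matches inline Markdown links and captures the target inside the parentheses.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in `markdown_file` that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        # External URLs and same-page anchors are out of scope for this check.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        relative = target.split("#", 1)[0]  # drop any #fragment
        if not (markdown_file.parent / relative).exists():
            missing.append(target)
    return missing
```

Running it against references/data-loading.md should return an empty list if scaling-guide.md, development-guide.md, and connectivity-tools.md all resolve. It does not validate heading anchors or reference-style links.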

Adds references/data-loading.md covering the operational knowledge that
isn't in connectivity-tools.md today: fresh-vs-warm partition behavior
and the ~3-4K rec/s single-partition ceiling, --manifest-dir / --resume-job-id
/ --keep-manifest semantics (including the /tmp tmpfs gotcha on AL2023),
--on-conflict do-nothing safety conditions, schema inference caveats with
--dry-run, the 1+num_indexes write cost, and a five-symptom diagnostic
decision tree.

connectivity-tools.md keeps its short loader entry and links out.
SKILL.md gets a new index entry.
@amaksimo amaksimo requested review from a team as code owners May 27, 2026 23:25
@amaksimo amaksimo marked this pull request as draft May 27, 2026 23:30