added oed test file generator #238

Open
benhayes21 wants to merge 2 commits into main from feature/oed_file_generator

Conversation

@benhayes21
Contributor

Add OED test data generator CLI command

Add a new ods_tools generate CLI command that produces synthetic OED test data files
(Loc, Acc, ReinsInfo, ReinsScope) from a JSON configuration. The generator creates files
with referential integrity across all four OED file types and supports configurable
portfolio structure, field selection, financial terms, and output format (csv/parquet).

Changes:

  • ods_tools/oed/oed_generator.py — New module containing OEDFileGenerator and
    DataGenerator classes that produce synthetic OED data driven by a JSON config. Schema
    loading updated to use the existing OedSchema class.
  • ods_tools/main.py — Registered the generate subcommand with options: --config,
    --output-dir, --format, --example-config, --list-versions, --list-fields, --oed-version.
  • tests/data/oed_generator_config.json — Example configuration file for the generator.
  • README.md — Documented all CLI commands (convert, check, transform, combine, generate).
    Previously only convert and transform were documented.

Usage:
ods_tools generate --config config.json --output-dir ./output
ods_tools generate --example-config > my_config.json
ods_tools generate --list-fields Loc --oed-version 4.0.0
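
As a rough illustration of what "referential integrity across all four OED file types" means, the sketch below builds a few linked rows by hand. It is not the code from oed_generator.py: the function name, its parameters, and the reduced set of columns are assumptions made for this example.

```python
import random


def generate_linked_rows(n_accounts=2, locs_per_account=3, n_treaties=2, seed=0):
    """Hypothetical helper: build minimal Acc, Loc, ReinsInfo and ReinsScope rows."""
    rng = random.Random(seed)

    # Account rows are created first so that other files can reference them.
    acc = [{"PortNumber": "P1", "AccNumber": f"A{i + 1}"} for i in range(n_accounts)]

    # Every location row reuses the keys of an existing account row.
    loc = [
        {"PortNumber": a["PortNumber"], "AccNumber": a["AccNumber"], "LocNumber": f"L{j + 1}"}
        for a in acc
        for j in range(locs_per_account)
    ]

    # ReinsScope rows only ever pick a ReinsNumber that exists in ReinsInfo.
    reins_info = [{"ReinsNumber": k + 1} for k in range(n_treaties)]
    reins_numbers = [r["ReinsNumber"] for r in reins_info]
    reins_scope = [
        {"ReinsNumber": rng.choice(reins_numbers), "AccNumber": a["AccNumber"]}
        for a in acc
    ]
    return acc, loc, reins_info, reins_scope
```

Generating the parent rows first and drawing child keys only from them is what keeps the files consistent with each other.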

@benhayes21
Contributor Author

closes #239

"ReinsType": [
"QS",
"CXL",
"PR"
Contributor

@sstruzik sstruzik Feb 24, 2026

Using the default config, I get the ReinsType values below, so this setting must not be taken into account:
QS
SS
AXL
This also matters because we currently don't support AXL, so on my machine exposure run on the default output fails.
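
One way to make the configured list authoritative is sketched below. This is only an assumption about how the fix could look: pick_reins_type and its default_types fallback are hypothetical names, and only the "ReinsType" key and its values come from the config quoted above.

```python
import random

# Config fragment as quoted in the review; everything else here is illustrative.
config = {"ReinsType": ["QS", "CXL", "PR"]}


def pick_reins_type(config, rng, default_types=("QS", "SS", "AXL")):
    """Prefer the configured ReinsType list; fall back to defaults only if it is missing or empty."""
    allowed = config.get("ReinsType") or list(default_types)
    return rng.choice(allowed)


rng = random.Random(42)
# Draw many values to confirm that nothing outside the configured list appears.
samples = {pick_reins_type(config, rng) for _ in range(100)}
```

With this shape, an unsupported type such as AXL can only appear if the config explicitly asks for it or omits the list entirely.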

if prefix:
    filename = f"{prefix}_{spec_key}"
else:
    filename = f"Source{spec_key}OED"
Contributor

I think the default filename should correspond to the usual filenames we have in ods_tools/oed/common.py:
USUAL_FILE_NAME = {
'location': ['location'],
'account': ['account'],
'ri_info': ['ri_info', 'reinsinfo'],
'ri_scope': ['ri_scope', 'reinsscope'],
}
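
A minimal sketch of that suggestion follows. USUAL_FILE_NAME is copied from the quote above so the example runs on its own; the SPEC_KEY_TO_OED_NAME mapping and the default_filename helper are hypothetical and assume the generator's spec keys are Loc, Acc, ReinsInfo and ReinsScope.

```python
# Copied from the reviewer's quote of ods_tools/oed/common.py.
USUAL_FILE_NAME = {
    'location': ['location'],
    'account': ['account'],
    'ri_info': ['ri_info', 'reinsinfo'],
    'ri_scope': ['ri_scope', 'reinsscope'],
}

# Hypothetical mapping from generator spec keys to the keys used above.
SPEC_KEY_TO_OED_NAME = {
    "Loc": "location",
    "Acc": "account",
    "ReinsInfo": "ri_info",
    "ReinsScope": "ri_scope",
}


def default_filename(spec_key, prefix=""):
    """Use the first usual file name for the type, optionally prefixed."""
    base = USUAL_FILE_NAME[SPEC_KEY_TO_OED_NAME[spec_key]][0]
    return f"{prefix}_{base}" if prefix else base
```

With an empty prefix this yields names such as location and ri_info, which is what lets a folder be used without renaming.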

"City"
],
"fixed_values": {},
"filename_prefix": "Test"
Contributor

I would set the value of every filename_prefix to "", so that by default the files match the usual file names and you can use exposure run directly on the folder without renaming.

if file_type == "ReinsScope":
    if field_name == "ReinsNumber":
        if self.context["reins_numbers"]:
            return self.generator.rng.choice(self.context["reins_numbers"])
Contributor

The issue with the ReinsType values probably needs to be handled here.
I'm not sure what the fixed_values logic is in the code, but we should explain it in the README.

Contributor

@sambles sambles left a comment

seems pretty good, a few minor comments from me


def _gen_reinsexpirydate(self, spec, file_type, row_idx, ctx):
    base = datetime(2026, 1, 1) + timedelta(days=self.rng.randint(0, 365))
    return base.strftime("%Y-%m-%d")
Contributor

I'm not a fan of these unused method params; I get that they are fetched as needed.

But maybe def _gen_reinsexpirydate(self, **kwargs):
vs
def _gen_reinsexpirydate(self, spec, file_type, row_idx, ctx): ?

and only list a param when it is actually used
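
The **kwargs idea could look roughly like the sketch below. The class name DataGeneratorSketch, the generate dispatcher, and the _gen_locnumber method are hypothetical and exist only to show the calling convention; only _gen_reinsexpirydate and its date logic come from the diff.

```python
import random
from datetime import datetime, timedelta


class DataGeneratorSketch:
    """Hypothetical stand-in for the PR's DataGenerator, reduced to two fields."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def generate(self, field_name, **context):
        # The dispatcher always passes the full context as keyword arguments.
        method = getattr(self, f"_gen_{field_name.lower()}")
        return method(**context)

    def _gen_reinsexpirydate(self, **kwargs):
        # Uses none of the context, so it names none of it.
        base = datetime(2026, 1, 1) + timedelta(days=self.rng.randint(0, 365))
        return base.strftime("%Y-%m-%d")

    def _gen_locnumber(self, row_idx, **kwargs):
        # Names only the one parameter it actually needs.
        return f"L{row_idx + 1}"


gen = DataGeneratorSketch()
loc_number = gen.generate("LocNumber", spec={}, file_type="Loc", row_idx=0, ctx={})
expiry = gen.generate("ReinsExpiryDate", spec={}, file_type="ReinsInfo", row_idx=0, ctx={})
```

The trade-off is that the dispatcher must call every generator with keyword arguments, since positional calls would no longer line up.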

# --- Date generators ---

def _gen_locexpirydate(self, spec, file_type, row_idx, ctx):
    return (datetime(2026, 1, 1) + timedelta(days=self.rng.randint(0, 365))).strftime("%Y-%m-%d")
Contributor

Also, it might be a good idea to make the base year of the date fields configurable instead of hard-coding datetime(2026, 1, 1). Again, a minor issue.
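
A possible shape for a configurable base year is sketched here. The random_expiry_date helper and the "date_base_year" config key are assumptions for illustration; the PR does not define them.

```python
import random
from datetime import datetime, timedelta


def random_expiry_date(rng, base_year=2026, max_offset_days=365):
    """Return a date string between 1 January of base_year and max_offset_days later."""
    start = datetime(base_year, 1, 1)
    offset = timedelta(days=rng.randint(0, max_offset_days))
    return (start + offset).strftime("%Y-%m-%d")


# "date_base_year" is a hypothetical config key, not one defined by the PR.
config = {"date_base_year": 2030}
date = random_expiry_date(random.Random(1), base_year=config.get("date_base_year", 2026))
```

Keeping 2026 as the default argument would preserve the current behaviour for configs that do not set the key.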

Comment thread ods_tools/main.py
formatter_class=argparse.RawTextHelpFormatter)
generate_command.add_argument('--config', help='Path to JSON configuration file', default=None)
generate_command.add_argument('--output-dir', help='Output directory for generated files (default: ./oed_output)',
default='./oed_output')
Contributor

./oed_output feels a bit too generic; maybe use something like generated_oed_output/<timestamp> as the default, so that repeated runs won't overwrite the same set of files.
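
A timestamped default could be computed along these lines. The default_output_dir helper and the timestamp format are assumptions; the reviewer only proposed the generated_oed_output/<timestamp> pattern.

```python
from datetime import datetime
from pathlib import Path


def default_output_dir(root="generated_oed_output", now=None):
    """Build a per-run output directory such as generated_oed_output/20260224_093000."""
    now = now or datetime.now()
    return Path(root) / now.strftime("%Y%m%d_%H%M%S")


# A fixed timestamp keeps the example deterministic.
example_dir = default_output_dir(now=datetime(2026, 2, 24, 9, 30, 0))
```

Because argparse evaluates a default once at parser construction, the helper would need to be called when the command runs, for example by leaving the argparse default as None and resolving it afterwards.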

4 participants