Store the intermediary results of the custom synthesizers, but the generated result file cannot be opened #321

@T0217

Environment Details

  • SDGym version: 0.8.0
  • Python version: 3.11.5
  • Operating System: Windows 11

Error Description

Thank you for sharing the code. When benchmarking a custom synthesizer in SDGym, I need to store the intermediate results. However, the result files that are generated cannot be opened, and I cannot find the code that generates them. What should I do?

Steps to reproduce

import os
import shutil
import sdgym
from sdgym import create_single_table_synthesizer
from sdgym.synthesizers import (UniformSynthesizer,
                                GaussianCopulaSynthesizer,
                                TVAESynthesizer)
import warnings
warnings.filterwarnings('ignore')

synthesizers = [
    UniformSynthesizer,
    GaussianCopulaSynthesizer,
    TVAESynthesizer
]


# Custom synthesizer: CTGAN from the ydata-synthetic package
def ctgan_get_trained_synthesizer(data, metadata):
    from ydata_synthetic.synthesizers.regular import RegularSynthesizer
    from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

    ctgan_args = ModelParameters(batch_size=500, lr=2e-4, betas=(0.5, 0.9))
    train_args = TrainParameters(epochs=2)

    synthesizer = RegularSynthesizer(modelname='ctgan', model_parameters=ctgan_args)

    num_cols = [col for col, sdtype in metadata['columns'].items() if sdtype['sdtype'] in ['numerical', 'datetime']]
    cat_cols = [col for col, sdtype in metadata['columns'].items() if sdtype['sdtype'] == 'categorical']

    synthesizer.fit(data=data,
                    train_arguments=train_args,
                    num_cols=num_cols,
                    cat_cols=cat_cols)

    return synthesizer


def sample_from_synthesizer(synthesizer, n_rows):
    synthetic_data = synthesizer.sample(n_rows)
    return synthetic_data


YData_CTGANSynthesizer = create_single_table_synthesizer(
    get_trained_synthesizer_fn=ctgan_get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='YData-CTGAN'
)


custom_synthesizers = [YData_CTGANSynthesizer]

# Delete the results folder if it already exists, so each run starts clean
detailed_results_folder = r"C:\Users\18840\Desktop\result"

# os.path.isdir() already returns False for a missing path, so no separate exists() check is needed
if os.path.isdir(detailed_results_folder):
    print('The folder for the intermediate files already exists and will be deleted.')
    shutil.rmtree(detailed_results_folder, ignore_errors=True)
    print('-' * 50)

results = sdgym.benchmark_single_table(
    synthesizers=synthesizers,
    custom_synthesizers=custom_synthesizers,
    show_progress=True,
    multi_processing_config={
        'package_name': 'multiprocessing',
        'num_workers': 8
    },
    sdv_datasets=['adult'],
    detailed_results_folder=detailed_results_folder
)

Here is an example of the output files:
[Screenshot: Snipaste_2024-07-03_13-02-11]
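
One way to narrow down why the files will not open, without relying on any knowledge of how SDGym writes them, is to look at the first bytes of each file in the results folder. The sketch below uses only the Python standard library. The helper names `describe_result_files` and `read_gzip_bytes` are hypothetical and not part of SDGym, and the idea that some of the files might be gzip-compressed is an assumption to be checked, not a documented fact about the output format.

```python
import gzip
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b'\x1f\x8b'


def describe_result_files(folder):
    """Return {file name: 'gzip' or 'not gzip'} for each file directly in folder."""
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with open(path, 'rb') as handle:
            head = handle.read(2)
        report[path.name] = 'gzip' if head == GZIP_MAGIC else 'not gzip'
    return report


def read_gzip_bytes(path):
    """Decompress a gzip file and return its raw payload bytes."""
    with gzip.open(path, 'rb') as handle:
        return handle.read()
```

Calling `describe_result_files(detailed_results_folder)` after the benchmark finishes would show which files a text editor can be expected to read directly and which need decompressing first. What the decompressed payload contains (for example a serialized table) still has to be determined separately.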

Labels: bug, new
