Releases: datasciencedynamics/eda_toolkit

EDA Toolkit 0.0.29

11 May 23:51

What's Changed

  • fix(table1): handle NaN categories in groupby by @lshpaner in #130
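
For context, pandas drops rows whose group key is NaN by default, which is one way a missing category can vanish from a grouped summary. The sketch below shows that general pandas behavior only; the column names are hypothetical, and it is an assumption that the fix in #130 relates to this default rather than a description of the actual patch.

```python
import pandas as pd

# Hypothetical data: two rows have a missing group label.
df = pd.DataFrame(
    {"group": ["a", "b", None, "a", None], "value": [1, 2, 3, 4, 5]}
)

# Default behavior: rows with a NaN key are silently excluded.
default_counts = df.groupby("group")["value"].count()

# dropna=False keeps NaN as its own group.
with_nan_counts = df.groupby("group", dropna=False)["value"].count()

print(default_counts)
print(with_nan_counts)
```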

Full Changelog: 0.0.28...0.0.29

EDA Toolkit 0.0.28

31 Mar 21:48

What's Changed

  • feat: grouped_distributions, table1, box_violin by @lshpaner in #128

Full Changelog: 0.0.27...0.0.28

EDA Toolkit 0.0.27

29 Mar 01:22

Fix missing thousands separator in continuous "Overall" column of generate_table1()

Inside generate_table1, the make_overall helper was formatting mean and SD with :.{n}f instead of :,.{n}f, so values like
189664.13 appeared without commas. Added the , flag to both format strings. No behavioral change to categorical Overall or any
other column.
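
The effect of the `,` flag can be reproduced with a minimal sketch. The helper name `format_mean_sd`, the "mean (SD)" layout, and the SD value are hypothetical and chosen for illustration; they are not taken from `make_overall`.

```python
def format_mean_sd(mean: float, sd: float, n: int = 2, thousands: bool = True) -> str:
    """Format a mean and SD as 'mean (sd)' with n decimal places."""
    # ",.{n}f" adds a thousands separator; ".{n}f" does not.
    spec = f",.{n}f" if thousands else f".{n}f"
    return f"{mean:{spec}} ({sd:{spec}})"


before = format_mean_sd(189664.13, 2501.5, thousands=False)
after = format_mean_sd(189664.13, 2501.5)

print(before)  # 189664.13 (2501.50)
print(after)   # 189,664.13 (2,501.50)
```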

EDA Toolkit 0.0.26

29 Mar 00:06

What's Changed

  • ENH: Add Overall column to generate_table1 by @lshpaner in #126
  • refactor: remove duplicate _flag_iqr and _flag_zscore definitions from detect_outliers by @lshpaner in #127

Full Changelog: 0.0.25...0.0.26

EDA Toolkit 0.0.25

24 Mar 05:51

Fix: correct p-value precision loss in Bonferroni/BH adjustment

  • Previously, p-values were rounded before being fed into the multiple comparison correction block, causing small but meaningful errors in adjusted values. Raw p-values are now stored as _raw_pval internally, the correction block reads from those, and a cleanup pass removes the key before DataFrame construction. All rounding is handled consistently via decimal_places at the formatting stage.
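
To see why the order of rounding and correction matters, here is a minimal standard-library sketch. The correction functions are textbook Bonferroni and Benjamini-Hochberg, not the library's code; the row layout, the `P-value (adj)` key, and the sample p-values are assumptions made for illustration. Only the `_raw_pval` key and the `decimal_places` name come from the note above.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


decimal_places = 3
raw = [0.00049, 0.0124, 0.0349, 0.2]

# Lossy order: round first, then correct.
rounded_first = [round(p, decimal_places) for p in raw]
lossy = [round(p, decimal_places) for p in bonferroni(rounded_first)]

# Order described above: keep raw values, correct, round only when formatting.
rows = [{"Variable": f"x{i}", "_raw_pval": p} for i, p in enumerate(raw)]
for row, adj in zip(rows, bonferroni([r["_raw_pval"] for r in rows])):
    row["P-value (adj)"] = round(adj, decimal_places)
    row.pop("_raw_pval")  # cleanup pass before building the table
correct = [r["P-value (adj)"] for r in rows]

print(lossy)    # [0.0, 0.048, 0.14, 0.8]
print(correct)  # [0.002, 0.05, 0.14, 0.8]
```

With these inputs the smallest p-value collapses to 0.0 when rounded first, and the second adjusted value differs in the third decimal place.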

Added new normality_tests() function to eda_toolkit.

New Features

  • Batch normality testing across numeric columns via Shapiro-Wilk,
    D'Agostino K², and Anderson-Darling
  • alpha, tests, features, and decimal_places parameters
  • Shapiro-Wilk auto-skipped for n > 5,000 with printed explanation
  • Anderson-Darling uses method="interpolate" on scipy >= 1.17
    for real p-values; falls back to critical value comparison on
    older scipy
  • Returns tidy summary DataFrame with Variable, Test,
    Statistic, P-value, Normal columns
  • 40 pytest tests added
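
A rough sketch of the batch pattern described above, written directly against SciPy rather than `normality_tests()` itself, whose exact signature is not shown here. The function name `normality_summary`, its parameters, and the list-of-dicts return type are assumptions for illustration; Anderson-Darling is left out because its handling depends on the SciPy version.

```python
import numpy as np
from scipy import stats


def normality_summary(data, alpha=0.05, decimal_places=4, shapiro_max_n=5000):
    """Run Shapiro-Wilk and D'Agostino K² on each variable (hypothetical helper)."""
    rows = []
    for name, values in data.items():
        x = np.asarray(values, dtype=float)
        x = x[~np.isnan(x)]  # drop missing values before testing
        results = []
        if x.size <= shapiro_max_n:
            results.append(("Shapiro-Wilk", stats.shapiro(x)))
        else:
            print(f"{name}: skipping Shapiro-Wilk (n={x.size} > {shapiro_max_n})")
        results.append(("D'Agostino K²", stats.normaltest(x)))
        for test_name, (stat, p) in results:
            rows.append(
                {
                    "Variable": name,
                    "Test": test_name,
                    "Statistic": round(float(stat), decimal_places),
                    "P-value": round(float(p), decimal_places),
                    "Normal": bool(p > alpha),  # fail to reject normality
                }
            )
    return rows


rng = np.random.default_rng(0)
summary = normality_summary(
    {"small": rng.normal(size=200), "large": rng.normal(size=6000)}
)
for row in summary:
    print(row)
```

The rows can be passed to `pandas.DataFrame(summary)` to get a tidy table with the same five columns.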

EDA Toolkit 0.0.24

22 Mar 20:40

Version 0.0.24

New Features

  • detect_outliers(): New batch outlier detection function supporting
    IQR, Z-score, and Isolation Forest methods with groupby, flag_col,
    return_mask, return_bounds, and verbose parameters
  • flex_corr_matrix(): Added corr_method (pearson/spearman/kendall),
    show_significance, significance_level, significance_method (stars/mask),
    significance_legend_x, filter_significance, and image_filename parameters
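
As background for the IQR and Z-score methods named above, here is a minimal standard-library sketch of both rules. The function names `flag_iqr` and `flag_zscore`, the `k` and `threshold` parameters, and the sample data are hypothetical; `detect_outliers()` may compute quartiles and bounds differently.

```python
import statistics


def flag_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; return the mask and bounds."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values], (lower, upper)


def flag_zscore(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [False] * len(values)  # constant data has no outliers
    return [abs((v - mean) / sd) > threshold for v in values]


values = [10, 12, 11, 13, 12, 11, 95]

iqr_mask, bounds = flag_iqr(values)
# A lower threshold is used here because z-scores are bounded in tiny samples.
z_mask = flag_zscore(values, threshold=2.0)

print(bounds)    # (8.75, 14.75)
print(iqr_mask)  # only the last value is flagged
print(z_mask)    # only the last value is flagged
```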

Enhancements

  • read_csv_with_progress(): Added chunksize, low_memory, and **kwargs
    passthrough to pd.read_csv

Bug Fixes

  • flex_corr_matrix(): Fixed title misalignment, -0.00 display artifact,
    diagonal now shows 1.00 in both stars and mask modes; fixed abs() stripping
    negative signs from significance annotations
  • detect_outliers(): Added observed=True to groupby calls to silence
    pandas FutureWarning
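
The `observed=True` change can be illustrated with plain pandas. When the grouping column is categorical, recent pandas 2.x releases warn if `observed` is not passed, because the default is changing. The column names and data below are hypothetical and the sketch does not reproduce `detect_outliers()` itself.

```python
import pandas as pd

# Hypothetical data: "east" is a declared category with no rows.
df = pd.DataFrame(
    {
        "site": pd.Categorical(
            ["north", "south", "north"],
            categories=["north", "south", "east"],
        ),
        "value": [1.0, 2.0, 3.0],
    }
)

# observed=True keeps only categories that actually appear in the data.
observed_only = df.groupby("site", observed=True)["value"].mean()

# observed=False also returns the unused "east" category, with NaN as its mean.
all_categories = df.groupby("site", observed=False)["value"].mean()

print(observed_only)
print(all_categories)
```

Passing the argument explicitly in either form avoids the warning; `observed=True` also avoids empty groups in per-group outlier bounds.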

Dependency Updates

  • Removed upper caps on matplotlib, scikit-learn, and tqdm to prevent
    Colab downgrades

Infrastructure

  • _data_manager_utils.py: Extracted _flag_iqr and _flag_zscore helpers
  • New pytest tests added across detect_outliers, flex_corr_matrix,
    and read_csv_with_progress
  • README stripped to lean landing page; full docs at
    datasciencedynamics.com/eda_toolkit_docs/

EDA Toolkit 0.0.23

21 Mar 21:00

What's Changed

Full Changelog: 0.0.22...0.0.23

EDA Toolkit 0.0.22

20 Dec 20:03

  • Renamed conditional_histograms to grouped_distributions
  • Updated SciPy version requirement in README

EDA Toolkit 0.0.21

20 Dec 18:49

  • Fixed dependency resolution for Python 3.11+ by updating SciPy version constraints.
  • Improved compatibility with Google Colab and Python 3.12 environments.
  • Updated package acknowledgements and metadata in __init__.py.
  • No API changes.

EDA Toolkit 0.0.20

20 Dec 16:40

What's Changed

  • flexible summarize_combinations sans excel by @lshpaner in #104
  • Groupby imputer by @Oscar-Gil-Data in #109
  • Refactor kde_distributions and extract density overlay logic into plot utilities by @lshpaner in #110
  • Introduce Enhanced Q-Q Plot Support for Distribution GOF Diagnostics by @lshpaner in #111
  • Add ECDF plot type to data_doctor; add SciPy to requirements by @Oscar-Gil-Data in #112
  • Add conditional histograms and unify figure-saving utilities by @lshpaner in #114
  • DataFrame memory management utility, del_inactive_dataframes by @Oscar-Gil-Data in #115

Full Changelog: 0.0.19...0.0.20