Releases · datasciencedynamics/eda_toolkit

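EDA Toolkit 0.0.27

Fix missing thousands separator in continuous "Overall" column of `generate_table1()`

Inside `generate_table1()`, the `make_overall` helper formatted the mean and SD with `:.{n}f` instead of `:,.{n}f`, so values such as 189664.13 were displayed without thousands separators. The `,` flag has been added to both format strings. There is no behavioral change to the categorical Overall output or to any other column.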
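For illustration, the effect of the `,` flag can be reproduced with plain f-strings. `fmt_mean_sd` below is a hypothetical helper, not the toolkit's `make_overall`, and the "mean (SD)" layout is an assumption made for the sketch.

```python
def fmt_mean_sd(mean, sd, n=2):
    """Format "mean (SD)" with thousands separators via the `:,.{n}f` spec."""
    # "," inserts thousands separators; ".{n}f" fixes n decimal places.
    return f"{mean:,.{n}f} ({sd:,.{n}f})"


value = 189664.13
n = 2
print(f"{value:.{n}f}")    # 189664.13   (old spec: no separators)
print(f"{value:,.{n}f}")   # 189,664.13  (new spec)
print(fmt_mean_sd(189664.13, 45210.5))  # 189,664.13 (45,210.50)
```

The nested `{n}` inside the format spec is what lets a single `decimal_places`-style setting drive both the precision and the separator behavior.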


29 Mar 00:06

lshpaner

0.0.26

1000726

EDA Toolkit 0.0.26

What's Changed

ENH: Add Overall column to generate_table1 by @lshpaner in #126
refactor: remove duplicate _flag_iqr and _flag_zscore definitions from detect_outliers by @lshpaner in #127

Full Changelog: 0.0.25...0.0.26

Contributors

lshpaner


24 Mar 05:51

lshpaner

0.0.25

44d814c

EDA Toolkit 0.0.25

Fix: correct p-value precision loss in Bonferroni/BH adjustment

Previously, p-values were rounded before being passed to the multiple-comparison correction block, introducing small but meaningful errors in the adjusted values. Raw p-values are now stored internally under `_raw_pval`, the correction block reads from those, and a cleanup pass removes the key before the DataFrame is constructed. All rounding is now applied once, via `decimal_places`, at the formatting stage.
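A minimal pure-Python sketch of the fix pattern: adjust from the unrounded `_raw_pval` values, round once at the end, then drop the key. The helper names (`bonferroni`, `benjamini_hochberg`, `adjust_rows`) and the `Adjusted P-value` column are hypothetical; this is an illustration, not the toolkit's internal code.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_rows(rows, method="bh", decimal_places=3):
    """Adjust from unrounded `_raw_pval`, round once, then drop the key."""
    raw = [row["_raw_pval"] for row in rows]
    adjusted = benjamini_hochberg(raw) if method == "bh" else bonferroni(raw)
    for row, p_raw, p_adj in zip(rows, raw, adjusted):
        row["P-value"] = round(p_raw, decimal_places)
        row["Adjusted P-value"] = round(p_adj, decimal_places)
        del row["_raw_pval"]  # cleanup before building a DataFrame
    return rows


raw = [0.00041, 0.0012, 0.0049]
# Rounding first (the old behavior) loses precision before adjustment:
print([round(p, 3) for p in bonferroni([round(p, 3) for p in raw])])  # [0.0, 0.003, 0.015]
rows = [{"Variable": v, "_raw_pval": p} for v, p in zip("abc", raw)]
print([r["Adjusted P-value"] for r in adjust_rows(rows, method="bonferroni")])  # [0.001, 0.004, 0.015]
```

With three tests, rounding 0.00041 to 0.0 before multiplying erases it entirely, while the raw-first path keeps it at 0.001 after adjustment.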

Added new `normality_tests()` function to `eda_toolkit`.

New Features

Batch normality testing across numeric columns via Shapiro-Wilk, D'Agostino K², and Anderson-Darling
`alpha`, `tests`, `features`, and `decimal_places` parameters
Shapiro-Wilk is skipped automatically when n > 5,000, with a printed explanation
Anderson-Darling uses `method="interpolate"` on SciPy >= 1.17 to report actual p-values; on older SciPy it falls back to critical-value comparison
Returns a tidy summary DataFrame with Variable, Test, Statistic, P-value, and Normal columns
40 pytest tests added
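A rough sketch of the batch pattern using SciPy's `stats.shapiro` and `stats.normaltest` (D'Agostino K²). `normality_summary` is a hypothetical stand-in, not `normality_tests()` itself: it takes a dict of columns instead of a DataFrame, returns plain dicts with the same column names, omits Anderson-Darling, and assumes "Normal" means p > alpha.

```python
import random

from scipy import stats


def normality_summary(columns, alpha=0.05, decimal_places=4, shapiro_max_n=5000):
    """Run Shapiro-Wilk and D'Agostino K² on each column of numbers.

    `columns` maps a column name to a list of numeric values (hypothetical
    input shape). Returns one tidy row per (variable, test) pair.
    """
    rows = []
    for name, values in columns.items():
        n = len(values)
        tests = []
        if n <= shapiro_max_n:
            tests.append(("Shapiro-Wilk", stats.shapiro))
        else:
            # Mirrors the release note: skip Shapiro-Wilk for large n and say why.
            print(f"{name}: Shapiro-Wilk skipped (n = {n:,} > {shapiro_max_n:,}).")
        tests.append(("D'Agostino K²", stats.normaltest))

        for test_name, test_fn in tests:
            statistic, p_value = test_fn(values)
            rows.append({
                "Variable": name,
                "Test": test_name,
                "Statistic": round(float(statistic), decimal_places),
                "P-value": round(float(p_value), decimal_places),
                # Assumed rule: normality is "not rejected" when p > alpha.
                "Normal": bool(p_value > alpha),
            })
    return rows


# Usage: one small column (both tests) and one large column (Shapiro skipped).
rng = random.Random(42)
columns = {
    "small": [rng.gauss(0, 1) for _ in range(200)],
    "large": [rng.gauss(0, 1) for _ in range(6000)],
}
for row in normality_summary(columns):
    print(row)
```

Wrapping the returned list in `pd.DataFrame(rows)` gives the tidy long format described above. The 5,000 cutoff lines up with SciPy's own documented caveat that Shapiro-Wilk p-values may be inaccurate for N > 5000.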


22 Mar 20:40

lshpaner

0.0.24

e34f29a

EDA Toolkit 0.0.24

Version 0.0.24

New Features

detect_outliers(): New batch outlier detection function supporting IQR, Z-score, and Isolation Forest methods, with groupby, flag_col, return_mask, return_bounds, and verbose parameters
flex_corr_matrix(): Added corr_method (pearson/spearman/kendall), show_significance, significance_level, significance_method (stars/mask), significance_legend_x, filter_significance, and image_filename parameters
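The IQR and Z-score rules named above can be sketched with the standard library alone. `iqr_bounds`, `flag_iqr`, and `flag_zscore` are hypothetical illustrations, not the toolkit's `_flag_iqr`/`_flag_zscore`; the 1.5 * IQR fences and the |z| > 3 threshold are common defaults assumed here, and Isolation Forest and groupby support are left out.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Tukey fences: (Q1 - k * IQR, Q3 + k * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_iqr(values, k=1.5):
    """True where a value falls outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v < low or v > high for v in values]


def flag_zscore(values, threshold=3.0):
    """True where |value - mean| / sample SD exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > threshold for v in values]


values = [10, 12, 11, 13, 12, 11, 95]
print(iqr_bounds(values))   # (8.75, 14.75)
print(flag_iqr(values))     # only the last value (95) is flagged
print(flag_zscore(values))  # nothing flagged at |z| > 3
```

The Z-score result is not a bug: with a sample SD, no point in a sample of n values can exceed |z| = (n - 1) / sqrt(n), which is about 2.27 for n = 7. This is one reason the IQR rule is often preferred for small groups.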

Enhancements

read_csv_with_progress(): Added chunksize, low_memory, and **kwargs passthrough to pd.read_csv
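A minimal sketch of chunked reading with keyword passthrough, built on the iterator that `pd.read_csv` returns when `chunksize` is set. `read_csv_in_chunks` and its `on_chunk` progress hook are hypothetical and not the toolkit's `read_csv_with_progress()`; a progress bar could be driven from the hook.

```python
import io

import pandas as pd


def read_csv_in_chunks(source, chunksize=10_000, on_chunk=None, **kwargs):
    """Read a CSV chunk by chunk, then concatenate into one DataFrame.

    Extra keyword arguments (e.g. low_memory, usecols, dtype) are passed
    straight through to pd.read_csv. `on_chunk(rows_read)` is an optional
    progress hook.
    """
    chunks = []
    rows_read = 0
    # With chunksize set, pd.read_csv returns an iterator of DataFrames.
    with pd.read_csv(source, chunksize=chunksize, **kwargs) as reader:
        for chunk in reader:
            chunks.append(chunk)
            rows_read += len(chunk)
            if on_chunk is not None:
                on_chunk(rows_read)
    if not chunks:  # no data rows at all
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)


csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(5))
progress = []
df = read_csv_in_chunks(
    io.StringIO(csv_text), chunksize=2, on_chunk=progress.append, low_memory=False
)
print(df.shape)   # (5, 2)
print(progress)   # [2, 4, 5]
```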

Bug Fixes

flex_corr_matrix(): Fixed title misalignment and the -0.00 display artifact; the diagonal now shows 1.00 in both stars and mask modes; abs() no longer strips negative signs from significance annotations
detect_outliers(): Added observed=True to groupby calls to silence a pandas FutureWarning
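The two annotation fixes can be illustrated with a hypothetical cell formatter: rounding and then adding `0.0` turns `-0.0` into `0.0`, so tiny negative correlations no longer render as `-0.00`, and `r` itself is formatted rather than `abs(r)`, so the sign survives. The star thresholds (0.05 / 0.01 / 0.001) are a common convention assumed here, not taken from `flex_corr_matrix()`, and diagonal handling is not shown.

```python
def significance_stars(p, levels=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Map a p-value to stars; the thresholds are an assumed convention."""
    for cutoff, stars in levels:
        if p < cutoff:
            return stars
    return ""


def annotate(r, p, decimals=2):
    """Format one correlation cell: keep the sign, avoid "-0.00", add stars."""
    # round() can return -0.0; adding 0.0 normalizes it to +0.0.
    r = round(r, decimals) + 0.0
    # Format r itself (not abs(r)) so negative correlations keep their sign.
    return f"{r:.{decimals}f}{significance_stars(p)}"


print(annotate(-0.004, 0.62))   # 0.00
print(annotate(-0.45, 0.0004))  # -0.45***
print(annotate(0.3, 0.03))      # 0.30*
```

On Python 3.11+, the `z` format option (PEP 682), as in `f"{x:z.2f}"`, coerces negative zero the same way; the `+ 0.0` approach also works on older interpreters such as the 3.8 floor set in 0.0.23.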

Dependency Updates

Removed upper version caps on matplotlib, scikit-learn, and tqdm to prevent downgrades in Colab

Infrastructure

_data_manager_utils.py: Extracted _flag_iqr and _flag_zscore helpers
New pytest tests added across detect_outliers, flex_corr_matrix, and read_csv_with_progress
README trimmed to a lean landing page; full docs at datasciencedynamics.com/eda_toolkit_docs/


21 Mar 21:00

lshpaner

0.0.23

881e01c

EDA Toolkit 0.0.23

What's Changed

enhance stacked_crosstab_plot params & visuals by @lshpaner in #119
fix deps, bump python floor to 3.8 by @lshpaner in #120
fix: patch box_violin_plot bugs and add features by @lshpaner in #121
(+) read_csv_with_progress by @lshpaner in #122
enhance flex_corr_matrix with sig & filters by @lshpaner in #123
Fix generate table1 ls by @lshpaner in #124
add detect_outliers with iqr/zscore/isoforest by @lshpaner in #125

Full Changelog: 0.0.22...0.0.23

Contributors

lshpaner


20 Dec 20:03

lshpaner

0.0.22

4009c27

EDA Toolkit 0.0.22

Renamed conditional_histograms to grouped_distributions
Updated SciPy version requirement in README


20 Dec 18:49

lshpaner

0.0.21

29ddced

EDA Toolkit 0.0.21

Fixed dependency resolution for Python 3.11+ by updating SciPy version constraints.
Improved compatibility with Google Colab and Python 3.12 environments.
Updated package acknowledgements and metadata in __init__.py.
No API changes.


20 Dec 16:40

lshpaner

0.0.20

ad7a926

EDA Toolkit 0.0.20

What's Changed

flexible summarize_combinations sans excel by @lshpaner in #104
Groupby imputer by @Oscar-Gil-Data in #109
Refactor kde_distributions and extract density overlay logic into plot utilities by @lshpaner in #110
Introduce Enhanced Q-Q Plot Support for Distribution GOF Diagnostics by @lshpaner in #111
Add ECDF plot type to data_doctor; add SciPy to requirements by @Oscar-Gil-Data in #112
Add conditional histograms and unify figure-saving utilities by @lshpaner in #114
DataFrame memory management utility, del_inactive_dataframes by @Oscar-Gil-Data in #115

Full Changelog: 0.0.19...0.0.20

Contributors

Oscar-Gil-Data and lshpaner

