Releases: datasciencedynamics/eda_toolkit
EDA Toolkit 0.0.29
What's Changed
Full Changelog: 0.0.28...0.0.29
EDA Toolkit 0.0.28
What's Changed
Full Changelog: 0.0.27...0.0.28
EDA Toolkit 0.0.27
Fix missing thousands separator in continuous "Overall" column of generate_table1()
Inside generate_table1, the make_overall helper was formatting mean and SD with :.{n}f instead of :,.{n}f, so values like
189664.13 appeared without commas. Added the , flag to both format strings. No behavioral change to categorical Overall or any
other column.
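As an illustration of the format-spec change, here is a minimal sketch. The helper name `format_mean_sd` and its "mean (SD)" layout are assumptions for this example, not the actual `make_overall` code.

```python
def format_mean_sd(mean: float, sd: float, n: int = 2) -> str:
    """Hypothetical helper: format mean and SD with a thousands separator."""
    # The "," flag in the format spec inserts the thousands separator.
    return f"{mean:,.{n}f} ({sd:,.{n}f})"


n = 2
without_sep = f"{189664.13:.{n}f}"   # old spec, no comma
with_sep = f"{189664.13:,.{n}f}"     # new spec, with comma

print(without_sep)
print(with_sep)
print(format_mean_sd(189664.13, 1234.5))
```

The only difference between the two specs is the `,` placed before the precision.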
EDA Toolkit 0.0.26
What's Changed
- ENH: Add Overall column to generate_table1 by @lshpaner in #126
- refactor: remove duplicate _flag_iqr and _flag_zscore definitions from detect_outliers by @lshpaner in #127
Full Changelog: 0.0.25...0.0.26
EDA Toolkit 0.0.25
Fix: correct p-value precision loss in Bonferroni/BH adjustment
- Previously, p-values were rounded before being fed into the multiple comparison correction block, causing small but meaningful errors in adjusted values. Raw p-values are now stored as _raw_pval internally, the correction block reads from those, and a cleanup pass removes the key before DataFrame construction. All rounding is handled consistently via decimal_places at the formatting stage.
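To show why the order of rounding and adjustment matters, the sketch below compares "round, then adjust" with "adjust, then round". The functions `bonferroni` and `benjamini_hochberg` are plain NumPy stand-ins written for this example, and the p-values are made up; none of this is the package's internal code.

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


raw = [0.0004, 0.0126, 0.0349]   # illustrative raw p-values
decimal_places = 3

# Old order: round first, then adjust (loses precision).
rounded_first = bonferroni(np.round(raw, decimal_places))
# New order: adjust the raw values, round only for display.
rounded_last = np.round(bonferroni(raw), decimal_places)

print(rounded_first)
print(rounded_last)
```

With these inputs the smallest p-value rounds to zero before adjustment, so its adjusted value is reported as exactly 0 instead of a small positive number.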
Added new normality_tests() function to eda_toolkit.
New Features
- Batch normality testing across numeric columns via Shapiro-Wilk, D'Agostino K², and Anderson-Darling
- alpha, tests, features, and decimal_places parameters
- Shapiro-Wilk auto-skipped for n > 5,000 with printed explanation
- Anderson-Darling uses method="interpolate" on scipy >= 1.17 for real p-values; falls back to critical value comparison on older scipy
- Returns tidy summary DataFrame with Variable, Test, Statistic, P-value, Normal columns
- 40 pytest tests added
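The following sketch shows the general shape of batch normality testing with SciPy. It is not the `normality_tests()` implementation: the function name `normality_sketch`, its defaults, and the `shapiro_max_n` parameter are assumptions for this example, and Anderson-Darling is left out to keep the example independent of the SciPy version.

```python
import numpy as np
import pandas as pd
from scipy import stats


def normality_sketch(df, alpha=0.05, decimal_places=4, shapiro_max_n=5000):
    """Hypothetical helper: run normality tests on every numeric column."""
    rows = []
    for col in df.select_dtypes(include="number").columns:
        x = df[col].dropna().to_numpy()
        tests = {"D'Agostino K²": stats.normaltest}
        if x.size <= shapiro_max_n:
            tests["Shapiro-Wilk"] = stats.shapiro
        else:
            print(f"{col}: skipping Shapiro-Wilk (n={x.size} > {shapiro_max_n})")
        for name, test_fn in tests.items():
            stat, p = test_fn(x)
            rows.append(
                {
                    "Variable": col,
                    "Test": name,
                    "Statistic": round(float(stat), decimal_places),
                    "P-value": round(float(p), decimal_places),
                    # Decide on the raw p-value, round only for display.
                    "Normal": bool(p > alpha),
                }
            )
    return pd.DataFrame(
        rows, columns=["Variable", "Test", "Statistic", "P-value", "Normal"]
    )


rng = np.random.default_rng(0)
demo = pd.DataFrame({"x": rng.normal(size=200), "label": ["a"] * 200})
summary = normality_sketch(demo)
print(summary)
```

Non-numeric columns such as `label` are ignored, and each numeric column contributes one row per test that was run.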
EDA Toolkit 0.0.24
Version 0.0.24
New Features
- detect_outliers(): New batch outlier detection function supporting IQR, Z-score, and Isolation Forest methods with groupby, flag_col, return_mask, return_bounds, and verbose parameters
- flex_corr_matrix(): Added corr_method (pearson/spearman/kendall), show_significance, significance_level, significance_method (stars/mask), significance_legend_x, filter_significance, and image_filename parameters
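For readers unfamiliar with the IQR and Z-score rules, here is a minimal sketch of both as boolean masks. The helpers `iqr_mask` and `zscore_mask`, their default thresholds, and the sample data are assumptions for this example and do not reproduce the `detect_outliers()` signature or internals.

```python
import pandas as pd


def iqr_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_mask(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Hypothetical helper: flag values whose |z-score| exceeds threshold."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold


values = pd.Series([10, 11, 12, 13, 14, 15, 100])
iqr_flags = iqr_mask(values)
z_flags = zscore_mask(values, threshold=2.0)

print(values[iqr_flags].tolist())
print(values[z_flags].tolist())
```

On small samples a single extreme value inflates the standard deviation, which is why this example lowers the Z-score threshold to 2.0 rather than the common default of 3.0.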
Enhancements
- read_csv_with_progress(): Added chunksize, low_memory, and **kwargs passthrough to pd.read_csv
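A rough sketch of chunked reading with keyword passthrough, using only pandas. The function `read_csv_in_chunks` and its `on_progress` callback are assumptions for this example; the real `read_csv_with_progress()` may report progress differently.

```python
import io

import pandas as pd


def read_csv_in_chunks(source, chunksize=2, on_progress=None, **kwargs):
    """Hypothetical helper: read a CSV chunk by chunk and report rows read."""
    chunks = []
    rows_read = 0
    # chunksize makes pd.read_csv return an iterator of DataFrames;
    # any extra keyword arguments are forwarded unchanged.
    for chunk in pd.read_csv(source, chunksize=chunksize, **kwargs):
        rows_read += len(chunk)
        if on_progress is not None:
            on_progress(rows_read)
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)


csv_text = "a,b\n1,x\n2,y\n3,z\n4,w\n5,v\n"
progress = []
df = read_csv_in_chunks(
    io.StringIO(csv_text), chunksize=2, on_progress=progress.append, usecols=["a"]
)
print(progress)
print(df)
```

Here `usecols` stands in for any `pd.read_csv` option passed through `**kwargs`.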
Bug Fixes
- flex_corr_matrix(): Fixed title misalignment, -0.00 display artifact, diagonal now shows 1.00 in both stars and mask modes; fixed abs() stripping negative signs from significance annotations
- detect_outliers(): Added observed=True to groupby calls to silence pandas FutureWarning
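The -0.00 artifact comes from formatting a tiny negative number, which keeps its sign after rounding. One common way to avoid it is sketched below; `format_corr` is a hypothetical helper for this example, and the actual fix in `flex_corr_matrix()` may differ.

```python
def format_corr(value: float, decimals: int = 2) -> str:
    """Hypothetical helper: format a correlation without a '-0.00' label."""
    # round() can return -0.0; adding 0.0 turns negative zero into 0.0
    # and leaves every other value unchanged.
    rounded = round(value, decimals) + 0.0
    return f"{rounded:.{decimals}f}"


naive = f"{-0.001:.2f}"
fixed = format_corr(-0.001)

print(naive)
print(fixed)
print(format_corr(-0.25))
```

Genuinely negative correlations keep their sign; only values that round to zero are affected.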
Dependency Updates
- Removed upper caps on matplotlib, scikit-learn, and tqdm to prevent Colab downgrades
Infrastructure
- _data_manager_utils.py: Extracted _flag_iqr and _flag_zscore helpers
- New pytest tests added across detect_outliers, flex_corr_matrix, and read_csv_with_progress
- README stripped to lean landing page; full docs at datasciencedynamics.com/eda_toolkit_docs/
EDA Toolkit 0.0.23
What's Changed
- enhance stacked_crosstab_plot params & visuals by @lshpaner in #119
- fix deps, bump python floor to 3.8 by @lshpaner in #120
- fix: patch box_violin_plot bugs and add features by @lshpaner in #121
- (+) read_csv_with_progress by @lshpaner in #122
- enhance flex_corr_matrix with sig & filters by @lshpaner in #123
- Fix generate table1 ls by @lshpaner in #124
- add detect_outliers with iqr/zscore/isoforest by @lshpaner in #125
Full Changelog: 0.0.22...0.0.23
EDA Toolkit 0.0.22
- Renamed conditional_histograms to grouped_distributions
- Updated SciPy version requirement in README
EDA Toolkit 0.0.21
- Fixed dependency resolution for Python 3.11+ by updating SciPy version constraints.
- Improved compatibility with Google Colab and Python 3.12 environments.
- Updated package acknowledgements and metadata in __init__.py.
- No API changes.
EDA Toolkit 0.0.20
What's Changed
- flexible summarize_combinations sans excel by @lshpaner in #104
- Groupby imputer by @Oscar-Gil-Data in #109
- Refactor kde_distributions and extract density overlay logic into plot utilities by @lshpaner in #110
- Introduce Enhanced Q-Q Plot Support for Distribution GOF Diagnostics by @lshpaner in #111
- Add ECDF plot type to data_doctor; add SciPy to requirements by @Oscar-Gil-Data in #112
- Add conditional histograms and unify figure-saving utilities by @lshpaner in #114
- DataFrame memory management utility, del_inactive_dataframes by @Oscar-Gil-Data in #115
Full Changelog: 0.0.19...0.0.20