- Dataset features can now be easily accessed with the property dataset.features
- `model.test_estimators` with CV will keep using CV if it's refitting the best estimator
- `Result` now has a `.parameters` attribute to show what parameters generated the result
- Switched to pyproject.toml for project metadata
- Updated documentation to use california housing dataset instead of boston
- Updated documentation to remove deprecated parameters from estimators
- Permutation importance and Feature importance are now two different plotting methods.
- `Model.test_estimators` now takes a `feature_pipeline` argument
- Fixed a bug where `FillNA` did not create a `_is_na` column if the column didn't have a missing value
- Implemented Bayesian Search for hyperparameter optimization
- Added a `read_file` convenience method to `FileDataset` to read
- Fixed a bug where `copy_to` failed between two instances of SQLite-based SQLDatasets
- Fixed a bug where `ClassificationVisualize.confusion_matrix` would fail on multi-class problems due to wrong defaults
- Added repr to demodataset
- Lift curve can now plot multi-class
- Precision-Recall curve can now plot multi-class
- ROC AUC curve can now plot multi-class
- Fixed Binner to have a default value
- Fixed FuncTransform to have a default value
- `load_estimator` now uses default storage if nothing is passed
- `Model.bayessearc` is now `Model.bayesiansearch`
- Added `target_feature_distribution` to `Dataset.plot`
- Added `load_demo_dataset` function
- If the dataset has no train set, `score_estimator` will now run `create_train_test` with default configurations
- `Model.make_prediction` now takes a threshold argument when making a binary classification
- All ML-tooling logging messages now go to stdout instead of stderr
- Can pass a feature pipeline to `Model`, which will then automatically generate a combined feature_pipeline + estimator Pipeline
- Can pass a feature pipeline to `Dataset.plot` methods, to apply preprocessing before visualization
- New config implementation. If you need to reset the configuration, you should use `Model.config.reset_config()`
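The threshold argument mentioned above can be illustrated with a minimal sketch. This is not ML Tooling's implementation: the function name `predict_with_threshold` is hypothetical, and treating a probability equal to the threshold as the positive class is an assumption.

```python
def predict_with_threshold(positive_probabilities, threshold=0.5):
    """Turn positive-class probabilities into 0/1 labels.

    Hypothetical helper for illustration only; a probability equal to
    the threshold is assumed to count as the positive class.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [1 if p >= threshold else 0 for p in positive_probabilities]


probabilities = [0.15, 0.5, 0.62, 0.9]
default_labels = predict_with_threshold(probabilities)
strict_labels = predict_with_threshold(probabilities, threshold=0.7)
```

Raising the threshold trades recall for precision: fewer rows are labelled positive, but those that are carry a higher predicted probability.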
- Fixed typehints in Dataset
- `Dataset.create_train_test` now takes a boolean `stratify` parameter.
- Added default local filestorage when using `save_estimator`
- The dataframe returned by `.make_prediction` now labels the columns in a more human friendly manner
- Dataset now verifies that `load_training_data` and `load_prediction_data` do not return empty
- Added a missing data visualization to `Dataset.plot`
- FillNA now accepts a `is_nan` flag which adds a flag indicating that a value was missing
- `Model.make_prediction` now accepts a `use_cache` flag to score everything in cached `.x`
- Added a new Transformer: `RareFeatureEncoder`
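As a rough illustration of the missing-value flag described for FillNA, the sketch below fills missing entries and returns a parallel indicator. It uses plain Python lists rather than dataframes, and the function name `fill_na_with_flag` is hypothetical; it is not the library's transformer.

```python
def fill_na_with_flag(values, fill_value):
    """Fill missing entries (None) and report which entries were missing.

    Hypothetical helper for illustration only. The indicator is always
    returned, even when nothing was missing, so downstream code sees the
    same shape for every input.
    """
    is_na = [value is None for value in values]
    filled = [fill_value if missing else value
              for value, missing in zip(values, is_na)]
    return filled, is_na


filled, is_na = fill_na_with_flag([1.0, None, 3.0], fill_value=0.0)
```

Keeping the indicator alongside the filled values lets a model distinguish a genuine `0.0` from a value that was imputed.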
- Fixed type inferences from data to sql in _load_data
- Added idx arg to load_prediction_data abstract method in SQLDataset
- Added caching of loaded data in SQLDataset
- Added `.copy_to` functionality to SQLDataset and FileDataset, allowing copying between datasets
- Bug fix for calculating feature importance when passing large amounts of data
- Bug fix when using default metric in `test_estimators`
- Bug fix when gridsearching, only applying last change
- Add nicer error message when passing incorrect dtypes to FillNA
- Storage .save method now only takes filename as parameter
- Handles storage loading of paths outputted from the Storage .get_list method
- Handles case when Dataset does not have a `y` value
- Added `plot_learning_curve` and corresponding `result.plot.learning_curve`
- Added `plot_validation_curve` and corresponding `result.plot.validation_curve`
- Replaced `permutation_importance` with scikit-learn's implementation
- Added `target_correlation` plots to Dataset.plot
- Bug fix for logging when feature unions (DFFeatureUnion) had tuples
- Hot fix python version to 3.7
- Breaking change - Model methods load_estimator and save_estimator now take a Storage class that defines how and where to store estimators.
- Added the ability to declare that a saved model should be a production estimator.
- Added corresponding `.load_production_estimator` to `Model`
- Removed gitpython as a dependency
- Replaced custom feature permutation importance with scikit-learn's implementation from v0.22
- Breaking change - Dataset is now a separate object that has to be instantiated outside ModelData
- Breaking change - ModelData is now renamed to Model
- Added new properties `is_estimator` and `is_regressor` which check what type of estimator is used
- Joblib is now a dependency, instead of being vendored with scikit-learn
- Updated requirements
- Breaking change - BaseClassModel renamed to ModelData.
- Breaking change - model renamed to estimator
- Added Precision-Recall Curve
- Added option to give custom file name to .save_estimator()
- Instantiating with estimator is now optional - set estimator later using .init_estimator
- We have a logo! Added ML Tooling logo to docs
- Now issues a warning when git is not installed.
- Data for a class is changed from instance variable to class variable
- Grid search only copies data to workers once and reuses them across grid and folds.
- The Data Class now takes a random seed which it will receive from the BaseClass
- Disabled memory mapping in feature importance
- Added license file to package
- Updated requirements
- Feature importances changed to use permutation instead of built-in for better estimates.
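The idea behind permutation-based feature importance can be sketched in a few lines: shuffle one feature column at a time and measure how much the score drops. The code below is a simplified, pure-Python illustration and not the code used by ML Tooling or scikit-learn; the function names, the accuracy metric, and the toy data are all assumptions.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)


def permutation_importance_sketch(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column, leaving the other features intact.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on the first feature.
rows = [[0, 5], [1, 3], [0, 8], [1, 1], [0, 2], [1, 9], [0, 4], [1, 7]]
labels = [row[0] for row in rows]


def predict(row):
    """Stand-in model that looks only at the first feature."""
    return row[0]


importances = permutation_importance_sketch(predict, rows, labels)
```

Because the stand-in model ignores the second feature, shuffling it never changes the score, so its importance is zero. Built-in importances from tree models do not have this property, which is one reason permutation estimates are often preferred.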
- .train_estimator will now reset the result attribute to None, in order to prevent users from mistakenly assuming the result is from the training
- Fixed bug in lift_score when using dataframes
- Fixed bug when training model and then scoring model
- Fixed bug where users could not save models if no result had been created, as would happen if the user only called .train_estimator before saving.
- Default_metric is now the same metric as the one specified for the model in .config
- Each class inheriting from ModelData has an individual config
- Changed get_scorer_func to wrap sklearn's get_scorer
- Fixed bug when gridsearching twice
- Added Binarize Transformer
- Added ability to use keywords in FuncTransformer
- .predict now returns a dataframe indexed on input
- Updated dependencies
- Added gridsearch method to BaseClass. Gridsearch your model and return a list of results for inspection
- Added ResultGroup - any method that returns a list of results now returns a ResultGroup instead.
- Added logging
- Added ability to record runs as yaml
- Another bugfix release
- Fixed bug that prevented DFRowFunc from pickling properly
- Added DFRowFunc Transformer
- Updated FillNA to handle categorical values
- Allow user to choose whether score_model uses cv or not
- Plot_feature_importance now takes a top_n and bottom_n argument
- Fix for error in setup wheels
- Implemented new FillNA Transformer
- Refactored to use flat structure
- Renamed project to ml_tooling
- Initial release