Merged
1 change: 1 addition & 0 deletions src/FeatureSelection.jl
@@ -10,5 +10,6 @@ const MMI = MLJModelInterface
include("models/featureselector.jl")
include("models/rfe.jl")
include("shared.jl")
include("type_docstrings.jl")

end # module
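The new `include("type_docstrings.jl")` is placed last because the docstrings in that file interpolate `$(MMI.doc_header(FeatureSelector))` at include time, so the type and its registered metadata must already exist. A minimal standalone sketch of that pattern, using a hypothetical `header` helper and `Widget` type (not part of the package):

```julia
# Minimal sketch: a docstring that interpolates a function of a type
# must come *after* the type is defined, mirroring the include order above.
module M

header(T) = "Header for $(nameof(T))"   # stand-in for MMI.doc_header

struct Widget end                       # the type must be defined first

# Attaching the docstring afterwards, as type_docstrings.jl does:
"""
$(header(Widget))

A demo type.
"""
Widget

end # module
```

Reversing the order (docstring before `struct Widget end`) would throw an `UndefVarError` at include time, which is what the comment in `type_docstrings.jl` guards against.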
80 changes: 1 addition & 79 deletions src/models/featureselector.jl
@@ -87,82 +87,4 @@ MMI.metadata_model(
load_path = "FeatureSelection.FeatureSelector"
)

## Docstring
"""
$(MMI.doc_header(FeatureSelector))

Use this model to select features (columns) of a table, usually as
part of a model `Pipeline`.


# Training data

In MLJ or MLJBase, bind an instance `model` to data with

mach = machine(model, X)

where

- `X`: any table of input features, where "table" is in the sense of Tables.jl

Train the machine using `fit!(mach, rows=...)`.


# Hyper-parameters

- `features`: one of the following, with the behavior indicated:

- `[]` (empty, the default): filter out all features (columns) which
were not encountered in training

- non-empty vector of feature names (symbols): keep only the
specified features (`ignore=false`) or keep only unspecified
features (`ignore=true`)

- function or other callable: keep a feature if the callable returns
`true` on its name. For example, specifying
`FeatureSelector(features = name -> name in [:x1, :x3], ignore =
true)` has the same effect as `FeatureSelector(features = [:x1,
:x3], ignore = true)`, namely to select all features, with the
exception of `:x1` and `:x3`.

- `ignore`: whether to ignore or keep specified `features`, as
explained above


# Operations

- `transform(mach, Xnew)`: select features from the table `Xnew` as
specified by the model, taking features seen during training into
account, if relevant


# Fitted parameters

The fields of `fitted_params(mach)` are:

- `features_to_keep`: the features that will be selected


# Example

```
using MLJ

X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce(["x", "y", "x"], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));

selector = FeatureSelector(features=[:ordinal3, ], ignore=true);

julia> transform(fit!(machine(selector, X)), X)
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}["x", "y", "x"],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)

```
"""
FeatureSelector
# docstring is in "src/type_docstrings.jl"
81 changes: 81 additions & 0 deletions src/type_docstrings.jl
@@ -0,0 +1,81 @@
# This file cannot be included before the types and all metadata are defined

## Docstring
"""
$(MMI.doc_header(FeatureSelector))

Use this model to select features (columns) of a table, usually as
part of a model `Pipeline`.


# Training data

In MLJ or MLJBase, bind an instance `model` to data with

mach = machine(model, X)

where

- `X`: any table of input features, where "table" is in the sense of Tables.jl

Train the machine using `fit!(mach, rows=...)`.


# Hyper-parameters

- `features`: one of the following, with the behavior indicated:

- `[]` (empty, the default): filter out all features (columns) which
were not encountered in training

- non-empty vector of feature names (symbols): keep only the
specified features (`ignore=false`) or keep only unspecified
features (`ignore=true`)

- function or other callable: keep a feature if the callable returns
`true` on its name. For example, specifying
`FeatureSelector(features = name -> name in [:x1, :x3], ignore =
true)` has the same effect as `FeatureSelector(features = [:x1,
:x3], ignore = true)`, namely to select all features, with the
exception of `:x1` and `:x3`.

- `ignore`: whether to ignore or keep specified `features`, as
explained above


# Operations

- `transform(mach, Xnew)`: select features from the table `Xnew` as
specified by the model, taking features seen during training into
account, if relevant


# Fitted parameters

The fields of `fitted_params(mach)` are:

- `features_to_keep`: the features that will be selected


# Example

```
using MLJ

X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce(["x", "y", "x"], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));

selector = FeatureSelector(features=[:ordinal3, ], ignore=true);

julia> transform(fit!(machine(selector, X)), X)
(ordinal1 = [1, 2, 3],
 ordinal2 = CategoricalValue{String,UInt32}["x", "y", "x"],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)

```
"""
FeatureSelector
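As a usage note, the callable form of `features` described in the docstring above can be exercised directly. A minimal sketch, assuming MLJ is installed and `FeatureSelector` is in scope; the column names `x1`–`x3` are made up for illustration:

```julia
using MLJ

X = (x1 = [1, 2, 3],
     x2 = [4.0, 5.0, 6.0],
     x3 = ["a", "b", "c"])

# Keep every column except :x1 and :x3, via a callable plus ignore=true
selector = FeatureSelector(features = name -> name in [:x1, :x3], ignore = true)
mach = fit!(machine(selector, X))
Xnew = transform(mach, X)
# Xnew should now contain only the :x2 column
```

This is equivalent to `FeatureSelector(features = [:x1, :x3], ignore = true)`; the callable form is useful when the columns to drop are known only by a predicate (e.g. a name prefix) rather than an explicit list.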
1 change: 1 addition & 0 deletions test/models/featureselector.jl
@@ -62,6 +62,7 @@
# Test model Metadata
@test MLJBase.input_scitype(selector) == MLJBase.Table
@test MLJBase.output_scitype(selector) == MLJBase.Table
@test MLJBase.package_name(selector) == "FeatureSelection"
end

# To be added with FeatureSelectorRule X = (n1=["a", "b", "a"], n2=["g", "g", "g"], n3=[7, 8, 9],
6 changes: 4 additions & 2 deletions test/models/rfe.jl
@@ -106,8 +106,10 @@ const DTM = DummyTestModels
# Traits
@test MLJBase.package_name(selector) == "FeatureSelection"
@test MLJBase.load_path(selector) == "FeatureSelection.RecursiveFeatureElimination"
@test MLJBase.iteration_parameter(selector) == FeatureSelection.prepend(:model, MLJBase.iteration_parameter(selector.model))
@test MLJBase.training_losses(selector, rpt) == MLJBase.training_losses(selector.model, rpt.model_report)
@test MLJBase.iteration_parameter(selector) ==
FeatureSelection.prepend(:model, MLJBase.iteration_parameter(selector.model))
@test MLJBase.training_losses(selector, rpt) ==
MLJBase.training_losses(selector.model, rpt.model_report)
end

@testset "Compare results for RFE with scikit-learn" begin