In pipelines, the output of a supervised model that gets propagated to the next component is the output of `predict`. However, some supervised models also learn a transformation. For example, MLJFlux's `NeuralNetworkClassifier` and `NeuralNetworkRegressor` learn entity embeddings to handle categorical inputs, and `transform` gives access to just these embeddings. We would like to use these embeddings as a preprocessing step for some other supervised learner, as in
```julia
NeuralNetworkClassifier |> LogisticClassifier
```
but of course this doesn't work: the pipeline apparatus identifies `NeuralNetworkClassifier` as a `Supervised` model, and so the first component propagates the output of `predict` instead of `transform`.
We actually solved this problem in MLJFlux by introducing the `EntityEmbedder` wrapper, so that the following works:
```julia
EntityEmbedder(NeuralNetworkClassifier) |> LogisticClassifier
```
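For concreteness, here is a sketch of how that wrapped pipeline might look in use. The data loading, hyperparameter values, and exact constructor calls below are illustrative assumptions (the `EntityEmbedder` constructor signature may differ slightly from what MLJFlux actually exports); the point is only that the wrapper makes the network contribute its learned embeddings, while the downstream model does the predicting.

```julia
# Hypothetical sketch; assumes MLJ, MLJFlux and MLJLinearModels are installed.
using MLJ

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels

# Any table with categorical (Multiclass/OrderedFactor) features would do here:
X, y = @load_iris

# Wrapped, the network propagates the output of `transform` (the entity
# embeddings) rather than `predict`; LogisticClassifier makes the predictions.
pipe = EntityEmbedder(NeuralNetworkClassifier(epochs=10)) |> LogisticClassifier()

mach = machine(pipe, X, y)
fit!(mach)
yhat = predict(mach, X)
```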
However, it has struck me, rather late, that this wrapper likely works (or should in principle work) for any `Supervised` model with a `transform`. So we should really call `EntityEmbedder` something more generic, like `Transformer`, and perhaps make it immediately available (e.g. by moving it to MLJTransforms.jl).
Thoughts anyone?
@EssamWisam