[r] add poc matrix projection interface #158
Closed
Add an sklearn-like interface for creating pipelines in terms of fitting, projection, and combining operations on matrices.
Current thoughts:
- I deviated from the design docs a little to make the inheritance make more sense. I made a default `PipelineBase`, with `Pipeline` inheriting from it. I then made a `PipelineStep` that inherits from `PipelineBase`. `Estimator` and `Transformer` both inherit from this class.
  - `PipelineBase` and `Pipeline` differ because Pipelines should have steps. This isn't true for single PipelineSteps.
  - `PipelineBase` and `PipelineStep` differ to indicate that each step has a `step_name` associated with it. This also allows a shared interface for transformers/predictors in how they are printed and how they can be concatenated to create a pipeline.
- I had to change `transform()` to `project()`, given I found a generic base function with the same name. Additionally, I found `predict()` in the stats package, and changed it to `estimate()`.
- Tests will be added in another sister PR, as there are no transformers/estimators built yet to test functionality here.
- I'm not sure which methods I should provide detail for, given that we are not sure how much of this we want to expose. I documented the generics themselves, to give a high-level view of how to use the methods in both `PipelineStep`s and `Pipeline`. However, it isn't clear to me whether I need to keep providing an extensive docstring for every overridden method in child classes.
- I'm not sure which classes I should be exposing to the reference either. I found that previous BPCells classes (i.e. `IterableMatrix`) aren't heavily described in the reference. I provided some information on `Pipeline`, `Estimator`, and `Transformer`, and exposed them to the reference page. I also tried to provide information on how to create a `Transformer` and an `Estimator` yourself in the docstring.
- I don't think I'm completely sold on using the `show()` method as an analog of the Python `__repr__()` dunder. I think it could be more useful to make it act more like what you used to display `IterableMatrix`, i.e. where we still have information on what steps are in a pipeline, but also macro information, like hyperparameters or details on what the step has been fit to. In this case, we would have a `__repr__()` analog somewhere else.
- It is probably redundant to have both `project()` and `estimate()`. What do you think about just combining them into one?
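
To make the inheritance in the first point easier to picture, here is a minimal S4 sketch of the hierarchy. It is illustrative only and not the code in this PR: the slot layout, the generic signature, and the `ScaleStep` class are assumptions made for the example.

```r
library(methods)

# Illustrative sketch only; slot layouts are assumptions, not the PR's code.
setClass("PipelineBase", representation("VIRTUAL"))

# A Pipeline differs from the base class by holding an ordered list of steps.
setClass("Pipeline", contains = "PipelineBase", slots = c(steps = "list"))

# A single step differs from the base class by carrying a step_name.
setClass("PipelineStep", contains = "PipelineBase",
         slots = c(step_name = "character"))
setClass("Transformer", contains = "PipelineStep")
setClass("Estimator", contains = "PipelineStep")

# Named project()/estimate() rather than transform()/predict(), since
# base::transform() and stats::predict() already exist.
setGeneric("project", function(object, x, ...) standardGeneric("project"))
setGeneric("estimate", function(object, x, ...) standardGeneric("estimate"))

# Hypothetical toy transformer: multiplies a matrix by a constant.
setClass("ScaleStep", contains = "Transformer", slots = c(factor = "numeric"))
setMethod("project", "ScaleStep", function(object, x, ...) x * object@factor)

# Projecting through a Pipeline applies each step in order.
setMethod("project", "Pipeline", function(object, x, ...) {
  for (step in object@steps) x <- project(step, x)
  x
})

p <- new("Pipeline", steps = list(
  new("ScaleStep", step_name = "double", factor = 2),
  new("ScaleStep", step_name = "triple", factor = 3)
))
result <- project(p, matrix(1:4, nrow = 2))
```

Because `Pipeline` and `PipelineStep` share `PipelineBase`, a method defined on the base class (for example for printing) would be inherited by both.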
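
For the `show()` question, one possible shape for a more informative display is sketched below. This is a rough idea under stated assumptions, not a proposal for the final output: `DemoStep`, `DemoPipeline`, the `params` and `fitted` slots, and the `describe_step()` helper are all hypothetical names invented for the example.

```r
library(methods)

# Hypothetical stand-ins for the PR's classes; slot names are assumptions.
setClass("DemoStep", slots = c(step_name = "character",
                               params = "list",
                               fitted = "logical"))
setClass("DemoPipeline", slots = c(steps = "list"))

# One-line summary of a step: name, hyperparameters, and fit status.
describe_step <- function(step) {
  params <- paste(names(step@params), unlist(step@params),
                  sep = "=", collapse = ", ")
  status <- if (step@fitted) "fitted" else "not fitted"
  sprintf("%s(%s) [%s]", step@step_name, params, status)
}

# show() lists the steps in order along with the per-step summary.
setMethod("show", "DemoPipeline", function(object) {
  cat("Pipeline with", length(object@steps), "step(s)\n")
  for (i in seq_along(object@steps)) {
    cat(sprintf("  %d. %s\n", i, describe_step(object@steps[[i]])))
  }
})

demo <- new("DemoPipeline", steps = list(
  new("DemoStep", step_name = "pca", params = list(n = 10), fitted = FALSE)
))
demo_summary <- describe_step(demo@steps[[1]])
```

Keeping the per-step summary in a separate helper would leave room for a terse `__repr__()`-style string elsewhere while `show()` carries the fuller description.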