Pipeline for binary machine learning problems, written entirely with NumPy.
- Pipeline: Class used to create a pipeline
  - `setStages()`: set an array of PipelineStage
  - `addStages()`: add an array of PipelineStage
  - `fit()`: takes DTR (nFeature, nSample) and LTR (nSample,) and returns a model. It iteratively calls `compute()` on each PipelineStage.
- PipelineStage: Interface for a stage of a Pipeline
  - `compute()`: main method, used to start the stage
- VoidStage: A stage that doesn't do anything
- Model: Interface of a model
  - `transform()`: takes DTE (nFeature, nSample) and LTE (nSample,) and returns the scores. It iteratively calls `compute()` on each PipelineStage of the preprocessing stages.
- CrossValidator: Class used to perform cross-validation
  - `setEstimator()`: set a Pipeline
  - `setNumFolds()`: set the number of folds
  - `fit()`: takes DTR and LTR, randomly splits them into k folds, then calls `fit()` of the Pipeline for each fold. At the end it returns the scores for DTR.
- MVG (PipelineStage): Multivariate Gaussian
- NaiveBayesMVG (PipelineStage): Multivariate Gaussian (Diag)
- TiedMVG (PipelineStage): Multivariate Gaussian (Single Cov)
  - `setPiT()`: set the re-balancing factor
- TiedNaiveBayesMVG (PipelineStage): Multivariate Gaussian (Diag Single Cov)
  - `setPiT()`: set the re-balancing factor
- LogisticRegression (PipelineStage): Logistic Regression
  - `setLambda()`: set Lambda, the regularization factor (0 := overfitting, 1 := underfitting)
  - `setPiT()`: set the re-balancing factor
  - `setExpanded()`: set `True` or `False` to select the quadratic or the linear model
- SVM (PipelineStage): Support Vector Machine
  - `setK()`: set the K factor, usually 1
  - `setC()`: set the C factor (0 := big margin, 1 := small margin)
  - `setPiT()`: set the re-balancing factor
  - `setPolyKernel()`: use a polynomial kernel; set the c factor and d (degree)
  - `setRBFKernel()`: use an RBF kernel; set the Gamma factor
  - `setNoKern()`: no kernel
- GMM (PipelineStage): Gaussian Mixture Model, a clustering model used for classification
  - `setDiagonal()`: use diagonal covariance matrices for the GMM densities
  - `setTied()`: use the same covariance matrix for all components of the GMM
  - `setIterationLBG()`: set the number of LBG iterations; the final number of components is 2 raised to the number of iterations
  - `setAlpha()`: set the alpha factor of the LBG algorithm, the rescaling factor for the new starting points of the components at each LBG iteration
  - `setPsi()`: set the psi factor, used to avoid degenerate solutions; it constrains the covariance matrices of each GMM component
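The cross-validation procedure described for CrossValidator (random k folds, one fit per fold, scores collected for all of DTR) can be sketched roughly as follows. This is a minimal illustration and not the project's code: `kfold_scores`, `fit_mean_difference`, the `seed` argument, and the convention that `fit` returns a scoring callable are all assumptions made for the example.

```python
import numpy as np

def kfold_scores(DTR, LTR, fit, num_folds=5, seed=0):
    """Return one score per training sample; each score comes from a model
    trained on the folds that do not contain that sample.
    DTR has shape (nFeature, nSample), LTR has shape (nSample,)."""
    n_samples = DTR.shape[1]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)        # random assignment to folds
    folds = np.array_split(perm, num_folds)
    scores = np.zeros(n_samples)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(DTR[:, train_idx], LTR[train_idx])
        scores[test_idx] = model(DTR[:, test_idx])
    return scores

def fit_mean_difference(D, L):
    """Hypothetical stand-in classifier (not one of the listed models):
    projects samples onto the difference of the two class means."""
    mu0 = D[:, L == 0].mean(axis=1, keepdims=True)
    mu1 = D[:, L == 1].mean(axis=1, keepdims=True)
    w = mu1 - mu0
    b = -0.5 * (w.T @ (mu0 + mu1)).item()
    return lambda X: (w.T @ X).ravel() + b   # positive score favours class 1

# Usage on synthetic, well-separated data
rng = np.random.default_rng(1)
D = np.hstack([rng.normal(-2, 1, (3, 50)), rng.normal(2, 1, (3, 50))])
L = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
scores = kfold_scores(D, L, fit_mean_difference, num_folds=5)
```

Because every sample is scored by a model that never saw it, the returned scores can be pooled to estimate performance on the whole training set.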
All the classes here implement the Model interface; the main method is `transform()`.
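As an illustration of the stage/model split, here is a minimal sketch of a Gaussian classifier written in that style. The class names `GaussianStage` and `GaussianModel`, the argument lists of `compute()` and `transform()`, and the choice of the log-likelihood ratio as the score are assumptions for the example; the project's own classes may differ.

```python
import numpy as np

def logpdf_gaussian(X, mu, C):
    """Log-density of N(mu, C) for each column of X, shape (nFeature, nSample)."""
    n_features = X.shape[0]
    Xc = X - mu
    _, logdet = np.linalg.slogdet(C)
    maha = np.sum(Xc * np.linalg.solve(C, Xc), axis=0)  # Mahalanobis terms
    return -0.5 * (n_features * np.log(2 * np.pi) + logdet + maha)

class GaussianModel:
    """Hypothetical model: holds per-class parameters and scores new data."""
    def __init__(self, params):
        self.params = params  # {label: (mean, covariance)}

    def transform(self, DTE, LTE=None):
        # LTE is accepted only to mirror the described interface; it is unused.
        (mu0, C0), (mu1, C1) = self.params[0], self.params[1]
        # Score = log-likelihood ratio of class 1 against class 0
        return logpdf_gaussian(DTE, mu1, C1) - logpdf_gaussian(DTE, mu0, C0)

class GaussianStage:
    """Hypothetical stage: estimates one full-covariance Gaussian per class."""
    def compute(self, DTR, LTR):
        params = {}
        for label in (0, 1):
            Dc = DTR[:, LTR == label]
            mu = Dc.mean(axis=1, keepdims=True)
            C = (Dc - mu) @ (Dc - mu).T / Dc.shape[1]
            params[label] = (mu, C)
        return GaussianModel(params)

# Usage on synthetic data
rng = np.random.default_rng(0)
DTR = np.hstack([rng.normal(-2, 1, (3, 60)), rng.normal(2, 1, (3, 60))])
LTR = np.concatenate([np.zeros(60, dtype=int), np.ones(60, dtype=int)])
model = GaussianStage().compute(DTR, LTR)
llr = model.transform(DTR, LTR)
```

Keeping the fitted parameters in a separate model object means the same scoring code can be reused on any evaluation set without refitting.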
All the pre-processing stages are PipelineStages; the main method is `compute()`.
- PCA
- LDA
- ZNorm
- L2Norm
- Gaussianization
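For two of the stages above, the underlying transformations can be sketched in plain NumPy as follows. The function names are hypothetical, and details such as centering the data before PCA are assumptions rather than a description of the project's implementation. The point of the sketch is that statistics are estimated on the training data and then reused unchanged on evaluation data.

```python
import numpy as np

def znorm_fit(DTR):
    """Per-feature mean and standard deviation of the training data."""
    mu = DTR.mean(axis=1, keepdims=True)
    sigma = DTR.std(axis=1, keepdims=True)
    return mu, sigma

def znorm_apply(D, mu, sigma):
    """Center and scale D (nFeature, nSample) with training statistics."""
    return (D - mu) / sigma

def pca_fit(DTR, m):
    """Training mean and the m leading principal directions."""
    mu = DTR.mean(axis=1, keepdims=True)
    C = (DTR - mu) @ (DTR - mu).T / DTR.shape[1]
    _, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :m]         # keep the m largest-variance directions
    return mu, P

def pca_apply(D, mu, P):
    """Project D onto the subspace found on the training data."""
    return P.T @ (D - mu)

# Usage: fit on training data, apply the same parameters to evaluation data
rng = np.random.default_rng(0)
scales = np.array([[5.0], [2.0], [1.0], [0.5]])
DTR = rng.normal(size=(4, 200)) * scales
DTE = rng.normal(size=(4, 50)) * scales

mu_z, sigma_z = znorm_fit(DTR)
ZTR = znorm_apply(DTR, mu_z, sigma_z)
ZTE = znorm_apply(DTE, mu_z, sigma_z)

mu_p, P = pca_fit(DTR, 2)
YTR = pca_apply(DTR, mu_p, P)
YTE = pca_apply(DTE, mu_p, P)
```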
Lots of useful tools, used throughout the project.