SortingHatLib

Library for ML feature type inference: https://github.com/pvn25/MLDataPrepZoo/tree/master/MLFeatureTypeInference

Due to git-lfs limits, the resource files have been moved to: https://drive.google.com/drive/folders/1eC8F5pO2hSoQf4RQM7zww49y2ZbLIvqG

By default, these resources are downloaded automatically the first time you run the program. If the automatic download fails, you can download them manually from the link above.

  1. Clone the repository and install the package with pip:
git clone https://github.com/pvn25/SortingHatLib.git
pip install SortingHatLib/
  2. Import the library:
import sortinghat.pylib as pl
  3. Read in the CSV file using pandas:
import pandas as pd
dataDownstream = pd.read_csv('adult.csv')
  4. Perform base featurization of the raw CSV file:
dataFeaturized = pl.FeaturizeFile(dataDownstream)
  5. Extract bigram features for the Random Forest model:
dataFeaturized1 = pl.FeatureExtraction(dataFeaturized)
  6. Finally, load the Random Forest model for prediction:
y_RF = pl.Load_RF(dataFeaturized1)
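
The steps above can be chained into one helper. The following is a minimal sketch, not part of the library: the function name `infer_feature_types` is hypothetical, and it assumes that `pl.Load_RF` returns one prediction per input column, in column order. Check that assumption against your installed version before relying on the column-to-prediction mapping.

```python
def infer_feature_types(df, pl):
    """Run the featurize -> bigram extraction -> Random Forest steps on df.

    df is a pandas DataFrame (e.g. from pd.read_csv('adult.csv')).
    pl is the imported module, i.e. `import sortinghat.pylib as pl`.
    """
    # Base featurization of the raw table.
    featurized = pl.FeaturizeFile(df)
    # Bigram features expected by the Random Forest model.
    bigram_features = pl.FeatureExtraction(featurized)
    # Load the model and predict.
    predictions = pl.Load_RF(bigram_features)

    # ASSUMPTION: one prediction per column, in the same order as df.columns.
    columns = list(df.columns)
    if len(predictions) != len(columns):
        raise ValueError(
            f"got {len(predictions)} predictions for {len(columns)} columns"
        )
    return dict(zip(columns, predictions))


# Usage (requires SortingHatLib and pandas to be installed):
#   import pandas as pd
#   import sortinghat.pylib as pl
#   types_by_column = infer_feature_types(pd.read_csv('adult.csv'), pl)
```

Passing `pl` in as an argument keeps the helper importable even when the library or its resource files are not yet available, and the length check fails loudly if the one-prediction-per-column assumption does not hold.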