WebClassifier

A Chinese term classifier based on web search results

This repository includes:

  1. Scripts to retrieve term features from search engines, cleanse the raw feature lists, and sample the data
  2. Dictionary used for Chinese text segmentation
  3. Sample data sets, including:
    • Input term sets, in files named drugList-;
    • Raw term-feature matrices generated from different search engines and term sets, in files named drugFeature-;
    • The exact training and testing sets used for this study, in files named -TestTrain.
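The sampling step mentioned above can be illustrated with a minimal sketch. This is not the repository's actual script: the function name `split_terms`, the `test_fraction` and `seed` parameters, and the plain shuffle-and-split strategy are all assumptions made for illustration.

```python
import random


def split_terms(terms, test_fraction=0.2, seed=0):
    """Shuffle terms reproducibly and split them into (train, test) lists.

    Hypothetical helper: the real sampling script may use a different
    strategy (for example, stratifying by class label).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(terms)
    # A private Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one term for testing.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `split_terms(term_list, test_fraction=0.2, seed=1)` returns a training list and a testing list that together contain every input term exactly once. Fixing the seed makes the split repeatable, which matters when the exact sets need to be shared alongside a study.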

You will need 7-Zip (http://www.7-zip.org/) to decompress the files.
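
For scripted setups, decompression can also be driven from Python. The sketch below assumes the 7-Zip command-line executable is installed and available on the PATH as `7z`; the helper names `build_extract_command` and `extract` are hypothetical, and any archive name passed in is whatever file you downloaded from this repository.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(archive, out_dir, exe="7z"):
    """Build the 7-Zip command line for extracting an archive.

    `x` extracts with full paths, `-o<dir>` sets the output directory
    (no space after -o), and `-y` answers yes to all prompts.
    """
    return [exe, "x", str(archive), f"-o{out_dir}", "-y"]


def extract(archive, out_dir):
    """Extract `archive` into `out_dir` using the 7z executable on PATH."""
    exe = shutil.which("7z")
    if exe is None:
        raise FileNotFoundError("7z executable not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if 7z exits with an error.
    subprocess.run(build_extract_command(archive, out_dir, exe), check=True)
```

The command is built as a list rather than a single string so that archive paths containing spaces are passed to 7-Zip unchanged.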