smujjiga/tfkld

Fast and scalable implementation of Term Frequency Kullback Leibler Divergence (TFKLD)

This reimplementation is based on https://github.com/jiyfeng/tfkld and aims to drastically speed up weight calculation. TFKLD was proposed in a 2013 EMNLP paper.
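To make the weight calculation concrete, here is a minimal sketch of a KL-divergence feature weighting in the spirit of TFKLD. It is not the code from this repository: the function name `kld_weights`, the dense boolean inputs, and the smoothing constant are all assumptions made for illustration. For each feature it compares how often the feature is shared by both sentences in paraphrase pairs versus non-paraphrase pairs, and scores the feature by the KL divergence between those two Bernoulli distributions.

```python
import numpy as np

def kld_weights(X1, X2, y, smooth=0.05):
    """Hypothetical helper: one KL-divergence weight per feature.

    X1, X2 : (n_pairs, n_features) binary term-presence matrices for the
             first and second sentence of each pair.
    y      : (n_pairs,) labels, 1 for a paraphrase pair, 0 otherwise.
    smooth : additive smoothing (an assumed value) that keeps the
             probabilities away from 0 and 1.
    """
    X1 = np.asarray(X1, dtype=bool)
    X2 = np.asarray(X2, dtype=bool)
    y = np.asarray(y, dtype=bool)

    shared = X1 & X2   # feature present in both sentences
    differ = X1 ^ X2   # feature present in exactly one sentence

    # Per-feature counts, computed for all features at once.
    pos_shared = shared[y].sum(axis=0)
    pos_differ = differ[y].sum(axis=0)
    neg_shared = shared[~y].sum(axis=0)
    neg_differ = differ[~y].sum(axis=0)

    # Smoothed probability that the feature is shared, per class.
    p = (pos_shared + smooth) / (pos_shared + pos_differ + 2 * smooth)
    q = (neg_shared + smooth) / (neg_shared + neg_differ + 2 * smooth)

    # KL divergence between Bernoulli(p) and Bernoulli(q).
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# Toy data: feature 0 is shared only in paraphrase pairs,
# feature 1 is shared equally often in both classes.
X1 = [[1, 1], [1, 1], [1, 1], [1, 1]]
X2 = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
weights = kld_weights(X1, X2, y)
print(weights)
```

In this toy example the first feature receives a much larger weight than the second, whose weight is zero because it is equally likely to be shared in both classes. Computing all counts with array operations rather than per-pair Python loops is one plausible way to speed up the weight calculation; a real corpus would likely need sparse matrices instead of dense ones.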

A test script (fe_quora.py) is also available. It extracts TFKLD features from the Quora dataset hosted on Kaggle as part of a competition.

Steps to run the test script

  • Download the Quora dataset from Kaggle.
  • Extract the zip file and place its contents in the same directory as tfkld.py and fe_quora.py.
  • Execute fe_quora.py.
  • The script takes some time to run. When it finishes, you should have the pickle files train-tfkld-dr.pkl, dev-tfkld-dr.pkl and test-tfkld-dr.pkl, corresponding to the training, development and test data respectively.
  • TFKLD features are reduced to 200 dimensions using SVD.
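The dimensionality reduction mentioned in the last step can be sketched as follows. This is an illustrative assumption, not the code in fe_quora.py: the helper name `reduce_dims` is hypothetical, the input is a random toy matrix, and a plain NumPy SVD stands in for whatever SVD routine the script actually uses.

```python
import numpy as np

def reduce_dims(features, k):
    """Hypothetical helper: keep the top-k SVD components of each row."""
    # Thin SVD: features = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(features, full_matrices=False)
    k = min(k, S.shape[0])  # cannot keep more components than exist
    # Equivalent to projecting onto the top-k right singular vectors:
    # features @ Vt[:k].T
    return U[:, :k] * S[:k]

rng = np.random.default_rng(0)
weighted = rng.random((6, 10))       # toy stand-in for TFKLD-weighted features
reduced = reduce_dims(weighted, k=3)  # the README uses 200; 3 keeps the toy small
print(reduced.shape)
```

For a feature matrix as large as the Quora dataset produces, a truncated SVD that works on sparse input would be a more practical choice than the dense decomposition shown here.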
