Latent Dirichlet Allocation

Code for running robust and repeatable LDA experiments. Use the -h flag to view CLI parameters for any script.

Files and Folders

lda: Files related to the training and analysis of LDA topic models
dlda: Files related to the training and analysis of dynamic topic models (using gensim's ldaseq implementation)
list_common_words.py: Takes an experiment config file as a command-line argument, runs all of the preprocessing that the config specifies, and lists the 50 most common words in the dataset that the experiment will use
- See lda or dlda READMEs for the structure of an experiment JSON file
plot_data_quants.py: Driver script that uses a TextParser to plot the quantity of data in each time frame (especially useful for choosing the time intervals of a dynamic topic model)
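
As a rough illustration of what the two helper scripts compute, the sketch below counts the most common words in a small corpus and tallies documents per year. It is not the code in this repository and does not use ogm or its TextParser. The function names top_words and docs_per_year, the whitespace tokenizer, and the stopword set are all hypothetical stand-ins for the preprocessing an experiment config would specify.

```python
from collections import Counter
from datetime import date


def top_words(docs, n=50, stopwords=frozenset()):
    """Return the n most frequent tokens across docs as (word, count) pairs.

    Hypothetical stand-in for the preprocessing step: lowercases each
    document, splits on whitespace, and drops any token in `stopwords`.
    """
    counts = Counter()
    for doc in docs:
        counts.update(t for t in doc.lower().split() if t not in stopwords)
    return counts.most_common(n)


def docs_per_year(dates):
    """Return sorted (year, document_count) pairs, one per calendar year.

    Yearly bins are an assumption; a dynamic topic model could use any interval.
    """
    return sorted(Counter(d.year for d in dates).items())


if __name__ == "__main__":
    corpus = ["The cat sat", "the cat ran", "a dog ran"]
    print(top_words(corpus, n=3, stopwords={"the", "a"}))
    print(docs_per_year([date(2020, 1, 1), date(2021, 5, 5), date(2020, 3, 3)]))
```

Looking at per-interval counts like these before training can help avoid time slices that contain too few documents to fit a dynamic topic model well.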

Install our ogm package and its dependencies.
