We would like to understand how using abridged versus full articles affects which qualitative features emerge in the topics.
We would like an interesting result such as: using the abridged API version down-weights the "Politics" topic in the resulting topic model.
We will consider this done once we have proposed a measure for comparing matched topics.
-Proposed metric: Jensen-Shannon divergence.
-Once we have human-labeled topics, we are interested in which ones maximize or minimize the JS statistic.
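
As a rough sketch of how the proposed comparison could work, the snippet below computes the Jensen-Shannon divergence between two topic-word distributions and ranks labeled topic pairs by it. It assumes each topic is a list of word weights over one shared vocabulary and that topics from the full and abridged models have already been matched. The function names, the `matched` dictionary, and the toy numbers are all hypothetical and only illustrate the idea.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Uses log base 2, so the result lies between 0 (identical) and 1 (disjoint).
    """
    if len(p) != len(q):
        raise ValueError("both topics must be defined over the same vocabulary")
    # Normalize so each topic's word weights sum to 1.
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution m = (p + q) / 2; m_i > 0 wherever p_i or q_i > 0.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical matched pairs: label -> (topic from full articles, topic from abridged)
matched = {
    "Politics": ([0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.3, 0.5]),
    "Sports": ([0.4, 0.3, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1]),
}

scores = {label: js_divergence(p, q) for label, (p, q) in matched.items()}
most_changed = max(scores, key=scores.get)   # label with the largest JS divergence
least_changed = min(scores, key=scores.get)  # label with the smallest JS divergence
```

Note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, which is the square root of the divergence, so values from that function would need to be squared to be comparable with this sketch.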