find the most important terms, filter out stopwords, apply stemming, etc. If there is no decent NLP toolkit for Java, we can use the Python NLP toolkit in a script job that pre-processes the fetched API data once per day.
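
A rough sketch of what such a pre-processing script could look like, using only the Python standard library so it runs without extra installs. Everything named here is an assumption for illustration: the `STOPWORDS` list is a tiny placeholder, `naive_stem` is a crude suffix stripper standing in for a real stemmer from an NLP toolkit, and `top_terms` ranks terms by plain frequency, which is only one possible reading of "most important".

```python
import re
from collections import Counter

# Tiny placeholder stopword list (assumption); an NLP toolkit would supply a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "for", "in", "is", "it",
    "of", "on", "that", "the", "this", "to", "with",
}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not a replacement."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, n=5):
    """Return the n most frequent stemmed, non-stopword terms as (term, count) pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOPWORDS]
    return Counter(stems).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample standing in for one record of fetched API data.
    sample = "The parser parses documents and the parsed document is indexed"
    print(top_terms(sample, 2))
```

The daily job would call `top_terms` on each fetched record and store the result. Scheduling (for example via cron) and the storage format are left out of this sketch.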