It's past time to give up on these rule-based taggers and train some models based on NMT architectures or something.
- The Java lib is more than 10 years old. The papers on which these systems are based, OTOH, are almost 20 years old
- The G2P phone set is not really friendly to anyone and some phonemes are mysterious, so IMHO the whole thing should be based on IPA
- Multitask output (phones, syllables and syllphones) would be nice
- Data from Dicio and Priberam should do. The seed lex is kept, not discarded
- Good exercise on ML fundamentals :)
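
To make the multitask idea a bit more concrete, here is a rough sketch of how one lexicon entry could be turned into three seq2seq training pairs, using task tokens the way multilingual NMT setups do. Everything in it is an assumption for illustration: the `LexEntry` layout, the `<g2p>` / `<syll>` / `<syllphones>` tags, the `-` and `.` boundary symbols, and the reading of "syllphones" as phones grouped per syllable. The IPA for "casa" is a hand-written example, not taken from Dicio or Priberam.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LexEntry:
    """Hypothetical lexicon record; the field layout is an assumption."""
    word: str
    syllables: List[str]          # orthographic syllables, e.g. ["ca", "sa"]
    syllphones: List[List[str]]   # IPA phones grouped per syllable


def to_multitask_pairs(entry: LexEntry) -> List[Tuple[List[str], List[str]]]:
    """Build one (source, target) pair per task from a single entry."""
    graphemes = list(entry.word)

    # Task 1: plain phone sequence (syllable grouping flattened away).
    phones = [p for syl in entry.syllphones for p in syl]

    # Task 2: graphemes with "-" marking orthographic syllable boundaries.
    syll_target = list("-".join(entry.syllables))

    # Task 3: phones with "." marking syllable boundaries.
    syllphone_target: List[str] = []
    for i, syl in enumerate(entry.syllphones):
        if i > 0:
            syllphone_target.append(".")
        syllphone_target.extend(syl)

    # A task token prefixed to the source tells a shared model what to emit.
    return [
        (["<g2p>"] + graphemes, phones),
        (["<syll>"] + graphemes, syll_target),
        (["<syllphones>"] + graphemes, syllphone_target),
    ]


# Illustrative entry (hand-written transcription, not real scraped data).
example = LexEntry("casa", ["ca", "sa"], [["k", "a"], ["z", "ɐ"]])
pairs = to_multitask_pairs(example)
```

With pairs in this shape, a single encoder-decoder could be trained on all three tasks at once, and the seed lex could be converted with the same function as whatever gets collected from the dictionaries.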