KL loss function

Hello, I am confused about the KL loss function in the training process. In your VRNMT paper, you mention that z_j is integrated into the decoder network. Does this mean that the z_j used in the decoder is derived from the prior network? That is the part that confuses me. I look forward to your reply.