For VLLM do we need to use vision_process.py

The model is not following my instructions when using images with a system prompt. 

System prompt tells model that it should only output json and nothing else, but it still outputs a written summary of the image instead of the json