# AimKP Codebase

Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation

## Install

1. Install the package:

```shell
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install -r requirements.txt
```
2. Training. Image data for the CMKP dataset can be found in the CMKP repo. MLLM: LLaVA-v1.5-7b; vision encoder: clip-vit-large-patch14-336.

```shell
# For standard fine-tuning
bash /path/to/standard_finetune.sh
# For training with AimKP
bash /path/to/scripts/AimKP.sh
```
3. Evaluation:

```shell
python evaluate.py --model-path checkpoint --model-base /path/to/models/llava-v1.5-7b --txt-path "results"
```
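For reference, keyphrase generation is commonly scored with exact-match F1 over the top-k predictions. The sketch below is an illustrative implementation of that metric, not the repo's `evaluate.py`; the function name and normalization choices are assumptions:

```python
def f1_at_k(predicted, gold, k=5):
    """Exact-match F1@k between predicted and gold keyphrase lists.

    Illustrative sketch only: lowercases and strips whitespace before
    matching, which may differ from the repo's own evaluation script.
    """
    preds = [p.lower().strip() for p in predicted[:k]]
    golds = {g.lower().strip() for g in gold}
    if not preds or not golds:
        return 0.0
    tp = sum(1 for p in preds if p in golds)  # true positives
    precision = tp / len(preds)
    recall = tp / len(golds)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with two predictions of which one matches a two-item gold set, precision and recall are both 0.5, giving F1 = 0.5.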

## Acknowledgement