Geography-Aware Large Language Models for Next POI Recommendation

Install

Clone this repository to your local machine:

git clone https://github.com/liuzh368/GA-LLM.git
cd GA-LLM

Install the environment by running:

conda env create -f environment.yml

Download the model from https://huggingface.co/Yukang/Llama-2-7b-longlora-32k-ft and save it to an appropriate directory (e.g., models/).
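
The download can also be scripted. The sketch below is not part of this repository: it assumes the `huggingface_hub` package is installed (it may not be listed in environment.yml), and the helper names and the `models/` root are illustrative.

```python
from pathlib import Path

# Model repository named in the download link above.
REPO_ID = "Yukang/Llama-2-7b-longlora-32k-ft"


def local_model_dir(root: str = "models") -> Path:
    """Return <root>/<model name>, e.g. models/Llama-2-7b-longlora-32k-ft."""
    return Path(root) / REPO_ID.split("/")[-1]


def download_model(root: str = "models") -> Path:
    """Download every file of the model repository into local_model_dir(root)."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(root)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `download_model()` fetches the full set of weights, so expect a multi-gigabyte transfer. The returned directory is what the training commands would receive as `--model_name_or_path`.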

Data Preparation and Processing

Download the raw datasets from datasets.
Copy the contents of the extracted dataset folder to the GA-LLM/data-preprocess/data directory, and then execute the data-preprocess/generate_ca_raw.py script.
Run the data-preprocess/run.py script with the following required argument:
```
python data-preprocess/run.py -f best_conf/ca.yml
```
Run the data-preprocess/traj_qk.py script with the following required argument:
```
python data-preprocess/traj_qk.py -dataset_name ca
```
Run the data-preprocess/process_projector_gps_stage1.py script with the following required arguments:
```
python data-preprocess/process_projector_gps_stage1.py -dataset_name ca -use_sim False
```
Run the data-preprocess/process_llm_gps.py script with the following required arguments:
```
python data-preprocess/process_llm_gps.py -dataset_name ca -use_sim False
```
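
The preprocessing steps above must run in order, so it can help to drive them from one script. This is a minimal sketch, assuming it is launched from the repository root and that `generate_ca_raw.py` takes no arguments (the README shows none); the function names are hypothetical.

```python
import subprocess
import sys
from typing import List

DATASET = "ca"


def preprocess_commands(python: str = sys.executable) -> List[List[str]]:
    """Return the preprocessing commands in the order the README lists them."""
    pre = "data-preprocess/"
    return [
        [python, pre + "generate_ca_raw.py"],  # assumed to need no arguments
        [python, pre + "run.py", "-f", "best_conf/ca.yml"],
        [python, pre + "traj_qk.py", "-dataset_name", DATASET],
        [python, pre + "process_projector_gps_stage1.py",
         "-dataset_name", DATASET, "-use_sim", "False"],
        [python, pre + "process_llm_gps.py",
         "-dataset_name", DATASET, "-use_sim", "False"],
    ]


def run_preprocessing(dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in preprocess_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

`run_preprocessing()` only prints the commands; `run_preprocessing(dry_run=False)` executes them and raises `subprocess.CalledProcessError` if a step fails, so later steps never run on incomplete data.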

Main Performance

(Example commands for running GCIM on the ca dataset. The absolute paths under /share/home/liuzh368/ come from the authors' environment; adjust paths and parameters to match yours.)

train

Stage 1: GCIM Module Training

CUDA_VISIBLE_DEVICES=0 nohup torchrun --nproc_per_node=1 --master_port=29501 finetune_GCIM.py  \
--model_name_or_path model_llama2-7b-longlora/ \
--bf16 True \
--lora_r 256 \
--lora_alpha 512 \
--output_dir outputs/ca/stage1_r256_alpha512_GCIM \
--gps_mapping_path /share/home/liuzh368/demo/LLM4POI/datasets/ca/preprocessed/ca_gps_mapping.csv \
--model_max_length 32768 \
--use_flash_attn True \
--data_path /share/home/liuzh368/demo/LLM4POI/datasets/ca/preprocessed/train_qa_pairs_kqt_200items_ca_projector_gps_without_sim_stage1.json \
--low_rank_training True \
--num_train_epochs 3  \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2  \
--gradient_accumulation_steps 1  \
--evaluation_strategy "no"  \
--save_strategy "steps"  \
--save_steps 1000  \
--save_total_limit 2  \
--learning_rate 2e-5  \
--weight_decay 0.0  \
--warmup_steps 20  \
--lr_scheduler_type "constant_with_warmup"  \
--logging_steps 1  \
--deepspeed "ds_configs/stage2.json"  \
--tf32 True > runs/ca/train_stage1_r256_alpha512_GCIM.txt 2>&1 &
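
With the flags above, each optimizer step processes a single sequence. The sketch below spells out that arithmetic, assuming the script follows the usual Hugging Face `Trainer` convention (processes × per-device batch × accumulation steps); the function name is illustrative and not taken from the repository.

```python
def effective_batch_size(
    nproc_per_node: int,
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return (
        nproc_per_node
        * per_device_train_batch_size
        * gradient_accumulation_steps
    )


# Values from the Stage 1 command: one process, batch size 1, no accumulation.
print(effective_batch_size(1, 1, 1))  # prints 1
```

If you move to more GPUs or raise `--gradient_accumulation_steps`, the effective batch grows accordingly (for example, 4 processes with 2 accumulation steps give 8), and the learning rate may need retuning.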

Stage 2: LLM Fine-tuning

CUDA_VISIBLE_DEVICES=0 nohup torchrun --nproc_per_node=1 --master_port=29501 finetune_LLM_with_GCIM.py \
--model_name_or_path /share/home/liuzh368/demo/LLM4POI/model_llama2-7b-longlora \
--bf16 True \
--output_dir outputs/ca/llm_GCIM_r256_alpha512 \
--model_max_length 32768 \
--use_flash_attn True \
--data_path /share/home/liuzh368/demo/LLM4POI/datasets/ca/preprocessed/train_qa_pairs_kqt_200items_ca_llm_without_sim_gps.json \
--gps_mapping_path /share/home/liuzh368/demo/LLM4POI/datasets/ca/preprocessed/ca_gps_mapping.csv \
--geoencoder_path /share/home/liuzh368/demo/LLM4POI/outputs/ca/stage1_r256_alpha512_GCIM/109122/geo_encoder_merged.pth \
--low_rank_training True \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 2 \
--learning_rate 2e-5 \
--weight_decay 0.0 \
--warmup_steps 20 \
--lr_scheduler_type "constant_with_warmup" \
--logging_steps 1 \
--deepspeed "ds_configs/stage2.json" \
--tf32 True > runs/ca/train_llm_GCIM_r256_alpha512.txt 2>&1 &
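
Several arguments in the training commands are absolute paths on the authors' machine. One way to adapt them is to swap that prefix for your own project root. This is a sketch under the assumption that your checkout mirrors the same layout (datasets/, outputs/, and so on); `relocate` and `relocate_args` are hypothetical helpers.

```python
from typing import Iterable, List

# Prefix used by the absolute paths in the example commands.
AUTHOR_ROOT = "/share/home/liuzh368/demo/LLM4POI"


def relocate(arg: str, project_root: str) -> str:
    """Replace the authors' prefix with project_root; leave other args alone."""
    if arg == AUTHOR_ROOT or arg.startswith(AUTHOR_ROOT + "/"):
        return project_root.rstrip("/") + arg[len(AUTHOR_ROOT):]
    return arg


def relocate_args(args: Iterable[str], project_root: str) -> List[str]:
    """Apply relocate() to every element of an argument list."""
    return [relocate(a, project_root) for a in args]
```

For example, `relocate_args(["--gps_mapping_path", AUTHOR_ROOT + "/datasets/ca/preprocessed/ca_gps_mapping.csv"], "/data/GA-LLM")` leaves the flag untouched and rewrites the path to start with `/data/GA-LLM/`.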

test

Checkpoint Path: outputs/ca/llm_GCIM_r256_alpha512/checkpoint-109122

Convert Checkpoint:

python outputs/ca/llm_GCIM_r256_alpha512/checkpoint-109122/zero_to_fp32.py outputs/ca/llm_GCIM_r256_alpha512/checkpoint-109122 outputs/ca/llm_GCIM_r256_alpha512/checkpoint-109122/pytorch_model.bin
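
The conversion command repeats the checkpoint directory three times, which is easy to mistype when the step number changes. The sketch below builds the same command from a single path; `convert_command` is a hypothetical helper, and it assumes `zero_to_fp32.py` sits inside the checkpoint directory, as in the command above.

```python
import sys
from pathlib import PurePosixPath
from typing import List


def convert_command(checkpoint_dir: str, python: str = sys.executable) -> List[str]:
    """Build: <python> <ckpt>/zero_to_fp32.py <ckpt> <ckpt>/pytorch_model.bin"""
    ckpt = PurePosixPath(checkpoint_dir)
    return [
        python,
        str(ckpt / "zero_to_fp32.py"),   # conversion script inside the checkpoint
        str(ckpt),                       # directory holding the sharded state
        str(ckpt / "pytorch_model.bin"),  # consolidated output file
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` so that a failed conversion is reported before evaluation starts.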

Run Evaluation:

CUDA_VISIBLE_DEVICES=0 nohup python test_next_gps.py \
--batch_size 8 \
--base_model /share/home/liuzh368/demo/LLM4POI/model_llama2-7b-longlora \
--output_dir /share/home/liuzh368/demo/LLM4POI/outputs/ca/llm_GCIM_r256_alpha512/checkpoint-109122 \
--cache_dir ./cache \
--seq_len 32768 \
--context_size 32768 \
--flash_attn True \
--model_path /share/home/liuzh368/demo/LLM4POI/model_llama2-7b-longlora \
--data_path /share/home/liuzh368/demo/LLM4POI/datasets \
--dataset_name ca \
--test_file test_qa_pairs_kqt_200items_ca_llm_without_sim_gps.txt \
--test_type llm \
--gps_mapping_path /share/home/liuzh368/demo/LLM4POI/datasets/ca/preprocessed/ca_gps_mapping.csv \
--geoencoder_path /share/home/liuzh368/demo/LLM4POI/outputs/ca/stage1_r256_alpha512_GCIM/109122/geo_encoder_merged.pth \
--use_random_projector False > runs/ca/eval_llm_gps_r256_alpha512.txt 2>&1 &
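
Because the evaluation runs in the background under `nohup`, a wrong path only shows up later in the log file. A quick existence check before launching can save a round trip. This is a minimal sketch; `missing_paths` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that does not exist on disk, in input order."""
    return [p for p in paths if not Path(p).exists()]
```

Pass it the values you intend to give to `--base_model`, `--gps_mapping_path`, and `--geoencoder_path`, and launch the evaluation only if the returned list is empty.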

Acknowledgement

This code is developed based on LLM4POI and LongLoRA.
