Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
Tianyi Shang, Zhenyu Li*, Pengjie Xu, Jinwei Qiao, Gang Chen, Zihan Ruan, Weijun Hu
Qilu University of Technology (Shandong Academy of Sciences)
This is the official repository for Text4VPR (paper); see https://github.com/nuozimiaowu/Text4VPR for details. 🔥🔥🔥
We focus on the localization problem from pure text to images: achieving accurate positioning from descriptions of the surrounding environment alone. Our Text4VPR model addresses this task for the first time by exploiting semantic information from multiple views of the same location. During training we apply contrastive learning to single image-text pairs, while at inference we match groups of descriptions against groups of images from the same location to achieve precise localization. We are the first to tackle localization from pure text descriptions to image groups, and we introduce a dataset, Street360Loc, containing 7,000 locations, each with four images from different directions and corresponding rich textual descriptions. On Street360Loc, Text4VPR establishes a strong baseline, achieving 52% top-1 and 92% top-10 accuracy within a 5-meter radius on the test set. This shows that localization from textual descriptions to images is not only feasible but also holds significant potential for further advancement.
Text4VPR. Training stage: We use the T5 model to encode text descriptions. We apply the Sinkhorn algorithm to assign tokens to clusters, followed by cluster aggregation to generate image encodings. Finally, contrastive learning pulls correctly matched image-text pairs closer together in the embedding space. Inference stage: Images and text are encoded with the same image and text encoders used during training; we then align text clusters with their paired image clusters at corresponding positions.
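The training stage described above combines two standard components: Sinkhorn-style balanced assignment of tokens to clusters, and a symmetric contrastive (InfoNCE) loss over matched image-text pairs. The NumPy sketch below illustrates both in isolation; it is not the repository's implementation, and the function names and hyperparameters (`eps`, `temperature`, iteration count) are illustrative only.

```python
import numpy as np

def sinkhorn(scores, n_iters=5, eps=0.05):
    # Soft-assign tokens (rows) to clusters (columns): alternately
    # normalize rows and columns of exp(scores / eps), as in the
    # Sinkhorn-Knopp algorithm, so cluster usage stays balanced.
    Q = np.exp(scores / eps)
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)  # each token sums to 1 over clusters
        Q /= Q.sum(axis=0, keepdims=True)  # balance mass across clusters
    return Q / Q.sum(axis=1, keepdims=True)

def info_nce(text_emb, img_emb, temperature=0.07):
    # Symmetric contrastive loss over a batch of matched pairs:
    # the i-th text should score highest against the i-th image,
    # and vice versa.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature
    lp_t2i = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    lp_i2t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -(np.mean(np.diag(lp_t2i)) + np.mean(np.diag(lp_i2t))) / 2
```

A correctly matched batch should produce a lower loss than a shuffled one, which is a quick way to sanity-check the loss direction.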
Create a conda environment and install the basic dependencies:

```shell
git clone https://github.com/nuozimiaowu/Text4VPR
cd Text4VPR
conda create -n text4vpr python=3.11.9
conda activate text4vpr
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install xformers==0.0.28.post1
pip install pandas
pip install nltk
pip install transformers==4.44.2
pip install openpyxl
pip install protobuf
pip install tiktoken
pip install sentencepiece
```
Download Street360Loc_images.rar from https://drive.google.com/file/d/17QlkGvgAKIYlm6AHi6fjFTD8ev8eNVWB/view?usp=sharing and extract it into the dataset folder. The structure of the dataset folder will be as follows:
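After extracting the archive, it can be worth sanity-checking the layout before training. The sketch below assumes, as an illustration rather than a fact from the repository, that `Street360Loc_images/` holds one subdirectory per location with four view images inside:

```python
from pathlib import Path

def check_street360loc(root):
    # Walk the extracted image folder and report locations that do not
    # contain exactly four view images (one per direction), so a bad
    # extraction is caught before training. The assumed layout is
    # Street360Loc_images/<location_id>/<view>.jpg.
    root = Path(root)
    bad = []
    for loc in sorted(p for p in root.iterdir() if p.is_dir()):
        n_images = sum(1 for f in loc.iterdir()
                       if f.suffix.lower() in {".jpg", ".jpeg", ".png"})
        if n_images != 4:
            bad.append((loc.name, n_images))
    return bad  # empty list means every location has four images
```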
Dataset construction is finished.
Our pre-trained model is available: https://drive.google.com/file/d/1ZH-lpOzsKPOMguwKcDcr0OlXUjwK427d/view?usp=sharing
To evaluate the pretrained model, follow these steps:

1. **Open the evaluation script**: Navigate to the `test` directory and open the `test.py` script in a text editor.
2. **Update file paths**: Locate the following lines in `test.py` and replace the placeholder paths with the actual paths to your test dataset:

   ```python
   excel_file_test = r"your path to /dataset/test_description.xlsx"
   image_root_dir = r"your path to /dataset/Street360Loc_images"
   ```

   For example:

   ```python
   excel_file_test = r"/home/user/Text4VPR/dataset/test_description.xlsx"
   image_root_dir = r"/home/user/Text4VPR/dataset/Street360Loc_images"
   ```

3. **Update the model weights path**: Find and update the line specifying the model weight file path so it points to the downloaded pretrained model:

   ```python
   model_path = r"your downloaded pretrained model"
   ```

4. **Run the evaluation script**: After updating the paths, make sure you are in the root directory of the Text4VPR repository and execute:

   ```shell
   python test/test.py
   ```
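The evaluation reports top-1/top-10 accuracy within a 5-meter radius. As a rough illustration of how such a metric can be computed from retrieval results — this is not the code inside `test.py`, and the array names and shapes here are hypothetical — given query ground-truth coordinates, database coordinates, and a query-to-database similarity matrix:

```python
import numpy as np

def recall_at_k(query_xy, db_xy, sims, ks=(1, 5, 10), radius=5.0):
    # For each query, rank database locations by similarity and count
    # the query as localized at top-k if any of its k best-ranked
    # locations lies within `radius` meters of the true position.
    order = np.argsort(-sims, axis=1)                     # best match first
    dists = np.linalg.norm(db_xy[order] - query_xy[:, None, :], axis=2)
    return {k: float(np.mean((dists[:, :k] <= radius).any(axis=1)))
            for k in ks}
```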
To train the Text4VPR model, follow these steps:

1. **Open the training script**: Navigate to the `train` directory and open the `train.py` script in a text editor of your choice.
2. **Update file paths**: Locate the following lines in `train.py` and replace the placeholder paths with the actual paths to your dataset:

   ```python
   excel_file_train = r"your path to /dataset/train_description.xlsx"
   excel_file_val = r"your path to /dataset/val_description.xlsx"
   image_root_dir = r"your path to /dataset/Street360Loc_images"
   ```

   For example, if your dataset is located in `/home/user/Text4VPR/dataset`, the lines should look like this:

   ```python
   excel_file_train = r"/home/user/Text4VPR/dataset/train_description.xlsx"
   excel_file_val = r"/home/user/Text4VPR/dataset/val_description.xlsx"
   image_root_dir = r"/home/user/Text4VPR/dataset/Street360Loc_images"
   ```

3. **Run the training script**: After updating the paths, make sure you are in the root directory of the Text4VPR repository and execute:

   ```shell
   python train/train.py
   ```

   Training will begin, and progress updates will appear in the terminal.

4. **Model checkpoint**: Once training is complete, the trained model is saved in the `train/checkpoints` directory for later use.
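If several epoch checkpoints accumulate under `train/checkpoints`, a small helper can pick the newest one to plug into `model_path` during evaluation. This helper is not part of the repository; the directory name and the `*.pth` pattern are assumptions:

```python
from pathlib import Path

def latest_checkpoint(ckpt_dir="train/checkpoints", pattern="*.pth"):
    # Return the most recently modified checkpoint file in ckpt_dir,
    # which is convenient when multiple training runs or epochs
    # have saved weights there.
    ckpts = sorted(Path(ckpt_dir).glob(pattern),
                   key=lambda p: p.stat().st_mtime)
    if not ckpts:
        raise FileNotFoundError(f"no {pattern} files in {ckpt_dir}")
    return ckpts[-1]
```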
To evaluate the trained model, follow these steps:

1. **Open the evaluation script**: Navigate to the `test` directory and open the `test.py` script in a text editor.
2. **Update file paths**: Locate the following lines in `test.py` and replace the placeholder paths with the actual paths to your test dataset:

   ```python
   excel_file_test = r"your path to /dataset/test_description.xlsx"
   image_root_dir = r"your path to /dataset/Street360Loc_images"
   ```

   For example:

   ```python
   excel_file_test = r"/home/user/Text4VPR/dataset/test_description.xlsx"
   image_root_dir = r"/home/user/Text4VPR/dataset/Street360Loc_images"
   ```

3. **Update the model weights path**: Find and update the line specifying the model weight file path so it points to the checkpoint saved during training:

   ```python
   model_path = r"your weight file path under train/checkpoints/"
   ```

   For example:

   ```python
   model_path = r"/home/user/Text4VPR/train/checkpoints/your_model_weights.pth"
   ```

4. **Run the evaluation script**: After updating the paths, make sure you are in the root directory of the Text4VPR repository and execute:

   ```shell
   python test/test.py
   ```

   The evaluation results will be printed in the terminal.
Following these steps will enable you to successfully train and evaluate the Text4VPR model using your dataset.



