Historical maps contain rich historical and cultural information in multiple languages. The mapKurator system provides multilingual text spotting on scanned historical maps with two state-of-the-art approaches (TESTR and Spotter-v2).
Supported languages: English, Russian, Arabic, Chinese, Japanese
- The training datasets and training process are described here
To run spotting with the multilingual models, download (1) the config directory and (2) the paired pretrained model weights for each language here. Place the config directory under /spotter-v2/PALEJUN/configs/PALEJUN/, then set MODEL.WEIGHTS in SynthMap_Polygon.yaml to the path of your downloaded model weights.
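Editing MODEL.WEIGHTS by hand is error-prone when you switch between languages, so the step can be scripted. The following is a minimal sketch using only the standard library. It assumes the config stores the value as a `WEIGHTS:` line nested directly under a top-level `MODEL:` block, which is inferred from the key name MODEL.WEIGHTS and not confirmed against the shipped files. The function name `set_model_weights` and the weights path in the usage comment are hypothetical.

```python
import re


def set_model_weights(yaml_text: str, weights_path: str) -> str:
    """Return yaml_text with MODEL.WEIGHTS replaced by weights_path.

    Assumption: WEIGHTS is a direct child of a top-level MODEL block.
    """
    lines = yaml_text.splitlines()
    in_model = False      # True while scanning lines inside the MODEL block
    child_indent = None   # indentation of MODEL's direct children
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key starts here; check whether it is MODEL.
            in_model = stripped.startswith("MODEL:")
            child_indent = None
            continue
        if not in_model:
            continue
        if child_indent is None:
            child_indent = indent
        if indent == child_indent and re.match(r"WEIGHTS\s*:", stripped):
            lines[i] = f'{line[:indent]}WEIGHTS: "{weights_path}"'
            replaced = True
    if not replaced:
        raise KeyError("MODEL.WEIGHTS not found in config text")
    return "\n".join(lines) + "\n"


# Hypothetical usage (the weights path is a placeholder):
# from pathlib import Path
# cfg = Path("SynthMap_Polygon.yaml")
# cfg.write_text(set_model_weights(cfg.read_text(), "/path/to/downloaded/weights"))
```

If the shipped config inherits MODEL.WEIGHTS from a base file instead of defining it locally, the function raises KeyError and the key has to be added manually.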
You can call run.py with the following command, which follows the same steps as the English spotter described here:
python run.py --module_text_spotting \
    --sample_map_csv_path /home/maplord/maplist_csv/luna_omo_metadata_56628_20220724.csv \
    --text_spotting_model_dir ./spotter-v2/PALEJUN/ \
    --expt_name sample_maps \
    --spotter_model spotter-v2 \
    --spotter_config ./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml \
    --spotter_expt_name test \
    --gpu_id 0
where
- --module_text_spotting turns on the spotting module in this run
- --sample_map_csv_path stores the metadata of the input map; a sample file can be found here
- --text_spotting_model_dir switches to the model directory
- --expt_name is the experiment name for running the pipeline
- --spotter_model is the spotter model name, choices=["spotter-v2"]
- --spotter_config is the configuration file for running the spotting model
- --spotter_expt_name is the experiment name for running the spotter
- --gpu_id selects a GPU for running the spotter
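When the same command has to be repeated for several languages or map lists, it can help to assemble it programmatically. The sketch below builds the argument list from the flags listed above and nothing else. The helper names `build_run_command` and `run_spotter` are hypothetical, and the script is assumed to be launched from the directory that contains run.py.

```python
import subprocess
import sys


def build_run_command(csv_path, model_dir, spotter_config, gpu_id=0,
                      expt_name="sample_maps", spotter_expt_name="test"):
    """Assemble the run.py argument list from the documented flags."""
    return [
        sys.executable, "run.py",          # same interpreter as this script
        "--module_text_spotting",          # switch with no value
        "--sample_map_csv_path", csv_path,
        "--text_spotting_model_dir", model_dir,
        "--expt_name", expt_name,
        "--spotter_model", "spotter-v2",   # the only documented choice
        "--spotter_config", spotter_config,
        "--spotter_expt_name", spotter_expt_name,
        "--gpu_id", str(gpu_id),
    ]


def run_spotter(**kwargs):
    """Run the pipeline; raises CalledProcessError if run.py exits non-zero."""
    subprocess.run(build_run_command(**kwargs), check=True)


# Example call, mirroring the command above:
# run_spotter(
#     csv_path="/home/maplord/maplist_csv/luna_omo_metadata_56628_20220724.csv",
#     model_dir="./spotter-v2/PALEJUN/",
#     spotter_config="./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml",
# )
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.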
If you do not have a metadata CSV file, or wish to specify the input image path directly, you can use tools/inference.py in the model folder (i.e., text_spotting_model_dir).
python tools/inference.py --config-file ./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml \
    --output_json \
    --input ./test_images \
    --output ./output
where
- --config-file is the configuration file for running the spotting model
- --output_json indicates that the output file format is JSON
- --input is the input image directory
- --output is the output file directory
- You can set the GPU with CUDA_VISIBLE_DEVICES={gpu_id} (default gpu_id=0)
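The GPU selection through CUDA_VISIBLE_DEVICES can also be applied from a script by passing a modified environment to the child process. The following sketch uses only the flags documented above. The helper names are hypothetical, and it assumes tools/inference.py is launched with the model folder as the working directory, so any relative paths you pass are resolved against that folder.

```python
import os
import subprocess
import sys


def build_inference_call(config_file, input_dir, output_dir, gpu_id=0):
    """Return (command, env) for tools/inference.py pinned to one GPU."""
    env = dict(os.environ)                     # copy; leave our own env untouched
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # expose only the chosen GPU
    command = [
        sys.executable, "tools/inference.py",
        "--config-file", config_file,
        "--output_json",
        "--input", input_dir,
        "--output", output_dir,
    ]
    return command, env


def run_inference(model_dir, **kwargs):
    """Run inference from inside model_dir (assumed to hold tools/inference.py)."""
    command, env = build_inference_call(**kwargs)
    subprocess.run(command, cwd=model_dir, env=env, check=True)


# Example call (use absolute paths, or paths relative to model_dir):
# run_inference(
#     "./spotter-v2/PALEJUN/",
#     config_file="configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml",
#     input_dir="/data/test_images",
#     output_dir="/data/output",
#     gpu_id=0,
# )
```

Setting the variable only in the child's environment keeps the GPU choice local to that one run.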