mapkurator-doc/docs/multilingual.md at main · knowledge-computing/mapkurator-doc

Description

Historical maps hold rich historical and cultural information, often written in multiple languages. The mapKurator system provides multilingual text spotting on scanned historical maps using two state-of-the-art approaches: TESTR and Spotter-v2.

Supported Languages

English, Russian, Arabic, Chinese, Japanese

Training Datasets and Training Process

The training datasets and training process are described here.

Inference Commands

1) Use run.py

To run spotting with multilingual models, download (1) a config directory and (2) the paired pretrained model weights for each language here. Place the config directory under /spotter-v2/PALEJUN/configs/PALEJUN/, then set MODEL.WEIGHTS in SynthMap_Polygon.yaml to the path of your downloaded model.
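As an illustration of the MODEL.WEIGHTS edit, here is a minimal Python sketch that rewrites the entry using only the standard library. It assumes the config follows the usual nested YAML layout, with a top-level `MODEL:` key and an indented `WEIGHTS:` line beneath it; the helper name `set_model_weights` and the example paths are hypothetical and not part of mapKurator.

```python
import re


def set_model_weights(config_text: str, weights_path: str) -> str:
    """Return config_text with the WEIGHTS entry under MODEL set to weights_path.

    Assumption: top-level keys start in column 0 and WEIGHTS is an indented
    child of MODEL. This is a plain text edit, not a full YAML parser.
    """
    lines = config_text.splitlines()
    in_model = False
    for i, line in enumerate(lines):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        if re.match(r"^\S", line):
            # A new top-level key begins; track whether it is MODEL.
            in_model = line.startswith("MODEL:")
            continue
        if in_model:
            match = re.match(r"^(\s+)WEIGHTS:", line)
            if match:
                lines[i] = f'{match.group(1)}WEIGHTS: "{weights_path}"'
                return "\n".join(lines) + "\n"
    raise ValueError("no WEIGHTS entry found under MODEL")


if __name__ == "__main__":
    # Hypothetical config snippet and weight path, for demonstration only.
    example = 'MODEL:\n  WEIGHTS: "old/model.pth"\nSOLVER:\n  IMS_PER_BATCH: 1\n'
    print(set_model_weights(example, "/path/to/downloaded/model.pth"))
```

To apply it to a real config, read SynthMap_Polygon.yaml into a string, pass it through the function, and write the result back. Editing the file by hand works just as well.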

You can then call run.py with the following command, which follows the same steps as the English spotter described here:

python run.py --module_text_spotting \
              --sample_map_csv_path /home/maplord/maplist_csv/luna_omo_metadata_56628_20220724.csv \
              --text_spotting_model_dir ./spotter-v2/PALEJUN/ \
              --expt_name sample_maps \
              --spotter_model spotter-v2 \
              --spotter_config ./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml \
              --spotter_expt_name test \
              --gpu_id 0

where

--module_text_spotting turns on the spotting module in this run
--sample_map_csv_path is the path to the CSV file that stores the metadata of the input maps; a sample file can be found here.
--text_spotting_model_dir switches to the model directory
--expt_name is the experiment name for running the pipeline
--spotter_model is the spotter model name, choices=["spotter-v2"]
--spotter_config is the configuration file for running the spotting model
--spotter_expt_name is the experiment name for running the spotter
--gpu_id selects a GPU for running the spotter
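If you prefer to launch the pipeline from a script, the options above can be assembled into an argument list. The following is a rough sketch: the flags are the ones documented above, while the helper name `build_run_command` and its default values are assumptions for illustration.

```python
import shlex


def build_run_command(csv_path, model_dir, config_path,
                      expt_name="sample_maps", spotter_expt_name="test",
                      gpu_id=0):
    """Assemble the run.py invocation as an argument list (no shell quoting needed)."""
    return [
        "python", "run.py",
        "--module_text_spotting",                # turn on the spotting module
        "--sample_map_csv_path", csv_path,       # metadata CSV of the input maps
        "--text_spotting_model_dir", model_dir,  # model directory
        "--expt_name", expt_name,
        "--spotter_model", "spotter-v2",
        "--spotter_config", config_path,
        "--spotter_expt_name", spotter_expt_name,
        "--gpu_id", str(gpu_id),
    ]


if __name__ == "__main__":
    cmd = build_run_command(
        "/home/maplord/maplist_csv/luna_omo_metadata_56628_20220724.csv",
        "./spotter-v2/PALEJUN/",
        "./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml",
    )
    print(shlex.join(cmd))
    # To actually launch it (from the directory containing run.py):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids quoting problems when paths contain spaces.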

2) Use inference.py

If you do not have a metadata CSV file, or wish to specify the input image path directly, you can use tools/inference.py in the model folder (i.e., text_spotting_model_dir).

python tools/inference.py --config-file ./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml \
                          --output_json \
                          --input ./test_images \
                          --output ./output

where

--config-file is the configuration file for running the spotting model
--output_json indicates the output file format is JSON
--input is the input image directory
--output is the output file directory
You can select the GPU by setting CUDA_VISIBLE_DEVICES={gpu_id}; the default is gpu_id=0.
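To show how the GPU selection fits together with the inference command, here is a small sketch that prepares both the argument list and an environment with CUDA_VISIBLE_DEVICES set. The helper name `inference_invocation` is hypothetical; the flags and the environment variable are the ones described above.

```python
import os


def inference_invocation(config_file, input_dir, output_dir, gpu_id=0):
    """Return (cmd, env) for running tools/inference.py on the chosen GPU."""
    # Copy the current environment and restrict CUDA to the selected device.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    cmd = [
        "python", "tools/inference.py",
        "--config-file", config_file,
        "--output_json",          # write results as JSON
        "--input", input_dir,     # input image directory
        "--output", output_dir,   # output file directory
    ]
    return cmd, env


if __name__ == "__main__":
    cmd, env = inference_invocation(
        "./spotter-v2/PALEJUN/configs/PALEJUN/config-ru/SynthMap-ru/SynthMap_Polygon.yaml",
        "./test_images",
        "./output",
        gpu_id=0,
    )
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
    # To launch from the model folder:
    #   import subprocess; subprocess.run(cmd, env=env, check=True)
```

On the command line, the equivalent is to prefix the call with the variable, for example CUDA_VISIBLE_DEVICES=1 python tools/inference.py followed by the same options.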
