Skip to content

Latest commit

 

History

History
32 lines (22 loc) · 1.34 KB

File metadata and controls

32 lines (22 loc) · 1.34 KB

Description

PostOCR

PostOCR helps to verify the output and correct misspelled words from PatchTextSpotter using the OpenStreetMap dictionary. PostOCR module finds words' candidates using fuzzy query function from Elasticsearch, which contains the place name attribute from the OpenStreetMap dictionary. Once PostOCR module identifies words' candidates, the module picks one candidate by the word popularity from the dictionary.

Commands

The inputs for this module are geocoordinate converter results in GeoJSON format.

1) Use run.py

Although the map image does not have Geo-coordinate, you can run stand-alone postOCR module.

python3 run.py --expt_name='57k_maps' --module_post_ocr

where

  • --expt_name: experiment name for running the pipeline
  • --module_post_ocr: turns on the postOCR module in this run

2) Use post_ocr_main.py

If you do not have a metadata csv file, you can directly use post_ocr_main.py in m5_post_ocr folder.

Sample command:

python3 post_ocr_main.py --in_geojson_file --out_geojson_dir

where

  • --in_geojson_file: input geojson file for running the script
  • --out_geojson_dir: output directory path to save the processed file