mapkurator-doc/docs/install1.md at main · knowledge-computing/mapkurator-doc

The mapKurator-system requires cuda_11.3 with cudnn, and nvidia-smi must be working properly on the underlying host OS. For a successful installation, you may need to use cuda_11.3-devel. You can learn more here.

Note that cuda_11.3 only provided support for Ubuntu 20.04 and below at the time this document was created.
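Before continuing, it can help to confirm which CUDA toolkit the host exposes. The following is a minimal sketch and not part of mapKurator: it assumes `nvcc` is on the PATH and prints its usual "release X.Y" line, and the helper names `parse_cuda_release` and `detect_cuda_release` are hypothetical.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_cuda_release():
    """Run `nvcc --version` if nvcc is installed; return (major, minor) or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = detect_cuda_release()
    if release is None:
        print("nvcc not found or version not recognised")
    elif release == (11, 3):
        print("CUDA 11.3 toolkit detected")
    else:
        print("CUDA %d.%d detected; this guide was written for 11.3" % release)
    # nvidia-smi ships with the driver, so check for it separately.
    print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)
```

A missing `nvcc` usually means only the CUDA runtime is installed rather than the devel variant mentioned above.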

Option 1: Using mapKurator-Recogito docker image

NOTE: The docker image supports the modules up to the stitch module.

If you would like a quick setup to try out our text spotting feature without the Post-OCR and Entity Linking modules, please consider using our docker image, which is built on nvidia/cuda:11.3.0-devel-ubuntu18.04. For the full features of mapKurator, please follow Option 2 for installation.

First, pull the docker image with the following command:

docker pull knowledgecomputing/mapkurator_recogito_2023:latest

Then run the container with:

docker run -it --name YOUR_CONTAINER_NAME --gpus all -v /PATH/TO/INPUT/FOLDER/ON/HOST_MACHINE:/home/mapkurator-test-images/input/ -v /PATH/TO/OUTPUT/FOLDER/ON/HOST_MACHINE:/home/mapkurator-test-images/output/  knowledgecomputing/mapkurator_recogito_2023

Inside the container, run conda activate mapKurator to activate the mapkurator environment.

NOTE:

Remember to change /PATH/TO/INPUT/FOLDER/ON/HOST_MACHINE and /PATH/TO/OUTPUT/FOLDER/ON/HOST_MACHINE in the above command to two actual directory paths on your host machine.
The -v option in the command above gives your docker container access to the folders on the host machine. More documentation can be found at this link.
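To avoid typos when substituting the two host paths, the `docker run` command can be assembled programmatically. This is only a sketch: the function name `build_docker_run`, the container name, and the example host folders are hypothetical, while the image name and the container-side mount points are taken from the command above.

```python
import shlex
from pathlib import Path

IMAGE = "knowledgecomputing/mapkurator_recogito_2023"
CONTAINER_INPUT = "/home/mapkurator-test-images/input/"
CONTAINER_OUTPUT = "/home/mapkurator-test-images/output/"


def build_docker_run(container_name, host_input, host_output):
    """Return the `docker run` argument list with absolute host paths."""
    input_dir = Path(host_input).expanduser().resolve()
    output_dir = Path(host_output).expanduser().resolve()
    return [
        "docker", "run", "-it",
        "--name", container_name,
        "--gpus", "all",
        # Each -v maps a host folder onto a folder inside the container.
        "-v", f"{input_dir}:{CONTAINER_INPUT}",
        "-v", f"{output_dir}:{CONTAINER_OUTPUT}",
        IMAGE,
    ]


if __name__ == "__main__":
    # Example host folders (assumptions); replace them with your own directories.
    args = build_docker_run("mapkurator-demo", "~/maps/input", "~/maps/output")
    print(" ".join(shlex.quote(arg) for arg in args))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.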

Then refer to this "How to Use" guide. Ensure that you place any test images in the /PATH/TO/INPUT/FOLDER/ON/HOST_MACHINE directory mentioned above. The docker image comes with two spotting modules, spotter-v2 and spotter_testr, which can be found in the /home directory.

Option 2: Installing mapKurator on Ubuntu18.04 with cuda_11.3_devel

Setup Anaconda

Set up an anaconda environment by running the following commands.

Download the latest anaconda setup.
wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
You may replace the link above with the latest link from anaconda.
Run the installation file.
bash Anaconda3-2022.10-Linux-x86_64.sh
Create a conda environment to install all software packages required by mapKurator-system.
conda create --name mapKurator -y python=3.8
Activate the environment.
conda activate mapKurator
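Once the environment is activated, a short check can confirm that later commands will run in it. The sketch below assumes that conda exports the active environment name in the `CONDA_DEFAULT_ENV` variable; the function name `check_environment` is hypothetical.

```python
import os
import sys


def check_environment(environ, version_info,
                      expected_env="mapKurator", expected_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(
            f"active conda environment is {active!r}, expected {expected_env!r}"
        )
    found = tuple(version_info[:2])
    if found != expected_python:
        problems.append(
            f"Python {found[0]}.{found[1]} is running, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment(os.environ, sys.version_info)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("mapKurator environment with Python 3.8 is active")
```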

Clone the mapKurator repository

git clone https://github.com/knowledge-computing/mapkurator-system

Install required libraries

Install all python packages with the commands below.

python -m pip install numpy==1.21.6
python -m pip install opencv-python
python -m pip install pandas==1.4.3
python -m pip install Pillow==9.4.0
pip install Polygon3
python -m pip install shapely==1.8.2
python -m pip install geojson==2.5.0
python3 -m pip install setuptools==59.5.0

conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install scikit-image
pip install matplotlib
pip install numba
pip install jupyterlab
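After the PyTorch install, it is worth confirming that the build matches CUDA 11.3 and that a GPU is visible. This is a minimal sketch with a hypothetical helper name, `describe_torch`; it returns `None` instead of failing when PyTorch is not importable.

```python
def describe_torch():
    """Return basic PyTorch/CUDA facts, or None when torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # CUDA version the installed PyTorch build was compiled against.
        "built_for_cuda": torch.version.cuda,
        # True only if a working driver and at least one GPU are visible.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_torch()
    if info is None:
        print("PyTorch is not installed in this environment")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

With the versions pinned above, one would expect the reported PyTorch version to start with 1.10.0 and the CUDA build to be 11.3.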

Install GDAL by following the instructions here.
Install PostgreSQL by following the instructions here. Tested version: 14.7
Install Elasticsearch by following the instructions here. Tested version: 7.10.1
Install Logstash by following the instructions here. Tested version: 8.7.0
Install Detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
Install AdelaiDet
git clone https://github.com/aim-uofa/AdelaiDet.git
cd AdelaiDet
python setup.py build develop

Please note that mapKurator has been tested only with the versions shown above. If you test it with newer versions and find any issues, please let us know.
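Because only the pinned versions have been tested, a quick comparison of installed versions against the pins can catch drift early. The sketch below uses the standard-library `importlib.metadata`; the `PINNED` table copies the versions from the pip commands above, and the function names are hypothetical.

```python
from importlib import metadata

# Versions copied from the pip commands in this section.
PINNED = {
    "numpy": "1.21.6",
    "pandas": "1.4.3",
    "Pillow": "9.4.0",
    "shapely": "1.8.2",
    "geojson": "2.5.0",
    "setuptools": "59.5.0",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.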

Clone the mapKurator-textspotter repository

git clone https://github.com/knowledge-computing/mapkurator-spotter.git
cd mapkurator-spotter/spotter-v2
python setup.py build develop
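After the builds finish, one can check that the expected modules are importable. This is a hedged sketch: the helper `missing_modules` is hypothetical, and the import name `adet` for AdelaiDet is an assumption that should be verified against the AdelaiDet repository.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "adet" is assumed to be AdelaiDet's import name.
    absent = missing_modules(["torch", "detectron2", "adet"])
    if absent:
        print("Not importable:", ", ".join(absent))
    else:
        print("All spotter dependencies are importable")
```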

Download OpenStreetMap data and create indices for PostOCR and Entity Linker modules

To retrieve OpenStreetMap geo-entities and their popularity scores (i.e., the frequency of geo-entities' names), we use a Postgres database and the Elasticsearch search engine with Logstash. The tested version of each is listed above.

Note that the following procedure demonstrates how to create the indices using OpenStreetMap.

The figure shows an outline of the tables in Postgres and the indices in Elasticsearch. The details of each component are as follows.

table all_continents: A table of the IDs, names, and corresponding source tables of all OpenStreetMap geo-entities
schema {each continent} table {points, lines, multilinestrings, multipolygons, other_relations}: Source tables of OpenStreetMap geo-entities, including names, semantic types, and geometries
index osm: An Elasticsearch index of the all_continents table
index osm-voca: An Elasticsearch index that contains the unique vocabulary of single words from geo-entities' names and their popularity, derived from the index osm
index osm-linker: An Elasticsearch index that contains the unique vocabulary of single words from geo-entities' names and, for each word, the list of geo-entity IDs with the corresponding source tables
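The relationship between the three indices can be illustrated with a small in-memory example. This is a sketch under stated assumptions, not the repository's implementation: names are split on whitespace and lowercased, popularity is the number of occurrences of a word across all names, and the sample rows and function names are made up.

```python
from collections import Counter, defaultdict


def tokenize(name):
    """Split a geo-entity name into lowercase single words (an assumption)."""
    return name.lower().split()


def build_voca(rows):
    """Word -> popularity, analogous to what osm-voca stores."""
    counts = Counter()
    for _osm_id, name, _source_table in rows:
        counts.update(tokenize(name))
    return counts


def build_linker(rows):
    """Word -> list of (osm_id, source_table), analogous to osm-linker."""
    linker = defaultdict(list)
    for osm_id, name, source_table in rows:
        # dict.fromkeys removes repeated words within one name, keeping order.
        for word in dict.fromkeys(tokenize(name)):
            linker[word].append((osm_id, source_table))
    return dict(linker)


if __name__ == "__main__":
    # Rows shaped like all_continents: (id, name, source table). Sample data only.
    rows = [
        ("1", "Main Street", "europe.lines"),
        ("2", "Main Square", "europe.multipolygons"),
    ]
    print(build_voca(rows))
    print(build_linker(rows))
```

In this picture, osm-voca supports correcting a recognized word toward a frequent vocabulary entry, while osm-linker maps a word back to candidate geo-entities and their source tables.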

OpenStreetMap data to Postgres database

Download the OpenStreetMap geo-entities of each continent from Geofabrik (file format: .osm.pbf).
Create a Postgres database and run CREATE EXTENSION postgis;
Upload the OpenStreetMap files (.osm.pbf) to the Postgres database. Please run the following code after setting the appropriate environment variables: m6_entity_linker/upload_osm_to_postgres_ogr2ogr.py
Create a GiST (Generalized Search Tree) index on the osm_id and wkb_geometry columns of each table. Please run or modify the following code: m6_entity_linker/create_spatial_index_postgres.py
Create the all_continents table and insert the IDs, names, and corresponding source tables of all OpenStreetMap geo-entities. Please run or modify the following code: m6_entity_linker/upload_osm_to_postgres_all_continents.py
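To make the spatial-index step concrete, the sketch below generates the kind of SQL one might run for each continent schema. It is an illustration only and does not reproduce m6_entity_linker/create_spatial_index_postgres.py: the schema names and the index naming scheme are assumptions, and it covers only the wkb_geometry column, leaving the osm_id index to the repository script.

```python
OSM_TABLES = (
    "points",
    "lines",
    "multilinestrings",
    "multipolygons",
    "other_relations",
)


def spatial_index_statements(schemas, tables=OSM_TABLES, column="wkb_geometry"):
    """Return one CREATE INDEX ... USING GIST statement per schema and table."""
    statements = []
    for schema in schemas:
        for table in tables:
            index_name = f"{schema}_{table}_{column}_gist"
            statements.append(
                f'CREATE INDEX IF NOT EXISTS "{index_name}" '
                f'ON "{schema}"."{table}" USING GIST ("{column}");'
            )
    return statements


if __name__ == "__main__":
    # Hypothetical schema names; use the schemas you created during upload.
    for statement in spatial_index_statements(["europe", "asia"]):
        print(statement)
```

The printed statements can be reviewed and then executed with psql or any Postgres client.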

Index creation on Elasticsearch

Create the osm index on Elasticsearch using the all_continents table on Postgres. Please refer to the following Logstash configuration file: m6_entity_linker/logstash_postgres_world.conf
Create the osm-voca index on Elasticsearch, which is used by the PostOCR module. Please run or modify m4_post_ocr/preprocess.py, which generates a csv file named total.csv. Then, refer to the following Logstash configuration file to create osm-voca: m4_post_ocr/logstash_postocr.conf
Create the osm-linker index on Elasticsearch, which is used by the EntityLinker module. Please run or modify m6_entity_linker/create_elasticsearch_index.py, which generates a csv file named osm_linker.csv. Then, refer to the following Logstash configuration file to create osm-linker: m6_entity_linker/logstash_osm_linker.conf
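Once Logstash has finished, it is useful to confirm that all three indices exist. The following sketch queries the Elasticsearch `_cat/indices` API with `format=json`; the base URL `http://localhost:9200` and unauthenticated access are assumptions about the local setup, and the function names are hypothetical.

```python
import json
import urllib.request

EXPECTED_INDICES = ("osm", "osm-voca", "osm-linker")


def missing_indices(cat_indices_json, expected=EXPECTED_INDICES):
    """Return the expected index names absent from a _cat/indices JSON reply."""
    present = {entry["index"] for entry in json.loads(cat_indices_json)}
    return [name for name in expected if name not in present]


def fetch_cat_indices(base_url="http://localhost:9200"):
    """Fetch the index listing as JSON text (assumes no authentication)."""
    url = f"{base_url}/_cat/indices?format=json"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def report(base_url="http://localhost:9200"):
    """Print which of the expected indices are still missing."""
    absent = missing_indices(fetch_cat_indices(base_url))
    if absent:
        print("Missing indices:", ", ".join(absent))
    else:
        print("All expected indices exist")
```

Calling `report()` from a Python session on the Elasticsearch host prints the result; a secured cluster would need credentials added to the request.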
