Skip to content

Confusing middle pgsql processing. #2049

@StyXman

Description

@StyXman

What version of osm2pgsql are you using?

osm2pgsql version 1.8.0

What operating system and PostgreSQL/PostGIS version are you using?

Linux Debian Stable, PG Database version: 15.3 (Debian 15.3-0+deb12u1), PostGIS version: 3.3

Tell us something about your system

Small server, 8GiB RAM, 1TB consumer SATA SSD.

What did you do exactly?

osm2pgsql --verbose --database gis --cache 0 --number-processes 4 --slim --flat-nodes $(pwd)/nodes.cache --hstore --multi-geometry --style $osm_carto/openstreetmap-carto.style --tag-transform-script $osm_carto/openstreetmap-carto.lua europe-latest.osm.pbf

What did you expect to happen?

The logs about the middle processing are a little bit confusing:

2023-08-19 20:49:52  [1] Clustering table 'planet_osm_polygon' by geometry...
2023-08-19 20:49:52  [2] Clustering table 'planet_osm_line' by geometry...
2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52  [4] Clustering table 'planet_osm_point' by geometry...

2023-08-19 20:49:52  [1] Using native order for clustering table 'planet_osm_polygon'
2023-08-19 20:49:52  [2] Using native order for clustering table 'planet_osm_line'
2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 20:49:52  [4] Using native order for clustering table 'planet_osm_point'

2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47  [3] Done task in 7674389ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Done task in 1ms.
2023-08-19 22:57:47  [3] Starting task...

2023-08-19 22:57:47  [0] Done postprocessing on table 'planet_osm_nodes' in 0s

2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'

2023-08-19 23:32:06  [4] Creating geometry index on table 'planet_osm_point'...
2023-08-20 00:13:30  [4] Creating osm_id index on table 'planet_osm_point'...
2023-08-20 00:20:35  [4] Analyzing table 'planet_osm_point'...
2023-08-20 00:20:40  [4] Done task in 12647156ms.
2023-08-20 00:20:40  [4] Starting task...

2023-08-20 00:20:40  [4] Building index on table 'planet_osm_rels'
2023-08-20 02:03:11  [4] Done task in 6151838ms.

2023-08-20 03:17:24  [2] Creating geometry index on table 'planet_osm_line'...
2023-08-20 03:54:40  [2] Creating osm_id index on table 'planet_osm_line'...
2023-08-20 04:02:57  [2] Analyzing table 'planet_osm_line'...
2023-08-20 04:03:01  [2] Done task in 25988218ms.

2023-08-20 05:26:21  [1] Creating geometry index on table 'planet_osm_polygon'...
2023-08-20 06:17:31  [1] Creating osm_id index on table 'planet_osm_polygon'...
2023-08-20 06:30:46  [1] Analyzing table 'planet_osm_polygon'...
2023-08-20 06:30:47  [1] Done task in 34854542ms.

2023-08-20 10:48:18  [3] Done task in 42630605ms.

2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_rels' in 6151s (1h 42m 31s)

I separated it in sections by worker. The messages about Starting task and Done task don't mention which task it is. Also, do I understand correctly that not all workers started working at the same time? If I'm right, why is that? If I Isolate worker 3's logs:

2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47  [3] Done task in 7674389ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Done task in 1ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'
2023-08-20 10:48:18  [3] Done task in 42630605ms.
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)

At 20:49 starts processing roads. ~2h later it's creating 2 indexes and analyzing it. Launches 2 tasks, but finishes one first (22:57), a long one? Which one? And then another, but one seems to still be running.

And suddenly it's creating indexes on ways. This seems to finish the next day and it's properly logged.

I also see the messages about 'Done postprocessing' for nodes (22:57), ways and rels (next day, 10:48), but for the other 4 tables the message is `All postprocessing'.

Could tasks be more verbose about what exactly they're doing? I didn't try --log-level=debug because this is a 49h+ process and I really don't want to start it all over :)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions