A library and command-line tool for extracting vector layer data from OGC services (WMS, WFS).
Note: This tool only supports vector layers. Raster layers are not supported.
- Supports WMS and WFS: Extracts data from both Web Map Service (WMS) and Web Feature Service (WFS) endpoints.
- Flexible Retrieval Modes: Offers OFFSET (paged retrieval) and EXTENT (bbox splitting and drilling down by spatial extent) retrieval modes for efficient data extraction, including deduplication handling in EXTENT mode.
- Multiple Retrieval Formats: Supports KML and GeoRSS formats when retrieving data from WMS GetMap operations. Output is always GeoJSONL (GeoJSONSeq).
- Geometry Precision Control: Allows truncating geometry coordinates to a specified decimal point precision.
- State Management: Persists extraction state to allow resuming interrupted downloads.
- GeoServer and QGIS Server Flavor Support: Handles vendor-specific differences for GetFeatureInfo-based retrieval from WMS.
- Error Handling: Provides informative error messages and handles common service exceptions.
- Configuration: Customizable through command-line options.
- KML Postprocessing: Offers options to strip superfluous points in Polygon/LineString geometry collections and to choose whether the original style-related properties are kept.
- Hole Punching: Includes a utility that removes overlaps between polygons by punching holes, to work around the shortcomings of GeoRSS-based retrieval.
- Capabilities Exploration: Can explore services via a GetCapabilities request or by scraping the GeoServer web page. Partial parsing of incomplete or corrupt capabilities XML responses is supported.
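
The EXTENT mode listed above is described as splitting a bounding box and drilling down by spatial extent. The following is a minimal sketch of that general technique, not wmsdump's actual implementation: `split_bbox`, `drill_down`, the `count_features` callback and the quadrant-based split are all assumptions made for illustration.

```python
from typing import Callable, Iterator, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def split_bbox(bbox: BBox) -> list[BBox]:
    """Split a bounding box into four equal quadrants (hypothetical helper)."""
    xmin, ymin, xmax, ymax = bbox
    xmid = (xmin + xmax) / 2
    ymid = (ymin + ymax) / 2
    return [
        (xmin, ymin, xmid, ymid),
        (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax),
        (xmid, ymid, xmax, ymax),
    ]


def drill_down(
    bbox: BBox,
    count_features: Callable[[BBox], int],
    limit: int,
    min_size: float = 1e-6,
) -> Iterator[BBox]:
    """Yield boxes whose feature count is below `limit`.

    `count_features` stands in for a request to the server. A box that
    reaches the limit is assumed to be truncated and is split further,
    unless it is already smaller than `min_size`.
    """
    xmin, ymin, xmax, ymax = bbox
    too_small = (xmax - xmin) <= min_size or (ymax - ymin) <= min_size
    if count_features(bbox) < limit or too_small:
        yield bbox
        return
    for quadrant in split_bbox(bbox):
        yield from drill_down(quadrant, count_features, limit, min_size)
```

A feature that straddles a quadrant boundary is returned by more than one box, which is why a scheme like this needs a deduplication step.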
- Using pip:
  pip install wmsdump
- Using uv (recommended): wmsdump uses uv for package management and dependency resolution. uv is a faster alternative to pip. Installing uv: https://docs.astral.sh/uv/getting-started/installation
  # Install dependencies using uv
  uv pip install wmsdump
  You can also use the tools directly by running
  uvx --from wmsdump wms-extractor <args>
  uv creates a temporary virtualenv and manages your dependencies for this invocation.

For the optional punch-holes feature (needed for using the punch-holes utility), use:
uv pip install wmsdump[punch-holes]
or
pip install wmsdump[punch-holes]

For the optional proj feature (needed for retrieving data in projections other than EPSG:4326 or EPSG:3857), use:
uv pip install wmsdump[proj]
or
pip install wmsdump[proj]
wmsdump provides a command-line tool wms-extractor with two main commands: explore and extract.
wms-extractor --help
- --log-level: Log level. One of DEBUG, INFO, WARNING, ERROR, CRITICAL. Defaults to INFO.
- --no-ssl-verify: Switch off SSL verification for all network calls.
- --request-timeout: Timeout for HTTP requests, in seconds. Default is no timeout.
- --header: Header to be added to all network requests, in the format "Key:Value". Can be used multiple times.
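
To illustrate the "Key:Value" format accepted by --header, here is a small sketch of how repeated header values could be turned into a dictionary suitable for an HTTP client. The function name `parse_headers` is hypothetical, and wmsdump's own parsing may differ, for example in how it treats whitespace.

```python
def parse_headers(values: list[str]) -> dict[str, str]:
    """Turn ["Key:Value", ...] strings into a header dictionary.

    Only the first colon separates key from value, so values that
    contain colons themselves (such as URLs) survive intact.
    """
    headers: dict[str, str] = {}
    for raw in values:
        key, sep, value = raw.partition(":")
        if not sep or not key.strip():
            raise ValueError(f'expected "Key:Value", got {raw!r}')
        headers[key.strip()] = value.strip()
    return headers
```

Splitting on the first colon only is the design choice that matters here: a header such as `Referer:http://example.com/map` would otherwise be cut at the colon inside the URL.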
The explore command helps discover available layers and service information.
wms-extractor explore --help
Options:
- --geoserver-url: URL of the GeoServer endpoint. The WMS endpoint is assumed to be <geoserver_url>/ows.
- --service-url: URL of the WMS/WFS endpoint from which to probe for capabilities. If not provided, it will be derived from geoserver-url.
- --service: Service to use (WMS or WFS). Defaults to WFS.
- --service-version: The protocol version to use. Defaults to '1.1.1' for WMS and '1.0.0' for WFS.
- --namespace: Only look for layers in a given namespace (GeoServer specific).
- --output-file: File to write the layer list to.
- --scrape-webpage: Scrape the GeoServer web page instead of reading capabilities. Useful when capabilities are broken.
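
As a rough illustration of how these options combine, the sketch below builds a standard OGC GetCapabilities URL from either a GeoServer URL or an explicit service URL, using the default versions listed above. The function `capabilities_url` is hypothetical and is not part of wmsdump's API; it also assumes the service URL carries no query string of its own.

```python
from typing import Optional
from urllib.parse import urlencode

# Default protocol versions as documented for --service-version.
DEFAULT_VERSIONS = {"WMS": "1.1.1", "WFS": "1.0.0"}


def capabilities_url(
    geoserver_url: Optional[str] = None,
    service_url: Optional[str] = None,
    service: str = "WFS",
    version: Optional[str] = None,
) -> str:
    """Build a GetCapabilities request URL (illustrative only)."""
    if service_url is None:
        if geoserver_url is None:
            raise ValueError("either geoserver_url or service_url is required")
        # Derive the service endpoint as <geoserver_url>/ows.
        service_url = geoserver_url.rstrip("/") + "/ows"
    params = {
        "service": service,
        "version": version or DEFAULT_VERSIONS[service],
        "request": "GetCapabilities",
    }
    return f"{service_url}?{urlencode(params)}"
```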
Examples:
# Explore WFS layers from a GeoServer endpoint
wms-extractor explore --geoserver-url http://example.com/geoserver
# Explore WMS layers from a specific URL
wms-extractor explore --service-url http://example.com/wms --service WMS
# Scrape the GeoServer web page for layers
wms-extractor explore --geoserver-url http://example.com/geoserver --scrape-webpage
# Write layer list to a file
wms-extractor explore --geoserver-url http://example.com/geoserver --output-file layers.txt
The extract command extracts data from a specified layer.
wms-extractor extract --help
Arguments:
- LAYERNAME: Name of the layer to extract.
- OUTPUT_FILE: Output file to write the GeoJSONL features to. If not provided, a filename will be derived from the LAYERNAME.
Options:
- --output-dir: Directory to write output files in (only used when OUTPUT_FILE is not given). Defaults to the current directory.
- --geoserver-url: URL of the GeoServer endpoint. service-url is assumed to be <geoserver_url>/[<layer_namespace>/]ows.
- --service-url: URL of the WMS/WFS endpoint from which to retrieve data. If not provided, it will be derived from geoserver-url.
- --service: Service to use (WMS or WFS). Defaults to WFS.
- --service-version: The protocol version to use. Defaults to '1.1.1' for WMS and '1.0.0' for WFS.
- --retrieval-mode: Which method to use for batch record retrieval (OFFSET or EXTENT). Defaults to OFFSET.
- --operation: Which operation to use for querying a WMS endpoint (GetMap or GetFeatureInfo). Defaults to GetMap.
- --flavor: Vendor of the WMS service (Geoserver or QGISserver), useful to specify for GetFeatureInfo-based retrieval. Defaults to Geoserver.
- --sort-key: Key to use for paged retrieval (needed when the server requires one).
- --batch-size: Batch size to use for retrieval. Defaults to 1000.
- --pause-seconds: Amount of time to pause between batches of requests. Defaults to 2.
- --requests-to-pause: Number of requests to make before pausing. Defaults to 10.
- --max-attempts: Number of times to attempt a request before giving up. Defaults to 5.
- --retry-delay: Number of seconds to wait before retrying on failure (the delay is incremented for each failure). Defaults to 5.
- --geometry-precision: Decimal point precision of geometry to be returned (-1 means no truncation). Defaults to -1.
- --getmap-format: Format to use while pulling using WMS GetMap (KML or GEORSS). Defaults to KML.
- --kml-strip-point: Whether to strip the points in polygon and linestring geometry collections (KML specific). Defaults to True.
- --kml-keep-original-props: Whether to keep the original style-related properties in KML conversion. Defaults to False.
- --out-srs: CRS to request data in. Defaults to EPSG:4326.
- --bounds: Bounding box to restrict the query to (format: <xmin>,<ymin>,<xmax>,<ymax>).
- --max-box-dims: When querying using EXTENT mode, the maximum size of the bounding box to use (format: <deltax>,<deltay>).
- --skip-index: Skip n elements in index (useful to skip records causing failure, only applicable for OFFSET retrieval). Defaults to 0.
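
To make the effect of --geometry-precision concrete, here is a minimal sketch of truncating nested GeoJSON coordinate arrays to a fixed number of decimal places. The helper `truncate_coords` is hypothetical, and it assumes truncation toward zero; wmsdump's internal handling may differ in detail.

```python
import math
from typing import Union

Coords = Union[int, float, list]


def truncate_coords(coords: Coords, precision: int) -> Coords:
    """Truncate every number in a nested coordinate list.

    A negative precision leaves the coordinates untouched, mirroring
    the documented meaning of -1 (no truncation).
    """
    if precision < 0:
        return coords
    if isinstance(coords, (int, float)):
        factor = 10 ** precision
        # math.trunc drops the extra digits instead of rounding them.
        return math.trunc(coords * factor) / factor
    return [truncate_coords(c, precision) for c in coords]
```

With a precision of 3, a point such as `[77.123456, -12.98765]` becomes `[77.123, -12.987]`. Because the multiplication is done in binary floating point, values that sit exactly on a decimal boundary can occasionally lose one more digit than expected.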
Examples:
# Extract data from a WFS layer
wms-extractor extract my_layer output.geojsonl --geoserver-url http://example.com/geoserver
# Extract data from a WMS layer using GetMap with GeoRSS format
wms-extractor extract my_layer output.geojsonl --service WMS --service-url http://example.com/wms --getmap-format GEORSS
# Extract data and truncate geometry to 3 decimal places
wms-extractor extract my_layer output.geojsonl --geoserver-url http://example.com/geoserver --geometry-precision 3
# Extract data with bounding box
wms-extractor extract my_layer output.geojsonl --geoserver-url http://example.com/geoserver --bounds -180,-90,180,90
The geojsonl-dedupe command removes duplicate features from a GeoJSONL file. Features are considered duplicates if they have identical geometry and properties. Deduplication is performed by hashing features and checking hash collisions.
geojsonl-dedupe --help
Arguments:
- INPUT-FILE: The input GeoJSONL file to deduplicate (required).
- OUTPUT-FILE: The output GeoJSONL file. If not provided, writes to deduped_<INPUT-FILE>.
Options:
- --log-level, -l: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Defaults to INFO.
- --use-offset/--use-ram: Use file offset for collision checks (default) or keep features in RAM. Using file offset is more memory-efficient for large files.
Example:
# Deduplicate using file offset method (memory-efficient)
geojsonl-dedupe input.geojsonl output.geojsonl
# Deduplicate keeping features in RAM (faster but uses more memory)
geojsonl-dedupe input.geojsonl output.geojsonl --use-ram
# Auto-generate output filename
geojsonl-dedupe input.geojsonl
Note: The EXTENT retrieval mode includes built-in deduplication to handle features that may appear in overlapping spatial extents during extraction. This tool is useful for post-processing or cleaning up data from other sources.
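
The hash-then-compare approach described for geojsonl-dedupe can be sketched as follows. This is an illustrative in-memory version under stated assumptions, not the tool's actual code: the names `canonical_form` and `dedupe`, the choice of SHA-1, and the use of sorted-key JSON as the comparison form are all hypothetical.

```python
import hashlib
import json
from typing import Iterable, Iterator


def canonical_form(line: str) -> str:
    """Serialize geometry and properties with sorted keys so that
    key order does not affect equality."""
    feature = json.loads(line)
    payload = {
        "geometry": feature.get("geometry"),
        "properties": feature.get("properties"),
    }
    return json.dumps(payload, sort_keys=True)


def dedupe(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct feature line once, preserving input order."""
    seen: dict[bytes, list[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        canonical = canonical_form(line)
        digest = hashlib.sha1(canonical.encode("utf-8")).digest()
        bucket = seen.setdefault(digest, [])
        # Equal hashes are only a hint; compare the content to confirm.
        if canonical in bucket:
            continue
        bucket.append(canonical)
        yield line
```

This version keeps the compared text in memory, which is closest in spirit to --use-ram. Storing byte offsets into the input file and re-reading candidates on a hash match would trade speed for memory, which appears to be the idea behind --use-offset.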
The punch-holes command is available when wmsdump is installed with the punch-holes extra. It removes overlaps in a GeoJSONL file by punching holes where polygons overlap. This is useful for cleaning up data extracted using the GeoRSS format, which cannot represent polygons with holes.
punch-holes --help
Arguments:
- INPUT_FILE: The input GeoJSONL file to process.
- OUTPUT_FILE: The output GeoJSONL file. If not provided, writes the results to fixed_<INPUT_FILE>.
Options:
- --index-in-mem: Whether the spatial index keeps the geometry data in memory or just the offset of the features on disk.
- --keep-map-file: Whether to keep the temporary overlap map file (for debugging purposes).
Example:
punch-holes input.geojsonl output.geojsonl
wmsdump automatically creates a .state file alongside the output file. This file stores the progress of the extraction. If the extraction is interrupted, wmsdump will resume from the last known state when run again with the same parameters. To start a new extraction, delete both the output file and the .state file.
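
The resume behaviour described above follows a common checkpointing pattern, sketched below. Everything specific here is an assumption made for illustration: the state file name (output file name plus `.state`), the JSON layout with a single `offset` key, and the helper names do not come from wmsdump, whose actual state format is not documented in this README.

```python
import json
import os
from pathlib import Path


def state_path(output_file: str) -> Path:
    """Hypothetical naming: <output_file>.state next to the output."""
    return Path(str(output_file) + ".state")


def load_state(output_file: str) -> dict:
    """Return the saved progress, or a fresh state if none exists."""
    path = state_path(output_file)
    if path.exists():
        return json.loads(path.read_text())
    return {"offset": 0}


def save_state(output_file: str, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place,
    so an interruption never leaves a half-written state file."""
    path = state_path(output_file)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)
```

An extraction loop built on this would call `load_state` once at startup, skip ahead to the stored offset, and call `save_state` after each batch has been written to the output file.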
WMSDUMP_SAVE_RESPONSE_TO_FILE: If set, the raw HTTP response from the OGC service will be saved to the specified file. This is useful for debugging.
- bs4 (Beautiful Soup 4)
- click
- colorlog
- jsonschema
- kml2geojson
- requests
- xmltodict
Optional:
- geoindex-rs (required for punch-holes)
- numpy (required for punch-holes)
- shapely (required for punch-holes)
- pyproj (required for handling some CRS definitions)
Contributions are welcome! Please submit bug reports, feature requests, and pull requests through GitHub.
This project is released under the Unlicense; see the LICENSE file for details.
This was heavily inspired by a similar tool for ESRI endpoints - openaddresses/pyesridump
datta07 pointed out to me that this was possible. Some of the GeoRSS parsing code is also based on prior work by datta07, answerquest and devdattaT.