spress

spress provides utilities for encoding and compressing geospatial objects, such as sf objects.

Installation

This package requires C++11 for compilation. Once you have that installed, you can install the development version from GitHub with devtools or pak:

# pak
pak::pkg_install("UFOKN/spress")

# devtools
devtools::install_github("UFOKN/spress")

Usage

spress offers both encoding and compressing utilities. For clarity, in the context of spress, encoding refers to projecting 2-dimensional coordinates to a 1-dimensional system (particularly, geographic coordinates to a Hilbert Curve index); whereas, compressing refers to the process for reducing the object and file size of the encoded object.

Encoding

At a high level, spress performs the following algorithm for encoding:

Given a set of points P
Retrieve P’s extent/bounding box
Create a 2^n x 2^n grid with centroids aligned to the extent’s corner points
Index the grid to a Hilbert Curve
Assign each point in P to the closest index on the Hilbert Curve

A visualization of this algorithm is shown below:

After encoding, the data necessary to be stored is reduced down to:

The set of points P’s extent;
The n value used;
and, the list of indices.

With this set of data, the decoding process is exactly the reverse of the encoding algorithm.

In the visual example above, an n of 4 was used (since 2^4 = 16, and we have a 16x16 grid), however, note that the maximum possible error E converges toward 0 as the extent becomes smaller, and as n grows. Thus, we can achieve a much more accurate encoding scheme by modifying these when possible.

Realistically, this algorithm is computationally expensive if the extent and n are large. However, this is mitigated via bitwise operations and taking advantage of efficient C++ structures. For the specifics of this, see libspate’s source code in inst/include/spate.hpp.

Encoding Example

# Random set of points
P <- data.frame(
  X = c(-77.8015434499876,
        -77.8707250008617,
        -77.9212227055911,
        -77.9385055000025,
        -77.8452515318622,
        -77.8715886279598),
  Y = c(34.2961615000021,
        34.1818796104794,
        34.1447587911948,
        34.2165775000314,
        34.2218266693184,
        34.2632856737062)
)

# n = 8 --> 2^8 x 2^8 grid --> 256x256 grid
n <- 8L

# Encode using vectors
spress::encode(P$X, P$Y, n = n)
#> > spress encoded coordinates: 6 indices
#> # Grid Dimensions: 256 x 256 (65536)
#> # Extent:          -77.80, -77.94, 34.30, 34.14
#>       Index  Coordinates
#> [1]:  43690  (-77.92, 34.14)
#> [2]:   6827  (-77.87, 34.18)
#> [3]:   3754  (-77.94, 34.22)
#> [4]:  16323  (-77.85, 34.22)
#> [5]:  34040  (-77.80, 34.30)
#> # … with 1 more index

# We can also use the data.frame directly and attach to it
encoded_P <- spress::encode(P, coords = c("X", "Y"), attach = FALSE)

encoded_df <- spress::encode(P, coords = c("X", "Y"), attach = TRUE)
encoded_df
#>           X        Y   h
#> 1 -77.80154 34.29616 170
#> 2 -77.87073 34.18188  37
#> 3 -77.92122 34.14476  14
#> 4 -77.93851 34.21658  63
#> 5 -77.84525 34.22183 132
#> 6 -77.87159 34.26329 111

Decoding Example

We will use encoded_P and encoded_df from the encoding example If we decode an object that used spress::encode to encode, there’s no need to pass extent or n to the decoding function. These values are stored in the attributes spress_extent and spress_n, respectively.

spress::decode(encoded_P)
#>           x        y
#> 1 -77.80154 34.29616
#> 2 -77.87459 34.18513
#> 3 -77.92024 34.14476
#> 4 -77.93851 34.21541
#> 5 -77.84720 34.22551
#> 6 -77.87459 34.26588

# Columns `x` and `y` are the new decoded coordinate columns
# The argument `index` refers to the column index of the encoded index values
spress::decode(encoded_df, index = 3)[, c("x", "y")]
#>           x        y
#> 1 -77.80154 34.29616
#> 2 -77.87459 34.18513
#> 3 -77.92024 34.14476
#> 4 -77.93851 34.21541
#> 5 -77.84720 34.22551
#> 6 -77.87459 34.26588

Compression

Compression is not implement yet as of 11/30/21.

For compression, spress performs the following algorithm:

Given a set of points P
Encode P based on the encoding algorithm
Create an ordering index for order preservation upon decompression
Sort indices by the encoding index, such that each point in P follows along corresponding Hilbert Curve
Keep the first encoding index and get the difference between every successive point

This algorithm returns a set of indices that (generally) should be small integers. The returned table can then be written to any file format, preferably a columnar format (such as Apache Parquet), and additionally compressed using another compression library (such as xz or snappy).

C++ Library

Included in this package is the header-only C++ library libspress, which contains the foundational code used for encoding/decoding, and was developed specifically for this package.

This can be found at inst/include/spress.hpp from the root of this repository. Unit testing for this library is done with Catch via testthat. See testthat::use_catch for details on how this works.

Note that the C++ code for this package contains purely the algorithm for encoding and decoding. The current roadmap for this package includes making the C++ library fully vendorable.

Attribution

This package was inspired by the paper:

Liu, Dongge, Tao Wang, Xiaojuan Li, Yeqing Ni, Yanping Li, and Zhao Jin. 2020. “A Multiresolution Vector Data Compression Algorithm Based on Space Division” ISPRS International Journal of Geo-Information 9, no. 12: 721. https://doi.org/10.3390/ijgi9120721

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github		.github
R		R
inst		inst
man		man
src		src
tests		tests
.Rbuildignore		.Rbuildignore
.gitattributes		.gitattributes
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
cran-comments.md		cran-comments.md
spress.Rproj		spress.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

spress

Installation

Usage

Encoding

Encoding Example

Decoding Example

Compression

C++ Library

Attribution

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

spress

Installation

Usage

Encoding

Encoding Example

Decoding Example

Compression

C++ Library

Attribution

About

Resources

License

Licenses found

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages