16 changes: 13 additions & 3 deletions CHANGELOG.md
@@ -7,11 +7,21 @@ All notable changes to this project will be documented in this file. It uses the
[Semantic Versioning]: https://semver.org/spec/v2.0.0.html
"Semantic Versioning 2.0.0"

## [v0.1.2] — Unreleased
## [v0.2.0] — Unreleased

Fixed memory safety in C++ wrapper.
### ⚡ Improvements

* Added `re2extractallgroupshorizontal`, `re2extractallgroupsvertical`,
`re2regexpquotemeta`, and `re2splitbyregexp` (ClickHouse-compatible). Each has a
`bytea` overload alongside `text`.
* Fixed memory safety in C++ wrapper.

### 📔 Notes

* Run `ALTER EXTENSION re2 UPDATE TO '0.2'` to expose the new functions on
existing databases.

[v0.1.2]: https://github.com/clickhouse/pg_re2/compare/v0.1.1...v0.1.2
[v0.2.0]: https://github.com/clickhouse/pg_re2/compare/v0.1.1...v0.2.0

## [v0.1.1] — 2026-04-16

4 changes: 2 additions & 2 deletions META.json
@@ -1,15 +1,15 @@
{
"name": "re2",
"abstract": "ClickHouse-compatible regex functions using RE2",
"version": "0.1.2",
"version": "0.2.0",
"maintainer": "Philip Dubé",
"license": "postgresql",
"provides": {
"re2": {
"abstract": "ClickHouse-compatible regex functions using RE2",
"docfile": "doc/re2.md",
"file": "re2.control",
"version": "0.1.2"
"version": "0.2.0"
}
},
"prereqs": {
2 changes: 1 addition & 1 deletion Makefile
@@ -5,7 +5,7 @@ EXTVERSION = $(shell grep -m 1 'default_version' re2.control | \
DISTVERSION = $(shell grep -m 1 '^[[:space:]]\{2\}"version":' META.json | \
sed -e 's/[[:space:]]*"version":[[:space:]]*"\([^"]*\)",\{0,1\}/\1/')

DATA = sql/$(EXTENSION)--$(EXTVERSION).sql
DATA = $(wildcard sql/$(EXTENSION)--*.sql)
MODULE_big = $(EXTENSION)
OBJS = src/pg_re2.o src/re2_cache.o src/re2_wrapper.o

108 changes: 107 additions & 1 deletion doc/re2.md
@@ -1,4 +1,4 @@
re2 0.1.2
re2 0.2.0
=========

## Synopsis
@@ -158,6 +158,112 @@ returns an empty array.

**ClickHouse equivalent: [extractGroups](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#extractGroups)**

### `re2extractallgroupsvertical()` ###

Matches all non-overlapping occurrences of `:pattern` and returns a 2D array
where each inner array contains the capturing groups for one match.

**Syntax**

```sql
SELECT re2extractallgroupsvertical( :haystack, :pattern );
```

**Parameters**

`:haystack`
: Input string to extract from. `TEXT` or `BYTEA`

`:pattern`
: Regular expression with at least one capturing group. `TEXT`

**Returns `text[][]` or `bytea[][]`**

Two-dimensional array of capturing groups, one row per match. If no matches
are found, returns an empty array.

**ClickHouse equivalent: [extractAllGroupsVertical](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#extractAllGroupsVertical)**
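
To make the "one row per match" shape concrete, here is a minimal Python sketch of the documented behavior. It is not the extension's implementation: Python's `re` module stands in for RE2, the helper name `extract_all_groups_vertical` is hypothetical, and filling unmatched optional groups with an empty string is an assumption of this sketch.

```python
import re


def extract_all_groups_vertical(haystack, pattern):
    """Model of the documented shape: one inner list per match."""
    regex = re.compile(pattern)
    if regex.groups == 0:
        # The docs require at least one capturing group.
        raise ValueError("pattern needs at least one capturing group")
    # Assumption of this sketch: unmatched optional groups become "".
    return [list(m.groups("")) for m in regex.finditer(haystack)]


rows = extract_all_groups_vertical("a=1, b=2, c=3", r"(\w)=(\d)")
print(rows)  # [['a', '1'], ['b', '2'], ['c', '3']]
```

Each inner list pairs the groups of a single match, so a key and its value stay together in the same row.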

### `re2extractallgroupshorizontal()` ###

Matches all non-overlapping occurrences of `:pattern` and returns a 2D array
where each inner array contains all matches for one capturing group.

**Syntax**

```sql
SELECT re2extractallgroupshorizontal( :haystack, :pattern );
```

**Parameters**

`:haystack`
: Input string to extract from. `TEXT` or `BYTEA`

`:pattern`
: Regular expression with at least one capturing group. `TEXT`

**Returns `text[][]` or `bytea[][]`**

Two-dimensional array of matches, one row per capturing group. If no matches
are found, returns an empty array. ClickHouse returns an array of empty
arrays, one per group; PostgreSQL arrays cannot represent that shape, so the
result collapses to a flat empty array.

**ClickHouse equivalent: [extractAllGroupsHorizontal](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#extractAllGroupsHorizontal)**
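
The "one row per capturing group" shape, including the empty-result collapse described above, can be modeled with a short Python sketch. This is an illustration under stated assumptions rather than the extension's code: Python's `re` module stands in for RE2, the helper name `extract_all_groups_horizontal` is hypothetical, and unmatched optional groups are assumed to become empty strings.

```python
import re


def extract_all_groups_horizontal(haystack, pattern):
    """Model of the documented shape: one inner list per capturing group."""
    regex = re.compile(pattern)
    if regex.groups == 0:
        # The docs require at least one capturing group.
        raise ValueError("pattern needs at least one capturing group")
    columns = [[] for _ in range(regex.groups)]
    for m in regex.finditer(haystack):
        # Assumption of this sketch: unmatched optional groups become "".
        for i, value in enumerate(m.groups("")):
            columns[i].append(value)
    # Documented PostgreSQL behavior: no matches collapses to a flat
    # empty array (ClickHouse would return one empty array per group).
    if not columns[0]:
        return []
    return columns


cols = extract_all_groups_horizontal("a=1, b=2, c=3", r"(\w)=(\d)")
print(cols)  # [['a', 'b', 'c'], ['1', '2', '3']]
```

Compared with the vertical variant, the same matches are transposed: all values of group 1 come first, then all values of group 2.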

### `re2regexpquotemeta()` ###

Escapes regex metacharacters with a backslash. Escaped characters: `\0`, `\\`,
`|`, `(`, `)`, `^`, `$`, `.`, `[`, `]`, `?`, `*`, `+`, `{`, `:`, `-`.

**Syntax**

```sql
SELECT re2regexpquotemeta( :input );
```

**Parameters**

`:input`
: String to escape. `TEXT` or `BYTEA`

**Returns `TEXT` or `BYTEA`** matching input type.

**ClickHouse equivalent: [regexpQuoteMeta](https://clickhouse.com/docs/sql-reference/functions/string-functions#regexpquotemeta)**
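
The escaping rule can be sketched in a few lines of Python using only the character list given above. The helper name `regexp_quote_meta` is hypothetical, and rendering a NUL character as a backslash followed by the digit `0` is an assumption based on the `\0` entry in that list.

```python
# Characters the docs list as escaped (NUL is handled separately below).
ESCAPED = set("\\|()^$.[]?*+{:-")


def regexp_quote_meta(text):
    """Model of the documented escaping: prefix listed characters with a backslash."""
    out = []
    for ch in text:
        if ch == "\0":
            # Assumption: NUL is written as backslash + "0".
            out.append("\\0")
        elif ch in ESCAPED:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


print(regexp_quote_meta("a.b(c)"))  # a\.b\(c\)
print(regexp_quote_meta("{1-2}"))   # \{1\-2}
```

Note that `}` is not in the documented list, so it passes through unescaped in this sketch.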

### `re2splitbyregexp()` ###

Splits `:haystack` into substrings using `:pattern` as a separator. If
`:pattern` is empty, the haystack is split into individual characters. If
`:max_substrings > 0`, returns at most that many substrings (extras are
dropped).

**Syntax**

```sql
SELECT re2splitbyregexp( :haystack, :pattern, :max_substrings DEFAULT 0 );
```

**Parameters**

`:haystack`
: Input string to split. `TEXT` or `BYTEA`

`:pattern`
: Regular expression separator. `TEXT`

`:max_substrings`
: Optional cap on the number of returned substrings. `0` means unlimited.
`INTEGER`

**Returns `text[]` or `bytea[]`** matching haystack type. Note: argument order
is `(haystack, pattern)` to match the pg_re2 convention; ClickHouse uses
`splitByRegexp(pattern, haystack[, max_substrings])`. Zero-length matches are
treated as no-match (matching ClickHouse behavior).

**ClickHouse equivalent: [splitByRegexp](https://clickhouse.com/docs/sql-reference/functions/splitting-merging-functions#splitByRegexp)**
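
The splitting rules described above (empty pattern, `max_substrings` cap, zero-length matches ignored) can be modeled with a small Python sketch. It is an illustration only: Python's `re` module stands in for RE2, so exact match positions may differ for some patterns, and the helper name `split_by_regexp` is hypothetical. The argument order follows the pg_re2 convention of `(haystack, pattern)`.

```python
import re


def split_by_regexp(haystack, pattern, max_substrings=0):
    """Model of the documented splitting semantics."""
    if pattern == "":
        # Empty pattern: split into individual characters.
        parts = list(haystack)
    else:
        parts = []
        start = 0
        for m in re.finditer(pattern, haystack):
            if m.start() == m.end():
                # Zero-length matches are treated as no match.
                continue
            parts.append(haystack[start:m.start()])
            start = m.end()
        parts.append(haystack[start:])
    if max_substrings > 0:
        # Extras beyond the cap are dropped, not merged into the last item.
        parts = parts[:max_substrings]
    return parts


print(split_by_regexp("a1b22c", r"\d+"))     # ['a', 'b', 'c']
print(split_by_regexp("a1b22c", r"\d+", 2))  # ['a', 'b']
print(split_by_regexp("abc", ""))            # ['a', 'b', 'c']
```

A pattern that can only match the empty string, such as `x*` against `abc`, leaves the haystack unsplit in this model.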

### `re2replaceregexpone()` ###

Replaces the first occurrence of the substring matching the regular expression
2 changes: 1 addition & 1 deletion re2.control
@@ -1,6 +1,6 @@
# re2 extension
comment = 'ClickHouse-compatible regex functions using RE2'
default_version = '0.1'
default_version = '0.2'
module_pathname = 're2'
relocatable = true
trusted = true
33 changes: 33 additions & 0 deletions sql/re2--0.1--0.2.sql
@@ -0,0 +1,33 @@
\echo Use "ALTER EXTENSION re2 UPDATE TO '0.2'" to load this file. \quit

CREATE FUNCTION re2extractallgroupshorizontal(text, text) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupshorizontal'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupsvertical(text, text) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupsvertical'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2regexpquotemeta(text) RETURNS text
AS 'MODULE_PATHNAME', 'pgre2_regexpquotemeta'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2splitbyregexp(text, text, int DEFAULT 0) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_splitbyregexp'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupshorizontal(bytea, text) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupshorizontal_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupsvertical(bytea, text) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupsvertical_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2regexpquotemeta(bytea) RETURNS bytea
AS 'MODULE_PATHNAME', 'pgre2_regexpquotemeta_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2splitbyregexp(bytea, text, int DEFAULT 0) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_splitbyregexp_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
131 changes: 131 additions & 0 deletions sql/re2--0.2.sql
@@ -0,0 +1,131 @@
\echo Use "CREATE EXTENSION re2" to load this file. \quit

CREATE FUNCTION re2match(text, text) RETURNS boolean
AS 'MODULE_PATHNAME', 'pgre2_match'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extract(text, text) RETURNS text
AS 'MODULE_PATHNAME', 'pgre2_extract'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractall(text, text) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_extractall'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2regexpextract(text, text, int DEFAULT 1) RETURNS text
AS 'MODULE_PATHNAME', 'pgre2_regexpextract'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractgroups(text, text) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_extractgroups'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupshorizontal(text, text) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupshorizontal'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupsvertical(text, text) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupsvertical'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2regexpquotemeta(text) RETURNS text
AS 'MODULE_PATHNAME', 'pgre2_regexpquotemeta'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2splitbyregexp(text, text, int DEFAULT 0) RETURNS text[]
AS 'MODULE_PATHNAME', 'pgre2_splitbyregexp'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2replaceregexpone(text, text, text) RETURNS text
AS 'MODULE_PATHNAME', 'pgre2_replaceregexpone'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2replaceregexpall(text, text, text) RETURNS text
AS 'MODULE_PATHNAME', 'pgre2_replaceregexpall'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2countmatches(text, text) RETURNS integer
AS 'MODULE_PATHNAME', 'pgre2_countmatches'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2countmatchescaseinsensitive(text, text) RETURNS integer
AS 'MODULE_PATHNAME', 'pgre2_countmatchescaseinsensitive'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2multimatchany(text, VARIADIC text[]) RETURNS boolean
AS 'MODULE_PATHNAME', 'pgre2_multimatchany'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2multimatchanyindex(text, VARIADIC text[]) RETURNS integer
AS 'MODULE_PATHNAME', 'pgre2_multimatchanyindex'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2multimatchallindices(text, VARIADIC text[]) RETURNS integer[]
AS 'MODULE_PATHNAME', 'pgre2_multimatchallindices'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

-- bytea overloads (haystack can contain \0 bytes)

CREATE FUNCTION re2match(bytea, text) RETURNS boolean
AS 'MODULE_PATHNAME', 'pgre2_match_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extract(bytea, text) RETURNS bytea
AS 'MODULE_PATHNAME', 'pgre2_extract_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractall(bytea, text) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_extractall_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2regexpextract(bytea, text, int DEFAULT 1) RETURNS bytea
AS 'MODULE_PATHNAME', 'pgre2_regexpextract_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractgroups(bytea, text) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_extractgroups_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupshorizontal(bytea, text) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupshorizontal_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2extractallgroupsvertical(bytea, text) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_extractallgroupsvertical_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2regexpquotemeta(bytea) RETURNS bytea
AS 'MODULE_PATHNAME', 'pgre2_regexpquotemeta_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2splitbyregexp(bytea, text, int DEFAULT 0) RETURNS bytea[]
AS 'MODULE_PATHNAME', 'pgre2_splitbyregexp_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2replaceregexpone(bytea, text, text) RETURNS bytea
AS 'MODULE_PATHNAME', 'pgre2_replaceregexpone_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2replaceregexpall(bytea, text, text) RETURNS bytea
AS 'MODULE_PATHNAME', 'pgre2_replaceregexpall_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2countmatches(bytea, text) RETURNS integer
AS 'MODULE_PATHNAME', 'pgre2_countmatches_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2countmatchescaseinsensitive(bytea, text) RETURNS integer
AS 'MODULE_PATHNAME', 'pgre2_countmatchescaseinsensitive_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2multimatchany(bytea, VARIADIC text[]) RETURNS boolean
AS 'MODULE_PATHNAME', 'pgre2_multimatchany_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2multimatchanyindex(bytea, VARIADIC text[]) RETURNS integer
AS 'MODULE_PATHNAME', 'pgre2_multimatchanyindex_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION re2multimatchallindices(bytea, VARIADIC text[]) RETURNS integer[]
AS 'MODULE_PATHNAME', 'pgre2_multimatchallindices_bytea'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;