Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 97 additions & 1 deletion docs/impulse/docs/config/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ Maps the silver-layer input tables.
| `container_tags_table` | `str` | No | Full Unity Catalog path. Container EAV tags. |
| `channel_tags_table` | `str` | No | Full Unity Catalog path. Channel EAV tags. |
| `channel_mapping_table` | `str` | No | Full Unity Catalog path. Logical-to-physical channel alias table. Required when using `QueryBuilder.channel_with_alias()` (currently supported by `KeyValueStoreSolver`). |
| `unit_conversion_table` | `str` | No | Full Unity Catalog path. Per-unit-family conversion factors. When configured together with a `channel_mapping_table` whose rows carry `source_unit` / `target_unit` columns, aliased selectors auto-convert values from source to target unit during `solve()` (currently supported by `KeyValueStoreSolver`). |

Tag tables are required for solvers that consume tag-based filters
(`DeltaSolver` with tag filters, `KeyValueStoreSolver`).
Expand Down Expand Up @@ -172,8 +173,9 @@ Per-table sections (each a `TableConfig`):
| `container_metrics`| All solvers | Custom container_id column, custom timestamp columns |
| `channel_tags` | DeltaSolver | Tag key/value column renames |
| `channel_metrics` | All solvers | Custom channel_id column, custom value/timestamp columns |
| `channel_mapping` | KeyValueStoreSolver | Alias-table column renames; `priority` column |
| `channel_mapping` | KeyValueStoreSolver | Alias-table column renames; `priority` column; optional `join_keys` for non-default alias-resolution composite keys |
| `channels` | All solvers | RLE column renames (`tstart`/`tend`/`value`) |
| `unit_conversion` | KeyValueStoreSolver | Unit-conversion table column renames (`unit`, `group_id`, `conversion_factor`) |

Internal column names that mappings can target:

Expand All @@ -187,6 +189,14 @@ Internal column names that mappings can target:
| `priority` | Tie-breaker column on the `channel_mapping` table |
| `project_id` | Project scoping column |
| `parent_id` | Parent/scope identifier |
| `source_channel`| Source-channel identifier on the `channel_mapping` table |
| `data_key` | Data-key identifier (default present on both `channel_mapping` and `channel_metrics`) |
| `channel_alias` | Alias identifier on the `channel_mapping` table |
| `channel_name` | Channel-name identifier on the `channel_metrics` table |
| `source_unit`, `target_unit` | Source/target unit columns on the `channel_mapping` table |
| `unit` | Unit name column on the `unit_conversion` table |
| `group_id` | Unit-family identifier on the `unit_conversion` table |
| `conversion_factor` | Per-unit factor on `unit_conversion`; also the per-channel factor name carried into the solve UDF |

:::note Per-solver feature support

Expand Down Expand Up @@ -235,6 +245,92 @@ However, only the parts each solver supports are actually consumed:

Sections you don't customize can be omitted; defaults are an empty mapping and no filters.

### Unit conversion (optional)

Set `source.unit_conversion_table` and extend `channel_mapping` with `source_unit` / `target_unit` columns
to have aliased selectors auto-convert values from source to target unit during `solve()`. Direct selectors
via `query.channel(...)` always return raw values, even on a channel that an aliased sibling converts —
conversion is a property of the alias, not of the channel. See
[`unit_conversion`](../data_model/silver_layer_schema.md#unit_conversion-optional) for the table schema.

```python
"source": {
"container_metrics_table": "my_catalog.silver.container_metrics",
"channel_metrics_table": "my_catalog.silver.channel_metrics",
"channels_uri": "my_catalog.silver.channels",
"channel_mapping_table": "my_catalog.silver.channel_mapping",
"unit_conversion_table": "my_catalog.silver.unit_conversion"
},
"query_engine": {
"solver": "KeyValueStoreSolver",
"solver_config": {
"unit_conversion": {
"column_name_mapping": {}
}
}
}
```

### Alias-resolution join keys (optional)

`KeyValueStoreSolver.filter_aliased_channel_metrics` joins `channel_mapping`
to `channel_metrics` to resolve aliased selectors. The default composite key
is `(source_channel, channel_name) + (data_key, data_key)`. Override
`channel_mapping.join_keys` to change the arity or column choice — for
example, a single-column join when `data_key` is not part of the channel
identity in your silver layout:

```python
"solver_config": {
"channel_mapping": {
"join_keys": [
{"mapping_col": "source_channel", "metrics_col": "channel_name"}
]
}
}
```

Each `mapping_col` / `metrics_col` is an **internal** name (the name as the
solver sees the column **after** `column_name_mapping` has been applied on
the respective table). The two sides of a pair are independent, so the same
column can carry different names on the two tables. For instance, a layout
where the data-key column has different physical names on the two tables
has two equivalent paths:

```python
# Path 1 — rename both physical columns to the same internal name; the
# default join_keys then works unchanged.
"solver_config": {
"channel_mapping": {
"column_name_mapping": {"mapping_data_key": "data_key"}
},
"channel_metrics": {
"column_name_mapping": {"metrics_data_key": "data_key"}
}
}

# Path 2 — leave the physical names as-is and reference them directly.
"solver_config": {
"channel_mapping": {
"join_keys": [
{"mapping_col": "source_channel", "metrics_col": "channel_name"},
{"mapping_col": "mapping_data_key", "metrics_col": "metrics_data_key"}
]
}
}
```

`query.channel(...)` and `query.channel_with_alias(...)` kwargs are column
references against the **post-`column_name_mapping`** schema. If you
override `join_keys` (or skip renames) so that the solver sees a column
under a non-default name, the same name must be used as the kwarg. Example:
if `join_keys` references `metrics_col: "my_chan_name"` and the column is
not renamed via `column_name_mapping`, call
`query.channel(my_chan_name=...)`. The internal-name properties on
`SolverConfig` exist primarily to remove magic strings from the solver
code; the user-facing contract is "kwarg name == column name as the solver
sees it".

### When to use what

- **`solver_config.<table>.column_name_mapping`** — your silver-layer column is named differently from
Expand Down
25 changes: 25 additions & 0 deletions docs/impulse/docs/data_model/silver_layer_schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,32 @@ channel name to one or more physical channels keyed by `project_id` /
| `channel_name` | `string` | No | Logical channel name to match against `channel_with_alias` selectors. |
| `data_key` | `string` | No | Physical lookup key joined to `channel_metrics`. |
| `priority` | `int` | Yes | Tie-breaker when multiple physical channels match a logical name. |
| `source_unit` | `string` | Yes | Unit of the raw channel data. When non-null together with `target_unit` and a configured `unit_conversion_table`, the solver converts values from source to target unit on aliased reads. |
| `target_unit` | `string` | Yes | Target unit for aliased reads of this mapping. |

Configured via `source.channel_mapping_table` (see
[Configuration](../config/configuration.md)). Joins to `channel_metrics`
on `(project_id, data_key, channel_name)`.

---

## unit_conversion (optional)

Per-unit-family conversion factors. Read by `KeyValueStoreSolver` at
solve time when `source.unit_conversion_table` is configured and the
`channel_mapping` table carries `source_unit` / `target_unit` columns.

| Column | Type | Nullable | Description |
|---------------------|----------|----------|------------------------------------------------------------------------------------------------------------|
| `group_id` | `string` | No | Unit family identifier (e.g. `speed`, `rotation`). Only units within the same family can convert into each other. |
| `unit` | `string` | No | Unit name. Matches the `source_unit` / `target_unit` values on `channel_mapping`. |
| `conversion_factor` | `double` | No | Multiplier that converts a value in this unit to the family's base unit. The base unit has factor `1.0`. |

For each aliased channel the solver looks up `source_factor` (the row
whose `unit` matches `source_unit`) and `target_factor` (the row whose
`unit` matches `target_unit`, constrained to the same `group_id`) and
multiplies values by `source_factor / target_factor`. Missing rows or a
`group_id` mismatch yield a null factor and no conversion.

Configured via `source.unit_conversion_table` (see
[Configuration](../config/configuration.md)).
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,13 @@ def resolve_channel_selections(spark, channel_metrics_df,

Union direct and aliased channel metrics, combining selector_ids.

When the aliased side carries ``source_unit`` / ``target_unit``
columns (added by :meth:`filter_aliased_channel_metrics` when a
unit conversion table is configured), those columns are preserved
through the union and aggregation. Direct selectors produce null
unit columns, which causes the downstream conversion-factor join
in :meth:`solve` to leave their values unchanged.

**Arguments**:

- `spark` (`SparkSession`): Spark session used for query execution.
Expand All @@ -179,7 +186,9 @@ Union direct and aliased channel metrics, combining selector_ids.

**Returns**:

`pyspark.sql.DataFrame`: Merged DataFrame with ``(container_id, channel_id, selector_ids)``.
`pyspark.sql.DataFrame`: Merged DataFrame with ``(container_id, channel_id, selector_ids)``
(plus ``source_unit`` / ``target_unit`` when present on the
aliased side).

#### solve

Expand All @@ -189,6 +198,13 @@ def solve(query, channels_df, selections, dtypes) -> DataFrame

Solve the query by grouping channels and applying selections.

When a ``unit_conversion_table`` is configured on the database and
*channels_df* carries ``source_unit`` / ``target_unit`` columns
(added upstream by :meth:`filter_aliased_channel_metrics`),
per-channel conversion factors are computed and propagated into
the grouped-map UDF so that time-series values are converted from
the source to the target unit on the fly.

**Arguments**:

- `query` (`QueryBuilder`): Query object containing database and filter information.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,47 @@ names used by the solver. An empty dict means no renaming
Keys are internal column names; values are the literal values
to match.

## JoinKey

```python
class JoinKey(BaseModel)
```

A single column pair in the ``channel_mapping`` → ``channel_metrics`` join.

Used by :class:`ChannelMappingConfig.join_keys` to override the default
alias-resolution composite key.

Both fields reference column names **after** ``column_name_mapping`` has
been applied on the respective table; the two sides are independent, so
a column may appear under different names on the two tables.

**Arguments**:

- `mapping_col` (`str`): Column name on ``channel_mapping`` after its ``column_name_mapping``
has been applied.
- `metrics_col` (`str`): Column name on ``channel_metrics`` after its ``column_name_mapping``
has been applied.

## ChannelMappingConfig

```python
class ChannelMappingConfig(TableConfig)
```

``TableConfig`` plus an optional alias-resolution join-key spec.

**Arguments**:

- `join_keys` (`list[JoinKey] or None`): Custom composite key for the ``channel_mapping`` → ``channel_metrics``
join performed by ``KeyValueStoreSolver.filter_aliased_channel_metrics``.
When ``None`` (the default), the solver uses the backward-compatible
pair ``[(source_channel, channel_name), (data_key, data_key)]``
sourced from :class:`SolverConfig` internal-name properties.
Provide a custom list to change the join arity or column choice
(e.g. a single-column join when ``data_key`` is not part of the
channel identity in your silver layout).

## SolverConfig

```python
Expand All @@ -58,8 +99,10 @@ so that solver code can always reference the same constants.
- `container_metrics` (`TableConfig`): Column mappings and filters for the container metrics table.
- `channel_tags` (`TableConfig`): Column mappings and filters for the channel tags table.
- `channel_metrics` (`TableConfig`): Column mappings and filters for the channel metrics table.
- `channel_mapping` (`TableConfig`): Column mappings and filters for the channel mapping (alias) table.
- `channel_mapping` (`ChannelMappingConfig`): Column mappings, filters, and the alias-resolution ``join_keys``
override for the channel mapping (alias) table.
- `channels` (`TableConfig`): Column mappings and filters for the channel data table.
- `unit_conversion` (`TableConfig`): Column mappings and filters for the unit conversion table.

#### from\_json

Expand Down Expand Up @@ -176,6 +219,50 @@ def alias_priority_col() -> str
Internal column name for the alias priority on the channel_mapping table.


#### source\_channel\_col

```python
def source_channel_col() -> str
```

Internal column name for the source-channel identifier on the channel_mapping table.


#### data\_key\_col

```python
def data_key_col() -> str
```

Internal column name for the data-key identifier.

Default present on both ``channel_mapping`` and ``channel_metrics``;
used by the default :meth:`effective_alias_join_keys` for both sides.
Layouts where the two tables carry the data-key column under different
physical names can either rename both to ``"data_key"`` via per-table
``column_name_mapping`` or override


#### channel\_alias\_col

```python
def channel_alias_col() -> str
```

Internal column name for the alias identifier on the channel_mapping table.

Referenced by the dedup window in


#### channel\_name\_col

```python
def channel_name_col() -> str
```

Internal column name for the channel-name identifier on the channel_metrics table.


#### project\_id\_col

```python
Expand All @@ -194,6 +281,72 @@ def parent_id_col() -> str
Internal column name for the parent/scope identifier.


#### conversion\_factor\_col

```python
def conversion_factor_col() -> str
```

Internal column name for the conversion factor on the unit_conversion table.

Also used as the column that carries the per-channel combined factor
downstream from :meth:`KeyValueStoreSolver._compute_conversion_factors`
into the grouped-map UDF.


#### source\_unit\_col

```python
def source_unit_col() -> str
```

Internal column name for the source unit on the channel_mapping table.


#### target\_unit\_col

```python
def target_unit_col() -> str
```

Internal column name for the target unit on the channel_mapping table.


#### unit\_col

```python
def unit_col() -> str
```

Internal column name for the unit name on the unit_conversion table.


#### group\_id\_col

```python
def group_id_col() -> str
```

Internal column name for the unit group id on the unit_conversion table.


#### effective\_alias\_join\_keys

```python
def effective_alias_join_keys() -> list[tuple[str, str]]
```

Return the resolved alias-resolution join keys as ``(mapping_col, metrics_col)`` tuples.

Falls back to the default composite key
``[(source_channel_col, channel_name_col), (data_key_col, data_key_col)]``
when :attr:`ChannelMappingConfig.join_keys` is ``None``. Otherwise
returns the configured list.

Both members of each tuple are column names **after**
``column_name_mapping`` has been applied on the respective table.


#### col\_map

```python
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,10 @@ configured) regardless of whether ``container_tags_table`` is set.
- `channels_uri` (`str`): Full Unity Catalog path to the channels data table.
- `channel_mapping_table` (`str`): Full Unity Catalog path to the channel mapping table. Required when using
``channel_with_alias()`` for logical alias resolution.
- `unit_conversion_table` (`str`): Full Unity Catalog path to the unit conversion table. When set together
with a ``channel_mapping_table`` whose rows carry ``source_unit`` and
``target_unit`` columns, the query engine converts time-series values
from the source to the target unit during ``solve()``.

## UnitySink

Expand Down
Loading
Loading