|
| 1 | +# 0.4.0 (2025-10-03) { #0.4.0 } |
| 2 | + |
| 3 | +## Features |
| 4 | + |
| 5 | +- Introduce new [http2kafka][http2kafka] component. ([#281](https://github.com/MobileTeleSystems/data-rentgen/issues/281)) |
| 6 | + |
| 7 | + It allows using DataRentgen with OpenLineage HttpTransport. |
| 8 | + Authentication is done using personal tokens. |
| 9 | + |
| 10 | +- Add REST API endpoints for managing personal tokens. ([#276](https://github.com/MobileTeleSystems/data-rentgen/issues/276)) |
| 11 | + |
| 12 | +List of endpoints: |
| 13 | + |
| 14 | +- `GET /personal-tokens` - get personal tokens for current user. |
| 15 | +- `POST /personal-tokens` - create new personal token for current user. |
| 16 | +- `PATCH /personal-tokens/:id` - refresh personal token (revoke token and create new one). |
| 17 | +- `DELETE /personal-tokens/:id` - revoke personal token. |
| 18 | + |
| 19 | +- Add new entities `Tag` and `TagValue`. ([#268](https://github.com/MobileTeleSystems/data-rentgen/issues/268)) |
| 20 | + |
| 21 | + Tags can be used as additional properties for other entities. |
| 22 | + This feature is still under construction. |
| 23 | + |
| 24 | +- Added endpoint `GET /v1/tags`. ([#289](https://github.com/MobileTeleSystems/data-rentgen/issues/289)) |
| 25 | + |
| 26 | + Tags can be paginated, searched by tag name or value, or fetched by ids. |
| 27 | + |
| 28 | +### Response example |
| 29 | + |
| 30 | +```json |
| 31 | +[ |
| 32 | + { |
| 33 | + "id": 1, |
| 34 | + "name": "env", |
| 35 | + "values": [ |
| 36 | + { |
| 37 | + "id": 1, |
| 38 | + "value": "dev" |
| 39 | + }, |
| 40 | + { |
| 41 | + "id": 2, |
| 42 | + "value": "prod" |
| 43 | + } |
| 44 | + ] |
| 45 | + } |
| 46 | +] |
| 47 | + |
| 48 | +``` |
| 49 | + |
| 50 | +- Updated `GET /v1/datasets` to include `tags: [...]` in response. ([#289](https://github.com/MobileTeleSystems/data-rentgen/issues/289)) |
| 51 | + |
| 52 | +### Dataset response examples |
| 53 | + |
| 54 | +Before: |
| 55 | + |
| 56 | +```python |
| 57 | +{ |
| 58 | + "id": "8400", |
| 59 | + "location": {...}, |
| 60 | + "name": "dataset_name", |
| 61 | + "schema": {}, |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +After: |
| 66 | + |
| 67 | +```python |
| 68 | +{ |
| 69 | + "id": "25896", |
| 70 | + "location": {...}, |
| 71 | + "name": "dataset_name", |
| 72 | + "schema": {...}, |
| 73 | + "tags": [ # <--- |
| 74 | + { |
| 75 | + "id": "1", |
| 76 | + "name": "environment", |
| 77 | + "values": [ |
| 78 | + { |
| 79 | + "id": "2", |
| 80 | + "value": "production" |
| 81 | + } |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "id": "2", |
| 86 | + "name": "team", |
| 87 | + "values": [ |
| 88 | + { |
| 89 | + "id": "4", |
| 90 | + "value": "my_awesome_team" |
| 91 | + } |
| 92 | + ] |
| 93 | + } |
| 94 | + ] |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +- Added new filters to `GET /v1/datasets` endpoint. ([#294](https://github.com/MobileTeleSystems/data-rentgen/issues/294), [#289](https://github.com/MobileTeleSystems/data-rentgen/issues/289)) |
| 99 | + |
| 100 | +Query params: |
| 101 | + |
| 102 | +- location_id: `int` |
| 103 | +- tag_value_id: `list[int]` - if multiple values are passed, dataset should have all of them. |
| 104 | + |
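As an illustration of the new dataset filters, here is a minimal sketch that builds a request URL. The base URL is hypothetical, and passing `tag_value_id` as a repeated query key is an assumption about how list parameters are encoded, not something stated in this changelog.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your DataRentgen server.
BASE_URL = "http://localhost:8000"


def build_datasets_url(location_id=None, tag_value_ids=()):
    """Build a GET /v1/datasets URL using the new filters."""
    params = []
    if location_id is not None:
        params.append(("location_id", location_id))
    # Assumption: list filters are sent as repeated query keys.
    params.extend(("tag_value_id", tag_id) for tag_id in tag_value_ids)
    query = urlencode(params)
    return f"{BASE_URL}/v1/datasets" + (f"?{query}" if query else "")


# Datasets in location 1 that carry both tag values 2 and 4.
print(build_datasets_url(location_id=1, tag_value_ids=[2, 4]))
```
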
| 105 | +- Added new filters for `GET /v1/jobs` endpoint. ([#319](https://github.com/MobileTeleSystems/data-rentgen/issues/319)) |
| 106 | + |
| 107 | +Query params: |
| 108 | + |
| 109 | +- location_id: `int` |
| 110 | +- job_type: `list[str]` |
| 111 | + |
| 112 | +- Added new filters to `GET /v1/runs` endpoint. ([#322](https://github.com/MobileTeleSystems/data-rentgen/issues/322), [#323](https://github.com/MobileTeleSystems/data-rentgen/issues/323)) |
| 113 | + |
| 114 | +Query params: |
| 115 | + |
| 116 | +- job_type: `list[str]` |
| 117 | +- status: `list[RunStatus]` |
| 118 | +- started_since: `datetime | None` |
| 119 | +- started_until: `datetime | None` |
| 120 | +- ended_since: `datetime | None` |
| 121 | +- ended_until: `datetime | None` |
| 122 | +- job_location_id: `int | None` |
| 123 | +- started_by_user: `list[str] | None` |
| 124 | + |
| 125 | +- Added new endpoint `GET /v1/jobs/types`. ([#319](https://github.com/MobileTeleSystems/data-rentgen/issues/319)) |
| 126 | + |
| 127 | +- Add custom `dataRentgen_run` and `dataRentgen_operation` facets. ([#265](https://github.com/MobileTeleSystems/data-rentgen/issues/265)) |
| 128 | + |
| 129 | +These facets allow: |
| 130 | + |
| 131 | +- Passing custom `external_id`, `persistent_log_url` and other fields of Run. |
| 132 | +- Passing custom `name`, `description`, `group`, `position` fields of Operation. |
| 133 | +- Marking an event as containing only Operation data, or both Run and Operation data. |
| 134 | + |
| 135 | +- Set `output.type` based on executed SQL query, e.g. `INSERT`, `UPDATE`, `DELETE`, and so on. ([#310](https://github.com/MobileTeleSystems/data-rentgen/issues/310)) |
| 136 | + |
| 137 | +## Improvements |
| 138 | + |
| 139 | +- Improve consumer performance by reducing DB load on reading operations. ([#314](https://github.com/MobileTeleSystems/data-rentgen/issues/314)) |
| 140 | + |
| 141 | +- Add workaround for cases when OpenLineage emits a Spark application event with `job.name=unknown`. ([#263](https://github.com/MobileTeleSystems/data-rentgen/issues/263)) |
| 142 | + |
| 143 | + This requires installing OpenLineage with this fix merged: <https://github.com/OpenLineage/OpenLineage/pull/3848>. |
| 144 | + |
| 145 | +- Dataset symlinks with no inputs/outputs are no longer removed from lineage graph. ([#269](https://github.com/MobileTeleSystems/data-rentgen/issues/269)) |
| 146 | + |
| 147 | +- Make matching for addresses and locations more deterministic by converting them to lowercase. ([#313](https://github.com/MobileTeleSystems/data-rentgen/issues/313)) |
| 148 | + |
| 149 | + Items `oracle://host:1521` and `ORACLE://HOST:1521` are now treated as the same item, `oracle://host:1521`. |
| 150 | + |
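The effect of lowercasing can be sketched as follows. This is only an illustration of the idea; the helper name is hypothetical and is not taken from the DataRentgen code base.

```python
def normalize_address(url: str) -> str:
    """Lowercase an address so differently-cased spellings compare equal."""
    return url.lower()


received = ["oracle://host:1521", "ORACLE://HOST:1521"]

# Both spellings collapse into a single normalized item.
unique = sorted({normalize_address(url) for url in received})
print(unique)
```
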
| 151 | +- Make matching for datasets, jobs, tags and user names case-insensitive by using unique indexes on `lower(name)` expression. ([#313](https://github.com/MobileTeleSystems/data-rentgen/issues/313)) |
| 152 | + |
| 153 | + Items `database.schema.table` and `DATABASE.SCHEMA.TABLE` are now treated as the same item. |
| 154 | + |
| 155 | + Because the canonical dataset name depends on the database naming convention (`UPPERCASE` for Oracle, `lowercase` for Postgres), |
| 156 | + we can't convert names to one specific case (upper or lower). Instead, the first received value is used as the canonical one. |
| 157 | + |
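The "first received value wins" rule can be sketched with an in-memory registry keyed by the lowercased name. This is a simplified analogue of a unique index on `lower(name)`; the class and method names are hypothetical.

```python
class CanonicalNames:
    """Case-insensitive name registry that keeps the first spelling seen."""

    def __init__(self):
        self._by_lower = {}

    def resolve(self, name: str) -> str:
        # setdefault stores the name only if its lowercase key is new,
        # and otherwise returns the spelling that was stored first.
        return self._by_lower.setdefault(name.lower(), name)


names = CanonicalNames()
print(names.resolve("DATABASE.SCHEMA.TABLE"))  # first spelling becomes canonical
print(names.resolve("database.schema.table"))  # resolves to the first spelling
```
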
| 158 | +## Bug Fixes |
| 159 | + |
| 160 | +- For lineage with `granularity=DATASET` return real lineage graph. ([#264](https://github.com/MobileTeleSystems/data-rentgen/issues/264)) |
| 161 | + |
| 162 | + v0.3.x resolved lineage by `run_id`, which could produce wrong lineage. v0.4.x now resolves lineage by `operation_id`. |
| 163 | + |
| 164 | +- Exclude self-referencing lineage edges when `granularity=DATASET`. ([#261](https://github.com/MobileTeleSystems/data-rentgen/issues/261)) |
| 165 | + |
| 166 | + If some run uses the same table as both input and output (e.g. merging duplicates or performing some checks before writing), |
| 167 | + DataRentgen excludes `dataset1 -> dataset1` relations from lineage. |
| 168 | + |
| 169 | + This doesn't affect chains like `dataset1 -> job1 -> dataset1` or `dataset1 -> dataset2 -> dataset1`. |
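
The exclusion rule can be sketched as a filter over `(source, target)` edge pairs. This is a minimal illustration under the assumption that edges are plain tuples; it does not reflect how DataRentgen stores lineage internally.

```python
def drop_self_references(edges):
    """Remove direct `dataset -> same dataset` edges and keep everything else."""
    return [(src, dst) for src, dst in edges if src != dst]


edges = [
    ("dataset1", "dataset1"),  # self-reference, excluded
    ("dataset1", "dataset2"),  # part of a longer chain, kept
    ("dataset2", "dataset1"),  # part of a longer chain, kept
]
print(drop_self_references(edges))
```
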