Commit 984a56f

author
Anna Mikhaylova
committed
[DOP-27550] update doc files 2
1 parent a9c2ff9 commit 984a56f

67 files changed

Lines changed: 1135 additions & 108 deletions

mddocs/docs/en/changelog/0.4.0.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
# 0.4.0 (2025-10-03) { #0.4.0 }
2+
3+
## Features
4+
5+
- Introduce new [http2kafka][http2kafka] component. ([#281](https://github.com/MobileTeleSystems/data-rentgen/issues/281))
6+
7+
It allows using DataRentgen with OpenLineage HttpTransport.
8+
Authentication is done using personal tokens.
9+
10+
- Add REST API endpoints for managing personal tokens. ([#276](https://github.com/MobileTeleSystems/data-rentgen/issues/276))
11+
12+
List of endpoints:
13+
14+
- `GET /personal-tokens` - get personal tokens for current user.
15+
- `POST /personal-tokens` - create new personal token for current user.
16+
- `PATCH /personal-tokens/:id` - refresh personal token (revoke token and create new one).
17+
- `DELETE /personal-tokens/:id` - revoke personal token.
18+
19+
- Add new entities `Tag` and `TagValue`. ([#268](https://github.com/MobileTeleSystems/data-rentgen/issues/268))
20+
21+
Tags can be used as additional properties for other entities.
22+
This feature is still under construction.
23+
24+
- Added endpoint `GET /v1/tags`. ([#289](https://github.com/MobileTeleSystems/data-rentgen/issues/289))
25+
26+
Tags can be paginated, searched by name or value, or fetched by ids.
27+
28+
### Response example
29+
30+
```json
[
  {
    "id": 1,
    "name": "env",
    "values": [
      {
        "id": 1,
        "value": "dev"
      },
      {
        "id": 2,
        "value": "prod"
      }
    ]
  }
]
```
49+
50+
- Updated `GET /v1/datasets` to include `tags: [...]` in response. ([#289](https://github.com/MobileTeleSystems/data-rentgen/issues/289))
51+
52+
### Dataset response examples
53+
54+
Before:
55+
56+
```python
{
    "id": "8400",
    "location": {...},
    "name": "dataset_name",
    "schema": {},
}
```
64+
65+
After:
66+
67+
```python
{
    "id": "25896",
    "location": {...},
    "name": "dataset_name",
    "schema": {...},
    "tags": [  # <---
        {
            "id": "1",
            "name": "environment",
            "values": [
                {
                    "id": "2",
                    "value": "production"
                }
            ]
        },
        {
            "id": "2",
            "name": "team",
            "values": [
                {
                    "id": "4",
                    "value": "my_awesome_team"
                }
            ]
        }
    ]
}
```
97+
98+
- Added new filters to `GET /v1/datasets` endpoint. ([#294](https://github.com/MobileTeleSystems/data-rentgen/issues/294), [#289](https://github.com/MobileTeleSystems/data-rentgen/issues/289))
99+
100+
Query params:
101+
102+
- location_id: `int`
103+
- tag_value_id: `list[int]` - if multiple values are passed, a dataset must have all of them to match.
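
The "all of them" semantics of `tag_value_id` can be illustrated with a small client-side sketch. This is not DataRentgen's server code: the `tag_value_ids` and `dataset_matches` helpers are hypothetical, and the dataset shape simply follows the "After" response example above.

```python
def tag_value_ids(dataset: dict) -> set[int]:
    """Collect ids of all tag values attached to a dataset."""
    return {
        int(value["id"])
        for tag in dataset.get("tags", [])
        for value in tag["values"]
    }


def dataset_matches(dataset: dict, requested: list[int]) -> bool:
    """True only if the dataset carries every requested tag value id."""
    return set(requested) <= tag_value_ids(dataset)


# Sample data copied from the response example; not real records.
dataset = {
    "id": "25896",
    "name": "dataset_name",
    "tags": [
        {"id": "1", "name": "environment", "values": [{"id": "2", "value": "production"}]},
        {"id": "2", "name": "team", "values": [{"id": "4", "value": "my_awesome_team"}]},
    ],
}

print(dataset_matches(dataset, [2, 4]))  # True
print(dataset_matches(dataset, [2, 5]))  # False
```

Passing one id behaves like a plain equality filter; each additional id narrows the result.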
104+
105+
- Added new filters for `GET /v1/jobs` endpoint. ([#319](https://github.com/MobileTeleSystems/data-rentgen/issues/319))
106+
107+
Query params:
108+
109+
- location_id: `int`
110+
- job_type: `list[str]`
111+
112+
- Added new filters to `GET /v1/runs` endpoint. ([#322](https://github.com/MobileTeleSystems/data-rentgen/issues/322), [#323](https://github.com/MobileTeleSystems/data-rentgen/issues/323))
113+
114+
Query params:
115+
116+
- job_type: `list[str]`
117+
- status: `list[RunStatus]`
118+
- started_since: `datetime | None`
119+
- started_until: `datetime | None`
120+
- ended_since: `datetime | None`
121+
- ended_until: `datetime | None`
122+
- job_location_id: `int | None`
123+
- started_by_user: `list[str] | None`
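
As a rough illustration of combining these filters, the sketch below builds a query string with the standard library. Several things here are assumptions rather than documented behavior: the host `localhost:8000`, the sample `job_type` and `status` values, and the idea that list-typed params are sent as repeated keys.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# All values below are placeholders chosen for illustration.
params = {
    "job_type": ["SPARK_APPLICATION", "AIRFLOW_TASK"],
    "status": ["SUCCEEDED", "FAILED"],
    "started_since": datetime(2025, 10, 1, tzinfo=timezone.utc).isoformat(),
    "job_location_id": 1,
}

# doseq=True expands each list into repeated key=value pairs.
query = urlencode(params, doseq=True)
url = f"http://localhost:8000/v1/runs?{query}"
print(url)
```

Using `urlencode` also percent-encodes the `:` and `+` characters of the ISO 8601 timestamp, which is easy to get wrong when concatenating strings by hand.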
124+
125+
- Added new endpoint `GET /v1/jobs/types`. ([#319](https://github.com/MobileTeleSystems/data-rentgen/issues/319))
126+
127+
- Add custom `dataRentgen_run` and `dataRentgen_operation` facets. ([#265](https://github.com/MobileTeleSystems/data-rentgen/issues/265))
128+
129+
These facets allow:
130+
131+
- passing custom `external_id`, `persistent_log_url` and other fields of Run.
132+
- passing custom `name`, `description`, `group`, `position` fields of Operation.
133+
- marking an event as containing only Operation data or both Run and Operation data.
134+
135+
- Set `output.type` based on executed SQL query, e.g. `INSERT`, `UPDATE`, `DELETE`, and so on. ([#310](https://github.com/MobileTeleSystems/data-rentgen/issues/310))
136+
137+
## Improvements
138+
139+
- Improve consumer performance by reducing DB load on reading operations. ([#314](https://github.com/MobileTeleSystems/data-rentgen/issues/314))
140+
141+
- Add workaround if OpenLineage emitted Spark application event with `job.name=unknown`. ([#263](https://github.com/MobileTeleSystems/data-rentgen/issues/263))
142+
143+
This requires installing OpenLineage with this fix merged: <https://github.com/OpenLineage/OpenLineage/pull/3848>.
144+
145+
- Dataset symlinks with no inputs/outputs are no longer removed from lineage graph. ([#269](https://github.com/MobileTeleSystems/data-rentgen/issues/269))
146+
147+
- Make matching for addresses and locations more deterministic by converting them to lowercase. ([#313](https://github.com/MobileTeleSystems/data-rentgen/issues/313))
148+
149+
Items `oracle://host:1521` and `ORACLE://HOST:1521` are now treated as the same item, `oracle://host:1521`.
150+
151+
- Make matching for datasets, jobs, tags and user names case-insensitive by using unique indexes on `lower(name)` expression. ([#313](https://github.com/MobileTeleSystems/data-rentgen/issues/313))
152+
153+
Items `database.schema.table` and `DATABASE.SCHEMA.TABLE` are now treated as the same item.
154+
155+
Because a dataset's canonical name depends on the database naming convention (`UPPERCASE` for Oracle, `lowercase` for Postgres),
156+
names can't be converted to one specific case (upper or lower). Instead, the first received value is used as the canonical one.
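
The "first received value wins" rule can be sketched in memory as follows. The changelog says the real implementation relies on unique database indexes over `lower(name)`; the `NameRegistry` class here is a hypothetical stand-in that only mimics the observable behavior.

```python
class NameRegistry:
    """Keeps the first received spelling of each name as the canonical one."""

    def __init__(self) -> None:
        # Maps the lowercase form of a name to the spelling seen first.
        self._canonical: dict[str, str] = {}

    def resolve(self, name: str) -> str:
        # setdefault stores the name only if its lowercase key is new,
        # otherwise it returns the spelling stored earlier.
        return self._canonical.setdefault(name.lower(), name)


registry = NameRegistry()
print(registry.resolve("DATABASE.SCHEMA.TABLE"))  # DATABASE.SCHEMA.TABLE
print(registry.resolve("database.schema.table"))  # DATABASE.SCHEMA.TABLE
```

Lookups are case-insensitive, while the stored spelling keeps whatever case the source database reported first.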
157+
158+
## Bug Fixes
159+
160+
- For lineage with `granularity=DATASET` return real lineage graph. ([#264](https://github.com/MobileTeleSystems/data-rentgen/issues/264))
161+
162+
Earlier versions resolved lineage by `run_id`, which could produce a wrong lineage graph. v0.4.x resolves lineage by `operation_id`.
163+
164+
- Exclude self-referencing lineage edges in case `granularity=DATASET`. ([#261](https://github.com/MobileTeleSystems/data-rentgen/issues/261))
165+
166+
If some run uses the same table as both input and output (e.g. merging duplicates or performing some checks before writing),
167+
DataRentgen excludes `dataset1 -> dataset1` relations from lineage.
168+
169+
This doesn't affect chains like `dataset1 -> job1 -> dataset1` or `dataset1 -> dataset2 -> dataset1`.
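
A minimal sketch of the edge filtering described above, assuming edges are plain `(source, target)` pairs of dataset names. The `drop_self_references` helper is hypothetical and not taken from the DataRentgen code base.

```python
def drop_self_references(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Remove edges whose source and target are the same dataset."""
    return [(source, target) for source, target in edges if source != target]


edges = [
    ("dataset1", "dataset1"),  # self-reference, dropped
    ("dataset1", "dataset2"),  # kept
    ("dataset2", "dataset1"),  # cycle through another dataset, kept
]

print(drop_self_references(edges))
# [('dataset1', 'dataset2'), ('dataset2', 'dataset1')]
```

Only direct `dataset1 -> dataset1` edges are removed; longer cycles remain visible in the graph.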

mddocs/docs/en/changelog/0.4.1.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
# 0.4.1 (2025-10-08) { #0.4.1 }
2+
3+
## Features
4+
5+
- Add new `GET /v1/locations/types` endpoint returning list of all known location types. ([#328](https://github.com/MobileTeleSystems/data-rentgen/issues/328))
6+
- Add new filter to `GET /v1/jobs` ([#328](https://github.com/MobileTeleSystems/data-rentgen/issues/328)):
7+
- location_type: `list[str]`
8+
- Add new filter to `GET /v1/datasets` ([#328](https://github.com/MobileTeleSystems/data-rentgen/issues/328)):
9+
- location_type: `list[str]`
10+
- Allow passing multiple `location_type` filters to `GET /v1/locations`. ([#328](https://github.com/MobileTeleSystems/data-rentgen/issues/328))
11+
- Allow passing multiple values to `GET` endpoints with filters like `job_id`, `parent_run_id`, and so on. ([#329](https://github.com/MobileTeleSystems/data-rentgen/issues/329))

mddocs/docs/en/changelog/0.4.2.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# 0.4.2 (2025-10-29) { #0.4.2 }
2+
3+
## Bug Fixes
4+
5+
- Fix search query filter on UI Run list page.
6+
- Fix passing multiple filters to `GET /v1/runs`.
7+
8+
## Doc only Changes
9+
10+
- Document `DATA_RENTGEN__UI__AUTH_PROVIDER` config variable.

mddocs/docs/en/changelog/0.4.3.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
# 0.4.3 (2025-11-21) { #0.4.3 }
2+
3+
## Features
4+
5+
- Disable `server.session.enabled` by default. It is required only by KeycloakAuthProvider which is not used by default.
6+
7+
## Bug Fixes
8+
9+
- Escape unprintable ASCII symbols in SQL queries before storing them in Postgres. Previously, saving queries containing the `\x00` symbol led to exceptions.
10+
- Kafka topic with malformed messages no longer has to use the same number of partitions as input topics.
11+
- Prevent OpenLineage from reporting events which [claim to read 8 Exabytes of data](https://github.com/OpenLineage/OpenLineage/pull/4165); this is actually a Spark quirk.
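
The escaping mentioned in the first bug fix above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the `escape_unprintable` helper is hypothetical, and both the set of characters treated as unprintable (ASCII control characters other than tab, newline and carriage return) and the `\xNN` output format are guesses.

```python
import re

# ASCII control characters, excluding \t, \n and \r (assumed to be worth keeping).
_UNPRINTABLE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def escape_unprintable(query: str) -> str:
    """Replace each unprintable character with a visible \\xNN sequence."""
    return _UNPRINTABLE.sub(lambda m: f"\\x{ord(m.group()):02x}", query)


escaped = escape_unprintable("SELECT 'a\x00b'")
print(escaped)            # SELECT 'a\x00b' (backslash, x, 0, 0 as four visible characters)
print("\x00" in escaped)  # False
```

Postgres text columns cannot store a NUL character, so replacing it with a printable sequence keeps the query readable while making it safe to save.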

mddocs/docs/en/changelog/0.4.4.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
# 0.4.4 (2025-11-21) { #0.4.4 }
2+
3+
## Bug Fixes
4+
5+
- Fix inputs with 0 bytes statistics, which were broken by the 0.4.3 release.

mddocs/docs/en/changelog/0.4.5.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
# 0.4.5 (2025-12-24) { #0.4.5 }
2+
3+
## Improvements
4+
5+
Allow disabling `SessionMiddleware`, as it is only required by `KeycloakAuthProvider`.

mddocs/docs/en/changelog/0.4.6.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# 0.4.6 (2026-01-12) { #0.4.6 }
2+
3+
Dependency-only updates.

mddocs/docs/en/changelog/0.4.7.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# 0.4.7 (2026-01-20) { #0.4.7 }
2+
3+
Dependency-only updates.

mddocs/docs/en/changelog/0.4.8.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# 0.4.8 (2026-01-26) { #0.4.8 }
2+
3+
Fixed issue with updating Location's `external_id` field - server returned response code 200 but ignored the input value.
