Commit 1b5cd50

[BDL-69966] Prepare for release
1 parent f81fd8e commit 1b5cd50

16 files changed

Lines changed: 339 additions & 241 deletions

docs/changelog/0.5.0.rst

Lines changed: 338 additions & 0 deletions
@@ -0,0 +1,338 @@
0.5.0 (2026-03-19)
==================

OpenLineage-related features
----------------------------

Extracting dataset & job tags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:issue:`367`, :issue:`368`, :issue:`369`, :issue:`372`

Now DataRentgen extracts tags from OpenLineage events:

- dataset tags (currently not reported by any integration)
- job & run tags

Some tags are created based on engine versions:

- ``airflow.version``
- ``dbt.version``
- ``flink.version``
- ``hive.version``
- ``spark.version``
- ``openlineage_adapter.version``
- ``openlineage_client.version`` (only for Python client v1.38.0 or higher)

Note that passing job & run tags depends on the integration. For example, tags can be set up for Spark, Airflow and dbt, but not for Flink or Hive.
Tags are also configured in a different way in each integration.
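
The engine-version tags above can be sketched as follows. This is a minimal illustration assuming the shape of the OpenLineage ``processing_engine`` run facet (``name``, ``version``, ``openlineageAdapterVersion``); it is not DataRentgen's actual extraction code, and the event below is a hand-made fragment, not a full OpenLineage payload.

```python
# Sketch: deriving engine-version tags from the ``processing_engine``
# run facet of an OpenLineage event. Illustrative only.

def engine_version_tags(event: dict) -> dict[str, str]:
    facet = event.get("run", {}).get("facets", {}).get("processing_engine", {})
    tags: dict[str, str] = {}
    if facet.get("name") and facet.get("version"):
        # e.g. "spark" + "3.5.1" -> "spark.version": "3.5.1"
        tags[f"{facet['name']}.version"] = facet["version"]
    if facet.get("openlineageAdapterVersion"):
        tags["openlineage_adapter.version"] = facet["openlineageAdapterVersion"]
    return tags

event = {
    "run": {
        "facets": {
            "processing_engine": {
                "name": "spark",
                "version": "3.5.1",
                "openlineageAdapterVersion": "1.38.0",
            }
        }
    }
}
assert engine_version_tags(event) == {
    "spark.version": "3.5.1",
    "openlineage_adapter.version": "1.38.0",
}
```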

Extracting ``nominalTime``
~~~~~~~~~~~~~~~~~~~~~~~~~~

:issue:`378`

Now DataRentgen extracts the ``nominalTime`` run facet, and stores its values in the ``run.expected_start_at`` and ``run.expected_end_at`` fields.

Extracting ``jobDependencies``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:issue:`402`

Now DataRentgen extracts information from the `jobDependencies <https://openlineage.io/docs/spec/facets/run-facets/job_dependencies/>`_ facet, and stores it in the ``job_dependency`` table.
For now this is just a simple tuple ``from_dataset_id, to_dataset_id, type``, where ``type`` is an arbitrary string provided by the integration, not an enum.
This can be changed in future versions of Data.Rentgen.

Currently the only integration providing this kind of information is Airflow, and only in the most recent versions of the OpenLineage provider for Airflow (`2.10 or higher <https://github.com/apache/airflow/pull/59521>`_).
For now the provider also doesn't send a facet with information about direct task -> task dependencies - only indirect ones are included (declared via `Asset <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/assets.html>`_).
So there is a fallback for Airflow which extracts these dependencies from the ``downstream_task_ids`` and ``upstream_task_ids`` task fields.
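
The Airflow fallback described above can be sketched as follows; the ``tasks`` mapping is hand-made test data and the helper name is hypothetical, not DataRentgen's actual implementation.

```python
# Sketch: deriving task -> task dependencies from each task's
# ``downstream_task_ids`` / ``upstream_task_ids`` fields.

def task_dependencies(tasks: dict[str, dict]) -> set[tuple[str, str]]:
    """Return (upstream, downstream) pairs for all tasks."""
    deps: set[tuple[str, str]] = set()
    for task_id, task in tasks.items():
        for downstream in task.get("downstream_task_ids", []):
            deps.add((task_id, downstream))
        for upstream in task.get("upstream_task_ids", []):
            deps.add((upstream, task_id))
    return deps


tasks = {
    "extract": {"downstream_task_ids": ["transform"]},
    "transform": {"upstream_task_ids": ["extract"], "downstream_task_ids": ["load"]},
    "load": {"upstream_task_ids": ["transform"]},
}
# Both directions of declaration collapse into the same pairs:
assert task_dependencies(tasks) == {("extract", "transform"), ("transform", "load")}
```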

REST API features
-----------------

- Added ``GET /v1/jobs/hierarchy`` API endpoint to retrieve the job hierarchy graph (parents, dependencies) for a given job. (:issue:`407`, :issue:`412`)

  .. dropdown:: Response example

      .. code-block:: python

          {
              "relations": {
                  "parents": [
                      {
                          "from": {"kind": "JOB", "id": "1"},
                          "to": {"kind": "JOB", "id": "2"}
                      }
                  ],
                  "dependencies": [
                      {
                          "from": {"kind": "JOB", "id": "3"},
                          "to": {"kind": "JOB", "id": "1"},
                          "type": "DIRECT_DEPENDENCY"
                      },
                      {
                          "from": {"kind": "JOB", "id": "1"},
                          "to": {"kind": "JOB", "id": "4"},
                          "type": "DIRECT_DEPENDENCY"
                      }
                  ]
              },
              "nodes": {
                  "jobs": {
                      "1": {
                          "id": 1,
                          "parent_job_id": None,
                          "name": "my_job",
                          "type": "SPARK_APPLICATION",
                          "location": {
                              "name": "my_cluster",
                              "type": "YARN"
                          }
                      },
                      "2": {
                          "id": 2,
                          "parent_job_id": 1,
                          "name": "my_job.child_task",
                          "type": "SPARK_APPLICATION",
                          "location": {
                              "name": "my_cluster",
                              "type": "YARN"
                          }
                      },
                      "3": {
                          "id": 3,
                          "parent_job_id": None,
                          "name": "source_job",
                          "type": "SPARK_APPLICATION",
                          "location": {
                              "name": "my_cluster",
                              "type": "YARN"
                          }
                      },
                      "4": {
                          "id": 4,
                          "parent_job_id": None,
                          "name": "target_job",
                          "type": "SPARK_APPLICATION",
                          "location": {
                              "name": "my_cluster",
                              "type": "YARN"
                          }
                      }
                  }
              }
          }

- Added parent relation between jobs. (:issue:`394`)

  Jobs can now reference a parent job via the ``parent_job_id`` field.

  Before:

  .. dropdown:: Response example

      .. code-block:: python

          {
              "meta": { ... },
              "items": [
                  {
                      "id": "42",
                      "data": {
                          "id": "42",
                          "name": "my-spark-task",
                          "type": "SPARK_APPLICATION",
                          "location": { ... }
                      }
                  }
              ]
          }

  After:

  .. dropdown:: Response example

      .. code-block:: python

          {
              "meta": { ... },
              "items": [
                  {
                      "id": "42",
                      "data": {
                          "id": "42",
                          "name": "my-spark-task",
                          "type": "SPARK_APPLICATION",
                          "location": { ... },
                          "parent_job_id": "10"
                      }
                  }
              ]
          }

- Added JOB-JOB and RUN-RUN relations to the ``relations.parents`` field of the lineage API. (:issue:`392`, :issue:`399`, :issue:`401`)

  For example, it is possible to get the Airflow DAG → Airflow Task → Spark app chain from a single response.

  Before:

  .. dropdown:: Response example

      .. code-block:: python

          {
              "relations": {
                  "parents": [
                      {"from": {"kind": "JOB", "id": "1"}, "to": {"kind": "RUN", "id": "parent-run-uuid"}},
                      {"from": {"kind": "JOB", "id": "2"}, "to": {"kind": "RUN", "id": "run-uuid"}}
                  ],
                  "symlinks": [],
                  "inputs": [...],
                  "outputs": [...]
              },
              "nodes": {...}
          }

  After:

  .. dropdown:: Response example

      .. code-block:: python

          {
              "relations": {
                  "parents": [
                      {"from": {"kind": "JOB", "id": "1"}, "to": {"kind": "RUN", "id": "parent-run-uuid"}},
                      {"from": {"kind": "JOB", "id": "2"}, "to": {"kind": "RUN", "id": "run-uuid"}},
                      # NEW:
                      {"from": {"kind": "JOB", "id": "1"}, "to": {"kind": "JOB", "id": "2"}},
                      {"from": {"kind": "RUN", "id": "parent-run-uuid"}, "to": {"kind": "RUN", "id": "run-uuid"}}
                  ],
                  "symlinks": [],
                  "inputs": [...],
                  "outputs": [...]
              },
              "nodes": {...}
          }
- Include ``job`` in ``GET /v1/runs`` response. (:issue:`411`)

  Before:

  .. dropdown:: Response example

      .. code:: python

          {
              "meta": {
                  "page": 1,
                  "page_size": 20,
                  "total_count": 1,
                  "pages_count": 1,
                  "has_next": False,
                  "has_previous": False,
                  "next_page": None,
                  "previous_page": None,
              },
              "items": [
                  {
                      "id": "01908224-8410-79a2-8de6-a769ad6944c9",
                      "data": {
                          "id": "01908224-8410-79a2-8de6-a769ad6944c9",
                          "created_at": "2024-07-05T09:05:49.584000",
                          "job_id": "123",
                          ...
                      },
                      "statistics": { ... }
                  }
              ]
          }

  After:

  .. dropdown:: Response example

      .. code:: python

          {
              "meta": {
                  "page": 1,
                  "page_size": 20,
                  "total_count": 1,
                  "pages_count": 1,
                  "has_next": False,
                  "has_previous": False,
                  "next_page": None,
                  "previous_page": None,
              },
              "items": [
                  {
                      "id": "01908224-8410-79a2-8de6-a769ad6944c9",
                      "data": {
                          "id": "01908224-8410-79a2-8de6-a769ad6944c9",
                          "created_at": "2024-07-05T09:05:49.584000",
                          "job_id": "123",
                          ...
                      },
                      "job": {
                          "id": "123",
                          "name": "myjob",
                          ...
                      },
                      "statistics": { ... }
                  }
              ]
          }
- Include ``last_run`` field in ``GET /v1/jobs`` endpoint response, showing the most recently started run for each job. (:issue:`387`)

  Before:

  .. dropdown:: Response example

      .. code-block:: python

          {
              "meta": { ... },
              "items": [
                  {
                      "id": "42",
                      "data": {
                          "id": "42",
                          "name": "my-spark-task",
                          "type": "SPARK_APPLICATION",
                          "location": { ... },
                          "parent_job_id": "10"
                      }
                  }
              ]
          }

  After:

  .. dropdown:: Response example

      .. code-block:: python

          {
              "meta": { ... },
              "items": [
                  {
                      "id": "42",
                      "data": {
                          "id": "42",
                          "name": "my-spark-task",
                          "type": "SPARK_APPLICATION",
                          "location": { ... },
                          "parent_job_id": "10"
                      },
                      "last_run": {
                          "id": "01908224-8410-79a2-8de6-a769ad6944c9",
                          "created_at": "2024-07-05T09:05:49.584000",
                          "job_id": "42",
                          ...
                      }
                  }
              ]
          }

  This allows the UI to show the last start time, status and duration for each job.

docs/changelog/index.rst

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
   :caption: Changelog

   DRAFT
   0.5.0
   0.4.8
   0.4.7
   0.4.6

docs/changelog/next_release/367.feature.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/changelog/next_release/368.feature.rst

Lines changed: 0 additions & 8 deletions
This file was deleted.

docs/changelog/next_release/369.feature.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/changelog/next_release/372.feature.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/changelog/next_release/378.feature.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/changelog/next_release/387.improvement.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.
