@e-strauss
Contributor

@e-strauss e-strauss commented Nov 24, 2025

This patch extends the previously added Unix-pipe data transfer to SystemDS frames. In addition, the matrix transfer is further improved by fusing the non-zero count (nnz) computation into data reading and by removing unnecessary array allocations. The impact is clearly visible in the profiler's flame graphs:

Numpy transfer w/ fused DenseBlock creation and nnz count
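
To illustrate what fusing the non-zero count into data reading can look like, here is a minimal Python sketch. It is not the SystemDS implementation (the patch does this on the Java side); the names `write_doubles`, `read_doubles_fused`, and `roundtrip` are hypothetical, and an anonymous `os.pipe` with a writer thread stands in for the real transfer channel. The point is only that the reader fills one preallocated buffer and counts non-zeros in the same pass, rather than scanning the data a second time.

```python
import os
import threading
from array import array


def write_doubles(fd, values):
    """Writer side: stream the values as raw native-endian float64 bytes."""
    with os.fdopen(fd, "wb") as pipe:
        pipe.write(array("d", values).tobytes())


def read_doubles_fused(fd, n, chunk_elems=1024):
    """Reader side: fill a preallocated buffer and count non-zeros in one pass."""
    out = array("d", bytes(8 * n))  # single allocation for the whole block
    nnz = 0
    pos = 0
    with os.fdopen(fd, "rb") as pipe:
        while pos < n:
            want = min(chunk_elems, n - pos) * 8
            data = pipe.read(want)  # buffered read: blocks until `want` bytes or EOF
            if len(data) != want:
                raise EOFError("pipe closed before all values arrived")
            chunk = array("d")
            chunk.frombytes(data)
            for v in chunk:
                out[pos] = v
                if v != 0.0:  # nnz is counted while the value is being stored
                    nnz += 1
                pos += 1
    return out, nnz


def roundtrip(values, chunk_elems=4):
    """Send `values` through a pipe and return (received block, nnz)."""
    r, w = os.pipe()
    writer = threading.Thread(target=write_doubles, args=(w, values))
    writer.start()
    try:
        return read_doubles_fused(r, len(values), chunk_elems)
    finally:
        writer.join()


if __name__ == "__main__":
    block, nnz = roundtrip([0.0, 1.5, 0.0, -2.0, 3.25, 0.0])
    print(list(block), nnz)
```

A small `chunk_elems` is used in `roundtrip` only to exercise the chunked loop; a real transfer would read much larger chunks.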

Experiment setup: Python --> Java --> Python

```python
matrix = ctx.from_numpy(array)
matrix = matrix.rbind(matrix)  # triggers transfer Python to Java
matrix.compute()  # triggers transfer Java to Python
```

py4j profiling (runtime: 16s)

[Figure: Py4j profiling flame graph]

unix-pipe profiling (runtime: 0.7s)

[Figure: Unix-pipe profiling flame graph]

Updated Python --> Java numpy transfer time:

| #elements | Size | py4j | unix pipe (previous) | unix pipe (fused) |
|---|---|---|---|---|
| 5000000 | 190 MB | 6.81 | 0.79 | 0.56 |
| 10000000 | 380 MB | 12.80 | 1.31 | 0.96 |
| 20000000 | 763 MB | 22.58 | 2.13 | 1.65 |
| 50000000 | 1.9 GB | ERR | 4.54 | 3.51 |
| 100000000 | 3.8 GB | ERR | 8.78 | 6.31 |
| 200000000 | 7.5 GB | ERR | 17.52 | 12.21 |
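
For readers who want the relative gains, the following sketch derives speedup factors from the timings in the table above. The numbers are copied verbatim from the table; the helper name `speedups` is hypothetical, and `None` is used here as a stand-in for the runs reported as ERR.

```python
# Timings copied from the table above; None marks the runs reported as ERR.
rows = [
    (5_000_000, 6.81, 0.79, 0.56),
    (10_000_000, 12.80, 1.31, 0.96),
    (20_000_000, 22.58, 2.13, 1.65),
    (50_000_000, None, 4.54, 3.51),
    (100_000_000, None, 8.78, 6.31),
    (200_000_000, None, 17.52, 12.21),
]


def speedups(rows):
    """Return (n, py4j/fused or None, previous/fused) for each table row."""
    result = []
    for n, py4j, prev, fused in rows:
        vs_py4j = None if py4j is None else py4j / fused
        result.append((n, vs_py4j, prev / fused))
    return result


for n, vs_py4j, vs_prev in speedups(rows):
    left = "n/a" if vs_py4j is None else f"{vs_py4j:.1f}x"
    print(f"{n:>11,d}  vs py4j: {left:>6}  vs previous pipe: {vs_prev:.2f}x")
```

On these numbers, the fused pipe is roughly 12x to 14x faster than py4j for the sizes where py4j completes, and about 1.3x to 1.4x faster than the previous pipe implementation.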

DataFrame Transfer stats:

@codecov

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 97.05882% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.32%. Comparing base (79122eb) to head (f569dc0).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ain/java/org/apache/sysds/api/PythonDMLScript.java | 85.00% | 3 Missing and 3 partials ⚠️ |
| ...a/org/apache/sysds/runtime/util/UnixPipeUtils.java | 98.36% | 3 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2363      +/-   ##
============================================
+ Coverage     72.29%   72.32%   +0.03%     
- Complexity    46937    47000      +63     
============================================
  Files          1513     1513              
  Lines        178421   178738     +317     
  Branches      35036    35079      +43     
============================================
+ Hits         128993   129277     +284     
- Misses        39665    39675      +10     
- Partials       9763     9786      +23     

@e-strauss e-strauss force-pushed the squashed-frame-transfer branch 3 times, most recently from 2c3771f to 624ae74 Compare December 3, 2025 19:46
@e-strauss e-strauss marked this pull request as ready for review December 3, 2025 21:03
@e-strauss e-strauss requested a review from Baunsgaard December 3, 2025 21:05
Contributor

@Baunsgaard Baunsgaard left a comment

LGTM. I mainly have formatting comments that need to be addressed.

@e-strauss e-strauss force-pushed the squashed-frame-transfer branch from 624ae74 to f569dc0 Compare December 8, 2025 10:43
@e-strauss e-strauss closed this in 0e8e966 Dec 9, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Dec 9, 2025
@e-strauss e-strauss deleted the squashed-frame-transfer branch December 10, 2025 09:31