@yiseungmi87 commented Dec 18, 2025

This PR introduces improved documentation for new users of SystemDS:

Added

  • quickstart_extended.md - Overview page linking installation and execution docs
  • release_install.md - Clean, updated installation guide for release users
  • source_install.md - Updated guide for building SystemDS from source
  • run_extended.md - Comprehensive execution guide (local, Spark, federated)
  • run.md - Slightly modified

Scope

  • Documentation-only changes
  • No changes to SystemDS code or runtime behavior
  • The existing run.md is intentionally kept for compatibility

Purpose

These changes provide clearer onboarding for new SystemDS users and consolidate documentation into a consistent structure.

Let me know if adjustments are desired before merging.

Contributor

@janniklinde left a comment

Thank you @yiseungmi87 for the good first PR. It seems quite clear and understandable so far.

I did not manage to set up SystemDS on Ubuntu by following only your guide (which should be the goal of an install guide), so please look into that. You can follow your guide in a clean Docker image to identify possible points of failure. Similarly, please check that no such weak points exist for the other operating systems (if you have Windows, maybe try the setup as a new user). I also noticed that cloning the SystemDS source code via GitHub Desktop on Windows can get stuck during the cloning process, so we should provide a solution for that (e.g., use the `git` CLI for cloning rather than the app). I have not yet tested the install on Windows or macOS, but will do so once my current comments are resolved.
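
For the GitHub Desktop issue, the CLI route could be sketched roughly as follows. The function name `clone_systemds` and the default target folder are illustrative assumptions; only the repository URL and `git clone` itself are taken as given.

```bash
#!/usr/bin/env bash
# Sketch: clone the SystemDS sources with the git CLI instead of GitHub Desktop.
# The function name and the default target folder are illustrative assumptions.
clone_systemds() {
    local target="${1:-systemds}"
    # Plain clone of the Apache SystemDS repository into the target folder.
    git clone https://github.com/apache/systemds.git "$target"
}
```

Calling `clone_systemds` without arguments would clone into `./systemds`; passing a path clones there instead.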


Download the official release archive from the Apache SystemDS website:

https://apache.org/dyn/closer.lua/systemds/
Contributor

Maybe rather point to https://systemds.apache.org/download

Comment on lines +59 to +73

### 3.1 Extract the Release

```bash
cd /path/to/install
tar -xvf systemds-<VERSION>.tar.gz
cd systemds-<VERSION>
```

### 3.2 Add SystemDS to PATH

```bash
export SYSTEMDS_ROOT=$(pwd)
export PATH="$SYSTEMDS_ROOT/bin:$PATH"
```
Contributor

I tried to follow the guide on Ubuntu 22.04 (a fresh Docker image with Java, tar, and wget installed). After downloading and extracting the release, I got stuck with the errors shown in the session below.

Docker image I tested on:

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y \
        openjdk-17-jdk \
        ca-certificates \
        wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt

RUN wget https://dlcdn.apache.org/systemds/3.3.0/systemds-3.3.0-bin.tgz && \
    tar -xzf systemds-3.3.0-bin.tgz && \
    rm systemds-3.3.0-bin.tgz

CMD ["bash"]

Terminal session inside the container:

root@9385e1a25ddd:/opt# ls
systemds-3.3.0-bin
root@9385e1a25ddd:/opt# cd systemds-3.3.0-bin
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# java -version
openjdk version "17.0.17" 2025-10-21
OpenJDK Runtime Environment (build 17.0.17+10-Ubuntu-122.04)
OpenJDK 64-Bit Server VM (build 17.0.17+10-Ubuntu-122.04, mixed mode, sharing)
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# export SYSTEMDS_ROOT=$(pwd)
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# export PATH="$SYSTEMDS_ROOT/bin:$PATH"
root@9385e1a25ddd:/opt/systemds-3.3.0-bin# systemds -help
Help requested. Will exit after extended usage message!

Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] [SystemDS.jar] [-f] <dml-filename> [arguments] [-help]

    SystemDS.jar : Specify a custom SystemDS.jar file (this will be prepended
                   to the classpath
                   or fed to spark-submit
    -r           : Spawn a debug server for remote debugging (standalone and
                   spark driver only atm). Default port is 8787 - change within
                   this script if necessary. See SystemDS documentation on how
                   to attach a remote debugger.
    -f           : Optional prefix to the dml-filename for consistency with
                   previous behavior dml-filename : The script file to run.
                   This is mandatory unless running as a federated worker
                   (see below).
    arguments    : The arguments specified after the DML script are passed to
                   SystemDS. Specify parameters that need to go to
                   java/spark-submit by editing this run script.
    -help        : Print this usage message and SystemDS parameter info

Worker Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] WORKER [SystemDS.jar] <portnumber> [arguments] [-help]

    port         : The port to open for the federated worker.

Federated Monitoring Usage: /opt/systemds-3.3.0-bin/bin/systemds [-r] FEDMONITORING [SystemDS.jar] <portnumber> [arguments] [-help]

    port         : The port to open for the federated monitoring tool.

Set custom launch configuration by setting/editing SYSTEMDS_STANDALONE_OPTS
and/or SYSTEMDS_DISTRIBUTED_OPTS.

Set the environment variable SYSDS_DISTRIBUTED=1 to run spark-submit instead of
local java Set SYSDS_QUIET=1 to omit extra information printed by this run
script.

----------------------------------------------------------------------
Further help on SystemDS arguments:
Error: Unable to access jarfile org.apache.sysds.api.DMLScript

root@9385e1a25ddd:/opt/systemds-3.3.0-bin# cd ..
root@9385e1a25ddd:/opt# echo 'print("Hello World!")' > hello.dml
root@9385e1a25ddd:/opt# systemds -f hello.dml
###############################################################################
#  SYSTEMDS_ROOT= /opt/systemds-3.3.0-bin
#  SYSTEMDS_JAR_FILE= 
#  SYSDS_EXEC_MODE= singlenode
#  CONFIG_FILE= -config /opt/systemds-3.3.0-bin/conf/SystemDS-config.xml
#  LOG4JPROP= -Dlog4j.configuration=file:/opt/systemds-3.3.0-bin/conf/log4j.properties
#  HADOOP_HOME= /opt/systemds-3.3.0-bin/lib/hadoop
#
#  Running script hello.dml locally with opts: 
#  Executing command:    java -Xmx4g -Xms4g -Xmn400m    -Dlog4j.configuration=file:/opt/systemds-3.3.0-bin/conf/log4j.properties   -jar    -f hello.dml   -exec singlenode   -config /opt/systemds-3.3.0-bin/conf/SystemDS-config.xml   
###############################################################################
Error: Invalid or corrupt jarfile hello.dml
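
The header printed by the launcher shows `SYSTEMDS_JAR_FILE=` as empty, so `java -jar` was invoked without a jar path. A minimal sketch for checking which jar files the extracted release actually contains; the function name `list_systemds_jars` and the search depth are my own assumptions, not part of the SystemDS scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SystemDS): list jar files near the top
# of an extracted release to see what the launcher could pick up.
list_systemds_jars() {
    # Use the first argument, else SYSTEMDS_ROOT, else the current folder.
    local root="${1:-${SYSTEMDS_ROOT:-.}}"
    # Depth 2 covers the root folder and its direct subfolders (assumed layout).
    find "$root" -maxdepth 2 -type f -name '*.jar' | sort
}
```

Running `list_systemds_jars /opt/systemds-3.3.0-bin` in the container above would show whether any jar is present where the launcher might look; an empty result would be consistent with the empty `SYSTEMDS_JAR_FILE`.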

Comment on lines +48 to +55
It can be beneficial to enter these into your `~/.profile` or `~/.bashrc` on Linux
(but remember to change `$(pwd)` to the full folder path),
or into your environment variables on Windows, to enable reuse between terminals and restarts.

```bash
echo 'export SYSTEMDS_ROOT='$(pwd) >> ~/.bashrc
echo 'export PATH=$SYSTEMDS_ROOT/bin:$PATH' >> ~/.bashrc
```
Contributor

@janniklinde commented Dec 18, 2025

I would mention that in release_install as well. Otherwise, people who only follow the quickstart might get confused after restarting the terminal. Also, for prerequisites that are already covered in the install guides, reference them rather than repeating the same thing.
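
One way the persistent setup could be shown in release_install is sketched below. The helper name `persist_systemds_env` and its arguments are hypothetical; the idea is to write the absolute path once and avoid appending duplicate lines when the step is repeated.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append SystemDS environment settings to a shell
# startup file, skipping the write if the file already sets SYSTEMDS_ROOT.
persist_systemds_env() {
    local root="$1"                      # absolute path to the extracted release
    local rcfile="${2:-$HOME/.bashrc}"   # startup file to modify
    if grep -q '^export SYSTEMDS_ROOT=' "$rcfile" 2>/dev/null; then
        return 0                         # already configured; leave the file alone
    fi
    {
        printf 'export SYSTEMDS_ROOT="%s"\n' "$root"
        # Single quotes keep $SYSTEMDS_ROOT and $PATH unexpanded until the file is sourced.
        printf '%s\n' 'export PATH="$SYSTEMDS_ROOT/bin:$PATH"'
    } >> "$rcfile"
}
```

From inside the extracted release folder this would be called as `persist_systemds_env "$(pwd)"`, followed by opening a new terminal or sourcing the startup file.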

Contributor

It would be nice to mention that you can also add the bin folder to PATH. Then you can directly access your latest local build through the CLI.
