@TerrenceZhangX
Contributor

This pull request rebrands the project from "WorkloadSim" to "FlowSim" and introduces several improvements to the developer and user experience. The most significant changes include renaming project references, updating Docker and build scripts, modernizing and expanding the documentation, and adding a new profiling script for simplified workflows.

Project rebranding and documentation overhaul:

  • Updated all references from "WorkloadSim" to "FlowSim" across the codebase, including Docker image/container names, directory names, and documentation. [1] [2] [3]
  • Major rewrite and expansion of the README.md to provide clearer, step-by-step instructions for profiling, parsing, and simulation workflows, as well as improved developer guidance and troubleshooting tips. [1] [2]

Docker and build system improvements:

  • Updated the Docker build context and paths to use /flowsim instead of /workloadsim, including all code copies, working directories, and build instructions. Also updated the maintainer label and improved comments for clarity. [1] [2]
  • Removed deprecated or unnecessary Docker build steps (e.g., commented-out platform-specific installs and volume mount options), and improved the build and run targets in the Makefile to match the new naming and workflow. [1] [2]

Developer workflow enhancements:

  • Added a new script, scripts/run_profile.py, which automates launching an sglang server and running profiling workloads with customizable options, making it easier to generate traces in Docker or Kubernetes environments.
  • Added a new script, scripts/run_simulate.py, which simulates the kernels from the profiled workloads on the LLMCompass backend. It returns a summary CSV/JSON file that reports the simulation status of every kernel.
  • Updated the LLMCompass backend submodule to the latest version for improved simulator integration.
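
As a rough illustration of how scripts/run_profile.py might be invoked, the sketch below assembles a command line from the flags the script defines in this PR (--profile-dir, --log-dir, --server-opts, --bench-opts, --bench-timeout). The helper name `build_profile_command` is hypothetical, the default directories assume the /flowsim prefix this PR migrates to, and the option strings are copied from the script's own help text. It only prints the command; it does not launch anything.

```python
import shlex
import sys


def build_profile_command(
    server_opts: str,
    bench_opts: str,
    profile_dir: str = "/flowsim/server_profile",  # assumed post-rename default
    log_dir: str = "/flowsim/tests/test-artifacts",  # assumed post-rename default
    bench_timeout: int = 1200,
) -> list[str]:
    """Assemble an argv list for scripts/run_profile.py (hypothetical helper)."""
    return [
        sys.executable,
        "scripts/run_profile.py",
        f"--profile-dir={profile_dir}",
        f"--log-dir={log_dir}",
        # The "=" form binds the value to the flag even though the value
        # itself starts with "--".
        f"--server-opts={server_opts}",
        f"--bench-opts={bench_opts}",
        f"--bench-timeout={bench_timeout}",
    ]


cmd = build_profile_command(
    server_opts="--model-path /path --tp 1 --host 0.0.0.0 --port 30001 --disable-cuda-graph",
    bench_opts=(
        "--backend sglang --host 0.0.0.0 --port 30001 "
        "--dataset-name defined-len --num-prompts 16 --profile"
    ),
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The printed line can be run inside the container, or the list can be handed to `subprocess.run(cmd)` directly.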

(References: [1] [2] [3] [4] [5] [6] [7] [8] [9])

@TerrenceZhangX TerrenceZhangX self-assigned this Dec 25, 2025

Copilot AI left a comment

Pull request overview

This pull request rebrands the project from "WorkloadSim" to "FlowSim" and introduces developer workflow improvements. The changes include comprehensive path updates across the codebase, documentation enhancements with step-by-step guides, and new automation scripts for profiling and simulation workflows.

Key changes:

  • Renamed all references from /workloadsim to /flowsim across tests, Docker files, and core modules
  • Added scripts/run_profile.py and scripts/run_simulate.py to automate profiling and simulation tasks
  • Expanded README.md with detailed Getting Started and Developer sections including concrete usage examples

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/utils.py | Updated default artifact and sglang directory paths to use /flowsim |
| tests/unit/*.py | Updated trace file paths in test fixtures and docstrings to use /flowsim |
| tests/integration/*.py | Updated model config paths, working directories, and artifact paths to use /flowsim |
| simulator/base_parser.py | Updated kernel database, NCCL test binary, and unknown kernels paths to use /flowsim |
| scripts/run_simulate.py | New script automating kernel submission to LLMCompass backend with result tracking |
| scripts/run_profile.py | New script automating sglang server launch and profiling workload execution |
| dockerfiles/cuda12.6.dockerfile | Updated working directories, copy paths, maintainer label, and build commands to use /flowsim; added patch application steps |
| backend/LLMCompass | Updated submodule commit to latest version |
| README.md | Complete rewrite with structured Getting Started guide, detailed workflow steps, and developer documentation |
| Makefile | Updated image/container names to flowsim-*; removed MOUNT_VOLUME parameter |


Comment on lines 56 to 61
default="/workloadsim/server_profile",
help="Directory where profiler traces (.trace.json.gz) will be written",
)
p.add_argument(
"--log-dir",
default="/workloadsim/tests/test-artifacts",

Copilot AI Dec 25, 2025

Default path still uses old /workloadsim prefix instead of /flowsim. This inconsistency will cause the script to fail when users rely on the default value.

Suggested change
-     default="/workloadsim/server_profile",
-     help="Directory where profiler traces (.trace.json.gz) will be written",
- )
- p.add_argument(
-     "--log-dir",
-     default="/workloadsim/tests/test-artifacts",
+     default="/flowsim/server_profile",
+     help="Directory where profiler traces (.trace.json.gz) will be written",
+ )
+ p.add_argument(
+     "--log-dir",
+     default="/flowsim/tests/test-artifacts",

Copilot uses AI. Check for mistakes.
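
One way to keep this kind of prefix drift from recurring is to derive every default from a single root. The sketch below is only an illustration of that idea, not code from this PR: the `FLOWSIM_ROOT` environment variable and the `flowsim_root` and `default_dirs` helpers are hypothetical names, and `/flowsim` is assumed as the fallback because it is the prefix this PR migrates to.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Optional


def flowsim_root(env: Optional[Mapping[str, str]] = None) -> PurePosixPath:
    """Return the project root, overridable via the assumed FLOWSIM_ROOT variable."""
    env = os.environ if env is None else env
    return PurePosixPath(env.get("FLOWSIM_ROOT", "/flowsim"))


def default_dirs(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build the --profile-dir and --log-dir defaults from the single root."""
    root = flowsim_root(env)
    return {
        "profile_dir": str(root / "server_profile"),
        "log_dir": str(root / "tests" / "test-artifacts"),
    }


# With no override, both defaults share the /flowsim prefix.
print(default_dirs({}))
# An override moves every default at once.
print(default_dirs({"FLOWSIM_ROOT": "/opt/flowsim"}))
```

With this shape, a future rename would touch one string instead of every `add_argument` call.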


def wait_for_port(host: str, port: int, timeout: int = 600) -> bool:
"""Wait until a TCP port becomes reachable."""

Copilot AI Dec 25, 2025

Function uses tab characters for indentation instead of spaces. Python PEP 8 recommends using 4 spaces per indentation level. This is inconsistent with the rest of the codebase which uses spaces.
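
A check like this can be automated. The following is a minimal sketch, assuming a hypothetical helper named `lines_with_tab_indent`, that reports which lines of a source string contain a tab in their leading whitespace. It is not part of this PR and does not try to distinguish code from multi-line string contents.

```python
def lines_with_tab_indent(source: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(lineno)
    return flagged


sample = "def f():\n\treturn 1\n\ndef g():\n    return 2\n"
print(lines_with_tab_indent(sample))  # [2]
```

Running it over each changed file's text before review would surface the tab-indented functions in one pass.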

Comment on lines 38 to 39
def clean_dir(path: str) -> None:
"""Clean or create a directory."""

Copilot AI Dec 25, 2025

Function uses tab characters for indentation instead of spaces. Python PEP 8 recommends using 4 spaces per indentation level. This is inconsistent with the rest of the codebase which uses spaces.

Comment on lines 27 to 87
"""Wait until a TCP port becomes reachable."""
deadline = time.time() + timeout
while time.time() < deadline:
try:
with socket.create_connection((host, port), timeout=2):
return True
except Exception:
time.sleep(1)
return False


def clean_dir(path: str) -> None:
"""Clean or create a directory."""
if os.path.exists(path):
for name in os.listdir(path):
fp = os.path.join(path, name)
if os.path.isfile(fp) or os.path.islink(fp):
os.unlink(fp)
elif os.path.isdir(fp):
shutil.rmtree(fp)
else:
os.makedirs(path, exist_ok=True)


def parse_args(argv: Optional[list] = None) -> argparse.Namespace:
p = argparse.ArgumentParser(description="Run sglang profiling workload")

p.add_argument(
"--profile-dir",
default="/workloadsim/server_profile",
help="Directory where profiler traces (.trace.json.gz) will be written",
)
p.add_argument(
"--log-dir",
default="/workloadsim/tests/test-artifacts",
help="Directory to write server/client logs",
)
p.add_argument(
"--server-opts",
required=True,
help=(
"All options for sglang.launch_server (include --host, --port, --model-path, --tp, etc). "
"Example: '--model-path /path --tp 1 --host 0.0.0.0 --port 30001 --disable-cuda-graph'"
),
)
p.add_argument(
"--bench-opts",
required=True,
help=(
"All options for bench_serving.py (include --backend, --host, --port, --dataset-name, --profile, etc). "
"Example: '--backend sglang --host 0.0.0.0 --port 30001 --dataset-name defined-len --num-prompts 16 --profile'"
),
)
p.add_argument(
"--bench-timeout",
type=int,
default=1200,
help="Timeout in seconds for bench_serving.py",
)

return p.parse_args(argv)

Copilot AI Dec 25, 2025

Function uses tab characters for indentation instead of spaces. Python PEP 8 recommends using 4 spaces per indentation level. This is inconsistent with the rest of the codebase which uses spaces.

Suggested change
    """Wait until a TCP port becomes reachable."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except Exception:
            time.sleep(1)
    return False


def clean_dir(path: str) -> None:
    """Clean or create a directory."""
    if os.path.exists(path):
        for name in os.listdir(path):
            fp = os.path.join(path, name)
            if os.path.isfile(fp) or os.path.islink(fp):
                os.unlink(fp)
            elif os.path.isdir(fp):
                shutil.rmtree(fp)
    else:
        os.makedirs(path, exist_ok=True)


def parse_args(argv: Optional[list] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Run sglang profiling workload")

    p.add_argument(
        "--profile-dir",
        default="/workloadsim/server_profile",
        help="Directory where profiler traces (.trace.json.gz) will be written",
    )
    p.add_argument(
        "--log-dir",
        default="/workloadsim/tests/test-artifacts",
        help="Directory to write server/client logs",
    )
    p.add_argument(
        "--server-opts",
        required=True,
        help=(
            "All options for sglang.launch_server (include --host, --port, --model-path, --tp, etc). "
            "Example: '--model-path /path --tp 1 --host 0.0.0.0 --port 30001 --disable-cuda-graph'"
        ),
    )
    p.add_argument(
        "--bench-opts",
        required=True,
        help=(
            "All options for bench_serving.py (include --backend, --host, --port, --dataset-name, --profile, etc). "
            "Example: '--backend sglang --host 0.0.0.0 --port 30001 --dataset-name defined-len --num-prompts 16 --profile'"
        ),
    )
    p.add_argument(
        "--bench-timeout",
        type=int,
        default=1200,
        help="Timeout in seconds for bench_serving.py",
    )

    return p.parse_args(argv)

Copilot uses AI. Check for mistakes.
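
To make the behavior of `wait_for_port` concrete, here is a small sketch that exercises it against a throwaway local listener. The function body is copied from the snippet quoted in this thread with space indentation; the `demo` function, the use of 127.0.0.1, and the short timeouts are assumptions made only for this illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: int = 600) -> bool:
    """Wait until a TCP port becomes reachable."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except Exception:
            time.sleep(1)
    return False


def demo() -> tuple[bool, bool]:
    """Probe a port while a listener is open, then again after it closes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the OS for any free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        reachable = wait_for_port("127.0.0.1", port, timeout=5)
    # The listener is closed here, so connections should be refused
    # until the short timeout expires.
    unreachable = wait_for_port("127.0.0.1", port, timeout=1)
    return reachable, unreachable


if __name__ == "__main__":
    print(demo())
```

Since the connection errors raised by `socket.create_connection` are `OSError` subclasses, narrowing `except Exception` to `except OSError` may be worth considering, though that is a suggestion rather than something this PR changes.
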
Comment on lines 90 to 91
def main(argv: Optional[list] = None) -> int:
args = parse_args(argv)

Copilot AI Dec 25, 2025

Function uses tab characters for indentation instead of spaces. Python PEP 8 recommends using 4 spaces per indentation level. This is inconsistent with the rest of the codebase which uses spaces.

  else:
      profiled_duration = nb.run_nccl_all_gather_perf(
-         cmd_path="/workloadsim/third_party/nccl-tests/build/all_gather_perf",
+         cmd_path="/flowsim/third_party/nccl-tests/build/all_gather_perf",

Copilot AI Dec 25, 2025

The path to NCCL all_gather_perf binary was updated to /flowsim, but the corresponding all_reduce_perf path at line 558 still uses /workloadsim. This inconsistency will cause failures when calibrating all_reduce operations.
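
Leftover references like this one are easy to miss by eye. The sketch below shows one possible way to scan a source tree for remaining `/workloadsim` strings; `find_stale_refs` is a hypothetical helper, and the temporary directory stands in for a real checkout so the example is self-contained.

```python
import tempfile
from pathlib import Path


def find_stale_refs(
    root: Path,
    needle: str = "/workloadsim",
    suffixes: tuple[str, ...] = (".py",),
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each match."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits


# Demo on a throwaway tree: one updated path and one stale path.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "parser.py").write_text(
        'a = "/flowsim/third_party/nccl-tests/build/all_gather_perf"\n'
        'b = "/workloadsim/third_party/nccl-tests/build/all_reduce_perf"\n',
        encoding="utf-8",
    )
    hits = find_stale_refs(demo_root)

for rel_path, lineno, line in hits:
    print(f"{rel_path}:{lineno}: {line}")
```

Pointing `find_stale_refs` at the repository root, with `suffixes` widened to cover the Dockerfile and Makefile as needed, would list every place the old prefix survives.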

TerrenceZhangX and others added 2 commits December 24, 2025 16:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@TerrenceZhangX
Contributor Author

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI commented Dec 25, 2025

@TerrenceZhangX I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan

* Fix indentation and path inconsistencies per review comments

Co-authored-by: TerrenceZhangX <39916879+TerrenceZhangX@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: TerrenceZhangX <39916879+TerrenceZhangX@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.


TerrenceZhangX and others added 3 commits December 24, 2025 16:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@TerrenceZhangX TerrenceZhangX merged commit 443db6b into main Dec 25, 2025
3 checks passed
@TerrenceZhangX TerrenceZhangX deleted the zhangt/unified_naming branch January 13, 2026 19:41
@TerrenceZhangX TerrenceZhangX restored the zhangt/unified_naming branch January 13, 2026 19:41
2 participants