remove environment-specific manifests from public branch; #87

Open
mkuznet1 wants to merge 2 commits into ROCm:aicomnet_dev from mkuznet1:aicomnet_dev
Conversation

@mkuznet1

The .gitignore file has been restored, and the manifest files have been deleted.

Copilot AI left a comment

Pull request overview

This PR appears to clean up local/runtime artifacts by removing previously committed run-manifest JSON files and an environment file, and updating .gitignore to ignore additional generated content.

Changes:

  • Removed two committed run_manifest_*.json files under manifests/.
  • Removed manifests/mad.env (shell environment exports).
  • Updated .gitignore (adds *.json, and fixes formatting for .madengine_session_start).

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| manifests/run_manifest_pyt_vllm_dissag_llama-3.1-8b_3node_rdma_localimage.json | Removed a committed run-manifest JSON (likely local/generated). |
| manifests/run_manifest_primus_2node_qwen_localimage.json | Removed a committed run-manifest JSON (likely local/generated). |
| manifests/mad.env | Removed a committed environment export file (likely local setup). |
| .gitignore | Ignores additional files (notably *.json) and normalizes an entry’s formatting. |

- Introduced per-node artifact staging to a dedicated results directory.
- Implemented a mechanism to wait for all nodes to complete staging before merging results.
- Added logic to merge performance CSV files from multiple nodes, selecting the best file based on content.
- Updated the master node's result collection process to reflect these changes, ensuring comprehensive data aggregation.

This update aims to improve the reliability and accuracy of performance reporting in distributed SLURM runs.
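
The staging, wait, and merge steps described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: the directory layout (`<results_dir>/<node>/perf.csv`), the `<node>.done` marker files, the function names, and the "most non-empty data rows wins" rule for picking the best CSV are all hypothetical.

```python
import csv
import time
from pathlib import Path


def wait_for_staging(results_dir, expected_nodes, timeout_s=60.0, poll_s=1.0):
    """Poll until every expected node has left a '<node>.done' marker.

    Returns True when all markers exist, False if the timeout expires first.
    The marker-file convention is an assumption made for this sketch.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        done = {p.stem for p in Path(results_dir).glob("*.done")}
        missing = set(expected_nodes) - done
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def count_data_rows(csv_path):
    """Count non-empty rows after the header line."""
    with open(csv_path, newline="") as fh:
        rows = list(csv.reader(fh))
    return sum(1 for row in rows[1:] if any(cell.strip() for cell in row))


def pick_best_perf_csv(results_dir, filename="perf.csv"):
    """Return the per-node CSV with the most data rows, or None if none exist.

    "Best" is a hypothetical heuristic here; ties go to the first path in
    sorted order so the result is deterministic.
    """
    candidates = sorted(Path(results_dir).glob(f"*/{filename}"))
    if not candidates:
        return None
    return max(candidates, key=count_data_rows)
```

On the master node, a caller would first check `wait_for_staging(results_dir, node_names)` and only then call `pick_best_perf_csv(results_dir)`, so that a slow node's partial output is not mistaken for its final result.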
2 participants