AgentSlurm is an intelligent analyzer for SLURM job scripts that helps users optimize their HPC jobs, particularly focusing on Lustre filesystem performance. This tool automatically reviews your SLURM scripts and provides actionable feedback to improve performance and efficiency.
- Clone the repository:
  git clone https://github.com/basillicus/agentSlurm.git
  cd agentSlurm
- Create and activate a conda environment (recommended):
  conda create -n agentslurm-env python=3.9  # Or your preferred Python version
  conda activate agentslurm-env
- Install dependencies:
  pip install -e .
- Verify installation:
  agentslurm --help
Basic usage:
agentslurm /path/to/your/slurm_script.slurm
With user profile specification:
agentslurm /path/to/your/slurm_script.slurm --profile Medium
Available user profiles:
- Basic: Simple explanations for newcomers to HPC
- Medium: Balanced explanations for regular users (default)
- Advanced: Technical details for experienced HPC users
- script_path: Path to the SLURM script to analyze (required)
- --profile: User experience level (Basic, Medium, Advanced) [default: Medium]
- --output-file: Path to save the analysis report in Markdown format
- --focus-on: Comma-separated list of categories to focus on (e.g., LUSTRE,PERFORMANCE)
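The options above map naturally onto a standard argument parser. The following is a minimal sketch, not AgentSlurm's actual source: `build_parser` and `parse_focus` are hypothetical names, and only the flag names, choices, and default come from the list above.

```python
import argparse


def parse_focus(value: str) -> list[str]:
    """Split a comma-separated category list such as 'LUSTRE,PERFORMANCE'."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: it mirrors the documented flags only.
    parser = argparse.ArgumentParser(prog="agentslurm")
    parser.add_argument("script_path", help="Path to the SLURM script to analyze")
    parser.add_argument("--profile", choices=["Basic", "Medium", "Advanced"],
                        default="Medium", help="User experience level")
    parser.add_argument("--output-file", help="Save the report in Markdown format")
    parser.add_argument("--focus-on", type=parse_focus, default=[],
                        help="Comma-separated categories to focus on")
    return parser


args = build_parser().parse_args(["job.slurm", "--focus-on", "LUSTRE,PERFORMANCE"])
print(args.profile, args.focus_on)
```

Because `--profile` is declared with `choices`, a value outside Basic, Medium, or Advanced would be rejected by the parser before any analysis runs.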
- --use-llm: Enable LLM for deeper analysis
- --llm-provider: LLM provider to use (openai, anthropic, ollama, huggingface) [default: openai]
- --llm-model: Model to use [default: gpt-3.5-turbo]
- --api-key: API key for LLM provider (not needed for Ollama); if not provided, the system will try environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN respectively)
- --base-url: Base URL for LLM provider (for Ollama, Hugging Face custom endpoints)
- --export-rules: Export learned rules to this file (only with LLM analysis)
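The API key fallback described for `--api-key` can be illustrated with a short sketch. The function name `resolve_api_key` and the error raised when no key is found are assumptions made for illustration; the provider names and environment variable names are taken from the option descriptions above.

```python
import os
from typing import Mapping, Optional

# Environment variables named in the --api-key description.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "huggingface": "HF_TOKEN",
}


def resolve_api_key(provider: str, cli_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key to use, or None for providers that need none (Ollama)."""
    if provider == "ollama":
        return None  # local models need no API key
    if cli_key:
        return cli_key  # an explicit --api-key is used first
    key = env.get(ENV_VARS[provider])
    if not key:
        # Assumption: how the real tool reports a missing key is not documented.
        raise ValueError(
            f"No API key for {provider}: pass --api-key or set {ENV_VARS[provider]}"
        )
    return key
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.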
# Basic OpenAI analysis with API key
agentslurm /path/to/your/script.slurm --use-llm --llm-provider openai --llm-model gpt-3.5-turbo --api-key YOUR_API_KEY
# Or using environment variable (set OPENAI_API_KEY=your_key in your environment)
agentslurm /path/to/your/script.slurm --use-llm --llm-provider openai --llm-model gpt-3.5-turbo
# Using Claude models with API key
agentslurm /path/to/your/script.slurm --use-llm --llm-provider anthropic --llm-model claude-3-sonnet --api-key YOUR_API_KEY
# Or using environment variable (set ANTHROPIC_API_KEY=your_key in your environment)
agentslurm /path/to/your/script.slurm --use-llm --llm-provider anthropic --llm-model claude-3-sonnet
# Using locally running models (make sure Ollama is installed and running: `ollama serve`)
# No API key needed for local models
agentslurm /path/to/your/script.slurm --use-llm --llm-provider ollama --llm-model llama2
# With custom Ollama server URL
agentslurm /path/to/your/script.slurm --use-llm --llm-provider ollama --llm-model mistral --base-url http://localhost:11434/v1
# Using Hugging Face models with API key
agentslurm /path/to/your/script.slurm --use-llm --llm-provider huggingface --llm-model microsoft/DialoGPT-medium --api-key YOUR_HF_API_KEY
# With custom Hugging Face endpoint
agentslurm /path/to/your/script.slurm --use-llm --llm-provider huggingface --base-url https://your-endpoint.hf.space --api-key YOUR_HF_API_KEY
After running LLM analysis, you can export newly learned rules:
agentslurm /path/to/your/script.slurm --use-llm --llm-provider openai --api-key YOUR_API_KEY --export-rules learned_rules.yaml
The analyzer produces a structured report with:
- Issues Found: Problems detected in your script with severity indicators:
  - ⚠️ Warning: Something that could cause performance issues
  - ❌ Error: Something that could cause job failure
  - ℹ️ Info: Helpful information about your script
- Analysis Summary: Key information about what was detected in your script
Agentic Slurm Analyzer - Analysis Report
=======================================
Issues Found:
-------------
1. ⚠️ Missing Lustre Striping Configuration
This workflow appears to process large files (detected tools like bwa, gatk, etc.) without explicit Lustre striping configuration. For large-file I/O patterns, setting an appropriate stripe count and size using 'lfs setstripe' can significantly improve performance. Consider adding 'lfs setstripe -c [n] -s [size] [directory]' where appropriate.
Analysis Summary:
-----------------
• Total findings: 1
• User profile: Medium
• Tools detected: bwa, samtools
Detection: When the script includes tools commonly used for processing large files (bwa, gatk, samtools, vasp, star, hisat2, bowtie2) but no lfs setstripe command is present.
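The detection condition above can be sketched in a few lines. This is a deliberately naive illustration, not the analyzer's real rule engine: `strip_comments` and `missing_striping` are hypothetical helpers, and matching bare words will produce false positives (for example, a directory that happens to be named `star`).

```python
import re

# Tool names taken from the rule description above.
LARGE_FILE_TOOLS = {"bwa", "gatk", "samtools", "vasp", "star", "hisat2", "bowtie2"}


def strip_comments(script: str) -> str:
    """Drop everything after '#' on each line (comments and #SBATCH lines)."""
    return "\n".join(line.split("#", 1)[0] for line in script.splitlines())


def missing_striping(script: str) -> set:
    """Return the large-file tools found when no 'lfs setstripe' is present."""
    code = strip_comments(script)
    if re.search(r"\blfs\s+setstripe\b", code):
        return set()  # striping is configured, so the rule does not fire
    words = {word.lower() for word in re.findall(r"[A-Za-z0-9_]+", code)}
    return words & LARGE_FILE_TOOLS
```

A non-empty result corresponds to the "Missing Lustre Striping Configuration" warning shown in the sample report.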
Recommendation: Add an appropriate lfs setstripe command:
# For large files, spread across multiple OSTs
# (-S sets the stripe size; older Lustre releases spelled it -s)
lfs setstripe -c 4 -S 64M $OUTPUT_DIR
Detection: When the script includes tools commonly used for processing many small files (fastqc, multiqc, blastn, blastp, diamond) and an lfs setstripe command with stripe count > 1 is present.
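The small-file condition additionally requires reading the stripe count out of the `lfs setstripe` command. The sketch below assumes the count is given as `-c N` or `--stripe-count N` on the same line; `overstriped_small_files` is a hypothetical name, and the real analyzer may parse the command differently.

```python
import re

# Tool names taken from the rule description above.
SMALL_FILE_TOOLS = {"fastqc", "multiqc", "blastn", "blastp", "diamond"}

# Captures N from '-c N', '--stripe-count N', or '--stripe-count=N'
# appearing after 'lfs setstripe' on the same line.
STRIPE_COUNT = re.compile(
    r"\blfs\s+setstripe\b[^\n]*?(?:-c|--stripe-count)[\s=]+(\d+)"
)


def overstriped_small_files(script: str) -> bool:
    """True when a small-file tool is combined with a stripe count above 1."""
    # Ignore comments so a commented-out command does not trigger the rule.
    code = "\n".join(line.split("#", 1)[0] for line in script.splitlines())
    uses_small_file_tool = any(
        re.search(rf"\b{tool}\b", code, re.IGNORECASE) for tool in SMALL_FILE_TOOLS
    )
    counts = [int(value) for value in STRIPE_COUNT.findall(code)]
    return uses_small_file_tool and any(count > 1 for count in counts)
```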
Recommendation: Use single stripe for small files:
# For many small files, use single stripe
lfs setstripe -c 1 $OUTPUT_DIR
- Create an account at OpenAI Platform
- Generate an API key in the dashboard
- Use the API key with the --api-key option or set the OPENAI_API_KEY environment variable
- Create an account at Anthropic
- Generate an API key
- Use the API key with the --api-key option or set the ANTHROPIC_API_KEY environment variable
- Install Ollama from ollama.ai
- Pull a model:
  ollama pull llama2
- Start the Ollama server:
  ollama serve
- Run AgentSlurm without an API key
- Create an account at Hugging Face and get an API key
- Use the API key with the --api-key option or set the HF_TOKEN environment variable
- API keys should be treated as sensitive information
- Don't hardcode API keys in script files
- Consider using environment variables or secure credential management in production environments
- The system only processes the SLURM script content and doesn't store your API keys
- API keys can be provided via command line or environment variables
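One way to follow the points above when driving the analyzer from another program is to leave the key in the environment and never place it on the command line, where it can be visible in process listings on shared systems. The wrapper below is a sketch under that assumption: `build_command` and `run_analysis` are hypothetical helpers, and only the `agentslurm` flags documented earlier are used.

```python
import os
import subprocess


def build_command(script_path: str, provider: str, model: str) -> list:
    # Only documented flags are used; --api-key is deliberately absent.
    return ["agentslurm", script_path, "--use-llm",
            "--llm-provider", provider, "--llm-model", model]


def run_analysis(script_path: str) -> int:
    """Run an OpenAI-backed analysis using the key from the environment."""
    if "OPENAI_API_KEY" not in os.environ:
        raise RuntimeError("Set OPENAI_API_KEY in your environment first")
    # The child process inherits the environment, so the key never
    # appears in the argument list.
    completed = subprocess.run(
        build_command(script_path, "openai", "gpt-3.5-turbo")
    )
    return completed.returncode
```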
- Analyze Your Scripts: Run AgentSlurm before submitting large or important jobs
- Check Lustre I/O: Apply Lustre striping recommendations for better performance
- Use Appropriate User Profiles: Select the profile level that matches your HPC expertise
- Enable LLM Integration: For complex scripts, use LLM analysis to get deeper insights
If you encounter issues or have questions:
- Run your script through AgentSlurm with different user profiles
- Check your HPC center's documentation for Lustre guidelines
- Consult with system administrators for site-specific recommendations