BenjaminIsaac0111 · Feb 25, 2026 · Feb 24, 2026
@@ -0,0 +1,17 @@
+# Global owners (catch-all for all files)
+*       @BenjaminIsaac0111
+
+# Core Model and Training Logic
+src/spatial_transcript_former/models/    @BenjaminIsaac0111
+src/spatial_transcript_former/training/  @BenjaminIsaac0111
+
+# Data Management and Scripts
+src/spatial_transcript_former/data/      @BenjaminIsaac0111
+scripts/                                 @BenjaminIsaac0111
+
+# Documentation
+docs/                                    @BenjaminIsaac0111
+*.md                                     @BenjaminIsaac0111
+
+# GitHub Actions and Infrastructure
+.github/                                 @BenjaminIsaac0111
@@ -0,0 +1,69 @@
+# Contributing to SpatialTranscriptFormer
+
+Thank you for your interest in contributing! Because this project sits at the intersection of deep learning and pathology, we value rigorous, well-tested contributions.
+
+## Project Status
+
+> [!IMPORTANT]
+> This project is a **Work in Progress**. We are actively refining the core interaction logic and scaling behaviors. Expect breaking changes in the CLI and data schemas.
+
+## Intellectual Property & Licensing
+
+SpatialTranscriptFormer is protected under a **Proprietary Source Code License**.
+
+- **Academic/Non-Profit**: We encourage contributions from the research community. Contributions made under an academic affiliation are generally welcome.
+- **Commercial/For-Profit**: Contributions from commercial entities or individuals intended for profit-seeking use require a separate agreement.
+- **Assignment**: By submitting a Pull Request, you agree that your contributions will be licensed under the project's existing license, granting the author the right to include them in both the open-access and proprietary versions of the software.
+
+## Development Workflow
+
+### 1. Environment Setup
+
+Use the provided setup scripts to ensure a consistent development environment:
+
+```bash
+# Windows
+.\setup.ps1
+
+# Linux/HPC
+bash setup.sh
+```
+
+### 2. Coding Standards
+
+We use `black` for formatting and `flake8` for linting. Please ensure your code passes these checks before submitting.
+
+```bash
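+black --check .   # report files black would reformat, without changing them
+black .           # reformat files in place
+flake8 src/       # lint the package sources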
+```
+
+### 3. Testing
+
+All new features must include unit tests in the `tests/` directory. We use `pytest` for our test suite.
+
+```bash
+# Run all tests
+.\test.ps1     # Windows
+bash test.sh   # Linux
+```
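A minimal sketch of what such a unit test might look like. The file name `tests/test_pearson.py` and the helper `pearson_r` are hypothetical and are defined inline so the example runs on its own; a real test would import the code under test from the `spatial_transcript_former` package instead. By default, pytest collects `test_*` functions from `test_*.py` files and reports any failing plain `assert` as a test failure.

```python
# tests/test_pearson.py  (hypothetical file name, for illustration only)
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Centered L2 norms; the 1/n factors cancel in the ratio below.
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def test_pearson_r_detects_positive_correlation():
    assert math.isclose(pearson_r([1, 2, 3], [2, 4, 6]), 1.0)


def test_pearson_r_detects_negative_correlation():
    assert math.isclose(pearson_r([1, 2, 3], [3, 2, 1]), -1.0)
```

Run it directly with `pytest tests/test_pearson.py`, or through the test wrappers above.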
+
+## Pull Request Process
+
+1. **Open an Issue**: For major changes, please open an issue first to discuss the design.
+2. **Branching**: Work on a descriptive feature branch (e.g., `feature/pathway-attention-mask`).
+3. **Documentation**: Update relevant files in `docs/` and the `README.md` if your change affects usage.
+4. **Verification**: Ensure all CI checks (GitHub Actions) pass.
+
+### Branch Protections
+
+To maintain code quality and stability, the following protections are enforced on the `main` branch:
+
+- **Require Pull Request Reviews**: All merges to `main` require at least one approval from a project maintainer.
+- **Required Status Checks**: The `CI` workflow must pass successfully before a PR can be merged. This includes formatting checks (`black`) and the full test suite (`pytest`).
+- **No Direct Pushes**: Pushing directly to `main` is disabled. All changes must go through the Pull Request process.
+- **Linear History**: We prefer **Squash and Merge** to keep the `main` branch history clean and concise.
+
+## Contact
+
+For questions regarding commercial licensing or complex architectural changes, please contact the author directly.
@@ -1,6 +1,16 @@
 # SpatialTranscriptFormer
 
-A transformer-based model for spatial transcriptomics.
+> [!WARNING]
+> **Work in Progress**: This project is under active development. Core architectures, CLI flags, and data formats are subject to major changes.
+
+A transformer-based model for spatial transcriptomics that bridges histology and biological pathways.
+
+## Key Features
+
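+- **Quad-Flow Interaction**: Configurable attention between pathway tokens and histology patch tokens: pathway-to-pathway (`p2p`), pathway-to-histology (`p2h`), histology-to-pathway (`h2p`), and histology-to-histology (`h2h`).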
+- **Pathway Bottleneck**: Interpretable gene expression prediction via 50 MSigDB Hallmark tokens.
+- **Spatial Pattern Coherence**: Trained with a composite **MSE + PCC (Pearson correlation coefficient) loss** to counter spatial collapse and improve the morphology-to-expression mapping.
+- **Biologically Informed Initialization**: Gene reconstruction weights derived from known hallmark memberships.
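One way to picture the four flows is as a boolean attention mask over a sequence laid out as pathway tokens followed by histology patch tokens. The sketch below assumes that a flow `x2y` lets queries of type `x` attend to keys of type `y`, and that `True` means "may attend". The function name `build_quad_flow_mask` and both conventions are assumptions for illustration, not the project's implementation.

```python
def build_quad_flow_mask(num_pathways, num_patches, flows):
    """Boolean attention mask for [pathway tokens..., patch tokens...].

    Assumed (hypothetical) conventions:
      * flow "x2y" lets queries of type x attend to keys of type y
      * mask[q][k] is True when query q may attend to key k
    """
    valid_flows = {"p2p", "p2h", "h2p", "h2h"}
    unknown = set(flows) - valid_flows
    if unknown:
        raise ValueError(f"unknown flows: {sorted(unknown)}")

    # Token type for each position: "p" (pathway) or "h" (histology patch).
    kinds = ["p"] * num_pathways + ["h"] * num_patches
    return [
        [f"{q_kind}2{k_kind}" in flows for k_kind in kinds]
        for q_kind in kinds
    ]


# Pathways and patches only exchange information with each other.
mask = build_quad_flow_mask(2, 3, {"p2h", "h2p"})
for row in mask:
    print("".join("x" if ok else "." for ok in row))
# ..xxx
# ..xxx
# xx...
# xx...
# xx...
```

`torch.nn.MultiheadAttention` uses the opposite convention for boolean masks (`True` marks positions that may *not* be attended), so a mask like this would need to be inverted before being passed as `attn_mask`. A flow set that leaves some query type with no allowed keys (for example only `{"p2h"}`) produces fully masked rows, which typically yield NaN attention weights after the softmax.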
 
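The exact form and weighting of the composite loss are not specified here. One common formulation, sketched under that assumption, adds the mean squared error to `1 - r`, where `r` is the Pearson correlation between predicted and observed expression of a gene across spots, averaged over genes. The function name `composite_mse_pcc_loss`, the `pcc_weight` parameter and the `eps` guard are hypothetical.

```python
import math


def composite_mse_pcc_loss(pred, target, pcc_weight=1.0, eps=1e-8):
    """Return MSE + pcc_weight * (1 - mean per-gene Pearson r).

    pred, target: nested lists shaped [num_spots][num_genes].
    Hypothetical helper for illustration only.
    """
    num_spots, num_genes = len(pred), len(pred[0])

    # Mean squared error over every (spot, gene) entry.
    mse = sum(
        (p - t) ** 2
        for pred_spot, target_spot in zip(pred, target)
        for p, t in zip(pred_spot, target_spot)
    ) / (num_spots * num_genes)

    # Pearson r for each gene, computed across spots (its spatial pattern).
    r_total = 0.0
    for g in range(num_genes):
        x = [spot[g] for spot in pred]
        y = [spot[g] for spot in target]
        mean_x, mean_y = sum(x) / num_spots, sum(y) / num_spots
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
        # eps keeps r finite (r = 0) when a vector is constant across spots.
        r_total += cov / (norm_x * norm_y + eps)

    return mse + pcc_weight * (1.0 - r_total / num_genes)
```

A spatially collapsed prediction (constant across spots) gets `r = 0`, so it pays the full `pcc_weight` penalty even when its MSE is small. This is the intuition behind pairing the two terms.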
 ## License
 
@@ -25,71 +35,66 @@ This project requires [Conda](https://docs.conda.io/en/latest/).
 
 ## Usage
 
-After installation, the following command-line tools are available in your `SpatialTranscriptFormer` environment:
-
 ### Download HEST Data
 
 Download specific subsets using filters or patterns:
 
 ```bash
-# List available organs
-stf-download --list_organs
-
 # Download only the Bowel Cancer subset (including ST data and WSIs)
 stf-download --organ Bowel --disease Cancer --local_dir hest_data
-
-# Download any other organ
-stf-download --organ Kidney
 ```
 
-### Split Dataset
+### Train Models
+
+We provide presets for baseline models and scaled versions of the SpatialTranscriptFormer.
 
-Perform patient-stratified splitting on the metadata:
+```bash
+# Recommended: Run the Interaction model with 4 transformer layers
+python scripts/run_preset.py --preset stf_interaction_l4
 
-```powershell
-stf-split HEST_v1_3_0.csv --val_ratio 0.2
+# Run the lightweight 2-layer version
+python scripts/run_preset.py --preset stf_interaction_l2
+
+# Run baselines
+python scripts/run_preset.py --preset he2rna_baseline
 ```
 
-### Train Models
+For a complete list of configurations, see the [Training Guide](docs/TRAINING_GUIDE.md).
 
-Train baseline models (HE2RNA, ViT) or the proposed interaction model. For a complete list of configurations and examples, see the [Training Guide](docs/TRAINING_GUIDE.md).
+### Real-Time Monitoring
 
-```bash
-# Option 1: Using the standard command
-stf-train --data-dir A:\hest_data --model he2rna --epochs 20
+Use the web dashboard to monitor training progress, loss curves, and **prediction variance** (a collapse detector, since a collapsed model produces near-constant predictions):
 
-# Option 2: Using the preset launcher (recommended for complex models)
-python scripts/run_preset.py --preset stf_interaction --epochs 30
+```bash
+python scripts/monitor.py --run-dir runs/stf_interaction_l4
 ```
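The collapse detector is described only as tracking prediction variance. As a rough illustration of the idea, the sketch computes the variance of each gene's predictions across spots and reports the fraction of genes whose variance falls below a threshold; a value near 1.0 would indicate near-constant predictions. The function name `collapse_fraction` and the default threshold are assumptions and are not taken from `scripts/monitor.py`.

```python
from statistics import pvariance


def collapse_fraction(pred, var_threshold=1e-4):
    """Fraction of genes whose predictions are nearly constant across spots.

    pred: nested list shaped [num_spots][num_genes].
    var_threshold is an illustrative default, not a tuned value.
    """
    num_genes = len(pred[0])
    collapsed = 0
    for g in range(num_genes):
        # Population variance of gene g's predictions over all spots.
        if pvariance([spot[g] for spot in pred]) < var_threshold:
            collapsed += 1
    return collapsed / num_genes


# Gene 0 is constant across spots (collapsed); gene 1 varies.
preds = [[0.5, 1.0], [0.5, 2.0], [0.5, 3.0]]
print(collapse_fraction(preds))  # 0.5
```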
 
 ### Inference & Visualization
 
-Generate spatial maps comparing Ground Truth vs Predictions for specific samples:
+Generate spatial maps comparing Ground Truth vs Predictions:
 
 ```bash
-stf-predict --data-dir A:\hest_data --sample-id MEND29 --model-path checkpoints/best_model_he2rna.pth --model-type he2rna
+stf-predict --data-dir A:\hest_data --sample-id MEND29 --model-path checkpoints/best_model.pth --model-type interaction
 ```
 
 Visualization plots will be saved to the `./results` directory.
 
 ## Documentation
 
-For detailed information on the data and code implementation, see:
-
+- [Models](docs/MODELS.md): Detailed model architectures and scaling parameters.
 - [Data Structure](docs/DATA_STRUCTURE.md): Organization of HEST data on disk.
-- [Dataloader](docs/DATALOADER.md): Technical implementation of the PyTorch dataset and loaders.
-- [Gene Analysis](docs/GENE_ANALYSIS.md): Analysis of available genes and modeling strategies.
-- [Pathway Mapping](docs/PATHWAY_MAPPING.md): Strategies for clinical interpretability and pathway integration.
-- [Latent Discovery](docs/LATENT_DISCOVERY.md): Unsupervised discovery of biological pathways from data.
-- [Models](docs/MODELS.md): Model architectures and literature references.
+- [Pathway Mapping](docs/PATHWAY_MAPPING.md): Clinical interpretability and pathway integration.
+- [Gene Analysis](docs/GENE_ANALYSIS.md): Modeling strategies for high-dimensional gene space.
 
 ## Development
 
 ### Running Tests
 
-Use the included test wrapper:
-
 ```bash
-# Run all tests
+# Run all tests (pytest wrapper)
 .\test.ps1
 ```
+
+## Contributing
+
+We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for our coding standards and the process for submitting pull requests. Note that this project is under a proprietary license: by submitting a pull request, you agree that your contribution is licensed under the project's existing license, and contributions intended for commercial use require a separate agreement.
@@ -4,15 +4,12 @@
 # Data Paths
 # Candidates for the HEST data directory (checked in order)
 data_dirs:
-  - "hest_data"
-  - "../hest_data"
-  - "./data"
   - "A:\\hest_data"
 
 # Training Defaults
 training:
   num_genes: 1000
-  batch_size: 32
+  batch_size: 8
   learning_rate: 0.0001
   output_dir: "./checkpoints"