Commit dd64219

Merge pull request #1 from BenjaminIsaac0111/InteractionModelDebugging
I felt the model was getting a bit complicated, and technical debt was building. Simplified the model to the point where it at least learns something without any extended components beyond the standard MIL-like interaction model using transformer layers. I might revisit some of the initial ideas, but I think this version is a good launch pad for extending them. Improved the tests, but I still need to make them more robust and give better coverage. I will work on this...
2 parents 4a9b086 + 39530e8 commit dd64219

32 files changed

Lines changed: 2356 additions & 1705 deletions

.github/CODEOWNERS

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
# Global owners (catch-all for all files)
2+
* @BenjaminIsaac0111
3+
4+
# Core Model and Training Logic
5+
src/spatial_transcript_former/models/ @BenjaminIsaac0111
6+
src/spatial_transcript_former/training/ @BenjaminIsaac0111
7+
8+
# Data Management and Scripts
9+
src/spatial_transcript_former/data/ @BenjaminIsaac0111
10+
scripts/ @BenjaminIsaac0111
11+
12+
# Documentation
13+
docs/ @BenjaminIsaac0111
14+
*.md @BenjaminIsaac0111
15+
16+
# GitHub Actions and Infrastructure
17+
.github/ @BenjaminIsaac0111

CONTRIBUTING.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
# Contributing to SpatialTranscriptFormer
2+
3+
Thank you for your interest in contributing! As a project at the intersection of deep learning and pathology, we value rigorous, well-tested contributions.
4+
5+
## Project Status
6+
7+
> [!IMPORTANT]
8+
> This project is a **Work in Progress**. We are actively refining the core interaction logic and scaling behaviors. Expect breaking changes in the CLI and data schemas.
9+
10+
## Intellectual Property & Licensing
11+
12+
SpatialTranscriptFormer is protected under a **Proprietary Source Code License**.
13+
14+
- **Academic/Non-Profit**: We encourage contributions from the research community. Contributions made under an academic affiliation are generally welcome.
15+
- **Commercial/For-Profit**: Contributions from commercial entities or individuals intended for profit-seeking use require a separate agreement.
16+
- **Assignment**: By submitting a Pull Request, you agree that your contributions will be licensed under the project's existing license, granting the author the right to include them in both the open-access and proprietary versions of the software.
17+
18+
## Development Workflow
19+
20+
### 1. Environment Setup
21+
22+
Use the provided setup scripts to ensure a consistent development environment:
23+
24+
```bash
25+
# Windows
26+
.\setup.ps1
27+
28+
# Linux/HPC
29+
bash setup.sh
30+
```
31+
32+
### 2. Coding Standards
33+
34+
We use `black` for formatting and `flake8` for linting. Please ensure your code passes these checks before submitting.
35+
36+
```bash
37+
black .
38+
flake8 src/
39+
```
40+
41+
### 3. Testing
42+
43+
All new features must include unit tests in the `tests/` directory. We use `pytest` for our test suite.
44+
45+
```bash
46+
# Run all tests
47+
.\test.ps1 # Windows
48+
bash test.sh # Linux
49+
```
50+
51+
## Pull Request Process
52+
53+
1. **Open an Issue**: For major changes, please open an issue first to discuss the design.
54+
2. **Branching**: Work on a descriptive feature branch (e.g., `feature/pathway-attention-mask`).
55+
3. **Documentation**: Update relevant files in `docs/` and the `README.md` if your change affects usage.
56+
4. **Verification**: Ensure all CI checks (GitHub Actions) pass.
57+
58+
### Branch Protections
59+
60+
To maintain code quality and stability, the following protections are enforced on the `main` branch:
61+
62+
- **Require Pull Request Reviews**: All merges to `main` require at least one approval from a project maintainer.
63+
- **Required Status Checks**: The `CI` workflow must pass successfully before a PR can be merged. This includes formatting checks (`black`) and the full test suite (`pytest`).
64+
- **No Direct Pushes**: Pushing directly to `main` is disabled. All changes must go through the Pull Request process.
65+
- **Linear History**: We prefer **Squash and Merge** to keep the `main` branch history clean and concise.
66+
67+
## Contact
68+
69+
For questions regarding commercial licensing or complex architectural changes, please contact the author directly.

README.md

Lines changed: 37 additions & 32 deletions
@@ -1,6 +1,16 @@
11
# SpatialTranscriptFormer
22

3-
A transformer-based model for spatial transcriptomics.
3+
> [!WARNING]
4+
> **Work in Progress**: This project is under active development. Core architectures, CLI flags, and data formats are subject to major changes.
5+
6+
A transformer-based model for spatial transcriptomics that bridges histology and biological pathways.
7+
8+
## Key Features
9+
10+
- **Quad-Flow Interaction**: Configurable attention between Pathways and Histology patches (`p2p`, `p2h`, `h2p`, `h2h`).
11+
- **Pathway Bottleneck**: Interpretable gene expression prediction via 50 MSigDB Hallmark tokens.
12+
- **Spatial Pattern Coherence**: Optimized using a composite **MSE + PCC (Pearson Correlation) loss** to prevent spatial collapse and ensure accurate morphology-expression mapping.
13+
- **Biologically Informed Initialization**: Gene reconstruction weights derived from known hallmark memberships.
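
The composite **MSE + PCC** objective named in the Key Features list can be illustrated with a small, dependency-free sketch. This is not the repository's implementation: the function names `pearson` and `composite_loss`, the `pcc_weight` parameter, and the choice to express the correlation term as `1 - mean per-gene PCC` are all assumptions made for illustration. The point it shows is why the correlation term discourages collapse: a constant prediction has zero correlation with the target, so it pays the full correlation penalty even if its MSE is modest.

```python
import math


def pearson(x, y, eps=1e-8):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is (numerically) constant, which is
    the case a collapsed model produces.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > eps else 0.0


def composite_loss(pred, target, pcc_weight=1.0):
    """Hypothetical composite loss: MSE + pcc_weight * (1 - mean PCC).

    pred and target are lists of spots, each a list of per-gene values.
    The PCC is computed per gene across spots, then averaged over genes.
    """
    n_spots = len(pred)
    n_genes = len(pred[0])
    mse = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ) / (n_spots * n_genes)
    per_gene_pcc = [
        pearson([row[g] for row in pred], [row[g] for row in target])
        for g in range(n_genes)
    ]
    mean_pcc = sum(per_gene_pcc) / n_genes
    return mse + pcc_weight * (1.0 - mean_pcc)
```

In an actual training loop the same quantities would be computed with tensor operations so that gradients flow through both terms; the pure-Python form above is only meant to make the arithmetic explicit.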
414

515
## License
616

@@ -25,71 +35,66 @@ This project requires [Conda](https://docs.conda.io/en/latest/).
2535

2636
## Usage
2737

28-
After installation, the following command-line tools are available in your `SpatialTranscriptFormer` environment:
29-
3038
### Download HEST Data
3139

3240
Download specific subsets using filters or patterns:
3341

3442
```bash
35-
# List available organs
36-
stf-download --list_organs
37-
3843
# Download only the Bowel Cancer subset (including ST data and WSIs)
3944
stf-download --organ Bowel --disease Cancer --local_dir hest_data
40-
41-
# Download any other organ
42-
stf-download --organ Kidney
4345
```
4446

45-
### Split Dataset
47+
### Train Models
48+
49+
We provide presets for baseline models and scaled versions of the SpatialTranscriptFormer.
4650

47-
Perform patient-stratified splitting on the metadata:
51+
```bash
52+
# Recommended: Run the Interaction model with 4 transformer layers
53+
python scripts/run_preset.py --preset stf_interaction_l4
4854

49-
```powershell
50-
stf-split HEST_v1_3_0.csv --val_ratio 0.2
55+
# Run the lightweight 2-layer version
56+
python scripts/run_preset.py --preset stf_interaction_l2
57+
58+
# Run baselines
59+
python scripts/run_preset.py --preset he2rna_baseline
5160
```
5261

53-
### Train Models
62+
For a complete list of configurations, see the [Training Guide](docs/TRAINING_GUIDE.md).
5463

55-
Train baseline models (HE2RNA, ViT) or the proposed interaction model. For a complete list of configurations and examples, see the [Training Guide](docs/TRAINING_GUIDE.md).
64+
### Real-Time Monitoring
5665

57-
```bash
58-
# Option 1: Using the standard command
59-
stf-train --data-dir A:\hest_data --model he2rna --epochs 20
66+
Monitor training progress, loss curves, and **prediction variance (collapse detector)** via the web dashboard:
6067

61-
# Option 2: Using the preset launcher (recommended for complex models)
62-
python scripts/run_preset.py --preset stf_interaction --epochs 30
68+
```bash
69+
python scripts/monitor.py --run-dir runs/stf_interaction_l4
6370
```
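
The README does not say how the dashboard's collapse detector is computed. One plausible reading, sketched below under that assumption, is to track the variance of predictions across spots and flag a run when it falls near zero. The names `prediction_variance` and `is_collapsed` and the `threshold` value are hypothetical and do not come from `scripts/monitor.py`.

```python
from statistics import pvariance


def prediction_variance(preds):
    """Mean over genes of the population variance across spots.

    preds is a list of spots, each a list of per-gene predicted values.
    """
    n_genes = len(preds[0])
    per_gene = [pvariance([row[g] for row in preds]) for g in range(n_genes)]
    return sum(per_gene) / n_genes


def is_collapsed(preds, threshold=1e-4):
    """Flag a batch whose predictions barely vary from spot to spot.

    The threshold is an illustrative assumption; a sensible value depends
    on how the expression targets are normalized.
    """
    return prediction_variance(preds) < threshold
```

A check like this complements the loss curves: a model that outputs the per-gene mean everywhere can still show a slowly decreasing MSE while its spatial variance sits at zero.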
6471

6572
### Inference & Visualization
6673

67-
Generate spatial maps comparing Ground Truth vs Predictions for specific samples:
74+
Generate spatial maps comparing Ground Truth vs Predictions:
6875

6976
```bash
70-
stf-predict --data-dir A:\hest_data --sample-id MEND29 --model-path checkpoints/best_model_he2rna.pth --model-type he2rna
77+
stf-predict --data-dir A:\hest_data --sample-id MEND29 --model-path checkpoints/best_model.pth --model-type interaction
7178
```
7279

7380
Visualization plots will be saved to the `./results` directory.
7481

7582
## Documentation
7683

77-
For detailed information on the data and code implementation, see:
78-
84+
- [Models](docs/MODELS.md): Detailed model architectures and scaling parameters.
7985
- [Data Structure](docs/DATA_STRUCTURE.md): Organization of HEST data on disk.
80-
- [Dataloader](docs/DATALOADER.md): Technical implementation of the PyTorch dataset and loaders.
81-
- [Gene Analysis](docs/GENE_ANALYSIS.md): Analysis of available genes and modeling strategies.
82-
- [Pathway Mapping](docs/PATHWAY_MAPPING.md): Strategies for clinical interpretability and pathway integration.
83-
- [Latent Discovery](docs/LATENT_DISCOVERY.md): Unsupervised discovery of biological pathways from data.
84-
- [Models](docs/MODELS.md): Model architectures and literature references.
86+
- [Pathway Mapping](docs/PATHWAY_MAPPING.md): Clinical interpretability and pathway integration.
87+
- [Gene Analysis](docs/GENE_ANALYSIS.md): Modeling strategies for high-dimensional gene space.
8588

8689
## Development
8790

8891
### Running Tests
8992

90-
Use the included test wrapper:
91-
9293
```bash
93-
# Run all tests
94+
# Run all tests (Pytest wrapper)
9495
.\test.ps1
9596
```
97+
98+
## Contributing
99+
100+
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details on our coding standards and the process for submitting pull requests. Note that this project is under a proprietary license; contributions involve an assignment of rights for non-academic use.

config.yaml

Lines changed: 1 addition & 4 deletions
@@ -4,15 +4,12 @@
44
# Data Paths
55
# Candidates for the HEST data directory (checked in order)
66
data_dirs:
7-
- "hest_data"
8-
- "../hest_data"
9-
- "./data"
107
- "A:\\hest_data"
118

129
# Training Defaults
1310
training:
1411
num_genes: 1000
15-
batch_size: 32
12+
batch_size: 8
1613
learning_rate: 0.0001
1714
output_dir: "./checkpoints"
1815
