Commit 2db8890

refactor: improve setup UX, HF authentication, and download docs
- Enhanced setup scripts with strict error handling and Conda prerequisite checks.
- Added explicit PyTorch and CUDA installation logic to automated setup.
- Integrated Hugging Face authentication status checks into setup flow.
- Updated README with environment activation and dataset authentication requirements.
- Expanded download documentation with examples for full dataset and filtered subsets.
1 parent a3ffc00 commit 2db8890

3 files changed

Lines changed: 131 additions & 9 deletions

README.md

Lines changed: 32 additions & 5 deletions
@@ -3,13 +3,14 @@
 > [!WARNING]
 > **Work in Progress**: This project is under active development. Core architectures, CLI flags, and data formats are subject to major changes.
 
-A transformer-based model for spatial transcriptomics that bridges histology and biological pathways.
+**SpatialTranscriptFormer** bridges histology and biological pathways through a high-performance transformer architecture. By modeling the dense interplay between morphological features and gene expression signatures, it provides an interpretable and spatially-coherent mapping of the tissue microenvironment.
 
-## Key Features
+## Key Technical Pillars
 
 - **Quad-Flow Interaction**: Configurable attention between Pathways and Histology patches (`p2p`, `p2h`, `h2p`, `h2h`).
 - **Pathway Bottleneck**: Interpretable gene expression prediction via 50 MSigDB Hallmark tokens.
 - **Spatial Pattern Coherence**: Optimized using a composite **MSE + PCC (Pearson Correlation) loss** to prevent spatial collapse and ensure accurate morphology-expression mapping.
+- **Foundation Model Ready**: Native support for **CTransPath**, **Phikon**, **Hibou**, and **GigaPath**.
 - **Biologically Informed Initialization**: Gene reconstruction weights derived from known hallmark memberships.
 
 ## License
@@ -30,18 +31,44 @@ This project requires [Conda](https://docs.conda.io/en/latest/).
 
 1. Clone the repository.
 2. Run the automated setup script:
-- On Windows: `.\setup.ps1`
+3. On Windows: `.\setup.ps1`
 - On Linux/HPC: `bash setup.sh`
 
 ## Usage
 
+**Before running any commands**, you must activate the conda environment:
+
+```bash
+conda activate SpatialTranscriptFormer
+```
+
 ### Download HEST Data
 
-Download specific subsets using filters or patterns:
+> [!CAUTION]
+> **Authentication Required**: The HEST dataset is gated. You must accept the terms of use at [MahmoodLab/hest](https://huggingface.co/datasets/MahmoodLab/hest) and authenticate with your Hugging Face account to download the data.
+
+Please provide your token using ONE of the following methods before running the download tool:
+
+1. **Persistent Login**: Run `huggingface-cli login` and paste your access token when prompted.
+2. **Environment Variable**: Set the `HF_TOKEN` environment variable in your active terminal session.
+
+Once authenticated, download specific subsets using filters or the entire dataset:
 
 ```bash
-# Download only the Bowel Cancer subset (including ST data and WSIs)
+# Option 1: Download the ENTIRE HEST dataset (requires confirmation)
+stf-download --local_dir hest_data
+
+# Option 2: Download a specific subset (e.g., Bowel Cancer)
 stf-download --organ Bowel --disease Cancer --local_dir hest_data
+
+# Option 3: Filter by technology (e.g., Visium)
+stf-download --tech Visium --local_dir hest_data
+```
+
+To see all available organs in the metadata:
+
+```bash
+stf-download --list_organs
 ```
 
 ### Train Models
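
The README change names two ways to supply a token: a persistent `huggingface-cli login` or the `HF_TOKEN` environment variable. A minimal sketch of how a user could check which of the two is in effect before running `stf-download` follows. The function name `hf_token_source` is hypothetical and not part of this repository, and the cache path is an assumption based on the default `huggingface_hub` layout (`$HF_HOME/token`, falling back to `~/.cache/huggingface/token`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report where a Hugging Face
# token would come from, without ever printing the token itself.
hf_token_source() {
    # Assumption: the persistent login caches the token at $HF_HOME/token,
    # which defaults to ~/.cache/huggingface/token.
    local token_file="${HF_HOME:-$HOME/.cache/huggingface}/token"

    if [ -n "${HF_TOKEN:-}" ]; then
        # The environment variable is checked first.
        echo "env"
    elif [ -s "$token_file" ]; then
        # A non-empty cached token file left by a previous login.
        echo "cache"
    else
        echo "none"
    fi
}
```

Calling `hf_token_source` prints `env`, `cache`, or `none`. It only inspects local state, so it cannot tell whether the token is still valid or whether the HEST terms of use have been accepted for that account.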

setup.ps1

Lines changed: 53 additions & 2 deletions
@@ -1,9 +1,20 @@
 # setup.ps1 - Automated environment setup for SpatialTranscriptFormer
 
+$ErrorActionPreference = 'Stop'
+
 Write-Host "--- SpatialTranscriptFormer Setup ---" -ForegroundColor Cyan
 
 $EnvName = "SpatialTranscriptFormer"
 
+# Check if conda exists
+try {
+    conda --version | Out-Null
+}
+catch {
+    Write-Error "Conda was not found. Please ensure Conda is installed and added to your PATH."
+    exit 1
+}
+
 # Check if conda environment exists
 $CondaEnv = conda env list | Select-String $EnvName
 if ($null -eq $CondaEnv) {
@@ -14,12 +25,52 @@ else {
     Write-Host "Conda environment '$EnvName' already exists." -ForegroundColor Green
 }
 
+Write-Host "Installing PyTorch (CUDA 11.8)..." -ForegroundColor Yellow
+conda run -n $EnvName pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
+if ($LASTEXITCODE -ne 0) {
+    Write-Error "Failed to install PyTorch."
+    exit $LASTEXITCODE
+}
+
 Write-Host "Installing/Updating package in editable mode..." -ForegroundColor Yellow
 conda run -n $EnvName pip install -e .[dev]
+if ($LASTEXITCODE -ne 0) {
+    Write-Error "Failed to install SpatialTranscriptFormer."
+    exit $LASTEXITCODE
+}
+
+Write-Host "Checking Hugging Face authentication..." -ForegroundColor Yellow
+$HFLoginStatus = conda run -n $EnvName huggingface-cli whoami 2>&1
+if ($LASTEXITCODE -ne 0 -or $HFLoginStatus -match "Not logged in") {
+    $HFNeedLogin = $true
+}
+else {
+    $HFNeedLogin = $false
+    Write-Host "Hugging Face authentication found: $HFLoginStatus" -ForegroundColor Green
+}
 
 Write-Host ""
-Write-Host "Setup Complete!" -ForegroundColor Green
-Write-Host "You can now use the following commands:"
+Write-Host "=========================================" -ForegroundColor Green
+Write-Host " SETUP COMPLETE! " -ForegroundColor Green
+Write-Host "=========================================" -ForegroundColor Green
+Write-Host ""
+Write-Host "IMPORTANT: You must activate the environment before using the tools:" -ForegroundColor Yellow
+Write-Host " conda activate $EnvName" -ForegroundColor Cyan
+Write-Host ""
+
+if ($HFNeedLogin) {
+    Write-Host "------------------------------------------------------------" -ForegroundColor DarkYellow
+    Write-Host "DATASET ACCESS REQUIRES AUTHENTICATION" -ForegroundColor Red
+    Write-Host "The HEST-1k dataset on Hugging Face is gated. You must provide an access token." -ForegroundColor DarkYellow
+    Write-Host "Please do ONE of the following before downloading data:"
+    Write-Host " Option A (Persistent): Run 'conda run -n $EnvName huggingface-cli login' and paste your token."
+    Write-Host " Option B (Temporary): Set the 'HF_TOKEN' environment variable."
+    Write-Host "Get your token from: https://huggingface.co/settings/tokens" -ForegroundColor DarkCyan
+    Write-Host "------------------------------------------------------------" -ForegroundColor DarkYellow
+    Write-Host ""
+}
+
+Write-Host "You can then use the following commands:"
 Write-Host " stf-download --help"
 Write-Host " stf-split --help"
 Write-Host " stf-build-vocab --help"

setup.sh

Lines changed: 46 additions & 2 deletions
@@ -1,10 +1,18 @@
 #!/bin/bash
 # setup.sh - Automated environment setup for SpatialTranscriptFormer (Linux/HPC)
 
+set -e
+
 echo "--- SpatialTranscriptFormer Setup ---"
 
 ENV_NAME="SpatialTranscriptFormer"
 
+# Check if conda exists
+if ! command -v conda &> /dev/null; then
+    echo "Error: conda was not found. Please ensure Conda is installed and in your PATH."
+    exit 1
+fi
+
 # Check if conda environment exists
 if ! conda env list | grep -q "$ENV_NAME"; then
     echo "Creating conda environment '$ENV_NAME' with Python 3.9..."
@@ -13,12 +21,48 @@ else
     echo "Conda environment '$ENV_NAME' already exists."
 fi
 
+echo "Installing PyTorch (CUDA 11.8)..."
+conda run -n $ENV_NAME pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
+
 echo "Installing/Updating package in editable mode..."
 conda run -n $ENV_NAME pip install -e .[dev]
 
+echo "Checking Hugging Face authentication..."
+# Temporarily disable exit on error for this check
+set +e
+HF_STATUS=$(conda run -n $ENV_NAME huggingface-cli whoami 2>&1)
+HF_EXIT=$?
+set -e
+
+if [ $HF_EXIT -ne 0 ] || [[ "$HF_STATUS" == *"Not logged in"* ]]; then
+    HF_NEED_LOGIN=true
+else
+    HF_NEED_LOGIN=false
+    echo "Hugging Face authentication found: $HF_STATUS"
+fi
+
+echo ""
+echo "========================================="
+echo " SETUP COMPLETE! "
+echo "========================================="
+echo ""
+echo "IMPORTANT: You must activate the environment before using the tools:"
+echo " conda activate $ENV_NAME"
 echo ""
-echo "Setup Complete!"
-echo "You can now use the following commands (after activating the environment):"
+
+if [ "$HF_NEED_LOGIN" = true ]; then
+    echo "------------------------------------------------------------"
+    echo "DATASET ACCESS REQUIRES AUTHENTICATION"
+    echo "The HEST-1k dataset on Hugging Face is gated. You must provide an access token."
+    echo "Please do ONE of the following before downloading data:"
+    echo " Option A (Persistent): Run 'conda run -n $ENV_NAME huggingface-cli login' and paste your token."
+    echo " Option B (Temporary): Run 'export HF_TOKEN=your_token_here'"
+    echo "Get your token from: https://huggingface.co/settings/tokens"
+    echo "------------------------------------------------------------"
+    echo ""
+fi
+
+echo "You can then use the following commands:"
 echo " stf-download --help"
 echo " stf-split --help"
 echo " stf-build-vocab --help"
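
Two details in `setup.sh` may be worth a second look. `grep -q "$ENV_NAME"` matches substrings, so an environment such as `SpatialTranscriptFormer-dev`, or any path containing the name, would also count as present. Separately, the `set +e` / `set -e` toggle around the `whoami` call can be avoided, because a command on the left side of `||` does not trigger errexit. The sketch below shows one possible shape for both. The function names `conda_env_exists` and `capture` are hypothetical and not part of this commit, and the parsing assumes the usual `conda env list` layout (environment name in the first column, `#` comment lines).

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: read `conda env list` output on stdin and succeed only
# if the first column of some line equals the given name exactly.
conda_env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Hypothetical helper: run a command, store its combined stdout/stderr in
# CAPTURE_OUT and its exit status in CAPTURE_RC, without tripping `set -e`.
capture() {
    CAPTURE_RC=0
    CAPTURE_OUT=$("$@" 2>&1) || CAPTURE_RC=$?
}

# Intended use inside a setup script (left commented so the sketch runs
# without conda installed):
#   if ! conda env list | conda_env_exists "$ENV_NAME"; then ...; fi
#   capture conda run -n "$ENV_NAME" huggingface-cli whoami
#   if [ "$CAPTURE_RC" -ne 0 ] || [[ "$CAPTURE_OUT" == *"Not logged in"* ]]; then ...; fi
```

Comparing the first column exactly keeps the check from reporting a similarly named environment as the one being set up, and the `||` form keeps the exit status available while errexit stays enabled for the rest of the script.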
