Commit 30e22d3

Squash merge from dev
1 parent e0a794e commit 30e22d3

137 files changed: +10239 -12909 lines


.github/workflows/black.yml

Lines changed: 0 additions & 10 deletions
This file was deleted.

.github/workflows/pre-commit.yml

Lines changed: 19 additions & 0 deletions
```yaml
name: Pre-commit Check

on:
  push:
    branches: [main, master]
  pull_request:

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Run pre-commit
        uses: pre-commit/action@v3.0.1
```
Lines changed: 69 additions & 0 deletions
```yaml
# This workflow will upload a Python Package to PyPI when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  release-build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Build release distributions
        run: |
          python -m pip install build
          python -m build

      - name: Upload distributions
        uses: actions/upload-artifact@v4
        with:
          name: release-dists
          path: dist/

  pypi-publish:
    runs-on: ubuntu-latest
    needs:
      - release-build
    permissions:
      # IMPORTANT: this permission is mandatory for trusted publishing
      id-token: write

    # Dedicated environments with protections for publishing are strongly recommended.
    # For more information, see: https://docs.github.com/en/actions/deployment/targeting-different-environments/using-environments-for-deployment#deployment-protection-rules
    environment:
      name: pypi
      # OPTIONAL: uncomment and update to include your PyPI project URL in the deployment status:
      url: https://pypi.org/p/chebai
      #
      # ALTERNATIVE: if your GitHub Release name is the PyPI project version string
      # ALTERNATIVE: exactly, uncomment the following line instead:
      # url: https://pypi.org/project/YOURPROJECT/${{ github.event.release.name }}

    steps:
      - name: Retrieve release distributions
        uses: actions/download-artifact@v4
        with:
          name: release-dists
          path: dist/

      - name: Publish release distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: dist/
```

.github/workflows/test.yml

Lines changed: 15 additions & 4 deletions
```diff
@@ -9,19 +9,30 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.9", "3.10", "3.11"]
+        python-version: ["3.10", "3.11", "3.12"]

     steps:
       - uses: actions/checkout@v4
+
       - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
+
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
           python -m pip install --upgrade pip setuptools wheel
           python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
-          python -m pip install -e .
-      - name: Display Python version
-        run: python -m unittest discover -s tests/unit
+          python -m pip install -e .[dev]
+
+      - name: Display Python & Installed Packages
+        run: |
+          python --version
+          pip freeze
+
+      - name: Run Unit Tests
+        run: python -m unittest discover -s tests/unit -v
+        env:
+          ACTIONS_STEP_DEBUG: true # Enable debug logs
+          ACTIONS_RUNNER_DEBUG: true # Additional debug logs from Github Actions itself
```
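The updated workflow discovers tests with `unittest`. As a hedged sketch (the class and test names here are invented, not taken from the repository), a minimal test module of the kind `python -m unittest discover -s tests/unit` picks up could look like this; discovery matches files named `test*.py` by default:

```python
import unittest


class TestExample(unittest.TestCase):
    # A trivial assertion standing in for a real unit test.
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


# Run the case programmatically, which is roughly what the discover
# command does for each matching module.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```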

.github/workflows/token_consistency.yaml

Lines changed: 0 additions & 6 deletions
```diff
@@ -13,21 +13,17 @@ on:
       - "chebai/preprocessing/bin/smiles_token/tokens.txt"
       - "chebai/preprocessing/bin/smiles_token_unlabeled/tokens.txt"
       - "chebai/preprocessing/bin/selfies/tokens.txt"
-      - "chebai/preprocessing/bin/protein_token/tokens.txt"
       - "chebai/preprocessing/bin/graph_properties/tokens.txt"
       - "chebai/preprocessing/bin/graph/tokens.txt"
       - "chebai/preprocessing/bin/deepsmiles_token/tokens.txt"
-      - "chebai/preprocessing/bin/protein_token_3_gram/tokens.txt"
   pull_request:
     paths:
       - "chebai/preprocessing/bin/smiles_token/tokens.txt"
       - "chebai/preprocessing/bin/smiles_token_unlabeled/tokens.txt"
       - "chebai/preprocessing/bin/selfies/tokens.txt"
-      - "chebai/preprocessing/bin/protein_token/tokens.txt"
       - "chebai/preprocessing/bin/graph_properties/tokens.txt"
       - "chebai/preprocessing/bin/graph/tokens.txt"
       - "chebai/preprocessing/bin/deepsmiles_token/tokens.txt"
-      - "chebai/preprocessing/bin/protein_token_3_gram/tokens.txt"

 jobs:
   check_tokens:
@@ -58,11 +54,9 @@ jobs:
           "chebai/preprocessing/bin/smiles_token/tokens.txt"
           "chebai/preprocessing/bin/smiles_token_unlabeled/tokens.txt"
           "chebai/preprocessing/bin/selfies/tokens.txt"
-          "chebai/preprocessing/bin/protein_token/tokens.txt"
           "chebai/preprocessing/bin/graph_properties/tokens.txt"
           "chebai/preprocessing/bin/graph/tokens.txt"
           "chebai/preprocessing/bin/deepsmiles_token/tokens.txt"
-          "chebai/preprocessing/bin/protein_token_3_gram/tokens.txt"
         )
         echo "TOKENS_FILES=${TOKENS_FILES[*]}" >> $GITHUB_ENV
```

.gitignore

Lines changed: 13 additions & 0 deletions
```diff
@@ -167,3 +167,16 @@ cython_debug/
 /logs
 /results_buffer
 electra_pretrained.ckpt
+
+build
+.virtual_documents
+.jupyter
+chebai.egg-info
+lightning_logs
+logs
+.isort.cfg
+/.vscode
+
+*.out
+*.err
+*.sh
```

.pre-commit-config.yaml

Lines changed: 16 additions & 22 deletions
```diff
@@ -1,25 +1,19 @@
 repos:
-  - repo: https://github.com/psf/black
-    rev: "24.2.0"
-    hooks:
-      - id: black
-      - id: black-jupyter # for formatting jupyter-notebook
+  # Use `pre-commit autoupdate` to update all the hooks.

-  - repo: https://github.com/pycqa/isort
-    rev: 5.13.2
-    hooks:
-      - id: isort
-        name: isort (python)
-        args: ["--profile=black"]
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version. https://docs.astral.sh/ruff/integrations/#pre-commit
+    rev: v0.14.11
+    hooks:
+      # Run the linter.
+      - id: ruff-check
+        args: [ --fix ]
+      # Run the formatter.
+      - id: ruff-format

-  - repo: https://github.com/asottile/seed-isort-config
-    rev: v2.2.0
-    hooks:
-      - id: seed-isort-config
-
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
-    hooks:
-      - id: check-yaml
-      - id: end-of-file-fixer
-      - id: trailing-whitespace
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v6.0.0
+    hooks:
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
```

README.md

Lines changed: 64 additions & 20 deletions
````diff
@@ -3,22 +3,9 @@
 ChEBai is a deep learning library designed for the integration of deep learning methods with chemical ontologies, particularly ChEBI.
 The library emphasizes the incorporation of the semantic qualities of the ontology into the learning process.

-## Note for developers
+## News

-If you have used ChEBai before PR #39, the file structure in which your ChEBI-data is saved has changed. This means that
-datasets will be freshly generated. The data however is the same. If you want to keep the old data (including the old
-splits), you can use a migration script. It copies the old data to the new location for a specific ChEBI class
-(including chebi version and other parameters). The script can be called by specifying the data module from a config
-```
-python chebai/preprocessing/migration/chebi_data_migration.py migrate --datamodule=[path-to-data-config]
-```
-or by specifying the class name (e.g. `ChEBIOver50`) and arguments separately
-```
-python chebai/preprocessing/migration/chebi_data_migration.py migrate --class_name=[data-class] [--chebi_version=[version]]
-```
-The new dataset will by default generate random data splits (with a given seed).
-To reuse a fixed data split, you have to provide the path of the csv file generated during the migration:
-`--data.init_args.splits_file_path=[path-to-processed_data]/splits.csv`
+Starting in version 1.1, we support regression tasks!

 ## Installation
@@ -33,9 +20,31 @@ git clone https://github.com/ChEB-AI/python-chebai.git

 ```
 cd python-chebai
-pip install .
+pip install -e .
 ```

+Some packages are not installed by default but can be added with the following extras:
+```
+pip install chebai[dev]
+```
+installs additional packages useful to people who want to contribute to the library.
+This includes `pre-commit`, which runs automatic formatting before each commit.
+To set up `pre-commit` for your workflow, run `pre-commit install`.
+For more details, see the [`pre-commit` documentation](https://pre-commit.com).
+
+```
+pip install chebai[plot]
+```
+installs additional packages useful for plotting and visualisation.
+```
+pip install chebai[wandb]
+```
+installs the [Weights & Biases](https://wandb.ai) integration for automated logging of training runs.
+```
+pip install chebai[all]
+```
+installs all optional dependencies.
+
 ## Usage

 The training and inference is abstracted using the Pytorch Lightning modules.
@@ -54,14 +63,19 @@ python -m chebai fit --trainer=configs/training/default_trainer.yml --model=conf
 ```
 A command with additional options may look like this:
 ```
-python3 -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.train_metrics=configs/metrics/micro-macro-f1.yml --model.test_metrics=configs/metrics/micro-macro-f1.yml --model.val_metrics=configs/metrics/micro-macro-f1.yml --model.pretrained_checkpoint=electra_pretrained.ckpt --model.load_prefix=generator. --data=configs/data/chebi50.yml --model.out_dim=1446 --model.criterion=configs/loss/bce.yml --data.init_args.batch_size=10 --trainer.logger.init_args.name=chebi50_bce_unweighted --data.init_args.num_workers=9 --model.pass_loss_kwargs=false --data.init_args.chebi_version=231 --data.init_args.data_limit=1000
+python3 -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.train_metrics=configs/metrics/micro-macro-f1.yml --model.test_metrics=configs/metrics/micro-macro-f1.yml --model.val_metrics=configs/metrics/micro-macro-f1.yml --model.pretrained_checkpoint=electra_pretrained.ckpt --model.load_prefix=generator. --data=configs/data/chebi/chebi50.yml --model.criterion=configs/loss/bce.yml --data.init_args.batch_size=10 --trainer.logger.init_args.name=chebi50_bce_unweighted --data.init_args.num_workers=9 --model.pass_loss_kwargs=false --data.init_args.chebi_version=231 --data.init_args.data_limit=1000
 ```

-### Fine-tuning for Toxicity prediction
+### Fine-tuning for classification tasks, e.g. Toxicity prediction
 ```
 python -m chebai fit --config=[path-to-your-tox21-config] --trainer.callbacks=configs/training/default_callbacks.yml --model.pretrained_checkpoint=[path-to-pretrained-model]
 ```

+### Fine-tuning for regression tasks, e.g. solubility prediction
+```
+python -m chebai fit --config=[path-to-your-esol-config] --trainer.callbacks=configs/training/solCur_callbacks.yml --model.pretrained_checkpoint=[path-to-pretrained-model]
+```
+
 ### Predicting classes given SMILES strings
 ```
 python3 -m chebai predict_from_file --model=[path-to-model-config] --checkpoint_path=[path-to-model] --input_path={path-to-file-containing-smiles] [--classes_path=[path-to-classes-file]] [--save_to=[path-to-output]]
@@ -72,8 +86,21 @@ The `classes_path` is the path to the dataset's `raw/classes.txt` file that cont

 ## Evaluation

-An example for evaluating a model trained on the ontology extension task is given in `tutorials/eval_model_basic.ipynb`.
-It takes in the finetuned model as input for performing the evaluation.
+You can evaluate a model trained on the ontology extension task in one of two ways:
+
+### 1. Using the Jupyter Notebook
+An example notebook is provided at `tutorials/eval_model_basic.ipynb`.
+- Load your finetuned model and run the evaluation cells to compute metrics on the test set.
+
+### 2. Using the Lightning CLI
+Alternatively, you can evaluate the model via the CLI:
+
+```bash
+python -m chebai test --trainer=configs/training/default_trainer.yml --trainer.devices=1 --trainer.num_nodes=1 --ckpt_path=[path-to-finetuned-model] --model=configs/model/electra.yml --model.test_metrics=configs/metrics/micro-macro-f1.yml --data=configs/data/chebi/chebi50.yml --data.init_args.batch_size=32 --data.init_args.num_workers=10 --data.init_args.chebi_version=[chebi-version] --model.pass_loss_kwargs=false --model.criterion=configs/loss/bce.yml --model.criterion.init_args.beta=0.99 --data.init_args.splits_file_path=[path-to-splits-file]
+```
+
+> **Note**: It is recommended to use `devices=1` and `num_nodes=1` during testing; multi-device settings use a `DistributedSampler`, which may replicate some samples to maintain equal batch sizes, so using a single device ensures that each sample or batch is evaluated exactly once.

 ## Cross-validation
 You can do inner k-fold cross-validation, i.e., train models on k train-validation splits that all use the same test
@@ -87,3 +114,20 @@ and the fold to be used in the current optimisation run as
 ```
 To train K models, you need to do K such calls, each with a different `fold_index`. On the first call with a given
 `inner_k_folds`, all folds will be created and stored in the data directory
+
+## Note for developers
+
+If you have used ChEBai before PR #39, the file structure in which your ChEBI-data is saved has changed. This means that
+datasets will be freshly generated. The data however is the same. If you want to keep the old data (including the old
+splits), you can use a migration script. It copies the old data to the new location for a specific ChEBI class
+(including chebi version and other parameters). The script can be called by specifying the data module from a config
+```
+python chebai/preprocessing/migration/chebi_data_migration.py migrate --datamodule=[path-to-data-config]
+```
+or by specifying the class name (e.g. `ChEBIOver50`) and arguments separately
+```
+python chebai/preprocessing/migration/chebi_data_migration.py migrate --class_name=[data-class] [--chebi_version=[version]]
+```
+The new dataset will by default generate random data splits (with a given seed).
+To reuse a fixed data split, you have to provide the path of the csv file generated during the migration:
+`--data.init_args.splits_file_path=[path-to-processed_data]/splits.csv`
````
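The README's note about `devices=1` can be made concrete with a small sketch (plain Python, no torch; the function name and the wrap-around padding scheme are an illustration of how a distributed sampler equalises shard sizes, not the library's actual implementation):

```python
# Sketch: to give every replica an equal-sized shard, the dataset is padded
# to a multiple of the number of devices by wrapping around to the start.
# The padded samples are evaluated twice, which skews test metrics.
def shard_indices(num_samples, num_replicas):
    total = ((num_samples + num_replicas - 1) // num_replicas) * num_replicas
    padded = [i % num_samples for i in range(total)]  # wrap around to pad
    return [padded[r::num_replicas] for r in range(num_replicas)]

shards = shard_indices(num_samples=10, num_replicas=4)
# 12 slots for 10 samples: samples 0 and 1 land on two replicas each.
print(shards)
```

With a single device no padding is needed, so each sample is evaluated exactly once.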

chebai/callbacks.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -80,7 +80,7 @@ def write_on_epoch_end(
         else:
             labels = [None for _ in idents]
         output = torch.sigmoid(p["output"]["logits"]).tolist()
-        for i, l, o in zip(idents, labels, output):
+        for i, l, o in zip(idents, labels, output):  # noqa: E741
             pred_list.append(dict(ident=i, labels=l, predictions=o))
         with open(os.path.join(self.output_dir, self.target_file), "wt") as fout:
             json.dump(pred_list, fout)
```
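For context, the loop touched by this change builds one JSON record per input. A self-contained sketch of the record format follows (the identifiers, labels, and logit values are invented for illustration, and a hand-written `sigmoid` stands in for `torch.sigmoid`):

```python
import json
import math


def sigmoid(x):
    # Scalar stand-in for torch.sigmoid.
    return 1.0 / (1.0 + math.exp(-x))


idents = ["CHEBI:1", "CHEBI:2"]          # hypothetical sample identifiers
labels = [[1, 0], [0, 1]]                # hypothetical ground-truth labels
logits = [[2.0, -1.0], [-0.5, 3.0]]      # hypothetical raw model outputs

# Mirror the callback: sigmoid the logits, then emit one dict per sample.
output = [[sigmoid(v) for v in row] for row in logits]
pred_list = [
    dict(ident=i, labels=lab, predictions=o)
    for i, lab, o in zip(idents, labels, output)
]
print(json.dumps(pred_list, indent=2))
```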

chebai/callbacks/epoch_metrics.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -62,7 +62,8 @@ def update(self, preds: torch.Tensor, labels: torch.Tensor) -> None:
             labels (torch.Tensor): Ground truth labels.
         """
         tps = torch.sum(
-            torch.logical_and(preds > self.threshold, labels.to(torch.bool)), dim=0
+            torch.logical_and(preds > self.threshold, labels.to(torch.bool)),
+            dim=0,
         )
         self.true_positives += tps
         self.positive_predictions += torch.sum(preds > self.threshold, dim=0)
```
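The reformatted call above counts per-class true positives by summing over the batch dimension. A plain-Python sketch of the same counting logic (toy predictions and labels, no torch, threshold chosen arbitrarily at 0.5):

```python
threshold = 0.5
preds = [  # two samples, three classes: predicted probabilities
    [0.9, 0.2, 0.7],
    [0.6, 0.8, 0.1],
]
labels = [  # ground-truth multi-label targets
    [1, 0, 1],
    [0, 1, 1],
]

num_classes = len(preds[0])
tps = [0] * num_classes                   # true positives per class
positive_predictions = [0] * num_classes  # predicted positives per class
for row_p, row_l in zip(preds, labels):
    for c in range(num_classes):
        if row_p[c] > threshold:          # preds > self.threshold
            positive_predictions[c] += 1
            if row_l[c]:                  # logical_and with the label
                tps[c] += 1

print(tps, positive_predictions)
```

Summing the logical AND over `dim=0` in the torch version collapses the batch dimension exactly as the outer loop does here, leaving one count per class.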
