Merged
File renamed without changes.
144 changes: 144 additions & 0 deletions .github/workflows/CD.yml
@@ -0,0 +1,144 @@
# This file is autogenerated by maturin v1.11.5
# To update, run
#
#    maturin generate-ci github --platform manylinux windows macos --output .github/workflows/CD.yml
#
name: CD
on:
  push:
    branches:
      - main
    tags:
      - '*'
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  linux:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: ubuntu-22.04
            target: x86_64
          - runner: ubuntu-22.04
            target: x86
          - runner: ubuntu-22.04
            target: aarch64
          - runner: ubuntu-22.04
            target: armv7
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: 3.13
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          target: ${{ matrix.platform.target }}
          args: --release --out dist -i python3.10 -i python3.11 -i python3.12 -i python3.13 -i python3.14
          sccache: ${{ !startsWith(github.ref, 'refs/tags/') }}
          manylinux: auto
      - name: Upload wheels
        uses: actions/upload-artifact@v5
        with:
          name: wheels-linux-${{ matrix.platform.target }}
          path: dist
  windows:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: windows-latest
            target: x64
            python_arch: x64
          - runner: windows-latest
            target: x86
            python_arch: x86
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: 3.13
          architecture: ${{ matrix.platform.python_arch }}
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          target: ${{ matrix.platform.target }}
          args: --release --out dist --find-interpreter
          sccache: ${{ !startsWith(github.ref, 'refs/tags/') }}
      - name: Upload wheels
        uses: actions/upload-artifact@v5
        with:
          name: wheels-windows-${{ matrix.platform.target }}
          path: dist
  macos:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: macos-15-intel
            target: x86_64
          - runner: macos-latest
            target: aarch64
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: 3.13
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          target: ${{ matrix.platform.target }}
          args: --release --out dist --find-interpreter
          sccache: ${{ !startsWith(github.ref, 'refs/tags/') }}
      - name: Upload wheels
        uses: actions/upload-artifact@v5
        with:
          name: wheels-macos-${{ matrix.platform.target }}
          path: dist
  sdist:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Build sdist
        uses: PyO3/maturin-action@v1
        with:
          command: sdist
          args: --out dist
      - name: Upload sdist
        uses: actions/upload-artifact@v5
        with:
          name: wheels-sdist
          path: dist
  release:
    name: Release
    runs-on: ubuntu-latest
    if: ${{ startsWith(github.ref, 'refs/tags/') || github.event_name == 'workflow_dispatch' }}
    needs:
      - linux
      - windows
      - macos
      - sdist
    permissions:
      # Use to sign the release artifacts
      id-token: write
      # Used to upload release artifacts
      contents: write
      # Used to generate artifact attestation
      attestations: write
    steps:
      - uses: actions/download-artifact@v6
      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v3
        with:
          subject-path: wheels-*/*
      - name: Install uv
        if: ${{ startsWith(github.ref, 'refs/tags/') }}
        uses: astral-sh/setup-uv@v7
      - name: Publish to PyPI
        if: ${{ startsWith(github.ref, 'refs/tags/') }}
        run: uv publish 'wheels-*/*'
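
The workflow gates three things on the Git ref and the triggering event: whether `sccache` is enabled, whether the `release` job runs, and whether the wheels are published to PyPI. The Python sketch below mirrors those `if:` expressions so the combinations can be checked by hand. It is only an illustration: the function names are hypothetical, it uses Python's case-sensitive `str.startswith` as a stand-in for the `startsWith` expression function, and it does not model the `needs:` dependency on the four build jobs.

```python
# Sketch of the gating conditions in CD.yml (function names are hypothetical).

def is_tag_ref(ref: str) -> bool:
    """Stand-in for startsWith(github.ref, 'refs/tags/')."""
    return ref.startswith("refs/tags/")


def use_sccache(ref: str) -> bool:
    """sccache: ${{ !startsWith(github.ref, 'refs/tags/') }}"""
    return not is_tag_ref(ref)


def runs_release(ref: str, event_name: str) -> bool:
    """Job-level `if:` on the release job: tag refs or manual dispatch."""
    return is_tag_ref(ref) or event_name == "workflow_dispatch"


def publishes_to_pypi(ref: str, event_name: str) -> bool:
    """The publish step needs the release job to run AND a tag ref."""
    return runs_release(ref, event_name) and is_tag_ref(ref)


if __name__ == "__main__":
    cases = [
        ("refs/heads/main", "push"),
        ("refs/heads/main", "workflow_dispatch"),
        ("refs/tags/v0.2.0", "push"),
    ]
    for ref, event in cases:
        print(
            ref,
            event,
            "sccache" if use_sccache(ref) else "no-sccache",
            "release" if runs_release(ref, event) else "no-release",
            "publish" if publishes_to_pypi(ref, event) else "no-publish",
        )
```

Read this way, a push to `main` only builds and uploads artifacts, a manual dispatch on a branch additionally runs the release job and its attestation step without publishing, and a tag push builds without `sccache` and publishes.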
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -5,11 +5,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased](https://github.com/BattModels/smirk)
## [v0.2.0](https://github.com/BattModels/smirk)

Paper published in ACS JCIM: [*Tokenization for Molecular Foundation Models*](https://doi.org/10.1021/acs.jcim.5c01856)

### Added

- Started a changelog ([#2](https://github.com/BattModels/smirk/pull/2))
- Added a release pipeline ([#6](https://github.com/BattModels/smirk/pull/6))

### Changed

4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "smirk"
version = "0.2.0-dev"
version = "0.2.0"
edition = "2021"
license = "Apache-2.0"
description = "A chemically complete tokenizer for OpenSMILES"
@@ -25,7 +25,7 @@ either = "1.13.0"
macro_rules_attribute = "0.2.0"
once_cell = "1.19.0"
paste = "1.0.14"
pyo3 = { version = "^0.23", features = ["extension-module"] }
pyo3 = { version = "^0.27", features = ["extension-module"] }
regex = "1.10.3"
serde = "1.0.197"
serde_json = "1.0.114"
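
The `pyo3` change moves the requirement from `^0.23` to `^0.27`. Under Cargo's caret rules the leftmost non-zero component is the compatibility boundary, so for a `0.x` crate each minor version is its own range and the two requirements do not overlap. The Python sketch below models that rule in simplified form. The helper names `caret_bounds` and `matches` are hypothetical, and pre-release and build-metadata handling is deliberately left out.

```python
# Simplified model of Cargo caret requirements (helper names are hypothetical;
# pre-release tags and build metadata are not handled).

def caret_bounds(req: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower bound, exclusive upper bound) for a caret requirement."""
    parts = [int(p) for p in req.lstrip("^").split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))
    # Bump the leftmost non-zero component that was written out; if every
    # written component is zero, bump the last one that was written.
    for i, part in enumerate(parts):
        if part != 0:
            break
    else:
        i = len(parts) - 1
    upper = lower[:i] + (lower[i] + 1,) + (0,) * (2 - i)
    return lower, upper


def matches(version: str, req: str) -> bool:
    """True if a plain MAJOR.MINOR.PATCH version satisfies the caret requirement."""
    v = tuple(int(p) for p in version.split("."))
    lower, upper = caret_bounds(req)
    return lower <= v < upper


if __name__ == "__main__":
    print(caret_bounds("^0.23"))
    print(caret_bounds("^0.27"))
    print(matches("0.27.0", "^0.23"))
```

The version strings passed to `matches` are illustrative inputs, not a claim about which `pyo3` releases exist.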
4 changes: 3 additions & 1 deletion README.md
@@ -3,6 +3,8 @@
<div align="center" display="flex" >

![GitHub License](https://img.shields.io/github/license/BattModels/smirk)
<a href="https://doi.org/10.1021/acs.jcim.5c01856">![paper](https://img.shields.io/badge/paper-10.1021%2Facs.jcim.5c01856-blue)</a>
<a href="https://doi.org/10.5281/zenodo.13761262">![data](https://img.shields.io/badge/data-10.5281%2Fzenodo.13761262-blue)</a>
<a href="https://arxiv.org/abs/2409.15370">![arXiv:2409.15370](https://img.shields.io/badge/cs.LG-2409.15370-b31b1b?style=flat&amp;logo=arxiv&amp;logoColor=red)</a>

</div>
@@ -11,7 +13,7 @@ Smirk is a chemistry-specific tokenizer that provides complete coverage of the [
specification, that is built using Rust 🦀 and [HuggingFace's tokenizers](https://huggingface.co/docs/tokenizers) 🤗.
Installation is easy, and Smirk works out-of-the-box with the [HuggingFace](https://huggingface.co/docs) ecosystem.

Check our [documentation](https://eeg.engin.umich.edu/smirk) to see `smirk` in action, or [read the paper](https://arxiv.org/abs/2409.15370) to learn
Check our [documentation](https://eeg.engin.umich.edu/smirk) to see `smirk` in action, or [read the paper](https://doi.org/10.1021/acs.jcim.5c01856) to learn
about tokenization for molecular foundation models.

## Installation
1 change: 1 addition & 0 deletions docs/conf.py
@@ -43,6 +43,7 @@
# -- MyST-NB -----------------------------------------------------------------
nb_execution_in_temp = True
nb_execution_mode = "cache"
nb_execution_timeout = 300
nb_render_markdown_format = "myst"
myst_enable_extensions = [
    "fieldlist",
4 changes: 3 additions & 1 deletion docs/index.md
@@ -3,6 +3,8 @@
<div align="center" display="flex" >

![GitHub License](https://img.shields.io/github/license/BattModels/smirk)
<a href="https://doi.org/10.1021/acs.jcim.5c01856">![paper](https://img.shields.io/badge/paper-10.1021%2Facs.jcim.5c01856-blue)</a>
<a href="https://doi.org/10.5281/zenodo.13761262">![data](https://img.shields.io/badge/data-10.5281%2Fzenodo.13761262-blue)</a>
<a href="https://arxiv.org/abs/2409.15370">![arXiv:2409.15370](https://img.shields.io/badge/cs.LG-2409.15370-b31b1b?style=flat&amp;logo=arxiv&amp;logoColor=red)</a>

</div>
@@ -22,7 +24,7 @@ that fail to represent *all* of chemistry, inherently limiting their performance
Enabling complete coverage of [OpenSMILES] with a vocabulary of 167 tokens.

[OpenSMILES]: http://opensmiles.org/
[paper]: https://arxiv.org/abs/2409.15370
[paper]: https://doi.org/10.1021/acs.jcim.5c01856
[HuggingFace]: https://huggingface.co/docs
[Tokenizers]: https://huggingface.co/docs/tokenizers

2 changes: 1 addition & 1 deletion docs/smirk_demo.ipynb
@@ -29,7 +29,7 @@
"Check out the [paper] for all the details; otherwise, let's see it in action!\n",
"\n",
"[OpenSMILES]: http://opensmiles.org/\n",
"[paper]: https://doi.org/10.48550/arXiv.2409.15370\n",
"[paper]: https://doi.org/10.1021/acs.jcim.5c01856\n",
"[Atom-wise]: https://doi.org/10.1039/C8SC02339E\n",
"[bracketed atom]: https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Line_Entry_System#Atoms\n",
"\n",