token-api

A research framework for detecting malicious tokens (ERC-20, ERC-721, ERC-1155) on Ethereum using machine learning.

Overview

token-api is a factory- and config-driven platform for building and evaluating token classification models. It supports multiple model architectures, data sources, and training strategies, which can be composed via YAML configuration files without code changes.

Key capabilities:

  • Multiple model architectures -- graph neural networks, XGBoost, and logistic regression
  • Factory-driven design -- extend with new models, data sources, and training strategies via ModelFactory, DataModelFactory, and FeatureFactory
  • Config-driven pipelines -- swap model, data, and training parameters through YAML configs (see the sketch after this list)
  • Rust-accelerated feature extraction -- high-performance data extraction and feature computation with Python bindings (PyO3/Maturin)
  • SQLMesh data pipeline -- reproducible data transformations backed by DuckDB and HuggingFace
  • MLflow experiment tracking -- track training runs, metrics, and model artifacts
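
The sketch below shows how these pieces are intended to fit together: a YAML config selects the components and the factories assemble them. It is illustrative only; the import paths, factory method names, and config keys are assumptions, not the actual token_api API.

# Illustrative only: import paths, factory method names, and config keys
# are assumptions, not the repository's actual API.
import yaml

from token_api.models import ModelFactory        # assumed import path
from token_api.data import DataModelFactory      # assumed import path
from token_api.trainers import Trainer           # assumed import path

with open("src/assets/configs/trainings/example.yaml") as f:  # hypothetical config
    config = yaml.safe_load(f)

data_model = DataModelFactory.create(config["data"])    # hypothetical factory call
model = ModelFactory.create(config["model"])            # hypothetical factory call
trainer = Trainer(model=model, data_model=data_model, **config["training"])
trainer.train()

The CLI entry point (token_api.main, see Training below) presumably performs this kind of composition from the config path it is given.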

Project Structure

.
├── token-api/                      # Python package (Poetry)
│   ├── src/token_api/              # Core library
│   │   ├── models/                 # Model implementations & ModelFactory
│   │   ├── data/                   # Data models, configs & DataModelFactory
│   │   ├── trainers/               # Training orchestration (Trainer)
│   │   ├── evaluator/              # Evaluation, metrics & MetricFactory
│   │   ├── scripts/                # Data processing utilities
│   │   └── main.py                 # CLI entry point
│   ├── src/assets/configs/         # Model, data, training & pipeline YAML configs
│   ├── configs/                    # Top-level training configurations
│   ├── token_api_data/             # SQLMesh data pipeline (DuckDB + HuggingFace)
│   ├── notebooks/                  # Research & analysis notebooks
│   ├── book/                       # Jupyter Book documentation
│   ├── tests/                      # Test suite
│   └── Makefile                    # SQLMesh & training Make targets
├── crates/tokenscout-dataset/      # Rust crate: high-perf data and feature extraction (PyO3)
├── docs/                           # Extended project documentation
├── .env.example                    # Environment variable template
└── run.sh                          # Root setup & build commands

Models

  • TokenScout (Staged Pipeline) -- multi-phase GNN pipeline: graph embedding, refinement, and classification on transfer graphs
  • TokenFlow GNN -- graph neural network operating on token transfer flow patterns
  • XGBoost -- binary classification of NFT/token spam from on-chain features (grid search, k-fold cross-validation)
  • Logistic Regression -- text-based classification using token metadata (name, symbol, description)

All models extend BaseModel and are registered through ModelFactory. New models can be added by implementing the base class and registering a config.
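
As an illustration of the extension pattern, a new model would roughly look like the sketch below; the BaseModel interface and the ModelFactory registration call shown here are assumptions, not the repository's actual signatures.

# Hypothetical sketch: the BaseModel methods and the ModelFactory
# registration API are assumed, not taken from the repository.
from token_api.models import BaseModel, ModelFactory  # assumed import path

class MyTokenClassifier(BaseModel):        # hypothetical new model
    def fit(self, features, labels):       # assumed training interface
        ...

    def predict(self, features):           # assumed inference interface
        ...

ModelFactory.register("my_token_classifier", MyTokenClassifier)  # assumed API

A matching YAML config (under src/assets/configs/) would then reference the registered name so the factory can instantiate it.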

Getting Started

Prerequisites

  • Python 3.13+
  • Poetry
  • Rust toolchain (for the feature extraction crate)
  • Maturin (for building Python bindings)

Setup

chmod +x run.sh
./run.sh permissions    # Make sub-scripts executable
./run.sh dev            # Install Python dependencies via Poetry
./run.sh rust-build     # Build the Rust crate (release mode)
./run.sh rust-bindings  # Build and install Python bindings into the Poetry venv

Copy .env.example to .env and fill in the required values.
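
After the bindings step, a quick import check confirms the crate is visible from the Poetry environment. The module name below is inferred from the crate name (tokenscout-dataset) and may differ from the actual binding name.

# Smoke test for the Rust bindings; run inside the Poetry venv
# (e.g. via `poetry run python`). The module name is an assumption
# inferred from the crate name.
import tokenscout_dataset

print(tokenscout_dataset.__file__)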

Running Tests

./run.sh pre-commit     # Run all code checks and tests
./run.sh rust-test      # Run Rust crate tests

Data Pipeline

The SQLMesh pipeline lives in token-api/token_api_data/ and uses DuckDB as its local engine. Key Make targets (run from token-api/):

make sqlmesh-init       # First-time setup: download from HuggingFace, run full plan
make sqlmesh-resume     # Incremental fetch + save back to HuggingFace
make sqlmesh-plan       # Run SQLMesh plan with auto-apply
make sqlmesh-run        # Incremental update (no backfill)
make sqlmesh-ui         # Start the SQLMesh web UI
make sqlmesh-clean      # Remove local DuckDB database and data files
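
Once the pipeline has run, the DuckDB database it materializes can be inspected directly from Python. The database path below is a placeholder; the real file location and schema are defined by the SQLMesh project in token_api_data/.

# Inspect the local DuckDB database produced by the SQLMesh pipeline.
# The file path is a placeholder, not the pipeline's real location.
import duckdb

con = duckdb.connect("token_api_data/db.duckdb")   # hypothetical path
print(con.sql("SHOW TABLES").fetchall())           # list materialized models
con.close()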

Training

Training is config-driven. Configs in token-api/src/assets/configs/trainings/ define model, data, and training parameters. Run training via the entry points below:

For ERC-20-related models, use the Makefile targets for local end-to-end workflows:

make train-local        # Train on SQLMesh data
make evaluate-curated   # Evaluate on curated token set
make e2e-help           # Show all E2E training commands

For NFT-related models, run the CLI entry point directly:

cd token-api
poetry run python -m token_api.main <path-to-config.yaml>
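
Because runs are tracked with MLflow (see Key capabilities), metrics and artifacts can be queried back after training. The tracking URI below is an assumption; point MLflow at whatever your .env or training config specifies.

# Query tracked training runs. The tracking URI is an assumption;
# use whatever your .env / training config points MLflow at.
import mlflow

mlflow.set_tracking_uri("file:./mlruns")                  # assumed local store
runs = mlflow.search_runs(search_all_experiments=True)    # pandas DataFrame
print(runs[["run_id", "status", "start_time"]].head())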

Development

Run all checks before committing:

./run.sh pre-commit

Tooling:

Contributing

Contributions are welcome. Please ensure all checks pass via ./run.sh pre-commit before opening a pull request.
