A research framework for detecting malicious tokens (ERC-20, ERC-721, ERC-1155) on Ethereum using machine learning.
token-api is a factory- and config-driven platform for building and evaluating token classification models. It supports multiple model architectures, data sources, and training strategies that can be composed via YAML configuration files without code changes.
Key capabilities:
- Multiple model architectures -- graph neural networks, XGBoost, logistic regression, with easy extension to new models
- Factory-driven design -- extend with new models, data sources, and training strategies via `ModelFactory`, `DataModelFactory`, and `FeatureFactory`
- Config-driven pipelines -- swap model, data, and training parameters through YAML configs
- Rust-accelerated feature extraction -- high-performance data extraction and feature computation with Python bindings (PyO3/Maturin)
- SQLMesh data pipeline -- reproducible data transformations backed by DuckDB and HuggingFace
- MLflow experiment tracking -- track training runs, metrics, and model artifacts
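
To make the tracking capability concrete, here is a minimal sketch using the standard MLflow client API; the experiment name, parameters, and metric values are placeholders, and the project's actual integration (inside the trainers) may be wired differently:

```python
import mlflow

# Point MLflow at a local tracking directory (an assumption; the project
# may be configured for a remote tracking server instead).
mlflow.set_tracking_uri("file:./mlruns")
mlflow.set_experiment("token-classification")  # experiment name is a placeholder

with mlflow.start_run(run_name="xgboost-baseline"):
    # Parameters and metrics below are placeholder values.
    mlflow.log_param("model", "xgboost")
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("val_f1", 0.91)
    # Model artifacts can be attached to the run as files:
    # mlflow.log_artifact("model.json")
```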
Project layout:

```
.
├── token-api/                    # Python package (Poetry)
│   ├── src/token_api/            # Core library
│   │   ├── models/               # Model implementations & ModelFactory
│   │   ├── data/                 # Data models, configs & DataModelFactory
│   │   ├── trainers/             # Training orchestration (Trainer)
│   │   ├── evaluator/            # Evaluation, metrics & MetricFactory
│   │   ├── scripts/              # Data processing utilities
│   │   └── main.py               # CLI entry point
│   ├── src/assets/configs/       # Model, data, training & pipeline YAML configs
│   ├── configs/                  # Top-level training configurations
│   ├── token_api_data/           # SQLMesh data pipeline (DuckDB + HuggingFace)
│   ├── notebooks/                # Research & analysis notebooks
│   ├── book/                     # Jupyter Book documentation
│   ├── tests/                    # Test suite
│   └── Makefile                  # SQLMesh & training Make targets
├── crates/tokenscout-dataset/    # Rust crate: high-perf data and feature extraction (PyO3)
├── docs/                         # Extended project documentation
├── .env.example                  # Environment variable template
└── run.sh                        # Root setup & build commands
```
Supported models:

| Model | Description |
|---|---|
| TokenScout (Staged Pipeline) | Multi-phase GNN: graph embedding, refinement, and classification on transfer graphs |
| TokenFlow GNN | Graph neural network operating on token transfer flow patterns |
| XGBoost | Binary classification for NFT/token spam using on-chain features (grid search, k-fold CV) |
| Logistic Regression | Text-based classification using token metadata (name, symbol, description) |
All models extend `BaseModel` and are registered through `ModelFactory`. New models can be added by implementing the base class and registering a config.
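
As an illustration, here is a minimal sketch of adding a model; the import path, `register` hook, and `fit`/`predict` interface are assumptions about the factory pattern, not the package's actual signatures:

```python
from token_api.models import BaseModel, ModelFactory  # assumed import path


@ModelFactory.register("majority_baseline")  # assumed registration hook
class MajorityBaseline(BaseModel):
    """Toy model that always predicts the training set's majority class."""

    def fit(self, X, y):
        # Remember the most frequent label seen during training.
        labels = list(y)
        self._majority = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        # Emit the stored majority label for every input sample.
        return [self._majority for _ in X]
```

The intent of the factory pattern is that, once registered, the model can be selected by name from a YAML config without touching pipeline code.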
Prerequisites:

- Python 3.13+
- Poetry
- Rust toolchain (for the feature extraction crate)
- Maturin (for building Python bindings)
Setup:

```sh
chmod +x run.sh
./run.sh permissions      # Make sub-scripts executable
./run.sh dev              # Install Python dependencies via Poetry
./run.sh rust-build       # Build the Rust crate (release mode)
./run.sh rust-bindings    # Build and install Python bindings into the Poetry venv
```

Copy `.env.example` to `.env` and fill in the required values.
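
For example, from the repository root:

```sh
cp .env.example .env
# then edit .env and fill in the required values
```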
Run checks and tests:

```sh
./run.sh pre-commit    # Run all code checks and tests
./run.sh rust-test     # Run Rust crate tests
```

The SQLMesh pipeline lives in `token-api/token_api_data/` and uses DuckDB as its local engine. Key Make targets (run from `token-api/`):
```sh
make sqlmesh-init      # First-time setup: download from HuggingFace, run full plan
make sqlmesh-resume    # Incremental fetch + save back to HuggingFace
make sqlmesh-plan      # Run SQLMesh plan with auto-apply
make sqlmesh-run       # Incremental update (no backfill)
make sqlmesh-ui        # Start the SQLMesh web UI
make sqlmesh-clean     # Remove local DuckDB database and data files
```

Training is config-driven. Configs in `token-api/src/assets/configs/trainings/` define model, data, and training parameters. Run training via the entry points below:
For ERC-20-related models, use the Makefile targets for local end-to-end workflows:

```sh
make train-local         # Train on SQLMesh data
make evaluate-curated    # Evaluate on curated token set
make e2e-help            # Show all E2E training commands
```

For NFT-related models, invoke the CLI entry point directly:

```sh
cd token-api
poetry run python -m token_api.main <path-to-config.yaml>
```
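
For orientation, here is a hypothetical training config; the key names below are illustrative assumptions, not the actual schema (consult the files under `token-api/src/assets/configs/trainings/`):

```yaml
# Hypothetical training config -- key names are illustrative, not the real schema.
model:
  name: xgboost            # model registered with ModelFactory
  params:
    max_depth: 6
    n_estimators: 300
data:
  source: sqlmesh          # pull features from the SQLMesh/DuckDB pipeline
  features: [transfer_count, unique_holders, creator_age_days]
training:
  cv_folds: 5              # k-fold cross-validation, as used by the XGBoost model
  grid_search: true
```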
Run all checks before committing:

```sh
./run.sh pre-commit
```

Tooling:
- `black` -- code formatting
- `pyright` -- static type checking
- `flake8-bugbear` -- bug detection
- `docformatter` -- docstring formatting
- `pytest` + `testmon` -- testing with change detection
Contributions are welcome. Please ensure all checks pass via `./run.sh pre-commit` before opening a pull request.