From 085e80c04cd83e3db440ba0303442b9bfb90ac4d Mon Sep 17 00:00:00 2001
From: fcakyon
Date: Sun, 9 Nov 2025 22:32:49 +0300
Subject: [PATCH] docs: restructure README with visual design and separate
 documentation

Transform README into visual-first design with logo, emojis, and performance
benchmarks while moving detailed content into dedicated documentation files.

- Add centered logo and visual styling to README header
- Include performance comparison table (nsfw-detector-mini vs Azure AI vs Falconsai)
- Reduce README from 221 to 120 lines by moving content to docs/
- Create comprehensive documentation structure:
  - docs/INSTALLATION.md: Detailed installation options (pip, uv, source)
  - docs/CLI.md: Complete CLI usage guide with examples
  - docs/API.md: Python API reference and advanced usage
  - docs/FAQ.md: Common questions and answers
  - docs/TROUBLESHOOTING.md: Issue resolution guide
- Add navigation links from README to separate documentation files
- Update .gitignore to exclude .DS_Store files
---
 .gitignore              |   5 +-
 README.md               | 213 ++++++++++++----------------------------
 docs/API.md             | 111 +++++++++++++++++++++
 docs/CLI.md             |  63 ++++++++++++
 docs/FAQ.md             |  53 ++++++++++
 docs/INSTALLATION.md    |  74 ++++++++++++++
 docs/TROUBLESHOOTING.md |  92 +++++++++++++++++
 7 files changed, 459 insertions(+), 152 deletions(-)
 create mode 100644 docs/API.md
 create mode 100644 docs/CLI.md
 create mode 100644 docs/FAQ.md
 create mode 100644 docs/INSTALLATION.md
 create mode 100644 docs/TROUBLESHOOTING.md

diff --git a/.gitignore b/.gitignore
index 74bd445..5cfcea5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -213,4 +213,7 @@ uv.lock
 .mcp.json
 
 # vscode
-.vscode/
\ No newline at end of file
+.vscode/
+
+# macos
+.DS_Store
\ No newline at end of file
diff --git a/README.md b/README.md
index 21a6eb4..f3604e3 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,6 @@
+
+  Moderators Logo
+
 # Moderators
 
 [![Moderators PYPI](https://img.shields.io/pypi/v/moderators?color=blue)](https://pypi.org/project/moderators/)
@@ -5,49 +8,42 @@
 [![Moderators CI](https://github.com/viddexa/moderators/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/viddexa/moderators/actions/workflows/ci.yml)
 [![Moderators License](https://img.shields.io/pypi/l/moderators)](https://github.com/viddexa/moderators/blob/main/LICENSE)
 
-Run open‑source content moderation models (NSFW, toxicity, etc.) with one line — from Python or the CLI. Works with Hugging Face models or local folders. Outputs are normalized and app‑ready.
+Run open‑source content moderation models (NSFW, nudity, etc.) with one line — from Python or the CLI.
+
+
+
+## ✨ Key Highlights
 
 - One simple API and CLI
 - Use any compatible Transformers model from the Hub or disk
 - Normalized JSON output you can plug into your app
 - Optional auto‑install of dependencies for a smooth first run
 
-Note: Today we ship a Transformers-based integration for image/text classification.
+## 🚀 Performance
 
+NSFW image detection performance of `nsfw-detector-mini` compared with [Azure Content Safety AI](https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety) and [Falconsai](https://huggingface.co/Falconsai/nsfw_image_detection).
 
-## Who is this for?
-Developers and researchers/academics who want to quickly evaluate or deploy moderation models without wiring different runtimes or dealing with model‑specific output formats.
+**F_safe** and **F_nsfw** below are class-wise F1 scores for the safe and nsfw classes, respectively. Results show that `nsfw-detector-mini` outperforms both Azure AI and Falconsai with fewer parameters.
 
+| Model | F_safe | F_nsfw | Params |
+| ------------------------------------------------------------------------------------ | ---------: | ---------: | ------: |
+| [nsfw-detector-nano](https://huggingface.co/viddexa/nsfw-detection-nano) | 96.91% | 96.87% | 4M |
+| **[nsfw-detector-mini](https://huggingface.co/viddexa/nsfw-detector-mini)** | **97.90%** | **97.89%** | **17M** |
+| [Azure AI](https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety) | 96.79% | 96.57% | N/A |
+| [Falconsai](https://huggingface.co/Falconsai/nsfw_image_detection) | 89.52% | 89.32% | 85M |
 
-## Installation
-Pick one option:
+## 📦 Installation
 
-Using pip (recommended):
 ```bash
 pip install moderators
 ```
 
-Using uv:
-```bash
-uv venv --python 3.10
-source .venv/bin/activate
-uv add moderators
-```
-
-From source (cloned repo):
-```bash
-uv sync --extra transformers
-```
-
-Requirements:
-- Python 3.10+
-- For image tasks, Pillow and a DL framework (PyTorch preferred). Moderators can auto‑install these.
+For detailed installation options, see the [Installation Guide](docs/INSTALLATION.md).
+
+## 🚀 Quickstart
 
-## Quickstart
-Run a model in a few lines.
+**Python API:**
 
-Python API:
 ```python
 from moderators import AutoModerator
 
@@ -59,163 +55,78 @@ result = moderator("/path/to/image.jpg")
 print(result)
 ```
 
-CLI:
+**CLI:**
+
 ```bash
+# Image classification
 moderators viddexa/nsfw-detector-mini /path/to/image.jpg
-```
 
-Text example (sentiment/toxicity):
-```bash
+# Text classification
 moderators distilbert/distilbert-base-uncased-finetuned-sst-2-english "I love this!"
 ```
 
+## 📊 Real Output Example
 
-## What do results look like?
-You get a list of normalized prediction entries. In Python, they’re dataclasses; in the CLI, you get JSON.
+![Example input image](https://img.freepik.com/free-photo/front-view-woman-doing-exercises_23-2148498678.jpg?t=st=1760435237~exp=1760438837~hmac=9a0a0a56f83d8fa52f424c7acdf4174dffc3e4d542e189398981a13af3f82b40&w=360)
 
-Python shape (pretty-printed):
-```text
-[
-  PredictionResult(
-    source_path='',
-    classifications={'NSFW': 0.9821},
-    detections=[],
-    raw_output={'label': 'NSFW', 'score': 0.9821}
-  ),
-  ...
-]
-```
+Moderators normalized JSON output:
 
-JSON shape (CLI output):
 ```json
 [
   {
     "source_path": "",
-    "classifications": {"NSFW": 0.9821},
+    "classifications": { "safe": 0.9999891519546509 },
     "detections": [],
-    "raw_output": {"label": "NSFW", "score": 0.9821}
+    "raw_output": { "label": "safe", "score": 0.9999891519546509 }
+  },
+  {
+    "source_path": "",
+    "classifications": { "nsfw": 0.000010843970812857151 },
+    "detections": [],
+    "raw_output": { "label": "nsfw", "score": 0.000010843970812857151 }
   }
 ]
 ```
 
-Tip (Python):
-```python
-from dataclasses import asdict
-from moderators import AutoModerator
+## 🔍 Comparison at a Glance
 
-moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
-result = moderator("/path/to/image.jpg")
-json_ready = [asdict(r) for r in result]
-print(json_ready)
-```
 
+| Feature | Transformers.pipeline() | Moderators |
+| ------------------- | ----------------------------- | ---------------------------------------------------------- |
+| Usage | `pipeline("task", model=...)` | `AutoModerator.from_pretrained(...)` |
+| Model configuration | Manual or model-specific | Automatic via `config.json` (task inference when possible) |
+| Output format | Varies by model/pipe | Standardized `PredictionResult` / JSON |
+| Requirements | Manual dependency setup | Optional automatic `pip/uv` install |
+| CLI | None or project-specific | Built-in `moderators` CLI (JSON to stdout) |
+| Extensibility | Mostly one ecosystem | Open to new integrations (same interface) |
+| Error messages | Vary by model | Consistent, task/integration-guided |
+| Task detection | User-provided | Auto-inferred from config when possible |
 
+## 🎯 Pick a Model
 
-## Example: Real output on a sample image
-Image source:
+- **From the Hub**: Pass a model ID like `viddexa/nsfw-detector-mini` or any compatible Transformers model
+- **From disk**: Pass a local folder that contains a `config.json` next to your weights
 
-![Example input image](https://img.freepik.com/free-photo/front-view-woman-doing-exercises_23-2148498678.jpg?t=st=1760435237~exp=1760438837~hmac=9a0a0a56f83d8fa52f424c7acdf4174dffc3e4d542e189398981a13af3f82b40&w=360)
+Moderators detects the task and integration from the config when possible, so you don't have to specify pipelines manually.
 
-Raw model scores:
-```json
-[
-  { "normal": 0.9999891519546509 },
-  { "nsfw": 0.000010843970812857151 }
-]
-```
+## 📚 Documentation
 
-Moderators normalized JSON shape:
-```json
-[
-  { "source_path": "", "classifications": {"normal": 0.9999891519546509}, "detections": [], "raw_output": {"label": "normal", "score": 0.9999891519546509} },
-  { "source_path": "", "classifications": {"nsfw": 0.000010843970812857151}, "detections": [], "raw_output": {"label": "nsfw", "score": 0.000010843970812857151} }
-]
-```
+- [Installation Guide](docs/INSTALLATION.md) - Detailed installation options and requirements
+- [CLI Reference](docs/CLI.md) - Complete command-line usage guide
+- [API Documentation](docs/API.md) - Python API reference and output formats
+- [FAQ](docs/FAQ.md) - Frequently asked questions
+- [Troubleshooting](docs/TROUBLESHOOTING.md) - Common issues and solutions
 
+## 📝 Examples
 
-## Comparison at a glance
-The table below places Moderators next to the raw Transformers `pipeline()` usage.
+Small demos and benchmarking script: `examples/README.md`, `examples/benchmarks.py`
 
-| Feature | Transformers.pipeline() | Moderators |
-|---|---|---|
-| Usage | `pipeline("task", model=...)` | `AutoModerator.from_pretrained(...)` |
-| Model configuration | Manual or model-specific | Automatic via `config.json` (task inference when possible) |
-| Output format | Varies by model/pipe | Standardized `PredictionResult` / JSON |
-| Requirements | Manual dependency setup | Optional automatic `pip/uv` install |
-| CLI | None or project-specific | Built-in `moderators` CLI (JSON to stdout) |
-| Extensibility | Mostly one ecosystem | Open to new integrations (same interface) |
-| Error messages | Vary by model | Consistent, task/integration-guided |
-| Task detection | User-provided | Auto-inferred from config when possible |
 
+## 🗺️ Roadmap
 
-
-## Pick a model
-- From the Hub: pass a model id like `viddexa/nsfw-detector-mini` or any compatible Transformers model.
-- From disk: pass a local folder that contains a `config.json` next to your weights.
-
-Moderators detects the task and integration from the config when possible, so you don’t have to specify pipelines manually.
-
-
-## Command line usage
-Run models from your terminal and get normalized JSON to stdout.
-
-Usage:
-```bash
-moderators [--local-files-only]
-```
-
-Examples:
-- Text classification:
-  ```bash
-  moderators distilbert/distilbert-base-uncased-finetuned-sst-2-english "I love this!"
-  ```
-- Image classification (local image):
-  ```bash
-  moderators viddexa/nsfw-detector-mini /path/to/image.jpg
-  ```
-
-Tips:
-- `--local-files-only` forces offline usage if files are cached.
-- The CLI prints a single JSON array (easy to pipe or parse).
-
-
-## Examples
-- Small demos and benchmarking script: `examples/README.md`, `examples/benchmarks.py`
-
-
-## FAQ
-- Which tasks are supported?
-  - Image and text classification via Transformers (e.g., NSFW, sentiment/toxicity). More can be added over time.
-- Does it need a GPU?
-  - No. CPU is fine for small models. If your framework has CUDA installed, it will use it.
-- How are dependencies handled?
-  - If something is missing (e.g., `torch`, `transformers`, `Pillow`), Moderators can auto‑install via `uv` or `pip` unless you disable it. To disable:
-  ```bash
-  export MODERATORS_DISABLE_AUTO_INSTALL=1
-  ```
-- Can I run offline?
-  - Yes. Use `--local-files-only` in the CLI or `local_files_only=True` in Python after you have the model cached.
-- What does “normalized output” mean?
-  - Regardless of the underlying pipeline, you always get the same result schema (classifications/detections/raw_output), so your app code stays simple.
-
-
-## Roadmap
-What’s planned:
 - Ultralytics integration (YOLO family) via `UltralyticsModerator`
 - Optional ONNX Runtime backend where applicable
 - Simple backend switch (API/CLI flag, e.g., `--backend onnx|torch`)
 - Expanded benchmarks: latency, throughput, memory on common tasks
-- Documentation and examples to help you pick the right option
-
-
-## Troubleshooting
-- ImportError (PIL/torch/transformers):
-  - Install the package (`pip install moderators`) or let auto‑install run (ensure `MODERATORS_DISABLE_AUTO_INSTALL` is unset). If you prefer manual dependency control, install extras: `pip install "moderators[transformers]"`.
-- OSError: couldn’t find `config.json` / model files:
-  - Check your model id or local folder path; ensure `config.json` is present.
-- HTTP errors when pulling from the Hub:
-  - Verify connectivity and auth (if private). Use offline mode if already cached.
-- GPU not used:
-  - Ensure your framework is installed with CUDA support.
 
+## 📄 License
 
-## License
-Apache-2.0. See `LICENSE`.
+Apache-2.0. See [LICENSE](LICENSE).
diff --git a/docs/API.md b/docs/API.md
new file mode 100644
index 0000000..1a432d4
--- /dev/null
+++ b/docs/API.md
@@ -0,0 +1,111 @@
+# API Reference
+
+## Output Format
+
+Moderators provides normalized, consistent output regardless of the underlying model or framework.
+
+### Python API
+
+Results are returned as a list of `PredictionResult` dataclass instances:
+
+```python
+[
+    PredictionResult(
+        source_path='',
+        classifications={'NSFW': 0.9821},
+        detections=[],
+        raw_output={'label': 'NSFW', 'score': 0.9821}
+    ),
+    ...
+]
+```
+
+### JSON Format (CLI)
+
+The CLI outputs the same structure as JSON:
+
+```json
+[
+  {
+    "source_path": "",
+    "classifications": { "NSFW": 0.9821 },
+    "detections": [],
+    "raw_output": { "label": "NSFW", "score": 0.9821 }
+  }
+]
+```
+
+## Converting Python Results to JSON
+
+Use `dataclasses.asdict()` to convert Python results to JSON-ready dictionaries:
+
+```python
+from dataclasses import asdict
+from moderators import AutoModerator
+
+moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
+result = moderator("/path/to/image.jpg")
+json_ready = [asdict(r) for r in result]
+print(json_ready)
+```
+
+## PredictionResult Fields
+
+- **`source_path`** (str): Path to the input file, or an empty string for text/direct input
+- **`classifications`** (dict): Normalized classification results as `{label: score}` pairs
+- **`detections`** (list): Object detection results (empty for classification tasks)
+- **`raw_output`** (dict): Original model output for reference
+
+## AutoModerator API
+
+### Loading Models
+
+**From Hugging Face Hub:**
+
+```python
+from moderators import AutoModerator
+
+moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
+```
+
+**From local directory:**
+
+```python
+moderator = AutoModerator.from_pretrained("/path/to/model")
+```
+
+**With offline mode:**
+
+```python
+moderator = AutoModerator.from_pretrained("model-id", local_files_only=True)
+```
+
+### Running Inference
+
+**Image input:**
+
+```python
+result = moderator("/path/to/image.jpg")
+```
+
+**Text input:**
+
+```python
+result = moderator("Text to classify")
+```
+
+## Task Detection
+
+Moderators automatically detects the task type from the model's `config.json` when possible, so you don't need to specify the task manually.
+
+Supported tasks:
+
+- Image classification (e.g., NSFW detection)
+- Text classification (e.g., sentiment analysis, toxicity detection)
+
+## Model Selection
+
+- **From the Hub**: Pass a model ID like `viddexa/nsfw-detector-mini` or any compatible Transformers model
+- **From disk**: Pass a local folder that contains a `config.json` next to your model weights
+
+The system automatically infers the task and integration from the config when possible.
diff --git a/docs/CLI.md b/docs/CLI.md
new file mode 100644
index 0000000..b8eb5da
--- /dev/null
+++ b/docs/CLI.md
@@ -0,0 +1,63 @@
+# Command Line Reference
+
+Run moderation models from your terminal and get normalized JSON output to stdout.
+
+## Usage
+
+```bash
+moderators <model_id> <input> [--local-files-only]
+```
+
+### Arguments
+
+- `<model_id>`: Hugging Face model ID (e.g., `viddexa/nsfw-detector-mini`) or path to a local model directory
+- `<input>`: Input data, either a file path (for images) or a text string (for text models)
+- `--local-files-only` (optional): Force offline mode using cached files only
+
+## Examples
+
+### Text Classification
+
+```bash
+moderators distilbert/distilbert-base-uncased-finetuned-sst-2-english "I love this!"
+``` + +### Image Classification + +```bash +moderators viddexa/nsfw-detector-mini /path/to/image.jpg +``` + +### Offline Mode + +```bash +moderators viddexa/nsfw-detector-mini /path/to/image.jpg --local-files-only +``` + +## Output Format + +The CLI prints a JSON array to stdout, making it easy to pipe or parse: + +```json +[ + { + "source_path": "", + "classifications": { "NSFW": 0.9821 }, + "detections": [], + "raw_output": { "label": "NSFW", "score": 0.9821 } + } +] +``` + +## Tips + +- The output is a single JSON array per execution +- Use `--local-files-only` to ensure no network requests are made +- Pipe output to `jq` for advanced JSON processing: + ```bash + moderators viddexa/nsfw-detector-mini image.jpg | jq '.[0].classifications' + ``` +- Redirect output to a file for batch processing: + ```bash + moderators viddexa/nsfw-detector-mini image.jpg > results.json + ``` diff --git a/docs/FAQ.md b/docs/FAQ.md new file mode 100644 index 0000000..14612a2 --- /dev/null +++ b/docs/FAQ.md @@ -0,0 +1,53 @@ +# Frequently Asked Questions + +## Which tasks are supported? + +Image and text classification via Transformers (e.g., NSFW detection, sentiment/toxicity analysis). More tasks can be added over time. + +## Does it need a GPU? + +No. CPU is fine for small models. If your framework has CUDA installed, it will automatically use GPU acceleration. + +## How are dependencies handled? + +If something is missing (e.g., `torch`, `transformers`, `Pillow`), Moderators can auto-install via `uv` or `pip` unless you disable it. + +To disable auto-installation: + +```bash +export MODERATORS_DISABLE_AUTO_INSTALL=1 +``` + +For manual dependency control: + +```bash +pip install "moderators[transformers]" +``` + +## Can I run offline? + +Yes. Use `--local-files-only` in the CLI or `local_files_only=True` in Python after you have the model cached. 
+
+**CLI:**
+
+```bash
+moderators model-id input.jpg --local-files-only
+```
+
+**Python:**
+
+```python
+moderator = AutoModerator.from_pretrained("model-id", local_files_only=True)
+```
+
+## What does "normalized output" mean?
+
+Regardless of the underlying pipeline, you always get the same result schema (`PredictionResult` with classifications/detections/raw_output), so your application code stays simple and consistent across different models.
+
+## Can I use my own custom models?
+
+Yes! As long as your model has a `config.json` file and is compatible with Transformers, you can use it with Moderators. Just point to the model directory or Hugging Face model ID.
+
+## How do I contribute or request features?
+
+Check out the [GitHub repository](https://github.com/viddexa/moderators) to open issues or submit pull requests. Feature requests and contributions are welcome!
diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md
new file mode 100644
index 0000000..1d50d15
--- /dev/null
+++ b/docs/INSTALLATION.md
@@ -0,0 +1,74 @@
+# Installation Guide
+
+## Installation Options
+
+Choose the method that works best for your workflow:
+
+### Using pip (recommended)
+
+```bash
+pip install moderators
+```
+
+### Using uv
+
+```bash
+uv venv --python 3.10
+source .venv/bin/activate
+uv add moderators
+```
+
+### From source (cloned repo)
+
+```bash
+uv sync --extra transformers
+```
+
+## Requirements
+
+- **Python**: 3.10+
+- **For image tasks**: Pillow and a deep learning framework (PyTorch preferred)
+  - Moderators can auto-install these dependencies when needed
+
+## Dependency Auto-Installation
+
+If something is missing (e.g., `torch`, `transformers`, `Pillow`), Moderators can automatically install it via `uv` or `pip` unless you disable this feature.
+
+To disable auto-installation:
+
+```bash
+export MODERATORS_DISABLE_AUTO_INSTALL=1
+```
+
+## Manual Dependency Control
+
+If you prefer to manage dependencies manually, install with extras:
+
+```bash
+pip install "moderators[transformers]"
+```
+
+## Offline Mode
+
+After caching models, you can run completely offline:
+
+**CLI:**
+
+```bash
+moderators <model_id> <input> --local-files-only
+```
+
+**Python API:**
+
+```python
+moderator = AutoModerator.from_pretrained("model-id", local_files_only=True)
+```
+
+## GPU Support
+
+Moderators works on CPU by default. If your deep learning framework (e.g., PyTorch) is installed with CUDA support, GPU acceleration will be used automatically.
+
+To ensure GPU usage:
+
+- Install PyTorch with CUDA support following the [PyTorch installation guide](https://pytorch.org/get-started/locally/)
+- Verify CUDA availability in your environment
diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md
new file mode 100644
index 0000000..64d7690
--- /dev/null
+++ b/docs/TROUBLESHOOTING.md
@@ -0,0 +1,92 @@
+# Troubleshooting Guide
+
+## ImportError (PIL/torch/transformers)
+
+**Problem**: Missing dependencies when trying to run Moderators.
+
+**Solution**:
+
+- Install the package: `pip install moderators`
+- Let auto-install run (ensure `MODERATORS_DISABLE_AUTO_INSTALL` is unset)
+- For manual control: `pip install "moderators[transformers]"`
+
+## OSError: couldn't find `config.json` / model files
+
+**Problem**: Model configuration or files not found.
+
+**Solution**:
+
+- Check your model ID or local folder path
+- Ensure `config.json` is present in the model directory
+- For Hugging Face models, verify the model ID is correct
+- Try downloading the model first to verify it exists:
+  ```python
+  from transformers import AutoConfig
+  AutoConfig.from_pretrained("your-model-id")
+  ```
+
+## HTTP errors when pulling from the Hub
+
+**Problem**: Network errors or authentication failures when downloading models.
+
+**Solution**:
+
+- Verify internet connectivity
+- For private models, ensure you're authenticated:
+  ```bash
+  huggingface-cli login
+  ```
+- Use offline mode if the model is already cached:
+  ```bash
+  moderators model-id input.jpg --local-files-only
+  ```
+
+## GPU not used
+
+**Problem**: Model running on CPU despite having a GPU available.
+
+**Solution**:
+
+- Ensure your framework is installed with CUDA support
+- For PyTorch, reinstall with CUDA:
+  ```bash
+  pip install torch --index-url https://download.pytorch.org/whl/cu118
+  ```
+- Verify CUDA availability:
+  ```python
+  import torch
+  print(torch.cuda.is_available())
+  ```
+
+## Model inference is slow
+
+**Problem**: Inference taking longer than expected.
+
+**Suggestions**:
+
+- Use GPU acceleration (see "GPU not used" above)
+- Try smaller models (e.g., `nsfw-detector-nano` instead of larger variants)
+- Consider batch processing for multiple inputs
+- Check whether auto-installation is downloading dependencies (first run only)
+
+## Output format unexpected
+
+**Problem**: Results don't match the expected format.
+
+**Solution**:
+
+- Check the API documentation for the correct output schema
+- Use `asdict()` to convert Python results to dictionaries:
+  ```python
+  from dataclasses import asdict
+  json_ready = [asdict(r) for r in result]
+  ```
+- Verify you're using the correct input type (image path vs. text string)
+
+## Need More Help?
+
+If you're still experiencing issues:
+
+- Check the [GitHub Issues](https://github.com/viddexa/moderators/issues)
+- Review the examples in the `examples/` folder
+- Open a new issue with details about your environment and error messages
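Editor's note on the troubleshooting doc added above: when asking users to "open a new issue with details about your environment", a copy-pasteable snippet lowers the reporting barrier. A minimal sketch that could be added to the guide (the package list is illustrative, not Moderators' official dependency set; `PIL` is the import name of the Pillow distribution):

```python
# Print versions of the packages Moderators commonly relies on, so the
# output can be pasted into a GitHub issue. Missing packages are noted
# rather than raising, since several of them are optional extras.
import importlib
import platform

print("python", platform.python_version())
for name in ("moderators", "transformers", "torch", "PIL"):
    try:
        module = importlib.import_module(name)
        print(name, getattr(module, "__version__", "unknown"))
    except ImportError:
        print(name, "not installed")
```

The `getattr` fallback keeps the script working for modules that do not expose `__version__`, and catching `ImportError` means the report still completes on a machine where only the base package is installed.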