ML Analytics Tools

Requires Python 3.11+. License: MIT.

Utilities for common analytics and machine learning workflows: Redshift, S3, Google Sheets, Slack, MLflow, model evaluation, and SQL pipelines.

The package is intentionally infrastructure-neutral. Buckets, credentials, MLflow hosts, and tokens are provided by your environment or by explicit arguments.

What Is Included

  • DataConnector: run Redshift SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
  • S3Connector: read, write, list, and delete S3 objects, and query S3 data with DuckDB.
  • GSheet: read, write, share, and export Google Sheets data.
  • SlackConnector: send messages, upload files, and manage simple Slack interactions.
  • ModelManager: create MLflow experiments, log models, register versions, manage aliases, and handle permissions.
  • model_tools: classification, regression, survival analysis, CatBoost helpers, plotting, and reporting utilities.
  • utils: project-root discovery, SQL file loading, logging, credentials, and YAML SQL pipelines.

Install

From PyPI, after a release is available:

uv add ml-analytics-tools

Directly from GitHub:

uv add git+https://github.com/sdaza/ml-analytics-tools

For local development:

uv sync --all-groups

Configuration

The package loads a .env file from the project root when it is imported. Only configure the services you use.

# Redshift
BI_REDSHIFT_HOST=redshift-cluster.example.com
BI_REDSHIFT_DB=analytics
BI_REDSHIFT_USER=analytics_user
BI_REDSHIFT_PASSWORD=secret
BI_REDSHIFT_PORT=5439

# S3
ML_ANALYTICS_S3_BUCKET=my-analytics-bucket

# MLflow
MLFLOW_TRACKING_URI=https://mlflow.example.com
MLFLOW_TRACKING_USERNAME=user@example.com
MLFLOW_TRACKING_PASSWORD=secret

# Google Sheets
GSHEET_SPREADSHEET_ID=optional-default-sheet-id
GOOGLE_CREDENTIALS='{"type":"service_account", ...}'

# Slack
SLACK_BOT_TOKEN=xoxb-your-token
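
Because only some services need to be configured, it can help to check which variables are missing before using a connector. The sketch below is not part of the package: missing_vars is a hypothetical helper, and the grouping of variables into "required" sets is an assumption based on the example .env above.

```python
import os

# Variable names come from the example .env; which ones are strictly
# required by each connector is an assumption made for this sketch.
REQUIRED_VARS = {
    "redshift": [
        "BI_REDSHIFT_HOST",
        "BI_REDSHIFT_DB",
        "BI_REDSHIFT_USER",
        "BI_REDSHIFT_PASSWORD",
    ],
    "mlflow": ["MLFLOW_TRACKING_URI"],
    "slack": ["SLACK_BOT_TOKEN"],
}


def missing_vars(service, env=None):
    """Return the variables for `service` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[service] if not env.get(name)]
```

Calling missing_vars("redshift") with no second argument checks the real process environment; passing a dict makes the helper easy to test.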

S3 buckets are never hard-coded. Pass bucket=... or s3_bucket=..., or set ML_ANALYTICS_S3_BUCKET.
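
The lookup described above can be pictured as a small resolution function. This is a sketch only: resolve_bucket is a hypothetical name, and the precedence (explicit argument first, environment variable second) is an assumption rather than documented package behavior.

```python
import os

ENV_VAR = "ML_ANALYTICS_S3_BUCKET"  # name taken from the configuration section


def resolve_bucket(bucket=None, env=None):
    """Hypothetical helper: prefer the explicit argument, then the environment."""
    env = os.environ if env is None else env
    resolved = bucket or env.get(ENV_VAR)
    if not resolved:
        # No fallback bucket exists, so fail loudly instead of guessing.
        raise ValueError(f"No S3 bucket given: pass bucket=... or set {ENV_VAR}")
    return resolved
```

Raising an error when nothing is configured matches the stated goal of having no hard-coded bucket.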

AWS Authentication

Use the CLI helper for AWS SSO:

ml-analytics-auth

You can also call it from Python:

from ml_analytics import ensure_aws_authenticated

ensure_aws_authenticated()

See the AWS Authentication and CLI Commands guides for details.

Quick Examples

Query Redshift

from ml_analytics import DataConnector

dc = DataConnector()

df = dc.sql("SELECT * FROM analytics.customer_features LIMIT 100")
df_polars = dc.sql("queries/features.sql", format="polars", country="es")
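
The second call passes country="es" alongside a SQL file, which suggests the file contains a placeholder that is filled in before execution. The placeholder syntax the package expects is not stated here, so the sketch below only illustrates the general idea with the standard library's string.Template and a $country placeholder; render_sql is a hypothetical helper, not the package's implementation.

```python
import tempfile
from pathlib import Path
from string import Template


def render_sql(path, **params):
    """Read a SQL file and fill in $name placeholders (illustrative syntax)."""
    text = Path(path).read_text(encoding="utf-8")
    # substitute() raises KeyError when a placeholder has no matching argument.
    return Template(text).substitute(**params)


with tempfile.TemporaryDirectory() as tmp:
    sql_file = Path(tmp) / "features.sql"
    sql_file.write_text(
        "SELECT * FROM analytics.customer_features WHERE country = '$country'",
        encoding="utf-8",
    )
    rendered = render_sql(sql_file, country="es")
    print(rendered)
```

Plain text substitution like this is not safe for untrusted input; it is only suitable for values you control.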

Create A Redshift Table From A DataFrame

dc.create_table_from_dataframe(
    df,
    table="model_scores",
    schema="analytics",
    drop_existing_table=True,
)

Work With S3

from ml_analytics import S3Connector

s3 = S3Connector(bucket="my-analytics-bucket", s3_root="projects/churn")

s3.save_dataframe(df, directory="outputs", file_name="scores")

summary = s3.query(
    """
    SELECT segment, count(*) AS rows
    FROM read_parquet('s3://my-analytics-bucket/projects/churn/outputs/*.parquet')
    GROUP BY segment
    """
)
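
The path inside read_parquet repeats the bucket, root, and directory used earlier in the example. A small helper can build it instead of hard-coding the string. This is a sketch: s3_glob is a hypothetical function, and it assumes that saved files land under s3_root/directory as Parquet, which the example path suggests but does not confirm.

```python
import posixpath


def s3_glob(bucket, *parts, pattern="*.parquet"):
    """Join key segments into an s3:// glob, ignoring stray slashes."""
    key = posixpath.join(*(p.strip("/") for p in parts if p), pattern)
    return f"s3://{bucket}/{key}"


path = s3_glob("my-analytics-bucket", "projects/churn", "outputs")
query = (
    "SELECT segment, count(*) AS rows "
    f"FROM read_parquet('{path}') "
    "GROUP BY segment"
)
```

The resulting query string can then be passed to s3.query(...) as in the example above.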

Read And Write Google Sheets

from ml_analytics import GSheet

gsheet = GSheet(credentials_path="gsheet_credentials.json")

df = gsheet.read_sheet(spreadsheet_id="...", sheet_name="Input")
gsheet.write_sheet(df, spreadsheet_id="...", sheet_name="Results")

Log To MLflow

from ml_analytics import ModelManager

manager = ModelManager(model_name="churn-model", user="user@example.com")

manager.start_run("training")
manager.log_metric("auc", 0.91)
manager.end_run()
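
If training raises between start_run and end_run, the run may be left open. One way to guard against that is a small context manager around the three calls shown above. This is a sketch under assumptions: mlflow_run is a hypothetical wrapper, it relies only on the start_run, log_metric, and end_run methods from the example, and FakeManager is a stand-in so the snippet runs without an MLflow server.

```python
from contextlib import contextmanager


@contextmanager
def mlflow_run(manager, run_name):
    """Start a run and always end it, even if the body raises."""
    manager.start_run(run_name)
    try:
        yield manager
    finally:
        manager.end_run()


class FakeManager:
    """Stand-in that records calls; mirrors only the methods shown above."""

    def __init__(self):
        self.calls = []

    def start_run(self, name):
        self.calls.append(("start_run", name))

    def log_metric(self, key, value):
        self.calls.append(("log_metric", key, value))

    def end_run(self):
        self.calls.append(("end_run",))


fake = FakeManager()
try:
    with mlflow_run(fake, "training") as run:
        run.log_metric("auc", 0.91)
        raise RuntimeError("training failed")
except RuntimeError:
    pass  # end_run has still been called at this point
```

With a real ModelManager instance, the same wrapper would be used as `with mlflow_run(manager, "training"): ...`.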

Send A Slack Message

from ml_analytics import SlackConnector

slack = SlackConnector()
slack.send_message(channel="#ml-alerts", text="Training finished")
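
A common pattern is to report both success and failure of a job. The sketch below wraps any callable and uses only the send_message(channel=..., text=...) call from the example; run_and_notify is a hypothetical helper and FakeSlack is a stand-in so the snippet runs without a Slack token.

```python
def run_and_notify(slack, channel, job, name="Training"):
    """Run `job()` and report the outcome through slack.send_message."""
    try:
        result = job()
    except Exception as exc:
        slack.send_message(channel=channel, text=f"{name} failed: {exc}")
        raise  # keep the original error visible to the caller
    slack.send_message(channel=channel, text=f"{name} finished")
    return result


class FakeSlack:
    """Stand-in that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send_message(self, channel, text):
        self.sent.append((channel, text))


fake_slack = FakeSlack()
score = run_and_notify(fake_slack, "#ml-alerts", lambda: 0.91)
```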

Detailed Guides

Guide               Use It For
AWS Authentication  AWS SSO setup and Python helpers
CLI Commands        Available console commands
Google Sheets       Sheets setup, sharing, exports, and examples
Slack               Slack token setup and message/file examples
Tunnel Manager      SSH tunnel configuration and CLI usage

Development

Run the standard checks before opening a PR:

uv run ruff check
uv run pytest

CI runs Ruff and pytest on Python 3.11 and 3.12.

Releases

This repository uses Release Please. Conventional commits on main create or update a release PR with the next version and changelog. When that PR is merged, the release workflow builds the package and publishes it to PyPI through Trusted Publishing using the pypi GitHub environment.
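
To make the link between commit messages and the next version concrete, the sketch below classifies Conventional Commit headers into a version bump. It is a simplified illustration, not Release Please's actual logic (which has more rules and is configurable): bump_for and next_bump are hypothetical names, and only the common cases are covered (fix gives a patch, feat gives a minor, and a "!" marker or BREAKING CHANGE footer gives a major).

```python
import re

# Matches headers such as "feat: ...", "fix(s3): ...", or "feat!: ...".
HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(\((?P<scope>[^)]+)\))?(?P<bang>!)?: .+")


def bump_for(message):
    """Return 'major', 'minor', 'patch', or None for one commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None
    breaking = bool(match.group("bang")) or any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
        for line in lines[1:]
    )
    if breaking:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match.group("type").lower())


def next_bump(messages):
    """Return the largest bump implied by a list of commit messages."""
    order = {None: 0, "patch": 1, "minor": 2, "major": 3}
    return max((bump_for(m) for m in messages), key=order.__getitem__, default=None)
```

Commits with other types, such as docs or chore, return None here, meaning they would not trigger a version change on their own in this simplified model.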

Contributing

Keep changes small, covered by tests when behavior changes, and free of environment-specific defaults. Prefer explicit configuration over hidden infrastructure assumptions.
