Utilities for common analytics and machine learning workflows: Redshift, S3, Google Sheets, Slack, MLflow, model evaluation, and SQL pipelines.
The package is intentionally infrastructure-neutral. Buckets, credentials, MLflow hosts, and tokens are provided by your environment or by explicit arguments.
- DataConnector: run Redshift SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
- S3Connector: read, write, list, delete, and query S3 data with DuckDB.
- GSheet: read, write, share, and export Google Sheets data.
- SlackConnector: send messages, upload files, and manage simple Slack interactions.
- ModelManager: create MLflow experiments, log models, register versions, manage aliases, and handle permissions.
- model_tools: classification, regression, survival analysis, CatBoost helpers, plotting, and reporting utilities.
- utils: project-root discovery, SQL file loading, logging, credentials, and YAML SQL pipelines.

From PyPI, after a release is available:
uv add ml-analytics-tools

Directly from GitHub:

uv add git+https://github.com/sdaza/ml-analytics-tools

For local development:

uv sync --all-groups

The package loads a .env file from the project root when it is imported.
Only configure the services you use.
# Redshift
BI_REDSHIFT_HOST=redshift-cluster.example.com
BI_REDSHIFT_DB=analytics
BI_REDSHIFT_USER=analytics_user
BI_REDSHIFT_PASSWORD=secret
BI_REDSHIFT_PORT=5439
# S3
ML_ANALYTICS_S3_BUCKET=my-analytics-bucket
# MLflow
MLFLOW_TRACKING_URI=https://mlflow.example.com
MLFLOW_TRACKING_USERNAME=user@example.com
MLFLOW_TRACKING_PASSWORD=secret
# Google Sheets
GSHEET_SPREADSHEET_ID=optional-default-sheet-id
GOOGLE_CREDENTIALS='{"type":"service_account", ...}'
# Slack
SLACK_BOT_TOKEN=xoxb-your-token

S3 buckets are never hard-coded. Pass bucket=... or s3_bucket=..., or set ML_ANALYTICS_S3_BUCKET.
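
The lookup order described above (explicit argument first, then the environment) could be sketched roughly as follows. `resolve_bucket` is a hypothetical helper written for illustration, not a function exported by the package, and the error raised when nothing is configured is an assumption about reasonable behavior.

```python
import os


def resolve_bucket(bucket=None, env=None):
    """Hypothetical helper: choose an S3 bucket without any hard-coded default.

    An explicit argument wins; otherwise ML_ANALYTICS_S3_BUCKET is used.
    """
    # Allow a mapping to be injected so the function is easy to test.
    env = os.environ if env is None else env

    if bucket:
        return bucket

    from_env = env.get("ML_ANALYTICS_S3_BUCKET")
    if from_env:
        return from_env

    # Assumption: failing loudly is preferable to silently guessing a bucket.
    raise ValueError(
        "No S3 bucket configured: pass bucket=... or set ML_ANALYTICS_S3_BUCKET"
    )


if __name__ == "__main__":
    print(resolve_bucket("my-analytics-bucket"))
    print(resolve_bucket(env={"ML_ANALYTICS_S3_BUCKET": "bucket-from-env"}))
```

Keeping the environment lookup behind one small function like this makes it easy to see that no infrastructure-specific default can slip in.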
Use the CLI helper for AWS SSO:
ml-analytics-auth

You can also call it from Python:

from ml_analytics import ensure_aws_authenticated
ensure_aws_authenticated()

See AWS Authentication and CLI Commands for details.

from ml_analytics import DataConnector
dc = DataConnector()
df = dc.sql("SELECT * FROM analytics.customer_features LIMIT 100")
df_polars = dc.sql("queries/features.sql", format="polars", country="es")

dc.create_table_from_dataframe(
    df,
    table="model_scores",
    schema="analytics",
    drop_existing_table=True,
)

from ml_analytics import S3Connector
s3 = S3Connector(bucket="my-analytics-bucket", s3_root="projects/churn")
s3.save_dataframe(df, directory="outputs", file_name="scores")
summary = s3.query(
    """
    SELECT segment, count(*) AS rows
    FROM read_parquet('s3://my-analytics-bucket/projects/churn/outputs/*.parquet')
    GROUP BY segment
    """
)

from ml_analytics import GSheet
gsheet = GSheet(credentials_path="gsheet_credentials.json")
df = gsheet.read_sheet(spreadsheet_id="...", sheet_name="Input")
gsheet.write_sheet(df, spreadsheet_id="...", sheet_name="Results")

from ml_analytics import ModelManager
manager = ModelManager(model_name="churn-model", user="user@example.com")
manager.start_run("training")
manager.log_metric("auc", 0.91)
manager.end_run()

from ml_analytics import SlackConnector
slack = SlackConnector()
slack.send_message(channel="#ml-alerts", text="Training finished")

| Guide | Use It For |
|---|---|
| AWS Authentication | AWS SSO setup and Python helpers |
| CLI Commands | Available console commands |
| Google Sheets | Sheets setup, sharing, exports, and examples |
| Slack | Slack token setup and message/file examples |
| Tunnel Manager | SSH tunnel configuration and CLI usage |
Run the standard checks before opening a PR:
uv run ruff check
uv run pytest

CI runs Ruff and pytest on Python 3.11 and 3.12.

This repository uses Release Please. Conventional commits on main create or
update a release PR with the next version and changelog. When that PR is merged,
the release workflow builds the package and publishes it to PyPI through Trusted
Publishing using the pypi GitHub environment.
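
To make the link between commit messages and the next version concrete, here is a minimal sketch of the usual Conventional Commits mapping (`fix` to patch, `feat` to minor, `!` or a `BREAKING CHANGE:` footer to major). The functions `bump_for` and `next_version` are hypothetical and only illustrate the convention; the version Release Please actually proposes depends on its configuration, especially for pre-1.0 releases.

```python
import re

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")
ORDER = ["patch", "minor", "major"]


def bump_for(message):
    """Return 'major', 'minor', 'patch', or None for one commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not a conventional commit
    if match.group("bang") or "BREAKING CHANGE:" in message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return None  # e.g. docs, chore, test: no release by default


def next_version(current, messages):
    """Apply the largest bump found in `messages` to a MAJOR.MINOR.PATCH string."""
    bumps = [b for b in map(bump_for, messages) if b is not None]
    if not bumps:
        return current
    bump = max(bumps, key=ORDER.index)
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("1.2.3", ["fix: handle empty frames", "feat(s3): add list"]))
```

In practice this means the commit prefix is what drives the release PR, so a change merged as `chore:` will not, by default, trigger a new version on its own.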
Keep changes small, covered by tests when behavior changes, and free of environment-specific defaults. Prefer explicit configuration over hidden infrastructure assumptions.