InferCost

True cost intelligence for on-premises AI inference.

InferCost is a Kubernetes-native platform that makes on-premises AI inference costs fully attributable, from GPU hardware amortization through electricity to per-request token economics. It bridges the gap between infrastructure cost tools and AI observability platforms, combining hardware economics with token-level attribution in a way that neither category covers alone.

Website: infercost.ai | License: Apache 2.0

The Problem

The AI inference ecosystem has two categories of cost tools, and a gap between them:

Infrastructure cost tools track GPU-hours and resource allocation, but don't understand tokens, models, or inference workloads
AI observability platforms track tokens and requests, but treat on-prem infrastructure as free ($0 cost)

Neither category answers the question enterprises actually need answered: "What does it truly cost to run inference on our own hardware, and how does that compare to cloud APIs?"

InferCost sits at the intersection, combining hardware economics with token-level attribution to compute true cost-per-token for on-prem inference.

What InferCost Does

One controller pod. No database. No UI to host. Plugs into infrastructure you already run.

token_cost = (GPU_amortization + electricity x power_draw x PUE) / tokens_per_hour
  -> attributed per team and per user
    -> compared against what OpenAI, Anthropic, or Google would charge
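
As a rough illustration of the formula above, the sketch below computes an hourly cost and converts it to a cost per million tokens. It assumes straight-line amortization over 8760 hours per year and a throughput given in tokens per second; the function names and the input values are hypothetical and are not taken from the InferCost codebase.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def hourly_amortization(purchase_price_usd: float, amortization_years: float) -> float:
    """Straight-line amortization spread evenly over every hour of the period."""
    return purchase_price_usd / (amortization_years * HOURS_PER_YEAR)


def hourly_electricity(power_draw_watts: float, rate_per_kwh: float, pue: float) -> float:
    """Electricity cost per hour: kW drawn x $/kWh x PUE overhead."""
    return (power_draw_watts / 1000.0) * rate_per_kwh * pue


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Divide hourly cost by hourly token throughput, scaled to 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600
    if tokens_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only for illustration.
amort = hourly_amortization(purchase_price_usd=30_000, amortization_years=3)
elec = hourly_electricity(power_draw_watts=1000, rate_per_kwh=0.12, pue=1.4)
print(f"amortization ${amort:.4f}/h, electricity ${elec:.4f}/h")
print(f"${cost_per_million_tokens(amort + elec, tokens_per_second=100):.2f} per 1M tokens")
```

Maintenance is left out here because the formula above does not include it; if the controller folds it in, it would be one more hourly term in the numerator.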

Install: run helm install infercost infercost/infercost, then apply one CostProfile describing your hardware. Time to value: first cost data in under 5 minutes.

Features

True cost-per-token: Computed from hardware amortization, real-time GPU power draw (DCGM), and electricity rates
Cloud comparison: Shows what the same tokens would cost on OpenAI, Anthropic, or Google, with verified pricing and honest results (including when cloud is cheaper)
Per-team attribution: Costs broken down by Kubernetes namespace with zero configuration
Prometheus metrics: 12 gauges scrapeable by any monitoring tool, not locked to Grafana
REST API: Programmatic access to cost data for custom dashboards, bots, and integrations
CLI: infercost status and infercost compare for terminal-based cost analysis
Pre-built Grafana dashboard: Ships as JSON, auto-provisionable via sidecar
Multi-backend: Scrapes llama.cpp, vLLM, or any Prometheus-compatible inference engine

Quick Start

Prerequisites

Kubernetes cluster with GPU workloads
DCGM Exporter running (for GPU power metrics)
Inference pods exposing Prometheus metrics (llama.cpp --metrics, vLLM /metrics)

Install CRDs and Controller

# Install CRDs
kubectl apply -f https://raw.githubusercontent.com/defilantech/infercost/main/config/crd/bases/finops.infercost.ai_costprofiles.yaml
kubectl apply -f https://raw.githubusercontent.com/defilantech/infercost/main/config/crd/bases/finops.infercost.ai_usagereports.yaml

# Install the controller (assumes the infercost Helm repository is already configured)
helm install infercost infercost/infercost

Declare Your Hardware Economics

# costprofile.yaml
apiVersion: finops.infercost.ai/v1alpha1
kind: CostProfile
metadata:
  name: my-gpu-cluster
spec:
  hardware:
    gpuModel: "NVIDIA H100 SXM5"
    gpuCount: 8
    purchasePriceUSD: 280000
    amortizationYears: 3
    maintenancePercentPerYear: 0.10
  electricity:
    ratePerKWh: 0.12
    pueFactor: 1.4
  nodeSelector:
    kubernetes.io/hostname: gpu-node-01

kubectl apply -f costprofile.yaml

See Your Costs

$ kubectl get costprofiles
NAME             GPU MODEL           GPUs   $/HR    POWER (W)   AGE
my-gpu-cluster   NVIDIA H100 SXM5    8      $1.24   2400W       5m

CLI

$ infercost status

INFRASTRUCTURE COSTS
====================
PROFILE         GPU MODEL         GPUs  $/HOUR   AMORT    ELEC     POWER    AGE
my-gpu-cluster  NVIDIA H100 SXM5  8     $1.2400  $1.0700  $0.1700  2400W    5m

  my-gpu-cluster projected: $893/month, $10,862/year

INFERENCE MODELS
================
MODEL        NAMESPACE    INPUT TOKENS  OUTPUT TOKENS  TOKENS/SEC  STATUS
llama-70b    production   45.2M         12.8M          42.5        Active (3 req)

QUICK COMPARISON
================
  PROVIDER    MODEL              CLOUD COST   SAVINGS
  Anthropic   claude-opus-4-6    $832.00      $794 (95%)
  OpenAI      gpt-5.4            $523.00      $485 (93%)
  Google      gemini-2.5-pro     $312.00      $274 (88%)
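
The projected and savings figures in the sample output above can be checked with a few lines of arithmetic. This sketch assumes a 30-day month and a 365-day year, which happen to reproduce the sample numbers but are not confirmed as the constants InferCost uses. The on-prem cost of $38 is inferred from the sample rows (cloud cost minus savings), not read from the tool.

```python
def project(hourly_usd: float) -> tuple[float, float]:
    """Return (monthly, yearly) projections from an hourly rate.

    Assumes a 30-day month (720 h) and a 365-day year (8760 h); these
    reproduce the sample figures but are not confirmed InferCost constants.
    """
    return hourly_usd * 24 * 30, hourly_usd * 24 * 365


def savings(cloud_cost_usd: float, on_prem_cost_usd: float) -> tuple[float, float]:
    """Return (absolute savings, percent of cloud cost). Negative when cloud is cheaper."""
    saved = cloud_cost_usd - on_prem_cost_usd
    return saved, saved / cloud_cost_usd * 100


monthly, yearly = project(1.24)
print(f"${monthly:,.0f}/month, ${yearly:,.0f}/year")

on_prem = 38.00  # inferred from the sample rows (cloud cost minus savings)
for provider, cloud in [("Anthropic", 832.00), ("OpenAI", 523.00), ("Google", 312.00)]:
    saved, pct = savings(cloud, on_prem)
    print(f"{provider:<10} ${saved:,.0f} ({pct:.0f}%)")
```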

REST API

$ curl http://localhost:8092/api/v1/costs/current
{
  "profileName": "my-gpu-cluster",
  "gpuModel": "NVIDIA H100 SXM5",
  "hourlyCostUSD": 1.24,
  "powerDrawWatts": 2400,
  "monthlyProjectedUSD": 893.00
}

$ curl http://localhost:8092/api/v1/compare
{
  "comparisons": [...],
  "pricingSources": {
    "OpenAI": "https://developers.openai.com/api/docs/pricing",
    "Anthropic": "https://platform.claude.com/docs/en/about-claude/pricing"
  },
  "disclaimer": "Cloud pricing shown is list price..."
}
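
For programmatic access, a small client might look like the sketch below. It uses only the Python standard library and relies on the field names visible in the sample response above; the response is assumed to be a single JSON object, which may not hold when several CostProfiles exist. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8092"  # address used in the curl examples above


def parse_current_costs(body: str) -> dict:
    """Pick out the fields shown in the sample /api/v1/costs/current response."""
    data = json.loads(body)
    return {
        "profile": data["profileName"],
        "hourly_usd": float(data["hourlyCostUSD"]),
        "power_watts": float(data["powerDrawWatts"]),
        "monthly_usd": float(data["monthlyProjectedUSD"]),
    }


def fetch_current_costs(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET the current-cost endpoint and parse it."""
    with urlopen(f"{base_url}/api/v1/costs/current", timeout=timeout) as resp:
        return parse_current_costs(resp.read().decode("utf-8"))


# Offline demo using the sample response body from above.
sample = """{
  "profileName": "my-gpu-cluster",
  "gpuModel": "NVIDIA H100 SXM5",
  "hourlyCostUSD": 1.24,
  "powerDrawWatts": 2400,
  "monthlyProjectedUSD": 893.00
}"""
costs = parse_current_costs(sample)
print(f"{costs['profile']}: ${costs['hourly_usd']:.2f}/h at {costs['power_watts']:.0f} W")
```

Calling fetch_current_costs() performs the live request; the parsing step is kept separate so it can be exercised without a running controller.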

Grafana Dashboard

Import the pre-built dashboard from config/grafana/infercost-dashboard.json, or auto-provision it via the Grafana sidecar:

kubectl create configmap infercost-dashboard \
  --from-file=config/grafana/infercost-dashboard.json \
  -n monitoring
kubectl label configmap infercost-dashboard grafana_dashboard=1 -n monitoring

Architecture

InferCost runs as a single controller pod. It reads from data sources you already have, computes costs, and writes results to multiple output channels.

Data Sources (inputs):

Source	What it provides
DCGM Exporter	GPU power draw in watts (real-time)
llama.cpp / vLLM	Token counts per inference pod
CostProfile CRD	Hardware purchase price, amortization, electricity rate
LiteLLM PostgreSQL	Per-user token attribution (optional)
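
To make the power input concrete, the following sketch sums a per-GPU power gauge from a Prometheus text scrape. DCGM_FI_DEV_POWER_USAGE is the power gauge published by dcgm-exporter; whether InferCost reads exactly this series, and how it aggregates it, is an assumption. The parser is deliberately minimal and would mis-handle label values containing a closing brace.

```python
import re

# DCGM_FI_DEV_POWER_USAGE is the per-GPU power gauge (watts) published by
# dcgm-exporter; whether InferCost reads exactly this series is an assumption.
POWER_METRIC = "DCGM_FI_DEV_POWER_USAGE"

# Matches `name{labels} value` or `name value`, ignoring optional timestamps.
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def total_power_watts(exposition_text: str, metric: str = POWER_METRIC) -> float:
    """Sum one gauge across all label sets in Prometheus text exposition format."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks, HELP and TYPE lines
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == metric:
            total += float(m.group(3))
    return total


# Hypothetical scrape body with two GPUs on one node.
scrape = """\
# HELP DCGM_FI_DEV_POWER_USAGE Power draw (in W).
# TYPE DCGM_FI_DEV_POWER_USAGE gauge
DCGM_FI_DEV_POWER_USAGE{gpu="0",Hostname="gpu-node-01"} 301.5
DCGM_FI_DEV_POWER_USAGE{gpu="1",Hostname="gpu-node-01"} 298.5
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="gpu-node-01"} 54
"""
print(f"{total_power_watts(scrape):.1f} W")
```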

Controller Pipeline: GPU Power Scraper → Token Counter → Cost Calculator → Attribution Engine → Cloud Comparator → Report Writer

Outputs:

Output	Use case
Prometheus metrics	Any monitoring tool (Grafana, Datadog, New Relic, etc.)
REST API	Custom dashboards, bots, programmatic access
Grafana dashboard	Pre-built JSON, ships with the project
UsageReport CRDs	kubectl access, GitOps workflows

CRDs

CRD	Purpose
CostProfile	Declares hardware economics for a node or pool: GPU model, purchase price, amortization period, electricity rate, PUE factor
UsageReport	Auto-populated cost reports with per-model and per-namespace breakdowns, plus cloud comparison data
TokenBudget	Per-namespace spend limits with alert thresholds (coming soon)
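
One plausible way to produce the per-namespace breakdown is to split a period's cost in proportion to tokens served. The sketch below shows that approach under stated assumptions: the proportional rule, the function name, and the namespace figures are all hypothetical, and the actual attribution logic in InferCost may differ.

```python
def attribute_by_tokens(total_cost_usd: float, tokens_by_namespace: dict[str, int]) -> dict[str, float]:
    """Split a cost across namespaces in proportion to tokens served."""
    total_tokens = sum(tokens_by_namespace.values())
    if total_tokens == 0:
        # No traffic in the period: nothing to attribute in this simple model.
        return {ns: 0.0 for ns in tokens_by_namespace}
    return {
        ns: total_cost_usd * tokens / total_tokens
        for ns, tokens in tokens_by_namespace.items()
    }


# Hypothetical namespaces and token counts for one hour at $1.24/h.
shares = attribute_by_tokens(1.24, {"production": 750_000, "staging": 200_000, "research": 50_000})
for ns, usd in shares.items():
    print(f"{ns:<12} ${usd:.4f}")
```

A proportional split leaves idle-hour cost unassigned; a real allocator would need a policy for that, such as spreading it evenly or charging it to a shared bucket.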

Roadmap

Phase	Status	Capabilities
Observe	Live	Cost-per-token, GPU power tracking, efficiency metrics
Report	Live	Per-team/model attribution, cloud comparison, UsageReport CRDs
Alert	Coming Soon	Budget thresholds, anomaly detection via Alertmanager
Enforce	Planned	Rate-limit over-budget teams, graceful model degradation
Optimize	Planned	Model switching recommendations, scale-down scheduling
Comply	Planned	Audit log export (EU AI Act, SOC 2), FOCUS spec compatible

Cloud Pricing

InferCost ships with verified list prices for 9 models across OpenAI, Anthropic, and Google (last verified: 2026-03-21). Prices are configurable via config/pricing/cloud-pricing.yaml.

Cloud comparison is honest. When cloud is cheaper than on-prem, InferCost shows negative savings. This helps you make informed decisions about which workloads belong on-prem vs. cloud.
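
The negative-savings case can be illustrated with a short calculation. The sketch below prices a token volume at a per-million-token list price and subtracts the on-prem cost for the same period. The price used is a placeholder rather than any provider's real rate, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPrice:
    """Per-million-token list price in USD."""
    input_per_mtok: float
    output_per_mtok: float


def cloud_cost(price: ListPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of the same token volume at a provider's list price."""
    return (input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok) / 1_000_000


def compare(on_prem_usd: float, price: ListPrice, input_tokens: int, output_tokens: int) -> dict:
    """Return the cloud cost and the savings (negative when cloud is cheaper)."""
    cloud = cloud_cost(price, input_tokens, output_tokens)
    return {"cloud_usd": cloud, "savings_usd": cloud - on_prem_usd}


# Placeholder price, NOT a real provider rate.
example_price = ListPrice(input_per_mtok=1.00, output_per_mtok=4.00)

busy = compare(on_prem_usd=38.0, price=example_price, input_tokens=45_200_000, output_tokens=12_800_000)
idle = compare(on_prem_usd=38.0, price=example_price, input_tokens=2_000_000, output_tokens=500_000)
print(f"busy: cloud ${busy['cloud_usd']:.2f}, savings ${busy['savings_usd']:.2f}")
print(f"idle: cloud ${idle['cloud_usd']:.2f}, savings ${idle['savings_usd']:.2f}")
```

With the same fixed on-prem cost, the low-volume case comes out negative: hardware that sits mostly idle still accrues amortization, so light workloads can be cheaper in the cloud.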

Sources: openai.com/pricing | platform.claude.com/pricing | ai.google.dev/pricing

Standards Alignment

OpenTelemetry GenAI: Metric naming follows gen_ai.usage.* semantic conventions
FOCUS Spec: Compatible export format with x-Infer* extension columns for on-prem dimensions
OpenCost: Designed to complement OpenCost (infrastructure cost allocation + InferCost inference economics = full picture)
Kubernetes-native: CRDs, controller-runtime, Kubebuilder scaffolding

Development

make manifests       # Generate CRDs, RBAC
make generate        # Generate DeepCopy methods
make build           # Build controller + CLI
make test            # Unit tests (envtest)
make lint            # golangci-lint

See CONTRIBUTING.md for development setup, coding standards, and PR process.

Companion Project

InferCost works with any Kubernetes inference stack. It has first-class integration with LLMKube, a Kubernetes operator for deploying and managing local LLM inference with llama.cpp.

License

Apache License 2.0. See LICENSE.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md and sign off your commits (git commit -s) per the Developer Certificate of Origin.
