Defilan Technologies (@defilantech)

All Things Automation

Open-source AI infrastructure for teams that need to own their stack.

We're a software company in Washington State building tools that make self-hosted AI practical on Kubernetes. Our work is open source, Apache 2.0 licensed, and designed for production use.


Our Projects

LLMKube — Kubernetes Operator for LLM Inference

A Kubernetes operator that turns LLM deployment into a two-line YAML problem. Define a Model and an InferenceService, and the operator handles the rest: downloading, caching, GPU scheduling, health checks, and exposing an OpenAI-compatible API.
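Such a deployment might be sketched roughly as follows. This is a hypothetical illustration only: the resource kinds (Model, InferenceService) come from the description above, but the API group, version, and field names are placeholders, not the operator's actual schema; see the documentation for the real spec.

```yaml
# Hypothetical sketch -- apiVersion, spec fields, and values are placeholders.
apiVersion: llmkube.example/v1alpha1
kind: Model
metadata:
  name: llama-3-8b
spec:
  source: hf://meta-llama/Meta-Llama-3-8B   # hypothetical model-source field
---
apiVersion: llmkube.example/v1alpha1
kind: InferenceService
metadata:
  name: llama-3-8b-svc
spec:
  modelRef: llama-3-8b   # hypothetical reference to the Model above
```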

  • Heterogeneous GPU support: NVIDIA CUDA and Apple Silicon Metal in the same cluster
  • OpenAI-compatible API: Drop-in replacement, works with LangChain, LlamaIndex, any OpenAI SDK
  • Full observability: Prometheus metrics, OpenTelemetry tracing, Grafana dashboards
  • Air-gap ready: Built for environments where cloud APIs aren't an option

Website · Documentation · Install via Homebrew


InferCost — Cost Intelligence for On-Prem AI Inference

A Kubernetes-native platform that computes the true cost of running AI on your own hardware. InferCost combines GPU hardware amortization, real-time electricity costs, and token-level attribution to answer the question no other tool can: "What does inference actually cost us, and how does that compare to cloud APIs?"

  • True cost-per-token: Computed from hardware amortization, DCGM power draw, and electricity rates
  • Cloud comparison: Verified pricing across OpenAI, Anthropic, and Google, including when cloud is cheaper
  • Per-team attribution: Costs broken down by Kubernetes namespace with zero configuration
  • Multiple surfaces: Prometheus metrics, REST API, CLI, and a pre-built Grafana dashboard
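The cost model described above can be illustrated with a small sketch. The formula here (amortized hardware cost per hour, plus electricity from average power draw, divided by token throughput) is an assumption inferred from the bullet points, not InferCost's exact implementation:

```go
package main

import "fmt"

// costPerToken is a hypothetical illustration of the cost model sketched
// in the text: hardware amortization plus electricity, per token served.
func costPerToken(gpuPriceUSD, amortizationHours, avgPowerWatts, electricityUSDPerKWh, tokensPerHour float64) float64 {
	hardwarePerHour := gpuPriceUSD / amortizationHours            // amortized GPU cost per hour
	energyPerHour := (avgPowerWatts / 1000.0) * electricityUSDPerKWh // power draw converted to $/hour
	return (hardwarePerHour + energyPerHour) / tokensPerHour
}

func main() {
	// Example numbers (made up): a $10,000 GPU amortized over 3 years
	// (26,280 hours), drawing 300 W at $0.12/kWh, serving 500k tokens/hour.
	fmt.Printf("$%.8f per token\n", costPerToken(10000, 26280, 300, 0.12, 500000))
	// prints $0.00000083 per token
}
```

In practice InferCost derives the power-draw input from DCGM rather than a hard-coded average, which is what makes the per-token figure reflect real utilization.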

Website · Documentation · Install via Homebrew


How They Work Together

LLMKube and InferCost are independent projects that complement each other. LLMKube deploys and manages your inference workloads. InferCost tracks what those workloads cost. Together, they give platform teams full control over both the deployment and the economics of self-hosted AI.

InferCost works with any Kubernetes inference stack, not just LLMKube.


How We Work

Everything we build is open source first. We believe the best infrastructure software gets built in the open, with input from the people who actually use it.

We welcome contributions at every level, from filing issues and improving docs to adding new features. If you're interested in Kubernetes, GPU orchestration, AI FinOps, or LLM infrastructure, we'd love to work with you.

LLMKube: Issues · Discussions · Contributing

InferCost: Issues · Contributing


Get in Touch

Star LLMKube · Star InferCost · Join the Discussion

Popular repositories

  1. LLMKube

     Kubernetes operator for GPU-accelerated LLM inference - air-gapped, edge-native, production-ready

     Go · 40 stars · 6 forks

  2. infercost

     Kubernetes-native cost intelligence for on-premises AI inference. Computes true cost-per-token from GPU amortization, electricity, and real power draw.

     Go · 2 stars

  3. .github

  4. homebrew-tap

     Homebrew tap for LLMKube

     Ruby

  5. issueparser

     LLM-powered GitHub issue theme analyzer. Scan repositories for issues and use AI to identify common pain points, recurring themes, and actionable insights.

     Go

  6. vecsmith

     Open-source text-to-SVG generation pipeline. Turn text prompts into production-ready vector graphics using LLM prompt enhancement, Flux image generation, and vtracer vectorization.

     Python
