Efficient LLM inference on Slurm clusters.
A practical, multi-layered JSON repair library for Elixir that intelligently fixes malformed JSON strings commonly produced by LLMs, legacy systems, and data pipelines.
Multi-model AI agent runtime. Define agents in YAML, connect 6 LLM providers, orchestrate with ReAct/Plan&Execute/Fan-Out/Pipeline/Supervisor/Swarm patterns, and deploy as REST/WebSocket API with RAG, memory, MCP tools, guardrails, and OpenTelemetry observability.
PipelineLLM is a systematic study project on large language model (LLM) post-training, covering the full technical stack from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.
Standalone Aionis Lite: local single-user runtime, SQLite-backed memory, replay, and playbook-driven automation for coding agents.
Joule is a budget-controlled AI agent runtime that minimizes energy and token usage through hierarchical routing and deterministic tool execution.
Krako 2.0 – Energy-efficient, triadic multi-tier inference infrastructure enabling adaptive routing across heterogeneous edge–cloud nodes.
A browser-based UI for launching, monitoring, and managing multiple llama.cpp server instances from inside a Docker container. Includes an Ollama-compatible API proxy.
Full-featured Elixir client for the Model Context Protocol (MCP) with multi-transport support, resources, prompts, tools, and telemetry.
A lightweight Bun + Express template that connects to the Testune AI API and streams chat responses in real time using Server-Sent Events (SSE).
Local-first Python pipeline for analyzing Mixture-of-Experts (MoE) expert weight matrices from .safetensors and .npz, computing per-expert stats, and optionally simulating MLX quantization/dequantization error.
A Compute-Agnostic, WebSocket-first protocol for AI Agents. The high-performance alternative to MCP. Runs on Serverless or stateful servers with sub-30ms latency.
Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.
[Unstructured-data pipeline] The goal is automating the path from raw data to extraction of specific information. First example: collect arbitrary documents and visualize them as a mind map (progress: 1/3).
Usable alpha: a Python runtime for multi-team agent orchestration, session continuity, and backend-flexible execution.
Self-hosted model switcher for agents with profile-based provider accounts, encrypted credentials, OpenAI-compatible proxying, and per-agent manual model selection.
A high-fidelity hybrid tokenizer for Malayalam combining FST-based linguistic rigor with Bi-LSTM-CRF neural intuition. Optimized for morphological integrity and vocabulary efficiency in LLM pre-training.
A production-ready, enterprise-grade Agentic RAG ingestion pipeline built with n8n, Supabase (pgvector), and AI embeddings. Implements event-driven orchestration, hybrid RAG for structured and unstructured data, vector similarity search, and multi-tenant architecture to deliver client-isolated, retrieval-ready knowledge bases.
Docs-first blueprint for a standalone, OpenAI-compatible AI chat backend with a verified H0-H5 architecture roadmap extracted from MicroPhoenix.