Efficient LLM inference on Slurm clusters.
A practical, multi-layered JSON repair library for Elixir that intelligently fixes malformed JSON strings commonly produced by LLMs, legacy systems, and data pipelines.
Multi-model AI agent runtime. Define agents in YAML, connect 6 LLM providers, orchestrate with ReAct/Plan&Execute/Fan-Out/Pipeline/Supervisor/Swarm patterns, and deploy as REST/WebSocket API with RAG, memory, MCP tools, guardrails, and OpenTelemetry observability.
PipelineLLM is a systematic study project on large language model (LLM) post-training, covering the full technical stack from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.
Standalone Aionis Lite: local single-user runtime, SQLite-backed memory, replay, and playbook-driven automation for coding agents.
Joule is a budget-controlled AI agent runtime that minimizes energy and token usage through hierarchical routing and deterministic tool execution.
Krako 2.0 – Energy-efficient, triadic multi-tier inference infrastructure enabling adaptive routing across heterogeneous edge–cloud nodes.
A browser-based UI for launching, monitoring, and managing multiple llama.cpp server instances from inside a Docker container. Includes an Ollama-compatible API proxy.
Full-featured Elixir client for the Model Context Protocol (MCP) with multi-transport support, resources, prompts, tools, and telemetry.
A lightweight Bun + Express template that connects to the Testune AI API and streams chat responses in real time using Server-Sent Events (SSE).
Local-first Python pipeline for analyzing Mixture-of-Experts (MoE) expert weight matrices from .safetensors and .npz, computing per-expert stats, and optionally simulating MLX quantization/dequantization error.
A Compute-Agnostic, WebSocket-first protocol for AI Agents. The high-performance alternative to MCP. Runs on Serverless or stateful servers with sub-30ms latency.
Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.
[Unstructured-data pipeline] The goal is automating the path from raw data to extraction of specific information. First example: collect arbitrary documents and visualize them as a mind map (progress: 1/3).
Usable alpha: a Python runtime for multi-team agent orchestration, session continuity, and backend-flexible execution.
Self-hosted model switcher for agents with profile-based provider accounts, encrypted credentials, OpenAI-compatible proxying, and per-agent manual model selection.
A high-fidelity hybrid tokenizer for Malayalam combining FST-based linguistic rigor with Bi-LSTM-CRF neural intuition. Optimized for morphological integrity and vocabulary efficiency in LLM pre-training.
A production-ready, enterprise-grade Agentic RAG ingestion pipeline built with n8n, Supabase (pgvector), and AI embeddings. Implements event-driven orchestration, hybrid RAG for structured and unstructured data, vector similarity search, and multi-tenant architecture to deliver client-isolated, retrieval-ready knowledge bases.
Docs-first blueprint for a standalone, OpenAI-compatible AI chat backend with a verified H0-H5 architecture roadmap extracted from MicroPhoenix.