
Add HuggingFace Hub Python SDK integration for chat completion, text generation, and embedding instrumentation #278

@AbhiPrasad

Description

Summary

The HuggingFace Hub Python SDK (huggingface_hub) is the official Python client for the Hugging Face platform. Its InferenceClient provides execution APIs for chat completions, text generation, embeddings, and more, with support for 20+ inference providers (including HF Inference, AWS, Google, Together, Fireworks, Groq, etc.). This repository has zero instrumentation for any HuggingFace Inference SDK surface — no integration, no wrapper, no patcher, no auto-instrumentation config. Users who call the HuggingFace Inference SDK directly get no Braintrust spans.

Unlike providers such as Groq or Together AI that can be partially traced through the OpenAI wrapper (since they expose OpenAI-compatible endpoints), the HuggingFace InferenceClient has its own native API surface with distinct methods, request/response schemas, and a built-in multi-provider routing layer. While the SDK exposes an OpenAI-compatible alias (client.chat.completions.create), this is just a thin proxy to client.chat_completion() and does not produce an OpenAI client instance — wrapOpenAI() cannot be used with it.

What needs to be instrumented

The huggingface_hub package (latest: v1.10.2) exposes these execution surfaces via InferenceClient and AsyncInferenceClient, none of which are instrumented:

Core LLM methods (highest priority)

| SDK Method | Description | Streaming | Return type |
| --- | --- | --- | --- |
| `client.chat_completion()` | Chat completions with conversation history, tool use, structured output | `stream=True` returns `Iterable[ChatCompletionStreamOutput]` | `ChatCompletionOutput` |
| `client.text_generation()` | Text generation from prompts | `stream=True` returns `Iterable[TextGenerationStreamOutput]` | `str`, or `TextGenerationOutput` when `details=True` |

OpenAI compatibility alias: client.chat.completions.create() is an alias for client.chat_completion() — instrumenting chat_completion covers both.
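A hypothetical patcher sketch (not the Braintrust API) illustrating why one patch covers both call paths: wrapping `chat_completion` at the class level means the alias, which delegates to it, is traced for free. The `FakeClient`-style class and `on_call` hook below are stand-ins, not real SDK types.

```python
import functools


def patch_chat_completion(cls, on_call):
    """Replace cls.chat_completion with a wrapper that reports each call.

    Because the OpenAI-compatible alias delegates to chat_completion(),
    patching the underlying method instruments both entry points.
    """
    original = cls.chat_completion

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        on_call(kwargs)  # e.g. open a span and record request metadata
        return original(self, *args, **kwargs)

    cls.chat_completion = wrapper
```

Calling through the alias still hits the wrapper, since the alias resolves `self.chat_completion` at call time.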

Response shapes are OpenAI-like for chat: ChatCompletionOutput has choices, usage (prompt_tokens, completion_tokens, total_tokens), model, id — standard span metrics extraction should be straightforward.
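A minimal sketch of that metrics extraction, assuming only the OpenAI-like shape described above (a `usage` attribute with the three token counts); the metric key names are illustrative, not a confirmed Braintrust schema.

```python
def extract_chat_metrics(response):
    """Pull token-usage span metrics from an OpenAI-like chat response."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return {}  # no usage block: nothing to report
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "tokens": usage.total_tokens,
    }
```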

text_generation is HuggingFace-specific: Returns a plain str by default. When details=True, returns TextGenerationOutput with generated_tokens count but no prompt token count. Token usage extraction will require special handling.
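The special-case handling might look like the following sketch. It assumes the generated-token count lives at `result.details.generated_tokens` (as described above); there is no prompt token count to report in either case.

```python
def extract_text_generation_metrics(result):
    """Handle text_generation's two return shapes.

    A plain str (the default) carries no usage information. With
    details=True, the output exposes a generated-token count but no
    prompt token count.
    """
    if isinstance(result, str):
        return {}
    details = getattr(result, "details", None)
    generated = getattr(details, "generated_tokens", None)
    if generated is None:
        return {}
    return {"completion_tokens": generated}
```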

Embedding methods

| SDK Method | Description | Return type |
| --- | --- | --- |
| `client.feature_extraction()` | Text embeddings / feature vectors | `np.ndarray` |
| `client.sentence_similarity()` | Semantic similarity between sentences | `list[float]` |

Additional generation methods (lower priority, follow-up candidates)

| SDK Method | Description | Return type |
| --- | --- | --- |
| `client.text_to_image()` | Image generation from text prompts | `PIL.Image` |
| `client.text_to_speech()` | Text-to-speech audio synthesis | `bytes` |
| `client.summarization()` | Text summarization | `SummarizationOutput` |
| `client.translation()` | Text translation | `TranslationOutput` |

All methods above also have async variants on AsyncInferenceClient with identical signatures.
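Because the async variants share signatures with their sync counterparts, a single wrapper factory per method pair should suffice. A sketch under that assumption, where `make_span` is a hypothetical hook standing in for Braintrust span creation:

```python
import asyncio
import functools


def wrap_async_method(fn, make_span):
    """Wrap an async SDK method so a span brackets each await."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        span = make_span()
        try:
            return await fn(*args, **kwargs)
        finally:
            span.end()  # close the span even if the call raises
    return wrapper
```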

Implementation notes

Multi-provider architecture: The InferenceClient constructor accepts a provider parameter that routes requests to one of 22 inference providers:
black-forest-labs, cerebras, clarifai, cohere, fal-ai, featherless-ai, fireworks-ai, groq, hf-inference, hyperbolic, nebius, novita, nscale, openai, ovhcloud, publicai, replicate, sambanova, scaleway, together, wavespeed, zai-org.

The selected provider should be captured in span metadata when available.

Streaming: Both chat_completion and text_generation support streaming via stream=True. The integration must handle streaming span lifecycle (start span on call, accumulate chunks, finalize on stream exhaustion).
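That lifecycle can be sketched as a pass-through generator: chunks are yielded unchanged while being accumulated, and the span is logged and closed only when the stream is exhausted (or the consumer stops early). The `span` object here is hypothetical, with `.log()` and `.end()` hooks only.

```python
def trace_stream(chunks, span):
    """Yield chunks through unchanged; finalize the span on exhaustion."""
    collected = []
    try:
        for chunk in chunks:
            collected.append(chunk)
            yield chunk
    finally:
        # Runs on normal exhaustion, early close, or error,
        # so the span is never left open.
        span.log(output=collected)
        span.end()
```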

chat_completion parameters relevant for span metadata: model, temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, tools, response_format, stop.
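A sketch of collecting that metadata from the call's kwargs. The key list mirrors the parameters above; `provider` is set on the client constructor rather than per call, so it is passed in separately here. This is illustrative, not a confirmed Braintrust metadata schema.

```python
# Parameters worth copying into span metadata, per the list above.
_METADATA_KEYS = (
    "model", "temperature", "max_tokens", "top_p", "frequency_penalty",
    "presence_penalty", "seed", "tools", "response_format", "stop",
)


def build_span_metadata(kwargs, provider=None):
    """Filter call kwargs down to span metadata, plus the routed provider."""
    metadata = {
        k: kwargs[k] for k in _METADATA_KEYS if kwargs.get(k) is not None
    }
    if provider is not None:
        metadata["provider"] = provider
    return metadata
```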

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/huggingface/)
  • No wrapper function (e.g. wrap_huggingface())
  • No patcher in any existing integration
  • No nox test session (test_huggingface)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py

A case-insensitive grep for `huggingface`, `hugging.face`, or `hugging_face` across py/src/braintrust/ returns zero matches.

Braintrust docs status

Not found: HuggingFace is not listed on the Braintrust tracing guide or in the integrations directory.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no huggingface/ directory exists on main
  • py/src/braintrust/wrappers/ — no HuggingFace wrapper
  • py/noxfile.py — no test_huggingface session
  • py/src/braintrust/integrations/__init__.py — HuggingFace not listed in integration registry
  • py/src/braintrust/integrations/versioning.py — no HuggingFace version matrix
  • Full repo grep for "huggingface", "hugging.face", "hugging_face" — zero matches
