## Summary
The HuggingFace Hub Python SDK (`huggingface_hub`) is the official Python client for the Hugging Face platform. Its `InferenceClient` provides execution APIs for chat completions, text generation, embeddings, and more, with support for 20+ inference providers (including HF Inference, AWS, Google, Together, Fireworks, Groq, etc.). This repository has zero instrumentation for any HuggingFace Inference SDK surface — no integration, no wrapper, no patcher, no auto-instrumentation config. Users who call the HuggingFace Inference SDK directly get no Braintrust spans.
Unlike providers such as Groq or Together AI that can be partially traced through the OpenAI wrapper (since they expose OpenAI-compatible endpoints), the HuggingFace `InferenceClient` has its own native API surface with distinct methods, request/response schemas, and a built-in multi-provider routing layer. While the SDK exposes an OpenAI-compatible alias (`client.chat.completions.create`), this is just a thin proxy to `client.chat_completion()` and does not produce an OpenAI client instance — `wrap_openai()` cannot be used with it.
## What needs to be instrumented

The `huggingface_hub` package (latest: v1.10.2) exposes these execution surfaces via `InferenceClient` and `AsyncInferenceClient`, none of which are instrumented:
### Core LLM methods (highest priority)

| SDK Method | Description | Streaming | Return type |
| --- | --- | --- | --- |
| `client.chat_completion()` | Chat completions with conversation history, tool use, structured output | `stream=True` returns `Iterable[ChatCompletionStreamOutput]` | `ChatCompletionOutput` |
| `client.text_generation()` | Text generation from prompts | `stream=True` returns `Iterable[TextGenerationStreamOutput]` | `str` or `TextGenerationOutput` (when `details=True`) |
**OpenAI compatibility alias:** `client.chat.completions.create()` is an alias for `client.chat_completion()` — instrumenting `chat_completion` covers both.

**Response shapes are OpenAI-like for chat:** `ChatCompletionOutput` has `choices`, `usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`), `model`, `id` — standard span metrics extraction should be straightforward.

**`text_generation` is HuggingFace-specific:** Returns a plain `str` by default. When `details=True`, returns `TextGenerationOutput` with a `generated_tokens` count but no prompt token count. Token usage extraction will require special handling.
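Because `chat_completion` carries an OpenAI-like `usage` block while `text_generation(details=True)` reports only the generated side, metrics extraction could normalize the two shapes into one dict. The sketch below is a hypothetical helper (the name `extract_metrics` is not an existing Braintrust API), and plain namespace objects stand in for the SDK's dataclasses so it runs standalone:

```python
# Hypothetical sketch: normalize token metrics from the two HuggingFace
# return shapes into one span-metrics dict. SimpleNamespace objects stand
# in for the SDK's ChatCompletionOutput / TextGenerationOutput dataclasses.
from types import SimpleNamespace
from typing import Any, Dict


def extract_metrics(result: Any) -> Dict[str, int]:
    """Pull token counts from a chat_completion or text_generation result."""
    metrics: Dict[str, int] = {}
    usage = getattr(result, "usage", None)
    if usage is not None:  # chat_completion: OpenAI-like usage block
        metrics["prompt_tokens"] = usage.prompt_tokens
        metrics["completion_tokens"] = usage.completion_tokens
        metrics["tokens"] = usage.total_tokens
    else:
        # text_generation(details=True): only the generated side is reported,
        # so prompt_tokens is intentionally omitted rather than guessed.
        details = getattr(result, "details", None)
        if details is not None:
            metrics["completion_tokens"] = details.generated_tokens
    return metrics


# Simulated results standing in for the SDK dataclasses:
chat_result = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)
)
tg_result = SimpleNamespace(usage=None, details=SimpleNamespace(generated_tokens=17))

print(extract_metrics(chat_result))  # {'prompt_tokens': 12, 'completion_tokens': 30, 'tokens': 42}
print(extract_metrics(tg_result))    # {'completion_tokens': 17}
```

Leaving `prompt_tokens` absent (rather than zero) for the `text_generation` path keeps the metric honest for downstream aggregation.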
### Embedding methods

| SDK Method | Description | Return type |
| --- | --- | --- |
| `client.feature_extraction()` | Text embeddings / feature vectors | `np.ndarray` |
| `client.sentence_similarity()` | Semantic similarity between sentences | `list[float]` |
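Since `feature_extraction` returns a raw `np.ndarray`, logging the full vector as span output would be heavy; a summary of shape and dtype is usually enough. A minimal sketch, using duck typing so it covers both embedding return types without importing numpy (the helper name is hypothetical):

```python
# Hypothetical sketch: summarize an embedding result for span output instead
# of logging the raw vector. Duck typing covers both np.ndarray
# (feature_extraction) and list[float] (sentence_similarity).
from typing import Any, Dict


def summarize_embedding_output(result: Any) -> Dict[str, Any]:
    shape = getattr(result, "shape", None)
    if shape is not None:  # ndarray-like: record shape/dtype, not the values
        return {"type": "ndarray", "shape": list(shape), "dtype": str(result.dtype)}
    if isinstance(result, list):  # list[float] from sentence_similarity
        return {"type": "list", "length": len(result)}
    return {"type": type(result).__name__}


print(summarize_embedding_output([0.12, 0.87, 0.33]))
# {'type': 'list', 'length': 3}
```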
### Additional generation methods (lower priority, follow-up candidates)

| SDK Method | Description | Return type |
| --- | --- | --- |
| `client.text_to_image()` | Image generation from text prompts | `PIL.Image` |
| `client.text_to_speech()` | Text-to-speech audio synthesis | `bytes` |
| `client.summarization()` | Text summarization | `SummarizationOutput` |
| `client.translation()` | Text translation | `TranslationOutput` |
All methods above also have async variants on `AsyncInferenceClient` with identical signatures.
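Because the async variants mirror the sync signatures, one patcher can cover both clients by branching on whether the target is a coroutine function. A minimal sketch, with a plain event list standing in for Braintrust's real span API and stub functions standing in for the SDK methods:

```python
# Hypothetical sketch of one patcher covering both InferenceClient and
# AsyncInferenceClient: identical signatures mean a single helper can wrap
# either the sync or the async variant of a method.
import asyncio
import functools
import inspect


def trace_method(fn, record):
    """Wrap a sync or async method, recording start/end events around it."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            record.append(("start", fn.__name__))
            result = await fn(*args, **kwargs)
            record.append(("end", fn.__name__))
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        record.append(("start", fn.__name__))
        result = fn(*args, **kwargs)
        record.append(("end", fn.__name__))
        return result
    return sync_wrapper


# Stub stand-ins for the SDK methods:
def chat_completion(messages):
    return "sync-ok"

async def async_chat_completion(messages):
    return "async-ok"

events = []
assert trace_method(chat_completion, events)([]) == "sync-ok"
assert asyncio.run(trace_method(async_chat_completion, events)([])) == "async-ok"
print(events)
```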
## Implementation notes
**Multi-provider architecture:** The `InferenceClient` constructor accepts a `provider` parameter that routes requests to one of 22 inference providers: `black-forest-labs`, `cerebras`, `clarifai`, `cohere`, `fal-ai`, `featherless-ai`, `fireworks-ai`, `groq`, `hf-inference`, `hyperbolic`, `nebius`, `novita`, `nscale`, `openai`, `ovhcloud`, `publicai`, `replicate`, `sambanova`, `scaleway`, `together`, `wavespeed`, `zai-org`. The selected provider should be captured in span metadata when available.
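Capturing the provider could be as simple as reading it off the client instance. A sketch under the assumption that the constructor argument is stored as a `provider` attribute (hence the `getattr` guard), with a stub class standing in for `InferenceClient`:

```python
# Hypothetical sketch: read the routed provider off a client instance into
# span metadata. Assumes the constructor's `provider` argument is stored as
# an attribute of the same name; getattr guards against SDK changes.
from typing import Any, Dict


def provider_metadata(client: Any) -> Dict[str, str]:
    provider = getattr(client, "provider", None)
    return {"provider": provider or "auto"}


class FakeClient:  # stand-in for huggingface_hub.InferenceClient
    def __init__(self, provider=None):
        self.provider = provider


print(provider_metadata(FakeClient(provider="groq")))  # {'provider': 'groq'}
print(provider_metadata(FakeClient()))                 # {'provider': 'auto'}
```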
**Streaming:** Both `chat_completion` and `text_generation` support streaming via `stream=True`. The integration must handle the streaming span lifecycle (start the span on call, accumulate chunks, finalize on stream exhaustion).
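That lifecycle can be sketched as a generator wrapper: the span opens when iteration starts, deltas accumulate per chunk, and the `finally` clause finalizes even if the consumer abandons the stream. Simplified namespace objects stand in for `ChatCompletionStreamOutput`, and an event list stands in for the real span:

```python
# Hypothetical sketch of the streaming span lifecycle: yield chunks while
# accumulating delta content, finalizing only when the stream is exhausted
# (or abandoned), via try/finally.
from types import SimpleNamespace


def traced_stream(chunks, events):
    """Yield chunks, accumulate content, finalize on exhaustion."""
    events.append("span_start")
    parts = []
    try:
        for chunk in chunks:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
            yield chunk
    finally:
        events.append(("span_end", "".join(parts)))


def fake_chunks():  # stand-in for Iterable[ChatCompletionStreamOutput]
    for text in ["Hel", "lo"]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
        )


events = []
for _ in traced_stream(fake_chunks(), events):
    pass
print(events)  # ['span_start', ('span_end', 'Hello')]
```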
**`chat_completion` parameters relevant for span metadata:** `model`, `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`, `seed`, `tools`, `response_format`, `stop`.
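A metadata extractor for those parameters could be a plain allow-list filter over the call's kwargs, dropping anything unset so spans stay compact (the helper name is hypothetical):

```python
# Hypothetical sketch: copy only the listed chat_completion parameters into
# span metadata, skipping keys that are absent or None.
METADATA_KEYS = (
    "model", "temperature", "max_tokens", "top_p", "frequency_penalty",
    "presence_penalty", "seed", "tools", "response_format", "stop",
)


def span_metadata(kwargs):
    return {k: kwargs[k] for k in METADATA_KEYS if kwargs.get(k) is not None}


call_kwargs = {"model": "meta-llama/Llama-3.1-8B-Instruct",
               "temperature": 0.2, "max_tokens": None, "stream": True}
print(span_metadata(call_kwargs))
# {'model': 'meta-llama/Llama-3.1-8B-Instruct', 'temperature': 0.2}
```

Note that `stream` is deliberately not in the allow-list; it is better captured as a span attribute of its own than mixed into model parameters.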
## No coverage in any instrumentation layer

- No integration directory (`py/src/braintrust/integrations/huggingface/`)
- No wrapper function (e.g. `wrap_huggingface()`)
- No patcher in any existing integration
- No nox test session (`test_huggingface`)
- No version entry in `py/src/braintrust/integrations/versioning.py`
- No mention in `py/src/braintrust/integrations/__init__.py`
A grep for `huggingface`, `hugging.face`, or `hugging_face` (case-insensitive) across `py/src/braintrust/` returns zero matches.
## Braintrust docs status

`not_found` — HuggingFace is not listed on the Braintrust tracing guide or the integrations directory.
## Upstream references

### Local repo files inspected

- `py/src/braintrust/integrations/` — no `huggingface/` directory exists on `main`
- `py/src/braintrust/wrappers/` — no HuggingFace wrapper
- `py/noxfile.py` — no `test_huggingface` session
- `py/src/braintrust/integrations/__init__.py` — HuggingFace not listed in integration registry
- `py/src/braintrust/integrations/versioning.py` — no HuggingFace version matrix
- Full repo grep for "huggingface", "hugging.face", "hugging_face" — zero matches