
Add HuggingFace Hub Python SDK integration for chat completion, text generation, and embedding instrumentation #278

@AbhiPrasad

Description

Summary

The HuggingFace Hub Python SDK (huggingface_hub) is the official Python client for the Hugging Face platform. Its InferenceClient provides execution APIs for chat completions, text generation, embeddings, and more, with support for 20+ inference providers (including HF Inference, AWS, Google, Together, Fireworks, Groq, etc.). This repository has zero instrumentation for any HuggingFace Inference SDK surface — no integration, no wrapper, no patcher, no auto-instrumentation config. Users who call the HuggingFace Inference SDK directly get no Braintrust spans.

Unlike providers such as Groq or Together AI that can be partially traced through the OpenAI wrapper (since they expose OpenAI-compatible endpoints), the HuggingFace InferenceClient has its own native API surface with distinct methods, request/response schemas, and a built-in multi-provider routing layer. While the SDK exposes an OpenAI-compatible alias (client.chat.completions.create), this is just a thin proxy to client.chat_completion() and does not produce an OpenAI client instance — wrapOpenAI() cannot be used with it.

What needs to be instrumented

The huggingface_hub package (latest: v1.10.2) exposes these execution surfaces via InferenceClient and AsyncInferenceClient, none of which are instrumented:

Core LLM methods (highest priority)

| SDK Method | Description | Streaming | Return type |
| --- | --- | --- | --- |
| `client.chat_completion()` | Chat completions with conversation history, tool use, structured output | `stream=True` returns `Iterable[ChatCompletionStreamOutput]` | `ChatCompletionOutput` |
| `client.text_generation()` | Text generation from prompts | `stream=True` returns `Iterable[TextGenerationStreamOutput]` | `str`, or `TextGenerationOutput` when `details=True` |

OpenAI compatibility alias: client.chat.completions.create() is an alias for client.chat_completion() — instrumenting chat_completion covers both.
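A hypothetical patcher sketch (not the Braintrust API) illustrating why one patch covers both call paths: wrapping `chat_completion` at the class level means the alias, which delegates to it, is traced for free. The `FakeClient`-style class and `on_call` hook below are stand-ins, not real SDK types.

```python
import functools


def patch_chat_completion(cls, on_call):
    """Replace cls.chat_completion with a wrapper that reports each call.

    Because the OpenAI-compatible alias delegates to chat_completion(),
    patching the underlying method instruments both entry points.
    """
    original = cls.chat_completion

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        on_call(kwargs)  # e.g. open a span and record request metadata
        return original(self, *args, **kwargs)

    cls.chat_completion = wrapper
```

Calling through the alias still hits the wrapper, since the alias resolves `self.chat_completion` at call time.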

Response shapes are OpenAI-like for chat: ChatCompletionOutput has choices, usage (prompt_tokens, completion_tokens, total_tokens), model, id — standard span metrics extraction should be straightforward.
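A minimal sketch of that metrics extraction, assuming only the OpenAI-like shape described above (a `usage` attribute with the three token counts); the metric key names are illustrative, not a confirmed Braintrust schema.

```python
def extract_chat_metrics(response):
    """Pull token-usage span metrics from an OpenAI-like chat response."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return {}  # no usage block: nothing to report
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "tokens": usage.total_tokens,
    }
```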

text_generation is HuggingFace-specific: Returns a plain str by default. When details=True, returns TextGenerationOutput with generated_tokens count but no prompt token count. Token usage extraction will require special handling.
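The special-case handling might look like the following sketch. It assumes the generated-token count lives at `result.details.generated_tokens` (as described above); there is no prompt token count to report in either case.

```python
def extract_text_generation_metrics(result):
    """Handle text_generation's two return shapes.

    A plain str (the default) carries no usage information. With
    details=True, the output exposes a generated-token count but no
    prompt token count.
    """
    if isinstance(result, str):
        return {}
    details = getattr(result, "details", None)
    generated = getattr(details, "generated_tokens", None)
    if generated is None:
        return {}
    return {"completion_tokens": generated}
```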

Embedding methods

| SDK Method | Description | Return type |
| --- | --- | --- |
| `client.feature_extraction()` | Text embeddings / feature vectors | `np.ndarray` |
| `client.sentence_similarity()` | Semantic similarity between sentences | `list[float]` |

Additional generation methods (lower priority, follow-up candidates)

| SDK Method | Description | Return type |
| --- | --- | --- |
| `client.text_to_image()` | Image generation from text prompts | `PIL.Image` |
| `client.text_to_speech()` | Text-to-speech audio synthesis | `bytes` |
| `client.summarization()` | Text summarization | `SummarizationOutput` |
| `client.translation()` | Text translation | `TranslationOutput` |

All methods above also have async variants on AsyncInferenceClient with identical signatures.
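Because the async variants share signatures with their sync counterparts, a single wrapper factory per method pair should suffice. A sketch under that assumption, where `make_span` is a hypothetical hook standing in for Braintrust span creation:

```python
import asyncio
import functools


def wrap_async_method(fn, make_span):
    """Wrap an async SDK method so a span brackets each await."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        span = make_span()
        try:
            return await fn(*args, **kwargs)
        finally:
            span.end()  # close the span even if the call raises
    return wrapper
```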

Implementation notes

Multi-provider architecture: The InferenceClient constructor accepts a provider parameter that routes requests to one of 22 inference providers:
black-forest-labs, cerebras, clarifai, cohere, fal-ai, featherless-ai, fireworks-ai, groq, hf-inference, hyperbolic, nebius, novita, nscale, openai, ovhcloud, publicai, replicate, sambanova, scaleway, together, wavespeed, zai-org.

The selected provider should be captured in span metadata when available.

Streaming: Both chat_completion and text_generation support streaming via stream=True. The integration must handle streaming span lifecycle (start span on call, accumulate chunks, finalize on stream exhaustion).
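That lifecycle can be sketched as a pass-through generator: chunks are yielded unchanged while being accumulated, and the span is logged and closed only when the stream is exhausted (or the consumer stops early). The `span` object here is hypothetical, with `.log()` and `.end()` hooks only.

```python
def trace_stream(chunks, span):
    """Yield chunks through unchanged; finalize the span on exhaustion."""
    collected = []
    try:
        for chunk in chunks:
            collected.append(chunk)
            yield chunk
    finally:
        # Runs on normal exhaustion, early close, or error,
        # so the span is never left open.
        span.log(output=collected)
        span.end()
```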

chat_completion parameters relevant for span metadata: model, temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, tools, response_format, stop.
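A sketch of collecting that metadata from the call's kwargs. The key list mirrors the parameters above; `provider` is set on the client constructor rather than per call, so it is passed in separately here. This is illustrative, not a confirmed Braintrust metadata schema.

```python
# Parameters worth copying into span metadata, per the list above.
_METADATA_KEYS = (
    "model", "temperature", "max_tokens", "top_p", "frequency_penalty",
    "presence_penalty", "seed", "tools", "response_format", "stop",
)


def build_span_metadata(kwargs, provider=None):
    """Filter call kwargs down to span metadata, plus the routed provider."""
    metadata = {
        k: kwargs[k] for k in _METADATA_KEYS if kwargs.get(k) is not None
    }
    if provider is not None:
        metadata["provider"] = provider
    return metadata
```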

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/huggingface/)
  • No wrapper function (e.g. wrap_huggingface())
  • No patcher in any existing integration
  • No nox test session (test_huggingface)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py

A case-insensitive grep for `huggingface`, `hugging.face`, or `hugging_face` across py/src/braintrust/ returns zero matches.

Braintrust docs status

Not found: HuggingFace is not listed on the Braintrust tracing guide or in the integrations directory.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no huggingface/ directory exists on main
  • py/src/braintrust/wrappers/ — no HuggingFace wrapper
  • py/noxfile.py — no test_huggingface session
  • py/src/braintrust/integrations/__init__.py — HuggingFace not listed in integration registry
  • py/src/braintrust/integrations/versioning.py — no HuggingFace version matrix
  • Full repo grep for "huggingface", "hugging.face", "hugging_face" — zero matches
