Skip to content

feat(datasets): multimodal dataset support (LFE-10288)#840

Merged
wochinge merged 15 commits into
mainfrom
feature/lfe-10288-js-sdk-changes
Jun 23, 2026
Merged

feat(datasets): multimodal dataset support (LFE-10288)#840
wochinge merged 15 commits into
mainfrom
feature/lfe-10288-js-sdk-changes

Conversation

@wochinge

@wochinge wochinge commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds multimodal dataset support to the JS SDK — V1 of the Datasets Multimodal Support RFC, mirroring feat(datasets): support multimodal dataset items langfuse-python#1710.
  • Create (dataset.createItem): uploads any LangfuseMedia in input/expectedOutput/metadata (nested, deduped) against its (dataset, item, field), and stores @@@langfuseMedia:…@@@ reference strings. The dataset id is resolved only when the item actually carries media.
  • Consume (dataset.get): always hydrates the reference strings into LangfuseMediaReference objects (signed URL + fetchBytes/fetchBase64/fetchDataUri + isUrlExpired). Use api.datasetItems.list for the raw strings.
  • Core: reusable trace-free uploadMedia; LangfuseMediaReference (with toJSON → reference string, so resolved items round-trip losslessly). Re-exported from @langfuse/client.
  • Drive-by: bound the deprecated v3-compat client aliases to their owners (latent this crash).

Linear

Major Decisions

  • No new dependency — the dataset media JSONPath is set with a tiny hand-rolled parser (jsonPath.ts, ported from the Python SDK) instead of jsonpath-plus. It handles exactly the bracket-normalized paths the backend emits ($, ['key'], [int]).
  • Always hydrate referencesdataset.get resolves media unconditionally (no opt-in flag); the backend resolves them by default.
  • Generated packages/core/src/api changes are hand-applied ahead of the spec bot — drop/rebase that part once the bot regenerates.

Review Focus

  • Public SDK surface — what's intentionally public vs internal: dataset.createItem, dataset.get (now always-resolving), LangfuseMedia/LangfuseMediaReference. MediaManager.uploadMedia was removed to avoid an unintended public method; the deprecated createDatasetItem alias changed signature (no .withRawResponse()/requestOptions) — documented on the field.
  • E2E requires the widened dataset-media backend, so it's red in CI until that ships where CI runs.

Hand-applied ahead of the upstream Fern spec sync so the SDK can build
against the new endpoints. These files live under the generated
packages/core/src/api tree and will be reproduced verbatim by the API
spec bot once the langfuse/langfuse spec change merges:

- DatasetItem.mediaReferences + DatasetItemMediaReference{,Field,Media}
- GetDatasetItemsRequest.includeMediaReferences (+ list query wiring)
- GetMediaUploadUrlRequest: traceId and field are now optional, enabling
  media uploads that are not associated with a trace (dataset items)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
langfuse-js Ready Ready Preview Jun 23, 2026 8:51am

Request Review

@github-actions

Copy link
Copy Markdown

@claude review

Comment thread packages/core/src/mediaUpload.ts
Comment thread packages/client/src/dataset/index.ts
Comment thread packages/core/src/mediaUpload.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a490e1446

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/client/src/dataset/index.ts Outdated
Comment thread packages/core/src/mediaUpload.ts
Comment thread packages/client/src/dataset/index.ts Outdated
Comment thread packages/client/src/LangfuseClient.ts Outdated
Comment thread packages/client/src/dataset/index.ts Outdated
Comment thread packages/client/src/LangfuseClient.ts
Mirror langfuse/langfuse-python#1710 to add media support to datasets:

- LangfuseMediaReference (core): a signed-URL media handle returned when
  resolving dataset media, with urlIsExpired / fetchBytes / fetchBase64 /
  fetchDataUri helpers for feeding media to LLM providers
- uploadMedia (core): reusable upload routine that works without a trace
  context; MediaService now delegates to it instead of duplicating the
  upload + backoff logic
- DatasetManager.createItem: uploads any LangfuseMedia found in input,
  expectedOutput, or metadata (deduped) and replaces it with a reference
  string before creating the item; createDatasetItem now routes here
- DatasetManager.get(resolveMediaReferences): requests includeMediaReferences
  and hydrates reference strings into LangfuseMediaReference objects using
  jsonpath-plus (eval-free, so it is safe under strict CSP / edge runtimes)
- re-export LangfuseMedia and LangfuseMediaReference from @langfuse/client

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/client/src/LangfuseClient.ts Outdated
The Fern-generated resource methods dispatch to a private `this.__<name>`
on their first line, so the unbound aliases (fetchTrace, fetchTraces,
fetchObservation, fetchObservations, fetchSessions, getDatasetRun,
getDatasetRuns, createDataset, getDatasetItem, fetchMedia,
resolveMediaReferences) threw a TypeError when called off the client.
Bind each to its owning resource and add regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers create with LangfuseMedia (stored as reference strings, deduped),
get with resolveMediaReferences (hydration + fetch helpers), and
re-using a resolved item through dataset.createItem (lossless round-trip).
The API spec renamed the DatasetItemMediaReferenceField enum value from
expected_output to expectedOutput. Update the generated enum and the
hydration field-allowlist (now an identity map) to match.
Comment thread packages/client/src/dataset/index.ts Outdated
jsonpath-plus returns undefined (not []) when the root is null/undefined,
so reading matches.length threw a TypeError (caught, surfaced as a
misleading warning) when a media reference pointed at a null field.
Comment thread packages/client/src/dataset/index.ts
The media upload endpoint now requires the (dataset, item) a media belongs
to instead of a trace context. createItem settles the item id up front (the
item need not exist yet), resolves the dataset id, and uploads each
LangfuseMedia with its field (input/expectedOutput/metadata). Mirrors
langfuse/langfuse-python#1710.
Comment thread packages/client/src/dataset/index.ts Outdated
createItem now walks each field collecting media (reference strings are
content-derived, no upload needed), then resolves the dataset id and uploads
only if media was found — media-free items make no extra request. The walk is
extracted to a module-level replaceDatasetItemMedia function for readability.
Comment thread packages/client/src/dataset/index.ts
Comment thread packages/client/src/media/index.ts Outdated
dataset.get now always returns LangfuseMediaReference objects instead of an
opt-in resolveMediaReferences flag — the backend resolves them unconditionally,
so the includeMediaReferences param is dropped from the SDK. Use
api.datasetItems.list for the raw reference strings. Also rename
LangfuseMediaReference.urlIsExpired to isUrlExpired.
Comment thread packages/client/package.json Outdated
Comment thread tests/unit/dataset.test.ts
Comment thread packages/client/src/media/index.ts Outdated

@hassiebp hassiebp left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Tobi - same questions as with Python PR on the dependency

Comment thread packages/client/src/dataset/index.ts
wochinge added 4 commits June 22, 2026 18:53
Unit tests were co-located under packages/*/src and not executed by any CI
job (the workspace only defined integration + e2e projects). Move the dataset
media unit tests to tests/unit, add a src-aliased `unit` vitest project +
`test:unit` script + a CI job, and DRY the per-project @langfuse/* alias blocks
into one helper.
Relocate the pre-existing otel (span-filter, span-processor) and experiment
(RunnerContext) unit tests from packages/*/src to tests/unit, rewriting their
relative imports to public @langfuse/* package imports. All unit tests now live
in tests/unit and run via the unit project in CI.
Drop the jsonpath-plus dependency in favor of a tiny parse/set helper
(jsonPath.ts) that handles exactly the bracket-normalized paths the backend
emits ($, ['key'], [int]) — ported from the Python SDK. Add unit tests mirroring
the Python json_path table and extend the e2e to cover key/list-index/nested-list
path shapes, verifying each resolves to its own media via fetchBytes.
Drop the thin MediaManager.uploadMedia wrapper (an unintended public method on
langfuse.media); createItem now calls the core uploadMedia helper directly.
…sdk-changes

# Conflicts:
#	vitest.workspace.ts

@hassiebp hassiebp left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…sdk-changes

# Conflicts:
#	packages/core/src/api/api/resources/commons/types/DatasetItem.ts
#	packages/core/src/api/api/resources/commons/types/DatasetItemMediaReference.ts
#	packages/core/src/api/api/resources/media/types/GetMediaUploadUrlRequest.ts
@wochinge wochinge enabled auto-merge (squash) June 23, 2026 09:33
@wochinge wochinge merged commit c67225d into main Jun 23, 2026
17 of 19 checks passed
@wochinge wochinge deleted the feature/lfe-10288-js-sdk-changes branch June 23, 2026 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants