Skip to content

fix: JSONL files not opening in preview (ASTD-261)#427

Draft
marcusds wants to merge 1 commit into
mainfrom
astd-261-bug-investigate-why-jsonl-file-is-not-opening-in-preview/mschwab
Draft

fix: JSONL files not opening in preview (ASTD-261)#427
marcusds wants to merge 1 commit into
mainfrom
astd-261-bug-investigate-why-jsonl-file-is-not-opening-in-preview/mschwab

Conversation

@marcusds

@marcusds marcusds commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Problem

JSONL files were showing 'Text preview not available for binary files' in the fileset preview panel.

Root Cause

The backend HEAD endpoint returns application/octet-stream for all files. useIsBinaryFile relied solely on the Content-Type header to determine if a file is text, so .jsonl (and other text files like .json, .csv, .py, etc.) were incorrectly classified as binary.

Fix

Added a KNOWN_TEXT_EXTENSIONS allowlist (50+ extensions) that short-circuits the HEAD request for recognized text file extensions. The detection strategy is now four-tiered:

  1. Extension in KNOWN_TEXT_EXTENSIONS → text immediately (no network)
  2. Extension in BINARY_FILE_EXTENSIONS → binary immediately (no network)
  3. Unknown extension → HEAD request; Content-Type decides
  4. HEAD fails → assume text (fail-open)

Changes

  • useIsBinaryFile.ts: Added KNOWN_TEXT_EXTENSIONS set and isKnownTextExtension helper. Restructured hook to check text extensions before binary blocklist.
  • useIsBinaryFile.test.ts: New test file with 10 tests covering JSONL, JSON, CSV, PY, YAML, MD, undefined path, PNG, ZIP, and unknown extensions.

Summary by CodeRabbit

  • New Features

    • Optimized file type detection with fast recognition of common text file formats (.json, .csv, .py, .yaml, .md), reducing unnecessary server requests.
  • Tests

    • Added comprehensive test coverage for file type detection logic.

The backend HEAD endpoint returns application/octet-stream for all files,
causing useIsBinaryFile to incorrectly classify .jsonl and other text files
as binary. Added KNOWN_TEXT_EXTENSIONS allowlist that short-circuits the
HEAD request for recognized text extensions.

- Add 50+ known text extensions (json, jsonl, csv, py, yaml, md, etc.)
- Restructure hook to check text extensions before binary blocklist
- Add unit tests for useIsBinaryFile hook

Signed-off-by: mschwab <mschwab@nvidia.com>
@marcusds marcusds requested review from a team as code owners June 23, 2026 23:26
@github-actions github-actions Bot added the fix label Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

useIsBinaryFile gains a KNOWN_TEXT_EXTENSIONS allowlist and isKnownTextExtension helper, adding a synchronous fast path that skips the HEAD request for known text extensions. The useQuery enabled flag and early-return order are updated to reflect a four-tier resolution strategy. A new test file covers all tiers.

Changes

useIsBinaryFile known-text fast path and tests

Layer / File(s) Summary
Known-text allowlist and four-tier resolution logic
web/packages/studio/src/components/filesets/hooks/useIsBinaryFile.ts
Adds KNOWN_TEXT_EXTENSIONS allowlist and isKnownTextExtension helper; removes the internal filePath guard from queryFn; updates useQuery enabled to exclude known-text and blocklisted files; reorders early returns to: undefined → known-text → blocklist → HEAD.
Unit tests for all resolution tiers
web/packages/studio/src/components/filesets/hooks/useIsBinaryFile.test.ts
New test suite mocking useQuery and axios; asserts { isBinary: false, isLoading: false } for undefined and known-text paths, { isBinary: true, isLoading: false } for blocklisted extensions, and enabled: true passed to useQuery for unknown extensions.

Possibly related PRs

  • NVIDIA-NeMo/nemo-platform#181: Originally introduced useIsBinaryFile and the binary blocklist logic that this PR extends with the known-text allowlist and test coverage.

Suggested reviewers

  • steramae-nvidia
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title directly describes the main fix: JSONL files not opening in preview. Aligns perfectly with PR's core objective of solving the JSONL file classification issue.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch astd-261-bug-investigate-why-jsonl-file-is-not-opening-in-preview/mschwab

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@web/packages/studio/src/components/filesets/hooks/useIsBinaryFile.test.ts`:
- Line 5: The import statement on line 5 mixes value and type imports from
`@tanstack/react-query`. Separate UseQueryResult into a type-only import using
import type syntax, keeping useQuery in the regular import statement. This
follows coding guidelines by explicitly marking type-only imports and allows the
compiler to properly tree-shake unused types.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b07f9cd4-47a7-49c2-946c-0a532a0b5fee

📥 Commits

Reviewing files that changed from the base of the PR and between d8ea9e2 and 4dc4cdc.

📒 Files selected for processing (2)
  • web/packages/studio/src/components/filesets/hooks/useIsBinaryFile.test.ts
  • web/packages/studio/src/components/filesets/hooks/useIsBinaryFile.ts

// SPDX-License-Identifier: Apache-2.0

import { useIsBinaryFile } from '@studio/components/filesets/hooks/useIsBinaryFile';
import { useQuery, UseQueryResult } from '@tanstack/react-query';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Separate type-only imports.

UseQueryResult is only used as a type. Per coding guidelines, use import type for type-only imports.

📦 Suggested fix
-import { useQuery, UseQueryResult } from '`@tanstack/react-query`';
+import { useQuery } from '`@tanstack/react-query`';
+import type { UseQueryResult } from '`@tanstack/react-query`';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import { useQuery, UseQueryResult } from '@tanstack/react-query';
import { useQuery } from '`@tanstack/react-query`';
import type { UseQueryResult } from '`@tanstack/react-query`';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/packages/studio/src/components/filesets/hooks/useIsBinaryFile.test.ts` at
line 5, The import statement on line 5 mixes value and type imports from
`@tanstack/react-query`. Separate UseQueryResult into a type-only import using
import type syntax, keeping useQuery in the regular import statement. This
follows coding guidelines by explicitly marking type-only imports and allows the
compiler to properly tree-shake unused types.

Source: Coding guidelines

@marcusds marcusds marked this pull request as draft June 23, 2026 23:35
@github-actions

Copy link
Copy Markdown
Contributor
Suite Lines Covered Line Rate Branch Rate
Unit Tests 20908/27478 76.1% 61.2%
Integration Tests 12108/26247 46.1% 19.5%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant