Avoid full ROOT file downloads for metadata queries (use JSROOT remote access or partial reads) #58

@wdconinc

Background

Currently, the server downloads the entire remote ROOT file to answer questions about tree, event, or collection metadata (entry counts, branch info, and so on) using ROOTAnalyzer. This happens whenever methods such as analyzeFile or getEventStatistics are called, even though these queries need only a small portion of the file.

This approach is inefficient, especially for large files or datasets with many files, since only the file header, directory structure, and key TTree objects need to be read. Downloading full files can be slow and resource-intensive.

Proposal

1. Use JSROOT's Remote-File Support

JSROOT supports reading ROOT files directly from HTTP/HTTPS URLs, issuing byte-range requests to retrieve only the required metadata blocks (whether it also accepts root:// XRootD URLs needs to be verified). Instead of downloading the file into a buffer and passing a Blob to openFile, pass the file URL directly to JSROOT:

// Instead of:
const fileData = await this.xrootdClient.readFile(remotePath);
const blob = new Blob([new Uint8Array(fileData)]);
const file = await openFile(blob);

// Use:
const file = await openFile('https://xrootd-server.org/path/to/file.root'); // or a root:// URL, if JSROOT supports it

JSROOT then fetches only the bytes needed for metadata queries, provided the server honors HTTP Range requests, which significantly reduces transfer time and load.

2. (Alternative) Use xrdcp --range or other byte-range approaches

If HTTP endpoints are unavailable, implement logic to read only the file header and the key, streamer, and TTree objects using partial reads over root://, reusing any existing range support in xrdcp. This requires a more complex parser, but is feasible.
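
To give a sense of what the first step of such a parser might look like, here is a minimal sketch that decodes the start of a ROOT file header from a small byte range. It assumes the layout described in the ROOT TFile documentation (a four-byte `root` magic, then big-endian fVersion, fBEGIN, and fEND, with a version of 1000000 or more marking 64-bit offsets), which should be checked against that documentation before relying on it. The name `parseRootHeaderStart` and the `RootHeaderStart` type are hypothetical and not part of this project or of JSROOT.

```typescript
// Sketch only: field layout is assumed from the TFile header description;
// the type and function names are hypothetical.
interface RootHeaderStart {
  version: number;    // format version with the large-file flag removed
  largeFile: boolean; // true when seek offsets are stored as 64-bit values
  begin: number;      // fBEGIN: byte offset of the first data record
  end: number;        // fEND: byte offset of the first free byte at end of file
}

function parseRootHeaderStart(bytes: Uint8Array): RootHeaderStart {
  if (bytes.length < 20) {
    throw new Error("need at least the first 20 bytes of the file");
  }
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "root") {
    throw new Error("missing 'root' magic; not a ROOT file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const rawVersion = view.getUint32(4, false); // false = big-endian
  const largeFile = rawVersion >= 1000000;
  const begin = view.getUint32(8, false);
  let end: number;
  if (largeFile) {
    // Combine two 32-bit halves; exact for offsets below 2^53.
    const high = view.getUint32(12, false);
    const low = view.getUint32(16, false);
    end = high * 4294967296 + low;
  } else {
    end = view.getUint32(12, false);
  }
  return { version: rawVersion % 1000000, largeFile, begin, end };
}
```

The input bytes would come from a ranged read of the first few dozen bytes of the remote file. Locating the key list, streamer info, and TTree objects requires further ranged reads at offsets stored later in the header, which this sketch does not cover.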

3. Cache ROOT File Analysis Results

To avoid repeat downloads for the same (unchanged) file, implement a cache keyed on file path and modification time.
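
One possible shape for such a cache is sketched below. It stores one entry per file path and treats the entry as valid only while the modification time matches, so a changed file replaces its stale entry instead of accumulating new keys. The class name `AnalysisCache` and the `getOrCompute` method are hypothetical, and how the server obtains the modification time (for example from an XRootD stat call) is left as an assumption.

```typescript
// Sketch only: AnalysisCache and getOrCompute are hypothetical names.
class AnalysisCache<T> {
  private entries = new Map<string, { mtimeMs: number; value: T }>();

  // Return the cached value for `path` if its mtime is unchanged;
  // otherwise run `compute`, store the result, and return it.
  getOrCompute(path: string, mtimeMs: number, compute: () => T): T {
    const hit = this.entries.get(path);
    if (hit !== undefined && hit.mtimeMs === mtimeMs) {
      return hit.value;
    }
    const value = compute();
    this.entries.set(path, { mtimeMs, value });
    return value;
  }

  // Number of files currently cached.
  get size(): number {
    return this.entries.size;
  }
}
```

Because `T` is generic, an asynchronous analysis could store the returned promise, though a rejected promise would then need to be evicted so that the failure is not cached.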

4. (Optional) Parallelize Dataset-Wide Operations

For dataset-wide stats, process files in parallel (up to a safe concurrency limit) to improve wall-clock performance.
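
A bounded worker pool is one way to do this. The sketch below runs an asynchronous function over a list of items with at most `limit` calls in flight and returns results in input order. The helper name `mapWithLimit` is hypothetical, and the per-file function passed to it (for example a wrapper around analyzeFile) is assumed to return a promise.

```typescript
// Sketch only: mapWithLimit is a hypothetical helper, not an existing API.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  // Start at least one worker, and never more than there are items.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

In this sketch a single rejected call rejects the whole batch. For dataset-wide statistics it may be preferable to catch errors inside `fn` and record them per file instead.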

Impact

  • Large reduction in bandwidth and latency for all metadata queries
  • Makes interactive metadata browsing with large datasets practical
  • If combined with caching and (if needed) parallel fetches, the server would be much more scalable for production use

Summary: Instead of always downloading full ROOT files for metadata queries, support partial-IO approaches (via JSROOT with remote URLs or range reads), and cache results for repeated queries.
