[pull] main from apache:main#88

Merged
pull[bot] merged 1 commit into buraksenn:main from apache:main on Apr 8, 2026

pull[bot] commented on Apr 8, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


## Which issue does this PR close?

- Closes #.

## Rationale for this change

This is an alternative approach to
- #19687

Instead of reading the entire range in the JSON `FileOpener`, implement an
`AlignedBoundaryStream` that wraps the original stream returned by the
`ObjectStore` and scans the range for newlines as the `FileStream` requests
data from it.

This eliminates the overhead of the two extra `get_opts` requests needed by
`calculate_range` and, more importantly, allows efficient read-ahead
implementations in the underlying `ObjectStore`. Previously this was
inefficient because `calculate_range` opened two additional streams, one from
`(start - 1)` to `file_size` and another from `(end - 1)` to `end_of_file`,
just to find the two relevant newlines.


## What changes are included in this PR?
Added the `AlignedBoundaryStream`, which wraps a stream returned by the object
store and finds the delimiting newlines for a particular file range. Notably,
it doesn't do any standalone reads (unlike the `calculate_range` function),
eliminating two calls to `get_opts`.
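
To illustrate the idea of aligning a byte range to newline boundaries in a single pass, here is a minimal synchronous sketch. It is not the code from this PR: `chunks_from`, `read_aligned`, and `read_partition` are hypothetical names, a `Vec<Vec<u8>>` stands in for the object store stream, and it assumes that a partition owns exactly the lines whose first byte falls in `[start, end)`.

```rust
/// Hypothetical stand-in for an object-store byte stream: yields
/// `data[offset..]` split into chunks of `chunk_size` bytes (must be > 0).
fn chunks_from(data: &[u8], offset: usize, chunk_size: usize) -> Vec<Vec<u8>> {
    data[offset..].chunks(chunk_size).map(|c| c.to_vec()).collect()
}

/// Consumes chunks that begin at file offset `start - 1` (or 0 when
/// `start == 0`) and returns only the complete lines owned by `[start, end)`.
/// Boundaries are found while the data flows through, with no extra reads.
fn read_aligned<I>(chunks: I, start: usize, end: usize) -> Vec<u8>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut out = Vec::new();
    // Absolute file offset of the next byte to be examined.
    let mut pos = start.saturating_sub(1);
    // A range starting at offset 0 is already aligned to a line start.
    let mut started = start == 0;

    for chunk in chunks {
        for &b in &chunk {
            let at = pos;
            pos += 1;

            if !started {
                // Skip bytes up to and including the first newline at or
                // after `start - 1`; they belong to the previous partition.
                if b == b'\n' {
                    if at + 1 >= end {
                        // The next line starts at or past `end`: nothing owned.
                        return out;
                    }
                    started = true;
                }
                continue;
            }

            out.push(b);
            // Stop after the first newline at or after `end - 1`.
            if b == b'\n' && at + 1 >= end {
                return out;
            }
        }
    }
    // Reached end of file, possibly without a trailing newline.
    out
}

/// Convenience wrapper: opens the simulated stream one byte before `start`.
fn read_partition(data: &[u8], start: usize, end: usize, chunk_size: usize) -> Vec<u8> {
    read_aligned(chunks_from(data, start.saturating_sub(1), chunk_size), start, end)
}
```

Under these assumptions, adjacent ranges such as `[0, 6)`, `[6, 12)`, and `[12, 17)` over a 17-byte file each return disjoint sets of complete lines that concatenate back to the original file, and each range reads from a single stream.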

## Are these changes tested?
Yes, added unit tests.

## Are there any user-facing changes?
No

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

pull[bot] locked and limited conversation to collaborators on Apr 8, 2026
pull[bot] added the ⤵️ pull label on Apr 8, 2026
pull[bot] merged commit 8a48a87 into buraksenn:main on Apr 8, 2026