RATIS-2546. Add input stream to DataStreamApi for read operations in Client#1481
RATIS-2546. Add input stream to DataStreamApi for read operations in Client#1481peterxcli wants to merge 16 commits into
Conversation
Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
|
cc @szetszwo |
|
The no-MD5 sequential reads comparison:
The no-MD5 random reads comparison:
I'm looking into why larger buffer would cause throughput to degrade. |
There was a problem hiding this comment.
@peterxcli , thanks a lot for working on this!
The PR is quite big. Let's separate the DataStreamReply change first. Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13082807/1481_review_DataStreamReply.patch (updated the link in the next comment.)
Signed-off-by: peterxcli <peterxcli@gmail.com>
szetszwo
left a comment
There was a problem hiding this comment.
@peterxcli , thanks for updating this!
Let's have one more simple change to add an executor and terminalReply (RATIS-2564). Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13082846/1481_review3_refactoring.patch
Signed-off-by: peterxcli <peterxcli@gmail.com>
|
|
||
| /** Async call to send a request and receive multiple replies for the request. */ | ||
| default CompletableFuture<DataStreamReply> streamAsync( | ||
| DataStreamRequest request, Consumer<DataStreamReply> replyConsumer) { |
There was a problem hiding this comment.
@peterxcli , Let's borrow the idea from gRPC StreamObserver and add a similar interface. It is more flexible for future changes.
//ratis-common
package org.apache.ratis.datastream;
/** An interface similar to gRPC {@link org.apache.ratis.thirdparty.io.grpc.stub.StreamObserver}. */
public interface DataStreamObserver<V> {
void onNext(V value);
// see if onError(Throwable) and onCompleted() are useful. Or we may add them later.
}Then our interface will be similar to gRPC service https://github.com/apache/ratis/blob/master/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolClient.java#L174-L181
//DataStreamClientRpc
/** Async call to send a request and receive multiple replies for the request. */
default CompletableFuture<DataStreamReply> streamAsync(DataStreamRequest request,
DataStreamObserver<DataStreamReplyByteBuf> replyHandler) {
throw new UnsupportedOperationException(getClass() + " does not support "
+ JavaUtils.getCurrentStackTraceElement().getMethodName());
}Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
|
@szetszwo thanks for the grpc observer suggestion, just added a initial version of it. I also add onError and onComplete interface method for the observer, because we can decide which hook to call base on the terminal reply. Please take a look, Thanks! |
| failReads(exception); | ||
| } else { | ||
| markEndOfStream(); | ||
| } |
There was a problem hiding this comment.
Do we need to call close() or mark closed = true; here?
There was a problem hiding this comment.
no, the caller might have not counsumed the fetched content yet, if we set close, the readAsync would fail,
There was a problem hiding this comment.
I think I can refactor this to get better readability.
| for (DataStreamReply reply; (reply = replies.poll()) != null;) { | ||
| DataStreamReplyByteBuf.release(reply); | ||
| } | ||
| failReads(new AlreadyClosedException(clientId + ": stream already closed, request=" + header)); |
There was a problem hiding this comment.
I do not understand why we need to do a failReads upon a ordinary close. Do we not expect close be call in the happy path and we only call close in bad cases?
There was a problem hiding this comment.
close() could happen while readAsync() calls are pending.
| public static void release(DataStreamReply reply) { | ||
| if (reply instanceof DataStreamReplyByteBuf) { | ||
| ((DataStreamReplyByteBuf) reply).release(); | ||
| } |
There was a problem hiding this comment.
What about else? Is that possible other instances are passed here? If no maybe add an assertion, otherwise handle other instances?
There was a problem hiding this comment.
There are two classes implement the DataStreamReply, which is DataStreamReplyByteBuf and DataStreamReplyByteBuffer.
And only DataStreamReplyByteBuf need to release the Netty byteBuffer reference count
Co-Authored-By: amaliujia <amaliujia@apache.org> Signed-off-by: peterxcli <peterxcli@gmail.com>
aa3ab09 to
7b15a13
Compare
Signed-off-by: peterxcli <peterxcli@gmail.com>
szetszwo
left a comment
There was a problem hiding this comment.
@peterxcli , thanks for the update!
In DataStreamInput, readAsync() should return a ReferenceCountedObject so the user can retain/release the buffer.
Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13082885/1481_review4.patch
| default void onError(Throwable t) { | ||
| } | ||
|
|
||
| default void onCompleted() { | ||
| } |
There was a problem hiding this comment.
If we add them, we should enforce implementing them. Let's remove default.
void onCompleted();
void onError(Throwable throwable);| /** | ||
| * Read the next chunk in the stream asynchronously. | ||
| * The caller owns the returned {@link DataStreamReply} and should call | ||
| * {@link DataStreamReply#release()} after consuming it. | ||
| * | ||
| * @return a future of the reply. | ||
| */ | ||
| CompletableFuture<DataStreamReply> readAsync(); |
There was a problem hiding this comment.
We should use reference-counted:
/**
* Read the next chunk in the stream asynchronously.
* The caller owns the returned {@link DataStreamReply} which is a {@link ReferenceCountedObject}.
* It must call {@link ReferenceCountedObject#release()} after consuming it.
*
* @return a future of the reference-counted reply.
*/
CompletableFuture<ReferenceCountedObject<DataStreamReply>> readAsync();| /** Async call to send a request and receive multiple replies for the request. */ | ||
| default CompletableFuture<DataStreamReply> streamAsync( | ||
| DataStreamRequest request, DataStreamObserver<DataStreamReplyByteBuf> replyHandler) { | ||
| throw new UnsupportedOperationException(getClass() + " does not support " | ||
| + JavaUtils.getCurrentStackTraceElement().getMethodName()); | ||
| } |
There was a problem hiding this comment.
We should use reference-counted:
/**
* Async call to send a request to receive a stream of intermediate replies and a final reply.
*
* @param request the request
* @param replyHandler to handle intermediate replies
* @return a future the final reply
*/
default CompletableFuture<DataStreamReply> streamAsync(DataStreamRequest request,
DataStreamObserver<ReferenceCountedObject<DataStreamReply>> replyHandler) {
throw new UnsupportedOperationException(getClass() + " does not support "
+ JavaUtils.getCurrentStackTraceElement().getMethodName());
}| try (DataStreamReplyByteBuf reply = (DataStreamReplyByteBuf) msg) { | ||
| process(reply); | ||
| } |
There was a problem hiding this comment.
This will release the reply even before user application reading it. We should use reference-counted:
final DataStreamReplyByteBuf reply = (DataStreamReplyByteBuf) msg;
final ReferenceCountedObject<DataStreamReply> ref = ReferenceCountedObject.<DataStreamReply>newBuilder()
.setValue((DataStreamReplyByteBuf) msg)
.setReleaseMethod(r -> {
if (r != null) {
Preconditions.assertSame(reply, r, "reply");
reply.release();
}
}).build();
try (UncheckedAutoCloseableSupplier<DataStreamReply> ignored = ref.retainAndReleaseOnClose()) {
process(ref);
}| new DataStreamObserver<DataStreamReplyByteBuf>() { | ||
| @Override | ||
| public void onNext(DataStreamReplyByteBuf reply) { | ||
| receive(reply.copy()); |
There was a problem hiding this comment.
We should use reference-counted in the queues and avoid copying the reply.
What changes were proposed in this pull request?
DataStreamInputinterface and its implementationDataStreamInputImplfor asynchronous, zero-copy read-only data streaming, including thereadAsync()method for consuming replies and proper resource release on close.DataStreamApiinterface and its implementation to providestreamReadOnly()methods for creating read-only streams.DataStreamReplyByteBufclass to support replies backed by NettyByteBuf.streamAsync()in theDataStreamClientRpcinterface that takes a reply consumer, supporting multiple replies per request.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-2546
How was this patch tested?
(Please explain how this patch was tested. Ex: unit tests, manual tests)
(If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)