-
Notifications
You must be signed in to change notification settings - Fork 237
Send GRPC errors in trailers in all cases #486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send GRPC errors in trailers in all cases #486
Conversation
This avoids a foot-gun where the app dev does not know which status to use and only sets the message (and/or potentially the details). Without a default value, the status field becomes `nil` which blows up when serializing which is not an easy bug to trace back to the right place in the application.
This exposes a `send_error/2` function in the adapter behaviour.
| end | ||
|
|
||
| def send_reply(%{adapter: adapter} = stream, {:error, %GRPC.RPCError{} = error}, _opts) do | ||
| adapter.send_error(stream.payload, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a new send_error callback? What does send_reply not have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function already exists, it's just not accessible via the Adaptor behaviour.
The difference to send_reply is that the error path doesn't try to GRPC-encode the message, puts the error information into the trailers, and ends the connection immediately.
It would be possible to change the contract of Adapter.send_reply/3 so that implementations must pattern match on errors being passed in as the data, but that means overloading the purpose of send_reply: sometimes it sends a reply, sometimes it sends an error.
I felt it cleaner to keep these two paths clearly separated, and as the adapter will need to track its own errors anyways and have a way to deal with those, this is isn't actually introducing new complexity.
But if you'd prefer to overload send_reply and make it a requirement for adapters to handle RPCError structs there, that can also be done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second paragraph sold me on the new callback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second paragraph sold me on the new callback!
| do_send_reply(stream, [], opts) | ||
| end | ||
|
|
||
| def send_reply(%{adapter: adapter} = stream, {:error, %GRPC.RPCError{} = error}, _opts) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we need to check is if the only errors that can reach here are in the form of this struct, of if we need to add another clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good question.
I've oscillated between having one and not having one, as it would be "easy" enough to try and match on any error .. or even just errors with strings (an error message) and wrap those in an RPCError.
The flip side of that is should that occur, it is almost certainly a developer error. Either an RPCError , a thrown exception, or a protobuf-encodable struct should be the result. Returning Something Else(tm) is probably something like a raw Ecto query or whatever that has been returned, and simply passing that back silently to the client, even as an error, feels somewhere between dangerous and too much magic.
In those cases, I'd rather see a request crash on the server side that can be tracked and addressed.
@aseigo Thank you for bringing this up! I'd like to clarify that the current behavior is actually intentional by design, not a bug. Let me explain the flow and the reasoning behind it. Current Error Flow (Working as Designed)When using
Why We Terminate the Stream on Error
GRPC.Stream.from([1, 2, 3])
|> GRPC.Stream.map(fn x -> if x == 2, do: raise("error"), else: xend)
|> GRPC.Stream.map_error(fn _ -> GRPC.RPCError.exception(status: :internal, message: "Processing failed")end)The client receives: ✅ msg1 (successful) The alternative (continuing after error) would be confusing: ❌ What does receiving more messages after an error mean? Alternative: In-Band Error Signaling Define a response type that includes success/error variants, in other words, map the errors to common data types within your application's business logic. This way: ✅ Errors are part of the message protocol The current behavior is: raise → adapter catches → sends trailers works perfectly for both Cowboy and ThousandIsland Does this clarify the design? I'm happy to discuss further if you have a specific use case that this pattern doesn't address well! |
|
@aseigo Remember that |
I agree that this is a fine way to handle errors and quite like the ergonomics of it. The issue is that it currently does not work. When returning [error] ** (UndefinedFunctionError) function GRPC.RPCError.transform_module/0 is undefined or private
(grpc_core 1.0.0-rc.1) GRPC.RPCError.transform_module()
(protobuf 0.15.0) lib/protobuf/encoder.ex:157: Protobuf.Encoder.transform_module/2
(protobuf 0.15.0) lib/protobuf/encoder.ex:12: Protobuf.Encoder.encode_to_iodata/1
(grpc_server 1.0.0-rc.1) lib/grpc/server/stream.ex:93: GRPC.Server.Stream.send_reply/3
(grpc_server 1.0.0-rc.1) lib/grpc/stream.ex:179: GRPC.Stream.run/1If returning [error] ** (FunctionClauseError) no function clause matching in Protobuf.Encoder.encode_to_iodata/1
(protobuf 0.15.0) lib/protobuf/encoder.ex:10: Protobuf.Encoder.encode_to_iodata({:error, %GRPC.RPCError{status: 2, message: "Oh no!", details: nil}})
(grpc_server 1.0.0-rc.1) lib/grpc/server/stream.ex:93: GRPC.Server.Stream.send_reply/3
(grpc_server 1.0.0-rc.1) lib/grpc/stream.ex:179: GRPC.Stream.run/1On the client side, a trailer is sent but it does not contain the contents of the This can be seen as well in the tests changed in this PR where messages that test RPCError exceptions being thrown see the exception in the log rather than the message. So this doesn't change the workflow currently in the library, it just makes it work such that the RPCError ends up on the client. We noticed this when |
Okay, let me explore a bit more. I created a suite of integration tests about this here: 1fb869b Could you please run it like this: All I had to do was adjust the patterns of the |
With that adjustment, on the server-side I see the exception in the logs rather than random errors (yes! nice!) but I am still not seeing them on the client side. I applied the patch to the So maybe handling this in |
Did you run the tests? They're important to know if it's something the clients are handling or a server error. In my case, I believe it's a client issue because the integration tests worked with the standard elixir client, correctly retrieving the errors. @aseigo How can I replicate your problem scenario here on my end? |
The tests in the I say this because the same change to If the cowboy adapter is going to be removed soonishly, or if it is also changed in the I could not test with the app I have here, unfortunately, as it does not start correctly with the |
Those tests ran with Cowboy; run_server([MyService], fn port ->
# ...
end, 0, adapter: GRPC.Server.Adapters.ThousandIsland)I only used that branch because it was convenient to be able to commit to it without impacting anything in production. |
In the documentation, there are several examples such as this one from the error handling guide:
However, we have found that this does not actually work in practice and instead errors are thrown on the server side with no errors arriving on the client side. Which is unfortunate, because this would be a very nice API pattern :)
Thankfully, it's a pretty simple fix:
GRPC.Server.Adapter.send_reply/2needs to send errors to the client. This does mean adding one more function to theAdapterbehaviour to facilitate access to the error sending capabilities of the adapter, however.While pursuing this problem, I stumbled on a semi-related "foot-gun" in that the error exception did not require a status, and that would result in yet other errors being thrown. To address that, the status field is set to an unknown error by default, so that developers who only set a message don't end up with things blowing up on them.
With this PR, we now see errors appear in trailers. I think there are still discussions to be had in terms of multi-message streams and how errors could/should handled there. As it currently stands, the stream finishes when an error occurs. Of course, the developer can use an in-band error signalling message instead, leaving "actual" GRPC errors a communication-terminating event.