
Conversation

@aseigo
Contributor

aseigo commented Dec 16, 2025

In the documentation, there are several examples such as this one from the error handling guide:

GRPC.Stream.from([1, 2])
|> GRPC.Stream.map(fn
  2 -> raise "boom"
  x -> x
end)
|> GRPC.Stream.map_error(fn
  {:error, {:exception, _reason}} ->
    {:error, GRPC.RPCError.exception(message: "Booomm")}
end)

However, we have found that this does not actually work in practice and instead errors are thrown on the server side with no errors arriving on the client side. Which is unfortunate, because this would be a very nice API pattern :)

Thankfully, it's a pretty simple fix: GRPC.Server.Adapter.send_reply/2 needs to send errors to the client. This does mean adding one more function to the Adapter behaviour to facilitate access to the error sending capabilities of the adapter, however.

While pursuing this problem, I stumbled on a semi-related "foot-gun": the error exception did not require a status, and that would result in yet more errors being thrown. To address that, the status field now defaults to an unknown error, so that developers who only set a message don't end up with things blowing up on them.
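For illustration, a minimal sketch of that default-status idea; the real module defines more than this, and the exact implementation in the PR may differ:

# Minimal sketch: the key point is a non-nil default for :status.
defmodule GRPC.RPCError do
  # 2 is the gRPC UNKNOWN status code
  defexception status: 2, message: nil, details: nil
end

# Developers who only set a message no longer end up with a nil status:
GRPC.RPCError.exception(message: "Booomm")
#=> %GRPC.RPCError{status: 2, message: "Booomm", details: nil}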

With this PR, we now see errors appear in trailers. I think there are still discussions to be had in terms of multi-message streams and how errors could/should be handled there. As it currently stands, the stream finishes when an error occurs. Of course, the developer can use an in-band error-signalling message instead, leaving "actual" gRPC errors as communication-terminating events.

Aaron Seigo added 2 commits December 16, 2025 11:36
This avoids a foot-gun where the app dev does not know which status to use
and only sets the message (and/or potentially the details). Without a
default value, the status field becomes `nil`, which blows up during
serialization; that is not an easy bug to trace back to the right place in
the application.
This exposes a `send_error/2` function in the adapter behaviour.
end

def send_reply(%{adapter: adapter} = stream, {:error, %GRPC.RPCError{} = error}, _opts) do
  adapter.send_error(stream.payload, error)
Contributor

Do we really need a new send_error callback? What does send_reply not have?

Contributor Author

The function already exists; it's just not accessible via the Adapter behaviour.

The difference from send_reply is that the error path doesn't try to gRPC-encode the message; instead it puts the error information into the trailers and ends the connection immediately.

It would be possible to change the contract of Adapter.send_reply/3 so that implementations must pattern match on errors being passed in as the data, but that means overloading the purpose of send_reply: sometimes it sends a reply, sometimes it sends an error.

I felt it was cleaner to keep these two paths clearly separated, and since the adapter already needs to track its own errors and have a way to deal with them, this isn't actually introducing new complexity.

But if you'd prefer to overload send_reply and make it a requirement for adapters to handle RPCError structs there, that can also be done!
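For reference, a rough sketch of the shape under discussion; the real GRPC.Server.Adapter behaviour has more callbacks, and the typespecs here are assumptions rather than the exact diff:

defmodule GRPC.Server.Adapter do
  # Existing path: the reply is gRPC-encoded and streamed to the client.
  @callback send_reply(state :: any(), data :: iodata(), opts :: keyword()) :: any()

  # New path: no encoding; the RPCError goes into the trailers and the
  # connection is ended immediately.
  @callback send_error(state :: any(), error :: GRPC.RPCError.t()) :: any()
end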

Contributor
Second paragraph sold me on the new callback!

  do_send_reply(stream, [], opts)
end

def send_reply(%{adapter: adapter} = stream, {:error, %GRPC.RPCError{} = error}, _opts) do
Contributor

One thing we need to check is whether the only errors that can reach here are in the form of this struct, or if we need to add another clause.

Contributor Author

aseigo commented Dec 16, 2025

That's a very good question.

I've oscillated between having one and not having one, as it would be "easy" enough to try to match on any error, or even just errors with strings (an error message), and wrap those in an RPCError.

The flip side is that, should that occur, it is almost certainly a developer error. Either an RPCError, a raised exception, or a protobuf-encodable struct should be the result. Returning Something Else(tm) probably means something like a raw Ecto query has been passed back, and silently forwarding that to the client, even as an error, feels somewhere between dangerous and too much magic.

In those cases, I'd rather see a request crash on the server side that can be tracked and addressed.

@sleipnir
Collaborator

In the documentation, there are several examples such as this one from the error handling guide:

GRPC.Stream.from([1, 2])
|> GRPC.Stream.map(fn
  2 -> raise "boom"
  x -> x
end)
|> GRPC.Stream.map_error(fn
  {:error, {:exception, _reason}} ->
    {:error, GRPC.RPCError.exception(message: "Booomm")}
end)

However, we have found that this does not actually work in practice and instead errors are thrown on the server side with no errors arriving on the client side. Which is unfortunate, because this would be a very nice API pattern :)

@aseigo Thank you for bringing this up! I'd like to clarify that the current behavior is actually intentional by design, not a bug. Let me explain the flow and the reasoning behind it.

Current Error Flow (Working as Designed)

When using GRPC.Stream.map_error/2 in streaming handlers, here's what happens:

  • The handler calls run_with (inside GRPC.Server.send_reply for streaming)
  • The Flow executes and map_error transforms errors into {:error, %GRPC.RPCError{}}
  • run_with detects the error and calls send_response
  • send_response raises the GRPC.RPCError (an intentional control-flow mechanism)
  • The adapter catches the exception:
    • Cowboy catches it via catch and does exit({:shutdown, {error, []}})
    • The stream process/handler sends error trailers to the client
    • The stream terminates with the proper gRPC error status

This flow successfully sends errors to clients through HTTP/2 trailers with the correct grpc-status and grpc-message; a rough sketch of it follows below.
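A rough sketch of that raise-and-catch control flow, with simplified function shapes (FlowSketch, run_handler/2, and handler_fun are placeholders; the library's real code differs in detail):

defmodule FlowSketch do
  # The streaming pipeline re-raises transformed errors so control
  # leaves the Flow and reaches the adapter.
  def send_response(_stream, {:error, %GRPC.RPCError{} = error}) do
    raise error
  end

  def send_response(stream, response) do
    GRPC.Server.send_reply(stream, response)
  end

  # The adapter catches the raise and exits with the error; the handler
  # process then writes the grpc-status / grpc-message trailers and
  # closes the stream.
  def run_handler(stream, handler_fun) do
    handler_fun.(stream)
  catch
    :error, %GRPC.RPCError{} = error ->
      exit({:shutdown, {error, []}})
  end
end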

Why We Terminate the Stream on Error

The decision to terminate the stream when an error occurs is intentional for several important reasons:

  1. Trailer Accumulation Problem
     In HTTP/2 and the gRPC protocol, trailers are sent once, at the end of a stream, with the END_STREAM flag. If we were to continue streaming after an error:
     • Problematic scenario: [msg1, ERROR, msg2, ERROR2, msg3]. Which trailers do we send, the grpc-status from ERROR or from ERROR2? We can't send multiple trailers; that's a protocol limitation.
     HTTP/2 only allows one HEADERS frame with END_STREAM at the end. We can neither send multiple trailer frames (a protocol violation) nor accumulate errors indefinitely (a memory/state-management nightmare).

  2. gRPC Protocol Semantics
     Per the gRPC protocol specification: "Status is sent as a single trailer at the end of the stream."
     The protocol is designed around the concept of one status per RPC. Errors are meant to be terminating events.

  3. Clear Error Semantics
     When an error occurs in a stream transformation:

GRPC.Stream.from([1, 2, 3])
|> GRPC.Stream.map(fn x -> if x == 2, do: raise("error"), else: x end)
|> GRPC.Stream.map_error(fn _ ->
  GRPC.RPCError.exception(status: :internal, message: "Processing failed")
end)

The client receives:

✅ msg1 (successful)
✅ Trailers: grpc-status=13, grpc-message="Processing failed"
✅ Stream closes

The alternative (continuing after error) would be confusing:

❌ What does receiving more messages after an error mean?
❌ Was the error "recovered"?
❌ How does the client know the final status?

Alternative: In-Band Error Signaling

If you need to handle recoverable errors within a stream, the recommended pattern is in-band error signaling:

Define a response type that includes success/error variants; in other words, map errors to ordinary data types within your application's business logic (see the sketch after the list below).

This way:

✅ Errors are part of the message protocol
✅ Stream continues after recoverable errors
✅ Final grpc-status=0 indicates successful completion
✅ Client can handle per-message errors
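For illustration, a hedged sketch of that pattern; MyApp.ItemResponse and MyApp.ItemError are placeholder message types with a result oneof, not anything from this PR or the library:

GRPC.Stream.from([1, 2, 3])
|> GRPC.Stream.map(fn
  2 ->
    # The failure travels as an ordinary message, so the stream keeps
    # going and later terminates normally with grpc-status=0.
    %MyApp.ItemResponse{result: {:error, %MyApp.ItemError{reason: "boom"}}}

  x ->
    %MyApp.ItemResponse{result: {:item, x}}
end)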

The current behavior is:

  • raise → adapter catches → sends trailers works perfectly for both Cowboy and ThousandIsland
  • Stream termination on error prevents trailer-accumulation issues and follows gRPC protocol semantics
  • map_error does work: it transforms errors before sending them to clients as terminating trailers
  • For non-terminating errors, use in-band error messages in your protobuf definitions

The key insight is: gRPC errors are meant to be stream-terminating events. If you need to continue streaming despite errors, those "errors" should be modeled as regular response messages, not gRPC status codes.

Does this clarify the design? I'm happy to discuss further if you have a specific use case that this pattern doesn't address well!
But do you have any evidence to show us that the client didn't receive the error via the trailer? Couldn't it be a bug in the clients? What you're implementing seems to be exactly the original design, just in a different way. I'd like to discuss this a bit more before proceeding with the review.

@sleipnir
Collaborator

sleipnir commented Dec 16, 2025

@aseigo Remember that map_error was created precisely for scenarios where you intend to capture errors and transform them into valid business messages; and if an error is critical, you can also map it to whatever gRPC error status suits your case. (An illustrative sketch follows below.)
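An illustrative sketch of that dual role; recoverable?/1 and the MyApp message types are placeholders standing in for application code:

GRPC.Stream.from([1, 2])
|> GRPC.Stream.map(fn
  2 -> raise "boom"
  x -> %MyApp.ItemResponse{result: {:item, x}}
end)
|> GRPC.Stream.map_error(fn {:error, {:exception, reason}} ->
  if recoverable?(reason) do
    # Business message: the stream keeps going.
    %MyApp.ItemResponse{result: {:error, %MyApp.ItemError{reason: inspect(reason)}}}
  else
    # Critical: a gRPC status that terminates the stream with error trailers.
    {:error, GRPC.RPCError.exception(status: GRPC.Status.internal(), message: "internal error")}
  end
end)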

@aseigo
Contributor Author

aseigo commented Dec 16, 2025

Let me explain the flow and the reasoning behind it.

I agree that this is a fine way to handle errors, and I quite like the ergonomics of it.

The issue is that it currently does not work.

When returning GRPC.RPCError.exception from GRPC.Stream.map (or map_error) one gets:

[error] ** (UndefinedFunctionError) function GRPC.RPCError.transform_module/0 is undefined or private
    (grpc_core 1.0.0-rc.1) GRPC.RPCError.transform_module()
    (protobuf 0.15.0) lib/protobuf/encoder.ex:157: Protobuf.Encoder.transform_module/2
    (protobuf 0.15.0) lib/protobuf/encoder.ex:12: Protobuf.Encoder.encode_to_iodata/1
    (grpc_server 1.0.0-rc.1) lib/grpc/server/stream.ex:93: GRPC.Server.Stream.send_reply/3
    (grpc_server 1.0.0-rc.1) lib/grpc/stream.ex:179: GRPC.Stream.run/1

If returning {:error, %GRPC.RPCError{...}}, one gets:

[error] ** (FunctionClauseError) no function clause matching in Protobuf.Encoder.encode_to_iodata/1
    (protobuf 0.15.0) lib/protobuf/encoder.ex:10: Protobuf.Encoder.encode_to_iodata({:error, %GRPC.RPCError{status: 2, message: "Oh no!", details: nil}})
    (grpc_server 1.0.0-rc.1) lib/grpc/server/stream.ex:93: GRPC.Server.Stream.send_reply/3
    (grpc_server 1.0.0-rc.1) lib/grpc/stream.ex:179: GRPC.Stream.run/1

On the client side, a trailer is sent but it does not contain the contents of the RPCError the developer created. It contains the generic trailers with a status of 0 (!) and no message.

This can also be seen in the tests changed in this PR, where cases that exercise raised RPCError exceptions see the exception in the log rather than the message.

So this doesn't change the workflow currently in the library; it just makes it work such that the RPCError ends up on the client. We noticed this when map_error calls that generated RPCError responses, copied and pasted from the docs even, did not make it to the client while these transform_module/encode_to_iodata errors were showing up in the logs.
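For context, roughly how we observed this from the stock Elixir client; the connection details and MyApp.MyService.Stub.stream_items/2 are placeholders for our application's generated stub:

{:ok, channel} = GRPC.Stub.connect("localhost:50051")
{:ok, reply_stream} = MyApp.MyService.Stub.stream_items(channel, %MyApp.ItemsRequest{})

Enum.each(reply_stream, fn
  {:ok, reply} ->
    IO.inspect(reply, label: "reply")

  {:error, %GRPC.RPCError{} = error} ->
    # Expected: the RPCError built in map_error. Observed before this fix:
    # this clause never runs; the stream simply ends with grpc-status: 0
    # and an empty grpc-message in the trailers.
    IO.inspect(error, label: "rpc error")
end)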

@sleipnir
Collaborator

sleipnir commented Dec 16, 2025

On the client side, a trailer is sent but it does not contain the contents of the RPCError the developer created. It contains the generic trailers with a status of 0 (!) and no message.

This can also be seen in the tests changed in this PR, where cases that exercise raised RPCError exceptions see the exception in the log rather than the message.

So this doesn't change the workflow currently in the library; it just makes it work such that the RPCError ends up on the client. We noticed this when map_error calls that generated RPCError responses, copied and pasted from the docs even, did not make it to the client while these transform_module/encode_to_iodata errors were showing up in the logs.

Okay, let me explore a bit more. I created a suite of integration tests about this here: 1fb869b. Could you please run them like this on the feat/new-server-adapter branch: mix test test/grpc/integration/stream_test.exs --only map_error

All I had to do was adjust the patterns of the send_response/3 function. Sorry for insisting, but I just want to understand the bug.

@aseigo
Contributor Author

aseigo commented Dec 16, 2025

All I had to do was adjust the patterns of the send_response/3 function

With that adjustment, on the server side I see the exception in the logs rather than random errors (yes! nice!), but I am still not seeing them on the client side. I applied the patch to the master branch as well as my grpcweb-with-trailers branch, and tested with a native client and a grpcweb client, and I'm still seeing grpc-message: '', grpc-status: 0.

So maybe handling this in Stream.send_response is a good way forward, but there's still the matter of getting the response to the client, it seems...

@sleipnir
Collaborator

sleipnir commented Dec 16, 2025

All I had to do was adjust the patterns of the send_response/3 function

With that adjustment, on the server side I see the exception in the logs rather than random errors (yes! nice!), but I am still not seeing them on the client side. I applied the patch to the master branch as well as my grpcweb-with-trailers branch, and tested with a native client and a grpcweb client, and I'm still seeing grpc-message: '', grpc-status: 0.

So maybe handling this in Stream.send_response is a good way forward, but there's still the matter of getting the response to the client, it seems...

Did you run the tests? They're important for knowing whether it's something the clients are mishandling or a server error. In my case, I believe it's a client issue, because the integration tests worked with the standard Elixir client, correctly retrieving the errors.

@aseigo How can I replicate your problem scenario here on my end?

@aseigo
Contributor Author

aseigo commented Dec 16, 2025

Did you run the tests?

The tests in the feat/new-server-adapter branch work. I suspect (though I haven't confirmed due to time restrictions here) that it is because the ThousandIsland-based adapter does this correctly while the existing Cowboy adapter in the master branch does not?

I say this because the same change to Stream.send_response does not work in the master branch with the existing cowboy-based adapter. :/

If the Cowboy adapter is going to be removed soonish, or if it is also changed in the feat/new-server-adapter branch, then we just have to wait for that branch to be merged, I suppose. If the Cowboy adapter remains, then this will remain an issue. And, of course, if the new adapter isn't merged and released, then it also remains an issue :)

I could not test with the app I have here, unfortunately, as it does not start correctly with the feat/new-server-adapter branch. (Will comment about that on the other PR...)

@sleipnir
Collaborator

sleipnir commented Dec 16, 2025

Did you run the tests?

The tests in the feat/new-server-adapter branch work. I suspect (though I haven't confirmed due to time restrictions here) that it is because the ThousandIsland-based adapter does this correctly while the existing Cowboy adapter in the master branch does not?

Those tests ran with Cowboy; run_server without passing options will use the default adapter, which in this case is Cowboy. To run the tests with ThousandIsland, you would need to pass the option:

run_server([MyService], fn port ->
  # ...
end, 0, adapter: GRPC.Server.Adapters.ThousandIsland)

I only used that branch because it was convenient to be able to commit to it without impacting anything in production.
I will also continue investigating the issue before reviewing your PR. I don't like approving things without understanding the real problem they are trying to solve. 😸

@sleipnir
Collaborator

@aseigo Can you try again here? #487

@aseigo aseigo mentioned this pull request Dec 16, 2025
@aseigo
Contributor Author

aseigo commented Dec 16, 2025

Fixed in #487, closing! Nice work @sleipnir !

@aseigo aseigo closed this Dec 16, 2025
@sleipnir
Collaborator

Fixed in #487, closing! Nice work @sleipnir !

I'm the one who should be thanking everyone for their hard work.
