Title
Generated Fast Marshal Panics with Buffer Overflow in EncodeNested for proto3 Fields with Implicit Presence
Version
v0.35.0
Description
EncodeNested comes into play when the messages are nested more than one level deep.
A proto3 message with implicit presence will correctly report its Size as 0 when its members are equal to the default values
However, EncodeNested will still write the tag and size into the buffer, taking up two bytes for each message that should have implicit presence and therefore not be written into the buffer at all.
Those bytes were not accounted for in the initial buffer size allocation by Marshal before it calls into MarshalTo, so if there are any other fields to be serialized in the message, it will panic.
Note that this does not end up being true when dealing with implicit/explicit presence: https://protobuf.dev/reference/go/size/
Checking if proto.Size returns 0 is an easy way to recognize empty messages:
if proto.Size(m) == 0 {
// No fields set (or, in proto3, all fields matching the default);
// skip processing this message, or return an error, or similar.
}
Go's reflect-based marshaling itself has extensive code and full internal presence package to track field presence and resize the marshaling buffer if needed.
Repro steps
Full working code @ https://github.com/francoposa/proto-demo/tree/francoposa/prometheus-rw2-csproto-bug-repro
Simplified Prometheus Remote Write V2 Proto
syntax = "proto3";
package io.prometheus.write.v2;
option go_package = "github.com/prometheus/prometheus/write/v2;writev2";
// Request represents a request to write the given timeseries to a remote destination.
message Request {
reserved 1 to 3;
repeated string symbols = 4;
repeated TimeSeries timeseries = 5;
}
// TimeSeries represents a single series.
message TimeSeries {
repeated uint32 labels_refs = 1 [packed=false];
repeated Sample samples = 2;
reserved 3;
reserved 4;
Metadata metadata = 5;
int64 created_timestamp = 6;
}
// Sample represents series sample.
message Sample {
double value = 1;
int64 timestamp = 2;
}
// Metadata represents the metadata associated with the given series' samples.
message Metadata {
enum MetricType {
METRIC_TYPE_UNSPECIFIED = 0;
}
MetricType type = 1;
uint32 help_ref = 3;
uint32 unit_ref = 4;
}
Test to Repro Against the Generated fastmarshal Code
package writev2
import (
"testing"
"github.com/CrowdStrike/csproto"
"github.com/stretchr/testify/assert"
)
func TestMarshalZeroValues(t *testing.T) {
for _, tt := range []struct {
name string
msg csproto.Marshaler
panics bool
}{
{
name: "nonempty-zero-value-single-field-member",
// does not panic because total size for entire message reports as zero;
// there is no buffer and no serialization is performed
msg: &TimeSeries{
Samples: []*Sample{
{
Value: 0,
Timestamp: 0,
},
},
},
panics: false,
},
{
name: "nonempty-zero-value-single-nested-member-with-member-before",
// panics because total size for each Sample message is report as zero
// so bytes were not accounted for in the initial buffer size calculation,
// but EncodeNested will still encode the tag and length for each Sample message;
// buffer overflows when the Sample tag is written to it
msg: &TimeSeries{
LabelsRefs: []uint32{0, 1},
Samples: []*Sample{
{
Value: 0,
Timestamp: 0,
},
},
},
panics: true,
},
{
name: "nonempty-zero-value-single-nested-member-with-member-after",
// panics because total size for each Sample message is report as zero
// so bytes were not accounted for in the initial buffer size calculation,
// but EncodeNested will still encode the tag and length for each Sample message;
// buffer overflows when the next (Metadata) field is written to it
msg: &TimeSeries{
Samples: []*Sample{
{
Value: 0,
Timestamp: 0,
},
},
Metadata: &Metadata{
Type: Metadata_METRIC_TYPE_UNSPECIFIED,
},
},
panics: true,
},
} {
t.Run(tt.name, func(t *testing.T) {
// csproto marshal
if tt.panics {
assert.Panics(t, func() {
_, _ = tt.msg.Marshal()
})
} else {
_, err := tt.msg.Marshal()
assert.NoError(t, err)
}
})
}
}
Title
Generated Fast Marshal Panics with Buffer Overflow in
EncodeNestedfor proto3 Fields with Implicit PresenceVersion
v0.35.0Description
EncodeNestedcomes into play when the messages are nested more than one level deep.A proto3 message with implicit presence will correctly report its
Sizeas0when its members are equal to the default valuesHowever,
EncodeNestedwill still write the tag and size into the buffer, taking up two bytes for each message that should have implicit presence and therefore not be written into the buffer at all.Those bytes were not accounted for in the initial buffer size allocation by
Marshalbefore it calls intoMarshalTo, so if there are any other fields to be serialized in the message, it will panic.Note that this does not end up being true when dealing with implicit/explicit presence: https://protobuf.dev/reference/go/size/
Go's reflect-based marshaling itself has extensive code and full internal presence package to track field presence and resize the marshaling buffer if needed.
Repro steps
Full working code @ https://github.com/francoposa/proto-demo/tree/francoposa/prometheus-rw2-csproto-bug-repro
Simplified Prometheus Remote Write V2 Proto
Test to Repro Against the Generated fastmarshal Code