diff --git a/content/best-practices/1-1-1.md b/content/best-practices/1-1-1.md index 7f5fcc935..512f281bd 100644 --- a/content/best-practices/1-1-1.md +++ b/content/best-practices/1-1-1.md @@ -34,10 +34,10 @@ dependencies that may only be used by another message defined in the same file. There are cases where the 1-1-1 ideal is not possible (circular dependencies), not ideal (extremely conceptually coupled messages which have readability -benefits by being co-located), or where the some of the downsides don't apply -(when a .proto file has no imports, then there are no technical concerns about -the size of transitive dependencies). As with any best practice, use good -judgment for when to diverge from the guideline. +benefits by being co-located), or where some of the downsides don't apply (when +a .proto file has no imports, then there are no technical concerns about the +size of transitive dependencies). As with any best practice, use good judgment +for when to diverge from the guideline. One place that modularity of proto schema files is important is when creating gRPC diff --git a/content/best-practices/_index.md b/content/best-practices/_index.md index 79c782e28..2d7cb8a63 100644 --- a/content/best-practices/_index.md +++ b/content/best-practices/_index.md @@ -12,3 +12,4 @@ topics: * [Proto Best Practices](/best-practices/dos-donts) * [API Best Practices](/best-practices/api) * [1-1-1 Rule](/best-practices/1-1-1) +* [Avoid Cargo Culting](/best-practices/no-cargo-cults) diff --git a/content/best-practices/api.md b/content/best-practices/api.md index 85f09bf39..a18d7ede2 100644 --- a/content/best-practices/api.md +++ b/content/best-practices/api.md @@ -6,15 +6,10 @@ type = "docs" aliases = "/programming-guides/api" +++ -Updated for proto3. Patches welcome! - This doc is a complement to [Proto Best Practices](/best-practices/dos-donts). It's not a prescription for Java/C++/Go and other APIs. 
-If you see a proto straying from these guidelines in a code review, point the -author to this topic and help spread the word. - {{% alert title="Note" color="note" %}} These guidelines are just that and many have documented exceptions. For example, if you're writing a performance-critical backend, you might want to sacrifice diff --git a/content/news/2025-06-27.md b/content/news/2025-06-27.md new file mode 100644 index 000000000..173239156 --- /dev/null +++ b/content/news/2025-06-27.md @@ -0,0 +1,227 @@ ++++ +title = "Changes Announced on June 27, 2025" +linkTitle = "June 27, 2025" +toc_hide = "true" +description = "Changes announced for Protocol Buffers on June 27, 2025." +type = "docs" ++++ + +## Edition 2024 + +We are planning to release Protobuf Editions in 32.x in Q3 2025. + +These describe changes as we anticipate them being implemented, but due to the +flexible nature of software some of these changes may not land or may vary from +how they are described in this topic. + +More documentation on Edition 2024 will be published in +[Feature Settings for Editions](/editions/features), +including information on migrating from Edition 2023. + +## Changes to Existing Features {#changes} + +This section details any existing features whose default settings will change in +Edition 2024. + +### C++ string_type {#string_type} + +The default for `string_type` feature, originally released in Edition 2023, will +change from `STRING` to `VIEW` in Edition 2024. + +See +[`features.(pb.cpp).string_type`](/editions/features#string_type) +and [String View APIs](/reference/cpp/string-view) for +more information on this feature and its feature values. + +## New Features {#new-features} + +This section details any new features that will be introduced in Edition 2024. 
+ +### `enforce_naming_style` {#enforce_naming} + +[`feature.enforce_naming_style`](/editions/features#enforce_naming) +enables strict naming style enforcement to ensure protos are round-trippable by +default with a feature value to opt-out to use legacy naming style. + +### `default_symbol_visibility` {#default_symbol} + +This feature controls whether the default symbol visibility of importable +symbols (such as messages and enums) is either: + +* `export`: referenceable from other protos via import statements +* `local`: un-referenceable outside of current file + +The default feature value `EXPORT_TOP_LEVEL` in Edition 2024 ensures top-level +symbols are exported by default, whereas nested symbols are local by default. + +This can be used along with the [`export` and `local` keywords](#export-local) +to explicitly annotate symbol visibility, also added in Edition 2024. + +Symbol visibility only controls which symbols can be imported from other proto +files, but does not affect code-generation. + +### C++: `enum_name_uses_string_view` {#enum_name} + +Previously, all generated enum types provide the following function to obtain +the label out of an enum value, which has significant overhead to construct the +`std::string` instances at runtime: + +```cpp +const std::string& Foo_Name(int); +``` + +The default feature value in Edition 2024 instead migrates this signature to +return `absl::string_view` to allow for better storage decoupling and potential +memory/CPU savings. + +```cpp +absl::string_view Foo_Name(int); +``` + +Users should migrate their code to handle the new return-type following +[the migration guide](/support/migration#string_view-return-type). + +See [String View APIs](/reference/cpp/string-view) for +more information. + +### Java: `nest_in_file_class` {#nest_in} + +This feature controls whether the Java generator will nest the generated class +in the Java generated file class. 
+ +The default value in Edition 2024 generates classes in their own files by +default, which is also the default behavior of the previous +`java_multiple_files = true` file option. This replaces `java_multiple_files` +which is removed in Edition 2024. + +The default outer classname is also updated to always be the camel-cased +`.proto` filename suffixed with Proto by default (for example, +`foo/bar_baz.proto -> BarBazProto`). You can override this using the +`java_outer_classname` file option and replace the pre-Edition 2024 default of +`BarBaz` or `BarBazOuterClass` depending on the presence of conflicts. + +### Java: `large_enum` {#large_enum} + +This feature allows creation of large Java enums, extending beyond the enum +limit due to standard constant limits imposed by the Java language. + +Creation of large enums is not enabled by default, but you can explicitly enable +it using this feature. Note that this feature replicates enum-like behavior but +has some notable differences (for example, switch statements are not supported). + +## Grammar Changes {#grammar-changes} + +### `import option` {#import-option} + +Edition 2024 adds support for option imports using the syntax `import option`. + +Unlike normal `import` statements, option imports import only custom options +defined in a `.proto` file, without importing other symbols. + +This means that messages and enums are excluded from the option import. In the +following example, the `Bar` message cannot be used as a field type in +`foo.proto`, but options with type `Bar` can still be set. + +Option imports must also come after any other import statements. 
+ +Example: + +```proto +// bar.proto +edition = "2024"; + +import "google/protobuf/descriptor.proto"; + +message Bar { + bool bar = 1; +} + +extend proto2.FileOptions { + bool file_opt1 = 5000; + Bar file_opt2 = 5001; +} +``` + +```proto +// foo.proto: +edition = "2024"; + +import option "bar.proto"; + +option (file_opt1) = true; +option (file_opt2) = {bar: true}; + +message Foo { + // Bar bar = 1; // This is not allowed +} +``` + +Option imports do not require generated code for its symbols and thus must be +provided as `option_deps` in `proto_library` instead of `deps`. This avoids +generating unreachable code. + +```build +proto_library( + name = "foo", + srcs = ["foo.proto"], + option_deps = [":custom_option"] +) +``` + +Option imports and `option_deps` are strongly recommended when importing +protobuf language features and other custom options to avoid generating +unnecessary code. + +`option_deps` requires Bazel 8 or later since the `native.proto_library` in +Bazel 7 does not support this. + +This also replaces `import weak`, which is removed in Edition 2024. + +### `export` / `local` Keywords {#export-local} + +`export` and `local` keywords are added in Edition 2024 as modifiers for the +symbol visibility of importable symbols, from the default behavior specified by +the +[`default_symbol_visibility` feature](/editions/features#symbol-vis). + +This controls which symbols can be imported from other proto files, but does not +affect code-generation. + +In Edition 2024, these can be set on all message and enum symbols by default. +However, some values of the `default_symbol_visibility` feature further restrict +which symbols are exportable. 
+ +Example: + +```proto +// Top-level symbols are exported by default in Edition 2024 +local message LocalMessage { + // Nested symbols are local by default in Edition 2024 + export enum ExportedNestedEnum { + UNKNOWN_EXPORTED_NESTED_ENUM_VALUE = 0; + } +} +``` + +### "import weak" and `weak` Field Option {#import-weak} + +Weak imports are no longer allowed as of Edition 2024. + +Most users previously relying on `import weak` to declare a “weak dependency” to +import custom options without generated code for C++ and Go should +migrate to use +[`import option`](/editions/overview#import-option) +instead. + +### `ctype` Field Option {#ctype} + +`ctype` field option is no longer allowed as of Edition 2024. Use the +[`features.(pb.cpp).string_type`](/editions/features#string_type) +feature, instead. + +### `java_multiple_files` File Option {#java_multiple} + +The `java_multiple_files` file option is removed in Edition 2024 in favor of the +new +[`features.nest_in_file_class`](/editions/features#java-nest_in_file) +Java feature. diff --git a/content/news/_index.md b/content/news/_index.md index 73bbe19fa..17c8c87ce 100644 --- a/content/news/_index.md +++ b/content/news/_index.md @@ -20,6 +20,7 @@ New news topics will also be published to the The following news topics provide information in the reverse order in which it was released. +* [June 27, 2025](/news/2025-06-27) - Edition 2024 * [March 18, 2025](/news/2025-03-18) - Dropping support for Ruby 3.0 * [January 23, 2025](/news/2025-01-23) - Poison Java @@ -96,6 +97,8 @@ release notes will be more complete. Also, not everything from the chronological listing will be in these topics, as some content is not specific to a particular release. 
+* [Version 32.x](/news/v32) +* [Version 31.x](/news/v31) * [Version 30.x](/news/v30) * [Version 29.x](/news/v29) * [Version 26.x](/news/v26) diff --git a/content/news/v32.md b/content/news/v32.md new file mode 100644 index 000000000..9bf9c4816 --- /dev/null +++ b/content/news/v32.md @@ -0,0 +1,215 @@ ++++ +title = "News Announcements for Version 32.x" +linkTitle = "Version 32.x" +toc_hide = "true" +description = "Changes announced for Protocol Buffers version 32.x." +type = "docs" ++++ + +## Changes to Existing Features + +This section details any existing features whose default settings will change in +Edition 2024. + +### C++ string_type {#string_type} + +The default for `string_type` feature, originally released in Edition 2023, will +change from `STRING` to `VIEW` in Edition 2024. + +See +[`features.(pb.cpp).string_type`](/editions/features#string_type) +and [String View APIs](/reference/cpp/string-view) for +more information on this feature and its feature values. + +## New Features {#new-features} + +This section details any new features that will be introduced in Edition 2024. + +### `enforce_naming_style` {#enforce_naming} + +[`feature.enforce_naming_style`](/editions/features#enforce_naming) +enables strict naming style enforcement to ensure protos are round-trippable by +default with a feature value to opt-out to use legacy naming style. + +### `default_symbol_visibility` {#default_symbol} + +This feature controls whether the default symbol visibility of importable +symbols (such as messages and enums) is either: + +* `export`: referenceable from other protos via import statements +* `local`: un-referenceable outside of current file + +The default feature value `EXPORT_TOP_LEVEL` in Edition 2024 ensures top-level +symbols are exported by default, whereas nested symbols are local by default. + +This can be used along with the [`export` and `local` keywords](#export-local) +to explicitly annotate symbol visibility, also added in Edition 2024. 
+ +Symbol visibility only controls which symbols can be imported from other proto +files, but does not affect code-generation. + +### C++: `enum_name_uses_string_view` {#enum_name} + +Previously, all generated enum types provide the following function to obtain +the label out of an enum value, which has significant overhead to construct the +`std::string` instances at runtime: + +```cpp +const std::string& Foo_Name(int); +``` + +The default feature value in Edition 2024 instead migrates this signature to +return `absl::string_view` to allow for better storage decoupling and potential +memory/CPU savings. + +```cpp +absl::string_view Foo_Name(int); +``` + +Users should migrate their code to handle the new return-type following +[the migration guide](/support/migration#string_view-return-type). + +See [String View APIs](/reference/cpp/string-view) for +more information. + +### Java: `nest_in_file_class` {#nest_in} + +This feature controls whether the Java generator will nest the generated class +in the Java generated file class. + +The default value in Edition 2024 generates classes in their own files by +default, which is also the default behavior of the previous +`java_multiple_files = true` file option. This replaces `java_multiple_files` +which is removed in Edition 2024. + +The default outer classname is also updated to always be the camel-cased +`.proto` filename suffixed with Proto by default (for example, +`foo/bar_baz.proto -> BarBazProto`). You can override this using the +`java_outer_classname` file option and replace the pre-Edition 2024 default of +`BarBaz` or `BarBazOuterClass` depending on the presence of conflicts. + +### Java: `large_enum` {#large_enum} + +This feature allows creation of large Java enums, extending beyond the enum +limit due to standard constant limits imposed by the Java language. + +Creation of large enums is not enabled by default, but you can explicitly enable +it using this feature. 
Note that this feature replicates enum-like behavior but +has some notable differences (for example, switch statements are not supported). + +## Grammar Changes {#grammar-changes} + +### `import option` {#import-option} + +Edition 2024 adds support for option imports using the syntax `import option`. + +Unlike normal `import` statements, option imports import only custom options +defined in a `.proto` file, without importing other symbols. + +This means that messages and enums are excluded from the option import. In the +following example, the `Bar` message cannot be used as a field type in +`foo.proto`, but options with type `Bar` can still be set. + +Option imports must also come after any other import statements. + +Example: + +```proto +// bar.proto +edition = "2024"; + +import "google/protobuf/descriptor.proto"; + +message Bar { + bool bar = 1; +} + +extend proto2.FileOptions { + bool file_opt1 = 5000; + Bar file_opt2 = 5001; +} +``` + +```proto +// foo.proto: +edition = "2024"; + +import option "bar.proto"; + +option (file_opt1) = true; +option (file_opt2) = {bar: true}; + +message Foo { + // Bar bar = 1; // This is not allowed +} +``` + +Option imports do not require generated code for its symbols and thus must be +provided as `option_deps` in `proto_library` instead of `deps`. This avoids +generating unreachable code. + +```build +proto_library( + name = "foo", + srcs = ["foo.proto"], + option_deps = [":custom_option"] +) +``` + +Option imports and `option_deps` are strongly recommended when importing +protobuf language features and other custom options to avoid generating +unnecessary code. + +`option_deps` requires Bazel 8 or later since the `native.proto_library` in +Bazel 7 does not support this. + +This also replaces `import weak`, which is removed in Edition 2024. 
+ +### `export` / `local` Keywords {#export-local} + +`export` and `local` keywords are added in Edition 2024 as modifiers for the +symbol visibility of importable symbols, from the default behavior specified by +the +[`default_symbol_visibility` feature](/editions/features#symbol-vis). + +This controls which symbols can be imported from other proto files, but does not +affect code-generation. + +In Edition 2024, these can be set on all message and enum symbols by default. +However, some values of the `default_symbol_visibility` feature further restrict +which symbols are exportable. + +Example: + +```proto +// Top-level symbols are exported by default in Edition 2024 +local message LocalMessage { + // Nested symbols are local by default in Edition 2024 + export enum ExportedNestedEnum { + UNKNOWN_EXPORTED_NESTED_ENUM_VALUE = 0; + } +} +``` + +### "import weak" and `weak` Field Option {#import-weak} + +Weak imports are no longer allowed as of Edition 2024. + +Most users previously relying on `import weak` to declare a “weak dependency” to +import custom options without generated code for C++ and Go should +migrate to use +[`import option`](/editions/overview#import-option) +instead. + +### `ctype` Field Option {#ctype} + +`ctype` field option is no longer allowed as of Edition 2024. Use the +[`features.(pb.cpp).string_type`](/editions/features#string_type) +feature, instead. + +### `java_multiple_files` File Option {#java_multiple} + +The `java_multiple_files` file option is removed in Edition 2024 in favor of the +new +[`features.nest_in_file_class`](/editions/features#java-nest_in_file) +Java feature. diff --git a/content/programming-guides/editions.md b/content/programming-guides/editions.md index 3f44e7f75..e9c8d2342 100644 --- a/content/programming-guides/editions.md +++ b/content/programming-guides/editions.md @@ -971,80 +971,121 @@ message types without breaking any of your existing code when you use the binary wire format. 
{{% alert title="Note" color="note" %}} If -you use JSON or +you use ProtoJSON or [proto text format](/reference/protobuf/textformat-spec) to store your protocol buffer messages, the changes that you can make in your -proto definition are different. {{% /alert %}} +proto definition are different. The ProtoJSON wire format safe changes are +described +[here](/programming-guides/json#json-wire-safety). +{{% /alert %}} Check [Proto Best Practices](/best-practices/dos-donts) and the following rules: -* Don't change the field numbers for any existing fields. "Changing" the field - number is equivalent to deleting the field and adding a new field with the - same type. If you want to renumber a field, see the instructions for - [deleting a field](#deleting). -* If you add new fields, any messages serialized by code using your "old" - message format can still be parsed by your new generated code. You should - keep in mind the [default values](#default) for these elements so that new - code can properly interact with messages generated by old code. Similarly, - messages created by your new code can be parsed by your old code: old - binaries simply ignore the new field when parsing. See the - [Unknown Fields](#unknowns) section for details. -* Fields can be removed, as long as the field number is not used again in your - updated message type. You may want to rename the field instead, perhaps - adding the prefix "OBSOLETE_", or make the field number - [reserved](#fieldreserved), so that future users of your `.proto` can't - accidentally reuse the number. -* `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible – this - means you can change a field from one of these types to another without - breaking forwards- or backwards-compatibility. 
If a number is parsed from - the wire which doesn't fit in the corresponding type, you will get the same - effect as if you had cast the number to that type in C++ (for example, if a - 64-bit number is read as an int32, it will be truncated to 32 bits). +### Binary Wire-unsafe Changes {#wire-unsafe-changes} + +Wire-unsafe changes are schema changes that will break if you parse data +that was serialized using the old schema with a parser that is using the new +schema (or vice versa). Only make wire-unsafe changes if you know that all +serializers and deserializers of the data are using the new schema. + +* Changing field numbers for any existing field is not safe. + * Changing the field number is equivalent to deleting the field and adding + a new field with the same type. If you want to renumber a field, see the + instructions for [deleting a field](#deleting). +* Moving fields into an existing `oneof` is not safe. + +### Binary Wire-safe Changes {#wire-safe-changes} + +Wire-safe changes are ones where it is fully safe to evolve the schema in this +way without risk of data loss or new parse failures. + +Note that any wire-safe change may be a breaking change to application code in +a given language. For example, adding a value to a preexisting enum would be a +compilation break for any code with an exhaustive switch on that enum. For that +reason, Google may avoid making some of these types of changes on public +messages: the AIPs contain guidance for which of these changes are safe to make +there. + +* Adding new fields is safe. + * If you add new fields, any messages serialized by code using your "old" + message format can still be parsed by your new generated code. You + should keep in mind the [default values](#default) for these elements so + that new code can properly interact with messages generated by old code. + Similarly, messages created by your new code can be parsed by your old + code: old binaries simply ignore the new field when parsing. 
See the + [Unknown Fields](#unknowns) section for details. +* Removing fields is safe. + * The same field number must not be used again in your updated message type. + You may want to rename the field instead, perhaps adding the prefix + "OBSOLETE_", or make the field number [reserved](#fieldreserved), so + that future users of your `.proto` can't accidentally reuse the number. +* Adding additional values to an enum is safe. +* Changing a single explicit presence field or extension into a member of a + **new** `oneof` is safe. +* Changing a `oneof` which contains only one field to an explicit presence + field is safe. +* Changing a field into an extension of the same number and type is safe. + +### Binary Wire-compatible Changes (Conditionally Safe) {#conditionally-safe-changes} + +Unlike wire-safe changes, wire-compatible means that the same data can be parsed +both before and after a given change. However, a parse of the data may be lossy +under this shape of change. For example, changing an int32 to an int64 is a +compatible change, but if a value larger than INT32_MAX is written, a client +that reads it as an int32 will discard the high order bits of the number. + +You can make compatible changes to your schema only if you manage the roll out +to your system carefully. For example, you may change an int32 to an int64 but +ensure you continue to only write legal int32 values until the new schema is +deployed to all endpoints, and then start writing larger values +after that. + +If your schema is published outside of your organization, you should generally +not make wire-compatible changes, as you cannot manage the deployment of the new +schema to know when the different range of values may be safe to use. + +* `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible. 
+ * If a number is parsed from the wire which doesn't fit in the + corresponding type, you will get the same effect as if you had cast the + number to that type in C++ (for example, if a 64-bit number is read as + an int32, it will be truncated to 32 bits). * `sint32` and `sint64` are compatible with each other but are *not* - compatible with the other integer types. If the value written was between - INT_MIN and INT_MAX inclusive it will parse as the same value with either - type. If an sint64 value was written outside of that range and parsed as an - sint32, the varint is truncated to 32 bits and then zigzag decoding occurs - (which will cause a different value to be observed). + compatible with the other integer types. + * If the value written was between INT_MIN and INT_MAX inclusive it will + parse as the same value with either type. If an sint64 value was written + outside of that range and parsed as an sint32, the varint is truncated + to 32 bits and then zigzag decoding occurs (which will cause a different + value to be observed). * `string` and `bytes` are compatible as long as the bytes are valid UTF-8. * Embedded messages are compatible with `bytes` if the bytes contain an encoded instance of the message. * `fixed32` is compatible with `sfixed32`, and `fixed64` with `sfixed64`. * For `string`, `bytes`, and message fields, singular is compatible with - `repeated`. Given serialized data of a repeated field as input, clients that - expect this field to be singular will take the last input value if it's a - primitive type field or merge all input elements if it's a message type - field. Note that this is **not** generally safe for numeric types, including - bools and enums. Repeated fields of numeric types are serialized in the - [packed](/programming-guides/encoding#packed) format - by default, which will not be parsed correctly when a singular field is - expected. 
-* `enum` is compatible with `int32`, `uint32`, `int64`, and `uint64` in terms - of wire format (note that values will be truncated if they don't fit). - However, be aware that client code may treat them differently when the - message is deserialized: for example, unrecognized `enum` values will be - preserved in the message, but how this is represented when the message is - deserialized is language-dependent. Int fields always just preserve their - value. -* Changing a *single* `optional` field or extension into a member of a **new** - `oneof` is binary compatible, however for some languages (notably, Go) the - generated code's API will change in incompatible ways. For this reason, - Google does not make such changes in its public APIs, as documented in - [AIP-180](https://google.aip.dev/180#moving-into-oneofs). With - the same caveat about source-compatibility, moving *multiple* fields into a - **new** `oneof` may be safe if you are sure that no code sets more than one - at a time. Likewise, changing a *single* field `oneof` to an `optional` - field or extension is safe. Moving *any* fields into an **existing** `oneof` - is **not** safe. + `repeated`. + * Given serialized data of a repeated field as input, clients that expect + this field to be singular will take the last input value if it's a + primitive type field or merge all input elements if it's a message type + field. Note that this is **not** generally safe for numeric types, + including bools and enums. Repeated fields of numeric types are + serialized in the + [packed](/programming-guides/encoding#packed) + format by default, which will not be parsed correctly when a singular + field is expected. 
+* `enum` is compatible with `int32`, `uint32`, `int64`, and `uint64` + * Be aware that client code may treat them differently when the message is + deserialized: for example, unrecognized proto3 `enum` values will be + preserved in the message, but how this is represented when the message + is deserialized is language-dependent. * Changing a field between a `map` and the corresponding `repeated` message field is binary compatible (see [Maps](#maps), below, for the - message layout and other restrictions). However, the safety of the change is - application-dependent: when deserializing and reserializing a message, - clients using the `repeated` field definition will produce a semantically - identical result; however, clients using the `map` field definition may - reorder entries and drop entries with duplicate keys. + message layout and other restrictions). + * However, the safety of the change is application-dependent: when + deserializing and reserializing a message, clients using the `repeated` + field definition will produce a semantically identical result; however, + clients using the `map` field definition may reorder entries and drop + entries with duplicate keys. ## Unknown Fields {#unknowns} diff --git a/content/programming-guides/encoding.md b/content/programming-guides/encoding.md index 2a44879f1..b4c70acc1 100644 --- a/content/programming-guides/encoding.md +++ b/content/programming-guides/encoding.md @@ -340,10 +340,10 @@ records for the same field with respect to each other is preserved. Thus, this could look like the following: ```proto -5: 1 -5: 2 +6: 1 +6: 2 4: {"hello"} -5: 3 +6: 3 ``` Only repeated fields of primitive numeric types can be declared "packed". 
These diff --git a/content/programming-guides/enum.md b/content/programming-guides/enum.md index bff84ada3..66207534b 100644 --- a/content/programming-guides/enum.md +++ b/content/programming-guides/enum.md @@ -112,8 +112,8 @@ There are two options for moving to conformant behavior: end up stored in the field cast to the enum type instead of being put into the unknown field set. * Change the enum to closed. This is discouraged, and can cause runtime - behavior if *anybody else* is using the enum. Unrecognized integers will end - up in the unknown field set instead of those fields. + behavior changes if *anybody else* is using the enum. Unrecognized integers + will end up in the unknown field set instead of those fields. ### C# {#csharp} diff --git a/content/programming-guides/json.md b/content/programming-guides/json.md index a56fa6733..26f275b50 100644 --- a/content/programming-guides/json.md +++ b/content/programming-guides/json.md @@ -117,7 +117,8 @@ The following table shows how data is represented in JSON files. number 1, -10, 0 JSON value will be a decimal number. Either numbers or strings are - accepted. Empty strings are invalid. + accepted. Empty strings are invalid. Exponent notation (such as `1e2`) is + accepted in both quoted and unquoted forms. @@ -125,7 +126,8 @@ The following table shows how data is represented in JSON files. string "1", "-10" JSON value will be a decimal string. Either numbers or strings are - accepted. Empty strings are invalid. + accepted. Empty strings are invalid. Exponent notation (such as `1e2`) is + accepted in both quoted and unquoted forms. @@ -152,7 +154,7 @@ The following table shows how data is represented in JSON files. Timestamp string "1972-01-01T10:00:20.021Z" - Uses RFC 3339, where generated output will always be Z-normalized + Uses RFC 3339 (see clarification), where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. 
@@ -218,7 +220,136 @@ The following table shows how data is represented in JSON files. -### JSON Options {#json-options} +## ProtoJSON Wire Safety {#json-wire-safety} + +When using ProtoJSON, only some schema changes are safe to make in a distributed +system. This contrasts with the same concepts applied to the +[binary wire format](/programming-guides/editions#updating). + +### JSON Wire-unsafe Changes {#wire-unsafe} + +Wire-unsafe changes are schema changes that will break if you parse data that +was serialized using the old schema with a parser that is using the new schema +(or vice versa). You should almost never make this kind of schema change. + +* Changing a field to or from an extension of the same number and type is not + safe. +* Changing a field between `string` and `bytes` is not safe. +* Changing a field between a message type and `bytes` is not safe. +* Changing any field from `optional` to `repeated` is not safe. +* Changing a field between a `map` and the corresponding `repeated` + message field is not safe. +* Moving fields into an existing `oneof` is not safe. + +### JSON Wire-safe Changes {#wire-safe} + +Wire-safe changes are ones where it is fully safe to evolve the schema in this +way without risk of data loss or new parse failures. + +Note that nearly all wire-safe changes may be a breaking change to application +code. For example, adding a value to a preexisting enum would be a compilation +break for any code with an exhaustive switch on that enum. For that reason, +Google may avoid making some of these types of changes on public messages. The +AIPs contain guidance for which of these changes are safe to make there. + +* Changing a single `optional` field into a member of a **new** `oneof` is + safe. +* Changing a `oneof` which contains only one field to an `optional` field is + safe. +* Changing a field between any of `int32`, `sint32`, `sfixed32`, `fixed32` is + safe. 
+* Changing a field between any of `int64`, `sint64`, `sfixed64`, `fixed64` is + safe. +* Changing a field number is safe (as the field numbers are not used in the + ProtoJSON format), but still strongly discouraged since it is very unsafe in + the binary wire format. +* Adding values to an enum is safe if the "Emit enum values as integers" + option is set on all relevant clients (see [options](#json-options)). + +### JSON Wire-compatible Changes (Conditionally safe) {#conditionally-safe} + +Unlike wire-safe changes, wire-compatible means that the same data can be parsed +both before and after a given change. However, a client that reads it may get +lossy data under this kind of change. For example, changing an int32 to an +int64 is a compatible change, but if a value larger than INT32_MAX is written, a +client that reads it as an int32 will discard the high order bits. + +You can make compatible changes to your schema only if you manage the roll out +to your system carefully. For example, you may change an int32 to an int64 but +ensure you continue to only write legal int32 values until the new schema is +deployed to all endpoints, and then start writing larger values after that. + +#### Compatible But With Unknown Field Handling Problems {#compatible-ish} + +Unlike the binary wire format, ProtoJSON implementations generally do not +propagate unknown fields. This means that adding to schemas is generally +compatible but will result in parse failures if a client using the old schema +observes the new content. + +This means you can add to your schema, but you cannot safely start writing the +new fields until you know the schema has been deployed to the relevant client or +server (or that the relevant clients set an Ignore Unknown Fields flag, discussed +[below](#json-options)). + +* Adding and removing fields is considered compatible with this caveat. +* Removing enum values is considered compatible with this caveat.
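
The unknown-field caveat above can be sketched without any Protobuf dependency. This is a minimal illustration only: `JsonObject`, `ParseWithSchema`, and the `ignore_unknown` parameter are hypothetical names that stand in for a tokenized JSON object, a ProtoJSON parser, and an ignore-unknown-fields parser option. They are not part of any Protobuf API.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for an already-tokenized JSON object: key -> raw value.
using JsonObject = std::map<std::string, std::string>;

// Returns the fields the reader accepted, or std::nullopt on parse failure.
// `known_fields` plays the role of the reader's schema; `ignore_unknown`
// models an "ignore unknown fields" parser option.
inline std::optional<JsonObject> ParseWithSchema(
    const JsonObject& input, const std::set<std::string>& known_fields,
    bool ignore_unknown) {
  JsonObject accepted;
  for (const auto& [name, value] : input) {
    if (known_fields.count(name) > 0) {
      accepted.emplace(name, value);
    } else if (!ignore_unknown) {
      return std::nullopt;  // Strict reader: an unknown field fails the parse.
    }
    // Lenient reader: the unknown field is dropped, not propagated.
  }
  return accepted;
}
```

In this sketch, a reader whose schema only knows `id` rejects a payload that also carries a newly added field unless the lenient option is set, and even then the new field does not survive a re-serialization. That is why the new field should only be written once every reader has the new schema or the lenient option.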
+ +#### Compatible But Potentially Lossy {#compatible-lossy} + +* Changing between any of the 32-bit integers (`int32`, `uint32`, `sint32`, + `sfixed32`, `fixed32`) and any of the 64-bit integers (`int64`, `uint64`, + `sint64`, `sfixed64`, `fixed64`) is a compatible change. + * If a number is parsed from the wire that doesn't fit in the + corresponding type, you will get the same effect as if you had cast the + number to that type in C++ (for example, if a 64-bit number is read as + an int32, it will be truncated to 32 bits). + * Unlike binary wire format, `bool` is not compatible with integers. + * Note that the int64 types are quoted by default to avoid precision loss + when handled as a double or JavaScript number, and the 32 bit types are + unquoted by default. Conformant implementations will accept either case + for all integer types, but nonconformant implementations may mishandle + this case and not handle quoted int32s or unquoted int64s which may + break under this change. +* `enum` may be conditionally compatible with `string`. + * If "enums-as-ints" flag is used by any client, then enums will instead + be compatible with the integer types. + +## RFC 3339 Clarification {#rfc3339} + +[RFC 3339](https://www.rfc-editor.org/rfc/rfc3339) intends to declare a strict +subset of ISO-8601 format, and unfortunately some ambiguity was created since +RFC 3339 was published in 2002 and then ISO-8601 was subsequently revised +without any corresponding revisions of RFC 3339. + +Most notably, ISO-8601-1988 contains this note: + +> In date and time representations lower case characters may be used when upper +> case characters are not available. + +It is ambiguous whether this note is suggesting that parsers should accept +lowercase letters in general, or if it is only suggesting that lowercase letters +may be used as a substitute in environments where uppercase cannot be +technically used.
RFC 3339 contains a note that intends to clarify the +interpretation to be that lowercase letters should be accepted in general. + +ISO-8601-2019 does not contain the corresponding note and is unambiguous that +lowercase letters are not allowed. This created some confusion for all libraries +that declare they support RFC 3339: today RFC 3339 declares it is a profile of +ISO-8601 but contains a note that is in reference to something that is no longer +in the latest ISO-8601 spec. + +ProtoJSON spec takes the decision that the timestamp format is the stricter +definition of "RFC 3339 as a profile of ISO-8601-2019". Some Protobuf +implementations may be non-conformant by using a timestamp parsing +implementation that is implemented as "RFC 3339 as a profile of ISO-8601-1988," +which will accept a few additional edge cases. + +For consistent interoperability, parsers should only accept the stricter subset +format where possible. When using a non-conformant implementation that accepts +the laxer definition, strongly avoid relying on the additional edge cases being +accepted. + +## JSON Options {#json-options} A conformant protobuf JSON implementation may provide the following options: diff --git a/content/programming-guides/proto2.md b/content/programming-guides/proto2.md index 5c40e40f5..c1267d53f 100644 --- a/content/programming-guides/proto2.md +++ b/content/programming-guides/proto2.md @@ -825,7 +825,7 @@ enum PhoneType { PHONE_TYPE_UNSPECIFIED = 0; PHONE_TYPE_MOBILE = 1; PHONE_TYPE_HOME = 2; - PHONE_TYPE_WORK = 3 [deprecated=true]; + PHONE_TYPE_WORK = 3 [deprecated = true]; reserved 4,5; } ``` @@ -1018,102 +1018,121 @@ message types without breaking any of your existing code when you use the binary wire format. {{% alert title="Note" color="note" %}} If -you use JSON or +you use ProtoJSON or [proto text format](/reference/protobuf/textformat-spec) to store your protocol buffer messages, the changes that you can make in your -proto definition are different. 
{{% /alert %}} +proto definition are different. The ProtoJSON wire format safe changes are +described +[here](/programming-guides/json#json-wire-safety). +{{% /alert %}} Check [Proto Best Practices](/best-practices/dos-donts) and the following rules: -* Don't change the field numbers for any existing fields. "Changing" the field - number is equivalent to deleting the field and adding a new field with the - same type. If you want to renumber a field, see the instructions for - [deleting a field](#deleting). -* Any new fields that you add should be `optional` or `repeated`. This means - that any messages serialized by code using your "old" message format can - still be parsed by your new generated code, as they won't be missing any - `required` elements. You should keep in mind the [default values](#optional) - for these elements so that new code can properly interact with messages - generated by old code. Similarly, messages created by your new code can be - parsed by your old code: old binaries simply ignore the new field when - parsing. However, the unknown fields are not discarded, and if the message - is later serialized, the unknown fields are serialized along with it – so if - the message is passed on to new code, the new fields are still available. - See the [Unknown Fields](#unknowns) section for details. -* Non-required fields can be removed, as long as the field number is not used - again in your updated message type. You may want to rename the field - instead, perhaps adding the prefix "OBSOLETE_", or make the field number - [reserved](#fieldreserved), so that future users of your `.proto` can't - accidentally reuse the number. -* A non-required field can be converted to an [extension](#extensions) and - vice versa, as long as the type and number stay the same. 
-* `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible – this - means you can change a field from one of these types to another without - breaking forwards- or backwards-compatibility. If a number is parsed from - the wire which doesn't fit in the corresponding type, you will get the same - effect as if you had cast the number to that type in C++ (for example, if a - 64-bit number is read as an int32, it will be truncated to 32 bits). +### Binary Wire-unsafe Changes {#wire-unsafe-changes} + +Wire-unsafe changes are schema changes that will break if you parse data +that was serialized using the old schema with a parser that is using the new +schema (or vice versa). Only make wire-unsafe changes if you know that all +serializers and deserializers of the data are using the new schema. + +* Changing field numbers for any existing field is not safe. + * Changing the field number is equivalent to deleting the field and adding + a new field with the same type. If you want to renumber a field, see the + instructions for [deleting a field](#deleting). +* Moving fields into an existing `oneof` is not safe. + +### Binary Wire-safe Changes {#wire-safe-changes} + +Wire-safe changes are ones where it is fully safe to evolve the schema in this +way without risk of data loss or new parse failures. + +Note that any wire-safe change may be a breaking change to application code in +a given language. For example, adding a value to a preexisting enum would be a +compilation break for any code with an exhaustive switch on that enum. For that +reason, Google may avoid making some of these types of changes on public +messages: the AIPs contain guidance for which of these changes are safe to make +there. + +* Adding new fields is safe. + * If you add new fields, any messages serialized by code using your "old" + message format can still be parsed by your new generated code.
You + should keep in mind the [default values](#default) for these elements so + that new code can properly interact with messages generated by old code. + Similarly, messages created by your new code can be parsed by your old + code: old binaries simply ignore the new field when parsing. See the + [Unknown Fields](#unknowns) section for details. +* Removing fields is safe. + * The same field number must not be used again in your updated message + type. You may want to rename the field instead, perhaps adding the prefix + "OBSOLETE_", or make the field number [reserved](#fieldreserved), so + that future users of your `.proto` can't accidentally reuse the number. +* Adding additional values to an enum is safe. +* Changing a single explicit presence field or extension into a member of a + **new** `oneof` is safe. +* Changing a `oneof` which contains only one field to an explicit presence + field is safe. +* Changing a field into an extension of the same number and type is safe. + +### Binary Wire-compatible Changes (Conditionally Safe) {#conditionally-safe-changes} + +Unlike wire-safe changes, wire-compatible means that the same data can be parsed +both before and after a given change. However, a parse of the data may be lossy +under this kind of change. For example, changing an int32 to an int64 is a +compatible change, but if a value larger than INT32_MAX is written, a client +that reads it as an int32 will discard the high order bits of the number. + +You can make compatible changes to your schema only if you manage the roll out +to your system carefully. For example, you may change an int32 to an int64 but +ensure you continue to only write legal int32 values until the new schema is +deployed to all endpoints, and then start writing larger values after that.
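
The int32/int64 example above can be made concrete with a short sketch. It assumes nothing about the Protobuf runtime: `ReadAsInt32` and `FitsInInt32` are hypothetical helper names that only model the documented "as if cast in C++" behavior and the check a writer might apply during a staged rollout.

```cpp
#include <cstdint>
#include <limits>

// Models what a reader still on the old schema (int32 field) observes when a
// writer on the new schema (int64 field) sent `wire_value`: only the low
// 32 bits survive. The inner cast to uint32_t is a well-defined modular
// conversion; the outer cast relies on two's complement representation,
// which C++20 guarantees and earlier compilers use in practice.
inline std::int32_t ReadAsInt32(std::int64_t wire_value) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(wire_value));
}

// True when the value round-trips unchanged, meaning it is safe to write
// before every reader has picked up the int64 schema.
inline bool FitsInInt32(std::int64_t wire_value) {
  return wire_value >= std::numeric_limits<std::int32_t>::min() &&
         wire_value <= std::numeric_limits<std::int32_t>::max();
}
```

A writer that gates new values on `FitsInInt32` until the rollout completes never exposes old readers to a truncated value. Once the gate is removed, a value such as 2^32 + 5 would be read as 5 by any reader that was missed.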
+ +If your schema is published outside of your organization, you should generally +not make wire-compatible changes, as you cannot manage the deployment of the new +schema to know when the different range of values may be safe to use. + +* `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible. + * If a number is parsed from the wire which doesn't fit in the + corresponding type, you will get the same effect as if you had cast the + number to that type in C++ (for example, if a 64-bit number is read as + an int32, it will be truncated to 32 bits). * `sint32` and `sint64` are compatible with each other but are *not* - compatible with the other integer types. If the value written was between - INT_MIN and INT_MAX inclusive it will parse as the same value with either - type. If an sint64 value was written outside of that range and parsed as an - sint32, the varint is truncated to 32 bits and then zigzag decoding occurs - (which will cause a different value to be observed). + compatible with the other integer types. + * If the value written was between INT_MIN and INT_MAX inclusive it will + parse as the same value with either type. If an sint64 value was written + outside of that range and parsed as an sint32, the varint is truncated + to 32 bits and then zigzag decoding occurs (which will cause a different + value to be observed). * `string` and `bytes` are compatible as long as the bytes are valid UTF-8. * Embedded messages are compatible with `bytes` if the bytes contain an encoded instance of the message. * `fixed32` is compatible with `sfixed32`, and `fixed64` with `sfixed64`. * For `string`, `bytes`, and message fields, singular is compatible with - `repeated`. Given serialized data of a repeated field as input, clients that - expect this field to be singular will take the last input value if it's a - primitive type field or merge all input elements if it's a message type - field. 
Note that this is **not** generally safe for numeric types, including - bools and enums. Repeated fields of numeric types may be serialized in the - [packed](/programming-guides/encoding#packed) format, - which will not be parsed correctly when a singular field is expected. -* Changing a default value is generally OK, as long as you remember that - default values are never sent over the wire. Thus, if a program receives a - message in which a particular field isn't set, the program will see the - default value as it was defined in that program's version of the protocol. - It will NOT see the default value that was defined in the sender's code. -* `enum` is compatible with `int32`, `uint32`, `int64`, and `uint64` in terms - of wire format (note that values will be truncated if they don't fit). - However, be aware that client code may treat them differently when the - message is deserialized. Notably, unrecognized `enum` values are discarded - when the message is deserialized, which makes the field's `has..` accessor - return false and its getter return the first value listed in the `enum` - definition, or the default value if one is specified. In the case of - repeated enum fields, any unrecognized values are stripped out of the list. - However, an integer field will always preserve its value. Because of this, - you need to be very careful when upgrading an integer to an `enum` in terms - of receiving out of bounds enum values on the wire. -* In the current Java and C++ implementations, when unrecognized `enum` values - are stripped out, they are stored along with other unknown fields. Note that - this can result in strange behavior if this data is serialized and then - reparsed by a client that recognizes these values. In the case of optional - fields, even if a new value was written after the original message was - deserialized, the old value will be still read by clients that recognize it. 
- In the case of repeated fields, the old values will appear after any - recognized and newly-added values, which means that order will not be - preserved. -* Changing a single `optional` field or extension into a member of a **new** - `oneof` is binary compatible, however for some languages (notably, Go) the - generated code's API will change in incompatible ways. For this reason, - Google does not make such changes in its public APIs, as documented in - [AIP-180](https://google.aip.dev/180#moving-into-oneofs). With - the same caveat about source-compatibility, moving multiple fields into a - new `oneof` may be safe if you are sure that no code sets more than one at a - time. Moving fields into an existing `oneof` is not safe. Likewise, changing - a single field `oneof` to an `optional` field or extension is safe. + `repeated`. + * Given serialized data of a repeated field as input, clients that expect + this field to be singular will take the last input value if it's a + primitive type field or merge all input elements if it's a message type + field. Note that this is **not** generally safe for numeric types, + including bools and enums. Repeated fields of numeric types are + serialized in the + [packed](/programming-guides/encoding#packed) + format by default, which will not be parsed correctly when a singular + field is expected. +* `enum` is compatible with `int32`, `uint32`, `int64`, and `uint64` + * Be aware that client code may treat them differently when the message is + deserialized: for example, unrecognized proto3 `enum` values will be + preserved in the message, but how this is represented when the message + is deserialized is language-dependent. * Changing a field between a `map` and the corresponding `repeated` message field is binary compatible (see [Maps](#maps), below, for the - message layout and other restrictions). 
However, the safety of the change is - application-dependent: when deserializing and reserializing a message, - clients using the `repeated` field definition will produce a semantically - identical result; however, clients using the `map` field definition may - reorder entries and drop entries with duplicate keys. + message layout and other restrictions). + * However, the safety of the change is application-dependent: when + deserializing and reserializing a message, clients using the `repeated` + field definition will produce a semantically identical result; however, + clients using the `map` field definition may reorder entries and drop + entries with duplicate keys. ## Unknown Fields {#unknowns} @@ -1221,9 +1240,7 @@ message UserContent { full_name: ".kittens.kitten_videos", type: ".kittens.Video", repeated: true - }, - // Ensures all field numbers in this extension range are declarations. - verification = DECLARATION + } ]; } ``` @@ -2036,7 +2053,7 @@ Here are a few of the most commonly used options: statement. ```proto - optional int32 old_field = 6 [deprecated=true]; + optional int32 old_field = 6 [deprecated = true]; ``` ### Enum Value Options {#enum-value-options} diff --git a/content/programming-guides/proto3.md b/content/programming-guides/proto3.md index 79031d268..974fa2017 100644 --- a/content/programming-guides/proto3.md +++ b/content/programming-guides/proto3.md @@ -979,79 +979,121 @@ message types without breaking any of your existing code when you use the binary wire format. {{% alert title="Note" color="note" %}} If -you use JSON or +you use ProtoJSON or [proto text format](/reference/protobuf/textformat-spec) to store your protocol buffer messages, the changes that you can make in your -proto definition are different. {{% /alert %}} +proto definition are different. The ProtoJSON wire format safe changes are +described +[here](/programming-guides/json#json-wire-safety). 
+{{% /alert %}} Check [Proto Best Practices](/best-practices/dos-donts) and the following rules: -* Don't change the field numbers for any existing fields. "Changing" the field - number is equivalent to deleting the field and adding a new field with the - same type. If you want to renumber a field, see the instructions for - [deleting a field](#deleting). -* If you add new fields, any messages serialized by code using your "old" - message format can still be parsed by your new generated code. You should - keep in mind the [default values](#default) for these elements so that new - code can properly interact with messages generated by old code. Similarly, - messages created by your new code can be parsed by your old code: old - binaries simply ignore the new field when parsing. See the - [Unknown Fields](#unknowns) section for details. -* Fields can be removed, as long as the field number is not used again in your - updated message type. You may want to rename the field instead, perhaps - adding the prefix "OBSOLETE_", or make the field number - [reserved](#fieldreserved), so that future users of your `.proto` can't - accidentally reuse the number. -* `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible – this - means you can change a field from one of these types to another without - breaking forwards- or backwards-compatibility. If a number is parsed from - the wire which doesn't fit in the corresponding type, you will get the same - effect as if you had cast the number to that type in C++ (for example, if a - 64-bit number is read as an int32, it will be truncated to 32 bits). +### Binary Wire-unsafe Changes {#wire-unsafe-changes} + +Wire-unsafe changes are schema changes that will break if you parse data +that was serialized using the old schema with a parser that is using the new +schema (or vice versa). Only make wire-unsafe changes if you know that all +serializers and deserializers of the data are using the new schema.
+ +* Changing field numbers for any existing field is not safe. + * Changing the field number is equivalent to deleting the field and adding + a new field with the same type. If you want to renumber a field, see the + instructions for [deleting a field](#deleting). +* Moving fields into an existing `oneof` is not safe. + +### Binary Wire-safe Changes {#wire-safe-changes} + +Wire-safe changes are ones where it is fully safe to evolve the schema in this +way without risk of data loss or new parse failures. + +Note that any wire-safe change may be a breaking change to application code in +a given language. For example, adding a value to a preexisting enum would be a +compilation break for any code with an exhaustive switch on that enum. For that +reason, Google may avoid making some of these types of changes on public +messages: the AIPs contain guidance for which of these changes are safe to make +there. + +* Adding new fields is safe. + * If you add new fields, any messages serialized by code using your "old" + message format can still be parsed by your new generated code. You + should keep in mind the [default values](#default) for these elements so + that new code can properly interact with messages generated by old code. + Similarly, messages created by your new code can be parsed by your old + code: old binaries simply ignore the new field when parsing. See the + [Unknown Fields](#unknowns) section for details. +* Removing fields is safe. + * The same field number must not be used again in your updated message + type. You may want to rename the field instead, perhaps adding the prefix + "OBSOLETE_", or make the field number [reserved](#fieldreserved), so + that future users of your `.proto` can't accidentally reuse the number. +* Adding additional values to an enum is safe. +* Changing a single explicit presence field or extension into a member of a + **new** `oneof` is safe.
+* Changing a `oneof` which contains only one field to an explicit presence + field is safe. +* Changing a field into an extension of same number and type is safe. + +### Binary Wire-compatible Changes (Conditionally Safe) {#conditionally-safe-changes} + +Unlike Wire-safe changes, wire-compatible means that the same data can be parsed +both before and after a given change. However, a parse of the data may be lossy +under this shape of change. For example, changing an int32 to an int64 is a +compatible change, but if a value larger than INT32_MAX is written, a client +that reads it as an int32 will discard the high order bits of the number. + +You can make compatible changes to your schema only if you manage the roll out +to your system carefully. For example, you may change an int32 to an int64 but +ensure you continue to only write legal int32 values until the new schema is +deployed to all endpoints, and then subsequently start writing larger values +after that. + +If your schema is published outside of your organization, you should generally +not make wire-compatible changes, as you cannot manage the deployment of the new +schema to know when the different range of values may be safe to use. + +* `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible. + * If a number is parsed from the wire which doesn't fit in the + corresponding type, you will get the same effect as if you had cast the + number to that type in C++ (for example, if a 64-bit number is read as + an int32, it will be truncated to 32 bits). * `sint32` and `sint64` are compatible with each other but are *not* - compatible with the other integer types. If the value written was between - INT_MIN and INT_MAX inclusive it will parse as the same value with either - type. If an sint64 value was written outside of that range and parsed as an - sint32, the varint is truncated to 32 bits and then zigzag decoding occurs - (which will cause a different value to be observed). 
+ compatible with the other integer types. + * If the value written was between INT_MIN and INT_MAX inclusive it will + parse as the same value with either type. If an sint64 value was written + outside of that range and parsed as an sint32, the varint is truncated + to 32 bits and then zigzag decoding occurs (which will cause a different + value to be observed). * `string` and `bytes` are compatible as long as the bytes are valid UTF-8. * Embedded messages are compatible with `bytes` if the bytes contain an encoded instance of the message. * `fixed32` is compatible with `sfixed32`, and `fixed64` with `sfixed64`. * For `string`, `bytes`, and message fields, singular is compatible with - `repeated`. Given serialized data of a repeated field as input, clients that - expect this field to be singular will take the last input value if it's a - primitive type field or merge all input elements if it's a message type - field. Note that this is **not** generally safe for numeric types, including - bools and enums. Repeated fields of numeric types are serialized in the - [packed](/programming-guides/encoding#packed) format - by default, which will not be parsed correctly when a singular field is - expected. -* `enum` is compatible with `int32`, `uint32`, `int64`, and `uint64` in terms - of wire format (note that values will be truncated if they don't fit). - However, be aware that client code may treat them differently when the - message is deserialized: for example, unrecognized proto3 `enum` values will - be preserved in the message, but how this is represented when the message is - deserialized is language-dependent. Int fields always just preserve their - value. -* Changing a single `optional` field or extension into a member of a **new** - `oneof` is binary compatible, however for some languages (notably, Go) the - generated code's API will change in incompatible ways. 
For this reason, - Google does not make such changes in its public APIs, as documented in - [AIP-180](https://google.aip.dev/180#moving-into-oneofs). With - the same caveat about source-compatibility, moving multiple fields into a - new `oneof` may be safe if you are sure that no code sets more than one at a - time. Moving fields into an existing `oneof` is not safe. Likewise, changing - a single field `oneof` to an `optional` field or extension is safe. + `repeated`. + * Given serialized data of a repeated field as input, clients that expect + this field to be singular will take the last input value if it's a + primitive type field or merge all input elements if it's a message type + field. Note that this is **not** generally safe for numeric types, + including bools and enums. Repeated fields of numeric types are + serialized in the + [packed](/programming-guides/encoding#packed) + format by default, which will not be parsed correctly when a singular + field is expected. +* `enum` is compatible with `int32`, `uint32`, `int64`, and `uint64` + * Be aware that client code may treat them differently when the message is + deserialized: for example, unrecognized proto3 `enum` values will be + preserved in the message, but how this is represented when the message + is deserialized is language-dependent. * Changing a field between a `map` and the corresponding `repeated` message field is binary compatible (see [Maps](#maps), below, for the - message layout and other restrictions). However, the safety of the change is - application-dependent: when deserializing and reserializing a message, - clients using the `repeated` field definition will produce a semantically - identical result; however, clients using the `map` field definition may - reorder entries and drop entries with duplicate keys. + message layout and other restrictions). 
+ * However, the safety of the change is application-dependent: when + deserializing and reserializing a message, clients using the `repeated` + field definition will produce a semantically identical result; however, + clients using the `map` field definition may reorder entries and drop + entries with duplicate keys. ## Unknown Fields {#unknowns} diff --git a/content/programming-guides/style.md b/content/programming-guides/style.md index 928246f04..3113660ae 100644 --- a/content/programming-guides/style.md +++ b/content/programming-guides/style.md @@ -150,7 +150,7 @@ enum name, so the same name in two sibling enums is not allowed. For example, the following would be rejected by protoc since the `SET` value defined in the two enums are considered to be in the same scope: -```proto +```proto {.bad} enum CollectionType { COLLECTION_TYPE_UNSPECIFIED = 0; SET = 1; @@ -158,6 +158,8 @@ enum CollectionType { ARRAY = 3; } +// Won't compile - `SET` enum name will clash +// with the one defined in `CollectionType` enum. enum TennisVictoryType { TENNIS_VICTORY_TYPE_UNSPECIFIED = 0; GAME = 1; diff --git a/content/reference/cpp/cpp-generated.md b/content/reference/cpp/cpp-generated.md index 9a416ff4e..693c6f02d 100644 --- a/content/reference/cpp/cpp-generated.md +++ b/content/reference/cpp/cpp-generated.md @@ -6,13 +6,13 @@ description = "Describes exactly what C++ code the protocol buffer compiler gene type = "docs" +++ -Any differences between proto2 and proto3 generated code are highlighted - note -that these differences are in the generated code as described in this document, -not the base message classes/interfaces, which are the same in both versions. -You should read the -[proto2 language guide](/programming-guides/proto2) -and/or -[proto3 language guide](/programming-guides/proto3) +Any differences between proto2, proto3, and editions generated code are +highlighted. 
Note that these differences are in the generated code as described +in this document, not the base message classes/interfaces, which are the same in +all versions. You should read the +[proto2 language guide](/programming-guides/proto2), +[proto3 language guide](/programming-guides/proto3), or +[edition 2023 language guide](/programming-guides/editions) before reading this document. ## Compiler Invocation {#invocation} @@ -233,51 +233,52 @@ any method inherited from `Message` or accessing the message through other ways Correspondingly, the value of the returned pointer is never guaranteed to be the same across two different invocations of the accessor. -### Optional Numeric Fields (proto2 and proto3) {#numeric} +### Explicit Presence Numeric Fields {#numeric} -For either of these field definitions: +For field definitions for numeric fields with +[explicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -optional int32 foo = 1; -required int32 foo = 1; +int32 foo = 1; ``` The compiler will generate the following accessor methods: - `bool has_foo() const`: Returns `true` if the field is set. -- `int32 foo() const`: Returns the current value of the field. If the field is - not set, returns the default value. -- `void set_foo(int32 value)`: Sets the value of the field. After calling +- `int32_t foo() const`: Returns the current value of the field. If the field + is not set, returns the default value. +- `void set_foo(::int32_t value)`: Sets the value of the field. After calling this, `has_foo()` will return `true` and `foo()` will return `value`. - `void clear_foo()`: Clears the value of the field. After calling this, `has_foo()` will return `false` and `foo()` will return the default value. 
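
The accessor contract listed above can be illustrated with a hand-written stand-in. This is a sketch, not `protoc` output: `MockMessage` and `kDefaultFoo` are hypothetical names, and the default of 0 is an assumption for a field with no explicit default. The class only mirrors the documented `has_foo()` / `foo()` / `set_foo()` / `clear_foo()` behavior for an explicit presence `int32` field.

```cpp
#include <cstdint>
#include <optional>

// Hand-written mock, NOT generated code: it only mirrors the documented
// accessor contract for an explicit presence int32 field named `foo`.
class MockMessage {
 public:
  // Presence is tracked separately from the value.
  bool has_foo() const { return foo_.has_value(); }
  // An unset field reads as the default value.
  std::int32_t foo() const { return foo_.value_or(kDefaultFoo); }
  // Setting any value, including the default, marks the field as present.
  void set_foo(std::int32_t value) { foo_ = value; }
  // Clearing removes presence; reads fall back to the default again.
  void clear_foo() { foo_.reset(); }

 private:
  static constexpr std::int32_t kDefaultFoo = 0;  // assumed default
  std::optional<std::int32_t> foo_;
};
```

The point the mock makes is that, with explicit presence, `set_foo(0)` and "never set" are distinguishable through `has_foo()` even though `foo()` returns 0 in both cases.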
-For other numeric field types (including `bool`), `int32` is replaced with the +For other numeric field types (including `bool`), `int32_t` is replaced with the corresponding C++ type according to the -[scalar value types table](/programming-guides/proto3#scalar). +[scalar value types table](/programming-guides/editions#scalar). -### Implicit Presence Numeric Fields (proto3) {#implicit-numeric} +### Implicit Presence Numeric Fields {#implicit-numeric} -For the below field definition: +For field definitions for numeric fields with +[implicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -int32 foo = 1; // no field label specified, defaults to implicit presence. +int32 foo = 1; ``` The compiler will generate the following accessor methods: -- `int32 foo() const`: Returns the current value of the field. If the field is - not set, returns 0. -- `void set_foo(int32 value)`: Sets the value of the field. After calling +- `::int32_t foo() const`: Returns the current value of the field. If the + field is not set, returns 0. +- `void set_foo(::int32_t value)`: Sets the value of the field. After calling this, `foo()` will return `value`. - `void clear_foo()`: Clears the value of the field. After calling this, `foo()` will return 0. -For other numeric field types (including `bool`), `int32` is replaced with the +For other numeric field types (including `bool`), `int32_t` is replaced with the corresponding C++ type according to the -[scalar value types table](/programming-guides/proto3#scalar). +[scalar value types table](/programming-guides/editions#scalar). -### Optional String/Bytes Fields (proto2 and proto3) {#string} +### Explicit Presence String/Bytes Fields {#string} **Note:** As of edition 2023, if [`features.(pb.cpp).string_type`](/editions/features#string_type) @@ -285,13 +286,12 @@ is set to `VIEW`, [string_view](/reference/cpp/string-view#singular-view) APIs will be generated instead. 
-For any of these field definitions: +For these field definitions with +[explicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -optional string foo = 1; -required string foo = 1; -optional bytes foo = 1; -required bytes foo = 1; +string foo = 1; +bytes foo = 2; ``` The compiler will generate the following accessor methods: @@ -299,29 +299,19 @@ The compiler will generate the following accessor methods: - `bool has_foo() const`: Returns `true` if the field is set. - `const string& foo() const`: Returns the current value of the field. If the field is not set, returns the default value. -- `void set_foo(::absl::string_view value)`: Sets the value of the field. - After calling this, `has_foo()` will return `true` and `foo()` will return a - copy of `value`. -- `void set_foo(const string& value)`: Sets the value of the field. After - calling this, `has_foo()` will return `true` and `foo()` will return a copy - of `value`. -- `void set_foo(string&& value)`: Sets the value of the field, moving from the - passed string. After calling this, `has_foo()` will return `true` and - `foo()` will return a copy of `value`. -- `void set_foo(const char* value)`: Sets the value of the field using a - C-style null-terminated string. After calling this, `has_foo()` will return - `true` and `foo()` will return a copy of `value`. -- `void set_foo(const char* value, int size)`: Sets the value of the field - using a string with an explicit size specified, rather than determined by - looking for a null-terminator byte. After calling this, `has_foo()` will - return `true` and `foo()` will return a copy of `value`. +- `void set_foo(...)`: Sets the value of the field. After calling this, + `has_foo()` will return `true` and `foo()` will return a copy of `value`. - `string* mutable_foo()`: Returns a pointer to the mutable `string` object that stores the field's value. 
If the field was not set prior to the call, then the returned string will be empty (*not* the default value). After calling this, `has_foo()` will return `true` and `foo()` will return whatever value is written into the given string. + + **Note:** This method will be removed in the new `string_view` APIs. + - `void clear_foo()`: Clears the value of the field. After calling this, `has_foo()` will return `false` and `foo()` will return the default value. + - `void set_allocated_foo(string* value)`: Sets the `string` object to the field and frees the previous field value if it exists. If the @@ -330,6 +320,7 @@ The compiler will generate the following accessor methods: delete the allocated `string` object at any time, so references to the object may be invalidated. Otherwise, if the `value` is `NULL`, the behavior is the same as calling `clear_foo()`. + - `string* release_foo()`: Releases the ownership of the field and returns the pointer of the `string` object. After @@ -338,7 +329,7 @@ The compiler will generate the following accessor methods: -### Implicit Presence String/Bytes Fields (proto3) {#implicit-string} +### Implicit Presence String/Bytes Fields {#implicit-string} **Note:** As of edition 2023, if [`features.(pb.cpp).string_type`](/editions/features#string_type) @@ -346,30 +337,20 @@ is set to `VIEW`, [string_view](/reference/cpp/string-view#singular-view) APIs will be generated instead. -For either of these field definitions: +For these field definitions with +[implicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -string foo = 1; // no field label specified, defaults to implicit presence. -bytes foo = 1; +string foo = 1 [features.field_presence = IMPLICIT]; +bytes foo = 1 [features.field_presence = IMPLICIT]; ``` The compiler will generate the following accessor methods: - `const string& foo() const`: Returns the current value of the field. If the field is not set, returns the empty string/empty bytes. 
-- `void set_foo(::absl::string_view value)`: Sets the value of the field. +- `void set_foo(Arg_&& arg, Args_... args)`: Sets the value of the field. After calling this, `foo()` will return a copy of `value`. -- `void set_foo(const string& value)`: Sets the value of the field. After - calling this, `foo()` will return a copy of `value`. -- `void set_foo(string&& value)`: Sets the value of the field, moving from the - passed string. After calling this, `foo()` will return a copy of `value`. -- `void set_foo(const char* value)`: Sets the value of the field using a - C-style null-terminated string. After calling this, `foo()` will return a - copy of `value`. -- `void set_foo(const char* value, int size)`: Sets the value of the field - using a string with an explicit size specified, rather than determined by - looking for a null-terminator byte. After calling this, `foo()` will return - a copy of `value`. - `string* mutable_foo()`: Returns a pointer to the mutable `string` object that stores the field's value. If the field was not set prior to the call, then the returned string will be empty. After calling this, `foo()` will @@ -401,8 +382,9 @@ To set a singular `bytes` field to store data using `absl::Cord`, use the following syntax: ```proto -optional bytes foo = 25 [ctype=CORD]; -bytes bar = 26 [ctype=CORD]; +// edition (default settings) +bytes foo = 25 [ctype=CORD]; +bytes foo = 26 [ctype=CORD, features.field_presence = IMPLICIT]; ``` Using `cord` is not available for `repeated bytes` fields. Protoc ignores @@ -412,16 +394,18 @@ The compiler will generate the following accessor methods: - `const ::absl::Cord& foo() const`: Returns the current value of the field. If the field is not set, returns an empty `Cord` (proto3) or the default - value (proto2). + value (proto2 and editions). - `void set_foo(const ::absl::Cord& value)`: Sets the value of the field. After calling this, `foo()` will return `value`. 
- `void set_foo(::absl::string_view value)`: Sets the value of the field. After calling this, `foo()` will return `value` as an `absl::Cord`. - `void clear_foo()`: Clears the value of the field. After calling this, - `foo()` will return an empty `Cord` (proto3) or the default value (proto2). -- `bool has_foo()`: Returns `true` if the field is set. + `foo()` will return an empty `Cord` (proto3) or the default value (proto2 + and editions). +- `bool has_foo()`: Returns `true` if the field is set. Only applies for the + `optional` field in proto3 and the explicit presence field in editions. -### Optional Enum Fields (proto2 and proto3) {#enum_field} +### Explicit Presence Enum Fields {#enum_field} Given the enum type: @@ -433,11 +417,11 @@ enum Bar { } ``` -For either of these field definitions: +For this field definition with +[explicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -optional Bar bar = 1; -required Bar bar = 1; +Bar bar = 1; ``` The compiler will generate the following accessor methods: @@ -452,7 +436,7 @@ The compiler will generate the following accessor methods: - `void clear_bar()`: Clears the value of the field. After calling this, `has_bar()` will return `false` and `bar()` will return the default value. -### Implicit Presence Enum Fields (proto3) {#implicit-enum} +### Implicit Presence Enum Fields {#implicit-enum} Given the enum type: @@ -464,10 +448,11 @@ enum Bar { } ``` -For this field definition: +For this field definition with +[implicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -Bar bar = 1; // no field label specified, defaults to implicit presence. +Bar bar = 1; ``` The compiler will generate the following accessor methods: @@ -479,7 +464,7 @@ The compiler will generate the following accessor methods: - `void clear_bar()`: Clears the value of the field. After calling this, `bar()` will return the default value. 
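For contrast with the explicit presence case, the implicit presence accessors can be sketched with a hand-written stand-in. This is not protoc output: `BarHolder`, `DemoImplicitPresence`, and the enum values `BAR_UNSPECIFIED` and `BAR_FIRST` are hypothetical, and the sketch assumes the default is the zero value. Because there is no `has_bar()`, a field that was never set cannot be told apart from one that was set to the default.

```cpp
#include <cassert>

// Hypothetical enum values; the actual definition of `Bar` is not shown here.
enum Bar { BAR_UNSPECIFIED = 0, BAR_FIRST = 1 };

// Hand-written stand-in (not protoc output) for an implicit-presence enum field.
class BarHolder {
 public:
  Bar bar() const { return bar_; }
  void set_bar(Bar value) { bar_ = value; }
  // Clearing simply restores the assumed default (the zero value).
  void clear_bar() { bar_ = BAR_UNSPECIFIED; }

 private:
  Bar bar_ = BAR_UNSPECIFIED;
};

// Shows that "never set" and "set to the default" look identical to a caller.
inline void DemoImplicitPresence() {
  BarHolder untouched;
  BarHolder assigned;
  assigned.set_bar(BAR_UNSPECIFIED);
  assert(untouched.bar() == assigned.bar());
  assigned.set_bar(BAR_FIRST);
  assert(assigned.bar() == BAR_FIRST);
  assigned.clear_bar();
  assert(assigned.bar() == BAR_UNSPECIFIED);
}
```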
-### Optional Embedded Message Fields (proto2 and proto3) {#embeddedmessage} +### Explicit Presence Embedded Message Fields {#embeddedmessage} Given the message type: @@ -487,14 +472,10 @@ Given the message type: message Bar {} ``` -For any of these field definitions: +For this field definition with +[explicit presence](/programming-guides/field_presence#presence-in-proto2-apis): ```proto -//proto2 -optional Bar bar = 1; -required Bar bar = 1; - -//proto3 Bar bar = 1; ``` @@ -512,7 +493,7 @@ The compiler will generate the following accessor methods: `Bar`. - `void clear_bar()`: Clears the value of the field. After calling this, `has_bar()` will return `false` and `bar()` will return the default value. -- `void set_allocated_bar(Bar* bar)`: Sets the `Bar` object to the field and +- `void set_allocated_bar(Bar* value)`: Sets the `Bar` object to the field and frees the previous field value if it exists. If the `Bar` pointer is not `NULL`, the message takes ownership of the allocated `Bar` object and `has_bar()` will return `true`. Otherwise, if the `Bar` is `NULL`, the @@ -536,26 +517,26 @@ The compiler will generate the following accessor methods: field. To check for an empty set, consider using the [`empty()`](/reference/cpp/api-docs/google.protobuf.repeated_field#RepeatedPtrField) method in the underlying `RepeatedField` instead of this method. -- `int32 foo(int index) const`: Returns the element at the given zero-based +- `int32_t foo(int index) const`: Returns the element at the given zero-based index. Calling this method with index outside of [0, foo_size()) yields undefined behavior. -- `void set_foo(int index, int32 value)`: Sets the value of the element at the - given zero-based index. -- `void add_foo(int32 value)`: Appends a new element to the end of the field +- `void set_foo(int index, int32_t value)`: Sets the value of the element at + the given zero-based index. 
+- `void add_foo(int32_t value)`: Appends a new element to the end of the field with the given value. - `void clear_foo()`: Removes all elements from the field. After calling this, `foo_size()` will return zero. -- `const RepeatedField& foo() const`: Returns the underlying +- `const RepeatedField& foo() const`: Returns the underlying [`RepeatedField`](/reference/cpp/api-docs/google.protobuf.repeated_field#RepeatedField) that stores the field's elements. This container class provides STL-like iterators and other methods. -- `RepeatedField* mutable_foo()`: Returns a pointer to the underlying +- `RepeatedField* mutable_foo()`: Returns a pointer to the underlying mutable `RepeatedField` that stores the field's elements. This container class provides STL-like iterators and other methods. -For other numeric field types (including `bool`), `int32` is replaced with the +For other numeric field types (including `bool`), `int32_t` is replaced with the corresponding C++ type according to the -[scalar value types table](/programming-guides/proto2#scalar). +[scalar value types table](/programming-guides/editions#scalar). ### Repeated String Fields {#repeatedstring} @@ -649,8 +630,8 @@ The compiler will generate the following accessor methods: undefined behavior. - `void set_bar(int index, Bar value)`: Sets the value of the element at the given zero-based index. In debug mode (i.e. NDEBUG is not defined), if - `value` does not match any of the values defined for `Bar`, this method will - abort the process. + `value` does not match any of the values defined for `Bar` and it is a + closed enum, this method will abort the process. - `void add_bar(Bar value)`: Appends a new element to the end of the field with the given value. In debug mode (i.e. 
NDEBUG is not defined), if `value` does not match any of the values defined for `Bar`, this method will abort @@ -732,15 +713,15 @@ The compiler will generate the following accessor methods: `has_foo()` will return `false`, `foo()` will return the default value and `example_name_case()` will return `EXAMPLE_NAME_NOT_SET`. -For other numeric field types (including `bool`),`int32` is replaced with the +For other numeric field types (including `bool`), `int32_t` is replaced with the corresponding C++ type according to the -[scalar value types table](/programming-guides/proto3#scalar). +[scalar value types table](/programming-guides/editions#scalar). ### Oneof String Fields {#oneof-string} -**Note:** As of edition 2023 +**Note:** As of edition 2023, [string_view](/reference/cpp/string-view#oneof-view) APIs -may be generated instead +may be generated instead. For any of these [oneof](#oneof) field definitions: @@ -786,8 +767,8 @@ The compiler will generate the following accessor methods: written into the given string and `example_name_case()` will return `kFoo`. - `void clear_foo()`: - - If the oneof case is not `kFoo`, nothing will be changed . - - If the oneof case is `kFoo`, frees the field and clears the oneof case . + - If the oneof case is not `kFoo`, nothing will be changed. + - If the oneof case is `kFoo`, frees the field and clears the oneof case. `has_foo()` will return `false`, `foo()` will return the default value, and `example_name_case()` will return `EXAMPLE_NAME_NOT_SET`. - `void set_allocated_foo(string* value)`: @@ -800,7 +781,7 @@ The compiler will generate the following accessor methods: `example_name_case()` will return `EXAMPLE_NAME_NOT_SET`. - `string* release_foo()`: - Returns `NULL` if oneof case is not `kFoo`. - - Clears the oneof case, releases the ownership of the field and returns + - Clears the oneof case, releases the ownership of the field, and returns the pointer of the string object.
After calling this, caller takes the ownership of the allocated string object, `has_foo()` will return false, `foo()` will return the default value, and `example_name_case()` will @@ -838,8 +819,9 @@ The compiler will generate the following accessor methods: - Sets the value of this field and sets the oneof case to `kBar`. - `has_bar()` will return `true`, `bar()` will return `value` and `example_name_case()` will return `kBar`. - - In debug mode (i.e. NDEBUG is not defined), if `value` does not match - any of the values defined for `Bar`, this method will abort the process. + - In debug mode (that is, NDEBUG is not defined), if `value` does not + match any of the values defined for `Bar` and it is a closed enum, this + method will abort the process. - `void clear_bar()`: - Nothing will be changed if the oneof case is not `kBar`. - If the oneof case is `kBar`, clears the value of the field and the oneof @@ -875,7 +857,7 @@ The compiler will generate the following accessor methods: - Sets the oneof case to `kBar` and returns a pointer to the mutable Bar object that stores the field's value. If the oneof case was not `kBar` prior to the call, then the returned Bar will have none of its fields - set (i.e. it will be identical to a newly-allocated Bar). + set (that is, it will be identical to a newly-allocated Bar). - After calling this, `has_bar()` will return `true`, `bar()` will return a reference to the same instance of `Bar` and `example_name_case()` will return `kBar`. @@ -899,7 +881,7 @@ The compiler will generate the following accessor methods: ownership of the field and returns the pointer of the `Bar` object. After calling this, caller takes the ownership of the allocated `Bar` object, `has_bar()` will return `false`, `bar()` will return the default - value and `example_name_case()` will return `EXAMPLE_NAME_NOT_SET`. + value, and `example_name_case()` will return `EXAMPLE_NAME_NOT_SET`. 
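The case-tracking rules described for oneof members can be sketched with a hand-written stand-in. This is not protoc output: `ExampleHolder` and `DemoOneof` are hypothetical names, the field numbers are assumed, and `bar` is modeled as an `int32` rather than a message purely to keep the sketch short. It shows only that setting one member switches the case and that clearing a member that is not the active case changes nothing.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written stand-in (not protoc output) for a message with
//   oneof example_name { int32 foo = 1; int32 bar = 2; }
// Field numbers and the int32 type of `bar` are assumptions for brevity.
class ExampleHolder {
 public:
  enum ExampleNameCase { kFoo = 1, kBar = 2, EXAMPLE_NAME_NOT_SET = 0 };

  ExampleNameCase example_name_case() const { return case_; }
  bool has_foo() const { return case_ == kFoo; }
  bool has_bar() const { return case_ == kBar; }
  // A member that is not the active case reads as its default (0 here).
  std::int32_t foo() const { return has_foo() ? value_ : 0; }
  std::int32_t bar() const { return has_bar() ? value_ : 0; }

  // Setting a member makes it the active case, replacing any other member.
  void set_foo(std::int32_t value) { case_ = kFoo; value_ = value; }
  void set_bar(std::int32_t value) { case_ = kBar; value_ = value; }

  // Clearing only has an effect when the member is the active case.
  void clear_foo() { if (has_foo()) ClearOneof(); }
  void clear_bar() { if (has_bar()) ClearOneof(); }

 private:
  void ClearOneof() { case_ = EXAMPLE_NAME_NOT_SET; value_ = 0; }

  ExampleNameCase case_ = EXAMPLE_NAME_NOT_SET;
  std::int32_t value_ = 0;
};

inline void DemoOneof() {
  ExampleHolder msg;
  msg.set_foo(7);
  msg.set_bar(9);   // Switches the case; foo is no longer set.
  assert(!msg.has_foo() && msg.bar() == 9);
  msg.clear_foo();  // No effect: the active case is kBar.
  assert(msg.example_name_case() == ExampleHolder::kBar);
}
```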
### Map Fields {#map} @@ -1018,9 +1000,11 @@ If there are unknown fields in the wire format of a map entry message, they will be discarded. If there is an unknown enum value in the wire format of a map entry message, -it's handled differently in proto2 and proto3. In proto2, the whole map entry -message is put into the unknown field set of the containing message. In proto3, -it is put into a map field as if it is a known enum value. +it's handled differently in proto2, proto3, and editions. In proto2, the whole +map entry message is put into the unknown field set of the containing message. +In proto3, it is put into a map field as if it is a known enum value. With +editions, by default it mirrors the proto3 behavior. If `features.enum_type` is +set to `CLOSED`, then it mirrors the proto2 behavior. ## Any {#any} @@ -1142,13 +1126,15 @@ These semantics have been changed in proto3. It's safe to cast any integer to a proto3 enum value as long as it fits into int32. Invalid enum values will also be kept when parsing a proto3 message and returned by enum field accessors. -**Be careful when using proto3 enums in switch statements.** Proto3 enums are -open enum types with possible values outside the range of specified symbols. -Unrecognized enum values will be kept when parsing a proto3 message and returned -by the enum field accessors. A switch statement on a proto3 enum without a -default case will not be able to catch all cases even if all the known fields -are listed. This could lead to unexpected behavior including data corruption and -runtime crashes. **Always add a default case or explicitly call +**Be careful when using proto3 and editions enums in switch statements.** Proto3 +and editions enums are open enum types with possible values outside the range of +specified symbols. (Editions enums may be set to closed enums using the +[`enum_type`](/editions/features#enum_type) feature.) 
+Unrecognized enum values for open enum types will be kept when parsing a +message and returned by the enum field accessors. A switch statement on an open +enum without a default case will not be able to catch all cases, even if all the +known fields are listed. This could lead to unexpected behavior, including data +corruption and runtime crashes. **Always add a default case or explicitly call `Foo_IsValid(int)` outside of the switch to handle unknown enum values.** You can define an enum inside a message type. In this case, the protocol buffer @@ -1156,12 +1142,12 @@ compiler generates code that makes it appear that the enum type itself was declared nested inside the message's class. The `Foo_descriptor()` and `Foo_IsValid()` functions are declared as static methods. In reality, the enum type itself and its values are declared at the global scope with mangled names, -and are imported into the class's scope with a typedef and a series of constant -definitions. This is done only to get around problems with declaration ordering. -Do not depend on the mangled top-level names; pretend the enum really is nested -in the message class. +and are imported into the class's scope with a `typedef` and a series of +constant definitions. This is done only to get around problems with declaration +ordering. Do not depend on the mangled top-level names; pretend the enum really +is nested in the message class. -## Extensions (proto2 only) {#extension} +## Extensions (proto2 and editions only) {#extension} Given a message with an extension range: @@ -1175,12 +1161,12 @@ The protocol buffer compiler will generate some additional methods for `Foo`: `HasExtension()`, `ExtensionSize()`, `ClearExtension()`, `GetExtension()`, `SetExtension()`, `MutableExtension()`, `AddExtension()`, `SetAllocatedExtension()` and `ReleaseExtension()`. Each of these methods takes, -as its first parameter, an extension identifier (described below), which -identifies an extension field.
The remaining parameters and the return value are -exactly the same as those for the corresponding accessor methods that would be -generated for a normal (non-extension) field of the same type as the extension -identifier. (`GetExtension()` corresponds to the accessors with no special -prefix.) +as its first parameter, an extension identifier (described later in this +section), which identifies an extension field. The remaining parameters and the +return value are exactly the same as those for the corresponding accessor +methods that would be generated for a normal (non-extension) field of the same +type as the extension identifier. (`GetExtension()` corresponds to the accessors +with no special prefix.) Given an extension definition: @@ -1339,9 +1325,9 @@ The following static method is also generated: The protocol buffer compiler also generates a "stub" implementation of every service interface, which is used by clients wishing to send requests to servers -implementing the service. For the `Foo` service (above), the stub implementation -`Foo_Stub` will be defined. As with nested message types, a typedef is used so -that `Foo_Stub` can also be referred to as `Foo::Stub`. +implementing the service. For the `Foo` service (described earlier), the stub +implementation `Foo_Stub` will be defined. As with nested message types, a +`typedef` is used so that `Foo_Stub` can also be referred to as `Foo::Stub`. `Foo_Stub` is a subclass of `Foo` which also implements the following methods: @@ -1383,9 +1369,9 @@ appears in both the `.pb.cc` file and the `.pb.h` file unless otherwise noted. - `global_scope`: Declarations that belong at the top level, outside of the file's namespace. Appears at the very end of the file. - `class_scope:TYPENAME`: Member declarations that belong in a message class. - `TYPENAME` is the full proto name, e.g. `package.MessageType`. Appears after - all other public declarations in the class. This insertion point appears - only in the `.pb.h` file. 
+ `TYPENAME` is the full proto name, such as `package.MessageType`. Appears + after all other public declarations in the class. This insertion point + appears only in the `.pb.h` file. Do not generate code which relies on private class members declared by the standard code generator, as these implementation details may change in future diff --git a/content/reference/go/faq.md b/content/reference/go/faq.md index 4cfe14fef..922d3ae7f 100644 --- a/content/reference/go/faq.md +++ b/content/reference/go/faq.md @@ -31,16 +31,20 @@ and [`github.com/golang/protobuf/ptypes/empty`](https://pkg.go.dev/github.com/golang/protobuf/ptypes/empty) may be used interchangeably. -### What are `proto1`, `proto2`, and `proto3`? {#proto-versions} +### What are `proto1`, `proto2`, `proto3`, and editions? {#proto-versions} These are revisions of the protocol buffer *language*. It is different from the Go *implementation* of protobufs. -* `proto3` is the current version of the language. This is the most commonly - used version of the language. We encourage new code to use proto3. +* **Editions** are the newest and recommended way of writing Protocol Buffers. + New features will be released as part of new editions. For more information, + see [Protocol Buffer Editions](/editions). -* `proto2` is an older version of the language. Despite being superseded by - proto3, proto2 is still fully supported. +* `proto3` is a legacy version of the language. We encourage new code to use + editions. + +* `proto2` is a legacy version of the language. Despite being superseded by + proto3 and editions, proto2 is still fully supported. * `proto1` is an obsolete version of the language. It was never released as open source. @@ -173,6 +177,42 @@ two ways: a particular Go binary can be set with an environment variable: `GOLANG_PROTOBUF_REGISTRATION_CONFLICT=warn ./main` +### How do I use protocol buffer editions? {#using-editions} + +To use a protobuf edition, you must specify the edition in your `.proto` file. 
+For example, to use the 2023 edition, add the following to the top of your +`.proto` file: + +```proto +edition = "2023"; +``` + +The protocol buffer compiler will then generate Go code that is compatible with +the specified edition. With editions, you can also enable or disable specific +features for your `.proto` file. For more information, see +[Protocol Buffer Editions](/editions/overview). + +### How do I control the behavior of my generated Go code? {#controlling-generated-code} + +With editions, you can control the behavior of the generated Go code by enabling +or disabling specific features in your `.proto` file. For example, to set the +API behavior for your implementation, you would add the following to your +`.proto` file: + +```proto +edition = "2023"; + +option features.(pb.go).api_level = API_OPAQUE; +``` + +When `api_level` is set to `API_OPAQUE`, the Go code generated by the protocol +buffer compiler hides struct fields so they can no longer be directly accessed. +Instead, new accessor methods are created for getting, setting, or clearing a +field. + +For a complete list of available features and their descriptions, see +[Features for Editions](/editions/features). + ### Why does `reflect.DeepEqual` behave unexpectedly with protobuf messages? {#deepequal} Generated protocol buffer message types include internal state which can vary diff --git a/content/reference/rust/_index.md b/content/reference/rust/_index.md index 0565f5da9..0520ea568 100644 --- a/content/reference/rust/_index.md +++ b/content/reference/rust/_index.md @@ -4,5 +4,4 @@ weight = 781 linkTitle = "Rust" description = "Reference documentation for working with protocol buffer classes in Rust." 
type = "docs" -toc_hide = "true" +++ diff --git a/content/reference/rust/building-rust-protos.md b/content/reference/rust/building-rust-protos.md index 56c8daf88..3b5478795 100644 --- a/content/reference/rust/building-rust-protos.md +++ b/content/reference/rust/building-rust-protos.md @@ -2,11 +2,18 @@ title = "Building Rust Protos" weight = 784 linkTitle = "Building Rust Protos" -description = "Describes using Blaze to build Rust protos." +description = "Describes how to build Rust protos using Cargo or Bazel." type = "docs" -toc_hide = "true" +++ +## Cargo + +See the +[protobuf-example](https://docs.rs/crate/protobuf-example/latest/source/) crate +for an example of how to set up your build. + +## Bazel + The process of building a Rust library for a Protobuf definition is similar to other programming languages: diff --git a/content/reference/rust/rust-design-decisions.md b/content/reference/rust/rust-design-decisions.md index fd57c24df..a26ecbb7e 100644 --- a/content/reference/rust/rust-design-decisions.md +++ b/content/reference/rust/rust-design-decisions.md @@ -4,7 +4,6 @@ weight = 785 linkTitle = "Design Decisions" description = "Explains some of the design choices that the Rust Proto implementation makes." type = "docs" -toc_hide = "true" +++ As with any library, Rust Protobuf is designed considering the needs of both @@ -48,9 +47,9 @@ Protobuf Rust currently supports three kernels: other languages. This is the default in open source builds where we expect static linking with code already using C++ Protobuf to be more rare. -The decision to support multiple non-Rust kernels significantly influences the -our public API decisions, including the types used on getters (discussed later -in this document). +The decision to support multiple non-Rust kernels significantly influences our +public API decisions, including the types used on getters (discussed later in +this document). 
### No Pure Rust Kernel {#no-pure-rust} @@ -153,11 +152,11 @@ memory for those instances ahead of time. In some cases the Rust Protobuf API may choose to create our own types where a corresponding std type exists with the same name, where the current implementation may even simply wrap the std type, for example -`proto::UTF-8Error`. +`protobuf::Utf8Error`. Using these types rather than std types gives us more flexibility in optimizing the implementation in the future. While our current implementation uses the Rust -std UTF-8 validation today, by creating our own `proto::Utf8Error` type it +std UTF-8 validation today, by creating our own `protobuf::Utf8Error` type it enables us to change the implementation to use the highly optimized C++ implementation of UTF-8 validation that we use from C++ Protobuf which is faster than Rust's std UTF-8 validation. diff --git a/content/reference/rust/rust-generated.md b/content/reference/rust/rust-generated.md index 033bca5fa..50f39d762 100644 --- a/content/reference/rust/rust-generated.md +++ b/content/reference/rust/rust-generated.md @@ -2,7 +2,6 @@ title = "Rust Generated Code Guide" weight = 782 linkTitle = "Generated Code Guide" -toc_hide = true description = "Describes the API of message objects that the protocol buffer compiler generates for any given protocol definition." type = "docs" +++ @@ -48,14 +47,11 @@ Generated files: * C++ Lite kernel: * <same as C++ kernel> * UPB kernel - * `.u.pb.rs` - generated Rust code. \ - (However, `rust_proto_library` relies on the `.thunks.c` file produced - by `upb_proto_aspect`.) - -If the `proto_library` contains more than one file, the first file is declared a -"primary" file and is treated as the entry point for the crate; that file will -contain both the gencode corresponding to the `.proto` file, and also re-exports -for all symbols defined in the files corresponding to all "secondary" files. + * `.u.pb.rs` - generated Rust code.
+ +Each `proto_library` will also have a `generated.rs` file which is treated as +the entry point for the crate. That file will re-export the symbols from all the +other Rust files in the crate. ## Packages {#packages} diff --git a/content/reference/rust/rust-redaction.md b/content/reference/rust/rust-redaction.md index 8f430dcf7..a7100f053 100644 --- a/content/reference/rust/rust-redaction.md +++ b/content/reference/rust/rust-redaction.md @@ -4,7 +4,6 @@ weight = 783 linkTitle = "Redaction in Rust" description = "Describes redaction in Rust." type = "docs" -toc_hide = "true" +++