From 4d1a1e79d4cc3c49376ccce1f545ec7bcb27b04b Mon Sep 17 00:00:00 2001 From: Protocol Buffer Team Date: Fri, 25 Jul 2025 15:32:27 +0000 Subject: [PATCH 1/3] This documentation change includes the following: * Removed the API Best Practices topic * Added new entries to the Proto Best Practices topic * Fixed some method names in the Python tutorial * Added two news entries * Updated the Jan 23, 2025 News entry to reflect the canceled plans * Updated various topics to add Editions references * Updated the style guide to add some qualifying information about editions, including: * how `required` fields are handled * that groups are still supported * Simplified the Dart Generated Code topic by removing redundant code samples * Updated the Go Opaque Migration guide to reflect the current state * Removed a reference to "new" plugins, a feature added 15 years ago * Merged sections in several topics that referred to field presence using proto2/proto3/editions nomenclature to simplify them to use explicit/implicit * Updated information about oneofs in PHP * Updated the title to the proto3 spec to make it clearer what it refers to (Proto3 vs. 
"version 3") * Updated information about `ProtoString` in the Rust Generated Code topic PiperOrigin-RevId: 787122891 Change-Id: Ic0a3054e81fd3670ab08e3915cc348094bf89008 --- content/best-practices/_index.md | 1 - content/best-practices/api.md | 945 ------------------ content/best-practices/dos-donts.md | 47 +- content/editions/features.md | 469 +++++++-- content/editions/overview.md | 207 +++- content/getting-started/pythontutorial.md | 8 +- content/news/2025-01-23.md | 13 +- content/news/2025-07-14.md | 69 ++ content/news/2025-07-16.md | 16 + content/news/_index.md | 6 +- content/news/v31.md | 77 +- content/programming-guides/enum.md | 21 +- content/programming-guides/style.md | 20 +- content/reference/dart/dart-generated.md | 84 +- content/reference/go/go-generated-opaque.md | 71 +- content/reference/go/go-generated.md | 45 +- content/reference/go/opaque-migration.md | 30 +- content/reference/java/java-generated.md | 2 +- content/reference/kotlin/kotlin-generated.md | 81 +- .../objective-c/objective-c-generated.md | 79 +- content/reference/php/php-generated.md | 100 +- content/reference/protobuf/proto3-spec.md | 6 +- content/reference/python/python-generated.md | 132 ++- .../reference/rust/rust-design-decisions.md | 24 +- eng/doc/devguide/proto/ask_proto.md | 52 + eng/doc/devguide/proto/footer.md | 0 26 files changed, 1163 insertions(+), 1442 deletions(-) delete mode 100644 content/best-practices/api.md create mode 100644 content/news/2025-07-14.md create mode 100644 content/news/2025-07-16.md create mode 100644 eng/doc/devguide/proto/ask_proto.md create mode 100644 eng/doc/devguide/proto/footer.md diff --git a/content/best-practices/_index.md b/content/best-practices/_index.md index 2d7cb8a63..caf73bea8 100644 --- a/content/best-practices/_index.md +++ b/content/best-practices/_index.md @@ -10,6 +10,5 @@ Best practices content for defining and using protos exists in the following topics: * [Proto Best Practices](/best-practices/dos-donts) -* [API Best 
Practices](/best-practices/api) * [1-1-1 Rule](/best-practices/1-1-1) * [Avoid Cargo Culting](/best-practices/no-cargo-cults) diff --git a/content/best-practices/api.md b/content/best-practices/api.md deleted file mode 100644 index a18d7ede2..000000000 --- a/content/best-practices/api.md +++ /dev/null @@ -1,945 +0,0 @@ -+++ -title = "API Best Practices" -weight = 100 -description = "A future-proof API is surprisingly hard to get right. The suggestions in this document make trade-offs to favor long-term, bug-free evolution." -type = "docs" -aliases = "/programming-guides/api" -+++ - -This doc is a complement to -[Proto Best Practices](/best-practices/dos-donts). It's -not a prescription for Java/C++/Go and other APIs. - -{{% alert title="Note" color="note" %}} -These guidelines are just that and many have documented exceptions. For example, -if you're writing a performance-critical backend, you might want to sacrifice -flexibility or safety for speed. This -topic will help you better understand the trade-offs and make a decision that -works for your situation. {{% /alert %}} - -## Precisely, Concisely Document Most Fields and Messages {#precisely-concisely} - -Chances are good your proto will be inherited and used by people who don't know -what you were thinking when you wrote or modified it. Document each field in -terms that will be useful to a new team-member or client with little knowledge -of your system. - -Some concrete examples: - -```proto -// Bad: Option to enable Foo -// Good: Configuration controlling the behavior of the Foo feature. -message FeatureFooConfig { - // Bad: Sets whether the feature is enabled - // Good: Required field indicating whether the Foo feature - // is enabled for account_id. Must be false if account_id's - // FOO_OPTIN Gaia bit is not set. - optional bool enabled; -} - -// Bad: Foo object. -// Good: Client-facing representation of a Foo (what/foo) exposed in APIs. -message Foo { - // Bad: Title of the foo. 
- // Good: Indicates the user-supplied title of this Foo, with no - // normalization or escaping. - // An example title: "Picture of my cat in a box <3 <3 !!!" - optional string title [(max_length) = 512]; -} - -// Bad: Foo config. -// Less-Bad: If the most useful comment is re-stating the name, better to omit -// the comment. -FooConfig foo_config = 3; -``` - -Document the constraints, expectations and interpretation of each field in as -few words as possible. - -You can use custom proto annotations. -See [Custom Options](/programming-guides/proto2#options) -to define cross-language constants like `max_length` in the example above. -Supported in proto2 and proto3. - -Over time, documentation of an interface can get longer and longer. The length -takes away from the clarity. When the documentation is genuinely unclear, fix it, but -look at it holistically and aim for brevity. - -## Use Different Messages for Wire and Storage {#use-different-messages} - -If a top-level proto you expose to clients is the same one you store on disk, -you're headed for trouble. More and more binaries will depend on your API over -time, making it harder to change. You'll want the freedom to change your storage -format without impacting your clients. Layer your code so that modules deal -either with client protos, storage protos, or translation. - -Why? You might want to swap your underlying storage system. You might want to -normalize—or denormalize—data differently. You might realize that parts of your -client-exposed proto make sense to store in RAM while other parts make sense to -go on disk. - -When it comes to protos nested one or more levels within a top-level request or -response, the case for separating storage and wire protos isn't as strong, and -depends on how closely you're willing to couple your clients to those protos. - -There's a cost in maintaining the translation layer, but it quickly pays off -once you have clients and have to do your first storage changes. 
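As a rough sketch of the separation described above, the wire and storage messages might start out tag-for-tag identical and only later diverge. The file names, package names, `StoredFoo`, and `shard_hint` are hypothetical and used only for illustration; none of them come from the removed topic.

```proto
// foo_api.proto (hypothetical): the client-facing message.
syntax = "proto3";

package example.api;

message Foo {
  string foo_id = 1;
  string title = 2;
}
```

```proto
// foo_storage.proto (hypothetical): the on-disk message.
syntax = "proto3";

package example.storage;

message StoredFoo {
  // Tags 1 and 2 mirror example.api.Foo, so an early translation layer
  // can be a byte copy or a reflection-based field copy.
  string foo_id = 1;
  string title = 2;

  // Internal-only field (hypothetical). It lives here so that it never
  // becomes part of the client-facing API.
  string shard_hint = 100;
}
```

Keeping internal fields in a high, separate tag range is one possible convention for telling at a glance which fields have no counterpart in the wire message.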
- -You might be tempted to share protos and diverge "when you need to." With a -perceived high cost to diverge and no clear place to put internal fields, your -API will accrue fields clients either don't understand or begin to depend on -without your knowledge. - -By starting with separate proto files, your team will know where to add internal -fields without polluting your API. In the early days, the wire proto can be -tag-for-tag identical with an automatic translation layer (think: byte copying -or proto reflection). Proto annotations can also power an automatic translation -layer. - -The following are exceptions to the rule: - -* If the proto field is one of a common type, such as `google.type` or - `google.protobuf`, then using that type both as storage and API is - acceptable. -* If your service is extremely performance-sensitive, it may be worth trading - flexibility for execution speed. If your service - doesn't have millions of QPS with millisecond latency, - you're probably not the exception. -* If all of the following are true: - - * your service *is* the storage system - * your system doesn't make decisions based on your clients' structured - data - * your system simply stores, loads, and perhaps provides queries at your - client's request - - Note that if you are implementing something like a logging system or a - proto-based wrapper around a generic storages system, then you probably want - to aim to have your clients' messages transit into your storage backend as - opaquely as possible so that you don't create a dependency nexus. Consider - using extensions or [Encode Opaque Data in Strings by Web-safe Encoding - Binary Proto - Serialization](/best-practices/api#encode-opaque-data-in-strings). - -## For Mutations, Support Partial Updates or Append-Only Updates, Not Full Replaces {#support-partial-updates} - -Don't make an `UpdateFooRequest` that only takes a `Foo`. 
- -If a client doesn't preserve unknown fields, they will not have the newest -fields of `GetFooResponse` leading to data loss on a round-trip. Some systems -don't preserve unknown fields. Proto2 and proto3 implementations do preserve -unknown fields unless the application drops the unknown fields explicitly. In -general, public APIs should drop unknown fields on server-side to prevent -security attack via unknown fields. For example, garbage unknown fields may -cause a server to fail when it starts to use them as new fields in the future. - -Absent documentation, handling of optional fields is ambiguous. Will `UpdateFoo` -clear the field? That leaves you open to data loss when the client doesn't know -about the field. Does it not touch a field? Then how can clients clear the -field? Neither are good. - -### Fix #1: Use an Update Field-mask {#use-update-field-mask} - -Have your client pass which fields it wants to modify and include only those -fields in the update request. Your server leaves other fields alone and updates -only those specified by the mask. -In general, the structure of your mask should mirror the -structure of the response proto; that is, if `Foo` contains `Bar`, `FooMask` -contains `BarMask`. - -### Fix #2: Expose More Narrow Mutations That Change Individual Pieces {#expose-more-narrow-mutations} - -For example, instead of `UpdateEmployeeRequest`, you might have: -`PromoteEmployeeRequest`, `SetEmployeePayRequest`, `TransferEmployeeRequest`, -etc. - -Custom update methods are easier to monitor, audit, and secure than a very -flexible update method. They're also easier to implement and call. A *large* -number of them can increase the cognitive load of an API. - -## Don't Include Primitive Types in a Top-level Request or Response Proto {#dont-include-primitive-types} - -Many of the pitfalls described elsewhere in this doc are solved with this rule. 
-For example: - -Telling clients that a repeated field is unset in storage versus not-populated -in this particular call can be done by wrapping the repeated field in a message. - -Common request options that are shared between requests naturally fall out of -following this rule. Read and write field masks fall out of this. - -Your top-level proto should almost always be a container for other messages that -can grow independently. - -Even when you only need a single primitive type today, having it wrapped in a -message gives you a clear path to expand that type and share the type among -other methods that return the similar values. For example: - -```proto -message MultiplicationResponse { - // Bad: What if you later want to return complex numbers and have an - // AdditionResponse that returns the same multi-field type? - optional double result; - - - // Good: Other methods can share this type and it can grow as your - // service adds new features (units, confidence intervals, etc.). - optional NumericResult result; -} - -message NumericResult { - optional double real_value; - optional double complex_value; - optional UnitType units; -} -``` - -One exception to top-level primitives: Opaque strings (or bytes) that encode a -proto but are only built and parsed on the server. Continuation tokens, version -info tokens and IDs can all be returned as strings *if* the string is actually -an encoding of a structured proto. - -## Never Use Booleans for Something That Has Two States Now, but Might Have More Later {#never-use-booleans-for-two-states} - -If you are using boolean for a field, make sure that the field is indeed -describing just two possible states (for all time, not just now and the near -future). Often, the flexibility of an enum, int, or message turns out to be -worth it. - -For example, in returning a stream of posts a developer may need to indicate -whether a post should be rendered in two-columns or not based on the current -mocks from UX. 
Even though a boolean is all that's needed today, nothing -prevents UX from introducing two-row posts, three-column posts or four-square -posts in a future version. - -```proto -message GooglePlusPost { - // Bad: Whether to render this post across two columns. - optional bool big_post; - - // Good: Rendering hints for clients displaying this post. - // Clients should use this to decide how prominently to render this - // post. If absent, assume a default rendering. - optional LayoutConfig layout_config; -} - -message Photo { - // Bad: True if it's a GIF. - optional bool gif; - - // Good: File format of the referenced photo (for example, GIF, WebP, PNG). - optional PhotoType type; -} -``` - -Be cautious about adding states to an enum that -conflates concepts. - -If a state introduces a new dimension to the enum or implies multiple -application behaviors, you almost certainly want another field. - -## Rarely Use an Integer Field for an ID {#integer-field-for-id} - -It's tempting to use an int64 as an identifier for an object. Opt instead for a -string. - -This lets you change your ID space if you need to and reduces the chance of -collisions. 2^64 isn't as big as it used to be. - -You can also encode a structured identifier as a string which encourages clients -to treat it as an opaque blob. You still must have a proto backing the string, -but you can serialize the proto to a string field (encoded as web-safe Base64) -which removes any of the internal details from the client-exposed API. In this -case follow the guidelines [below](#encode-opaque-data-in-strings). - -```proto -message GetFooRequest { - // Which Foo to fetch. - optional string foo_id; -} - -// Serialized and websafe-base64-encoded into the GetFooRequest.foo_id field. -message InternalFooRef { - // Only one of these two is set. Foos that have already been - // migrated use the spanner_foo_id and Foos still living in - // Caribou Storage Server have a classic_foo_id. 
- optional bytes spanner_foo_id; - optional int64 classic_foo_id; -} -``` - -If you start off with your own serialization scheme to represent your IDs as -strings, things can get weird quickly. That's why it's often best to start -with an internal proto that backs your string field. - -## Don’t Encode Data in a String That You Expect a Client to Construct or Parse {#dont-encode-data-in-a-string} - -It's less efficient over the wire, more work for the consumer of the proto, and -confusing for someone reading your documentation. Your clients also have to -wonder about the encoding: Are lists comma-separated? Did I escape this -untrusted data correctly? Are numbers base-10? Better to have clients send an -actual message or primitive type. It's more compact over the wire and clearer -for your clients. - -This gets especially bad when your service acquires clients in several -languages. Now each will have to choose the right parser or builder—or -worse—write one. - -More generally, choose the right primitive type. See the Scalar Value Types -table in the -[Protocol Buffer Language Guide](/programming-guides/proto2#scalar). - -### Don't Return HTML in a Front-End Proto {#returning-html} - -With a JavaScript client, it's tempting to return HTML or -JSON in a field of your API. This is a slippery -slope towards tying your API to a specific UI. Here are three concrete dangers: - -* A "scrappy" non-web client will end up parsing your HTML or JSON to get the - data they want leading to fragility if you change formats and - vulnerabilities if their parsing is bad. -* Your web-client is now vulnerable to an XSS exploit if that HTML is ever - returned unsanitized. -* The tags and classes you're returning expect a particular style-sheet and - DOM structure. From release to release, that structure will change, and you - risk a version-skew problem where the JavaScript client is older than the - server and the HTML the server returns no longer renders properly on old - clients. 
For projects that release often, this is not an edge case. - -Other than the initial page load, it's usually better to return data and use -client-side templating to construct HTML on the client -. - -## Encode Opaque Data in Strings by Web-Safe Encoding Binary Proto Serialization {#encode-opaque-data-in-strings} - -If you do encode *opaque* data in a client-visible field (continuation tokens, -serialized IDs, version infos, and so on), document that clients should treat it -as an opaque blob. *Always use binary proto serialization, never text-format or -something of your own devising for these fields.* When you need to expand the -data encoded in an opaque field, you'll find yourself reinventing protocol -buffer serialization if you're not already using it. - -Define an internal proto to hold the fields that will go in the opaque field -(even if you only need one field), serialize this internal proto to bytes then -web-safe base-64 encode the result into your string field -. - -One rare exception to using proto serialization: *Very* occasionally, the -compactness wins from a carefully constructed alternative format are worth it. - -## Don't Include Fields that Your Clients Can't Possibly Have a Use for {#dont-include-fields} - -The API you expose to clients should only be for describing how to interact with -your system. Including anything else in it adds cognitive overhead to someone -trying to understand it. - -Returning debug data in response protos used to be a common practice, but we -have a better way. RPC response extensions (also called "side -channels") let you describe your client interface with one proto and your -debugging surface with another. - -Similarly, returning experiment names in response protos used to be a logging -convenience--the unwritten contract was the client would send those experiments -back on subsequent actions. The accepted way of accomplishing the same is to do -log joining in the analysis pipeline. 
- -One exception: - -If you need continuous, real-time analytics *and* are on a small machine budget, -running log joins might be prohibitive. -In cases where cost is a deciding factor, -denormalizing log data ahead of time can be a win. If you need log data -round-tripped to you, send it to clients as an opaque blob and document the -request and response fields. - -**Caution:** If you need to return or round-trip hidden data on *every* request -, you're hiding the true cost of using your service -and that's not good either. - -## *Rarely* Define a Pagination API Without a Continuation Token {#define-pagination-api} - -```proto -message FooQuery { - // Bad: If the data changes between the first query and second, each of - // these strategies can cause you to miss results. In an eventually - // consistent world (that is, storage backed by Bigtable), it's not uncommon - // to have old data appear after the new data. Also, the offset- and - // page-based approaches all assume a sort-order, taking away some - // flexibility. - optional int64 max_timestamp_ms; - optional int32 result_offset; - optional int32 page_number; - optional int32 page_size; - - // Good: You've got flexibility! Return this in a FooQueryResponse and - // have clients pass it back on the next query. - optional string next_page_token; -} -``` - -The best practice for a pagination API is to use an opaque continuation token -(called next_page_token ) backed by an internal proto that you -serialize and then `WebSafeBase64Escape` (C++) or `BaseEncoding.base64Url().encode` (Java). That internal proto could include many fields. -The important thing is it buys you flexibility and--if you choose--it can buy -your clients stability in the results. - -Do not forget to validate the fields of this proto as untrustworthy inputs (see -note in [Encode opaque data in strings](#encode-opaque-data-in-strings)). - -```proto -message InternalPaginationToken { - // Track which IDs have been seen so far. 
This gives perfect recall at the - // expense of a larger continuation token--especially as the user pages - // back. - repeated FooRef seen_ids; - - // Similar to the seen_ids strategy, but puts the seen_ids in a Bloom filter - // to save bytes and sacrifice some precision. - optional bytes bloom_filter; - - // A reasonable first cut and it may work for longer. Having it embedded in - // a continuation token lets you change it later without affecting clients. - optional int64 max_timestamp_ms; -} -``` - -## Group Related Fields into a new `message`. Nest Only Fields with High Cohesion {#group-related-fields} - -```proto -message Foo { - // Bad: The price and currency of this Foo. - optional int price; - optional CurrencyType currency; - - // Better: Encapsulates the price and currency of this Foo. - optional CurrencyAmount price; -} -``` - -Only fields with high cohesion should be -nested. If the fields are genuinely related, you'll often want to pass them -around together inside a server. That's easier if they're defined together in a -message. Think: - -```java -CurrencyAmount calculateLocalTax(CurrencyAmount price, Location where) -``` - -If your CL introduces one field, but that field might have related fields later, -preemptively put it in its own message to avoid this: - -```proto -message Foo { - // DEPRECATED! Use currency_amount. - optional int price [deprecated = true]; - - // The price and currency of this Foo. - optional google.type.Money currency_amount; -} -``` - -The problem with a nested message is that while `CurrencyAmount` might be a -popular candidate for reuse in other places of your API, `Foo.CurrencyAmount` -might not. In the worst case, `Foo.CurrencyAmount` *is* reused, but -`Foo`-specific fields leak into it. - -While [loose coupling](https://en.wikipedia.org/wiki/Loose_coupling) -is generally accepted as a best practice when developing systems, that practice -may not always apply when designing `.proto` files. 
There may be cases in which -tightly coupling two units of information (by nesting one unit inside of the -other) may make sense. For example, if you are creating a set of fields that -appear fairly generic right now but which you anticipate adding specialized -fields into at a later time, nesting the message would dissuade others from -referencing that message from elsewhere in this or other `.proto` files. - -```proto -message Photo { - // Bad: It's likely PhotoMetadata will be reused outside the scope of Photo, - // so it's probably a good idea not to nest it and make it easier to access. - message PhotoMetadata { - optional int32 width = 1; - optional int32 height = 2; - } - optional PhotoMetadata metadata = 1; -} - -message FooConfiguration { - // Good: Reusing FooConfiguration.Rule outside the scope of FooConfiguration - // tightly-couples it with likely unrelated components, nesting it dissuades - // from doing that. - message Rule { - optional float multiplier = 1; - } - repeated Rule rules = 1; -} -``` - -## Include a Field Read Mask in Read Requests {#include-field-read-mask} - -```proto -// Recommended: use google.protobuf.FieldMask - -// Alternative one: -message FooReadMask { - optional bool return_field1; - optional bool return_field2; -} - -// Alternative two: -message BarReadMask { - // Tag numbers of the fields in Bar to return. - repeated int32 fields_to_return; -} -``` - -If you use the recommended `google.protobuf.FieldMask`, you can use the -`FieldMaskUtil` -([Java](/reference/java/api-docs/com/google/protobuf/util/FieldMaskUtil.html)/[C++](/reference/cpp/api-docs/google.protobuf.util.field_mask_util.md)) -libraries to automatically filter a proto. - -Read masks set clear expectations on the client side, give them control of how -much data they want back and allow the backend to only fetch data the client -needs. 
- -The acceptable alternative is to always populate every field; that is, treat the -request as if there were an implicit read mask with all fields set to true. This -can get costly as your proto grows. - -The worst failure mode is to have an implicit (undeclared) read mask that varies -depending on which method populated the message. This anti-pattern leads to -apparent data loss on clients that build a local cache from response protos. - -## Include a Version Field to Allow for Consistent Reads {#include-version-field} - -When a client does a write followed by a read of the same object, they expect to -get back what they wrote--even when the expectation isn't reasonable for the -underlying storage system. - -Your server will read the local value and if the local version_info is less than -the expected version_info, it will read from remote replicas to find the latest -value. Typically version_info is a -[proto encoded as a string](#encode-opaque-data-in-strings) that includes the -datacenter the mutation went to and the timestamp at which it was committed. - -Even systems backed by consistent storage often want a token to trigger the more -expensive read-consistent path rather than incurring the cost on every read. - -## Use Consistent Request Options for RPCs that Return the Same Data Type {#use-consistent-request-options} - -An example failure pattern is the request options for -a service in which each RPC returns the same -data type, but has separate request options for specifying things like maximum -comments, embeds supported types list, and so on. - -The cost of approaching this ad hoc is increased complexity on the client from -figuring out how to fill out each request and increased complexity on the server -transforming the N request options into a common internal one. A -not-small number of real-life bugs are traceable to -this example. 
- -Instead, create a single, separate message to hold request options and include -that in each of the top-level request messages. Here's a better-practices -example: - -```proto -message FooRequestOptions { - // Field-level read mask of which fields to return. Only fields that - // were requested will be returned in the response. Clients should only - // ask for fields they need to help the backend optimize requests. - optional FooReadMask read_mask; - - // Up to this many comments will be returned on each Foo in the response. - // Comments that are marked as spam don't count towards the maximum - // comments. By default, no comments are returned. - optional int max_comments_to_return; - - // Foos that include embeds that are not on this supported types list will - // have the embeds down-converted to an embed specified in this list. If no - // supported types list is specified, no embeds will be returned. If an embed - // can't be down-converted to one of the supplied supported types, no embed - // will be returned. Clients are strongly encouraged to always include at - // least the THING_V2 embed type from EmbedTypes.proto. - repeated EmbedType embed_supported_types_list; -} - -message GetFooRequest { - // What Foo to read. If the viewer doesn't have access to the Foo or the - // Foo has been deleted, the response will be empty but will succeed. - optional string foo_id; - - // Clients are required to include this field. Server returns - // INVALID_ARGUMENT if FooRequestOptions is left empty. - optional FooRequestOptions params; -} - -message ListFooRequest { - // Which Foos to return. Searches have 100% recall, but more clauses - // impact performance. - optional FooQuery query; - - // Clients are required to include this field. The server returns - // INVALID_ARGUMENT if FooRequestOptions is left empty. - optional FooRequestOptions params; -} -``` - -## Batch/multi-phase Requests {#batch-multi-phase-requests} - -Where possible, make mutations atomic. 
Even more important, make -[mutations idempotent](#prefer-idempotency). A full retry of a partial failure -shouldn't corrupt/duplicate data. - -Occasionally, you'll need a single RPC that encapsulates multiple operations for -performance reasons. What to do on a partial failure? If some succeeded and some -failed, it's best to let clients know. - -Consider setting the RPC as failed and return details of both the successes and failures in an RPC status proto. - -In general, you want clients who are unaware of your handling of partial -failures to still behave correctly and clients who are aware to get extra value. - -## Create Methods that Return or Manipulate Small Bits of Data and Expect Clients to Compose UIs from Batching Multiple Such Requests {#create-methods-manipulate-small-bits} - -The ability to query many narrowly specified bits of data in a single round-trip -allows a wider range of UX options without server changes by letting the client -compose what they need. - -This is most relevant for front-end and middle-tier servers. - -Many services expose their own batching API. - -## Make a One-off RPC when the Alternative is Serial Round-trips on Mobile or Web {#make-one-off-rpc} - -In cases where a *web or mobile* client needs to make two queries with a data -dependency between them, the current best practice is to create a new RPC that -protects the client from the round trip. - -In the case of mobile, it's almost always worth saving your client the cost of -an extra round-trip by bundling the two service methods together in one new one. -For server-to-server calls, the case may not be as clear; it depends on how -performance-sensitive your service is and how much cognitive overhead the new -method introduces. - -## Make Repeated Fields Messages, Not Scalars or Enums {#repeated-fields-messages-scalar-types} - -A common evolution is that a single repeated field needs to become multiple -related repeated fields. 
If you start with a repeated primitive your options are -limited--you either create parallel repeated fields, or define a new repeated -field with a new message that holds the values and migrate clients to it. - -If you start with a repeated message, evolution becomes trivial. - -```proto -// Describes a type of enhancement applied to a photo -enum EnhancementType { - ENHANCEMENT_TYPE_UNSPECIFIED; - RED_EYE_REDUCTION; - SKIN_SOFTENING; -} - -message PhotoEnhancement { - optional EnhancementType type; -} - -message PhotoEnhancementReply { - // Good: PhotoEnhancement can grow to describe enhancements that require - // more fields than just an enum. - repeated PhotoEnhancement enhancements; - - // Bad: If we ever want to return parameters associated with the - // enhancement, we'd have to introduce a parallel array (terrible) or - // deprecate this field and introduce a repeated message. - repeated EnhancementType enhancement_types; -} -``` - -Imagine the following feature request: "We need to know which enhancements were -performed by the user and which enhancements were automatically applied by the -system." - -If the enhancement field in `PhotoEnhancementReply` were a scalar or enum, this -would be much harder to support. - -This applies equally to maps. It is much easier to add additional fields to a -map value if it's already a message rather than having to migrate from -`map` to `map`. - -One exception: - -Latency-critical applications will find parallel arrays of primitive types are -faster to construct and delete than a single array of messages; they can also be -smaller over the wire if you use -[[packed=true]](/programming-guides/encoding#packed) -(eliding field tags). Allocating a fixed number of arrays is less work than -allocating N messages. Bonus: in -[Proto3](/programming-guides/proto3), packing is -automatic; you don't need to explicitly specify it. 
- -## Use Proto Maps {#use-proto-maps} - -Prior to the introduction in -[Proto3](/programming-guides/proto3) of -[Proto3 maps](/programming-guides/proto3#maps), services -would sometimes expose data as pairs using an ad-hoc KVPair message with scalar -fields. Eventually clients would need a deeper structure and would end up -devising keys or values that need to be parsed in some way. See -[Don't encode data in a string](#dont-encode-data-in-a-string). - -So, using a (extensible) message type for the value is an immediate improvement -over the naive design. - -Maps were back-ported to proto2 in all languages, so using `map` is better than inventing your own KVPair for the same purpose[^3]. - -[^3]: A gotcha with protos that contain `map` fields. Don't use them as - reduce keys in a MapReduce. The wire format and iteration order of proto3 - map items are *unspecified* which leads to inconsistent map shards. - -If you want to represent *arbitrary* data whose structure you don't know ahead -of time, use -[`google.protobuf.Any`](/reference/protobuf/textformat-spec#any). - -## Prefer Idempotency {#prefer-idempotency} - -Somewhere in the stack above you, a client may have retry logic. If the retry is -a mutation, the user could be in for a surprise. Duplicate comments, build -requests, edits, and so on aren't good for anyone. - -A simple way to avoid duplicate writes is to allow clients to specify a -client-created request ID that your server dedupes on (for example, hash of -content or UUID). - -## Be Mindful of Your Service Name, and Make it Globally Unique {#service-name-globally-unique} - -A service name (that is, the part after the `service` keyword in your `.proto` -file) is used in surprisingly many places, not just to generate the service -class name. This makes this name more -important than one might think. - -What's tricky is that these tools make the implicit assumption that your service -name is unique across a network . 
Worse, the service name they use is the -*unqualified* service name (for example, `MyService`), not the qualified service -name (for example, `my_package.MyService`). - -For this reason, it makes sense to take steps to prevent naming conflicts on -your service name, even if it is defined inside a specific package. For example, -a service named `Watcher` is likely to cause problems; something like -`MyProjectWatcher` would be better. - -## Bound Request and Response Sizes {#bound-req-res-sizes} - -Request and response sizes should be bounded. -We recommend a bound in the ballpark of 8 MiB, and 2 -GiB is a hard limit at which many proto implementations break -. Many storage systems have a limit -on message sizes . - -Also, unbounded messages - -- bloat both client and server, -- cause high and unpredictable latency, -- decrease resiliency by relying on a long-lived connection between a single - client and a single server. - -Here are a few ways to bound all messages in an API: - -- Define RPCs that return bounded messages, where each RPC call is logically - independent from the others. -- Define RPCs that operate on a single object, instead of on an unbounded, - client-specified list of objects. -- Avoid encoding unbounded data in string, byte, or repeated fields. -- Define a long-running operation . Store the result in a - storage system designed for scalable, concurrent reads - . -- Use a pagination API (see - [Rarely define a pagination API without a continuation token](#define-pagination-api)). -- Use streaming RPCs. - -If you are working on a UI, see also -[Create methods that return or manipulate small bits of data](#create-methods-manipulate-small-bits). - -## Propagate Status Codes Carefully {#propagate-status-codes} - -RPC services should take care at RPC boundaries to interrogate errors, and -return meaningful status errors to their callers. 
- -Let's examine a toy example to illustrate the point: - -Consider a client that calls `ProductService.GetProducts`, which takes no -arguments. As part of `GetProducts`, `ProductService` might get all the -products, and call `LocaleService.LocaliseNutritionFacts` for each product. - -```dot -digraph toy_example { - node [style=filled] - client [label="Client"]; - product [label="ProductService"]; - locale [label="LocaleService"]; - client -> product [label="GetProducts"] - product -> locale [label="LocaliseNutritionFacts"] -} -``` - -If `ProductService` is incorrectly implemented, it might send the wrong -arguments to `LocaleService`, resulting in an `INVALID_ARGUMENT`. - -If `ProductService` carelessly returns errors to its callers, the client will -receive `INVALID_ARGUMENT`, since status codes propagate across RPC boundaries. -But, the client didn't pass any arguments to `ProductService.GetProducts`. So, -the error is worse than useless: it will cause a great deal of confusion! - -Instead, `ProductService` should interrogate errors it receives at the RPC -boundary; that is, the `ProductService` RPC handler it implements. It should -return meaningful errors to users: if *it received* invalid arguments from the -caller, it should return `INVALID_ARGUMENT`. If *something downstream* received -invalid arguments, it should convert the `INVALID_ARGUMENT` to `INTERNAL` before -returning the error to the caller. - -Carelessly propagating status errors leads to confusion, which can be very -expensive to debug. Worse, it can lead to an invisible outage where every -service forwards a client error without causing -any alerts to happen . - -The general rule is: at RPC boundaries, take care to interrogate errors, and -return meaningful status errors to callers, with appropriate status codes. To -convey meaning, each RPC method should document what error codes it returns in -which circumstances. 
The implementation of each method should conform to the -documented API contract. - -## Create Unique Protos per Method {#unique-protos} - -Create a unique request and response proto for each RPC method. Discovering -later that you need to diverge the top-level request or response can be -expensive. This includes "empty" responses; create a unique empty response proto -rather than reusing the [well-known Empty message type](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/empty.proto). - -### Reusing Messages {#reuse-messages} - -To reuse messages, create shared "domain" message types to include in multiple -Request and Response protos. Write your application logic in terms of those -types rather than the request and response types. - -This gives you the flexibility to evolve your method request/response types -independently, but share code for logical sub-units. - -## Appendix {#appendix} - -### Returning Repeated Fields {#returning-repeated-fields} - -When a repeated field is empty, the client can't tell if the field just wasn't -populated by the server or if the backing data for the field is genuinely empty. -In other words, there's no `hasFoo` method for repeated fields. - -Wrapping a repeated field in a message is an easy way to get a hasFoo method. - -```proto -message FooList { - repeated Foo foos; -} -``` - -The more holistic way to solve it is with a field -[read mask](#include-field-read-mask). If the field was requested, an empty list -means there's no data. If the field wasn't requested the client should ignore -the field in the response. - -### Updating Repeated Fields {#updating-repeated-fields} - -The worst way to update a repeated field is to force the client to supply a -replacement list. The dangers with forcing the client to supply the entire array -are manyfold. Clients that don't preserve unknown fields cause data loss. -Concurrent writes cause data loss. 
Even if those problems don't apply, your -clients will need to carefully read your documentation to know how the field is -interpreted on the server side. Does an empty field mean the server won't update -it, or that the server will clear it? - -**Fix #1**: Use a repeated update mask that permits the client to replace, -delete, or insert elements into the array without supplying the entire array on -a write. - -**Fix #2**: Create separate append, replace, delete arrays in the -request proto. - -**Fix #3**: Allow only appending or clearing. You can do this by wrapping the -repeated field in a message. A present, but empty, message means clear, -otherwise, any repeated elements mean append. - -### Order Independence in Repeated Fields {#order-independence-repeated-fields} - -*Try* to avoid order dependence in general. It's an extra layer of fragility. An -especially bad type of order dependence is parallel arrays. Parallel arrays make -it more difficult for clients to interpret the results and make it unnatural to -pass the two related fields around inside your own service. - -```proto -message BatchEquationSolverResponse { - // Bad: Solved values are returned in the order of the equations given in - // the request. - repeated double solved_values; - // (Usually) Bad: Parallel array for solved_values. - repeated double solved_complex_values; -} - -// Good: A separate message that can grow to include more fields and be -// shared among other methods. No order dependence between request and -// response, no order dependence between multiple repeated fields. -message BatchEquationSolverResponse { - // Deprecated, this will continue to be populated in responses until Q2 - // 2014, after which clients must move to using the solutions field below. 
- repeated double solved_values [deprecated = true]; - - // Good: Each equation in the request has a unique identifier that's - // included in the EquationSolution below so that the solutions can be - // correlated with the equations themselves. Equations are solved in - // parallel and as the solutions are made they are added to this array. - repeated EquationSolution solutions; -} -``` - -### Leaking Features Because Your Proto is in a Mobile Build {#leaking-features} - -Android and iOS runtimes both support reflection. To do that, the unfiltered -names of fields and messages are embedded in the application binary -(APK, IPA) as strings. - -```proto -message Foo { - // This will leak existence of Google Teleport project on Android and iOS - optional FeatureStatus google_teleport_enabled; -} -``` - -Several mitigation strategies: - -* ProGuard obfuscation on Android. As of Q3 2014. iOS has no obfuscation - option: once you have the IPA on a desktop, piping it through `strings` will - reveal field names of included protos. - [iOS Chrome tear-down](https://github.com/Bensge/Chrome-for-iOS-Headers) -* Curate precisely which fields are sent to mobile clients - . -* If plugging the leak isn't feasible on an acceptable timescale, get buy-in - from the feature owner to risk it. - -*Never* use this as an excuse to obfuscate the meaning of a field with a -code-name. Either plug the leak or get buy-in to risk it. - -### Performance Optimizations {#performance-optimizations} - -You can trade type safety or clarity for performance wins in some cases. For -example, a proto with hundreds of fields--particularly message-type fields--is -going to be slower to parse than one with fewer fields. A very deeply-nested -message can be slow to deserialize just from the memory management. A handful of -techniques teams have used to speed deserialization: - -* Create a parallel, trimmed proto that mirrors the larger proto but has only - some of the tags declared. 
Use this for parsing when you don't need all the - fields. Add tests to enforce that tag numbers continue to match as the - trimmed proto accumulates numbering "holes." -* Annotate the fields as "lazily parsed" with - [[lazy=true]](https://github.com/protocolbuffers/protobuf/blob/cacb096002994000f8ccc6d9b8e1b5b0783ee561/src/google/protobuf/descriptor.proto#L609). -* Declare a field as bytes and document its type. Clients who care to parse - the field can do so manually. The danger with this approach is there's - nothing preventing someone from putting a message of the wrong type in the - bytes field. You should never do this with a proto that's written to any - logs, as it prevents the proto from being vetted for PII or scrubbed for - policy or privacy reasons. diff --git a/content/best-practices/dos-donts.md b/content/best-practices/dos-donts.md index ced1c2c0c..024deed36 100644 --- a/content/best-practices/dos-donts.md +++ b/content/best-practices/dos-donts.md @@ -103,10 +103,10 @@ there is a hard limit on the size of a method ## **Do** Include an Unspecified Value in an Enum {#unspecified-enum} Enums should include a default `FOO_UNSPECIFIED` value as the first value in the -declaration . When new values -are added to a proto2 enum, old clients will see the field as unset and the -getter will return the default value or the first-declared value if no default -exists . For consistent behavior with [proto enums][proto-enums], +declaration. +When new values are added to an enum, old clients will see the field as unset +and the getter will return the default value or the first-declared value if no +default exists . For consistent behavior with [proto enums][proto-enums], the first declared enum value should be a default `FOO_UNSPECIFIED` value and should use tag 0. 
It may be tempting to declare this default as a semantically meaningful value but as a general rule, do not, to aid in the evolution of your @@ -277,6 +277,37 @@ example, see You should also avoid using keywords in your file paths, as this can also cause problems. +## **Do** Use Different Messages For RPC APIs and Storage {#separate-types-for-storage} + +Reusing the same messages for APIs and long-term storage may seem convenient, +reducing boilerplate and overhead of conversion between messages. + +However, the needs of long-term storage and live RPC services tend to later +diverge. Using separate types even if they are largely duplicative initially +gives freedom to change your storage format without impacting your external +clients. Layer your code so that modules deal either with client protos, storage +protos, or translation. + +There is a cost in maintaining the translation layer, but it quickly pays off +once you have clients and have to do your first storage changes. + +## **Don't** Use Booleans for Something That Has Two States Now, but Might Have More Later {#bad-bools} + +If you are using a boolean for a field, make sure that the field is indeed +describing just two possible states (for all time, not just now and the near +future). The future flexibility of using an enum is often worth it, even if it +only has two values when it is first introduced. + +``` +message Photo { + // Bad: True if it's a GIF. + optional bool gif; + + // Good: File format of the referenced photo (for example, GIF, WebP, PNG).
+ optional PhotoType type; +} +``` + ## **Do** Use java_outer_classname {#java-outer-classname} Every proto schema definition file should set option `java_outer_classname` to @@ -286,11 +317,3 @@ the file `student_record_request.proto` should set: ```java option java_outer_classname = "StudentRecordRequestProto"; ``` - -## Appendix {#appendix} - -### API Best Practices {#api-best-practices} - -This document lists only changes that are extremely likely to cause breakage. -For higher-level guidance on how to craft proto APIs that grow gracefully see -[API Best Practices](/best-practices/api). diff --git a/content/editions/features.md b/content/editions/features.md index 748b0e020..29811fa9a 100644 --- a/content/editions/features.md +++ b/content/editions/features.md @@ -5,9 +5,10 @@ description = "Protobuf Editions features and how they affect protobuf behavior. type = "docs" +++ -This topic provides an overview of the features that are included in Edition -2023. Each subsequent edition's features will be added to this topic. We -announce new editions in [the News section](/news). +This topic provides an overview of the features that are included in the +released edition versions. Subsequent editions' features will be added to this +topic. We announce new editions in +[the News section](/news). Before configuring feature settings in your new schema definition content, make sure you understand why you are using them. Avoid @@ -16,17 +17,17 @@ features. ## Prototiller {#prototiller} -Prototiller is a command-line tool that converts proto2 and proto3 definition -files to Editions syntax. It hasn't been released yet, but is referenced -throughout this topic. +Prototiller is a command-line tool that updates proto schema configuration files +between syntax versions and editions. It hasn't been released yet, but is +referenced throughout this topic. 
## Features {#features} The following sections include all of the behaviors that are configurable using -features in Edition 2023. [Preserving proto2 or proto3 Behavior](#preserving) -shows how to override the default behaviors so that your proto definition files -act like proto2 or proto3 files. For more information on how Editions and -Features work together to set behavior, see +features in editions. [Preserving proto2 or proto3 Behavior](#preserving) shows +how to override the default behaviors so that your proto definition files act +like proto2 or proto3 files. For more information on how editions and features +work together to set behavior, see [Protobuf Editions Overview](/editions/overview). Feature settings apply at different levels: @@ -49,7 +50,7 @@ can be applied to. The following sample shows a mock feature applied to each scope: ```proto -edition = "2023"; +edition = "2024"; // File-level scope definition option features.bar = BAZ; @@ -81,6 +82,137 @@ In this example, the setting "`GRAULT"` in the lowest-level scope feature definition overrides the non-nested-scope "`QUUX`" setting. And within the Garply message, "`WALDO`" overrides "`QUUX`." +### `features.default_symbol_visibility` {#symbol-vis} + +This feature enables setting the default visibility for messages and enums, +making them available or unavailable when imported by other protos. Use of this +feature will reduce dead symbols in order to create smaller binaries. + +In addition to setting the defaults for the entire file, you can use the `local` +and `export` keywords to set per-field behavior. Read more about this at +[`export` / `local` Keywords](/editions/overview#export-local). + +**Values available:** + +* `EXPORT_ALL`: This is the default prior to Edition 2024. All messages and + enums are exported by default. +* `EXPORT_TOP_LEVEL`: All top-level symbols default to export; nested default + to local. +* `LOCAL_ALL`: All symbols default to local. 
+* `STRICT`: All symbols local by default. Nested types cannot be exported, + except for a special-case caveat for message `{ enum {} reserved 1 to max; + }`. This is the recommended setting for new protos. + +**Applicable to the following scope:** Enum, Message + +**Added in:** Edition 2024 + +**Default behavior per syntax/edition:** + +Syntax/edition | Default +-------------- | ------------------ +2024 | `EXPORT_TOP_LEVEL` +2023 | `EXPORT_ALL` +proto3 | `EXPORT_ALL` +proto2 | `EXPORT_ALL` + +**Note:** Feature settings on different schema elements +[have different scopes](#cascading). + +The following sample shows how you can apply the feature to elements in your +proto schema definition files: + +```proto +// foo.proto +edition = "2024"; + +// Symbol visibility defaults to EXPORT_TOP_LEVEL. Setting +// default_symbol_visibility overrides these defaults +option features.default_symbol_visibility = LOCAL_ALL; + +// Top-level symbols are exported by default in Edition 2024; applying the local +// keyword overrides this +export message LocalMessage { + int32 baz = 1; + // Nested symbols are local by default in Edition 2024; applying the export + // keyword overrides this + enum ExportedNestedEnum { + UNKNOWN_EXPORTED_NESTED_ENUM_VALUE = 0; + } +} + +// bar.proto +edition = "2024"; + +import "foo.proto"; + +message ImportedMessage { + // The following is valid because the imported message explicitly overrides + // the visibility setting in foo.proto + LocalMessage bar = 1; + + // The following is not valid because default_symbol_visibility is set to + // `LOCAL_ALL` + // LocalMessage.ExportedNestedEnum qux = 2; +} +``` + +### `features.enforce_naming_style` {#enforce-naming} + +Introduced in Edition 2024, this feature enables strict naming style enforcement +as defined in +[the style guide](/programming-guides/style) to ensure +protos are round-trippable by default with a feature value to opt-out to use + +**Values available:** + +* `STYLE2024`: Enforces strict 
adherence to the style guide for naming. +* `STYLE_LEGACY`: Applies the pre-Edition 2024 level of style guide + enforcement. + +**Applicable to the following scope:** File + +**Added in:** 2024 + +**Default behavior per syntax/edition:** + +Syntax/edition | Default +-------------- | -------------- +2024 | `STYLE2024` +2023 | `STYLE_LEGACY` +proto3 | `STYLE_LEGACY` +proto2 | `STYLE_LEGACY` + +**Note:** Feature settings on different schema elements +[have different scopes](#cascading). + +The following code samples show an Edition 2023 file and an Edition 2024 file: + +Edition 2023 defaults to `STYLE_LEGACY`, so a non-conformant field name is fine: + +```proto +edition = "2023"; + +message Foo { + // A non-conforming field name is not a problem + int64 bar_1 = 1; +} +``` + +Edition 2024 defaults to `STYLE2024`, so an override is needed to keep the +non-conformant field name: + +```proto +edition = "2024"; + +// To keep the non-conformant field name, override the STYLE2024 setting +option features.enforce_naming_style = STYLE_LEGACY; + +message Foo { + int64 bar_1 = 1; +} +``` + ### `features.enum_type` {#enum_type} This feature sets the behavior for how enum values that aren't contained within @@ -99,11 +231,16 @@ and after of a proto3 file. **Applicable to the following scopes:** File, Enum -**Default behavior in Edition 2023:** `OPEN` +**Added in:** 2023 -**Behavior in proto2:** `CLOSED` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `OPEN` +Syntax/edition | Default +-------------- | -------- +2024 | `OPEN` +2023 | `OPEN` +proto3 | `OPEN` +proto2 | `CLOSED` **Note:** Feature settings on different schema elements [have different scopes](#cascading). @@ -124,7 +261,7 @@ After running [Prototiller](#prototiller), the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; enum Foo { // Setting the enum_type feature overrides the default OPEN enum @@ -155,12 +292,19 @@ whether a protobuf field has a value.
**Applicable to the following scopes:** File, Field -**Default behavior in the Edition 2023:** `EXPLICIT` +**Added in:** 2023 -**Behavior in proto2:** `EXPLICIT` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `IMPLICIT` unless the field has the `optional` label, in -which case it behaves like `EXPLICIT`. See +Syntax/edition | Default +-------------- | ----------- +2024 | `EXPLICIT` +2023 | `EXPLICIT` +proto3 | `IMPLICIT`* +proto2 | `EXPLICIT` + +\* proto3 is `IMPLICIT` unless the field has the `optional` label, in which case +it behaves like `EXPLICIT`. See [Presence in Proto3 APIs](/programming-guides/field_presence#presence-in-proto3-apis) for more information. @@ -182,7 +326,7 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; message Foo { // Setting the field_presence feature retains the proto2 required behavior @@ -207,7 +351,7 @@ message Bar { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; // Setting the file-level field_presence feature matches the proto3 implicit default option features.field_presence = IMPLICIT; @@ -241,11 +385,16 @@ and after of a proto3 file. Editions behavior matches the behavior in proto3. **Applicable to the following scopes:** File, Message, Enum -**Default behavior in Edition 2023:** `ALLOW` +**Added in:** 2023 -**Behavior in proto2:** `LEGACY_BEST_EFFORT` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `ALLOW` +Syntax/edition | Default +-------------- | -------------------- +2024 | `ALLOW` +2023 | `ALLOW` +proto3 | `ALLOW` +proto2 | `LEGACY_BEST_EFFORT` **Note:** Feature settings on different schema elements [have different scopes](#cascading). 
@@ -265,7 +414,7 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; option features.json_format = LEGACY_BEST_EFFORT; message Foo { @@ -299,12 +448,16 @@ the following conditions are met: **Applicable to the following scopes:** File, Field -**Default behavior in Edition 2023:** `LENGTH_PREFIXED` +**Added in:** 2023 -**Behavior in proto2:** `LENGTH_PREFIXED` except for groups, which default to -`DELIMITED` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `LENGTH_PREFIXED`. Proto3 doesn't support `DELIMITED`. +Syntax/edition | Default +-------------- | ----------------- +2024 | `LENGTH_PREFIXED` +2023 | `LENGTH_PREFIXED` +proto3 | `LENGTH_PREFIXED` +proto2 | `LENGTH_PREFIXED` **Note:** Feature settings on different schema elements [have different scopes](#cascading). @@ -325,7 +478,7 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; message Foo { message Bar { @@ -351,11 +504,16 @@ for `repeated` fields has been migrated to in Editions. **Applicable to the following scopes:** File, Field -**Default behavior in Edition 2023:** `PACKED` +**Added in:** 2023 -**Behavior in proto2:** `EXPANDED` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `PACKED` +Syntax/edition | Default +-------------- | ---------- +2024 | `PACKED` +2023 | `PACKED` +proto3 | `PACKED` +proto2 | `EXPANDED` **Note:** Feature settings on different schema elements [have different scopes](#cascading). 
@@ -374,7 +532,7 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; option features.repeated_field_encoding = EXPANDED; message Foo { @@ -397,7 +555,7 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; message Foo { repeated int32 bar = 6; @@ -425,11 +583,16 @@ and after of a proto3 file. **Applicable to the following scopes:** File, Field -**Default behavior in Edition 2023:** `VERIFY` +**Added in:** 2023 -**Behavior in proto2:** `NONE` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `VERIFY` +Syntax/edition | Default +-------------- | -------- +2024 | `VERIFY` +2023 | `VERIFY` +proto3 | `VERIFY` +proto2 | `NONE` **Note:** Feature settings on different schema elements [have different scopes](#cascading). @@ -447,21 +610,91 @@ message MyMessage { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; message MyMessage { string foo = 1 [features.utf8_validation = NONE]; } ``` -### Language-specific Features {#lang-specific} +## Language-specific Features {#lang-specific} Some features apply to specific languages, and not to the same protos in other languages. Using these features requires you to import the corresponding *_features.proto file from the language's runtime. The examples in the following sections show these imports. 
-#### `features.(pb.cpp/pb.java).legacy_closed_enum` {#legacy_closed_enum} +### `features.(pb.cpp).enum_name_uses_string_view` {#enum-name-string-view} + +**Languages:** C++ + +Before Edition 2024, all generated enum types provide the following function to +obtain the label out of an enum value, which has some overhead to construct the +`std::string` instances at runtime: + +```cpp +const std::string& Foo_Name(int); +``` + +The default feature value in Edition 2024 changes this signature to return +`absl::string_view` to allow for better storage decoupling and potential +memory/CPU savings. If you aren't ready to migrate yet, you can override this +to set it back to its previous behavior. See +[string_view return type](/support/migration#string_view-return-type) +in the migration guide for more on this topic. + +**Values available:** + +* `true`: The enum uses `string_view` for its values. +* `false`: The enum uses `std::string` for its values. + +**Applicable to the following scopes:** Enum, File + +**Added in:** 2024 + +**Default behavior per syntax/edition:** + +Syntax/edition | Default +-------------- | ------- +2024 | `true` +2023 | `false` +proto3 | `false` +proto2 | `false` + +**Note:** Feature settings on different schema elements +[have different scopes](#cascading). + +### `features.(pb.java).large_enum` {#java-large_enum} + +**Languages:** Java + +This language-specific feature enables you to adopt new functionality that +handles large enums in Java without causing compiler errors. Note that this +feature replicates enum-like behavior but has some notable differences. For +example, switch statements are not supported. + +**Values available:** + +* `true`: Java enums will use the new functionality. +* `false`: Java enums will continue to use Java enums.
+ +**Applicable to the following scopes:** Enum + +**Added in:** 2024 + +**Default behavior per syntax/edition:** + +Syntax/edition | Default +-------------- | ------- +2024 | `false` +2023 | `false` +proto3 | `false` +proto2 | `false` + +**Note:** Feature settings on different schema elements +[have different scopes](#cascading). + +### `features.(pb.cpp/pb.java).legacy_closed_enum` {#legacy_closed_enum} **Languages:** C++, Java @@ -480,11 +713,16 @@ before and after of a proto3 file. **Applicable to the following scopes:** File, Field -**Default behavior in Edition 2023:** `false` +**Added in:** 2023 -**Behavior in proto2:** `true` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `false` +Syntax/edition | Default +-------------- | ------- +2024 | `false` +2023 | `false` +proto3 | `false` +proto2 | `true` **Note:** Feature settings on different schema elements [have different scopes](#cascading). @@ -504,7 +742,7 @@ message Msg { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; import "myproject/proto3file.proto"; @@ -519,14 +757,51 @@ message Msg { } ``` -#### `features.(pb.cpp).string_type` {#string_type} +### `features.(pb.java).nest_in_file_class` {#java-nest_in_file} + +**Languages:** Java + +This feature controls whether the Java generator will nest the generated class +in the Java generated file class. Setting this option to `NO` is the equivalent +of setting `java_multiple_files = true` in proto2/proto3/edition 2023. + +The default outer classname is also updated to always be the camel-cased .proto +filename suffixed with Proto by default (for example, `foo/bar_baz.proto` +becomes `BarBazProto`). You can still override this using the +`java_outer_classname` file option and replace the pre-Edition 2024 default of +`BarBaz` or `BarBazOuterClass` depending on the presence of conflicts. + +**Values available:** + +* `NO`: Do not nest the generated class in the file class.
+* `YES`: Nest the generated class in the file class. +* `LEGACY`: An internal value used when the `java_multiple_files` option is set. + +**Applicable to the following scopes:** Message, Enum, Service + +**Added in:** 2024 + +**Default behavior per syntax/edition:** + +Syntax/edition | Default +-------------- | -------- +2024 | `NO` +2023 | `LEGACY` +proto3 | `LEGACY` +proto2 | `LEGACY` + +**Note:** Feature settings on different schema elements +[have different scopes](#cascading). + +### `features.(pb.cpp).string_type` {#string_type} **Languages:** C++ This feature determines how generated code should treat string fields. This replaces the `ctype` option from proto2 and proto3, and offers a new -`string_view` feature. In Edition 2023, either `ctype` or `string_type` may be -specified on a field, but not both. +`string_view` feature. In Edition 2023, you can specify either `ctype` or +`string_type` on a field, but not both. In Edition 2024, the `ctype` option is +removed. **Values available:** @@ -537,11 +812,16 @@ specified on a field, but not both. **Applicable to the following scopes:** File, Field -**Default behavior in Edition 2023:** `STRING` +**Added in:** 2023 -**Behavior in proto2:** `STRING` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `STRING` +Syntax/edition | Default +-------------- | -------- +2024 | `VIEW` +2023 | `STRING` +proto3 | `STRING` +proto2 | `STRING` **Note:** Feature settings on different schema elements [have different scopes](#cascading).
@@ -560,12 +840,12 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; import "google/protobuf/cpp_features.proto"; message Foo { - string bar = 6; + string bar = 6 [features.(pb.cpp).string_type = STRING]; string baz = 7 [features.(pb.cpp).string_type = CORD]; } ``` @@ -584,17 +864,17 @@ message Foo { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; import "google/protobuf/cpp_features.proto"; message Foo { - string bar = 6; + string bar = 6 [features.(pb.cpp).string_type = STRING]; string baz = 7 [features.(pb.cpp).string_type = CORD]; } ``` -#### `features.(pb.java).utf8_validation` {#java-utf8_validation} +### `features.(pb.java).utf8_validation` {#java-utf8_validation} **Languages:** Java @@ -613,11 +893,16 @@ before and after of a proto3 file. **Applicable to the following scopes:** Field, File -**Default behavior in Edition 2023:** `DEFAULT` +**Added in:** 2023 -**Behavior in proto2:** `DEFAULT` +**Default behavior per syntax/edition:** -**Behavior in proto3:** `DEFAULT` +Syntax/edition | Default +-------------- | --------- +2024 | `DEFAULT` +2023 | `DEFAULT` +proto3 | `DEFAULT` +proto2 | `DEFAULT` **Note:** Feature settings on different schema elements [have different scopes](#cascading). @@ -638,7 +923,7 @@ message MyMessage { After running Prototiller, the equivalent code might look like this: ```proto -edition = "2023"; +edition = "2024"; import "google/protobuf/java_features.proto"; @@ -650,37 +935,11 @@ message MyMessage { } ``` -#### `features.(pb.java).large_enum` {#java-large_enum} - -**Languages:** Java - -This language-specific feature enables you to adopt new functionality that -handles large enums in Java without causing compiler errors. - -This is new behavior, so doesn't affect proto2 or proto3 schema definition -files. 
- -**Values available:** - -* `true`: Java enums will use the new functionality. -* `false`: Java enums will continue to use Java enums. - -**Applicable to the following scopes:** Enum - -**Default behavior in Edition 2023:** `false` - -**Behavior in proto2:** `false` - -**Behavior in proto3:** `false` - -**Note:** Feature settings on different schema elements -[have different scopes](#cascading). - ## Preserving proto2 or proto3 Behavior {#preserving} You may want to move to the editions format but not deal with updates to the way that generated code behaves yet. This section shows the changes that the -Prototiller tool makes to your .proto files to make Edition 2023 protos behave +Prototiller tool makes to your .proto files to make editions-based protos behave like a proto2 or proto3 file. After these changes are made at the file level, you get the proto2 or proto3 @@ -695,6 +954,8 @@ the following sections to the top of your .proto file. ### Proto2 Behavior {#proto2-behavior} +The following shows the settings to replicate proto2 behavior with Edition 2023. + ```proto edition = "2023"; @@ -712,8 +973,9 @@ option features.(pb.java).legacy_closed_enum = true; ### Proto3 Behavior {#proto3-behavior} +The following shows the settings to replicate proto3 behavior with Edition 2023. + ```proto -// proto3 behaviors edition = "2023"; import "google/protobuf/cpp_features.proto"; @@ -729,6 +991,24 @@ option features.(pb.cpp).legacy_closed_enum = false; option features.(pb.java).legacy_closed_enum = false; ``` +### Edition 2023 to 2024 {#2023-2024} + +The following shows the settings to replicate Edition 2023 behavior with Edition +2024. 
+ +```proto +edition = "2024"; + +import option "google/protobuf/cpp_features.proto"; +import option "google/protobuf/java_features.proto"; + +option features.(pb.cpp).string_type = STRING; +option features.enforce_naming_style = STYLE_LEGACY; +option features.default_symbol_visibility = EXPORT_ALL; +option features.(pb.cpp).enum_name_uses_string_view = false; +option features.(pb.java).nest_in_file_class = LEGACY; +``` + ### Caveats and Exceptions {#caveats} This section shows the changes that you'll need to make manually if you choose @@ -737,6 +1017,8 @@ not to use Prototiller. Setting the file-level defaults shown in the previous section sets the default behaviors in most cases, but there are a few exceptions. +**Edition 2023 and later** + * `optional`: Remove all instances of the `optional` label and change the [`features.field_presence`](#field_presence) to `EXPLICIT` if the file default is `IMPLICIT`. @@ -756,3 +1038,14 @@ behaviors in most cases, but there are a few exceptions. [Proto2 Behavior](#proto2-behavior). For proto3 files converted to editions format, add `[features.repeated_field_encoding=EXPANDED]` at the field level when you don't want the default proto3 behavior. + +**Edition 2024 and later** + +* (C++) `ctype`: Remove all instances of the `ctype` option and set the + [`features.(pb.cpp).string_type`](#string_type) value. +* (C++ and Go) `weak`: Remove weak imports. Use + [`import option`](/editions/overview#import-option) + instead. +* (Java) `java_multiple_files`: Remove `java_multiple_files` and use + [`features.(pb.java).nest_in_file_class`](#java-nest_in_file) instead. diff --git a/content/editions/overview.md b/content/editions/overview.md index c41064284..b62692862 100644 --- a/content/editions/overview.md +++ b/content/editions/overview.md @@ -9,7 +9,7 @@ type = "docs" Protobuf Editions replace the proto2 and proto3 designations that we have used for Protocol Buffers.
Instead of adding `syntax = "proto2"` or `syntax = "proto3"` at the top of proto definition files, you use an edition number, such -as `edition = "2023"`, to specify the default behaviors your file will have. +as `edition = "2024"`, to specify the default behaviors your file will have. Editions enable the language to evolve incrementally over time. Instead of the hardcoded behaviors that older versions have had, editions @@ -22,7 +22,7 @@ the default behavior for the edition you've selected. You can also override your overrides. The [section later in this topic on lexical scoping](#scoping) goes into more detail on that. -*The latest released edition is 2023.* +*The latest released edition is 2024.* ## Lifecycle of a Feature {#lifecycles} @@ -33,7 +33,8 @@ example: 1. Edition 2031 creates `feature.amazing_new_feature` with a default value of `false`. This value maintains the same behavior as all earlier editions. - That is, it defaults to no impact. + That is, it defaults to no impact. Not all new features will default to the + no-op option, but for the sake of this example, `amazing_new_feature` does. 2. Developers update their .proto files to `edition = "2031"`. @@ -50,7 +51,8 @@ example: -4. At some point, `feature.amazing_new_feature` is marked deprecated in an edition and removed in a later one. +4. At some point, `feature.amazing_new_feature` is marked deprecated in an + edition and removed in a later one. When a feature is removed, the code generators for that behavior and the runtime libraries that support it may also be removed. The timelines will be @@ -61,8 +63,6 @@ example: -Because of this lifecycle, any `.proto` file that does not use deprecated -features has a no-op upgrade from one edition to the next. You will have the full window of the Google migration plus the deprecation window to upgrade your code. @@ -73,14 +73,14 @@ features may also use enums. 
For example, `features.field_presence` has values ## Migrating to Protobuf Editions {#migrating} Editions won't break existing binaries and don't change a message's binary, -text, or JSON serialization format. The first edition is as minimally disruptive -as possible. The first edition establishes the baseline and combines proto2 and -proto3 definitions into a new single definition format. +text, or JSON serialization format. Edition 2023 was as minimally disruptive as +possible. It established the baseline and combined proto2 and proto3 definitions +into a new single definition format. -When the subsequent editions are released, default behaviors for features may -change. You can have Prototiller do a no-op transformation of your .proto file -or you can choose to accept some or all of the new behaviors. Editions are -planned to be released roughly once a year. +As more editions are released, default behaviors for features may change. You +can have Prototiller do a no-op transformation of your .proto file or you can +choose to accept some or all of the new behaviors. Editions are planned to be +released roughly once a year. ### Proto2 to Editions {#proto2-migration} @@ -89,7 +89,7 @@ Prototiller tool to change the definition files to use Protobuf Editions syntax.
-#### Proto2 syntax {.new-tab} +#### Proto2 Syntax {.new-tab} ```proto // proto2 file @@ -119,15 +119,20 @@ message Player { } ``` -#### Editions syntax {.new-tab} +#### Editions Syntax {.new-tab} ```proto // Edition version of proto2 file -edition = "2023"; +edition = "2024"; package com.example; option features.utf8_validation = NONE; +option features.enforce_naming_style = STYLE_LEGACY; +option features.default_symbol_visibility = EXPORT_ALL; + +// Sets the default behavior for C++ strings +option features.(pb.cpp).string_type = STRING; message Player { // fields have explicit presence, so no explicit setting needed @@ -137,8 +142,8 @@ message Player { // to match the proto2 behavior, EXPANDED is set at the field level repeated int32 scores = 3 [features.repeated_field_encoding = EXPANDED]; - enum Handed { - // this overrides the default edition 2023 behavior, which is OPEN + export enum Handed { + // this overrides the default editions behavior, which is OPEN option features.enum_type = CLOSED; HANDED_UNSPECIFIED = 0; HANDED_LEFT = 1; @@ -161,7 +166,7 @@ Prototiller tool to change the definition files to use Protobuf Editions syntax.
-#### Proto3 syntax {.new-tab} +#### Proto3 Syntax {.new-tab} ```proto // proto3 file @@ -191,14 +196,21 @@ message Player { } ``` -#### Editions syntax {.new-tab} +#### Editions Syntax {.new-tab} ```proto // Editions version of proto3 file -edition = "2023"; +edition = "2024"; package com.example; +option features.utf8_validation = NONE; +option features.enforce_naming_style = STYLE_LEGACY; +option features.default_symbol_visibility = EXPORT_ALL; + +// Sets the default behavior for C++ strings +option features.(pb.cpp).string_type = STRING; + message Player { // fields have explicit presence, so no explicit setting needed string name = 1 [default = "N/A"]; @@ -207,7 +219,7 @@ message Player { // PACKED is the default state, and is provided just for illustration repeated int32 scores = 3 [features.repeated_field_encoding = PACKED]; - enum Handed { + export enum Handed { HANDED_UNSPECIFIED = 0; HANDED_LEFT = 1; HANDED_RIGHT = 2; @@ -227,8 +239,8 @@ message Player { ### Lexical Scoping {#scoping} Editions syntax supports lexical scoping, with a per-feature list of allowed -targets. For example, in the first edition, features can be specified at only -the file level or the lowest level of granularity. The implementation of lexical +targets. For example, in Edition 2023, features can be specified at only the +file level or the lowest level of granularity. The implementation of lexical scoping enables you to set the default behavior for a feature across an entire file, and then override that behavior at the message, field, enum, enum value, oneof, service, or method level. Settings made at a higher level (file, message) @@ -240,7 +252,7 @@ The following code sample shows some features being set at the file, field, and enum level. 
```proto {highlight="lines:3,7,16"} -edition = "2023"; +edition = "2024"; option features.enum_type = CLOSED; @@ -299,7 +311,7 @@ enum Employment { ```proto // file myproject/edition.proto -edition = "2023"; +edition = "2024"; import "myproject/foo.proto"; ``` @@ -312,8 +324,9 @@ proto3 data files or file streams using your editions-syntax proto definitions. There are some grammar changes in editions compared to proto2 and proto3. -**Syntax description.** Instead of the `syntax` element, you use an `edition` -element: +#### Syntax Description {#syntax-descrip} + +Instead of the `syntax` element, you use an `edition` element: ```proto syntax = "proto2"; @@ -321,17 +334,141 @@ syntax = "proto3"; edition = "2028"; ``` -**Reserved names.** You no longer put field names and enum value names in -quotation marks when reserving them: +#### Reserved Names {#reserved-names} + +You no longer put field names and enum value names in quotation marks when +reserving them: ```proto reserved foo, bar; ``` -**Group syntax.** Group syntax, available in proto2, is removed in editions. The -special wire-format that groups used is still available by using `DELIMITED` -message encoding. +#### Group Syntax {#group-syntax} + +Group syntax, available in proto2, is removed in editions. The special +wire-format that groups used is still available by using `DELIMITED` message +encoding. -**Required label.** The `required` label, available only in proto2, is -unavailable in editions. The underlying functionality is still available +#### Required Label {#required-label} + +The `required` label, available only in proto2, is unavailable in editions. The +underlying functionality is still available by using `features.field_presence=LEGACY_REQUIRED`. + +#### `import option` {#import-option} + +Edition 2024 added support for option imports using the syntax `import option`. + +Option imports must come after any other `import` statements. 
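+
+A minimal sketch of that ordering, assuming a hypothetical
+`myproject/foo.proto` dependency next to the C++ features file. The file-level
+override is only there to show the imported option in use:
+
+```proto
+edition = "2024";
+
+// Regular imports come first.
+import "myproject/foo.proto";
+
+// Option imports follow every regular import.
+import option "google/protobuf/cpp_features.proto";
+
+// Uses a custom option made available by the option import above.
+option features.(pb.cpp).string_type = STRING;
+```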
+ +Unlike normal `import` statements, option imports import only custom options +defined in a `.proto` file, without importing other symbols. + +This means that messages and enums are excluded from the option import. In the +following example, the `Bar` message cannot be used as a field type in +`foo.proto`, but options with type `Bar` can still be set. + +```proto +// bar.proto +edition = "2024"; + +import "google/protobuf/descriptor.proto"; + +message Bar { + bool bar = 1; +} + +extend google.protobuf.FileOptions { + bool file_opt1 = 5000; + Bar file_opt2 = 5001; +} + +// foo.proto +edition = "2024"; + +import option "bar.proto"; + +option (file_opt1) = true; +option (file_opt2) = {bar: true}; + +message Foo { + // Bar bar = 1; // This is not allowed +} +``` + +Option imports do not require generated code for their symbols and should thus +be provided as `option_deps` in `proto_library` instead of `deps`. This avoids +generating unreachable code. + +```proto +proto_library( + name = "foo", + srcs = ["foo.proto"], + option_deps = [":custom_option_proto"] +) +``` + +Option imports and `option_deps` are strongly recommended when importing +protobuf language features and other custom options to avoid generating +unnecessary code. + +This replaces `import weak`, which was removed in Edition 2024. + +#### `export` / `local` Keywords {#export-local} + +The `export` and `local` keywords were added in Edition 2024 as modifiers that +override the symbol visibility of importable symbols from the default behavior +specified by +[`features.default_symbol_visibility`](/editions/features#symbol-vis). + +This controls which symbols can be imported from other proto files, but does not +affect code generation. + +In Edition 2024, these can be set on all `message` and `enum` symbols by +default. However, some values of the `default_symbol_visibility` feature further +restrict which symbols are exportable.
+ +Example: + +```proto +// Top-level symbols are exported by default in Edition 2024 +message LocalMessage { + int32 baz = 1; + // Nested symbols are local by default in Edition 2024; applying the `export` + // keyword overrides this + export enum ExportedNestedEnum { + UNKNOWN_EXPORTED_NESTED_ENUM_VALUE = 0; + } +} + +// The `local` keyword overrides the default behavior of exporting messages +local message AnotherMessage { + int32 foo = 1; +} +``` + +#### `import weak` and Weak Field Option {#import-weak} + +As of Edition 2024, weak imports are no longer allowed. + +If you previously relied on `import weak` to declare a "weak dependency" (to +import custom options without generated code for C++ and Go), you should +instead migrate to `import option`. + +See [`import option`](/editions/overview#import-option) +for more details. + +#### `ctype` Field Option {#ctype} + +As of Edition 2024, the `ctype` field option is no longer allowed. Use the +`string_type` feature instead. + +See +[`features.(pb.cpp).string_type`](/editions/features#string_type) +for more details. + +#### `java_multiple_files` File Option {#java-mult-files} + +As of Edition 2024, the `java_multiple_files` file option is no longer +available. Use the +[`features.(pb.java).nest_in_file_class`](/editions/features#java-nest_in_file) +Java feature instead. diff --git a/content/getting-started/pythontutorial.md b/content/getting-started/pythontutorial.md index 5db695cbe..e26d88aac 100644 --- a/content/getting-started/pythontutorial.md +++ b/content/getting-started/pythontutorial.md @@ -444,14 +444,14 @@ ignore any new fields. To the old code, optional fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages.
However, keep in mind that new optional fields will not be present in old messages, so you will need to either -check explicitly whether they're set with `has_`, or provide a reasonable -default value in your `.proto` file with `[default = value]` after the tag -number. If the default value is not specified for an optional element, a +check explicitly whether they're set with `HasField('field_name')`, or provide a +reasonable default value in your `.proto` file with `[default = value]` after +the tag number. If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For booleans, the default value is false. For numeric types, the default value is zero. Note also that if you added a new repeated field, your new code will not be able to tell whether it was left empty (by new code) -or never set at all (by old code) since there is no `has_` flag for it. +or never set at all (by old code) since there is no `HasField` check for it. ## Advanced Usage {#advanced-usage} diff --git a/content/news/2025-01-23.md b/content/news/2025-01-23.md index bd7832ef3..fe42e3f63 100644 --- a/content/news/2025-01-23.md +++ b/content/news/2025-01-23.md @@ -25,14 +25,17 @@ considered a short-term workaround. ## Poison MSVC + Bazel -We will be dropping support for using Bazel and MSVC together in v34. As of v30, -we will poison this combination with an error unless you specify the opt-out -flag `--define=protobuf_allow_msvc=true` to silence it. +**Update:** This plan has been canceled. You can learn more about this in the +announcement on [July 16, 2025](/news/2025-07-16). -MSVC's path length limits combined with Bazel's sandboxing have become +~~We will be dropping support for using Bazel and MSVC together in v34. 
As of +v30, we will poison this combination with an error unless you specify the +opt-out flag `--define=protobuf_allow_msvc=true` to silence it.~~ + +~~MSVC's path length limits combined with Bazel's sandboxing have become increasingly difficult to support in combination. Rather than randomly break users who install protobuf into a long path, we will prohibit the use of MSVC from Bazel altogether. We will continue to support MSVC with CMake, and begin supporting [clang-cl](https://clang.llvm.org/docs/UsersManual.html#clang-cl) with Bazel. For any feedback or discussion, see -https://github.com/protocolbuffers/protobuf/issues/20085. +https://github.com/protocolbuffers/protobuf/issues/20085.~~ diff --git a/content/news/2025-07-14.md b/content/news/2025-07-14.md new file mode 100644 index 000000000..95baca143 --- /dev/null +++ b/content/news/2025-07-14.md @@ -0,0 +1,69 @@ ++++ +title = "Changes Announced on July 14, 2025" +linkTitle = "July 14, 2025" +toc_hide = "true" +description = "Changes announced for Protocol Buffers on July 14, 2025." +type = "docs" ++++ + +## Deprecating FieldDescriptor Enums + +We are announcing an upcoming change regarding the `FieldDescriptor` enum and +its associated values representing optional, required, and repeated. These are +being deprecated as we encourage the use of more precise accessor methods. + +### Background + +While at one time the `FieldDescriptor.label` enum served a purpose, the +evolution of Protocol Buffers has introduced more idiomatic ways to determine a +field's cardinality (singular/repeated) and presence semantics. + +* In proto2, `optional`, `required`, and `repeated` are explicit keywords. +* In proto3, `required` is no longer supported. All scalar fields are + implicitly "`optional`" in the sense that they have default values if not + set. 
The `optional` keyword was later reintroduced in proto3 to explicitly + track presence for scalar fields (distinguishing between an unset field and + a field set to its default value). +* In edition 2023 we removed the `optional` and `required` keywords and use + features to control those behaviors. + +The `label` enum conflates these distinct concepts (cardinality, requiredness, +and explicit presence tracking), leading to potential confusion, especially with +proto3's field presence model. + +### Impact and Migration + +The `FieldDescriptor.label` field will eventually be removed from the API. + +Note that the method names in this topic may be spelled slightly differently in +some languages. + +* **For Protocol Buffer Editions fields:** + * **Key Methods for Editions:** + * `hasPresence` becomes the primary method to determine if a singular + field tracks presence, reflecting the + [`features.field_presence`](/editions/features#field_presence) + setting for that field. + * **Migration:** Rely on `isRepeated` and `isRequired` for cardinality and + `hasPresence` to check for explicit presence tracking in singular + fields. +* **For proto2/proto3 fields:** `getLabel` will eventually be removed, and is + not recommended in the meantime. + +All users of Protocol Buffers who interact with `FieldDescriptor` objects in +their code (for example for code generation, reflection, and dynamic message +handling) should migrate away from using `FieldDescriptor.label` directly. + +Instead, update your code to use the following methods: + +* To check if a field is repeated: `field.isRepeated` +* To check if a field is required (proto2 and editions only): + `field.isRequired` +* To check if a singular field has explicit presence, use `hasPresence` + +### Timeline + +This deprecation is effective immediately. While `getLabel` will continue to +function, we recommend migrating your code proactively to ensure future +compatibility and clarity. 
This change will lead to a more robust and +understandable experience for developers using Protocol Buffers. diff --git a/content/news/2025-07-16.md b/content/news/2025-07-16.md new file mode 100644 index 000000000..9222ebeff --- /dev/null +++ b/content/news/2025-07-16.md @@ -0,0 +1,16 @@ ++++ +title = "Changes Announced on July 16, 2025" +linkTitle = "July 16, 2025" +toc_hide = "true" +description = "Changes announced for Protocol Buffers on July 16, 2025." +type = "docs" ++++ + +## Retaining support for Bazel with MSVC + +We announced on January 23, 2025 that we were planning to drop support for using +Bazel and MSVC together starting in v34. This plan is canceled due to Bazel's +[upcoming changes](https://github.com/bazelbuild/bazel/pull/26532) to virtual +includes on Windows. Clang-cl support will be kept in place as an alternative on +Windows. The opt-out flag `--define=protobuf_allow_msvc=true` will no longer be +required as of the 32.0 release. diff --git a/content/news/_index.md b/content/news/_index.md index 17c8c87ce..2964b6960 100644 --- a/content/news/_index.md +++ b/content/news/_index.md @@ -20,6 +20,10 @@ New news topics will also be published to the The following news topics provide information in the reverse order in which it was released. +* [July 16, 2025](/news/2025-07-16) - Resuming support + for Bazel with MSVC +* [July 14, 2025](/news/2025-07-14) - Deprecating + FieldDescriptor labels * [June 27, 2025](/news/2025-06-27) - Edition 2024 * [March 18, 2025](/news/2025-03-18) - Dropping support for Ruby 3.0 @@ -97,8 +101,6 @@ release notes will be more complete. Also, not everything from the chronological listing will be in these topics, as some content is not specific to a particular release. 
-* [Version 32.x](/news/v32) -* [Version 31.x](/news/v31) * [Version 30.x](/news/v30) * [Version 29.x](/news/v29) * [Version 26.x](/news/v26) diff --git a/content/news/v31.md b/content/news/v31.md index 91944be3a..781ffbb88 100644 --- a/content/news/v31.md +++ b/content/news/v31.md @@ -16,9 +16,84 @@ require action on your part. These describe changes as we anticipate them being implemented, but due to the flexible nature of software some of these changes may not land or may vary from how they are described in this topic. -### Dropping Ruby 3.0 Support +## Dropping Ruby 3.0 Support As per our official [Ruby support policy](https://cloud.google.com/ruby/getting-started/supported-ruby-versions), we will be dropping support for Ruby 3.0. The minimum supported Ruby version will be 3.1. + +## Deprecating Label Enums + +We are announcing an upcoming change regarding the `FieldDescriptor.label` enum +and its associated values: `FieldDescriptor.LABEL_OPTIONAL`, +`FieldDescriptor.LABEL_REQUIRED`, and `FieldDescriptor.LABEL_REPEATED`. These +are being deprecated as we encourage the use of more precise accessor methods. + +### Background + +While at one time the `FieldDescriptor.label` enum served a purpose, the +evolution of Protocol Buffers has introduced more explicit ways to determine a +field's cardinality (singular/repeated) and presence semantics. + +* In proto2, `optional`, `required`, and `repeated` are explicit keywords. +* In proto3, `required` is no longer supported. All scalar fields are + implicitly "`optional`" in the sense that they have default values if not + set. The `optional` keyword was later reintroduced in proto3 to explicitly + track presence for scalar fields (distinguishing between an unset field and + a field set to its default value). +* In edition 2023 we removed the `optional` and `required` keywords and use + features to control those behaviors. 
+ +The `label` enum conflates these distinct concepts (cardinality, requiredness, +and explicit presence tracking), leading to potential confusion, especially with +proto3's field presence model. + +### Impact and Migration + +The `FieldDescriptor.label` field will eventually be removed from the API to +maintain backward compatibility. Note that the method names in this topic may be +spelled slightly differently in some languages. + +* **For Protocol Buffer Editions fields:** + * **Behavior:** The `getLabel` method will be simplified: + * It will return `FieldDescriptor.LABEL_OPTIONAL` for all singular + fields. + * It will return `FieldDescriptor.LABEL_REPEATED` for all repeated + fields. + * **Key Methods for Editions:** + * `hasOptionalKeyword` will always return `false` (as the optional + keyword's role in presence is superseded by feature-based presence + in Editions). + * `hasPresence` becomes the primary method to determine if a singular + field tracks presence, reflecting the + [`features.field_presence`](/editions/features#field_presence) + setting for that field. + * **Migration:** Rely on `isRepeated` for cardinality and `hasPresence` to + check for explicit presence tracking in singular fields. +* **For proto3 fields:** `getLabel` will eventually be removed, and is not + recommended in the meantime. Update your code to use `hasOptionalKeyword` + and `getRealContainingOneof` instead. +* **For proto2 fields:** `getLabel` will continue to reflect `LABEL_OPTIONAL`, + `LABEL_REQUIRED`, or `LABEL_REPEATED` as defined in the .proto file for the + time being. + +All users of Protocol Buffers who interact with `FieldDescriptor` objects in +their code (for example for code generation, reflection, and dynamic message +handling) should migrate away from using `FieldDescriptor.label` directly. 
+ +Instead, update your code to use the following methods: + +* To check if a field is repeated: `field.isRepeated` +* To check if a field is required (proto2 and editions only): + `field.isRequired` +* To check if a singular proto3 field has explicit presence, use `hasPresence` +* To check if a singular field has explicit presence via a oneof, use + `hasPresence` + +### Timeline + +This deprecation is effective immediately. While `getLabel` will continue to +function, we recommend migrating your code proactively to ensure future +compatibility and clarity. This change will lead to a more robust and +understandable experience for developers using Protocol Buffers. diff --git a/content/programming-guides/enum.md b/content/programming-guides/enum.md index 66207534b..2e48628c4 100644 --- a/content/programming-guides/enum.md +++ b/content/programming-guides/enum.md @@ -9,9 +9,10 @@ Enums behave differently in different language libraries. This topic covers the different behaviors as well as the plans to move protobufs to a state where they are consistent across all languages. If you're looking for information on how to use enums in general, see the corresponding sections in the -[proto2](/programming-guides/proto2#enum) and -[proto3](/programming-guides/proto3#enum) language guide -topics. +[proto2](/programming-guides/proto2#enum), +[proto3](/programming-guides/proto3#enum), and +[editions 2023](/programming-guides/editions#enum) +language guide topics. ## Definitions {#definitions} @@ -20,8 +21,8 @@ except in their handling of unknown values. Practically, this means that simple cases work the same, but some corner cases have interesting implications. 
For the purpose of explanation, let us assume we have the following `.proto` -file (we are deliberately not specifying if this is a `syntax = "proto2"` or -`syntax = "proto3"` file right now): +file (we are deliberately not specifying if this is a `syntax = "proto2"`, +`syntax = "proto3"`, or `edition = "2023"` file right now): ``` enum Enum { @@ -78,8 +79,10 @@ Similarly, maps with *closed* enums for their value will place entire entries ## History {#history} Prior to the introduction of `syntax = "proto3"` all enums were *closed*. Proto3 -introduced *open* enums specifically because of the unexpected behavior that -*closed* enums cause. +and editions use *open* enums specifically because of the unexpected behavior +that *closed* enums cause. You can use +[`features.enum_type`](/editions/features#enum_type) to +explicitly set editions enums to open, if needed. ## Specification {#spec} @@ -97,6 +100,10 @@ behave. * When a `proto2` file imports an enum defined in a `proto3` file, that enum should be treated as **open**. +Editions honor whatever behavior the enum had in the file being imported from. +Proto2 enums are always treated as closed, proto3 enums are always treated as +open, and when importing from another editions file it uses the feature setting. + ## Known Issues {#known-issues} ### C++ {#cpp} diff --git a/content/programming-guides/style.md b/content/programming-guides/style.md index 3113660ae..af3a92fc8 100644 --- a/content/programming-guides/style.md +++ b/content/programming-guides/style.md @@ -195,14 +195,6 @@ service FooService { } ``` -For more service-related guidance, see -[Create Unique Protos per Method](/best-practices/api#unique-protos) -and -[Don't Include Primitive Types in a Top-level Request or Response Proto](/programming-guides/api#dont-include-primitive-types) -in the API Best Practices topic, and -[Define Message Types in Separate Files](/best-practices/dos-donts#separate-files) -in Proto Best Practices. 
- ## Things to Avoid {#avoid} ### Required Fields {#required} @@ -210,7 +202,9 @@ in Proto Best Practices. Required fields are a way to enforce that a given field must be set when parsing wire bytes, and otherwise refuse to parse the message. The required invariant is generally not enforced on messages constructed in memory. Required fields were -removed in proto3. +removed in proto3. Proto2 `required` fields that have been migrated to editions +2023 can use the `field_presence` feature set to `LEGACY_REQUIRED` to +accommodate. While enforcement of required fields at the schema level is intuitively desirable, one of the primary design goals of protobuf is to support long term @@ -228,8 +222,10 @@ See ### Groups {#groups} Groups is an alternate syntax and wire format for nested messages. Groups are -considered deprecated in proto2 and were removed from proto3. You should use a -nested message definition and field of that type instead of using the group -syntax. +considered deprecated in proto2, were removed from proto3, and are converted to +a delimited representation in edition 2023. You can use a nested message +definition and field of that type instead of using the group syntax, using the +[`message_encoding`](/editions/features#message_encoding) +feature for wire-compatibility. See [groups](/programming-guides/proto2#groups). diff --git a/content/reference/dart/dart-generated.md b/content/reference/dart/dart-generated.md index d2df98b23..ff94ca5cd 100644 --- a/content/reference/dart/dart-generated.md +++ b/content/reference/dart/dart-generated.md @@ -7,12 +7,12 @@ type = "docs" +++ Any differences between -proto2 and proto3 generated code are highlighted - note that these differences -are in the generated code as described in this document, not the base API, which -are the same in both versions. 
You should read the -[proto2 language guide](/programming-guides/proto2) -and/or the -[proto3 language guide](/programming-guides/proto3) +proto2, proto3, and editions generated code are highlighted - note that these +differences are in the generated code as described in this document, not the +base API, which are the same in both versions. You should read the +[proto2 language guide](/programming-guides/proto2), +[proto3 language guide](/programming-guides/proto3), or +[editions language guide](/programming-guides/editions) before reading this document. ## Compiler Invocation {#invocation} @@ -109,13 +109,16 @@ case-conversion works as follows: Thus, for the field `foo_bar_baz`, the getter becomes `get fooBarBaz` and a method prefixed with `has` would be `hasFooBarBaz`. -### Singular Primitive Fields (proto2) +### Singular Primitive Fields -For any of these field definitions: +All fields have +[explicit presence](/programming-guides/field_presence#presence-proto2) +in the Dart implementation. + +For the following field definition: ```proto -optional int32 foo = 1; -required int32 foo = 1; +int32 foo = 1; ``` The compiler will generate the following accessor methods in the message class: @@ -128,35 +131,10 @@ The compiler will generate the following accessor methods in the message class: - `void clearFoo()`: Clears the value of the field. After calling this, `hasFoo()` will return `false` and `get foo` will return the default value. -For other simple field types, the corresponding Dart type is chosen according to -the -[scalar value types table](/programming-guides/proto2#scalar). -For message and enum types, the value type is replaced with the message or enum -class. - -### Singular Primitive Fields (proto3) - -For this field definition: - -```proto -int32 foo = 1; -``` - -The compiler will generate the following accessor methods in the message class: - -- `int get foo`: Returns the current value of the field. If the field is not - set, returns the default value. 
-- `set foo(int value)`: Sets the value of the field. After calling this, `get - foo` will return `value`. -- `void clearFoo()`: Clears the value of the field. After calling this,`get - foo` will return the default value. - {{% alert title="Note" color="note" %}} Due to a quirk in the Dart proto3 implementation, the following methods are generated - even if the `optional` modifier, used to request - [presence semantics](/programming-guides/field_presence#presence-in-proto3-apis), - isn't in the proto - definition.{{% /alert %}} + even if implicit presence is + configured.{{% /alert %}} - `bool hasFoo()`: Returns `true` if the field is set. @@ -171,6 +149,12 @@ The compiler will generate the following accessor methods in the message class: - `void clearFoo()`: Clears the value of the field. After calling this, `hasFoo()` will return `false` and `get foo` will return the default value. +For other simple field types, the corresponding Dart type is chosen according to +the +[scalar value types table](/programming-guides/editions#scalar). +For message and enum types, the value type is replaced with the message or enum +class. + ### Singular Message Fields {#singular-message} Given the message type: @@ -188,7 +172,7 @@ message Baz { // The generated code is the same result if required instead of optional. } -// proto3 +// proto3 and editions message Baz { Bar bar = 1; } @@ -226,10 +210,6 @@ The compiler will generate: For this field definition: ```proto -// proto2 -optional int64 bar = 1; - -// proto3 int64 bar = 1; ``` @@ -246,7 +226,7 @@ import 'package:fixnum/fixnum.dart'; ### Map Fields -Given a [`map`](/programming-guides/proto3#maps) field +Given a [`map`](/programming-guides/editions#maps) field definition like this: ```proto @@ -259,9 +239,9 @@ The compiler will generate the following getter: field is not set, returns an empty map. Modifications to the map are reflected in the field. 
-## Any +### Any -Given an [`Any`](/programming-guides/proto3#any) field +Given an [`Any`](/programming-guides/editions#any) field like this: ```proto @@ -303,9 +283,9 @@ and unpack the `Any`'s values: {String typeUrlPrefix = 'type.googleapis.com'}); ``` -## Oneof +### Oneof -Given a [`oneof`](/programming-guides/proto3#oneof) +Given a [`oneof`](/programming-guides/editions#oneof) definition like this: ```proto @@ -319,7 +299,7 @@ message Foo { The compiler will generate the following Dart enum type: -```proto +```dart enum Foo_Test { name, subMessage, notSet } ``` @@ -343,7 +323,7 @@ are generated. For instance for `name`: this, `get name` will return the default value and `whichTest()` will return `Foo_Test.notSet`. -## Enumerations {#enum} +### Enumerations {#enum} Given an enum definition like: @@ -414,10 +394,10 @@ The protocol buffer compiler will generate a class called `Bar`, which extends `GeneratedMessage`, and a class called `Bar_Color`, which extends `ProtobufEnum`. -## Extensions (proto2 only) {#extension} +## Extensions (not available in proto3) {#extension} Given a file `foo_test.proto` including a message with an -[extension range](/programming-guides/proto2#extensions) +[extension range](/programming-guides/editions#extensions) and a top-level extension definition: ```proto @@ -453,7 +433,7 @@ Extensions can also be declared nested inside of another message: ```proto message Baz { extend Foo { - optional int32 bar = 124; + int32 bar = 124; } } ``` diff --git a/content/reference/go/go-generated-opaque.md b/content/reference/go/go-generated-opaque.md index cb5fd705a..b81c7ef65 100644 --- a/content/reference/go/go-generated-opaque.md +++ b/content/reference/go/go-generated-opaque.md @@ -256,68 +256,47 @@ case-conversion works as follows: Thus, you can access the proto field `birth_year` using the `GetBirthYear()` method in Go, and `_birth_year_2` using `GetXBirthYear_2()`. 
-### Singular Scalar Fields (proto2) {#singular-scalar-proto2} +### Singular Fields -For either of these field definitions: +For this field definition: ```proto -optional int32 birth_year = 1; -required int32 birth_year = 1; +// proto2 and proto3 +message Artist { + optional int32 birth_year = 1; +} + +// editions +message Artist { + int32 birth_year = 1 [features.field_presence = EXPLICIT]; +} ``` -the compiler generates the following accessor methods: +the compiler generates a Go struct with the following accessor methods: ```go -func (m *Artist) GetBirthYear() int32 { ... } -func (m *Artist) SetBirthYear(v int32) { ... } -func (m *Artist) HasBirthYear() bool { ... } -func (m *Artist) ClearBirthYear() { ... } +func (m *Artist) GetBirthYear() int32 +func (m *Artist) SetBirthYear(v int32) ``` -The accessor method `GetBirthYear()` returns the `int32` value in `birth_year` -or the default value if the field is unset. If the default is not explicitly -set, the [zero value](https://golang.org/ref/spec#The_zero_value) of -that type is used instead (`0` for numbers, the empty string for strings). +With implicit presence, the getter returns the `int32` value in `birth_year` or +the [zero value](https://golang.org/ref/spec#The_zero_value) of that +type if the field is unset (`0` for numbers, the empty string for strings). With +explicit presence, the getter returns the `int32` value in `birth_year` or the +default value if the field is unset. If the default is not explicitly set, the +zero value is used instead. For other scalar field types (including `bool`, `bytes`, and `string`), `int32` is replaced with the corresponding Go type according to the -[scalar value types table](/programming-guides/proto2#scalar). - -### Singular Scalar Fields (proto3) {#singular-scalar-proto3} - -For this field definition: - -```proto -int32 birth_year = 1; -optional int32 first_active_year = 2; -``` +[scalar value types table](/programming-guides/proto3#scalar). 
-the compiler generates the following accessor methods: +In fields with explicit presence, you can also use these methods: ```go -func (m *Artist) GetBirthYear() int32 { ... } -func (m *Artist) SetBirthYear(v int32) { ... } -// NOTE: No HasBirthYear() or ClearBirthYear() methods; -// proto3 fields only have presence when declared as optional: -// /programming-guides/field_presence.md - -func (m *Artist) GetFirstActiveYear() int32 { ... } -func (m *Artist) SetFirstActiveYear(v int32) { ... } -func (m *Artist) HasFirstActiveYear() bool { ... } -func (m *Artist) ClearFirstActiveYear() { ... } +func (m *Artist) HasBirthYear() bool +func (m *Artist) ClearBirthYear() ``` -The accessor method `GetBirthYear()` returns the `int32` value in `birth_year` -or the [zero value](https://golang.org/ref/spec#The_zero_value) of -that type if the field is unset (`0` for numbers, the empty string for strings). - -For other scalar field types (including `bool`, `bytes`, and `string`), `int32` -is replaced with the corresponding Go type according to the -[scalar value types table](/programming-guides/proto3#scalar). -Unset values in the proto will be represented as the -[zero value](https://golang.org/ref/spec#The_zero_value) of that type -(`0` for numbers, the empty string for strings). - ### Singular Message Fields {#singular-message} Given the message type: @@ -335,7 +314,7 @@ message Concert { // The generated code is the same result if required instead of optional. 
} -// proto3 +// proto3 and editions message Concert { Band headliner = 1; } diff --git a/content/reference/go/go-generated.md b/content/reference/go/go-generated.md index 07592dc8f..b571e3a07 100644 --- a/content/reference/go/go-generated.md +++ b/content/reference/go/go-generated.md @@ -7,12 +7,12 @@ type = "docs" +++ Any differences between -proto2 and proto3 generated code are highlighted - note that these differences -are in the generated code as described in this document, not the base API, which -are the same in both versions. You should read the -[proto2 language guide](/programming-guides/proto2) -and/or the -[proto3 language guide](/programming-guides/proto3) +proto2, proto3, and editions generated code are highlighted - note that these +differences are in the generated code as described in this document, not the +base API, which is the same in all versions. You should read the +[proto2 language guide](/programming-guides/proto2), +[proto3 language guide](/programming-guides/proto3), or +[editions language guide](/programming-guides/editions) before reading this document. {{% alert title="Note" color="warning" %}}You are @@ -256,13 +256,14 @@ The case-conversion works as follows: Thus, the proto field `birth_year` becomes `BirthYear` in Go, and `_birth_year_2` becomes `XBirthYear_2`. -### Singular Scalar Fields (proto2) {#singular-scalar-proto2} + -For either of these field definitions: +### Singular Explicit Presence Scalar Fields {#singular-explicit} + +For the field definition: ```proto -optional int32 birth_year = 1; -required int32 birth_year = 1; +int32 birth_year = 1; ``` the compiler generates a struct with an `*int32` field named `BirthYear` and an @@ -275,13 +276,14 @@ For other scalar field types (including `bool`, `bytes`, and `string`), `*int32` is replaced with the corresponding Go type according to the [scalar value types table](/programming-guides/proto2#scalar). 
-### Singular Scalar Fields (proto3) {#singular-scalar-proto3} + + +### Singular Implicit Presence Scalar Fields {#singular-implicit} For this field definition: ```proto int32 birth_year = 1; -optional int32 first_active_year = 2; ``` The compiler will generate a struct with an `int32` field named `BirthYear` and @@ -297,8 +299,8 @@ For other scalar field types (including `bool`, `bytes`, and `string`), `int32` is replaced with the corresponding Go type according to the [scalar value types table](/programming-guides/proto3#scalar). Unset values in the proto will be represented as the -[zero value](https://golang.org/ref/spec#The_zero_value) of that type -(`0` for numbers, the empty string for strings). +[zero value](https://golang.org/ref/spec#The_zero_value) of that type (`0` for +numbers, the empty string for strings). ### Singular Message Fields {#singular-message} @@ -321,6 +323,11 @@ message Concert { message Concert { Band headliner = 1; } + +// editions +message Concert { + Band headliner = 1; +} ``` The compiler will generate a Go struct @@ -606,13 +613,13 @@ represented in Go in exactly the same way, with multiple names corresponding to the same numeric value. The reverse mapping contains a single entry for the numeric value to the name which appears first in the .proto file. 
-## Extensions (proto2) {#extensions} +## Extensions {#extensions} Given an extension definition: ```proto extend Concert { - optional int32 promo_id = 123; + int32 promo_id = 123; } ``` @@ -642,9 +649,9 @@ For example, given the following definition: ```proto extend Concert { - optional int32 singular_int32 = 1; + int32 singular_int32 = 1; repeated bytes repeated_strings = 2; - optional Band singular_message = 3; + Band singular_message = 3; } ``` @@ -667,7 +674,7 @@ pattern is to do something like this: ```proto message Promo { extend Concert { - optional int32 promo_id = 124; + int32 promo_id = 124; } } ``` diff --git a/content/reference/go/opaque-migration.md b/content/reference/go/opaque-migration.md index 72e58deb0..3b49db217 100644 --- a/content/reference/go/opaque-migration.md +++ b/content/reference/go/opaque-migration.md @@ -12,30 +12,28 @@ the [Go Protobuf: Releasing the Opaque API](https://go.dev/blog/protobuf-opaque) blog post for an introduction. The migration to the Opaque API happens incrementally, on a per-proto-message or -per-`.proto`-file basis, by setting the Protobuf Editions feature `api_level` -option to one of its possible values: +per-`.proto`-file basis, by setting the `api_level` feature to one of its +possible values: -* `API_OPEN` selects the Open Struct API; this was the only API before - December 2024. +* `API_OPEN` selects the Open Struct API. This was backported into edition + 2023, so older versions of the Go plugin may not honor it. * `API_HYBRID` is a step between Open and Opaque: The Hybrid API also includes accessor methods (so you can update your code), but still exports the struct fields as before. There is no performance difference; this API level only helps with the migration. -* `API_OPAQUE` selects the Opaque API. +* `API_OPAQUE` selects the Opaque API; this is the default for Edition 2024 + and newer. 
-Today, the default is `API_OPEN`, but the upcoming -[Protobuf Edition 2024](/editions/overview) will change -the default to `API_OPAQUE`. - -To use the Opaque API before Edition 2024, set the `api_level` like so: +To override the default for a specific `.proto` file, set the `api_level` +feature: ```proto -edition = "2023"; +edition = "2024"; package log; import "google/protobuf/go_features.proto"; -option features.(pb.go).api_level = API_OPAQUE; +option features.(pb.go).api_level = API_OPEN; message LogEntry { … } ``` @@ -48,7 +46,7 @@ For your convenience, you can also override the default API level with a `protoc` command-line flag: ``` -protoc […] --go_opt=default_api_level=API_OPAQUE +protoc […] --go_opt=default_api_level=API_OPEN ``` To override the default API level for a specific file (instead of all files), @@ -56,13 +54,9 @@ use the `apilevelM` mapping flag (similar to [the `M` flag for import paths](/reference/go/go-generated/#package)): ``` -protoc […] --go_opt=apilevelMhello.proto=API_OPAQUE +protoc […] --go_opt=apilevelMhello.proto=API_OPEN ``` -The command-line flags also work for `.proto` files still using proto2 or proto3 -syntax, but if you want to select the API level from within the `.proto` file, -you need to migrate said file to editions first. - ## Automated migration {#automated} We try to make migrating existing projects to the Opaque API as easy as possible diff --git a/content/reference/java/java-generated.md b/content/reference/java/java-generated.md index ae8bd9958..126740c30 100644 --- a/content/reference/java/java-generated.md +++ b/content/reference/java/java-generated.md @@ -897,7 +897,7 @@ RPC systems based on `.proto`-language service definitions should provide [plugins](/reference/cpp/api-docs/google.protobuf.compiler.plugin.pb) to generate code appropriate for the system. These plugins are likely to require that abstract services are disabled, so that they can generate their own classes -of the same names. 
Plugins are new in version 2.3.0 (January 2010). +of the same names. The remainder of this section describes what the protocol buffer compiler generates when abstract services are enabled. diff --git a/content/reference/kotlin/kotlin-generated.md b/content/reference/kotlin/kotlin-generated.md index f35dfea64..703324afa 100644 --- a/content/reference/kotlin/kotlin-generated.md +++ b/content/reference/kotlin/kotlin-generated.md @@ -6,13 +6,13 @@ description = "Describes exactly what Kotlin code the protocol buffer compiler g type = "docs" +++ -Any differences between proto2 and proto3 generated code -are highlighted—note that these differences are in the generated code as -described in this document, not the base message classes/interfaces, which are -the same in both versions. You should read the -[proto2 language guide](/programming-guides/proto2) -and/or -[proto3 language guide](/programming-guides/proto3) +Any differences between proto2, proto3, and editions +generated code are highlighted—note that these differences are in the +generated code as described in this document, not the base message +classes/interfaces, which are the same in both versions. You should read the +[proto2 language guide](/programming-guides/proto2), +[proto3 language guide](/programming-guides/proto3), +and/or the [Editions guide](/programming-guides/editions) before reading this document. ## Compiler Invocation {#invocation} @@ -127,18 +127,18 @@ In a few special cases in which a field name conflicts with reserved words in Kotlin or methods already defined in the protobuf library, an extra underscore is appended. For instance, the clearer for a field named `in` is `clearIn_()`. 
-### Singular Fields (proto2) +### Singular Fields -For any of these field definitions: +For this field definition: ```proto -optional int32 foo = 1; -required int32 foo = 1; +int32 foo = 1; ``` The compiler will generate the following accessors in the DSL: -- `fun hasFoo(): Boolean`: Returns `true` if the field is set. +- `fun hasFoo(): Boolean`: Returns `true` if the field is set. This is not + generated for fields using implicit presence. - `var foo: Int`: The current value of the field. If the field is not set, returns the default value. - `fun clearFoo()`: Clears the value of the field. After calling this, @@ -173,54 +173,6 @@ In general, this is because the compiler does not know whether `Foo` has a Kotlin DSL at all, or e.g. only has the Java APIs generated. This means that you do not have to wait for messages you depend on to add Kotlin code generation. -### Singular Fields (proto3) - -For this field definition: - -```proto -int32 foo = 1; -``` - -The compiler will generate the following property in the DSL: - -- `var foo: Int`: Returns the current value of the field. If the field is not - set, returns the default value for the field's type. -- `fun clearFoo()`: Clears the value of the field. After calling this, - `getFoo()` will return the default value for the field's type. - -For other simple field types, the corresponding Java type is chosen according to -the -[scalar value types table](/programming-guides/proto2#scalar). -For message and enum types, the value type is replaced with the message or enum -class. As the message type is still defined in Java, unsigned types in the -message are represented using the standard corresponding signed types in the -DSL, for compatibility with Java and older versions of Kotlin. - -#### Embedded Message Fields - -For message field types, an additional accessor method is generated in the DSL: - -- `boolean hasFoo()`: Returns `true` if the field has been set. 
- -Note that there is no shortcut for setting a submessage based on a DSL. For -example, if you have a field - -```proto -Foo my_foo = 1; -``` - -you must write - -```kotlin -myFoo = foo { - ... -} -``` - -In general, this is because the compiler does not know whether `Foo` has a -Kotlin DSL at all, or e.g. only has the Java APIs generated. This means that you -do not have to wait for messages you depend on to add Kotlin code generation. - ### Repeated Fields {#repeated} For this field definition: @@ -276,8 +228,7 @@ The compiler will generate the following accessor methods in the DSL: fields are set; see the [Java code reference](/reference/java/java-generated#oneof) for the return type -- `fun hasFoo(): Boolean` (proto2 only): Returns `true` if the oneof case is - `FOO`. +- `fun hasFoo(): Boolean`: Returns `true` if the oneof case is `FOO`. - `val foo: Int`: Returns the current value of `oneof_name` if the oneof case is `FOO`. Otherwise, returns the default value of this field. @@ -313,9 +264,9 @@ The compiler will generate the following members in the DSL class: - `fun DslMap.clear()`: clears all entries from this map field -## Extensions (proto2 only) {#extension} +## Extensions {#extension} -Given a message with an extension range: +Given a proto2 or editions message with an extension range: ```proto message Foo { @@ -363,7 +314,7 @@ Given an extension definition: ```proto extend Foo { - optional int32 bar = 123; + int32 bar = 123; } ``` diff --git a/content/reference/objective-c/objective-c-generated.md b/content/reference/objective-c/objective-c-generated.md index a409daea2..62898604f 100644 --- a/content/reference/objective-c/objective-c-generated.md +++ b/content/reference/objective-c/objective-c-generated.md @@ -7,11 +7,12 @@ type = "docs" +++ Any -differences between proto2 and proto3 generated code are highlighted. You should -read the +differences between proto2, proto3, and Editions generated code are highlighted. 
+You should read the [proto2 language guide](/programming-guides/proto2) and/or [proto3 language guide](/programming-guides/proto3) +and/or the [Editions guide](/programming-guides/editions) before reading this document. ## Compiler invocation {#invocation} @@ -166,14 +167,16 @@ The behaviors for this interface are as follows: - (BOOL)isEqual:(id)value; ``` -### Unknown fields (proto2 only) +### Unknown fields -If a message created with an -[older version](/programming-guides/proto2#updating) of -your .proto definition is parsed with code generated from a newer version (or -vice versa), the message may contain optional or repeated fields that the -\"new\" code does not recognize. In proto2 generated code, these fields are not -discarded and are stored in the message's `unknownFields` property. +When a message is parsed, it may contain fields that are not known to the +parsing code. This can happen when a message is created with an +[older version](/programming-guides/proto2#updating) of a +.proto definition and is then parsed with code generated from a newer version +(or vice versa). + +These fields are not discarded and are stored in the message's `unknownFields` +property: ```objc @property(nonatomic, copy, nullable) GPBUnknownFieldSet *unknownFields; @@ -182,14 +185,16 @@ discarded and are stored in the message's `unknownFields` property. You can use the `GPBUnknownFieldSet` interface to fetch these fields by number or loop over them as an array. -In proto3, unknown fields are simply discarded when a message is parsed. - ## Fields The following sections describe the code generated by the protocol buffer -compiler for message fields. +compiler for message fields. They are divided by those with implicit and +explicit presence. You can learn more about this distinction in +[Field Presence](/programming-guides/field_presence). 
+ + -### Singular fields (proto3) {#singular3} +### Singular Fields with Implicit Presence {#singular-implicit} For every singular field the compiler generates a property to store data and an integer constant containing the field number. Message type fields also get a @@ -325,7 +330,9 @@ foo.a.b = 2; where `a` will be automatically created via the accessors if necessary. If `foo.a` returned `nil`, the `foo.a.b` setter pattern would not work. -### Singular fields (proto2) {#singular2} + + +### Singular Fields with Explicit Presence {#singular-explicit} For every singular field the compiler generates a property to store data, an integer constant containing the field number, and a `has..` property that lets @@ -481,7 +488,7 @@ where `a` will be automatically created via the accessors if necessary. If ### Repeated fields {#repeated} -Like singular fields([proto2](#singular2) [proto3](#singular3)), the protocol +Like singular fields ([proto2](#singular2) [proto3](#singular3)), the protocol buffer compiler generates one data property for each repeated field. This data property is a `GPB<VALUE>Array` depending on the field type where `<VALUE>` can be one of `UInt32`, `Int32`, `UInt64`, `Int64`, `Bool`, `Float`, `Double`, or @@ -535,6 +542,9 @@ typedef GPB_ENUM(Foo_FieldNumber) { @end ``` +**Note:** The behavior of repeated fields can be configured in Editions with the +[`features.repeated_field_encoding` feature](/editions/features#repeated_field_encoding). + For string, bytes and message fields, elements of the array are `NSString*`, `NSData*` and pointers to subclasses of `GPBMessage` respectively. @@ -1040,13 +1050,16 @@ support the case where the server returns values that the client may not recognize due to the client and server being compiled with different versions of the proto file. -Unrecognized enum values are treated differently depending on which protocol -buffers version you are using. 
In proto3, `kGPBUnrecognizedEnumeratorValue` is -returned for the typed enumerator value if the enumerator value in the parsed -message data is not one that the code reading it was compiled to support. If the -actual value is desired, use the raw value accessors to get the value as an -`int32_t`. If you are using proto2, unrecognized enum values are treated as -unknown fields. +Unrecognized enum values are treated differently depending on the language +version and the `features.enum_type` feature in Editions. + +* In open enums, `kGPBUnrecognizedEnumeratorValue` is returned for the typed + enumerator value if the enumerator value in the parsed message data is not + one that the code reading it was compiled to support. If the actual value is + desired, use the raw value accessors to get the value as an `int32_t`. +* In closed enums, unrecognized enum values are treated as unknown fields. +* Proto2 enums are closed, and proto3 enums are open. In Editions, the + behavior is configurable with the `features.enum_type` feature. `kGPBUnrecognizedEnumeratorValue` is defined as `0xFBADBEEF`, and it will be an error if any enumerator in an enumeration has this value. Attempting to set any @@ -1093,15 +1106,15 @@ let aValue = Foo.ValueA let anotherValue: Foo = .GPBUnrecognizedEnumeratorValue ``` -## Well-known types (proto3 only) {#wellknown} +## Well-known types {#wellknown} -If you use any of the message types provided with proto3, they will in general -just use their proto definitions in generated Objective-C code, though we supply -some basic conversion methods in categories to make using them simpler. Note -that we do not have special APIs for all well-known types yet, including -[`Any`](/programming-guides/proto3#any) (there is -currently no helper method to convert an `Any`'s message value into a message of -the appropriate type). 
+If you use any of the message types provided with protocol buffers, they will in +general just use their proto definitions in generated Objective-C code, though +we supply some basic conversion methods in categories to make using them +simpler. Note that we do not have special APIs for all well-known types yet, +including [`Any`](/programming-guides/proto3#any) (there +is currently no helper method to convert an `Any`'s message value into a message +of the appropriate type). ### Time Stamps @@ -1125,7 +1138,7 @@ the appropriate type). @end ``` -## Extensions (proto2 only) {#extensions} +## Extensions (proto2 and editions only) {#extensions} Given a message with an [extension range](/programming-guides/proto2#extensions): @@ -1136,13 +1149,13 @@ message Foo { } extend Foo { - optional int32 foo = 101; + int32 foo = 101; repeated int32 repeated_foo = 102; } message Bar { extend Foo { - optional int32 bar = 103; + int32 bar = 103; repeated int32 repeated_bar = 104; } } diff --git a/content/reference/php/php-generated.md b/content/reference/php/php-generated.md index 3e5ecb09b..957132899 100644 --- a/content/reference/php/php-generated.md +++ b/content/reference/php/php-generated.md @@ -7,9 +7,10 @@ type = "docs" +++ You should read the -[proto3 language guide](/programming-guides/proto3) +[proto3 language guide](/programming-guides/proto3) or +[Editions language guide](/programming-guides/editions) before reading this document. Note that the protocol buffer compiler currently -only supports proto3 code generation for PHP. +only supports proto3 and editions code generation for PHP. 
## Compiler Invocation {#invocation} @@ -37,6 +38,7 @@ protoc --proto_path=src --php_out=build/gen src/example.proto And `src/example.proto` is defined as: ```proto +edition = "2023"; package foo.bar; message MyMessage {} ``` @@ -87,7 +89,12 @@ to Pascal case.* Given a simple message declaration: ```proto -message Foo {} +message Foo { + int32 int32_value = 1; + string string_value = 2; + repeated int32 repeated_int32_value = 3; + map<int32, int32> map_int32_int32_value = 4; +} ``` The protocol buffer compiler generates a PHP class called `Foo`. This class @@ -97,11 +104,12 @@ following example: ```php $from = new Foo(); -$from->setInt32(1); -$from->setString('a'); -$from->getRepeatedInt32()[] = 1; -$from->getMapInt32Int32()[1] = 1; +$from->setInt32Value(1); +$from->setStringValue('a'); +$from->getRepeatedInt32Value()[] = 1; +$from->getMapInt32Int32Value()[1] = 1; $data = $from->serializeToString(); +$to = new Foo(); try { $to->mergeFromString($data); } catch (Exception $e) { @@ -119,20 +127,21 @@ classes. So, for example, if you have this in your `.proto`: ```proto message TestMessage { - optional int32 a = 1; - message NestedMessage {...} + message NestedMessage { + int32 a = 1; + } } ``` -The compiler will generate the following classes: +The compiler will generate the following class: ```php -class TestMessage { - public a; -} - // PHP doesn’t support nested classes. -class TestMessage_NestedMessage {...} +class TestMessage_NestedMessage { + public function __construct($data = NULL) {...} + public function getA() {...} + public function setA($var) {...} +} ``` If the message class name is reserved (for example, `Empty`), the prefix `PB` is @@ -147,16 +156,21 @@ specified, it is prepended to all generated message classes. ## Fields -For each field in a message type, there are accessor methods to set and get the -field. 
So given a field `x` you can write: +For each field in a message type, the protocol buffer compiler generates a set +of accessor methods to set and get the field. The accessor methods are named +using `snake_case` field names converted to `PascalCase`. So, given a field +`field_name`, the accessor methods will be `getFieldName` and `setFieldName`. ```php -$m = new MyMessage(); -$m->setX(1); -$val = $m->getX(); - -$a = 1; -$m->setX($a); +// optional MyEnum optional_enum +$m->getOptionalEnum(); +$m->setOptionalEnum(MyEnum::FOO); +$m->hasOptionalEnum(); +$m->clearOptionalEnum(); + +// MyEnum implicit_enum +$m->getImplicitEnum(); +$m->setImplicitEnum(MyEnum::FOO); ``` Whenever you set a field, the value is type-checked against the declared type of @@ -171,9 +185,24 @@ You can see the corresponding PHP type for each scalar protocol buffers type in the [scalar value types table](/programming-guides/proto3#scalar). +### `has...` and `clear...` + +For fields with explicit presence, the compiler generates a `has...()` method. +This method returns `true` if the field is set. + +The compiler also generates a `clear...()` method. This method unsets the field. +After calling this method, `has...()` will return `false`. + +For fields with implicit presence, the compiler does not generate `has...()` or +`clear...()` methods. For these fields, you can check for presence by comparing +the field value with the default value. + ### Singular Message Fields {#embedded_message} -A field with a message type defaults to nil, and is not automatically created +For a field with a message type, the compiler generates the same accessor +methods as for scalar types. + +A field with a message type defaults to `null`, and is not automatically created when the field is accessed. Thus you need to explicitly create sub messages, as in the following: @@ -265,10 +294,10 @@ specified, it is prepended to all generated enum classes. 
## Oneof -For a [oneof](/programming-guides/proto3#oneof)s, the -protocol buffer compiler generates the same code as it would for regular -singular fields, but also adds a special accessor method that lets you find out -which oneof field (if any) is set. So, given this message: +For a [oneof](/programming-guides/editions#oneof), the +protocol buffer compiler generates a `has` and `clear` method for each field in +the oneof, as well as a special accessor method that lets you find out which +oneof field (if any) is set. So, given this message: ```proto message TestMessage { @@ -293,5 +322,16 @@ class TestMessage { } ``` -The accessor method's name is based on the oneof's name, and returns an enum -value representing the field in the oneof that is currently set. +The accessor method's name is based on the oneof's name, and returns a string +representing the field in the oneof that is currently set. If the oneof is not +set, the method returns an empty string. + +When you set a field in a oneof, it automatically clears all other fields in the +oneof. At most one field in a oneof can hold a value at a time, so setting a +second field replaces the first. + +```php +$m = new TestMessage(); +$m->setOneofInt32(42); // $m->hasOneofInt32() is true +$m->setOneofInt64(123); // $m->hasOneofInt32() is now false +``` diff --git a/content/reference/protobuf/proto3-spec.md b/content/reference/protobuf/proto3-spec.md index 166ff21ab..d4ad6dd62 100644 --- a/content/reference/protobuf/proto3-spec.md +++ b/content/reference/protobuf/proto3-spec.md @@ -1,8 +1,8 @@ +++ -title = "Protocol Buffers Version 3 Language Specification" +title = "Protocol Buffers Language Specification (Proto3)" weight = 810 -linkTitle = "Version 3 Language Specification" -description = "Language specification reference for version 3 of the Protocol Buffers language (proto3)." +linkTitle = "Language Specification (Proto3)" +description = "Language specification reference for the Protocol Buffers language (Proto3)."
type = "docs" +++ diff --git a/content/reference/python/python-generated.md b/content/reference/python/python-generated.md index 9574c098e..5db203377 100644 --- a/content/reference/python/python-generated.md +++ b/content/reference/python/python-generated.md @@ -7,13 +7,13 @@ type = "docs" +++ Any -differences between proto2 and proto3 generated code are highlighted - note that -these differences are in the generated code as described in this document, not -the base message classes/interfaces, which are the same in both versions. You -should read the -[proto2 language guide](/programming-guides/proto2) -and/or -[proto3 language guide](/programming-guides/proto3) +differences between proto2, proto3, and Editions generated code are +highlighted - note that these differences are in the generated code as described +in this document, not the base message classes/interfaces, which are the same +across all versions. You should read the +[proto2 language guide](/programming-guides/proto2), +[proto3 language guide](/programming-guides/proto3), +and/or [Editions guide](/programming-guides/editions) before reading this document. The Python Protocol Buffers implementation is a little different from C++ and @@ -70,8 +70,8 @@ find parts of it included in other Python code that was released before Protocol Buffers. Since version 2 of Python Protocol Buffers has a completely different interface, and since Python does not have compile-time type checking to catch mistakes, we chose to make the version number be a prominent part of generated -Python file names. Currently both proto2 and proto3 use `_pb2.py` for their -generated files. {{% /alert %}} +Python file names. Currently proto2, proto3, and Editions all use `_pb2.py` for +their generated files. {{% /alert %}} ## Packages {#package} @@ -96,7 +96,7 @@ size. 
If the message's name is a Python keyword, then its class will only be accessible via `getattr()`, as described in the -[*Names which conflict with Python keywords*](#keyword-conflicts) section. +[Names that conflict with Python keywords](#keyword-conflicts) section. You should *not* create your own `Foo` subclasses. Generated classes are not designed for subclassing and may lead to \"fragile base class\" problems. @@ -131,7 +131,7 @@ message Foo { In this case, the `Bar` class is declared as a static member of `Foo`, so you can refer to it as `Foo.Bar`. -## Well Known Types {#wkt} +## Well-known Types {#wkt} Protocol buffers provides a number of [well-known types](/reference/protobuf/google.protobuf) @@ -276,19 +276,25 @@ type. As well as a property, the compiler generates an integer constant for each field containing its field number. The constant name is the field name converted to -upper-case followed by `_FIELD_NUMBER`. For example, given the field `optional -int32 foo_bar = 5;`, the compiler will generate the constant -`FOO_BAR_FIELD_NUMBER = 5`. +upper-case followed by `_FIELD_NUMBER`. For example, given the field `int32 +foo_bar = 5;`, the compiler will generate the constant `FOO_BAR_FIELD_NUMBER = +5`. If the field's name is a Python keyword, then its property will only be accessible via `getattr()` and `setattr()`, as described in the -[*Names which conflict with Python keywords*](#keyword-conflicts) section. +[Names that conflict with Python keywords](#keyword-conflicts) section. -### Singular Fields (proto2) {#singular-fields-proto2} +Protocol buffers defines two modes of field presence: `explicit` and `implicit`. +Each of these is described in the following sections. -If you have a singular (optional or required) field `foo` of any non-message -type, you can manipulate the field `foo` as if it were a regular field. 
For -example, if `foo`'s type is `int32`, you can say: +### Singular Fields with Explicit Presence {#singular-explicit} + +Singular fields with `explicit` presence are always able to differentiate +between the field being unset and the field being set to its default value. + +If you have a singular field `foo` of any non-message type, you can manipulate +the field `foo` as if it were a regular field. For example, if `foo`'s type is +`int32`, you can say: ```python message.foo = 123 @@ -311,7 +317,22 @@ message.ClearField("foo") assert not message.HasField("foo") ``` -### Singular Fields (proto3) {#singular-fields-proto3} +In Editions, fields have `explicit` presence by default. The following is an +example of an `explicit` field in an Editions `.proto` file: + +```proto +edition = "2023"; +message MyMessage { + int32 foo = 1; +} +``` + +### Singular Fields with Implicit Presence {#singular-implicit} + +Singular fields with `implicit` presence do not have a `HasField()` method. An +`implicit` field is always "set" and reading the field will always return a +value. Reading an `implicit` field that has not been assigned a value will +return the default value for that type. If you have a singular field `foo` of any non-message type, you can manipulate the field `foo` as if it were a regular field. For example, if `foo`'s type is @@ -339,18 +360,20 @@ message.ClearField("foo") Message types work slightly differently. You cannot assign a value to an embedded message field. Instead, assigning a value to any field within the child -message implies setting the message field in the parent. You can also use the +message implies setting the message field in the parent. Submessages always have +[explicit presence](#fields-with-explicit-presence), so you can also use the parent message's `HasField()` method to check if a message type field value has been set. 
So, for example, let's say you have the following `.proto` definition: ```proto +edition = "2023"; message Foo { - optional Bar bar = 1; + Bar bar = 1; } message Bar { - optional int32 i = 1; + int32 i = 1; } ``` @@ -465,12 +488,13 @@ the object's `extend()` method appends an entire list of messages, but makes a For example, given this message definition: ```proto +edition = "2023"; message Foo { repeated Bar bars = 1; } message Bar { - optional int32 i = 1; - optional int32 j = 2; + int32 i = 1; + int32 j = 2; } ``` @@ -551,7 +575,9 @@ foo.bars.extend([Bar(i=15), Bar(i=17)]) ### Groups (proto2) {#groups-proto2} **Note that groups are deprecated and should not be used when creating new -message types -- use nested message types instead.** +message types -- use nested message types (proto2, proto3) or +[delimited fields](/editions/features#message_encoding) +(editions) instead.** A group combines a nested message type and a field into a single declaration, and uses a different @@ -562,7 +588,7 @@ field's name is the **lowercased** name of the group. For example, except for wire format, the following two message definitions are equivalent: -```python +```proto // Version 1: Using groups message SearchResponse { repeated group SearchResult = 1 { @@ -709,7 +735,7 @@ message Foo { VALUE_B = 5; VALUE_C = 1234; } - optional SomeEnum bar = 1; + SomeEnum bar = 1; } ``` @@ -723,9 +749,9 @@ following enum in a proto: ```proto enum SomeEnum { - VALUE_A = 0; - VALUE_B = 5; - VALUE_C = 1234; + VALUE_A = 0; + VALUE_B = 5; + VALUE_C = 1234; } ``` @@ -748,16 +774,17 @@ assert foo.bar == Foo.VALUE_A If the enum's name (or an enum value) is a Python keyword, then its object (or the enum value's property) will only be accessible via `getattr()`, as described -in the [*Names which conflict with Python keywords*](#keyword-conflicts) -section. +in the [Names that conflict with Python keywords](#keyword-conflicts) section. 
-The values you can set in an enum depend on your protocol buffers version: +With proto2, enums are closed, and with proto3, enums are open. In Editions, the +`enum_type` feature determines the behavior of an enum. -- In **proto2**, an enum cannot contain a numeric value other than those - defined for the enum type. If you assign a value that is not in the enum, - the generated code will throw an exception. -- **proto3** uses open enum semantics: enum fields can contain any `int32` - value. +- `OPEN` enums can have any `int32` value, even if it is not specified in the + enum definition. This is the default in Editions. +- `CLOSED` enums cannot contain a numeric value other than those defined for + the enum type. If you assign a value that is not in the enum, the generated + code will throw an exception. This is equivalent to the behavior of enums in + proto2. Enums have a number of utility methods for getting field names from values and vice versa, lists of fields, and so on - these are defined in @@ -767,9 +794,9 @@ following standalone enum in `myproto.proto`: ```proto enum SomeEnum { - VALUE_A = 0; - VALUE_B = 5; - VALUE_C = 1234; + VALUE_A = 0; + VALUE_B = 5; + VALUE_C = 1234; } ``` @@ -793,11 +820,11 @@ defined is returned. ```proto enum SomeEnum { - option allow_alias = true; - VALUE_A = 0; - VALUE_B = 5; - VALUE_C = 1234; - VALUE_B_ALIAS = 5; + option allow_alias = true; + VALUE_A = 0; + VALUE_B = 5; + VALUE_C = 1234; + VALUE_B_ALIAS = 5; } ``` @@ -855,7 +882,7 @@ assert not message.HasField("serial_number") Note that calling `ClearField` on a oneof just clears the currently set field. 
-## Names which conflict with Python keywords {#keyword-conflicts} +## Names that conflict with Python keywords {#keyword-conflicts} If the name of a message, field, enum, or enum value is a [Python keyword](https://docs.python.org/3/reference/lexical_analysis#keywords), @@ -894,11 +921,12 @@ baz.in # SyntaxError: invalid syntax baz.from # SyntaxError: invalid syntax ``` -## Extensions (proto2 only) {#extension} +## Extensions {#extension} -Given a message with an extension range: +Given a proto2 or editions message with an extension range: ```proto +edition = "2023"; message Foo { extensions 100 to 199; } @@ -911,7 +939,7 @@ Given an extension definition: ```proto extend Foo { - optional int32 bar = 123; + int32 bar = 123; } ``` @@ -966,7 +994,7 @@ RPC systems based on `.proto`-language service definitions should provide [plugins](/reference/cpp/api-docs/google.protobuf.compiler.plugin.pb) to generate code appropriate for the system. These plugins are likely to require that abstract services are disabled, so that they can generate their own classes -of the same names. Plugins are new in version 2.3.0 (January 2010). +of the same names. The remainder of this section describes what the protocol buffer compiler generates when abstract services are enabled. diff --git a/content/reference/rust/rust-design-decisions.md b/content/reference/rust/rust-design-decisions.md index a26ecbb7e..c59b3bb7f 100644 --- a/content/reference/rust/rust-design-decisions.md +++ b/content/reference/rust/rust-design-decisions.md @@ -166,17 +166,19 @@ than Rust's std UTF-8 validation. Rust's `str` and `std::string::String` types maintain a strict invariant that they only contain valid UTF-8, but C++ Protobuf and C++'s `std::string` type generally do not enforce any such guarantee. `string` typed Protobuf fields are -intended to only ever contain valid UTF-8, but the enforcement of this has many -holes where a `string` field may end up containing invalid UTF-8 contents at -runtime. 
- -To deliver on zero-cost message sharing between C++ and Rust while minimizing -costly validations or risk of undefined behavior in Rust, we chose not to using -the `str`/`String` types for `string` field getters, and introduced the types -`ProtoStr` and `ProtoString` instead which are equivalent types except they -could contain invalid UTF-8 in rare situations. Those types let the application -code choose if they wish to perform the validation on-demand to get a `&str`, or -operate on the raw bytes to avoid any validation. +intended to only ever contain valid UTF-8, and C++ Protobuf uses a correct and +highly optimized UTF8 validator. C++ Protobuf's API surface is not set up to +strictly enforce a runtime invariant that `string` fields always contain valid +UTF-8 (instead, it defers any validation to serialize or subsequent parse time). + +To enable integrating Rust into preexisting codebases that use C++ Protobuf +while minimizing unnecessary validations or risk of undefined behavior in Rust, +we chose not to use the `str`/`String` types for `string` field getters. We +introduced the types `ProtoStr` and `ProtoString` instead, which are equivalent +types, except that they may contain invalid UTF-8 in rare situations. Those +types let the application code choose if they wish to perform the validation +on-demand to observe the fields as a `Result<&str>`, or operate on the raw bytes +to avoid any runtime validation. We are aware that vocabulary types like `str` are very important to idiomatic usage, and intend to keep an eye on if this decision is the right one as usage diff --git a/eng/doc/devguide/proto/ask_proto.md b/eng/doc/devguide/proto/ask_proto.md new file mode 100644 index 000000000..ef058c4b2 --- /dev/null +++ b/eng/doc/devguide/proto/ask_proto.md @@ -0,0 +1,52 @@ +# Ask Proto + +go/ask-proto + + + +Welcome to **Ask Proto**! Get instant help with your proto questions from our +new chatbot, powered by +. 
+ ++ To get started, enter your query into the bot's **Start typing...** field. + ([example](https://screenshot.googleplex.com/7E3HbaWA5pwFt4W.png)) + ++ Looking for some inspiration? Copy and paste a [sample prompt](#prompts). + +Tell us what you think about this experience at go/ask-x-survey. + + + + + + + +## Sample Prompts {#prompts} + +Tip: Found an interesting prompt and/or response? [Let us know](#feedback). + +```shell + +List the top 3 pitfalls when designing and implementing Protobuf messages and APIs. Keep your response crisp and concise. + +Are enums open or closed in proto2, proto3, and editions? + +What field type conversions are safe to make? + +``` + +## We're listening {#feedback} + +Found any of the following? + ++ **Ask Proto** not returning a response ++ Misleading answers (hallucinating/incorrect) ++ Accurate answers, but missing some information ++ An interesting prompt and/or response ++ Anything else + +Let us know by +[logging bugs](https://b.corp.google.com/issues/new?component=37777&template=202665&assignee=logophile@google.com) diff --git a/eng/doc/devguide/proto/footer.md b/eng/doc/devguide/proto/footer.md new file mode 100644 index 000000000..e69de29bb From 110ed99caaa18647f843e4b27adb2343bd6ee587 Mon Sep 17 00:00:00 2001 From: David Castro <65198911+Logofile@users.noreply.github.com> Date: Fri, 25 Jul 2025 16:24:50 -0400 Subject: [PATCH 2/3] Delete internal-only files A couple of files that apply only to the documentation inside Google are in the repository. This removes them. 
--- eng/doc/devguide/proto/footer.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 eng/doc/devguide/proto/footer.md diff --git a/eng/doc/devguide/proto/footer.md b/eng/doc/devguide/proto/footer.md deleted file mode 100644 index e69de29bb..000000000 From 0367297a55ab8beb2c7baeeb499d6db6f4c7d66f Mon Sep 17 00:00:00 2001 From: David Castro <65198911+Logofile@users.noreply.github.com> Date: Fri, 25 Jul 2025 16:25:37 -0400 Subject: [PATCH 3/3] Delete internal-only files A couple of files that apply only internally made it into the repository. This removes the files. --- eng/doc/devguide/proto/ask_proto.md | 52 ----------------------------- 1 file changed, 52 deletions(-) delete mode 100644 eng/doc/devguide/proto/ask_proto.md diff --git a/eng/doc/devguide/proto/ask_proto.md b/eng/doc/devguide/proto/ask_proto.md deleted file mode 100644 index ef058c4b2..000000000 --- a/eng/doc/devguide/proto/ask_proto.md +++ /dev/null @@ -1,52 +0,0 @@ -# Ask Proto - -go/ask-proto - - - -Welcome to **Ask Proto**! Get instant help with your proto questions from our -new chatbot, powered by -. - -+ To get started, enter your query into the bot's **Start typing...** field. - ([example](https://screenshot.googleplex.com/7E3HbaWA5pwFt4W.png)) - -+ Looking for some inspiration? Copy and paste a [sample prompt](#prompts). - -Tell us what you think about this experience at go/ask-x-survey. - - - - - - - -## Sample Prompts {#prompts} - -Tip: Found an interesting prompt and/or response? [Let us know](#feedback). - -```shell - -List the top 3 pitfalls when designing and implementing Protobuf messages and APIs. Keep your response crisp and concise. - -Are enums open or closed in proto2, proto3, and editions? - -What field type conversions are safe to make? - -``` - -## We're listening {#feedback} - -Found any of the following? 
- -+ **Ask Proto** not returning a response -+ Misleading answers (hallucinating/incorrect) -+ Accurate answers, but missing some information -+ An interesting prompt and/or response -+ Anything else - -Let us know by -[logging bugs](https://b.corp.google.com/issues/new?component=37777&template=202665&assignee=logophile@google.com)