fix: config environment override improvements#27312
fix: config environment override improvements#27312gwossum wants to merge 62 commits intomaster-1.xfrom
Conversation
| // Apply the unindexed environment variable as a default value, if available. | ||
| if len(value) > 0 { | ||
| if err := applyEnvOverrides(getenv, prefix, f, structKey); err != nil { | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
This behavior of using the unindexed environment value as a default for existing slice values is very strange, but it is what OSS and plutonium have both been doing. It's not intuitive and confusing. Claude doesn't like it either.
Is this intentional or a used feature? Or could this be improved?
There was a problem hiding this comment.
There is a test that explicitly tests for this, so it does seem intentional. Whether it is useful or problematic is another issue.
There was a problem hiding this comment.
One other very odd thing this does: Let's say I have a have:
type Config struct {
values []string `toml:"values"`
}If I start out with:
Config{
values: []string{"one", "two", "three"},
}and I have an environment variable INFLUXDB_VALUES="alice,bob,carol", then you end up with:
Config{
values: []string{"alice,bob,carol", "alice,bob,carol", "alice,bob,carol"
}If, however, you start out with:
Config{
values: []string{},
}and an environment variable INFLUXDB_VALUES="alice,bob,carol", then you end up with:
Config{
values: []string{"alice", "bob", "carol"}
}This is highly inconsistent behavior and very non-intuitive!
I think the reason the default is supported is for configuring common options across a slice of configurations, for instance UDP buffer sizes across all UDP inputs. It makes sense when you have a slice of structs, but behaves weirdly when you have a slice of simple types like strings. Maybe we handle the two cases differently?
There was a problem hiding this comment.
Does this work the same in Plutonium with a separate implementation, or does Plutonium rely on this implementation, or does it behave differently?
There was a problem hiding this comment.
So... Plutonium has the same behavior WRT to using the unindexed version of a variable to override a value for all elements in slice. For instance, INFLUXDB_UDP_DATABASE is a default for overriding each udp input, while INFLUXDB_UDP_1_DATABASE is a specific override for the second UDP input.
However, plutonium does not support a CSV list to set an array at all, so the behavior in https://github.com/influxdata/influxdb/pull/27312/changes#r3018901388 does not apply to plutonium.
Since the CSV setting would only work with slices of primitive types (e.g. []string, []int), I thought maybe there was no way in the actual config to use it. I did find configurations in both OSS and plutonium where it could be used, but of course only OSS currently supports using CSV to set them. Previously, using a CSV list was probably the only only way to override them, because growing a slice with indexed environment variables was not supported.
There was a problem hiding this comment.
The slice handling is greatly improved in the latest commit. Unindexed env vars with CSV data will now unconditionally overwrite slices of leaf types (not struct types). It is also now an error to use indexed and unindexed env vars with slices of leaf types, since that leads to surprising and unintuitive behavior.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
davidby-influx
left a comment
There was a problem hiding this comment.
I would wrap all errors returned as strings, but I'm open to dissent if I'm missing something.
I need to review the tests more.
| intValue, err := strconv.ParseInt(value, 0, element.Type().Bits()) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to apply %v to %v using type %v and value '%v': %s", prefix, structKey, element.Type().String(), value, err) | ||
| return false, fmt.Errorf("failed to apply %v to %v using type %v and value '%v': %s", prefix, structKey, element.Type().String(), value, err) |
There was a problem hiding this comment.
Wrap error here too
This comment was marked as outdated.
This comment was marked as outdated.
davidby-influx
left a comment
There was a problem hiding this comment.
Some []byte handling suggestionss
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if s.config.ReadBuffer != 0 { | ||
| err = s.conn.SetReadBuffer(s.config.ReadBuffer) | ||
| rbs := int(s.config.ReadBuffer) | ||
| err = s.conn.SetReadBuffer(rbs) | ||
| if err != nil { | ||
| s.Logger.Info("Failed to set UDP read buffer", | ||
| zap.Int("buffer_size", s.config.ReadBuffer), zap.Error(err)) | ||
| zap.Int("buffer_size", rbs), zap.Error(err)) | ||
| return err |
There was a problem hiding this comment.
ReadBuffer is now a signed 64-bit size (toml.SSize) but is cast directly to int before calling SetReadBuffer. On 32-bit platforms or for large values this can overflow/truncate, and negative values (e.g. "-1") will attempt to set a negative buffer size and likely fail service startup. Consider validating ReadBuffer > 0 and ReadBuffer <= math.MaxInt before casting, and skipping/returning a clear error otherwise.
| if s.Config.ReadBuffer != 0 { | ||
| err = conn.SetReadBuffer(s.Config.ReadBuffer) | ||
| rbs := int(s.Config.ReadBuffer) | ||
| err = conn.SetReadBuffer(rbs) | ||
| if err != nil { | ||
| return fmt.Errorf("unable to set UDP read buffer to %d: %s", | ||
| s.Config.ReadBuffer, err) | ||
| return fmt.Errorf("unable to set UDP read buffer to %d: %s", rbs, err) | ||
| } |
There was a problem hiding this comment.
ReadBuffer is now toml.SSize but is cast directly to int before calling SetReadBuffer. Large values can overflow/truncate on 32-bit platforms, and negative values will try to set a negative buffer size and likely fail startup. Consider requiring ReadBuffer > 0 and ReadBuffer <= math.MaxInt (or returning a clear error) before casting.
| memoryLimit := int64(0) | ||
| if s.conf.TotalBufferBytes != 0 { | ||
| memoryLimit = int64(s.conf.TotalBufferBytes / len(s.subs)) | ||
| memoryLimit = int64(s.conf.TotalBufferBytes) / int64(len(s.subs)) | ||
| if memoryLimit == 0 { | ||
| memoryLimit = 1 | ||
| } |
There was a problem hiding this comment.
TotalBufferBytes is now toml.SSize, but the logic treats any non-zero (including negative) value as enabling the per-subscription memory limit. A negative TotalBufferBytes will produce a negative memoryLimit and get passed to router.Update. Consider changing the check to > 0 (and/or validating TotalBufferBytes is non-negative) before dividing.
| // Verify correct when result when there is a false slice grow | ||
| // from a default enviroment variable, but no realy one from an | ||
| // indexed environemtn variable. |
There was a problem hiding this comment.
Spelling/grammar in the test comment: "enviroment", "realy", and "environemtn" are typos and the sentences are hard to parse. Please correct the comment text so future readers can understand the intent quickly.
| // Verify correct when result when there is a false slice grow | |
| // from a default enviroment variable, but no realy one from an | |
| // indexed environemtn variable. | |
| // Verify the result is correct when a default environment variable | |
| // suggests slice growth, but there is no indexed environment variable | |
| // to create an actual slice element. |
| "%s: slice element type %s must implement toml.Defaulter (add ApplyDefaults() method on *%s)", | ||
| path, elem, elem)) |
There was a problem hiding this comment.
VerifyDefaulters’ violation message always suggests adding ApplyDefaults() on *<elem>, but for pointer-element slices (e.g. []*T) that renders as **T and is incorrect/misleading. Consider formatting the suggestion based on the underlying struct type (e.g. suggest *T whether the slice is []T or []*T).
| "%s: slice element type %s must implement toml.Defaulter (add ApplyDefaults() method on *%s)", | |
| path, elem, elem)) | |
| "%s: slice element type %s must implement toml.Defaulter (add ApplyDefaults() method on %s)", | |
| path, elem, ptrType)) |
davidby-influx
left a comment
There was a problem hiding this comment.
The AI suggestions on casting checks, and a few nits and questions from me.
Continued nice work!
| log.Info("applied env var overrides", | ||
| zap.String("prefix", EnvVarPrefix), | ||
| zap.Strings("vars", appliedEnvVars)) | ||
| log.Info("unmatched env vars in namespace; may be misspelled, unsupported by env var overrides, or removed in this version", |
There was a problem hiding this comment.
Info or Warn or doesn't matter? Very minor, not a blocker.
| // | ||
| // Both lists are emitted unconditionally (even when empty) so operators can | ||
| // confirm the diagnostic ran. The unmatched list is computed against | ||
| // os.Environ() so it reflects the real process environment regardless of |
There was a problem hiding this comment.
Is that what we want? Why not respect the local GetEnv if there is one? It would seem to make testing easier to do so.
There was a problem hiding this comment.
We can't use cmd.Getenv because we need all environment variables, but there could be symmetrical cmd.Environ. I'll make the change.
|
|
||
| if s.Config.ReadBuffer != 0 { | ||
| err = conn.SetReadBuffer(s.Config.ReadBuffer) | ||
| rbs := int(s.Config.ReadBuffer) |
There was a problem hiding this comment.
We have to make the conversion because SetReadBuffer takes an int, while the configuration is a toml.SSize (uint64). I've added range checking in an upcoming commit. The alternative is to either create a toml.ISize type, or not allow size suffixes for the configuration.
| return nil, 0, fmt.Errorf("%w: %c (expected k, m, or g)", ErrSizeBadSuffix, suffix) | ||
| } | ||
| numText = text[:len(text)-1] | ||
| numText = text[:len(text)-suffixSize] |
| seen := make(map[string]struct{}) | ||
| var unmatched []string | ||
| for _, entry := range environ { | ||
| eq := strings.IndexByte(entry, '=') |
There was a problem hiding this comment.
I feel like all the string handling should be runes, but I can't see how to do it efficiently with the runes package. So I guess this is just a lament, not a requested change.
There was a problem hiding this comment.
For low-ASCII characters, strings.IndexByte is safe. However, I've changed it to strings.Cut in an upcoming push.
| var violations []string | ||
| seen := make(map[reflect.Type]bool) | ||
| var walk func(t reflect.Type, path string) | ||
| walk = func(t reflect.Type, path string) { |
There was a problem hiding this comment.
Would this be cleaner as a separate function, not a lambda? Not a blocker.
There was a problem hiding this comment.
taking in violations and returning it....
There was a problem hiding this comment.
Extracted and simplified. Also leveraged the cycle breaking code to be recursive type checking code, since recursive types (e.g. linked lists) can not be expressed in toml and therefore are not valid for config data types. Will be in next push.
| // rather than the last byte, so the error message shows the actual | ||
| // character instead of a UTF-8 continuation byte. | ||
| {"bad_suffix_multibyte", "1√", itoml.ErrSizeBadSuffix, | ||
| "unknown size suffix: √ (expected k, m, or g)"}, |
- Environment overrides with group names and group IDs now work correctly. - Slices can be extended using environment variables. Previously they could only overwrite existing elements. - Environment variables properly override non-string slices in all cases. - Prevent possible nil dereference. - Better errors for size parsing errors. - Improved error messages when a TextUnmarshaler fails. - Trim whitespace from environment variable values. - Greatly improved test coverage.
Prevent unbounded slice growth via environment variables. Also add additional test cases.
- Apply default environment overrides using the unindexed environment variable when growing a slice. - Refactor to remove code duplication and remove hasEnvForType.
Remove unnecessary hasEnvValue check before applying environment overrides to struct fields.
- Add diagnostic log messages when configuration is loaded at startup or
reloaded on SIGHUP:
- List of environment variables used for overrides
- List of environment variables which have the appropriate prefix but
were not used for some reason
Make the toml and environment variable parsing explicitly rune-safe. This fixes an issue where the error message when a size suffix is not low ASCII has the incorrect character in it. Other places are made explicitly rune-safe for maintainability, but do not fix any issue.
Add comments and test cases for parsing of hex, octal, and binary environment variable parsing.
Use %q in error messages and logs instead of %v and %s for TOML and environment variable parsing.
Trim whitespace from group names and numeric IDs when parsing TOML or environment variables. Add related tests.
Improve error message for negative toml.Size. Adjust tests as necessary.
Add test specifically for `[data] max-index-log-file-size` for #22974. Modernize TestConfig_Parse_EnvOverride.
Update Config structs to make use of toml.SSize where appropriate. Updated fields: - [collectd] read-buffer - [data] tar-stream-buffer-size - [flux-controller] query-initial-memory-bytes - [flux-controller] query-max-memory-bytes - [flux-controller] total-max-memory-bytes - [http] max-body-size - [subscriber] total-buffer-bytes - [udp] read-buffer
- Prevent unsafe type conversions of toml.Size and toml.SSize by introducing methods that check ranges when doing type conversions. - Make use of these methods either at time of use or configuration validation. - Change "%s" to "%w" in some `fmt.Errorf` calls
Add `Command.Environ` for symmetry with `Command.Getenv`.
Simplify finding environment variable names from `os.Environ` style list with `strings.Cut`.
- Change toml.VerifyDefaulters to toml.VerifyConfigType - Add checks to toml.VerifyConfigType to detect recursive data types, which can not be expressed with toml. - Break walk lambda out of VerifyConfigType into a top-level function, walkForConfigType. - Make use of package level reflect values
- Fix typo in error message - Update tests for updated error message
Add comment explaining why slice growth through environment variables is bounded.
ApplyEnvOverrides will now return an error if a slice element type that needs ApplyDefaults() does not implement it. This error should not occur in production due to unit tests verifying that config types implement ApplyDefaults() as needed, in addition to compile-time type constraint checks. Update tests as needed.
Improved consistency of unindexed environment variable override handling. Previously, unindexed environment variables could set slices using comma-delimited value lists, but only if the underlying slice was empty. This was unintuitive and potentially impossible for the user to know if the value was already set or not. Now the unindxed environment variables overwrite the existing slice with the comma-delimited values. Additionally, now only slices of leaf types (not structs) support setting via an unindexed comma-delimited value list. This did not make sense and could not set structs, so the change should not be visible to users. Finally, it is now an error to use indexed and unindexed environment variables for slices of leaf types. The previous behavior was surprising and unintuitive. It is highly unlikely users would using this as a feature, but it should be documented for users that do encounter it.
9e72fd5 to
19d259e
Compare
Fix issue with growing a slice of pointers to a scalar using unindexed CSV environment variables.
Closes: #22974