Improvements for bash associative arrays #1797

Open
IsaacCalligeros95 wants to merge 9 commits into main from isaac/bash-associative-arrays-improvements

@IsaacCalligeros95 IsaacCalligeros95 commented Feb 23, 2026

Optimize octopus_parameters population with lazy loading and batch base64 decode

Problem

The BashParametersArrayFeatureToggle implementation had multiple issues:

  1. Hard dependency on xxd: not in the documented dependency list, not universally available, and the feature silently did nothing when it was missing
  2. SIGPIPE risk: echo -n "$large_hex_string" | xxd -r -p runs as a pipe-backed process substitution; if the read -N loop exits early, xxd receives SIGPIPE and the bootstrap script can exit non-zero
  3. Always-on overhead: octopus_parameters was populated eagerly even when scripts never accessed it, adding ~6-7 seconds of parse time for large deployments (20,000+ variables)

An earlier attempt at using base64 was abandoned because it called openssl (or base64) once per variable — O(N) subprocess forks — which made ~3,000 variables take ~60 seconds.

Solution

1. Lazy Loading (Performance)

The bootstrapper now analyzes the user script to detect if it references octopus_parameters:

  • If used: the array is populated eagerly (existing behavior)
  • If not used: population is skipped entirely, avoiding the parse overhead

The check uses regex to match:

  • Direct array access: ${octopus_parameters[...]}, "${octopus_parameters[@]}"
  • Loop iteration: for ... in "${!octopus_parameters[@]}"

Scripts that only use get_octopusvariable (which uses the case statement, not the array) skip the expensive array population. This saves ~6-7 seconds for deployments with 20,000+ variables.
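A rough sketch of this kind of detection, written in bash for illustration only. The PR performs the check in C# (ScriptUsesOctopusParameters); the function name `script_uses_octopus_parameters` and the pattern below are assumptions and may not match the actual regex.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the script text appears to reference the array.
# The pattern is an assumption. It covers ${octopus_parameters[...]} and
# ${!octopus_parameters[@]}; the PR's own regex may match more or less than this.
script_uses_octopus_parameters() {
  local script_text=$1
  local pattern='\$\{!?octopus_parameters\['
  [[ $script_text =~ $pattern ]]
}

user_script='echo "${octopus_parameters[MyVar]}"'
if script_uses_octopus_parameters "$user_script"; then
  echo "array referenced: populate octopus_parameters"
else
  echo "array not referenced: skip population"
fi
```

A textual check like this is conservative by nature: a reference inside a comment or string still counts as usage, which only costs the parse time that the eager path would have paid anyway.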

Configuration file impact:

  • Scripts using octopus_parameters: ~21 MB (includes both KVP data + case statement)
  • Scripts not using it: ~21 MB (case statement only, KVP data markers remain unreplaced)

The KVP data is only included in the encrypted blob when the script actually uses octopus_parameters, significantly reducing the configuration file size for scripts that don't need it.

2. Batch Base64 Decode (Compatibility + Performance)

Switch the wire format inside the encrypted blob from hex(key)$hex(value) to base64(key)$base64(value), and batch-decode using exactly two base64 -d calls regardless of variable count:

  1. Collect all base64-encoded keys and values into bash arrays (pure builtins, no subprocesses)
  2. Decode all keys in one pass: exec 3< <(printf '%s\n' "${b64_keys[@]}" | base64 -d)
  3. Decode all values in one pass: exec 4< <(printf '%s\n' "${b64_values[@]}" | base64 -d)
  4. Use LC_ALL=C read -r -N to slice from each fd using pre-calculated byte lengths

This preserves the O(1) subprocess count of the original hex+xxd approach. Process substitutions are used rather than temp files to keep decoded sensitive values (passwords, keys, certs) out of the filesystem entirely — data flows through kernel pipe buffers only. Deadlock is avoided because the read loop alternates between fd3 and fd4 on every iteration, draining both pipes continuously. LC_ALL=C on the read calls ensures multi-byte UTF-8 (including emoji) round-trips correctly by counting bytes rather than characters.
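The four steps can be sketched roughly as follows. This is not the code from Bootstrap.sh: the one-pair-per-line framing, the way byte lengths are derived from the base64 text, and every name (`load_demo_parameters`, `_b64_decoded_len`, `demo_parameters`) are assumptions made for the example. It also assumes a `base64 -d` that accepts several padded groups in one stream, as GNU coreutils does.

```bash
#!/usr/bin/env bash
# Sketch of the batch decode. Assumptions (not from the PR): pairs arrive one per
# line as base64(key)$base64(value); all names here are illustrative.
declare -A demo_parameters=()

# Sets _len to the decoded byte length of base64 string $1 without forking.
_b64_decoded_len() {
  local pad=0
  case $1 in *==) pad=2 ;; *=) pad=1 ;; esac
  _len=$(( ${#1} / 4 * 3 - pad ))
}

load_demo_parameters() {
  local payload=$1 line i key value _len
  local -a b64_keys=() b64_values=() key_lens=() value_lens=()

  # Pass 1: split each pair and record lengths using builtins only.
  while IFS= read -r line; do
    [[ -n $line ]] || continue
    b64_keys+=("${line%%\$*}")
    b64_values+=("${line#*\$}")
    _b64_decoded_len "${line%%\$*}"; key_lens+=("$_len")
    _b64_decoded_len "${line#*\$}";  value_lens+=("$_len")
  done <<< "$payload"

  # Pass 2: one base64 process per stream, regardless of variable count.
  exec 3< <(printf '%s\n' "${b64_keys[@]}" | base64 -d)
  exec 4< <(printf '%s\n' "${b64_values[@]}" | base64 -d)

  # Pass 3: slice by byte count, alternating between the two descriptors.
  for i in "${!b64_keys[@]}"; do
    key='' value=''
    (( key_lens[i] > 0 ))   && LC_ALL=C read -r -N "${key_lens[i]}" -u 3 key
    (( value_lens[i] > 0 )) && LC_ALL=C read -r -N "${value_lens[i]}" -u 4 value
    [[ -n $key ]] && demo_parameters["$key"]=$value
  done
  exec 3<&- 4<&-
}

# Demo-only encoder; it forks per call, which the real payload producer avoids.
encode() { printf '%s' "$1" | base64; }
payload="$(encode 'Name')\$$(encode 'web')"$'\n'"$(encode 'Greeting')\$$(encode 'héllo 🎉')"
load_demo_parameters "$payload"
printf '%s\n' "${demo_parameters[Greeting]}"
```

The length helper writes to a variable instead of echoing, because capturing output with `$(...)` would fork once per variable and bring back the O(N) cost the batch approach is meant to avoid.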

The C# change is minimal: GetEncryptedVariablesKvp uses the existing EncodeValue method (already used for the case-statement path) instead of EncodeAsHex. The xxd availability check in the feature gate is removed.

Performance Impact

20,000 variables:

  • Script using octopus_parameters: ~7.5 seconds (parse required)
  • Script not using octopus_parameters: ~1.7 seconds (lazy loading skips parse)
  • Time saved: ~5.8 seconds (77% reduction)

Note: For much larger variable sets (20K+ variables / ~15 MB), the runtime of 33 seconds remains too slow for this to be feasible for all deployments. See test results for further details. The lazy loading optimization ensures this overhead is only paid when octopus_parameters is actually used.

Testing

Added BashPerformanceFixture.cs (in previous commits) with two comprehensive tests:

  1. ShouldPopulateOctopusParametersPerformantly — validates parse performance at various scales (100, 500, 1K, 5K, 20K variables) with realistic payload distributions
  2. ShouldNotLoadOctopusParametersWhenNotUsed — verifies lazy loading by checking configuration file markers remain unreplaced and timing confirms array isn't populated

Test variables mirror real Octopus deployments:

  • 60% small (project names, ports)
  • 25% medium (connection strings)
  • 10% large (JSON config blobs)
  • 5% huge (PEM certificate bundles)
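One simple way to produce such a split deterministically is to bucket each variable by its index. The sketch below is in bash for illustration; the actual fixture is C#, and the function name `size_class_for_index` and the modulo scheme are assumptions rather than the fixture's logic.

```bash
#!/usr/bin/env bash
# Hypothetical bucketing: maps a variable index to a size class so that every
# 100 consecutive indices yield 60 small, 25 medium, 10 large and 5 huge values.
size_class_for_index() {
  local bucket=$(( $1 % 100 ))
  if   (( bucket < 60 )); then size_class=small
  elif (( bucket < 85 )); then size_class=medium
  elif (( bucket < 95 )); then size_class=large
  else                         size_class=huge
  fi
}

# Count how 20,000 indices fall into each class.
declare -A class_counts=()
for (( i = 0; i < 20000; i++ )); do
  size_class_for_index "$i"
  class_counts[$size_class]=$(( ${class_counts[$size_class]:-0} + 1 ))
done

for c in small medium large huge; do
  printf '%-6s %d\n' "$c" "${class_counts[$c]}"
done
```

For 20,000 indices this yields 12,000 small, 5,000 medium, 2,000 large and 1,000 huge entries, matching the 60/25/10/5 split.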

Risks

Process substitution approach (current)

  • Decode error visibility: base64 -d runs in a subshell inside <(...);
    its exit code isn't directly observable, making malformed input harder to detect.
  • SIGPIPE on early exit: If the read loop exits before consuming all bytes the
    producer gets SIGPIPE, which the shell handles silently and could mask a bug.
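Both risks can be reproduced in isolation with a few lines of bash. This is a standalone demonstration using `cat`, `yes` and `head` as stand-ins; it does not use the bootstrap script's own commands.

```bash
#!/usr/bin/env bash
# 1. A failing producer inside <(...) does not change the consumer's status:
#    the subshell exits with 3, but cat still reports success.
cat < <(printf 'partial'; exit 3) > /dev/null
procsub_status=$?

# 2. A producer whose reader stops early is terminated when it next writes.
#    In an ordinary pipeline that is visible through PIPESTATUS; inside a
#    process substitution there is no equivalent array to inspect.
yes 2>/dev/null | head -n 1 > /dev/null
producer_status=${PIPESTATUS[0]}

echo "process substitution consumer status: $procsub_status"
echo "pipeline producer status: $producer_status"
```

The first status is 0 even though the producer failed. The second is non-zero, typically 141 (128 + signal 13) when SIGPIPE has its default disposition.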

Mitigations

  • Lazy loading reduces exposure — array parsing only happens when actually needed
  • Performance tests validate correctness at scale
  • Existing case-statement fallback (get_octopusvariable) remains unaffected

Changes

  • Bootstrap.sh: new decrypt_and_parse_variables function, removes xxd gate, adds _ensure_octopus_parameters_loaded with lazy loading logic
  • BashScriptBootstrapper.cs: GetEncryptedVariablesKvp switches from EncodeAsHex to EncodeValue; ScriptUsesOctopusParameters analyzes script for array usage; markers only replaced when needed
  • BashPerformanceFixture.cs: new dedicated fixture with comprehensive performance tests using realistic variable distributions (60% small / 25% medium / 10% large JSON blobs / 5% PEM cert bundles)
  • BashFixture.cs: existing functional tests preserved

Performance Test Results - from previous commits.

**Bash**
── Payload ─────────────────────────────────────────────────────
  Variables  : 20,001
  Keys       :  avg      40 B  │  min    12 B  │  max     145 B  │  total   787.4 KB
  Values     :  avg     711 B  │  min     2 B  │  max   11501 B  │  total 13886.8 KB
  Pairs      :  avg     751 B  │                              │  total 14674.3 KB
── Timing ──────────────────────────────────────────────────────
  Total      :    7308 ms  (limit 300 s)
  Per var    :      0.37 ms
  Throughput :  2008.0 KB/s
────────────────────────────────────────────────────────────────

Note: the bash total size is larger because the variables are stored in both the switch statement and the associative array, so they are included in the bootstrap script twice. This only happens when the executing script contains "octopus_parameters[".

Should Not load Octopus Parameters when not used in script

Variables:       20,000
Config file size: 21258.6 KB
Without array:   1687 ms
With array:      7352 ms
Time saved:      5665 ms
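The saving and the roughly 77% figure quoted in the Performance Impact section follow directly from the two timings above. A quick check in bash integer arithmetic (variable names are illustrative):

```bash
#!/usr/bin/env bash
# Timings copied from the test output above, in milliseconds.
with_array_ms=7352
without_array_ms=1687

saved_ms=$(( with_array_ms - without_array_ms ))
saved_pct=$(( saved_ms * 100 / with_array_ms ))   # integer division, rounds down

echo "Time saved: ${saved_ms} ms (${saved_pct}% of the with-array run)"
# prints: Time saved: 5665 ms (77% of the with-array run)
```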



**PowerShell equivalent**
── Payload ─────────────────────────────────────────────────────
  Variables  : 20,002
  Keys       :  avg      37 B  │  min    12 B  │  max      99 B  │  total   713.9 KB
  Values     :  avg     411 B  │  min     2 B  │  max    6283 B  │  total  8021.3 KB
  Pairs      :  avg     447 B  │                              │  total  8735.2 KB
── Timing ──────────────────────────────────────────────────────
  Total      :   11538 ms  (limit 600 s)
  Per var    :      0.58 ms
  Throughput :   757.1 KB/s
────────────────────────────────────────────────────────────────

Original changes context

Support enumerating variables in bash scripts. This change is based on this [2018 change](https://github.com/OctopusDeploy/Calamari/pull/337/files#diff-6cf26d1135d46ee2458052461a45a0843e0c0e86224290de4512618d594e781e) and adds support for accessing variables in bash scripts as an associative array. In PowerShell this is supported by the OctopusParameters property; in bash it is backed by octopus_parameters. Details on iterating associative arrays are [available here](https://phoenixnap.com/kb/bash-associative-array). As an example this can be used with
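A minimal sketch of iterating the array from a user script. The array is declared by hand here so the snippet runs on its own; in a deployment the bootstrap script would populate it, and the keys and values shown are made up for illustration.

```bash
#!/usr/bin/env bash
# Stand-in for the array the bootstrap script would populate; values are made up.
declare -A octopus_parameters=(
  ["Octopus.Project.Name"]="Demo"
  ["Octopus.Environment.Name"]="Staging"
)

# Enumerate every variable name and value (order is not guaranteed).
for key in "${!octopus_parameters[@]}"; do
  printf '%s=%s\n' "$key" "${octopus_parameters[$key]}"
done

# Look up a single variable by name.
echo "Project: ${octopus_parameters[Octopus.Project.Name]}"
```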

The implementation for this is based off of the PowerShell implementation of OctopusParameters Decrypt-Variables.

For PowerShell we do the following in C#:

  1. Base64 encode the variable name and value, with a $ delimiter between them. (This makes sure no special characters remain.)
  2. Encrypt the combined string of all variables (we get a base string and an IV).
  3. Base64 encode the base string and convert the IV to hex.
  4. String-replace these into the PowerShell bootstrap.ps1 script.

The PowerShell bootstrap script does the following:

  1. Decodes the IV from hex and the base string from base64
  2. Decrypts the whole string
  3. Decodes the variable names and values one by one

This works well in PowerShell: ~3,000 variables take about 1 second.
In bash we use openssl to decode base64 strings. This spins up a process each time, and repeating the same process for ~3,000 variables takes ~60 seconds.

To work around this I've settled on hex encoding the variable names. Unlike base64, hex-encoded strings can be concatenated and decoded all at once, and this doesn't depend on the openssl process being invoked. In this version I've settled on xxd; if it's not available we do not populate the octopus_parameters variable. We could handle the decoding ourselves, but the implementation is messy and I ran into some issues with encoding/decoding emojis.

gitguardian bot commented Feb 23, 2026

✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@IsaacCalligeros95 IsaacCalligeros95 changed the title from "Spike - Improvements for bash associative arrays" to "Improvements for bash associative arrays" Feb 24, 2026
@IsaacCalligeros95 IsaacCalligeros95 marked this pull request as ready for review February 24, 2026 01:16
@IsaacCalligeros95
Collaborator Author

False positive
