
[SAP] modify select_datastore_by_name to use a cache #323

Open
hemna wants to merge 1 commit into stable/2025.1-m3 from
sap/datastore-select-by-name-cache

Conversation


@hemna hemna commented Apr 1, 2026

Optimize the select_datastore_by_name method to significantly reduce
vCenter API load during high-volume operations like boot-from-volume.

Problem:
The original implementation called _get_datastores() which fetches ALL
datastores with their 'host' and 'summary' properties. With ~40 datastores
and ~50 hosts each, this transferred ~500KB-2MB of data per call. Under
high load (e.g., 96 volume creates in 5 minutes), this caused vCenter
connection pool exhaustion and cascading timeouts averaging 530 seconds
per _select_ds_for_volume call.

Solution:

  1. Add _get_datastore_by_name() method that fetches properties for only
    the specific datastore needed, reducing data transfer to ~10-20KB.

  2. Add a 5-minute TTL cache for datastore name -> moref mappings in
    get_ds_ref_by_name(). Since volume creates are bursty and typically
    target the same datastores, this eliminates repeated vCenter queries
    for the lightweight name lookup on cache hits.

  3. The host availability check still queries vCenter on every call to
    ensure we always have fresh data about which hosts are connected
    and not in maintenance mode.
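The TTL-cache logic in point 2 can be sketched in plain Python. This is a minimal illustration, not the driver's actual code; the class name, the injectable clock, and the cache shape are stand-ins matching the {name: (moref, timestamp)} layout described above:

```python
import time

_DS_CACHE_TTL = 300  # seconds; matches the 5-minute TTL in this change


class DatastoreNameCache:
    """Sketch of a name -> (moref, timestamp) cache with a TTL."""

    def __init__(self, ttl=_DS_CACHE_TTL, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # {name: (moref, timestamp)}

    def get(self, name):
        entry = self._cache.get(name)
        if entry is None:
            return None
        moref, stamp = entry
        if self._clock() - stamp > self._ttl:
            # Stale entry: drop it so the caller re-queries vCenter.
            del self._cache[name]
            return None
        return moref

    def put(self, name, moref):
        self._cache[name] = (moref, self._clock())
```

On a cache hit, get_ds_ref_by_name() can skip the name-lookup query entirely; only the per-call host availability check (point 3) still goes to vCenter.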

Performance impact:

  • First call: 1 lightweight query (names only) + 1 targeted query
  • Subsequent calls (within 5 min): 1 targeted query only (cache hit)
  • Original: 1 heavy query fetching all datastores with all host mounts

Backport of #322 from stable/2023.1-m3.

Change-Id: If9d46fe833418b67393535384af075a95f2ca4cb

self._ds_regex = ds_regex
self._profile_id_cache = {}
self._ds_name_cache = {}  # {name: (moref, timestamp)}
self._ds_cache_ttl = 300  # 5 minutes

I get that this is a hotfix, but this should be configurable. At least add a # TODO(): make configurable.
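For what a configurable TTL might look like: Cinder drivers typically register options via oslo.config. The option name and help text below are made up for illustration only, not part of this PR:

```python
from oslo_config import cfg

# Hypothetical option; the name and wiring are illustrative only.
vmdk_cache_opts = [
    cfg.IntOpt('vmware_datastore_name_cache_ttl',
               default=300,
               min=0,
               help='TTL in seconds for the datastore name -> moref '
                    'cache used by get_ds_ref_by_name().'),
]

CONF = cfg.CONF
CONF.register_opts(vmdk_cache_opts)
```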

for obj_content in objects:
    props = self._get_object_properties(obj_content)
    if props['name'] == name:
        # Cache the result

Comment not needed; it's right there in the _ds_name_cache attr name.


Same as the other comments: the "check cache" and "delete from cache" comments aren't needed either.


3 participants