feat: Add registry metadata summary metrics endpoint (#5921) #5939
shuvakant6623 wants to merge 6 commits into feast-dev:master
| entities = grpc_call( | ||
| grpc_handler.ListDataSources, | ||
| RegistryServer_pb2.ListDataSourcesRequest( | ||
| project=project_name, allow_cache=allow_cache | ||
| ), | ||
| ) |
🔴 Wrong gRPC method called - ListDataSources instead of ListEntities
The code at lines 426-431 is labeled as "count entities" but actually calls ListDataSources instead of ListEntities. This means entities are never counted, and the entities variable actually contains data sources.
Code Analysis
The comment says #count entities but the code calls:
entities = grpc_call(
    grpc_handler.ListDataSources,  # Wrong method!
    RegistryServer_pb2.ListDataSourcesRequest(...)
)
Compare with the correct pattern in count_resources_for_project at metrics.py:69-74:
entities = grpc_call(
    grpc_handler.ListEntities,
    RegistryServer_pb2.ListEntitiesRequest(...)
)
Impact
- totalEntities in the response will always be 0 (since entities.get("entities", []) returns an empty list from a DataSources response)
- Entities are never actually fetched or counted
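The always-zero count described above can be reproduced without a registry. This is a minimal sketch that assumes the responses arrive as plain dicts keyed by "dataSources" and "entities"; the sample payloads and the count_entities helper are hypothetical and are not taken from the PR.

```python
# Hypothetical dict-shaped responses; in the PR these come from grpc_call(...).
data_sources_response = {
    "dataSources": [{"name": "orders_source"}, {"name": "users_source"}]
}
entities_response = {"entities": [{"name": "driver"}]}


def count_entities(response: dict) -> int:
    """Count items under the "entities" key, defaulting to an empty list."""
    return len(response.get("entities", []))


# A data-sources response has no "entities" key, so the default [] is used.
wrong_count = count_entities(data_sources_response)
# An entities response carries the key, so its items are counted.
right_count = count_entities(entities_response)
print(wrong_count, right_count)
```

Because .get() silently falls back to the default, the wrong call produces no error at all, which is why this kind of bug is easy to miss in review.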
| except Exception: | ||
| features = {"features": []} |
🔴 Exception handler sets wrong variable - features instead of saved_datasets
When the ListSavedDatasets call fails, the exception handler incorrectly sets features = {"features": []} instead of saved_datasets = {"savedDatasets": []}.
Code Analysis
try:
    saved_datasets = grpc_call(
        grpc_handler.ListSavedDatasets,
        ...
    )
except Exception:
    features = {"features": []}  # Wrong! Should be saved_datasets
Compare with the correct pattern at metrics.py:88-89:
except Exception:
    saved_datasets = {"savedDatasets": []}
Impact
- If ListSavedDatasets fails, saved_datasets remains undefined, causing a NameError at line 469
- The features variable is incorrectly set, which may mask a later NameError for features if ListFeatures is never called
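The effect of binding the fallback to the wrong name can be shown in isolation. The sketch below assumes the call sits inside a function, as in the endpoint handler; failing_list_saved_datasets, buggy_count, fixed_count, and raises_name_error are hypothetical names used only for illustration.

```python
def failing_list_saved_datasets() -> dict:
    """Stand-in for a ListSavedDatasets call that fails."""
    raise RuntimeError("simulated gRPC failure")


def buggy_count() -> int:
    try:
        saved_datasets = failing_list_saved_datasets()
    except Exception:
        features = {"features": []}  # fallback bound to the wrong name
    # saved_datasets was never assigned, so this lookup fails.
    return len(saved_datasets.get("savedDatasets", []))


def fixed_count() -> int:
    try:
        saved_datasets = failing_list_saved_datasets()
    except Exception:
        saved_datasets = {"savedDatasets": []}  # fallback bound to the right name
    return len(saved_datasets.get("savedDatasets", []))


def raises_name_error(fn) -> bool:
    """Return True if calling fn raises NameError (or a subclass)."""
    try:
        fn()
    except NameError:
        return True
    return False


print(raises_name_error(buggy_count), fixed_count())
```

Inside a function the failure surfaces as UnboundLocalError, which is a subclass of NameError, so the review's description holds either way.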
| feature_views = grpc_call( | ||
| grpc_handler.ListAllFeaturesViews, |
🔴 Typo in gRPC method name - ListAllFeaturesViews should be ListAllFeatureViews
The method name ListAllFeaturesViews has an extra 's' - it should be ListAllFeatureViews.
Code Analysis
feature_views = grpc_call(
    grpc_handler.ListAllFeaturesViews,  # Typo: extra 's'
    RegistryServer_pb2.ListAllFeatureViewsRequest(...)
)
The correct method name is ListAllFeatureViews as used elsewhere in the codebase (e.g., metrics.py:101, feature_views.py:78).
Impact
This will cause an AttributeError at runtime since grpc_handler.ListAllFeaturesViews does not exist.
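A misspelled method name like this only fails when the line executes, so a small attribute check in a unit test can catch it earlier. The following is a rough sketch; StubRegistryHandler and has_rpc are hypothetical stand-ins, and the real handler class is not reproduced here.

```python
class StubRegistryHandler:
    """Hypothetical stand-in for the gRPC handler; only the correct name exists."""

    def ListAllFeatureViews(self, request):
        return {"featureViews": []}


def has_rpc(handler, method_name: str) -> bool:
    """Return True if the handler exposes a callable with this name."""
    return callable(getattr(handler, method_name, None))


handler = StubRegistryHandler()
print(has_rpc(handler, "ListAllFeatureViews"))   # correct spelling
print(has_rpc(handler, "ListAllFeaturesViews"))  # misspelled, extra 's'
```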
| feature_services = {"feature_services": []} | ||
|
|
||
| # Aggregate counts | ||
| total["entities"] += len(entities.get("entities", [])) |
🔴 Typo in variable name - total instead of totals
Line 467 uses total["entities"] but the dictionary is named totals (with an 's').
Code Analysis
The dictionary is initialized as totals at line 409:
totals = {
    "entities": 0,
    ...
}
But line 467 references total (without 's'):
total["entities"] += len(entities.get("entities", []))
Impact
This will cause a NameError: name 'total' is not defined at runtime.
| totals["dataSources"] += len(dataSources.get("dataSources", [])) | ||
| totals["savedDatasets"] += len(savedDatasets.get("savedDatsets", [])) | ||
| totals["features"] += len(features.get("features", [])) | ||
| totals["featureViews"] += len(featureViews.get("featureViews", [])) | ||
| totals["featureServices"] += len(featureServices.get("featureServices", [])) |
🔴 Multiple undefined variables in aggregate counts section
Lines 468-472 reference undefined variables with incorrect casing: dataSources, savedDatasets, features, featureViews, featureServices instead of the actual variable names entities (which contains data sources), saved_datasets, features, feature_views, feature_services.
Code Analysis
The variables are defined with snake_case:
entities = grpc_call(grpc_handler.ListDataSources, ...)  # Actually data sources
saved_datasets = grpc_call(grpc_handler.ListSavedDatasets, ...)
feature_views = grpc_call(grpc_handler.ListAllFeaturesViews, ...)
feature_services = grpc_call(grpc_handler.ListFeatureServices, ...)
But the aggregation uses camelCase names, which are undefined:
totals["dataSources"] += len(dataSources.get("dataSources", []))  # dataSources undefined
totals["savedDatasets"] += len(savedDatasets.get("savedDatsets", []))  # savedDatasets undefined, also typo in key
totals["features"] += len(features.get("features", []))  # features may be undefined
totals["featureViews"] += len(featureViews.get("featureViews", []))  # featureViews undefined
totals["featureServices"] += len(featureServices.get("featureServices", []))  # featureServices undefined
Impact
This will cause NameError exceptions at runtime for each undefined variable.
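For comparison, here is one way the aggregation step could look once each snake_case variable is paired with its camelCase response key. It is a sketch under assumptions: the response keys are the ones quoted in the PR diff and are not verified against the registry API, and add_project_counts plus the sample data are hypothetical.

```python
def add_project_counts(
    totals: dict,
    entities: dict,
    data_sources: dict,
    saved_datasets: dict,
    features: dict,
    feature_views: dict,
    feature_services: dict,
) -> dict:
    """Add one project's resource counts to the running totals."""
    totals["entities"] += len(entities.get("entities", []))
    totals["dataSources"] += len(data_sources.get("dataSources", []))
    totals["savedDatasets"] += len(saved_datasets.get("savedDatasets", []))
    totals["features"] += len(features.get("features", []))
    totals["featureViews"] += len(feature_views.get("featureViews", []))
    totals["featureServices"] += len(feature_services.get("featureServices", []))
    return totals


totals = {
    "entities": 0,
    "dataSources": 0,
    "savedDatasets": 0,
    "features": 0,
    "featureViews": 0,
    "featureServices": 0,
}
# Hypothetical responses for a single project.
add_project_counts(
    totals,
    entities={"entities": [{"name": "driver"}]},
    data_sources={"dataSources": [{"name": "a"}, {"name": "b"}]},
    saved_datasets={"savedDatasets": []},
    features={"features": [{"name": "f1"}, {"name": "f2"}, {"name": "f3"}]},
    feature_views={"featureViews": [{"name": "fv"}]},
    feature_services={},  # a missing key falls back to an empty list
)
print(totals)
```

Passing the responses as named parameters means a misspelled variable fails at the call site instead of partway through the aggregation.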
| # Aggregate counts | ||
| total["entities"] += len(entities.get("entities", [])) | ||
| totals["dataSources"] += len(dataSources.get("dataSources", [])) | ||
| totals["savedDatasets"] += len(savedDatasets.get("savedDatsets", [])) |
🔴 Typo in dictionary key - savedDatsets instead of savedDatasets
Line 469 has a typo in the dictionary key: savedDatsets is missing an 'a' and should be savedDatasets.
Code Analysis
totals["savedDatasets"] += len(savedDatasets.get("savedDatsets", []))  # Typo: savedDatsets
The correct key based on the gRPC response format is savedDatasets (as seen in metrics.py:120).
Impact
Even if the variable name issue is fixed, this would always return 0 for saved datasets because the key doesn't match the actual response key.
| if ts: | ||
| if last_updates_ts is None or ts > last_updated_ts: | ||
| last_updated_ts |
🔴 Timestamp tracking logic is broken - inconsistent variable names and missing assignment
The timestamp tracking code has multiple issues: inconsistent variable naming (last__updates_ts vs last_updates_ts vs last_updated_ts) and a missing assignment statement.
Code Analysis
- Line 418 initializes last__updates_ts = None (double underscore)
- Line 480 references last_updates_ts (single underscore, different name)
- Line 480 also references last_updated_ts (yet another variation)
- Line 481 is just last_updated_ts, a bare expression that does nothing (should be last_updated_ts = ts)
- Line 491 returns last_updated_ts which is undefined
last__updates_ts = None  # Line 418: initialized with double underscore
...
if ts:
    if last_updates_ts is None or ts > last_updated_ts:  # Line 480: wrong variable names
        last_updated_ts  # Line 481: does nothing, should be: last_updated_ts = ts
...
"lastUpdatedTImestamp": last_updated_ts,  # Line 491: undefined variable
Impact
- NameError at runtime when accessing undefined last_updates_ts or last_updated_ts
- Even if variable names were consistent, the timestamp would never be updated due to the missing assignment
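With one consistent name and the missing assignment restored, the tracking logic reduces to a running maximum. The sketch below assumes timestamps are comparable values such as integers and that falsy values (None, 0) mean "no timestamp"; the latest_timestamp helper is hypothetical.

```python
from typing import Iterable, Optional


def latest_timestamp(timestamps: Iterable[Optional[int]]) -> Optional[int]:
    """Return the largest truthy timestamp, or None if there is none."""
    last_updated_ts = None  # one name, used consistently below
    for ts in timestamps:
        if ts:
            if last_updated_ts is None or ts > last_updated_ts:
                last_updated_ts = ts  # the assignment missing at line 481
    return last_updated_ts


newest = latest_timestamp([None, 5, 3, 9, 0])
nothing = latest_timestamp([])
print(newest, nothing)
```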
| total["entities"] += len(entities.get("entities", [])) | ||
| totals["dataSources"] += len(dataSources.get("dataSources", [])) | ||
| totals["savedDatasets"] += len(savedDatasets.get("savedDatsets", [])) | ||
| totals["features"] += len(features.get("features", [])) |
🔴 Missing ListFeatures call - features are never counted
The metrics_summary function never calls ListFeatures to count features, unlike the existing count_resources_for_project function.
Code Analysis
The existing count_resources_for_project function at metrics.py:91-98 calls ListFeatures:
try:
    features = grpc_call(
        grpc_handler.ListFeatures,
        RegistryServer_pb2.ListFeaturesRequest(
            project=project_name, allow_cache=allow_cache
        ),
    )
except Exception:
    features = {"features": []}
The new metrics_summary function has no such call. The features variable is only set in the exception handler for ListSavedDatasets (which is itself a bug).
Impact
- totalFeatures will always be 0 or cause a NameError if ListSavedDatasets succeeds
- Features are never actually fetched or counted
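Since every resource type repeats the same try/except shape, a small wrapper could make it harder to drop a call or bind a fallback to the wrong variable. This is only a sketch of that idea and not something the PR or the existing metrics.py is known to contain; list_or_empty and the two fetch functions are hypothetical.

```python
from typing import Callable


def list_or_empty(fetch: Callable[[], dict], key: str) -> dict:
    """Run fetch(); on any failure return an empty list under the given key."""
    try:
        return fetch()
    except Exception:
        return {key: []}


def fetch_features_ok() -> dict:
    """Stand-in for a successful ListFeatures call."""
    return {"features": [{"name": "f1"}, {"name": "f2"}]}


def fetch_features_broken() -> dict:
    """Stand-in for a ListFeatures call that fails."""
    raise RuntimeError("simulated gRPC failure")


features = list_or_empty(fetch_features_ok, "features")
fallback = list_or_empty(fetch_features_broken, "features")
print(len(features["features"]), len(fallback["features"]))
```

The result is bound in one place, so the name on the left of the assignment and the fallback key cannot drift apart the way they did in the ListSavedDatasets handler.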
0f29942 to 39ffafe
Signed-off-by: Shuvakant Patra <scientefic2612@gmail.com>
39ffafe to af10b33
| if last_updated_ts is None or ts > last_updated_ts: | ||
| last_updated_ts = ts | ||
|
|
||
| return { |
This is very similar to the "/metrics/resource_counts" endpoint. I think we can extend the existing endpoint to include totalProjects and lastUpdatedTimestamp.
Thanks for the feedback! I'll update the existing /metrics/resource_counts endpoint to include totalProjects and lastUpdatedTimestamp and remove the separate summary endpoint.
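As a rough illustration of what the extended payload might carry, the sketch below adds the two fields named in this thread to a per-project count structure. Everything else is an assumption: the "total" and "perProject" keys, the build_resource_counts helper, and the sample data are hypothetical and do not describe the current /metrics/resource_counts response.

```python
from typing import Optional


def build_resource_counts(per_project: dict, last_updated_ts: Optional[int]) -> dict:
    """Sum per-project counts and attach totalProjects and lastUpdatedTimestamp."""
    totals: dict = {}
    for counts in per_project.values():
        for resource, count in counts.items():
            totals[resource] = totals.get(resource, 0) + count
    return {
        "total": totals,            # hypothetical key
        "perProject": per_project,  # hypothetical key
        "totalProjects": len(per_project),
        "lastUpdatedTimestamp": last_updated_ts,
    }


# Hypothetical per-project counts.
summary = build_resource_counts(
    {
        "project_a": {"entities": 2, "features": 5},
        "project_b": {"entities": 1, "features": 4},
    },
    last_updated_ts=1700000000,
)
print(summary["totalProjects"], summary["total"])
```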
Signed-off-by: Shuvakant Patra <scientefic2612@gmail.com>
Signed-off-by: Shuvakant Patra <scientefic2612@gmail.com>
What this PR does / why we need it:
This PR extends the existing Metrics API to provide registry-level metadata summary statistics.
It adds a new /api/v1/metrics/summary endpoint that aggregates resource counts across all projects and exposes high-level registry insights.
This helps users quickly understand the overall state of their registry without querying multiple endpoints.
Which issue(s) this PR fixes:
Fixes #5921
Misc