DMA 2.0 Postgres collector review#569
Open
smpawar wants to merge 2 commits into
🚀 Overview
This Pull Request delivers a comprehensive E2E audit compliance sweep to remediate stability, performance, and security gaps in the PostgreSQL metadata collector.
To keep the Git history clean, all changes were applied surgically, line by line, to the original source scripts. Existing query formatting, comment layouts, and lowercase casing are kept intact, so the diff stays small and easy to review.
🛠️ Key Improvements
Robust Connection Parsing: Added regex-based parsing (parse_connection_string) to support complex passwords containing special characters (such as @ or /), eliminating cut-based string-splitting bugs.
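As a rough illustration of why a regex handles such passwords where cut-style splitting does not, here is a minimal sketch. The connection-string layout (user:password@host:port/dbname), the DB_* variable names, and the function body are assumptions for illustration; only the function name parse_connection_string comes from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the layout user:password@host:port/dbname is assumed,
# not taken from the collector's actual implementation.
parse_connection_string() {
  local conn="$1"
  # The password group (.*) is greedy, so the match extends to the LAST '@'
  # that is followed by a well-formed host:port/dbname tail. Any '@', '/' or
  # ':' before that point stays inside the password.
  local re='^([^:]+):(.*)@([^@:/]+):([0-9]+)/([^/]+)$'
  if [[ "$conn" =~ $re ]]; then
    DB_USER="${BASH_REMATCH[1]}"
    DB_PASS="${BASH_REMATCH[2]}"
    DB_HOST="${BASH_REMATCH[3]}"
    DB_PORT="${BASH_REMATCH[4]}"
    DB_NAME="${BASH_REMATCH[5]}"
    return 0
  fi
  echo "Unrecognized connection string format" >&2
  return 1
}

# Example: the password here is 'p@ss/w:rd'.
parse_connection_string 'dma:p@ss/w:rd@db.example.com:5432/postgres'
printf 'user=%s host=%s port=%s db=%s\n' "$DB_USER" "$DB_HOST" "$DB_PORT" "$DB_NAME"
```

Splitting the same string on the first @ with cut would truncate the password to "p" and treat the remainder as the host, which is the class of bug the regex avoids.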
Error Trapping & Fail-Fast: Enabled ON_ERROR_STOP=1 directly in the psql invocation, and added an _ERROR suffix to collection archives when the exit status is non-zero. Added || true to the error grep commands so the script does not crash when the logs are clean.
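The pattern can be sketched roughly as follows. The helper names (run_collection_sql, count_log_errors, archive_name), the 'error' search pattern, and the argument order are hypothetical; only ON_ERROR_STOP=1, the _ERROR suffix, and the || true guard come from this PR.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: run one SQL script and stop at the first failing statement.
# With ON_ERROR_STOP set, psql exits with a non-zero status on a script error.
# Connection settings are assumed to come from the environment (PGHOST, PGUSER, ...).
run_collection_sql() {
  psql -X -v ON_ERROR_STOP=1 -f "$1" > "$2" 2> "$3"
}

# Count error lines in a log. grep exits 1 when nothing matches, which would
# abort a `set -e` script; `|| true` keeps a clean log from ending the run.
# Assumption: a missing log is treated as zero errors.
count_log_errors() {
  local count
  count=$(grep -c -i 'error' "$1" 2>/dev/null || true)
  echo "${count:-0}"
}

# Append the _ERROR suffix when psql failed or the log contains errors.
archive_name() {
  local base="$1" status="$2" errors="$3"
  if [ "$status" -ne 0 ] || [ "$errors" -gt 0 ]; then
    echo "${base}_ERROR"
  else
    echo "$base"
  fi
}
```

A caller would capture the psql status without tripping set -e, for example `status=0; run_collection_sql q.sql out.csv err.log || status=$?`, and then pass it to archive_name together with the error count.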
OS Metrics Gating: Gated db-machine-specs.sh execution behind a strict check for COLLECT_OS_SPECS=true and non-empty vm_user_name to prevent automated pipelines from hanging.
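A minimal sketch of such a gate, assuming both values are plain shell variables; the function name should_collect_os_specs is hypothetical, and the arguments passed to db-machine-specs.sh are not shown in this description, so the call is left as a comment.

```shell
#!/usr/bin/env bash
# Succeeds only when OS collection is explicitly enabled AND a VM user is set.
# Unset variables are treated as "disabled" / "empty".
should_collect_os_specs() {
  [[ "${COLLECT_OS_SPECS:-false}" == "true" && -n "${vm_user_name:-}" ]]
}

if should_collect_os_specs; then
  echo "collecting OS specs as ${vm_user_name}"
  # ./db-machine-specs.sh ...   (arguments omitted; not given in the PR text)
else
  echo "skipping OS specs"
fi
```

Requiring an exact "true" means an unattended pipeline that never sets the variable skips the step instead of waiting on an interactive prompt.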
Platform Version Routing: Gated major version parsing so that versions below PG 11 fall back cleanly to the "base" queries directory instead of crashing on missing paths.
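One way to express this routing is sketched below. The function name queries_dir_for_version is hypothetical, and the choice to name directories for PG 11 and later after the major version is an assumption; the PR only states that versions below 11 use "base".

```shell
#!/usr/bin/env bash
# Map a server version string (e.g. "9.6.24", "14.5") to a queries directory.
# Assumption: versions >= 11 use a directory named after the major version.
queries_dir_for_version() {
  local version="$1"
  # Take the leading run of digits as the major version; 10# forces base 10
  # so a value with leading zeros is not read as octal.
  if [[ "$version" =~ ^([0-9]+) ]] && (( 10#${BASH_REMATCH[1]} >= 11 )); then
    echo "${BASH_REMATCH[1]}"
  else
    # Older versions and unparseable input both route to "base".
    echo "base"
  fi
}

for v in 9.6.24 10.23 14.5; do
  echo "$v -> $(queries_dir_for_version "$v")"
done
```

Routing unparseable input to "base" as well keeps the wrapper from building a path out of an empty or malformed version string.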
OOM & Storage Protection (schema_objects.sql): Swapped the verbose object-by-object record listing for aggregated counts (COUNT(*)) grouped by owner, category, type, and schema. This reduces the output size by up to 99% on systems with very large catalogs (e.g., >50k tables).
Lock & Catalog Protection (schema_details.sql): Stripped expensive disk relation sizing functions (pg_table_size/pg_total_relation_size) from schema detail joins to prevent production DDL lock contention.
Fast Dependency Scans (aws_extension_dependency.sql): Replaced slow decompiling regex scans with direct, indexed catalog joins on pg_depend and pg_extension, yielding an almost 28x local execution speedup (from 170 ms down to 6 ms).
Idle Relation Preservation (index_details.sql & data_types.sql): Coalesced metrics and resolved owners directly from pg_namespace so that newly created or idle indexes are no longer filtered out. Optimized type queries to scan strictly user relations.
Cluster-Wide Footprints (database_details.sql): Removed the restrictive current-database filter, enabling cluster-wide database statistics collection with tablespace and owner resolved through left joins.
New Diagnostic Metrics: Appended division-safe Heap Cache Hit Ratio, Index Cache Hit Ratio, and Index Usage Ratio to calculated_metrics.sql.
Least-Privilege Setup: Replaced the TBD placeholder in README.txt with a standard DDL role creation template for non-superuser read-only execution. Added a Performance Considerations section to guide users on high-scale footprints.
📊 Local Profiling Highlight:
Local E2E execution and catalog profiling on AlloyDB Omni showed major CPU/IO savings:
aws_extension_dependency.sql (Original regex scan): 170.25 ms
aws_extension_dependency.sql (New optimized catalog-join): 6.11 ms (~28x speedup)
👥 Reviewers Requested:
Please review for E2E query accuracy and bash wrapper stability. All tests and staging validations are complete!