
GCP Security Intelligence Platform

Version 1.1.0 | Production Ready ✅ | Performance Optimized 🚀

A comprehensive security monitoring and analysis platform for Google Cloud Platform, featuring an ADK-powered AI agent with BigQuery integration and multiple user interfaces.

🎯 Overview

The GCP Security Intelligence Platform provides a unified AI agent that queries BigQuery security data through natural language. It supports multiple interfaces (ADK Backend, Chainlit UI, MCP Server) and includes modular Cloud Functions for automated data collection.

Key Features

  • 🤖 AI-Powered Security Analysis - Natural language queries to BigQuery security data
  • 📊 BigQuery Native - Centralized data platform with real-time analysis
  • 🔌 Multiple Interfaces - ADK Backend, Chainlit UI, MCP Server
  • ☁️ Modular Cloud Functions - Deploy only what you need
  • 📚 Documentation Sync - Confluence → BigQuery integration
  • 🔒 Security Tools - 53 comprehensive security and operations tools
  • ⚡ Performance Caching - Intelligent query caching for 3-10x faster responses

📦 Deployment Options

⚡ Cloud Deployment (Recommended for Production)

Deploy complete infrastructure to Google Cloud in 3 steps:

👉 DEPLOYMENT_QUICK_START.md - 15-minute deployment guide

# 1. Check prerequisites
./scripts/preflight_check.sh

# 2. Deploy everything (Cloud Run Job + Workflows + VPC)
./scripts/bootstrap_backfill.sh

What you get:

  • ✅ Cloud Run Job for data collection
  • ✅ Workflows for orchestration
  • ✅ VPC networking (optional, private)
  • ✅ Authenticated access only (org policy compliant)
  • ✅ Automated IAM setup
  • ✅ Self-healing with automatic restarts

Time: 15 minutes | Cost: ~$50-100/month


💻 Local Development

See the Quick Start section below for running locally.

🚀 Quick Start

🗺️ Not Sure Where to Start?

Use our decision tree to find the right setup path for your needs:

👉 docs/SETUP_DECISION_TREE.md - Visual guide to choosing your setup path

| I want to... | Guide | Time |
|---|---|---|
| Try it locally first | GETTING_STARTED.md | 5 min |
| Use Docker | DOCKER_QUICKSTART.md | 10 min |
| Deploy to production | DEPLOYMENT_QUICK_START.md | 15 min |

⚡ Fastest Way to Get Running

Want to be up and running in 5 minutes?

👉 QUICKSTART.md - One-command setup guide

# Automated setup (recommended)
./scripts/quickstart.sh

🆕 First Time Setup (Detailed)

Need step-by-step instructions? Follow our comprehensive setup guide:

👉 GETTING_STARTED.md - Complete walkthrough for first-time users

This guide covers:

  • ✅ Creating GCP service account with proper permissions
  • ✅ Enabling required APIs (BigQuery, Vertex AI)
  • ✅ Setting up BigQuery dataset and tables
  • ✅ Loading sample security data
  • ✅ Verifying everything works
  • ✅ Troubleshooting common issues

Estimated time: 5-10 minutes

New Project? Use Bootstrap!

For brand new Google Cloud projects, use our one-command bootstrap:

# Complete setup from zero to deployed
./scripts/bootstrap_cloud_functions.sh YOUR_PROJECT_ID us-central1

This will:

  • ✅ Enable all required GCP APIs
  • ✅ Create service account with proper permissions
  • ✅ Set up BigQuery datasets and tables
  • ✅ Deploy Cloud Functions (with selection)
  • ✅ Configure Cloud Scheduler for automated data collection

See Bootstrap Guide for complete details.

Using Terraform (Infrastructure-as-Code)

For automated, reproducible deployments with version control:

# Navigate to terraform directory
cd terraform

# Copy and edit configuration
cp terraform.tfvars.example terraform.tfvars
vi terraform.tfvars  # Set your project_id

# Deploy everything
terraform init
terraform apply

See Terraform README for complete IaC deployment guide.

🐳 Docker Quick Start (Recommended)

Already have everything set up? Jump right in:

# 1. Configure environment
cp .env.example .env
# Edit .env with your GCP project details

# 2. Setup BigQuery (one-time)
./scripts/setup_bigquery.sh

# 3. Start with Docker
docker compose up --build

# Access at:
# - ADK Backend:  http://localhost:8031
# - Chainlit UI:  http://localhost:8033

🐳 Podman Alternative (Rootless Containers)

Prefer Podman over Docker? We've got you covered:

# Quick start with Podman
./scripts/podman_build.sh
./scripts/podman_run.sh

See docs/PODMAN_SETUP.md for complete Podman setup guide.

☁️ One-Command Cloud Bootstrap

To build the container, deploy the Cloud Run service, set up the collector job, and deploy the workflows in a single step:

GOOGLE_CLOUD_PROJECT=<your-project> ./scripts/bootstrap_project.sh

This will:

  • Build & deploy the chat application (Chainlit + ADK) to Cloud Run
  • Build & deploy the backfill Cloud Run Job and its workflows
  • Trigger the first backfill so BigQuery tables are warm
  • Print the service URL and reminder for starting the daily loop

💻 Local Development Setup

# 1. Clone repository
git clone https://github.com/stuagano/adk-python.git
cd contributing/samples/security_agent

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure environment
cp .env.example .env
# Edit .env with your GCP project details

# 4. Setup BigQuery (one-time)
./scripts/setup_bigquery.sh

# 5. Start services
./scripts/start_all.sh

# Stop all services
./scripts/stop_all.sh

Access Interfaces

| Interface | URL | Purpose |
|---|---|---|
| ADK Backend | http://localhost:8031 | Direct API access, programmatic integration |
| Chainlit UI | http://localhost:8033 | Modern chat interface (recommended for end users) |

Run with Docker

Prerequisites (one-time setup):

# 1. Run the preflight check to validate your setup
./scripts/docker_preflight.sh

# This checks for:
# - config/ directory exists
# - .env file is configured
# - Service account JSON is present
# - Docker is installed and running

First-time setup if preflight fails:

# Create config directory
mkdir -p config

# Copy environment template
cp .env.example .env

# Edit .env with your GCP project details
# Minimum required:
#   GOOGLE_CLOUD_PROJECT=your-project-id
#   GOOGLE_APPLICATION_CREDENTIALS=config/service-account-key.json
#   BQ_DEFAULT_DATASET=security_insights
#   BQ_DEFAULT_TABLE=security_findings

# Place your GCP service account JSON in config/
# See config/README.md for detailed instructions on creating a service account
cp /path/to/your-key.json config/service-account-key.json
chmod 600 config/service-account-key.json

Build and run with Docker Compose (recommended):

docker compose up --build

# Or run in detached mode
docker compose up -d --build

# View logs
docker compose logs -f

# Stop services
docker compose down

Alternative: Build and run with scripts:

# Build the container image
./scripts/docker_build.sh [image-name]

# Run the container
./scripts/docker_run.sh [image-name]

Access the interfaces at http://localhost:8031 (ADK Backend) and http://localhost:8033 (Chainlit UI).

Need a quick reference for the helper scripts? See scripts/README.md.

Troubleshooting:

  • Run ./scripts/docker_preflight.sh to diagnose issues
  • Check logs: docker compose logs -f or tail -f logs/*.log
  • Verify credentials: See config/README.md for setup guide

☁️ Cloud Run Deployment

Deploy the platform to Google Cloud Run for fully managed, serverless hosting:

# One-command deployment
./scripts/deploy_to_cloud_run.sh

What gets deployed:

  • ✅ Container built and pushed to Artifact Registry
  • ✅ Service account credentials stored securely in Secret Manager
  • ✅ Auto-scaling Cloud Run service (0-10 instances)
  • ✅ HTTPS endpoint with health checks
  • ✅ Same BigQuery integration as local deployment

Access your deployment:

https://security-intelligence-platform-<hash>-uc.a.run.app

👉 CLOUD_RUN_DEPLOYMENT.md - Complete deployment guide

This guide covers:

  • ✅ Automated deployment script
  • ✅ Manual step-by-step deployment
  • ✅ Security hardening (authentication, VPC, custom service accounts)
  • ✅ Cost optimization strategies
  • ✅ Monitoring and debugging
  • ✅ CI/CD integration with GitHub Actions
  • ✅ Multi-region deployment

Key differences from local Docker:

  • Credentials: Stored in Secret Manager (not mounted files)
  • Scaling: Automatic based on traffic (0-N instances)
  • Cost: Pay-per-use (~$8/month for typical usage)
  • Networking: Public HTTPS URL with optional VPC

Cloud Run Job + Workflow (collectors)

The ingestion collectors run separately from the interactive service. Deploy the job once per project:

GOOGLE_CLOUD_PROJECT=<your-project> ./scripts/bootstrap_backfill.sh

If you prefer manual steps, build the job image and deploy resources individually:

gcloud builds submit --config cloudbuild-job.yaml .

Then deploy the workflow that triggers it:

gcloud workflows deploy security-data-workflow \
  --source workflows/collector_trigger.yaml \
  --region us-central1

Trigger whenever data should be refreshed (manually or from another scheduler):

gcloud workflows run security-data-workflow \
  --region us-central1 \
  --data '{"jobName":"security-data-backfill"}'

Each execution invokes the collectors defined in batch_collectors/job_runner.py and populates the security_insights dataset. You can adjust the cadence by triggering the workflow on a schedule of your choice (or use the loop workflow below).

Optional: Continuous loop workflow

To keep everything self-contained, deploy the daily_backfill.yaml workflow. It triggers security-data-workflow, sleeps for the interval you provide (default 24 hours) and repeats until you cancel the execution.

gcloud workflows deploy security-data-daily \
  --source workflows/daily_backfill.yaml \
  --location us-central1

# Start the loop (runs until the execution is cancelled)
gcloud workflows run security-data-daily --location us-central1 \
  --data '{"intervalSeconds":86400,"workflow":"security-data-workflow"}'

Override intervalSeconds (seconds) or the workflow name in the payload as needed. Stop the loop via Cloud Console or gcloud workflows executions cancel.

🛠️ Comprehensive Tool Suite

Architecture diagrams (optional)

  • Install Graphviz + diagrams as noted in docs/DIAGRAMS_EVALUATION.md.

  • Generate the architecture view (uses live BigQuery data):

    python tools/generate_architecture_diagram.py
  • Output saved to diagrams/security_agent_architecture.png.

The platform includes 53 specialized tools organized into 10 categories:

Core Analysis Tools

1. get_security_insights_summary()

Returns overview of security findings table with metrics:

  • Total records, categories, severity levels
  • Unique resources affected
  • Date range of findings

2. query_security_insights(query_filter, limit)

Query security findings with SQL WHERE clause filtering.

Available columns:

  • id (INTEGER) - Unique identifier
  • name (STRING) - Finding name
  • category (STRING) - Security category
  • severity (STRING) - Severity level (HIGH, MEDIUM, LOW)
  • resource_name (STRING) - Affected resource
  • description (STRING) - Finding description
  • recommendation (STRING) - Remediation steps
  • state (STRING) - Current state
  • created_at (STRING) - Creation timestamp
  • project_id (STRING) - GCP project ID

Example filters:

query_security_insights("severity = 'HIGH'")
query_security_insights("created_at >= '2025-10-06'")
query_security_insights("category = 'VULNERABILITY'", limit=10)

3. get_security_statistics(group_by)

Aggregated statistics grouped by field.

Valid group_by values:

  • severity - Group by severity level
  • category - Group by security category
  • state - Group by finding state
  • project_id - Group by GCP project
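Example (calls shown in the same style as the other tools; output is a per-group count of findings):

```
get_security_statistics("severity")    # Findings count per severity level
get_security_statistics("category")    # Findings count per security category
```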

🆕 Enhanced Analysis Tools (v1.0.2)

4. get_resources_by_severity(severity="HIGH")

List all unique resources affected by findings of a specific severity level.

Severity levels:

  • CRITICAL - Critical issues requiring immediate attention
  • HIGH - High severity, address soon
  • MEDIUM - Medium severity, scheduled remediation
  • LOW - Low severity, eventual remediation

Output includes:

  • Resource name
  • Finding count per resource
  • Categories of findings
  • Latest finding timestamp

Example:

get_resources_by_severity("CRITICAL")  # Show all critical resources
get_resources_by_severity("HIGH")      # Show high-severity resources

5. get_recent_findings(days=7)

Get security findings from the last N days with severity breakdown.

Features:

  • Time-based filtering (1-365 days)
  • Severity breakdown and counts
  • Ordered by severity (CRITICAL → LOW)
  • Shows first 20 findings with full details

Example:

get_recent_findings(7)    # Last week
get_recent_findings(30)   # Last month
get_recent_findings(1)    # Last 24 hours
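The CRITICAL → LOW ordering described above can be sketched with a simple rank map. This is illustrative code, not the platform's implementation:

```python
# Hypothetical helper showing how findings can be ordered CRITICAL -> LOW.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def sort_findings_by_severity(findings):
    """Sort finding dicts so CRITICAL comes first and LOW last;
    unknown severities sort after all known ones."""
    return sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f.get("severity"), len(SEVERITY_ORDER)),
    )

findings = [
    {"name": "public-bucket", "severity": "LOW"},
    {"name": "leaked-key", "severity": "CRITICAL"},
    {"name": "open-firewall", "severity": "HIGH"},
]
ordered = sort_findings_by_severity(findings)
```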

6. export_findings_to_csv(query_filter="", output_file="security_findings.csv")

Export security findings to CSV file for analysis in Excel/Sheets.

Features:

  • Optional SQL filtering
  • Automatic .csv extension
  • All columns included
  • Ordered by creation date (newest first)

Example:

export_findings_to_csv()                                    # Export all
export_findings_to_csv("severity = 'HIGH'", "high.csv")    # Export high severity only
export_findings_to_csv("created_at >= '2025-10-01'")      # Export October findings
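The export behaviour above can be sketched as follows. The real tool pulls rows from BigQuery; `write_findings_csv` is a hypothetical helper, but the column names follow the documented `security_findings` schema and the newest-first ordering and automatic `.csv` extension match the feature list:

```python
import csv

# Columns follow the documented security_findings schema.
COLUMNS = ["id", "name", "category", "severity", "resource_name",
           "description", "recommendation", "state", "created_at", "project_id"]

def write_findings_csv(findings, output_file="security_findings.csv"):
    if not output_file.endswith(".csv"):
        output_file += ".csv"  # automatic .csv extension, as documented
    # Newest first, mirroring the documented ordering by creation date.
    newest_first = sorted(findings, key=lambda r: r.get("created_at", ""), reverse=True)
    with open(output_file, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(newest_first)
    return output_file

path = write_findings_csv(
    [{"id": 1, "severity": "HIGH", "created_at": "2025-10-06"},
     {"id": 2, "severity": "LOW", "created_at": "2025-10-07"}],
    "example_export",
)
```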

📦 Complete Tool Categories

🔒 Security Analysis Tools (18 tools)

  • Core Security: get_security_insights_summary, query_security_insights, get_security_statistics, get_resources_by_severity, get_recent_findings, export_findings_to_csv
  • IAM Security: get_primitive_role_accounts, get_old_service_account_keys, analyze_iam_security_posture, analyze_all_custom_roles, analyze_custom_role_tool
  • Network Security: get_open_firewall_rules, get_ssh_accessible_resources, analyze_network_security_posture
  • Storage Security: get_public_storage_buckets, get_unencrypted_buckets
  • Critical Findings: get_critical_security_findings, get_high_severity_findings_by_resource

📊 BigQuery Tools (9 tools)

  • Basic Operations: hello_world, list_datasets, list_tables, get_table_schema
  • Query Operations: run_query, analyze_query_cost, get_table_sample
  • Exploration: explore_all_tables_and_views, analyze_table_or_view

📚 Documentation Tools (5 tools)

  • Confluence: search_confluence_documentation, get_confluence_document, analyze_confluence_coverage, get_confluence_statistics, refresh_confluence_cache

📡 Security Feed Tools (4 tools)

  • Threat Intelligence: query_gcp_release_notes, query_security_threat_feeds, get_feed_statistics, search_feeds_by_keyword

🔍 Service Discovery Tools (8 tools)

  • Discovery: discover_gcp_services, analyze_gcp_service, get_service_resources, suggest_service_analysis
  • Learning: learn_service_from_url, discover_new_gcp_services, register_new_service, learn_from_api_spec

📝 Service Documentation Tools (4 tools)

  • Parsing: parse_service_documentation, discover_new_services, learn_service_from_api_spec_parser, register_custom_service

🚀 Service Onboarding Tools (1 tool)

  • Onboarding: onboard_service

📦 Release Analysis Tools (2 tools)

  • MSA Analysis: analyze_releases, analyze_gcp_releases

⚡ Performance Tools (2 tools)

  • Monitoring: get_cache_statistics - View cache hit rates and performance metrics
  • Management: clear_query_cache - Clear cached results for fresh data

Total: 53 Tools across 10 categories providing comprehensive security analysis, operations, and GCP service management capabilities.

⚡ Performance Optimization

Intelligent Query Caching

The platform now includes automatic query result caching for the most frequently used security tools:

  • get_security_insights_summary() - Cached for 5 minutes
  • query_security_insights() - Cached for 3 minutes
  • get_security_statistics() - Cached for 5 minutes

Benefits:

  • 🚀 3-10x faster response times on repeated queries
  • 💰 Reduced BigQuery costs - fewer query executions
  • 📊 No external dependencies - in-memory caching with file persistence
  • 🔄 Automatic expiration - Fresh data guaranteed within TTL window

Cache Management:

# View cache performance
get_cache_statistics()  # Shows hit rate, cache size, request counts

# Clear cache for fresh data
clear_query_cache()  # Forces next query to fetch fresh results
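The caching behaviour can be sketched as a small TTL cache. This is a minimal in-memory illustration, not the platform's implementation (the real cache also persists results to disk):

```python
import time

class QueryCache:
    """Illustrative TTL cache: serve cached results within the TTL window,
    re-run the query once an entry is missing or expired."""

    def __init__(self):
        self._store = {}   # query -> (expires_at, result)
        self.hits = 0
        self.misses = 0

    def get_or_run(self, query, run, ttl_seconds=300):
        now = time.time()
        entry = self._store.get(query)
        if entry is not None and entry[0] > now:  # fresh entry: cache hit
            self.hits += 1
            return entry[1]
        self.misses += 1                          # missing or expired: re-run
        result = run(query)
        self._store[query] = (now + ttl_seconds, result)
        return result

    def clear(self):
        """Drop all cached results, forcing fresh queries next time."""
        self._store.clear()

cache = QueryCache()
calls = []

def run(q):
    calls.append(q)            # record that the backend was actually queried
    return f"rows for {q}"

first = cache.get_or_run("SELECT 1", run)
second = cache.get_or_run("SELECT 1", run)  # served from cache
```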

📊 BigQuery Schema

Security Findings Table

Dataset: security_insights | Table: security_findings

Columns:

CREATE TABLE security_insights.security_findings (
  id INTEGER,
  name STRING,
  category STRING,
  severity STRING,
  resource_name STRING,
  description STRING,
  recommendation STRING,
  state STRING,
  created_at STRING,
  project_id STRING
)

Example Queries

-- High severity findings
SELECT * FROM `project.security_insights.security_findings`
WHERE severity = 'HIGH'
ORDER BY created_at DESC;

-- Findings by category
SELECT category, COUNT(*) as count
FROM `project.security_insights.security_findings`
GROUP BY category
ORDER BY count DESC;

-- Recent findings (last 24 hours)
-- created_at is stored as STRING, so cast it before comparing with timestamps
SELECT * FROM `project.security_insights.security_findings`
WHERE TIMESTAMP(created_at) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);
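The `query_security_insights` filter strings map onto SQL like the examples above. A minimal sketch of how such a query might be assembled (`build_findings_query` is a hypothetical helper; production code should validate or parameterize the filter rather than interpolating it, to avoid SQL injection):

```python
def build_findings_query(query_filter="", limit=100,
                         table="project.security_insights.security_findings"):
    """Combine an optional WHERE-clause filter and a row limit into a
    full query against the documented findings table."""
    sql = f"SELECT * FROM `{table}`"
    if query_filter:
        sql += f" WHERE {query_filter}"
    # Newest findings first, matching the documented ordering.
    return sql + f" ORDER BY created_at DESC LIMIT {limit}"

sql = build_findings_query("severity = 'HIGH'", limit=10)
```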

🔌 Chainlit Integration

Standalone Usage

chainlit run chainlit_app.py --port 8001

Spotlight prompts

The Chainlit landing screen highlights three high-value workflows:

  • Custom Role Analyzer: Run the custom role analyzer for <project>
  • New Service Onboarding: Onboard a new service using this documentation URL (replace https://example.com/docs with your own)
  • MSA Analyzer: Summarize the latest MSA release analysis

Integrate with Existing Chainlit App

Method 1: One-Line Integration

import chainlit as cl

from chainlit_agent import register_security_agent

@cl.set_chat_profiles
async def chat_profile():
    return register_security_agent(get_my_profiles())

Method 2: Manual Integration

import chainlit as cl

from chainlit_agent import SecurityAgentProfile

@cl.set_chat_profiles
async def chat_profile():
    profiles = SecurityAgentProfile.get_profiles()
    # Add your profiles here
    return profiles

@cl.on_chat_start
async def start():
    await SecurityAgentProfile.on_chat_start()

@cl.on_message
async def main(message: cl.Message):
    await SecurityAgentProfile.on_message(message)

See docs/CHAINLIT_PLUGIN_INTEGRATION.md for details.

🏗️ Architecture

┌─────────────────────────────────────────┐
│           User Interfaces                │
│     Chainlit UI | MCP Server             │
└──────────────────┬──────────────────────┘
                   │
         ┌─────────▼──────────┐
         │   ADK Backend      │
         │  (port 8031)       │
         │  Gemini 2.5 Flash  │
         └─────────┬──────────┘
                   │
    ┌──────────────┼──────────────┐
    │              │              │
┌───▼────┐   ┌────▼─────┐   ┌───▼────┐
│Security│   │BigQuery  │   │Service │
│Tools   │   │Tools     │   │Discovery│
│(3)     │   │(~10)     │   │(~10)   │
└───┬────┘   └────┬─────┘   └───┬────┘
    │             │              │
    └─────────────▼──────────────┘
                  │
         ┌────────▼────────┐
         │    BigQuery     │
         │  Data Platform  │
         └────────┬────────┘
                  │
    ┌─────────────┴─────────────┐
    │                           │
┌───▼──────────┐    ┌──────────▼───┐
│Cloud Functions│   │External APIs │
│(IAM, Compute, │   │(GCP, RSS,    │
│ Storage, etc.)│   │ Confluence)  │
└──────────────┘    └──────────────┘

Key Architectural Principles

  1. Separation of Concerns: Agent queries BigQuery, Cloud Functions populate data
  2. Modular Deployment: Deploy only the Cloud Functions you need
  3. Direct Access: Agent has full BigQuery access for flexible queries
  4. No Coupling: Agent never calls Cloud Functions directly
  5. Scheduled Updates: Cloud Functions run on schedules via Cloud Scheduler

☁️ Cloud Functions (Optional)

Deploy modular Cloud Functions to populate BigQuery with security data:

IAM & Security (5 functions)

  • fetch_iam_accounts - Users, groups, service accounts
  • fetch_service_account_roles - Service account permissions
  • fetch_user_roles - User role assignments
  • fetch_custom_roles - Custom IAM roles
  • fetch_standard_roles - Google-managed roles

Infrastructure (3 functions)

  • fetch_compute_instances - VM security analysis
  • fetch_firewall_rules - Network security, risk scoring
  • fetch_storage_buckets - Storage security

Feeds & Documentation (4 functions)

  • fetch_security_findings - Security Command Center
  • fetch_security_feeds - RSS security feeds
  • fetch_gcp_release_notes - Platform updates
  • confluence_sync - Documentation → BigQuery

See cloud_functions/README.md for deployment instructions.

🧪 Testing & Validation

Dependency Check

# Quick validation (runs in startup script)
python3 -c "import flask, google.cloud.aiplatform, requests, dotenv"

# Comprehensive validation
python3 tests/test_dependencies.py

Test Services

# Start services
./scripts/start_all.sh

# Test ADK Backend
curl http://localhost:8031/health

# Test Chainlit UI
curl http://localhost:8033

📚 Documentation

Getting Started

Integration Guides

Architecture & Development

🔧 Configuration

Environment Variables (.env)

# GCP Configuration (Required)
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=config/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1

# BigQuery Configuration (Required)
BQ_DEFAULT_DATASET=security_insights
BQ_DEFAULT_TABLE=security_findings

# ADK Configuration (Required)
ADK_BASE_URL=http://localhost:8031
ADK_AGENT_MODEL=gemini-2.5-flash
GOOGLE_GENAI_USE_VERTEXAI=1

# Confluence Configuration (Optional)
CONFLUENCE_URL=https://your-domain.atlassian.net
CONFLUENCE_USERNAME=your-email@example.com
CONFLUENCE_API_TOKEN=your-api-token
CONFLUENCE_SPACES=SEC,POLICY,GCP

Chainlit Configuration

Located in .chainlit/config.toml:

[project]
enable_telemetry = false
user_env = []  # Empty for local development

[UI]
name = "GCP Security Agent"
default_collapse_content = true

🔧 Recent Fixes (v1.0.1)

ADK Compatibility

  • ✅ Fixed return types: StructuredToolResponse → str for ADK automatic function calling
  • ✅ ADK requires simple types (str, dict, int) - custom dataclasses not supported

BigQuery Schema

  • ✅ Fixed column reference: resource_type → resource_name
  • ✅ Added schema documentation to tool docstrings

Chainlit

  • ✅ Fixed directory structure: .chainlit file → .chainlit/config.toml directory
  • ✅ Configured user_env = [] for local development
  • ✅ Prevented duplicate ADK session creation

See CHANGELOG.md for complete version history.

📝 Example Usage

Natural Language Queries (via Chainlit)

"Show me security findings from the last 24 hours"
"List all HIGH severity vulnerabilities"
"Get security statistics grouped by category"
"Find findings related to storage buckets"
"What are the most common security issues?"

Programmatic Access (via Python)

import requests

# Query ADK backend
response = requests.post('http://localhost:8031/run', json={
    'user_id': 'test-user',
    'message': 'Show me high severity findings'
})

results = response.json()
print(results)

🚢 Production Deployment

Deploy to Cloud Run

# Build container
gcloud builds submit --tag gcr.io/$PROJECT_ID/security-agent

# Deploy security agent with Chainlit UI
gcloud run deploy security-agent \
  --image gcr.io/$PROJECT_ID/security-agent \
  --port 8033 \
  --set-env-vars GOOGLE_CLOUD_PROJECT=$PROJECT_ID

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is part of the Google ADK Python samples.

🙏 Acknowledgments

  • Google Cloud Platform team for the ADK framework
  • Gemini team for powerful language models
  • All contributors to the security platform

Status: ✅ Production Ready (v1.1.0) | Last Updated: October 7, 2025 | Built with ❤️ for GCP Security