Module 1: Azure Blob Storage Implementation

Overview

This module implements Azure Blob Storage as the document repository for TrustGuard. All uploaded documents (PDFs, images, audio, etc.) are stored here and remain the source of truth.

Architecture

User Upload
    ↓
FastAPI (routes/documents.py)
    ↓
BlobStorageService (services/blob_storage.py)
    ↓
Azure Blob Storage Containers:
  - trustguard-documents   (raw uploads)
  - trustguard-chunks      (processed text chunks)
  - trustguard-temp        (temporary processing files)

Components

1. Infrastructure (Bicep)

File: infrastructure/main.bicep

Creates:

  • Storage Account (Standard_LRS, Hot tier)
  • 3 Containers:
    • trustguard-documents - Raw uploads
    • trustguard-chunks - Processed chunks
    • trustguard-temp - Temp files (auto-deleted after 7 days)
  • Lifecycle policy for auto-cleanup

# Deploy
cd infrastructure
./deploy.sh trustguard-rg eastus

2. Python Service (BlobStorageService)

File: backend/services/blob_storage.py

Core functionality:

  • upload_blob() - Upload files with metadata
  • download_blob() - Retrieve file content
  • list_blobs() - List documents with filtering
  • delete_blob() - Remove documents
  • generate_blob_sas_url() - Create temporary download links
  • generate_container_sas_url() - Container-level SAS
  • get_blob_properties() - Retrieve metadata
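
As an illustration of what upload_blob() might do internally, here is a minimal sketch. It is not the code in backend/services/blob_storage.py: make_blob_name and this function signature are hypothetical, the folder/filename naming is inferred from the "documents/claim_123.pdf" example in this document, and overwrite=True is an assumption. The service_client argument is expected to be an azure.storage.blob.BlobServiceClient.

```python
from typing import Dict, Optional


def make_blob_name(folder: str, filename: str) -> str:
    """Join folder and filename into a blob name such as 'documents/claim_123.pdf'."""
    folder = folder.strip("/")
    # Keep only the last path component so a client-supplied path cannot escape the folder.
    filename = filename.replace("\\", "/").rsplit("/", 1)[-1]
    if not filename:
        raise ValueError("filename must not be empty")
    return f"{folder}/{filename}" if folder else filename


def upload_blob(service_client, container_name: str, folder: str, filename: str,
                data: bytes, metadata: Optional[Dict[str, str]] = None) -> str:
    """Upload data with optional metadata and return the resulting blob name."""
    blob_name = make_blob_name(folder, filename)
    blob_client = service_client.get_blob_client(container=container_name, blob=blob_name)
    # Assumption: re-uploading the same name replaces the existing blob.
    blob_client.upload_blob(data, overwrite=True, metadata=metadata or {})
    return blob_name
```

Keeping the name-building step separate makes it easy to unit test without an Azure connection.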

Authentication: uses one of the following:

  • Managed Identity (production) - No credentials needed
  • Storage Account Key (development)
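
The choice between the two authentication modes can be sketched as follows. This is a minimal example under stated assumptions, not the service's actual code: resolve_storage_auth and build_blob_service_client are hypothetical names, and the rule "an empty STORAGE_ACCOUNT_KEY means Managed Identity" follows the Configuration section of this document. It relies on the azure-storage-blob and azure-identity packages, imported lazily.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_storage_auth(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (account_url, account_key); account_key is None when no key is configured."""
    account_name = env["STORAGE_ACCOUNT_NAME"]
    account_key = env.get("STORAGE_ACCOUNT_KEY", "").strip() or None
    account_url = f"https://{account_name}.blob.core.windows.net"
    return account_url, account_key


def build_blob_service_client(env: Mapping[str, str] = os.environ):
    """Create a BlobServiceClient using the account key if set, else Azure AD credentials."""
    # Imported lazily so resolve_storage_auth can be used without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    account_url, account_key = resolve_storage_auth(env)
    if account_key:
        # Development: shared key authentication.
        return BlobServiceClient(account_url=account_url, credential=account_key)

    # Production: DefaultAzureCredential picks up the Managed Identity when one is assigned.
    from azure.identity import DefaultAzureCredential
    return BlobServiceClient(account_url=account_url, credential=DefaultAzureCredential())
```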

3. FastAPI Routes

File: backend/routes/documents.py

Endpoints:

  • POST /api/v1/documents/upload - Upload document
  • GET /api/v1/documents/list - List documents
  • GET /api/v1/documents/download/{folder}/{filename} - Download
  • POST /api/v1/documents/sas-url/{folder}/{filename} - Generate temp URL
  • DELETE /api/v1/documents/{folder}/{filename} - Delete document
  • GET /api/v1/documents/{folder}/{filename}/properties - Get metadata

Usage Examples

Upload a Document

curl -X POST "http://localhost:8000/api/v1/documents/upload?folder=documents" \
  -F "file=@claim_123.pdf"

Response:

{
  "blob_name": "documents/claim_123.pdf",
  "container_name": "trustguard-documents",
  "blob_uri": "https://trustguardstg.blob.core.windows.net/trustguard-documents/documents/claim_123.pdf",
  "size": 1024576,
  "created": "2025-11-23T10:30:00+00:00"
}
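
The same upload can be done from Python using only the standard library. The sketch below mirrors the curl call above: the form field name "file", the folder query parameter, and the localhost base URL come from that command, while encode_multipart_file and upload_document are hypothetical helper names and the application/pdf content type is an assumption.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid
from typing import Tuple


def encode_multipart_file(field_name: str, filename: str, content: bytes,
                          content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file multipart/form-data request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_document(path: str, folder: str = "documents",
                    base_url: str = "http://localhost:8000") -> dict:
    """POST a local file to the upload endpoint and return the parsed JSON response."""
    with open(path, "rb") as fh:
        body, header = encode_multipart_file(
            "file", os.path.basename(path), fh.read(), "application/pdf"
        )
    query = urllib.parse.urlencode({"folder": folder})
    request = urllib.request.Request(
        f"{base_url}/api/v1/documents/upload?{query}",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API running locally, upload_document("claim_123.pdf") should return a dictionary shaped like the response shown above.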

List Documents

curl "http://localhost:8000/api/v1/documents/list?folder=documents"

Generate Temporary Download Link

curl -X POST "http://localhost:8000/api/v1/documents/sas-url/documents/claim_123.pdf?expires_in_hours=24"

Response:

{
  "blob_name": "documents/claim_123.pdf",
  "sas_url": "https://trustguardstg.blob.core.windows.net/trustguard-documents/documents/claim_123.pdf?sv=2023-01-01&...",
  "expires_in_hours": 24
}
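
For reference, a read-only SAS URL like the one above could be produced with the azure-storage-blob SDK roughly as follows. This is a sketch under assumptions rather than the body of generate_blob_sas_url(): sas_expiry and build_read_sas_url are hypothetical names, and the example signs with the storage account key, which matches the development setup. With Managed Identity there is no account key, so a user delegation key would be needed instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import quote


def sas_expiry(expires_in_hours: int, now: Optional[datetime] = None) -> datetime:
    """Return the UTC expiry time for a SAS token valid for expires_in_hours."""
    if expires_in_hours <= 0:
        raise ValueError("expires_in_hours must be positive")
    now = now or datetime.now(timezone.utc)
    return now + timedelta(hours=expires_in_hours)


def build_read_sas_url(account_name: str, account_key: str, container_name: str,
                       blob_name: str, expires_in_hours: int = 24) -> str:
    """Return a time-limited, read-only download URL for a single blob."""
    # Imported lazily so sas_expiry can be tested without the Azure SDK installed.
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    token = generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        blob_name=blob_name,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),  # download only, no write or delete
        expiry=sas_expiry(expires_in_hours),
    )
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container_name}/{quote(blob_name)}?{token}"
    )
```

Granting only the read permission keeps a leaked link from being used to modify or delete the document.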

Security Best Practices

What we've implemented:

  • No public blob access (allowBlobPublicAccess: false)
  • HTTPS-only (supportsHttpsTrafficOnly: true)
  • Managed Identity for production
  • SAS tokens for temporary access
  • Encryption at rest (default)

⚠️ What you should add:

  • RBAC roles (Storage Blob Data Contributor)
  • Network restrictions (Firewall, VNets)
  • Soft delete policy
  • Private endpoints
  • Audit logging to Log Analytics

Configuration

Environment Variables

# Storage Account
STORAGE_ACCOUNT_NAME=trustguardstg          # Set by deployment
STORAGE_ACCOUNT_KEY=                        # Leave empty for Managed Identity
BLOB_CONTAINER_NAME=trustguard-documents

Managed Identity Setup (Production)

  1. Enable Managed Identity on Container Apps
  2. Assign role: "Storage Blob Data Contributor"
  3. Leave STORAGE_ACCOUNT_KEY empty in environment

Deployment Commands

Option 1: Bicep Deployment

cd infrastructure
./deploy.sh trustguard-rg eastus

Option 2: Azure CLI

az group create --name trustguard-rg --location eastus

az deployment group create \
  --resource-group trustguard-rg \
  --template-file main.bicep \
  --parameters parameters.json

Testing

Local Testing

# Install dependencies
cd backend
pip install -r ../requirements.txt

# Run FastAPI
uvicorn main:app --reload

# Upload test file
curl -X POST "http://localhost:8000/api/v1/documents/upload?folder=documents" \
  -F "file=@sample.pdf"

# List documents
curl "http://localhost:8000/api/v1/documents/list"

Using Postman

Import the collection from docs/postman/trustguard-module1.postman_collection.json (this file has not been created yet; it is the next item to add).

Costs

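Monthly estimate (Standard_LRS, Hot tier):

  • Storage: $0.018 per GB per month
  • Transactions: ~$0.0004 per 10K operations
  • Example: 100 GB + 1M transactions = (100 × $0.018) + (100 × $0.0004) ≈ $1.84/month at the rates above. Actual prices vary by region and operation type (writes cost more than reads), so confirm with the Azure pricing calculator.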

This is well within Azure student credits!
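
To check an estimate for a different workload, the listed per-unit rates can be plugged into a short calculation. The rates below are copied from this section and should be treated as assumptions rather than current Azure prices; monthly_cost_usd is a hypothetical helper.

```python
# Rates as listed in this section (assumed; verify against current Azure pricing).
STORAGE_USD_PER_GB_MONTH = 0.018
TRANSACTION_USD_PER_10K = 0.0004


def monthly_cost_usd(stored_gb: float, transactions: int) -> float:
    """Estimate the monthly cost from stored volume and operation count."""
    storage_cost = stored_gb * STORAGE_USD_PER_GB_MONTH
    transaction_cost = (transactions / 10_000) * TRANSACTION_USD_PER_10K
    return round(storage_cost + transaction_cost, 2)


print(monthly_cost_usd(100, 1_000_000))  # 1.84
```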

Next Steps

Module 2 will implement Azure Functions to:

  • Trigger on blob uploads
  • Extract text using Document Intelligence
  • Store chunks in the trustguard-chunks container

Learning Outcomes:

  • ✅ Understand Azure Blob Storage architecture
  • ✅ Implement storage service with Python SDK
  • ✅ Create FastAPI endpoints for file operations
  • ✅ Generate SAS tokens for temporary access
  • ✅ Deploy infrastructure with Bicep