📊 Riksdagsmonitor — Future Data Architecture Model

🚀 Evolution: CSV → API Gateway → AWS Serverless Intelligence
🎯 Neptune Graph · Aurora Relational · OpenSearch Vector · Bedrock AI

📋 Document Owner: CEO | 📄 Version: 2.0 | 📅 Last Updated: 2026-02-24 (UTC)
🔄 Review Cycle: Annual | ⏰ Next Review: 2027-02-24
🏢 Owner: Hack23 AB (Org.nr 5595347807) | 🏷️ Classification: Public

📚 Architecture Documentation Map

Document	Type	Description
Architecture	🏛️ Current	C4 model showing system structure
Data Model	📊 Current	Data entities and relationships
Flowcharts	🔄 Current	Process flows and pipelines
State Diagrams	🔄 Current	System state transitions
Mindmap	🗺️ Current	System conceptual map
SWOT	💼 Current	Strategic analysis
Future Architecture	🏗️ Future	System evolution roadmap
Future Data Model	📊 Future	Enhanced data architecture (this doc)
Future Flowcharts	🔄 Future	Advanced process flows
Future State Diagrams	🔄 Future	Advanced state management
Future Mindmap	🗺️ Future	Future capability map
Future SWOT	💼 Future	Strategic outlook
Security Architecture	🛡️ Security	Defense-in-depth controls
Future Security Architecture	🛡️ Future	Security roadmap
Threat Model	🎯 Security	STRIDE analysis

📊 Executive Summary

Riksdagsmonitor's data architecture will evolve between 2026 and 2037 from static CSV files to a fully managed AWS serverless intelligence platform. The transformation enables real-time political analytics, AI-powered insights, and scalable processing of Swedish parliamentary data.

Strategic Vision (2026-2037):

🔄 Phase 1 (2026-2027): CSV → CIA JSON API Gateway integration
☁️ Phase 2 (2027-2028): AWS Serverless migration (Neptune, Aurora, DynamoDB, OpenSearch)
🤖 Phase 3 (2028-2030): AI/ML with Amazon Bedrock (embeddings, RAG, forecasting)
📊 Phase 4 (2030-2032): Advanced analytics with Timestream and real-time streaming
🧠 Phase 5 (2033-2035): Pre-AGI data architecture with autonomous schema evolution
🌐 Phase 6 (2036-2037): AGI-era data platform supporting 195 global parliaments

Key Transformations:

Aspect	Current (2026)	Future (2037)
Data Source	CIA CSV exports (static)	CIA JSON API Gateway (real-time)
Database	GitHub repository files	Neptune Graph + Aurora Serverless v2
Search	Text matching	OpenSearch Serverless + semantic vectors
Query	JavaScript filters	AWS AppSync GraphQL API
Analytics	Static aggregations	Timestream time-series + Lambda analytics
AI/ML	None	Bedrock Titan Embeddings (1024-dim) + RAG
Scale	109K documents	100M+ documents with global parliament coverage
Compute	Static site	AWS Lambda serverless functions
Orchestration	GitHub Actions	AWS Step Functions

Current Baseline:

2,494 Politicians → Future: Complete career graphs in Neptune
3.5M+ Voting Records → Future: Real-time vote prediction models
109K Documents → Future: Semantic search with Bedrock embeddings
19 CIA Products → Future: 100+ intelligence products via API Gateway

🔄 1. Current State vs Future State

1.1 Architecture Comparison

Component	Current (2026)	Phase 2 (2028)	Phase 4 (2032)
Data Ingestion	Manual CSV downloads	CIA API Gateway polling	Real-time event streaming
Storage Layer	GitHub repo (< 1GB)	Aurora 100GB + Neptune 500GB	Aurora 500GB + Neptune 5TB
Graph Database	None	Neptune Serverless (Gremlin)	Neptune Analytics + ML
Relational DB	None	Aurora Serverless v2 (PostgreSQL)	Aurora Global Database
Vector Search	None	OpenSearch Serverless	OpenSearch + Bedrock KB
Time-Series	None	Timestream (historical trends)	Timestream (forecasting)
API Layer	Static JSON files	AppSync GraphQL	AppSync + Lambda resolvers
AI/ML	None	Bedrock Titan Embeddings	Bedrock + SageMaker
Compute	Static site	Lambda functions	Lambda + Step Functions
Monitoring	None	CloudWatch Logs	CloudWatch + X-Ray tracing

1.2 Data Volume Projections

Metric	2026	2028	2032
Politicians	2,494	10,000	50,000
Voting Records	3.5M	10M	100M
Documents	109K	500K	10M
Graph Relationships	0	5M	100M
Vector Embeddings	0	500K	10M
API Requests/Day	0	10K	1M
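
The projections above imply steep compound growth. The following is a minimal sketch of that arithmetic, assuming growth compounds once per year between the 2026 baseline and the 2032 target. The helper name `impliedAnnualGrowth` is illustrative and not part of the platform. Metrics that start at 0 (graph relationships, embeddings, API requests) have no defined growth rate and are left out.

```javascript
// Hypothetical helper: compound annual growth rate implied by two data points.
// Assumes growth compounds once per year over `years` years.
function impliedAnnualGrowth(startValue, endValue, years) {
  if (startValue <= 0 || years <= 0) {
    throw new RangeError('startValue and years must be positive');
  }
  return Math.pow(endValue / startValue, 1 / years) - 1;
}

// Figures taken from the projection table (2026 to 2032 is six years).
const projections = {
  politicians: [2494, 50000],
  votingRecords: [3.5e6, 100e6],
  documents: [109e3, 10e6],
};

for (const [metric, [from, to]] of Object.entries(projections)) {
  const rate = impliedAnnualGrowth(from, to, 6);
  console.log(`${metric}: ${(rate * 100).toFixed(0)}% per year`);
}
```

Rates of this size suggest that capacity planning should be revisited at each phase boundary rather than fixed up front.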

☁️ 2. AWS Serverless Data Architecture

2.1 Amazon Neptune Serverless (Graph Database)

Purpose: Store political relationships, influence networks, coalition patterns.

2.1.1 Core Node Types

// Politician Vertex
g.addV('Politician').
  property('person_id', '0479479309').
  property('first_name', 'Anna').
  property('last_name', 'Svensson').
  property('party', 'S').
  property('born_year', 1975).
  property('district', 'Stockholm').
  property('risk_score', 42.5).
  property('risk_level', 'MEDIUM')

// Party Vertex
g.addV('Party').
  property('party_id', 'S').
  property('party_name', 'Socialdemokraterna').
  property('founded_year', 1889).
  property('current_seats', 107)

// Document Vertex
g.addV('Document').
  property('document_id', 'H901FiU1').
  property('document_type', 'bet').
  property('title', 'Finansutskottets betänkande').
  property('published_date', '2024-11-15').
  property('status', 'BESLUTAD')

// Vote Vertex
g.addV('Vote').
  property('vote_id', 'V202400123').
  property('ballot_id', 'B20240056').
  property('vote', 'Ja').
  property('vote_date', '2024-11-20').
  property('is_rebel_vote', false)

// Committee Vertex
g.addV('Committee').
  property('committee_id', 'FiU').
  property('committee_name', 'Finansutskottet').
  property('established_year', 1867).
  property('total_members', 17)

2.1.2 Relationship Edges

// Political relationships
// Child traversals passed to to() must be anonymous (__), not spawned from g
g.V().has('Politician','person_id','0479479309').
  addE('MEMBER_OF').property('since', '2018-01-01').
  to(__.V().has('Party','party_id','S'))

g.V().has('Politician','person_id','0479479309').
  addE('CAST_VOTE').property('vote', 'Ja').
  to(__.V().has('Vote','vote_id','V202400123'))

g.V().has('Politician','person_id','0479479309').
  addE('AUTHORED').property('author_order', 1).
  to(__.V().has('Document','document_id','H901FiU1'))

// Coalition edges
g.V().has('Party','party_id','M').
  addE('COALITION_WITH').property('government_id', 'GOV_2022').
  to(__.V().has('Party','party_id','SD'))

// Influence network
g.V().has('Politician','person_id','P1').
  addE('INFLUENCES').property('strength', 0.75).
  to(__.V().has('Politician','person_id','P2'))

2.1.3 Gremlin Query Examples

Example 1: Find MPs with the most rebel votes

// is_rebel_vote lives on the Vote vertex, so traverse to it instead of filtering the edge
g.V().hasLabel('Politician').
  project('name','party','rebel_count').
    by(values('first_name','last_name').fold()).
    by(values('party')).
    by(out('CAST_VOTE').has('is_rebel_vote', true).count()).
  order().by(select('rebel_count'), desc).
  limit(10)

Example 2: Coalition formation patterns

g.V().hasLabel('Party').as('party1').
  outE('COALITION_WITH').as('coalition').
  inV().as('party2').
  group().
    by(select('party1').values('party_name')).
    by(select('party2').values('party_name').fold()).
  unfold()

Example 3: Document influence cascades

g.V().has('Document','document_type','prop').
  repeat(out('REFERENCES')).
  times(3).
  path().
  by('title').
  limit(20)

2.2 Amazon Aurora Serverless v2 (Relational)

Purpose: Core structured data with ACID guarantees (politicians, parties, documents, votes).

2.2.1 Critical Tables Schema

Parties Table

CREATE TABLE parties (
    party_id VARCHAR(10) PRIMARY KEY,
    party_name VARCHAR(200) NOT NULL,
    party_name_en VARCHAR(200),
    founded_year INTEGER,
    ideology VARCHAR(200),
    riksdag_status VARCHAR(50),
    avg_win_rate DECIMAL(5,2),
    current_seats INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Politicians Table

-- Created after parties so that the foreign key on party resolves
CREATE TABLE politicians (
    person_id VARCHAR(20) PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    party VARCHAR(10) REFERENCES parties(party_id),
    born_year INTEGER,
    gender VARCHAR(20),
    status VARCHAR(100),
    district VARCHAR(100),
    risk_score DECIMAL(5,2),
    risk_level VARCHAR(20),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_politicians_party ON politicians(party);
CREATE INDEX idx_politicians_risk ON politicians(risk_level, risk_score);

Documents Table

CREATE TABLE documents (
    document_id VARCHAR(50) PRIMARY KEY,
    document_type VARCHAR(20) NOT NULL,
    title TEXT NOT NULL,
    subtitle TEXT,
    summary TEXT,
    published_date DATE,
    rm VARCHAR(20),
    organ VARCHAR(20),
    status VARCHAR(50),
    fulltext TEXT,
    embedding_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_documents_type ON documents(document_type);
CREATE INDEX idx_documents_date ON documents(published_date DESC);
CREATE INDEX idx_documents_organ ON documents(organ);
CREATE INDEX idx_documents_fulltext ON documents USING GIN (
    to_tsvector('simple', 
        coalesce(title, '') || ' ' || 
        coalesce(summary, '') || ' ' || 
        coalesce(fulltext, '')
    )
);

Votes Table

CREATE TABLE votes (
    vote_id VARCHAR(50) PRIMARY KEY,
    ballot_id VARCHAR(50) NOT NULL,
    person_id VARCHAR(20) REFERENCES politicians(person_id),
    party VARCHAR(10) REFERENCES parties(party_id),
    vote VARCHAR(20) NOT NULL,
    vote_date DATE NOT NULL,
    vote_time TIME,
    is_rebel_vote BOOLEAN DEFAULT FALSE,
    is_winning_vote BOOLEAN,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_votes_person ON votes(person_id);
CREATE INDEX idx_votes_ballot ON votes(ballot_id);
CREATE INDEX idx_votes_date ON votes(vote_date DESC);
CREATE INDEX idx_votes_rebel ON votes(is_rebel_vote) WHERE is_rebel_vote = TRUE;

Committees Table

CREATE TABLE committees (
    committee_id VARCHAR(20) PRIMARY KEY,
    committee_name VARCHAR(200) NOT NULL,
    committee_name_en VARCHAR(200),
    established_year INTEGER,
    total_members INTEGER,
    productivity_score DECIMAL(5,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

2.2.2 Key Performance Queries

Most Active Politicians

SELECT 
    p.person_id,
    p.first_name,
    p.last_name,
    p.party,
    COUNT(DISTINCT v.vote_id) as vote_count,
    COUNT(DISTINCT CASE WHEN v.is_rebel_vote THEN v.vote_id END) as rebel_count
FROM politicians p
LEFT JOIN votes v ON v.person_id = p.person_id
WHERE p.status = 'Tjänstgörande riksdagsledamot'
GROUP BY p.person_id, p.first_name, p.last_name, p.party
ORDER BY vote_count DESC
LIMIT 20;

Party Voting Discipline

SELECT 
    p.party,
    pa.party_name,
    COUNT(v.vote_id) as total_votes,
    COUNT(CASE WHEN v.is_rebel_vote THEN 1 END) as rebel_votes,
    ROUND(COUNT(CASE WHEN v.is_rebel_vote THEN 1 END) * 100.0 / COUNT(v.vote_id), 2) as rebel_rate
FROM votes v
JOIN politicians p ON p.person_id = v.person_id
JOIN parties pa ON pa.party_id = p.party
WHERE v.vote_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY p.party, pa.party_name
ORDER BY rebel_rate DESC;
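
Both queries aggregate the `is_rebel_vote` flag, which has to be derived during ingestion. The sketch below shows one way to do that, under an assumption that the source does not state: a vote counts as a rebel vote when it differs from the most common vote cast by the member's own party on the same ballot, and absences are never flagged. The function name `flagRebelVotes` is hypothetical.

```javascript
// Hypothetical derivation of is_rebel_vote.
// Assumption (not from the source): a vote is "rebel" when it differs from the
// most common vote within the member's party on the same ballot.
function flagRebelVotes(votes) {
  // Tally votes per (ballot, party) to find each party's majority position.
  const tallies = new Map();
  for (const v of votes) {
    if (v.vote === 'Frånvarande') continue; // absences do not define the party line
    const key = `${v.ballot_id}|${v.party}`;
    const counts = tallies.get(key) ?? {};
    counts[v.vote] = (counts[v.vote] ?? 0) + 1;
    tallies.set(key, counts);
  }

  return votes.map((v) => {
    const counts = tallies.get(`${v.ballot_id}|${v.party}`);
    if (!counts || v.vote === 'Frånvarande') {
      return { ...v, is_rebel_vote: false };
    }
    // Party line = option with the highest count (on a tie, the first one seen wins).
    const partyLine = Object.entries(counts).sort((a, b) => b[1] - a[1])[0][0];
    return { ...v, is_rebel_vote: v.vote !== partyLine };
  });
}
```

Computing the flag once at ingestion keeps the SQL above simple; a different definition of rebellion only requires changing this function and re-running the import.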

2.3 Amazon DynamoDB (NoSQL)

Purpose: Low-latency real-time data access, session storage, API caching.

2.3.1 Table Designs

Politician Profiles (Fast Lookup)

{
  "TableName": "PoliticianProfiles",
  "KeySchema": [
    {"AttributeName": "person_id", "KeyType": "HASH"}
  ],
  "AttributeDefinitions": [
    {"AttributeName": "person_id", "AttributeType": "S"},
    {"AttributeName": "party", "AttributeType": "S"},
    {"AttributeName": "risk_level", "AttributeType": "S"}
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "PartyIndex",
      "KeySchema": [
        {"AttributeName": "party", "KeyType": "HASH"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    },
    {
      "IndexName": "RiskIndex",
      "KeySchema": [
        {"AttributeName": "risk_level", "KeyType": "HASH"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    }
  ]
}

Recent Votes (Time-Ordered)

{
  "TableName": "RecentVotes",
  "KeySchema": [
    {"AttributeName": "ballot_id", "KeyType": "HASH"},
    {"AttributeName": "person_id", "KeyType": "RANGE"}
  ],
  "AttributeDefinitions": [
    {"AttributeName": "ballot_id", "AttributeType": "S"},
    {"AttributeName": "person_id", "AttributeType": "S"},
    {"AttributeName": "vote_date", "AttributeType": "S"}
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "DateIndex",
      "KeySchema": [
        {"AttributeName": "vote_date", "KeyType": "HASH"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    }
  ],
  "TimeToLiveSpecification": {
    "Enabled": true,
    "AttributeName": "expiration_time"
  }
}

API Response Cache

{
  "TableName": "APICache",
  "KeySchema": [
    {"AttributeName": "cache_key", "KeyType": "HASH"}
  ],
  "AttributeDefinitions": [
    {"AttributeName": "cache_key", "AttributeType": "S"}
  ],
  "TimeToLiveSpecification": {
    "Enabled": true,
    "AttributeName": "ttl"
  }
}
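
DynamoDB TTL expects the TTL attribute to be a Number holding an epoch timestamp in seconds. The sketch below shows how items for the APICache table could be built under that rule. The helper names, the `payload` attribute, and the 300-second default are assumptions for illustration.

```javascript
// Hypothetical helpers for the APICache table.

// Deterministic cache key: route plus query parameters sorted by name,
// so ?a=1&b=2 and ?b=2&a=1 map to the same cache entry.
function buildCacheKey(route, params = {}) {
  const query = Object.keys(params)
    .sort()
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join('&');
  return query ? `${route}?${query}` : route;
}

// DynamoDB TTL reads a Number attribute containing epoch time in seconds.
function ttlFromNow(ttlSeconds, nowMs = Date.now()) {
  return Math.floor(nowMs / 1000) + ttlSeconds;
}

// Item shape for the APICache table; `payload` is an assumed attribute name.
function buildCacheItem(route, params, payload, ttlSeconds = 300) {
  return {
    cache_key: buildCacheKey(route, params),
    payload: JSON.stringify(payload),
    ttl: ttlFromNow(ttlSeconds),
  };
}
```

Because DynamoDB removes expired items in the background rather than at the exact expiry time, a reader of the cache may still want to compare `ttl` with the current time before trusting an entry.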

2.3.2 Access Patterns

Get Politician Profile

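import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

// The document client converts plain JavaScript values to DynamoDB attribute types
const dynamodb = DynamoDBDocumentClient.from(new DynamoDBClient({ region: "us-east-1" }));

const result = await dynamodb.send(new GetCommand({
  TableName: 'PoliticianProfiles',
  Key: { person_id: '0479479309' }
}));

Query Party Members

import { QueryCommand } from "@aws-sdk/lib-dynamodb";

// Reuses the `dynamodb` document client created above
const result = await dynamodb.send(new QueryCommand({
  TableName: 'PoliticianProfiles',
  IndexName: 'PartyIndex',
  KeyConditionExpression: 'party = :party',
  ExpressionAttributeValues: { ':party': 'S' }
}));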

2.4 Amazon OpenSearch Serverless (Search/Vector)

Purpose: Full-text search, semantic search with vector embeddings, aggregations.

2.4.1 Index Mappings

Documents Index

{
  "mappings": {
    "properties": {
      "document_id": {"type": "keyword"},
      "document_type": {"type": "keyword"},
      "title": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword"}
        },
        "analyzer": "swedish"
      },
      "summary": {"type": "text", "analyzer": "swedish"},
      "fulltext": {"type": "text", "analyzer": "swedish"},
      "published_date": {"type": "date"},
      "rm": {"type": "keyword"},
      "organ": {"type": "keyword"},
      "status": {"type": "keyword"},
      "authors": {"type": "keyword"},
      "party": {"type": "keyword"},
      "embedding_vector": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}

Politicians Index

{
  "mappings": {
    "properties": {
      "person_id": {"type": "keyword"},
      "full_name": {"type": "text", "analyzer": "swedish"},
      "party": {"type": "keyword"},
      "district": {"type": "keyword"},
      "risk_level": {"type": "keyword"},
      "risk_score": {"type": "float"}
    }
  }
}

2.4.2 Query Examples

Full-Text Search

{
  "query": {
    "multi_match": {
      "query": "budget finanspolitik",
      "fields": ["title^3", "summary^2", "fulltext"],
      "type": "best_fields",
      "operator": "and"
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "summary": {}
    }
  }
}

Semantic Vector Search with Bedrock Embeddings

{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "embedding_vector": {
              "vector": [/* 1024-dim query vector from Bedrock */],
              "k": 10
            }
          }
        }
      ],
      "filter": [
        {"term": {"document_type": "prop"}},
        {"range": {"published_date": {"gte": "2024-01-01"}}}
      ]
    }
  }
}
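
The k-NN query ranks documents by cosine similarity between the query embedding and each stored `embedding_vector`. The sketch below spells out that score and a brute-force top-k ranking, which is the exact result that an HNSW index approximates. The function names are illustrative, and the two-dimensional vectors in the usage note stand in for real embeddings.

```javascript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive top-k ranking over an in-memory list of { id, vector } documents.
function topK(queryVector, docs, k) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(queryVector, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, `topK([1, 0], [{ id: 'a', vector: [0, 1] }, { id: 'b', vector: [1, 0] }], 1)` ranks document `b` first, because it points in the same direction as the query.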

Aggregations (Party Distribution)

{
  "query": {"match_all": {}},
  "aggs": {
    "by_party": {
      "terms": {"field": "party", "size": 10},
      "aggs": {
        "avg_risk": {"avg": {"field": "risk_score"}}
      }
    }
  }
}

2.5 Amazon Timestream (Time-Series)

Purpose: Historical trends, vote patterns over time, forecasting data.

2.5.1 Table Schema

Vote Trends Table

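-- Logical schemas only (both tables in this section): Timestream has no CREATE TABLE DDL.
-- Tables are created through the API or console, and columns map to dimensions and measures.
CREATE TABLE VoteTrends (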
    ballot_id VARCHAR,
    vote_date TIMESTAMP,
    party VARCHAR,
    vote_type VARCHAR,  -- Ja, Nej, Avstår
    vote_count BIGINT,
    rebel_count BIGINT,
    PRIMARY KEY (ballot_id, vote_date)
);

Party Popularity Trends

CREATE TABLE PartyPopularityTrends (
    party VARCHAR,
    measurement_date TIMESTAMP,
    polling_percentage DOUBLE,
    riksdag_seats INTEGER,
    approval_rating DOUBLE,
    PRIMARY KEY (party, measurement_date)
);

2.5.2 Query Examples

Party Voting Patterns (Last 90 Days)

SELECT 
    party,
    BIN(vote_date, 7d) as week,
    SUM(vote_count) as total_votes,
    SUM(rebel_count) as total_rebels,
    SUM(rebel_count) * 100.0 / SUM(vote_count) as rebel_rate
FROM VoteTrends
WHERE vote_date > ago(90d)
GROUP BY party, BIN(vote_date, 7d)
ORDER BY party, week DESC;
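
The weekly aggregation in this query can be mirrored in application code, for example to unit-test expected results before data reaches Timestream. The sketch below assumes that 7-day bins are aligned to the Unix epoch (each timestamp rounded down to a multiple of seven days), so buckets start on Thursdays; verify this against the Timestream behaviour before relying on it. The function names are hypothetical.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Round a timestamp down to the start of its epoch-aligned 7-day bucket.
function binWeek(date) {
  return new Date(Math.floor(date.getTime() / WEEK_MS) * WEEK_MS);
}

// rows: [{ party, vote_date (ISO date string), vote_count, rebel_count }]
function weeklyRebelRates(rows) {
  const buckets = new Map();
  for (const r of rows) {
    const week = binWeek(new Date(r.vote_date)).toISOString().slice(0, 10);
    const key = `${r.party}|${week}`;
    const bucket = buckets.get(key) ??
      { party: r.party, week, total_votes: 0, total_rebels: 0 };
    bucket.total_votes += r.vote_count;
    bucket.total_rebels += r.rebel_count;
    buckets.set(key, bucket);
  }
  return [...buckets.values()].map((b) => ({
    ...b,
    // Return null for empty buckets instead of dividing by zero.
    rebel_rate: b.total_votes === 0 ? null : (b.total_rebels * 100) / b.total_votes,
  }));
}
```

Unlike the SQL version, this sketch guards explicitly against buckets whose vote count is zero.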

Trending Topics

SELECT 
    topic,
    COUNT(*) as mention_count,
    BIN(published_date, 1d) as day
FROM DocumentTopics
WHERE published_date > ago(30d)
GROUP BY topic, BIN(published_date, 1d)
ORDER BY mention_count DESC
LIMIT 10;

2.6 Amazon Bedrock (AI/ML)

Purpose: Text embeddings, semantic search, RAG (Retrieval Augmented Generation), content generation.

2.6.1 Titan Embeddings v2

Generate 1024-Dimensional Embeddings

import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

async function generateEmbedding(text) {
  // Titan Text Embeddings V2 outputs 256, 512, or 1024 dimensions;
  // 8,192 is its input token limit, not a vector size.
  const command = new InvokeModelCommand({
    modelId: "amazon.titan-embed-text-v2:0",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      inputText: text,
      dimensions: 1024,
      normalize: true
    })
  });
  
  const response = await client.send(command);
  const responseBody = JSON.parse(new TextDecoder().decode(response.body));
  return responseBody.embedding; // 1024-dimensional vector
}

Embed Document for Semantic Search

// Initialize OpenSearch Serverless client with SigV4 request signing
const { Client } = require('@opensearch-project/opensearch');
const { AwsSigv4Signer } = require('@opensearch-project/opensearch/aws');
const { defaultProvider } = require('@aws-sdk/credential-provider-node');

const opensearch = new Client({
  ...AwsSigv4Signer({
    region: 'us-east-1',
    service: 'aoss', // 'aoss' targets OpenSearch Serverless
    getCredentials: () => defaultProvider()()
  }),
  node: process.env.OPENSEARCH_ENDPOINT
});

async function embedDocument(document) {
  // Join the available text fields; cap fulltext at 10,000 characters
  const fullText = [
    document.title,
    document.subtitle,
    document.summary,
    (document.fulltext ?? '').slice(0, 10000)
  ].filter(Boolean).join("\n\n");
  
  const embedding = await generateEmbedding(fullText);
  
  // Store in OpenSearch
  await opensearch.index({
    index: 'documents',
    id: document.document_id,
    body: {
      ...document,
      embedding_vector: embedding
    }
  });
}

2.6.2 Bedrock Knowledge Bases (RAG)

Create Knowledge Base

const kbConfig = {
  name: "Riksdagsmonitor-KB",
  description: "Swedish parliamentary documents and intelligence",
  roleArn: "arn:aws:iam::ACCOUNT:role/BedrockKBRole",
  // The embedding model belongs to the vector knowledge base configuration
  knowledgeBaseConfiguration: {
    type: "VECTOR",
    vectorKnowledgeBaseConfiguration: {
      embeddingModelArn: "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
    }
  },
  storageConfiguration: {
    type: "OPENSEARCH_SERVERLESS",
    opensearchServerlessConfiguration: {
      collectionArn: "arn:aws:aoss:us-east-1:ACCOUNT:collection/riksdag-docs",
      vectorIndexName: "documents",
      fieldMapping: {
        vectorField: "embedding_vector",
        textField: "fulltext",
        metadataField: "metadata"
      }
    }
  }
};

Query Knowledge Base

// Required imports
import { 
  BedrockAgentRuntimeClient, 
  RetrieveAndGenerateCommand 
} from "@aws-sdk/client-bedrock-agent-runtime";

// Initialize Bedrock Agent Runtime client
const bedrockAgent = new BedrockAgentRuntimeClient({ 
  region: "us-east-1" 
});

async function queryKnowledgeBase(question) {
  const command = new RetrieveAndGenerateCommand({
    input: {
      text: question
    },
    retrieveAndGenerateConfiguration: {
      type: "KNOWLEDGE_BASE",
      knowledgeBaseConfiguration: {
        knowledgeBaseId: "KB12345",
        modelArn: "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-opus-6-v1:0",
        retrievalConfiguration: {
          vectorSearchConfiguration: {
            numberOfResults: 5
          }
        }
      }
    }
  });
  
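  const response = await bedrockAgent.send(command);
  return {
    answer: response.output.text,
    citations: response.citations,
    // Retrieved passages are nested inside each citation, not returned at the top level
    retrievedReferences: (response.citations ?? []).flatMap(
      (citation) => citation.retrievedReferences ?? []
    )
  };
}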
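
A RAG answer is only as useful as its sources, so the citations usually need to be turned into a short source list for display. The sketch below collects unique source URIs from a response, assuming the knowledge base is backed by an S3 data source so that each retrieved reference carries `location.s3Location.uri`. The function name `listSources` and the sample bucket path are hypothetical.

```javascript
// Collect unique source URIs from a RetrieveAndGenerate-style response.
// Assumes an S3-backed data source; other location types are ignored here.
function listSources(response) {
  const seen = new Set();
  for (const citation of response.citations ?? []) {
    for (const ref of citation.retrievedReferences ?? []) {
      const uri = ref.location?.s3Location?.uri;
      if (uri) seen.add(uri);
    }
  }
  return [...seen];
}

// Mock response for illustration only; the bucket and keys are made up.
const sampleResponse = {
  output: { text: 'Example answer' },
  citations: [
    {
      retrievedReferences: [
        { location: { type: 'S3', s3Location: { uri: 's3://example-bucket/H901FiU1.txt' } } },
        { location: { type: 'S3', s3Location: { uri: 's3://example-bucket/H901FiU1.txt' } } },
      ],
    },
    { retrievedReferences: [] },
  ],
};

console.log(listSources(sampleResponse));
```

Deduplicating by URI keeps the list short when several passages come from the same document.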

🔗 3. CIA JSON API Gateway Integration

3.1 Current State: CSV Exports (Temporary)

Current Data Flow:

CIA Platform → CSV Exports → GitHub Repo → Static Site → End Users

Limitations:

Manual updates required
No real-time data
Limited to 19 intelligence products
No query capabilities
No authentication/authorization

3.2 Phase 1: CIA JSON API Gateway (2026-2027)

CIA Platform Roadmap:

REST API endpoints for all 19 intelligence products
GraphQL API for complex queries
OAuth 2.0 authentication
Rate limiting and quotas
Real-time webhooks for updates

Expected API Structure:

GET /api/v1/politicians
GET /api/v1/politicians/{person_id}
GET /api/v1/documents?type={type}&rm={rm}
GET /api/v1/votes?ballot_id={ballot_id}
GET /api/v1/parties
GET /api/v1/committees

GraphQL Endpoint: POST /graphql
Webhook Subscriptions: POST /webhooks/subscribe

Authentication:

const response = await fetch('https://api.cia-platform.se/v1/politicians', {
  headers: {
    'Authorization': `Bearer ${ACCESS_TOKEN}`,
    'X-API-Key': API_KEY
  }
});
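
Combining the expected routes with these headers, a request for filtered documents could be assembled as sketched below. Everything here is an assumption based on the planned API described in this section: the base URL is passed in by the caller, the `documents` route and its `type` and `rm` parameters mirror the list above, and `buildDocumentsRequest` is a hypothetical helper, not a published client.

```javascript
// Hypothetical request builder for the planned CIA API.
// Returns the URL and headers; the caller passes them to fetch().
function buildDocumentsRequest({ baseUrl, accessToken, apiKey, type, rm }) {
  // A trailing slash makes URL resolution append to the base path instead of replacing it.
  const base = baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
  const url = new URL('documents', base);
  if (type) url.searchParams.set('type', type);
  if (rm) url.searchParams.set('rm', rm); // e.g. "2024/25" is percent-encoded automatically
  return {
    url: url.toString(),
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'X-API-Key': apiKey,
    },
  };
}
```

A call such as `buildDocumentsRequest({ baseUrl: 'https://api.cia-platform.se/v1', accessToken, apiKey, type: 'bet', rm: '2024/25' })` yields an object whose `url` and `headers` can be handed directly to `fetch`.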

3.3 Phase 2: Native AWS Integration (2027-2028)

AWS Lambda Consumers:

CIA API Gateway → EventBridge → Lambda → Aurora/Neptune/DynamoDB/OpenSearch

Lambda Function Example:

exports.handler = async (event) => {
  // EventBridge event from CIA API webhook (payload in `detail`)
  const ciaData = event.detail;
  
  // Store in Aurora
  await aurora.query('INSERT INTO politicians VALUES (...)');
  
  // Update Neptune graph
  await neptune.executeGremlin('g.addV("Politician")...');
  
  // Generate Bedrock embedding
  const embedding = await generateEmbedding(ciaData.summary);
  
  // Index in OpenSearch
  await opensearch.index({
    index: 'documents',
    body: { ...ciaData, embedding_vector: embedding }
  });
  
  return { statusCode: 200 };
};

3.4 Phase 3: Real-Time Streaming (2028-2030)

EventBridge + Kinesis Data Streams:

CIA Platform → Kinesis Stream → Lambda/Firehose → S3/Aurora/OpenSearch

Real-Time Processing Pipeline:

New document published → Immediate indexing in OpenSearch
New vote cast → Real-time update in Timestream
Risk score change → SNS notification to subscribers
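
The three rules above amount to a routing table from event type to downstream target. The sketch below shows one way to express it, assuming events arrive in an EventBridge-style envelope with `detail-type` and `detail` fields. The event type names and the `dead-letter` fallback are assumptions, since the source does not define them.

```javascript
// Hypothetical routing table: event type -> downstream targets.
// Targets follow the pipeline list above; event type names are assumed.
const ROUTES = new Map([
  ['DocumentPublished', ['opensearch']],
  ['VoteCast', ['timestream']],
  ['RiskScoreChanged', ['sns']],
]);

// Decide where an incoming event should be delivered.
function routeEvent(event) {
  const targets = ROUTES.get(event['detail-type']);
  if (!targets) {
    // Unknown event types are parked for inspection rather than dropped.
    return { targets: ['dead-letter'], payload: event.detail };
  }
  return { targets, payload: event.detail };
}
```

Keeping the mapping in data rather than in branching code means that a new event type only needs a new table entry.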

3.5 Data Migration Strategy

Phase	Source	Target	Method	Timeline
Phase 1	CSV files	Aurora Serverless v2	Lambda batch import	Q1 2027
Phase 2	CSV files	Neptune Serverless	Bulk Loader API	Q2 2027
Phase 3	Aurora	OpenSearch Serverless	Lambda + Bedrock embeddings	Q3 2027
Phase 4	CIA API	Real-time Lambda consumers	EventBridge integration	Q1 2028
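
The Lambda batch import in the first row implies splitting large CSV exports into bounded batches, so that each database write stays small and a failure affects only one batch. The sketch below shows that splitting step. The `writeBatch` callback is a hypothetical stand-in for whatever actually writes to Aurora, and the batch size is left to the caller.

```javascript
// Yield consecutive slices of at most `size` rows.
function* chunk(rows, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

// Write batches one after another and report how many rows were imported.
// `writeBatch` is an injected async function (hypothetical), e.g. one INSERT per batch.
async function importInBatches(rows, size, writeBatch) {
  let imported = 0;
  for (const batch of chunk(rows, size)) {
    await writeBatch(batch);
    imported += batch.length;
  }
  return imported;
}
```

Writing batches sequentially is the conservative choice; parallel writes would finish sooner but make retries and ordering harder to reason about.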

🔌 4. GraphQL API Schema

4.1 Core Types

type Politician {
  person_id: ID!
  first_name: String!
  last_name: String!
  party: Party!
  born_year: Int
  gender: String
  status: String!
  district: String
  risk_score: Float
  risk_level: RiskLevel!
  votes: [Vote!]!
  documents: [Document!]!
  committees: [Committee!]!
}

type Party {
  party_id: ID!
  party_name: String!
  party_name_en: String
  founded_year: Int
  ideology: String
  current_seats: Int
  avg_win_rate: Float
  members: [Politician!]!
  coalitions: [Party!]!
}

type Document {
  document_id: ID!
  document_type: String!
  title: String!
  subtitle: String
  summary: String
  published_date: String!
  rm: String
  organ: Committee
  status: String!
  authors: [Politician!]!
  votes: [Vote!]!
  similar_documents: [Document!]!
}

type Vote {
  vote_id: ID!
  ballot_id: String!
  person: Politician!
  party: Party!
  vote: VoteType!
  vote_date: String!
  is_rebel_vote: Boolean!
  is_winning_vote: Boolean
}

type Committee {
  committee_id: ID!
  committee_name: String!
  committee_name_en: String
  established_year: Int
  total_members: Int
  members: [Politician!]!
  documents: [Document!]!
}

enum RiskLevel {
  LOW
  MEDIUM
  HIGH
  CRITICAL
}

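# GraphQL names are limited to ASCII letters, digits, and underscores,
# so the Swedish labels are transliterated here.
enum VoteType {
  Ja
  Nej
  Avstar       # stored value: Avstår
  Franvarande  # stored value: Frånvarande
}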

4.2 Queries

type Query {
  politician(person_id: ID!): Politician
  politicians(party: String, district: String, risk_level: RiskLevel): [Politician!]!
  
  party(party_id: ID!): Party
  parties(riksdag_status: String): [Party!]!
  
  document(document_id: ID!): Document
  documents(type: String, rm: String, organ: String, limit: Int): [Document!]!
  searchDocuments(query: String!, limit: Int): [Document!]!
  semanticSearchDocuments(query: String!, limit: Int): [Document!]!
  
  vote(vote_id: ID!): Vote
  votes(ballot_id: String, person_id: ID): [Vote!]!
  
  committee(committee_id: ID!): Committee
  committees: [Committee!]!
  
  # Advanced queries
  highRiskPoliticians(threshold: Float): [Politician!]!
  rebelVoters(party: String, limit: Int): [Politician!]!
  coalitionProbabilities: [CoalitionPrediction!]!
}

type CoalitionPrediction {
  parties: [String!]!
  probability: Float!
  projected_seats: Int!
}
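
The `projected_seats` field of `CoalitionPrediction` is a sum over the member parties, and the natural threshold to compare it with is a Riksdag majority (175 of 349 seats). The sketch below computes both. The `has_majority` field is an illustrative addition that is not in the schema, and `probability` is deliberately left out because the source does not specify a model for it.

```javascript
const TOTAL_SEATS = 349; // seats in the Riksdag
const MAJORITY = Math.floor(TOTAL_SEATS / 2) + 1; // 175

// Sum the seats of a candidate coalition. Unknown parties count as zero seats.
function projectCoalition(parties, seatsByParty) {
  const projected_seats = parties.reduce(
    (sum, party) => sum + (seatsByParty[party] ?? 0),
    0
  );
  return {
    parties,
    projected_seats,
    has_majority: projected_seats >= MAJORITY, // illustrative field, not in the schema
  };
}

// Seat counts as used in this document's examples.
const seats = { S: 107, M: 68, SD: 73 };
console.log(projectCoalition(['M', 'SD'], seats));
```

A resolver for `coalitionProbabilities` could call a helper like this for each candidate combination and attach a probability from whichever forecasting model is eventually chosen.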

4.3 Mutations

type Mutation {
  # Admin operations
  updatePoliticianRiskScore(person_id: ID!, risk_score: Float!): Politician
  
  # AI operations
  generateDocumentSummary(document_id: ID!): String!
  predictVote(person_id: ID!, ballot_id: String!): VotePrediction!
  
  # Subscription management
  subscribeToUpdates(entity_type: String!, entity_id: ID!): Subscription!
}

type VotePrediction {
  predicted_vote: VoteType!
  confidence: Float!
  probabilities: VoteProbabilities!
}

# Field names must be ASCII, so Avstår and Frånvarande are transliterated.
type VoteProbabilities {
  Ja: Float!
  Nej: Float!
  Avstar: Float!
  Franvarande: Float!
}
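
A `VotePrediction` can be derived from any set of non-negative per-option scores by normalising them into probabilities and taking the most likely option. The sketch below does that under two assumptions the source does not state: `confidence` is simply the probability of the predicted option, and the scoring model itself is out of scope. The function works with whatever vote labels the caller supplies.

```javascript
// Turn raw per-option scores into a VotePrediction-shaped object.
// Assumption: confidence equals the probability of the most likely option.
function toVotePrediction(scores) {
  const labels = Object.keys(scores);
  // Negative scores are clamped to zero before normalising.
  const clamped = labels.map((label) => Math.max(scores[label], 0));
  const total = clamped.reduce((sum, value) => sum + value, 0);
  if (total === 0) {
    throw new RangeError('at least one score must be positive');
  }

  const probabilities = {};
  labels.forEach((label, i) => {
    probabilities[label] = clamped[i] / total;
  });

  // Pick the label with the highest probability (on a tie, the first one wins).
  const predicted_vote = labels.reduce((best, label) =>
    probabilities[label] > probabilities[best] ? label : best
  );

  return {
    predicted_vote,
    confidence: probabilities[predicted_vote],
    probabilities,
  };
}
```

For instance, scores of 6 for `Ja`, 3 for `Nej`, and 1 for a third option give a predicted vote of `Ja`.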

4.4 Subscriptions

type Subscription {
  newDocument(organ: String): Document!
  newVote(ballot_id: String): Vote!
  riskScoreChange(person_id: ID): Politician!
  coalitionUpdate: CoalitionPrediction!
}

📐 5. Data Model Diagrams

5.1 Political Entities ERD

erDiagram
    POLITICIAN ||--o{ VOTE : casts
    POLITICIAN }o--|| PARTY : member_of
    POLITICIAN ||--o{ DOCUMENT : authors
    POLITICIAN }o--o{ COMMITTEE : assigned_to
    
    PARTY ||--o{ POLITICIAN : has_members
    PARTY }o--o{ PARTY : coalition_with
    
    DOCUMENT }o--|| COMMITTEE : processed_by
    DOCUMENT ||--o{ VOTE : triggers
    
    POLITICIAN {
        string person_id PK "0479479309"
        string first_name "Anna"
        string last_name "Svensson"
        string party FK "S"
        int born_year "1975"
        string gender "Female"
        string status "Tjänstgörande"
        string district "Stockholm"
        float risk_score "42.5"
        string risk_level "MEDIUM"
    }
    
    PARTY {
        string party_id PK "S"
        string party_name "Socialdemokraterna"
        int founded_year "1889"
        string ideology "Social Democracy"
        int current_seats "107"
        float avg_win_rate "68.5"
    }
    
    DOCUMENT {
        string document_id PK "H901FiU1"
        string document_type "bet"
        string title "Finansutskottets betänkande"
        date published_date "2024-11-15"
        string rm "2024/25"
        string organ FK "FiU"
        string status "BESLUTAD"
        text fulltext
    }
    
    VOTE {
        string vote_id PK "V202400123"
        string ballot_id FK "B20240056"
        string person_id FK "0479479309"
        string vote "Ja"
        date vote_date "2024-11-20"
        boolean is_rebel_vote "false"
    }
    
    COMMITTEE {
        string committee_id PK "FiU"
        string committee_name "Finansutskottet"
        int established_year "1867"
        int total_members "17"
    }

5.2 AWS Service Integration

graph TB
    subgraph "Data Sources"
        CIA[CIA JSON API Gateway]
        CSV[Legacy CSV Files]
    end
    
    subgraph "AWS Ingestion Layer"
        EB[EventBridge]
        Lambda1[Lambda Ingest]
        S3[S3 Raw Data Lake]
    end
    
    subgraph "AWS Storage Layer"
        Aurora[(Aurora Serverless v2<br/>PostgreSQL)]
        Neptune[(Neptune Serverless<br/>Graph DB)]
        DynamoDB[(DynamoDB<br/>NoSQL)]
        OpenSearch[(OpenSearch Serverless<br/>Search + Vector)]
        Timestream[(Timestream<br/>Time-Series)]
    end
    
    subgraph "AWS AI/ML Layer"
        Bedrock[Bedrock Titan<br/>Embeddings v2]
        BedrockKB[Bedrock<br/>Knowledge Bases]
    end
    
    subgraph "AWS API Layer"
        AppSync[AWS AppSync<br/>GraphQL API]
        Lambda2[Lambda Resolvers]
    end
    
    subgraph "Clients"
        Web[Static Website]
        Mobile[Mobile Apps]
        API[External APIs]
    end
    
    CIA --> EB
    CSV --> Lambda1
    EB --> Lambda1
    Lambda1 --> S3
    Lambda1 --> Aurora
    Lambda1 --> Neptune
    Lambda1 --> DynamoDB
    Lambda1 --> OpenSearch
    Lambda1 --> Timestream
    Lambda1 --> Bedrock
    
    Aurora --> AppSync
    Neptune --> AppSync
    DynamoDB --> AppSync
    OpenSearch --> AppSync
    Timestream --> AppSync
    
    Bedrock --> BedrockKB
    OpenSearch --> BedrockKB
    BedrockKB --> AppSync
    
    AppSync --> Lambda2
    AppSync --> Web
    AppSync --> Mobile
    AppSync --> API
    
    style CIA fill:#D32F2F,color:#fff
    style Aurora fill:#4CAF50,color:#fff
    style Neptune fill:#FF9800,color:#fff
    style DynamoDB fill:#FFC107,color:#000
    style OpenSearch fill:#9E9E9E,color:#fff
    style Bedrock fill:#455A64,color:#fff
    style AppSync fill:#4CAF50,color:#fff

5.3 Data Flow Sequence

sequenceDiagram
    participant CIA as CIA API Gateway
    participant EB as EventBridge
    participant Lambda as Lambda Function
    participant Aurora as Aurora Serverless
    participant Bedrock as Bedrock Titan
    participant OpenSearch as OpenSearch Serverless
    participant AppSync as AWS AppSync
    participant Client as Static Site
    
    CIA->>EB: New document published (webhook)
    EB->>Lambda: Trigger ingestion function
    Lambda->>Aurora: INSERT INTO documents
    Lambda->>Bedrock: Generate embedding (1024-dim)
    Bedrock-->>Lambda: Return embedding vector
    Lambda->>OpenSearch: Index document + embedding
    Lambda->>EB: Publish DocumentIndexed event
    
    Client->>AppSync: GraphQL query (semantic search)
    AppSync->>Lambda: Resolver function
    Lambda->>Bedrock: Generate query embedding
    Bedrock-->>Lambda: Query vector
    Lambda->>OpenSearch: KNN vector search
    OpenSearch-->>Lambda: Top 10 similar documents
    Lambda->>Aurora: Fetch full document metadata
    Aurora-->>Lambda: Document details
    Lambda-->>AppSync: GraphQL response
    AppSync-->>Client: Search results

5.4 Neptune Graph Visualization

graph LR
    P1[Politician: Anna Svensson<br/>S, Stockholm<br/>Risk: MEDIUM]
    P2[Politician: Johan Andersson<br/>M, Göteborg<br/>Risk: LOW]
    P3[Politician: Maria Karlsson<br/>SD, Malmö<br/>Risk: HIGH]
    
    Party_S[Party: Socialdemokraterna<br/>107 seats]
    Party_M[Party: Moderaterna<br/>68 seats]
    Party_SD[Party: Sverigedemokraterna<br/>73 seats]
    
    D1[Document: Budget Bill<br/>H901FiU1]
    V1[Vote: Ja<br/>2024-11-20]
    C1[Committee: Finansutskottet]
    
    P1 -->|MEMBER_OF| Party_S
    P2 -->|MEMBER_OF| Party_M
    P3 -->|MEMBER_OF| Party_SD
    
    P1 -->|AUTHORED| D1
    P1 -->|CAST_VOTE| V1
    P1 -->|ASSIGNED_TO| C1
    
    P2 -->|CAST_VOTE| V1
    P3 -->|CAST_VOTE| V1
    
    Party_M -->|COALITION_WITH| Party_SD
    
    D1 -->|PROCESSED_BY| C1
    D1 -->|TRIGGERED_VOTE| V1
    
    style P1 fill:#4CAF50,color:#fff
    style P2 fill:#4CAF50,color:#fff
    style P3 fill:#D32F2F,color:#fff
    style Party_S fill:#9E9E9E,color:#fff
    style Party_M fill:#9E9E9E,color:#fff
    style Party_SD fill:#9E9E9E,color:#fff
    style D1 fill:#455A64,color:#fff
    style V1 fill:#FFC107,color:#000
    style C1 fill:#FF9800,color:#fff

5.5 Time-Series Data Flow

graph TB
    subgraph "Data Collection (Hourly)"
        Collector[Lambda Collector]
        CIA_API[CIA API]
    end
    
    subgraph "Amazon Timestream"
        VT[Vote Trends Table]
        PP[Party Popularity Table]
        DT[Document Trends Table]
    end
    
    subgraph "Analytics"
        QuickSight[QuickSight Dashboards]
        Lambda_Analysis[Lambda Analytics]
    end
    
    CIA_API --> Collector
    Collector --> VT
    Collector --> PP
    Collector --> DT
    
    VT --> QuickSight
    PP --> QuickSight
    DT --> QuickSight
    
    VT --> Lambda_Analysis
    PP --> Lambda_Analysis
    DT --> Lambda_Analysis
    
    Lambda_Analysis --> Forecast[Forecast Models]
    
    style VT fill:#4CAF50,color:#fff
    style PP fill:#4CAF50,color:#fff
    style DT fill:#4CAF50,color:#fff
    style QuickSight fill:#455A64,color:#fff

5.6 Bedrock Knowledge Base RAG Pipeline

graph TB
    subgraph "Data Sources"
        Aurora_DB[(Aurora<br/>Documents)]
        S3_Docs[S3 Document Storage]
    end
    
    subgraph "Embedding Generation"
        Bedrock_Titan[Bedrock Titan<br/>Embeddings v2<br/>1024-dim]
    end
    
    subgraph "Vector Storage"
        OpenSearch_VS[(OpenSearch Serverless<br/>Vector Index)]
    end
    
    subgraph "Bedrock Knowledge Base"
        KB[Knowledge Base<br/>Riksdagsmonitor-KB]
        Claude[Claude Opus 6.0<br/>Generation Model]
    end
    
    subgraph "Application"
        AppSync_API[AppSync GraphQL]
        Lambda_RAG[Lambda RAG Function]
        Client[Static Site]
    end
    
    Aurora_DB --> Bedrock_Titan
    S3_Docs --> Bedrock_Titan
    Bedrock_Titan --> OpenSearch_VS
    
    OpenSearch_VS --> KB
    KB --> Claude
    
    Client --> AppSync_API
    AppSync_API --> Lambda_RAG
    Lambda_RAG --> KB
    KB --> Lambda_RAG
    Lambda_RAG --> AppSync_API
    AppSync_API --> Client
    
    style Bedrock_Titan fill:#FF9800,color:#fff
    style OpenSearch_VS fill:#4CAF50,color:#fff
    style KB fill:#455A64,color:#fff
    style Claude fill:#D32F2F,color:#fff
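
The pipeline above has two logical steps on the query path: retrieve the most similar passages from the vector index, then hand them to the generation model as context. The following toy sketch illustrates only that logic with in-memory data. It is not the Bedrock Knowledge Base API; the three-dimensional vectors, the document ids other than `H901FiU1`, and the passage texts are all placeholders.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k document ids whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt from the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"


# Placeholder index: real embeddings would come from the embedding model.
index = {
    "H901FiU1": [0.9, 0.1, 0.0],
    "doc-defence": [0.0, 0.2, 0.9],
    "doc-tax": [0.7, 0.6, 0.1],
}
texts = {
    "H901FiU1": "Budget bill passage (placeholder)",
    "doc-defence": "Defence passage (placeholder)",
    "doc-tax": "Tax passage (placeholder)",
}

top = retrieve([1.0, 0.2, 0.0], index, k=2)
prompt = build_prompt("What does the budget bill propose?", [texts[d] for d in top])
```

In the managed setup the Knowledge Base performs both steps; the sketch is only meant to make the data flow between the vector index and the generation model concrete.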

🗓️ 6. Implementation Roadmap

6.1 Four-Phase Evolution (2026-2032)

gantt
    title Riksdagsmonitor Data Architecture Roadmap (2026-2032)
    dateFormat YYYY-MM
    
    section Phase 1: CSV → API Gateway
    CIA API Integration :p1, 2026-01, 12M
    Lambda Polling Functions :p1a, 2026-06, 6M
    Data Validation Pipeline :p1b, 2026-09, 3M
    
    section Phase 2: AWS Serverless Migration
    Aurora Serverless v2 Setup :p2, 2027-01, 3M
    Neptune Serverless Graph :p2a, 2027-04, 4M
    DynamoDB Tables :p2b, 2027-06, 2M
    OpenSearch Serverless :p2c, 2027-08, 3M
    
    section Phase 3: AI/ML Integration
    Bedrock Titan Embeddings :p3, 2028-01, 4M
    Bedrock Knowledge Bases :p3a, 2028-05, 3M
    Semantic Search :p3b, 2028-08, 4M
    Predictive Models :p3c, 2029-01, 6M
    
    section Phase 4: Advanced Analytics
    Timestream Integration :p4, 2030-01, 3M
    Real-Time Streaming :p4a, 2030-04, 4M
    Advanced Forecasting :p4b, 2030-08, 6M
    ML Model Optimization :p4c, 2031-01, 12M

6.2 Phase Details

Phase 1: CSV → API Gateway (2026-2027)

Quarter	Milestone	Deliverables
Q1 2026	CIA API Integration Planning	API specification, authentication setup
Q2 2026	Lambda Polling Functions	Automated data ingestion from CIA API
Q3 2026	Data Validation Pipeline	Schema validation, error handling
Q4 2026	Hybrid System	CSV + API Gateway dual sources
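
The Q3 2026 deliverable pairs schema validation with error handling. A minimal sketch of that step, assuming a hypothetical four-field politician record (the field names are illustrative and not taken from the CIA API): invalid records are set aside with their error messages instead of aborting the whole batch.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"id": str, "first_name": str, "last_name": str, "party": str}


def validate(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


def partition(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split a batch into valid records and (record, errors) pairs for rejected ones."""
    valid, rejected = [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected


batch = [
    {"id": "p1", "first_name": "Johan", "last_name": "Andersson", "party": "M"},
    {"id": 42, "first_name": "Maria", "last_name": "Karlsson"},  # bad id type, no party
]
valid, rejected = partition(batch)
```

The share of rejected records per batch is one way to feed the error-rate metric for this phase.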

Key Metrics:

API uptime: 99.9%
Data freshness: < 1 hour
Error rate: < 0.1%
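
To make these targets concrete, the arithmetic can be sketched as follows. A 99.9% uptime target leaves roughly 43 minutes of allowed downtime in a 30-day month; the 30-day window and the sample request counts are assumptions for illustration.

```python
def downtime_budget_minutes(uptime_target: float, days: int = 30) -> float:
    """Allowed downtime in minutes for a given uptime target over a window of days."""
    return (1 - uptime_target) * days * 24 * 60


def error_rate(failed: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return failed / total if total else 0.0


budget = downtime_budget_minutes(0.999)          # about 43.2 minutes per 30 days
within_target = error_rate(7, 10_000) < 0.001    # 0.07% is below the 0.1% target
```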

Phase 2: AWS Serverless Migration (2027-2028)

Quarter	Milestone	Deliverables
Q1 2027	Aurora Setup	Relational database with 2,494 politicians
Q2 2027	Neptune Graph	Graph database with 5M relationships
Q3 2027	DynamoDB + AppSync	NoSQL + GraphQL API layer
Q4 2027	OpenSearch Indexing	Full-text search across 109K documents

Key Metrics:

Aurora ACU: 0.5-2 (auto-scaling)
Neptune NCU: 2.5 (serverless)
DynamoDB RCU/WCU: On-demand
OpenSearch OCU: 2 (compute + indexing)

Phase 3: AI/ML Integration (2028-2030)

Quarter	Milestone	Deliverables
Q1 2028	Bedrock Titan Embeddings	1024-dim vectors for all documents
Q2 2028	Bedrock Knowledge Bases	RAG pipeline with Claude Opus 6.0
Q3 2028	Semantic Search	Vector similarity search
Q1 2029	Predictive Models	Vote prediction, election forecasting

Key Metrics:

Embedding generation: 1000 docs/hour
Semantic search latency: < 500ms
RAG response time: < 3s
Model accuracy: > 85%
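
The embedding throughput target determines how long an initial backfill takes. A short worked calculation, using the 109K documents cited for Phase 2 and the 500K documents targeted for 2028 in section 7.3, and assuming the rate is sustained without interruption:

```python
def backfill_hours(documents: int, docs_per_hour: int = 1000) -> float:
    """Hours needed to embed a corpus at a constant throughput."""
    return documents / docs_per_hour


current_corpus_days = backfill_hours(109_000) / 24   # about 4.5 days
target_corpus_days = backfill_hours(500_000) / 24    # about 20.8 days
```

A full re-embedding after a model upgrade would take the same time at this rate, which is worth factoring into any schedule that assumes frequent model changes.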

Phase 4: Advanced Analytics (2030-2032)

Quarter	Milestone	Deliverables
Q1 2030	Timestream Integration	Historical trends, time-series analytics
Q2 2030	Real-Time Streaming	EventBridge + Kinesis pipelines
Q3 2030	Advanced Forecasting	Coalition prediction, risk assessment
2031-2032	Optimization	ML model tuning, cost optimization

Key Metrics:

Time-series queries: < 1s
Real-time latency: < 100ms
Forecast accuracy: > 90%
Total AWS cost: < $5000/month

🔧 7. Technology Stack Evolution

7.1 Current vs Future Stack

Component	Current (2026)	Phase 2 (2028)	Phase 4 (2032)
Frontend	Static HTML/CSS/JS	Static HTML/CSS/JS	Static HTML/CSS/JS
Hosting	GitHub Pages	GitHub Pages	GitHub Pages
API Layer	None	AWS AppSync GraphQL	AppSync + Lambda
Database	GitHub files	Aurora Serverless v2	Aurora Global DB
Graph DB	None	Neptune Serverless	Neptune Analytics
NoSQL	None	DynamoDB	DynamoDB Global Tables
Search	None	OpenSearch Serverless	OpenSearch + Bedrock KB
Time-Series	None	None	Timestream
Embeddings	None	Bedrock Titan v2 (1024-dim)	Bedrock Titan v3
AI/ML	None	Bedrock Knowledge Bases	Bedrock + SageMaker
Compute	None	AWS Lambda	Lambda + Step Functions
Orchestration	GitHub Actions	EventBridge	EventBridge + SQS
Monitoring	None	CloudWatch	CloudWatch + X-Ray
Security	GitHub ISMS	AWS IAM + Secrets Manager	IAM + GuardDuty + Macie

7.2 Cost Projections

Service	2026	2028	2032
Aurora Serverless v2	$0	$50/month	$200/month
Neptune Serverless	$0	$100/month	$500/month
DynamoDB	$0	$20/month	$100/month
OpenSearch Serverless	$0	$150/month	$500/month
Timestream	$0	$0	$100/month
Bedrock (embeddings)	$0	$200/month	$1000/month
Lambda	$0	$50/month	$200/month
AppSync	$0	$30/month	$100/month
EventBridge	$0	$10/month	$50/month
S3 + Data Transfer	$0	$20/month	$100/month
CloudWatch	$0	$20/month	$50/month
Total Monthly Cost	$0	$650/month	$2900/month
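
The totals in the table can be reproduced from the per-service rows. The sketch below sums the 2028 and 2032 columns and reports the largest 2032 line item; all figures are copied from the table above.

```python
# Monthly cost per service in USD: (2028, 2032), copied from the table.
COSTS = {
    "Aurora Serverless v2": (50, 200),
    "Neptune Serverless": (100, 500),
    "DynamoDB": (20, 100),
    "OpenSearch Serverless": (150, 500),
    "Timestream": (0, 100),
    "Bedrock (embeddings)": (200, 1000),
    "Lambda": (50, 200),
    "AppSync": (30, 100),
    "EventBridge": (10, 50),
    "S3 + Data Transfer": (20, 100),
    "CloudWatch": (20, 50),
}

total_2028 = sum(cost[0] for cost in COSTS.values())   # 650
total_2032 = sum(cost[1] for cost in COSTS.values())   # 2900

largest_2032 = max(COSTS, key=lambda name: COSTS[name][1])
largest_share = COSTS[largest_2032][1] / total_2032    # roughly one third
```

The 2032 total of $2900/month stays within the Phase 4 target of less than $5000/month, with Bedrock embeddings as the single largest item.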

7.3 Scalability Targets

Metric	2026	2028	2032
Documents	109K	500K	10M
Politicians	2,494	10K	50K
Votes	3.5M	10M	100M
Graph Relationships	0	5M	100M
Vector Embeddings	0	500K	10M
API Requests/Day	0	10K	1M
Data Storage	1GB	100GB	5TB
Concurrent Users	100	1K	10K
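
Two derived figures help when sizing services against these targets: the average request rate implied by a daily volume, and the compound annual growth implied by two data points. A short sketch using values from the table (averages hide peaks, so real capacity planning would need a peak factor that this document does not specify):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day: int) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


rps_2032 = average_rps(1_000_000)                       # about 11.6 requests/second
doc_growth = annual_growth(500_000, 10_000_000, 4)      # about 1.11, i.e. ~111% per year
```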

🔐 8. ISMS Compliance & Data Governance

8.1 ISO 27001:2022 Controls

A.5.9 and A.5.12 Asset Inventory and Classification (A.8 in ISO 27001:2013):

Aurora/Neptune/DynamoDB data classification (Public/Internal)
Automated asset inventory via AWS Config
Data retention policies (7 years for voting records; per-type retention is listed in section 8.4)

A.5.31 and A.5.34 Legal Requirements and Privacy (A.18 in ISO 27001:2013):

GDPR Article 17 (Right to erasure) via Lambda deletion functions
GDPR Article 20 (Data portability) via AppSync export queries
Swedish Archive Act compliance for parliamentary records

8.2 NIST CSF 2.0 Mapping

ID.AM (Asset Management):

AWS Systems Manager inventory
Automated tagging strategy

PR.DS (Data Security):

Aurora/Neptune encryption at rest (AWS KMS)
TLS 1.3 for data in transit
Bedrock model access controls

DE.CM (Continuous Monitoring):

CloudWatch anomaly detection
GuardDuty threat detection
VPC Flow Logs analysis

8.3 CIS Controls v8.1

Control 1 (Inventory):

AWS Config tracking all resources
Quarterly audit reports

Control 3 (Data Protection):

S3 bucket versioning + lifecycle policies
Aurora automated backups (35 days retention)
Neptune backups (daily snapshots)

Control 11 (Data Recovery):

Multi-region Aurora Global Database
Neptune cross-region replication
DynamoDB point-in-time recovery (PITR)

8.4 Data Governance

Data Classification:

Data Type	Classification	Retention	Encryption
Politician personal data	Public	Permanent	KMS (at rest)
Voting records	Public	7 years	KMS (at rest)
Documents	Public	Permanent	KMS (at rest)
Risk scores	Internal	2 years	KMS (at rest) + TLS (in transit)
API access logs	Internal	1 year	KMS (at rest)
Bedrock model inputs/outputs	Internal	30 days	KMS (ephemeral)
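
The retention column can be expressed as a lookup that a cleanup job consults. A minimal sketch, assuming hypothetical data-type keys and treating a year as 365 days for simplicity; permanent retention is represented as `None`:

```python
from datetime import date, timedelta
from typing import Optional

# Retention in days per data type, following the table above (keys are hypothetical).
RETENTION_DAYS = {
    "politician": None,          # permanent
    "voting_record": 7 * 365,
    "document": None,            # permanent
    "risk_score": 2 * 365,
    "api_access_log": 365,
    "bedrock_io": 30,
}


def expires_on(data_type: str, created: date) -> Optional[date]:
    """Date on which a record passes its retention period, or None if kept permanently."""
    days = RETENTION_DAYS[data_type]
    return None if days is None else created + timedelta(days=days)


def is_expired(data_type: str, created: date, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    expiry = expires_on(data_type, created)
    return expiry is not None and today >= expiry
```

For example, `expires_on("bedrock_io", date(2028, 1, 1))` evaluates to `date(2028, 1, 31)`, while any `"document"` record never expires.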

Data Lifecycle:

stateDiagram-v2
    [*] --> Ingested: CIA API
    Ingested --> Validated: Schema check
    Validated --> Stored: Aurora/Neptune/DynamoDB
    Stored --> Indexed: OpenSearch + Bedrock
    Indexed --> Published: AppSync GraphQL
    Published --> Archived: After 7 years
    Archived --> [*]
    
    Stored --> Deleted: GDPR request
    Deleted --> [*]
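
The lifecycle above can be enforced in code as a transition table, so that a record can only move along the arrows drawn in the diagram. A minimal sketch (the function name `advance` is illustrative):

```python
# Allowed transitions, transcribed from the state diagram.
TRANSITIONS = {
    "Ingested": {"Validated"},
    "Validated": {"Stored"},
    "Stored": {"Indexed", "Deleted"},
    "Indexed": {"Published"},
    "Published": {"Archived"},
    "Archived": set(),   # terminal
    "Deleted": set(),    # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise ValueError if the diagram has no such arrow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


state = "Ingested"
for step in ("Validated", "Stored", "Indexed", "Published", "Archived"):
    state = advance(state, step)
```

As drawn, `Deleted` is reachable only from `Stored`. A record that is already `Indexed` or `Published` has no deletion path in the diagram, and the sketch reproduces that as-is.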

Privacy by Design:

No PII beyond public records
Anonymized analytics data
GDPR-compliant deletion via Lambda functions
Bedrock model data retention: 30 days max (AWS configuration)

📚 9. Related Documentation

9.1 Architecture Documentation

ARCHITECTURE.md - Current static site architecture
DATA_MODEL.md - Current data model (CSV-based)
FUTURE_FLOWCHART.md - Future data flow diagrams

9.2 Security Documentation

SECURITY_ARCHITECTURE.md - Current security controls
FUTURE_SECURITY_ARCHITECTURE.md - Future AWS security architecture
THREAT_MODEL.md - STRIDE threat analysis
Hack23 ISMS - Organization-wide ISMS policies

9.3 Technical Documentation

TRANSLATION_GUIDE.md - Multi-language support (14 languages)
WORKFLOWS.md - GitHub Actions CI/CD pipelines
LABELS.md - Issue management taxonomy

9.4 External References

AWS Neptune Serverless - Graph database documentation
AWS Aurora Serverless v2 - Relational database
Amazon OpenSearch Serverless - Search and vector store
Amazon Bedrock - AI/ML services (Titan, Knowledge Bases)
Amazon Timestream - Time-series database
AWS AppSync - GraphQL API service

🤖 AI/LLM Data Architecture Evolution (2026-2037)

Data Model Impact of AI Evolution

AI Model Update Cadence (planning assumption): Anthropic Opus minor updates roughly every 2.3 months, major versions annually

Period	AI Model	Data Architecture Impact	New Data Entities
2026-2027	Opus 4.7-5.x	Enhanced embeddings, improved entity extraction	AI audit logs, model version tracking
2028-2029	Opus 6.x-7.x	Multi-modal data storage, video/audio political content	Media assets, content provenance records
2030-2032	Opus 8.x-10.x	Near-expert analysis data, global parliament schemas	Cross-parliament entities, policy impact models
2033-2035	Pre-AGI systems	Autonomous schema evolution, self-organizing knowledge graphs	Emergent relationship types, dynamic taxonomies
2036-2037	AGI / Post-AGI	Universal political data ontology, real-time global coverage	195 parliament datasets, global democracy metrics

AI-Driven Data Capabilities

Continuous Model Integration (Every ~2.3 Months):

Embedding dimension upgrades (768 → 1024 → 2048 → 8192+) tracked in vector DB metadata
Schema versioning aligned with AI model capabilities
Backward-compatible data migration for each model update
Automated data quality assessment using latest model capabilities
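
Tracking the embedding model and dimension alongside each vector makes it possible to detect when stored vectors can no longer be compared with freshly generated ones. A minimal sketch of such metadata, assuming hypothetical field names and model identifiers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingMeta:
    """Metadata stored next to each vector (field names are illustrative)."""
    model_id: str
    dimension: int
    schema_version: int


def comparable(a: EmbeddingMeta, b: EmbeddingMeta) -> bool:
    """Vectors are only comparable when model and dimension both match."""
    return a.model_id == b.model_id and a.dimension == b.dimension


def needs_reembedding(stored: EmbeddingMeta, current: EmbeddingMeta) -> bool:
    """True when a stored vector must be regenerated for the current model."""
    return not comparable(stored, current)


stored = EmbeddingMeta(model_id="model-a", dimension=1024, schema_version=1)
current = EmbeddingMeta(model_id="model-b", dimension=2048, schema_version=2)
stale = needs_reembedding(stored, current)
```

A schema version bump alone does not force re-embedding in this sketch; only a change of model or dimension does.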

Competitor Model Data Considerations:

Multi-model embedding storage (separate vector spaces per model family)
Model-agnostic entity extraction pipeline
Cross-model consistency validation for political entity resolution
Data portability across AI providers via standardized schemas
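
One way to read "model-agnostic extraction" and "cross-model consistency validation" is an extractor interface that any provider can implement, plus an agreement score between the entity sets two extractors return. The sketch below uses Jaccard overlap as that score; the interface, the keyword-matching stand-in extractors, and the choice of metric are all assumptions for illustration.

```python
from typing import Protocol


class Extractor(Protocol):
    """Anything that turns text into a set of entity names."""
    def extract(self, text: str) -> set[str]: ...


class KeywordExtractor:
    """Stand-in for a model-backed extractor: matches a fixed list of known names."""
    def __init__(self, known: set[str]) -> None:
        self.known = known

    def extract(self, text: str) -> set[str]:
        return {name for name in self.known if name in text}


def agreement(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two entity sets; 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cross_check(text: str, first: Extractor, second: Extractor) -> float:
    """Agreement score between two extractors on the same text."""
    return agreement(first.extract(text), second.extract(text))


text = "Socialdemokraterna referred the bill to Finansutskottet."
model_a = KeywordExtractor({"Socialdemokraterna", "Finansutskottet", "Moderaterna"})
model_b = KeywordExtractor({"Socialdemokraterna", "Moderaterna"})
score = cross_check(text, model_a, model_b)   # 0.5: one shared entity out of two found
```

A low score on a document would flag it for review before its entities are merged into the shared entity store.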

Extended Data Scale Projections

Metric	2026	2028	2030	2033	2037
Politicians tracked	2,494	5,000+	15,000+	50,000+	500,000+
Documents indexed	109K	500K	2M+	10M+	100M+
Voting records	3.5M	10M+	25M+	100M+	1B+
Languages	14	30+	50+	100+	All UN
Parliaments	1	4	10+	50+	195
AI model versions	1	5+	10+	20+	30+
Data refresh	Daily	Hourly	Real-time	Sub-second	Predictive

📋 Document Control

Document Information:

Repository: github.com/Hack23/riksdagsmonitor
Path: /FUTURE_DATA_MODEL.md
Format: Markdown with Mermaid diagrams
Classification: Public
Language: English (technical documentation)

Version History:

Version	Date	Author	Changes
1.0	2026-02-15	CEO	Initial version - AWS Serverless architecture
2.0	2026-02-24	CEO	Extended to 2037 vision, AI/LLM data architecture, global scale projections

Approval:

Document Owner: CEO, Hack23 AB
Approved Date: 2026-02-15
Next Review: 2027-02-24 (Annual)

Distribution:

Public repository: github.com/Hack23/riksdagsmonitor
Documentation site: riksdagsmonitor.se/docs

🏢 Hack23 AB (Org.nr 5595347807)
📍 Stockholm, Sweden
🌐 hack23.com | riksdagsmonitor.se
📧 Contact: GitHub Issues

This document is part of Riksdagsmonitor's comprehensive documentation portfolio, demonstrating commitment to transparency, security, and technical excellence in Swedish political intelligence.

📚 Related Documents

Riksdagsmonitor Architecture Portfolio

Document	Focus	Description
🏛️ Architecture	🏗️ C4 Models	System context, containers, components
📊 Data Model	📊 Data	Current entity relationships and data dictionary
📊 Future Data Model	🔮 Data	Enhanced data architecture plans (this document)
🔄 Flowchart	🔄 Processes	Business and data flow diagrams
📈 State Diagram	📈 States	System state transitions and lifecycles
🧠 Mindmap	🧠 Concepts	System conceptual relationships
💼 SWOT	💼 Strategy	Strategic analysis and positioning
🛡️ Security Architecture	🔒 Security	Current security controls and design
🎯 Threat Model	🎯 Threats	STRIDE/MITRE ATT&CK analysis
🚀 Future Architecture	🔮 Evolution	Architectural evolution roadmap

Hack23 ISMS Policies

🛡️ Secure Development Policy — Architecture documentation requirements
🏷️ Classification Framework — CIA triad classification
📉 Risk Register — Enterprise risk management

📋 Document Control:
✅ Approved by: James Pether Sörling, CEO
📤 Distribution: Public
🏷️ Classification: Public
📅 Effective Date: 2026-02-24
⏰ Next Review: 2027-02-24
🎯 Framework Compliance: ISO 27001:2022, NIST CSF 2.0, CIS Controls v8.1
