🚀 Evolution: CSV → API Gateway → AWS Serverless Intelligence
🎯 Neptune Graph · Aurora Relational · OpenSearch Vector · Bedrock AI
📋 Document Owner: CEO | 📄 Version: 2.0 | 📅 Last Updated: 2026-02-24 (UTC)
🔄 Review Cycle: Annual | ⏰ Next Review: 2027-02-24
🏢 Owner: Hack23 AB (Org.nr 5595347807) | 🏷️ Classification: Public
| Document | Type | Description |
|---|---|---|
| Architecture | 🏛️ Current | C4 model showing system structure |
| Data Model | 📊 Current | Data entities and relationships |
| Flowcharts | 🔄 Current | Process flows and pipelines |
| State Diagrams | 🔄 Current | System state transitions |
| Mindmap | 🗺️ Current | System conceptual map |
| SWOT | 💼 Current | Strategic analysis |
| Future Architecture | 🏗️ Future | System evolution roadmap |
| Future Data Model | 📊 Future | Enhanced data architecture (this doc) |
| Future Flowcharts | 🔄 Future | Advanced process flows |
| Future State Diagrams | 🔄 Future | Advanced state management |
| Future Mindmap | 🗺️ Future | Future capability map |
| Future SWOT | 💼 Future | Strategic outlook |
| Security Architecture | 🛡️ Security | Defense-in-depth controls |
| Future Security Architecture | 🛡️ Future | Security roadmap |
| Threat Model | 🎯 Security | STRIDE analysis |
Riksdagsmonitor's data architecture is planned to evolve between 2026 and 2037 from static CSV files to a fully managed AWS serverless intelligence platform. The transformation is intended to enable real-time political analytics, AI-powered insights, and scalable processing of Swedish parliamentary data.
Strategic Vision (2026-2037):
- 🔄 Phase 1 (2026-2027): CSV → CIA JSON API Gateway integration
- ☁️ Phase 2 (2027-2028): AWS Serverless migration (Neptune, Aurora, DynamoDB, OpenSearch)
- 🤖 Phase 3 (2028-2030): AI/ML with Amazon Bedrock (embeddings, RAG, forecasting)
- 📊 Phase 4 (2030-2032): Advanced analytics with Timestream and real-time streaming
- 🧠 Phase 5 (2033-2035): Pre-AGI data architecture with autonomous schema evolution
- 🌐 Phase 6 (2036-2037): AGI-era data platform supporting 195 global parliaments
Key Transformations:
| Aspect | Current (2026) | Future (2037) |
|---|---|---|
| Data Source | CIA CSV exports (static) | CIA JSON API Gateway (real-time) |
| Database | GitHub repository files | Neptune Graph + Aurora Serverless v2 |
| Search | Text matching | OpenSearch Serverless + semantic vectors |
| Query | JavaScript filters | AWS AppSync GraphQL API |
| Analytics | Static aggregations | Timestream time-series + Lambda analytics |
| AI/ML | None | Bedrock Titan Text Embeddings V2 (1024-dim vectors, up to 8,192 input tokens) + RAG |
| Scale | 109K documents | 100M+ documents with global parliament coverage |
| Compute | Static site | AWS Lambda serverless functions |
| Orchestration | GitHub Actions | AWS Step Functions |
Current Baseline:
- 2,494 Politicians → Future: Complete career graphs in Neptune
- 3.5M+ Voting Records → Future: Real-time vote prediction models
- 109K Documents → Future: Semantic search with Bedrock embeddings
- 19 CIA Products → Future: 100+ intelligence products via API Gateway
- Current State vs Future State
- AWS Serverless Data Architecture
- CIA JSON API Gateway Integration
- GraphQL API Schema
- Data Model Diagrams
- Implementation Roadmap
- Technology Stack Evolution
- ISMS Compliance & Data Governance
- Related Documentation
| Component | Current (2026) | Phase 2 (2028) | Phase 4 (2032) |
|---|---|---|---|
| Data Ingestion | Manual CSV downloads | CIA API Gateway polling | Real-time event streaming |
| Storage Layer | GitHub repo (< 1GB) | Aurora 100GB + Neptune 500GB | Aurora 500GB + Neptune 5TB |
| Graph Database | None | Neptune Serverless (Gremlin) | Neptune Analytics + ML |
| Relational DB | None | Aurora Serverless v2 (PostgreSQL) | Aurora Global Database |
| Vector Search | None | OpenSearch Serverless | OpenSearch + Bedrock KB |
| Time-Series | None | None (planned for Phase 4) | Timestream (historical trends + forecasting) |
| API Layer | Static JSON files | AppSync GraphQL | AppSync + Lambda resolvers |
| AI/ML | None | Bedrock Titan Embeddings | Bedrock + SageMaker |
| Compute | Static site | Lambda functions | Lambda + Step Functions |
| Monitoring | None | CloudWatch Logs | CloudWatch + X-Ray tracing |
| Metric | 2026 | 2028 | 2032 |
|---|---|---|---|
| Politicians | 2,494 | 10,000 | 50,000 |
| Voting Records | 3.5M | 10M | 100M |
| Documents | 109K | 500K | 10M |
| Graph Relationships | 0 | 5M | 100M |
| Vector Embeddings | 0 | 500K | 10M |
| API Requests/Day | 0 | 10K | 1M |
Purpose: Store political relationships, influence networks, coalition patterns.
// Politician Vertex
g.addV('Politician').
property('person_id', '0479479309').
property('first_name', 'Anna').
property('last_name', 'Svensson').
property('party', 'S').
property('born_year', 1975).
property('district', 'Stockholm').
property('risk_score', 42.5).
property('risk_level', 'MEDIUM')
// Party Vertex
g.addV('Party').
property('party_id', 'S').
property('party_name', 'Socialdemokraterna').
property('founded_year', 1889).
property('current_seats', 107)
// Document Vertex
g.addV('Document').
property('document_id', 'H901FiU1').
property('document_type', 'bet').
property('title', 'Finansutskottets betänkande').
property('published_date', '2024-11-15').
property('status', 'BESLUTAD')
// Vote Vertex
g.addV('Vote').
property('vote_id', 'V202400123').
property('ballot_id', 'B20240056').
property('vote', 'Ja').
property('vote_date', '2024-11-20').
property('is_rebel_vote', false)
// Committee Vertex
g.addV('Committee').
property('committee_id', 'FiU').
property('committee_name', 'Finansutskottet').
property('established_year', 1867).
property('total_members', 17)
// Political relationships
// Child traversals passed to to() are spawned anonymously with __ rather than from g
g.V().has('Politician','person_id','0479479309').
  addE('MEMBER_OF').property('since', '2018-01-01').
  to(__.V().has('Party','party_id','S'))
g.V().has('Politician','person_id','0479479309').
  addE('CAST_VOTE').property('vote', 'Ja').
  to(__.V().has('Vote','vote_id','V202400123'))
g.V().has('Politician','person_id','0479479309').
  addE('AUTHORED').property('author_order', 1).
  to(__.V().has('Document','document_id','H901FiU1'))
// Coalition edges
g.V().has('Party','party_id','M').
  addE('COALITION_WITH').property('government_id', 'GOV_2022').
  to(__.V().has('Party','party_id','SD'))
// Influence network
g.V().has('Politician','person_id','P1').
  addE('INFLUENCES').property('strength', 0.75).
  to(__.V().has('Politician','person_id','P2'))
Example 1: Find MPs with the most rebel votes
// is_rebel_vote is a property of the Vote vertex, so traverse to the vertex instead of filtering the edge
g.V().hasLabel('Politician').
  project('name','party','rebel_count').
    by(values('first_name','last_name').fold()).
    by(values('party')).
    by(out('CAST_VOTE').has('is_rebel_vote', true).count()).
  order().by(select('rebel_count'), desc).
  limit(10)
Example 2: Coalition formation patterns
g.V().hasLabel('Party').as('party1').
outE('COALITION_WITH').as('coalition').
inV().as('party2').
group().
by(select('party1').values('party_name')).
by(select('party2').values('party_name').fold()).
unfold()
Example 3: Document influence cascades
g.V().has('Document','document_type','prop').
repeat(out('REFERENCES')).
times(3).
path().
by('title').
limit(20)
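To sanity-check results from the rebel-vote query in Example 1, the same ranking can be reproduced over a small in-memory sample. This is only a sketch: `rankByRebellion` and the sample record shape are assumptions made for illustration and are not part of the Neptune schema.

```javascript
// Hypothetical helper: counts votes flagged is_rebel_vote per person
// and returns the top `limit` entries, highest count first.
function rankByRebellion(votes, limit = 10) {
  const counts = new Map();
  for (const v of votes) {
    const current = counts.get(v.person_id) ?? 0;
    counts.set(v.person_id, current + (v.is_rebel_vote ? 1 : 0));
  }
  return [...counts.entries()]
    .map(([person_id, rebel_count]) => ({ person_id, rebel_count }))
    .sort((a, b) => b.rebel_count - a.rebel_count)
    .slice(0, limit);
}

// Illustrative sample data (not real voting records)
const sampleVotes = [
  { person_id: 'P1', is_rebel_vote: true },
  { person_id: 'P1', is_rebel_vote: false },
  { person_id: 'P2', is_rebel_vote: true },
  { person_id: 'P2', is_rebel_vote: true },
  { person_id: 'P3', is_rebel_vote: false },
];
const ranking = rankByRebellion(sampleVotes, 2);
console.log(ranking);
```

Comparing this output with the Gremlin result for the same sample is a cheap way to catch a mislabelled edge or property before running the query against the full graph.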
Purpose: Core structured data with ACID guarantees (politicians, parties, documents, votes).
Politicians Table
CREATE TABLE politicians (
person_id VARCHAR(20) PRIMARY KEY,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
party VARCHAR(10) REFERENCES parties(party_id),
born_year INTEGER,
gender VARCHAR(20),
status VARCHAR(100),
district VARCHAR(100),
risk_score DECIMAL(5,2),
risk_level VARCHAR(20),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_politicians_party ON politicians(party);
CREATE INDEX idx_politicians_risk ON politicians(risk_level, risk_score);
Parties Table
CREATE TABLE parties (
party_id VARCHAR(10) PRIMARY KEY,
party_name VARCHAR(200) NOT NULL,
party_name_en VARCHAR(200),
founded_year INTEGER,
ideology VARCHAR(200),
riksdag_status VARCHAR(50),
avg_win_rate DECIMAL(5,2),
current_seats INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Documents Table
CREATE TABLE documents (
document_id VARCHAR(50) PRIMARY KEY,
document_type VARCHAR(20) NOT NULL,
title TEXT NOT NULL,
subtitle TEXT,
summary TEXT,
published_date DATE,
rm VARCHAR(20),
organ VARCHAR(20),
status VARCHAR(50),
fulltext TEXT,
embedding_id VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_documents_type ON documents(document_type);
CREATE INDEX idx_documents_date ON documents(published_date DESC);
CREATE INDEX idx_documents_organ ON documents(organ);
CREATE INDEX idx_documents_fulltext ON documents USING GIN (
to_tsvector('simple',
coalesce(title, '') || ' ' ||
coalesce(summary, '') || ' ' ||
coalesce(fulltext, '')
)
);
Votes Table
CREATE TABLE votes (
vote_id VARCHAR(50) PRIMARY KEY,
ballot_id VARCHAR(50) NOT NULL,
person_id VARCHAR(20) REFERENCES politicians(person_id),
party VARCHAR(10) REFERENCES parties(party_id),
vote VARCHAR(20) NOT NULL,
vote_date DATE NOT NULL,
vote_time TIME,
is_rebel_vote BOOLEAN DEFAULT FALSE,
is_winning_vote BOOLEAN,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_votes_person ON votes(person_id);
CREATE INDEX idx_votes_ballot ON votes(ballot_id);
CREATE INDEX idx_votes_date ON votes(vote_date DESC);
CREATE INDEX idx_votes_rebel ON votes(is_rebel_vote) WHERE is_rebel_vote = TRUE;
Committees Table
CREATE TABLE committees (
committee_id VARCHAR(20) PRIMARY KEY,
committee_name VARCHAR(200) NOT NULL,
committee_name_en VARCHAR(200),
established_year INTEGER,
total_members INTEGER,
productivity_score DECIMAL(5,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Most Active Politicians
SELECT
p.person_id,
p.first_name,
p.last_name,
p.party,
COUNT(DISTINCT v.vote_id) as vote_count,
COUNT(DISTINCT CASE WHEN v.is_rebel_vote THEN v.vote_id END) as rebel_count
FROM politicians p
LEFT JOIN votes v ON v.person_id = p.person_id
WHERE p.status = 'Tjänstgörande riksdagsledamot'
GROUP BY p.person_id, p.first_name, p.last_name, p.party
ORDER BY vote_count DESC
LIMIT 20;
Party Voting Discipline
SELECT
p.party,
pa.party_name,
COUNT(v.vote_id) as total_votes,
COUNT(CASE WHEN v.is_rebel_vote THEN 1 END) as rebel_votes,
ROUND(COUNT(CASE WHEN v.is_rebel_vote THEN 1 END) * 100.0 / COUNT(v.vote_id), 2) as rebel_rate
FROM votes v
JOIN politicians p ON p.person_id = v.person_id
JOIN parties pa ON pa.party_id = p.party
WHERE v.vote_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY p.party, pa.party_name
ORDER BY rebel_rate DESC;
Purpose: Low-latency real-time data access, session storage, API caching.
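The API response cache in this section relies on DynamoDB's TTL feature, which expects the expiry attribute to hold a Unix epoch timestamp in seconds, stored as a Number. A minimal sketch of how a cache item could be built follows; `buildCacheItem`, the key format, and the five-minute default lifetime are assumptions for illustration, not settled design.

```javascript
// Hypothetical helper: builds an item for a TTL-enabled cache table.
// `ttl` must be epoch seconds (not milliseconds) for DynamoDB TTL to expire it.
function buildCacheItem(operation, variables, payload, nowMs = Date.now(), ttlSeconds = 300) {
  // Sort variable names so equivalent requests map to the same cache key.
  const sorted = Object.keys(variables)
    .sort()
    .map((k) => `${k}=${variables[k]}`)
    .join('&');
  return {
    cache_key: `${operation}?${sorted}`,
    payload: JSON.stringify(payload),
    ttl: Math.floor(nowMs / 1000) + ttlSeconds,
  };
}

// Fixed clock value so the example is reproducible
const item = buildCacheItem(
  'politicians',
  { party: 'S', district: 'Stockholm' },
  [],
  1700000000000
);
console.log(item.cache_key, item.ttl);
```

Sorting the variable names keeps `party=S&district=Stockholm` and `district=Stockholm&party=S` from occupying two cache entries.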
Politician Profiles (Fast Lookup)
{
  "TableName": "PoliticianProfiles",
  "BillingMode": "PAY_PER_REQUEST",
  "KeySchema": [
    {"AttributeName": "person_id", "KeyType": "HASH"}
  ],
  "AttributeDefinitions": [
    {"AttributeName": "person_id", "AttributeType": "S"},
    {"AttributeName": "party", "AttributeType": "S"},
    {"AttributeName": "risk_level", "AttributeType": "S"}
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "PartyIndex",
      "KeySchema": [
        {"AttributeName": "party", "KeyType": "HASH"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    },
    {
      "IndexName": "RiskIndex",
      "KeySchema": [
        {"AttributeName": "risk_level", "KeyType": "HASH"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    }
  ]
}
Recent Votes (Time-Ordered)
{
  "TableName": "RecentVotes",
  "BillingMode": "PAY_PER_REQUEST",
  "KeySchema": [
    {"AttributeName": "ballot_id", "KeyType": "HASH"},
    {"AttributeName": "person_id", "KeyType": "RANGE"}
  ],
  "AttributeDefinitions": [
    {"AttributeName": "ballot_id", "AttributeType": "S"},
    {"AttributeName": "person_id", "AttributeType": "S"},
    {"AttributeName": "vote_date", "AttributeType": "S"}
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "DateIndex",
      "KeySchema": [
        {"AttributeName": "vote_date", "KeyType": "HASH"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    }
  ],
  "TimeToLiveSpecification": {
    "Enabled": true,
    "AttributeName": "expiration_time"
  }
}
API Response Cache
{
"TableName": "APICache",
"KeySchema": [
{"AttributeName": "cache_key", "KeyType": "HASH"}
],
"AttributeDefinitions": [
{"AttributeName": "cache_key", "AttributeType": "S"}
],
"TimeToLiveSpecification": {
"Enabled": true,
"AttributeName": "ttl"
}
}
Get Politician Profile
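// Assumes `dynamodb` is an AWS SDK for JavaScript v2 DocumentClient,
// e.g. const dynamodb = new AWS.DynamoDB.DocumentClient();
const params = {
  TableName: 'PoliticianProfiles',
  Key: { person_id: '0479479309' }
};
const result = await dynamodb.get(params).promise();
Query Party Members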
const params = {
TableName: 'PoliticianProfiles',
IndexName: 'PartyIndex',
KeyConditionExpression: 'party = :party',
ExpressionAttributeValues: { ':party': 'S' }
};
const result = await dynamodb.query(params).promise();
Purpose: Full-text search, semantic search with vector embeddings, aggregations.
Documents Index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "document_id": {"type": "keyword"},
      "document_type": {"type": "keyword"},
      "title": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword"}
        },
        "analyzer": "swedish"
      },
      "summary": {"type": "text", "analyzer": "swedish"},
      "fulltext": {"type": "text", "analyzer": "swedish"},
      "published_date": {"type": "date"},
      "rm": {"type": "keyword"},
      "organ": {"type": "keyword"},
      "status": {"type": "keyword"},
      "authors": {"type": "keyword"},
      "party": {"type": "keyword"},
      "embedding_vector": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}
Politicians Index
{
"mappings": {
"properties": {
"person_id": {"type": "keyword"},
"full_name": {"type": "text", "analyzer": "swedish"},
"party": {"type": "keyword"},
"district": {"type": "keyword"},
"risk_level": {"type": "keyword"},
"risk_score": {"type": "float"}
}
}
}
Full-Text Search
{
"query": {
"multi_match": {
"query": "budget finanspolitik",
"fields": ["title^3", "summary^2", "fulltext"],
"type": "best_fields",
"operator": "and"
}
},
"highlight": {
"fields": {
"title": {},
"summary": {}
}
}
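}
Semantic Vector Search with Bedrock Embeddings
Replace the empty `vector` array with the 1024-dimension query embedding returned by Bedrock. The `bool` filter is applied to the nearest neighbours after the vector search, so fewer than `k` hits may be returned.
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {"term": {"document_type": "prop"}},
        {"range": {"published_date": {"gte": "2024-01-01"}}}
      ],
      "must": [
        {
          "knn": {
            "embedding_vector": {
              "vector": [],
              "k": 10
            }
          }
        }
      ]
    }
  }
}
Aggregations (Party Distribution)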
{
"query": {"match_all": {}},
"aggs": {
"by_party": {
"terms": {"field": "party", "size": 10},
"aggs": {
"avg_risk": {"avg": {"field": "risk_score"}}
}
}
}
}
Purpose: Historical trends, vote patterns over time, forecasting data.
Vote Trends Table
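-- Illustrative logical schema only: Timestream tables are schemaless and are created
-- through the API, CLI, or console rather than with SQL DDL.
CREATE TABLE VoteTrends (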
ballot_id VARCHAR,
vote_date TIMESTAMP,
party VARCHAR,
vote_type VARCHAR, -- Ja, Nej, Avstår
vote_count BIGINT,
rebel_count BIGINT,
PRIMARY KEY (ballot_id, vote_date)
);
Party Popularity Trends
CREATE TABLE PartyPopularityTrends (
party VARCHAR,
measurement_date TIMESTAMP,
polling_percentage DOUBLE,
riksdag_seats INTEGER,
approval_rating DOUBLE,
PRIMARY KEY (party, measurement_date)
);
Party Voting Patterns (Last 90 Days)
SELECT
party,
BIN(vote_date, 7d) as week,
SUM(vote_count) as total_votes,
SUM(rebel_count) as total_rebels,
SUM(rebel_count) * 100.0 / SUM(vote_count) as rebel_rate
FROM VoteTrends
WHERE vote_date > ago(90d)
GROUP BY party, BIN(vote_date, 7d)
ORDER BY party, week DESC;
Trending Topics
SELECT
topic,
COUNT(*) as mention_count,
BIN(published_date, 1d) as day
FROM DocumentTopics
WHERE published_date > ago(30d)
GROUP BY topic, BIN(published_date, 1d)
ORDER BY mention_count DESC
LIMIT 10;
Purpose: Text embeddings, semantic search, RAG (Retrieval Augmented Generation), content generation.
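Semantic search over embeddings ultimately ranks documents by vector similarity, and the cosine measure is the one configured for the vector index in this document. The sketch below shows that computation on plain arrays; `cosineSimilarity` is an illustrative helper, and in the planned architecture OpenSearch would perform this step rather than application code.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine is undefined for zero vectors; treat that case as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal vectors
```

When embeddings are requested with `normalize: true`, both norms are already 1, so the cosine reduces to the dot product.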
Generate 1024-Dimensional Embeddings
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";
const client = new BedrockRuntimeClient({ region: "us-east-1" });
// Titan Text Embeddings V2 accepts up to 8,192 input tokens and returns vectors of
// 256, 512, or 1024 dimensions (1024 is both the default and the maximum).
async function generateEmbedding(text) {
  const command = new InvokeModelCommand({
    modelId: "amazon.titan-embed-text-v2:0",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      inputText: text,
      dimensions: 1024,
      normalize: true
    })
  });
  const response = await client.send(command);
  const responseBody = JSON.parse(new TextDecoder().decode(response.body));
  return responseBody.embedding; // 1024-dimensional vector
}
Embed Document for Semantic Search
// Initialize OpenSearch Serverless client with SigV4 request signing
const { Client } = require('@opensearch-project/opensearch');
const { AwsSigv4Signer } = require('@opensearch-project/opensearch/aws');
const { defaultProvider } = require('@aws-sdk/credential-provider-node');
const opensearch = new Client({
  ...AwsSigv4Signer({
    region: 'us-east-1',
    service: 'aoss', // OpenSearch Serverless
    getCredentials: () => defaultProvider()()
  }),
  node: process.env.OPENSEARCH_ENDPOINT
});
async function embedDocument(document) {
  // Combine the text fields; fulltext may be missing, so default it to an empty string
  const fullText = [
    document.title,
    document.subtitle,
    document.summary,
    (document.fulltext ?? '').slice(0, 10000)
  ].filter(Boolean).join("\n\n");
  const embedding = await generateEmbedding(fullText);
  // Store in OpenSearch
  await opensearch.index({
    index: 'documents',
    id: document.document_id,
    body: {
      ...document,
      embedding_vector: embedding
    }
  });
}
Create Knowledge Base
// Shape follows the Bedrock CreateKnowledgeBase request: the embedding model is set
// under knowledgeBaseConfiguration, not at the top level.
const kbConfig = {
  name: "Riksdagsmonitor-KB",
  description: "Swedish parliamentary documents and intelligence",
  roleArn: "arn:aws:iam::ACCOUNT:role/BedrockKBRole",
  knowledgeBaseConfiguration: {
    type: "VECTOR",
    vectorKnowledgeBaseConfiguration: {
      embeddingModelArn: "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
    }
  },
  storageConfiguration: {
    type: "OPENSEARCH_SERVERLESS",
    opensearchServerlessConfiguration: {
      collectionArn: "arn:aws:aoss:us-east-1:ACCOUNT:collection/riksdag-docs",
      vectorIndexName: "documents",
      fieldMapping: {
        vectorField: "embedding_vector",
        textField: "fulltext",
        metadataField: "metadata"
      }
    }
  }
};
Query Knowledge Base
// Required imports
import {
BedrockAgentRuntimeClient,
RetrieveAndGenerateCommand
} from "@aws-sdk/client-bedrock-agent-runtime";
// Initialize Bedrock Agent Runtime client
const bedrockAgent = new BedrockAgentRuntimeClient({
region: "us-east-1"
});
async function queryKnowledgeBase(question) {
const command = new RetrieveAndGenerateCommand({
input: {
text: question
},
retrieveAndGenerateConfiguration: {
type: "KNOWLEDGE_BASE",
knowledgeBaseConfiguration: {
knowledgeBaseId: "KB12345",
modelArn: "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-opus-6-v1:0",
retrievalConfiguration: {
vectorSearchConfiguration: {
numberOfResults: 5
}
}
}
}
});
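  const response = await bedrockAgent.send(command);
  // Retrieved references are nested inside each citation, not returned at the top level
  const citations = response.citations ?? [];
  return {
    answer: response.output.text,
    citations,
    retrievedReferences: citations.flatMap((c) => c.retrievedReferences ?? [])
  };
}
Current Data Flow: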
CIA Platform → CSV Exports → GitHub Repo → Static Site → End Users
Limitations:
- Manual updates required
- No real-time data
- Limited to 19 intelligence products
- No query capabilities
- No authentication/authorization
CIA Platform Roadmap:
- REST API endpoints for all 19 intelligence products
- GraphQL API for complex queries
- OAuth 2.0 authentication
- Rate limiting and quotas
- Real-time webhooks for updates
Expected API Structure:
GET /api/v1/politicians
GET /api/v1/politicians/{person_id}
GET /api/v1/documents?type={type}&rm={rm}
GET /api/v1/votes?ballot_id={ballot_id}
GET /api/v1/parties
GET /api/v1/committees
GraphQL Endpoint: POST /graphql
Webhook Subscriptions: POST /webhooks/subscribe
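As a sketch of how a client might assemble requests against the expected REST structure, the helper below builds the documents URL with properly encoded query parameters. The base URL and the `buildDocumentsUrl` name are placeholders chosen for illustration, since the CIA API is still on the roadmap and its final host and paths are not confirmed.

```javascript
// Placeholder base URL (assumption); replace once the real endpoint is published.
const BASE_URL = 'https://api.example.invalid/api/v1';

// Builds GET /documents?type={type}&rm={rm}, omitting parameters that are not set.
function buildDocumentsUrl({ type, rm } = {}) {
  const url = new URL(`${BASE_URL}/documents`);
  if (type) url.searchParams.set('type', type);
  if (rm) url.searchParams.set('rm', rm);
  return url.toString();
}

console.log(buildDocumentsUrl({ type: 'bet', rm: '2024/25' }));
```

Going through `URL.searchParams` rather than string concatenation matters here because riksmöte values such as `2024/25` contain a slash that has to be percent-encoded.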
Authentication:
const response = await fetch('https://api.cia-platform.se/v1/politicians', {
headers: {
'Authorization': `Bearer ${ACCESS_TOKEN}`,
'X-API-Key': API_KEY
}
});
AWS Lambda Consumers:
CIA API Gateway → EventBridge → Lambda → Aurora/Neptune/DynamoDB/OpenSearch
Lambda Function Example:
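// Assumes `aurora`, `neptune`, `opensearch`, and `generateEmbedding` are initialized outside the handler
exports.handler = async (event) => {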
// EventBridge event from CIA API webhook (payload in `detail`)
const ciaData = event.detail;
// Store in Aurora
await aurora.query('INSERT INTO politicians VALUES (...)');
// Update Neptune graph
await neptune.executeGremlin('g.addV("Politician")...');
// Generate Bedrock embedding
const embedding = await generateEmbedding(ciaData.summary);
// Index in OpenSearch
await opensearch.index({
index: 'documents',
body: { ...ciaData, embedding_vector: embedding }
});
return { statusCode: 200 };
};
EventBridge + Kinesis Data Streams:
CIA Platform → Kinesis Stream → Lambda/Firehose → S3/Aurora/OpenSearch
Real-Time Processing Pipeline:
- New document published → Immediate indexing in OpenSearch
- New vote cast → Real-time update in Timestream
- Risk score change → SNS notification to subscribers
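The three pipeline rules above amount to a routing table from event type to downstream target. A minimal sketch of that dispatch follows; the event type names (`DocumentPublished`, `VoteCast`, `RiskScoreChanged`) and the `routeEvent` helper are assumptions, while `detail-type` and `detail` are the standard EventBridge envelope fields.

```javascript
// Hypothetical mapping from EventBridge detail-type to downstream targets,
// mirroring the real-time processing rules listed above.
const ROUTES = {
  DocumentPublished: ['opensearch'],
  VoteCast: ['timestream'],
  RiskScoreChanged: ['sns'],
};

// Returns one delivery instruction per target, or throws for unknown event types
// so that unrouted events surface as errors instead of being dropped silently.
function routeEvent(event) {
  const targets = ROUTES[event['detail-type']];
  if (!targets) {
    throw new Error(`Unhandled event type: ${event['detail-type']}`);
  }
  return targets.map((target) => ({ target, payload: event.detail }));
}

const deliveries = routeEvent({
  'detail-type': 'VoteCast',
  detail: { ballot_id: 'B20240056' },
});
console.log(deliveries);
```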
| Phase | Source | Target | Method | Timeline |
|---|---|---|---|---|
| Phase 1 | CSV files | Aurora Serverless v2 | Lambda batch import | Q1 2027 |
| Phase 2 | CSV files | Neptune Serverless | Bulk Loader API | Q2 2027 |
| Phase 3 | Aurora | OpenSearch Serverless | Lambda + Bedrock embeddings | Q3 2027 |
| Phase 4 | CIA API | Real-time Lambda consumers | EventBridge integration | Q1 2028 |
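For the Lambda batch import in the first migration step, each parsed CSV row needs to be validated against the `politicians` table constraints and grouped into batches before insertion. The following is a sketch under assumptions: the helper names, the row shape (an object keyed by column name), and the choice to skip invalid rows are illustrative only.

```javascript
// Hypothetical validation for one parsed CSV row before inserting into `politicians`.
// Returns null when a NOT NULL column is missing so the caller can skip or log the row.
function toPoliticianRecord(row) {
  if (!row.person_id || !row.first_name || !row.last_name) {
    return null;
  }
  const bornYear = Number.parseInt(row.born_year, 10);
  return {
    person_id: row.person_id.trim(),
    first_name: row.first_name.trim(),
    last_name: row.last_name.trim(),
    party: row.party ? row.party.trim() : null,
    born_year: Number.isNaN(bornYear) ? null : bornYear,
  };
}

// Splits records into fixed-size batches so each INSERT statement stays small.
function toBatches(records, size) {
  const batches = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}

const record = toPoliticianRecord({
  person_id: ' 0479479309 ',
  first_name: 'Anna',
  last_name: 'Svensson',
  born_year: '1975',
});
console.log(record, toBatches([1, 2, 3, 4, 5], 2));
```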
type Politician {
person_id: ID!
first_name: String!
last_name: String!
party: Party!
born_year: Int
gender: String
status: String!
district: String
risk_score: Float
risk_level: RiskLevel!
votes: [Vote!]!
documents: [Document!]!
committees: [Committee!]!
}
type Party {
party_id: ID!
party_name: String!
party_name_en: String
founded_year: Int
ideology: String
current_seats: Int
avg_win_rate: Float
members: [Politician!]!
coalitions: [Party!]!
}
type Document {
document_id: ID!
document_type: String!
title: String!
subtitle: String
summary: String
published_date: String!
rm: String
organ: Committee
status: String!
authors: [Politician!]!
votes: [Vote!]!
similar_documents: [Document!]!
}
type Vote {
vote_id: ID!
ballot_id: String!
person: Politician!
party: Party!
vote: VoteType!
vote_date: String!
is_rebel_vote: Boolean!
is_winning_vote: Boolean
}
type Committee {
committee_id: ID!
committee_name: String!
committee_name_en: String
established_year: Int
total_members: Int
members: [Politician!]!
documents: [Document!]!
}
enum RiskLevel {
LOW
MEDIUM
HIGH
CRITICAL
}
enum VoteType {
  Ja
  Nej
  Avstar        # "Avstår"; GraphQL names are limited to ASCII letters, digits, and underscores
  Franvarande   # "Frånvarande"
}
type Query {
politician(person_id: ID!): Politician
politicians(party: String, district: String, risk_level: RiskLevel): [Politician!]!
party(party_id: ID!): Party
parties(riksdag_status: String): [Party!]!
document(document_id: ID!): Document
documents(type: String, rm: String, organ: String, limit: Int): [Document!]!
searchDocuments(query: String!, limit: Int): [Document!]!
semanticSearchDocuments(query: String!, limit: Int): [Document!]!
vote(vote_id: ID!): Vote
votes(ballot_id: String, person_id: ID): [Vote!]!
committee(committee_id: ID!): Committee
committees: [Committee!]!
# Advanced queries
highRiskPoliticians(threshold: Float): [Politician!]!
rebelVoters(party: String, limit: Int): [Politician!]!
coalitionProbabilities: [CoalitionPrediction!]!
}
type CoalitionPrediction {
parties: [String!]!
probability: Float!
projected_seats: Int!
}
type Mutation {
# Admin operations
updatePoliticianRiskScore(person_id: ID!, risk_score: Float!): Politician
# AI operations
generateDocumentSummary(document_id: ID!): String!
predictVote(person_id: ID!, ballot_id: String!): VotePrediction!
# Subscription management
subscribeToUpdates(entity_type: String!, entity_id: ID!): Subscription!
}
type VotePrediction {
predicted_vote: VoteType!
confidence: Float!
probabilities: VoteProbabilities!
}
type VoteProbabilities {
  Ja: Float!
  Nej: Float!
  Avstar: Float!        # "Avstår"; GraphQL names are limited to ASCII letters, digits, and underscores
  Franvarande: Float!   # "Frånvarande"
}
type Subscription {
newDocument(organ: String): Document!
newVote(ballot_id: String): Vote!
riskScoreChange(person_id: ID): Politician!
coalitionUpdate: CoalitionPrediction!
}
erDiagram
POLITICIAN ||--o{ VOTE : casts
POLITICIAN }o--|| PARTY : member_of
POLITICIAN ||--o{ DOCUMENT : authors
POLITICIAN }o--o{ COMMITTEE : assigned_to
PARTY ||--o{ POLITICIAN : has_members
PARTY }o--o{ PARTY : coalition_with
DOCUMENT }o--|| COMMITTEE : processed_by
DOCUMENT ||--o{ VOTE : triggers
POLITICIAN {
string person_id PK "0479479309"
string first_name "Anna"
string last_name "Svensson"
string party FK "S"
int born_year "1975"
string gender "Female"
string status "Tjänstgörande"
string district "Stockholm"
float risk_score "42.5"
string risk_level "MEDIUM"
}
PARTY {
string party_id PK "S"
string party_name "Socialdemokraterna"
int founded_year "1889"
string ideology "Social Democracy"
int current_seats "107"
float avg_win_rate "68.5"
}
DOCUMENT {
string document_id PK "H901FiU1"
string document_type "bet"
string title "Finansutskottets betänkande"
date published_date "2024-11-15"
string rm "2024/25"
string organ FK "FiU"
string status "BESLUTAD"
text fulltext
}
VOTE {
string vote_id PK "V202400123"
string ballot_id FK "B20240056"
string person_id FK "0479479309"
string vote "Ja"
date vote_date "2024-11-20"
boolean is_rebel_vote "false"
}
COMMITTEE {
string committee_id PK "FiU"
string committee_name "Finansutskottet"
int established_year "1867"
int total_members "17"
}
graph TB
subgraph "Data Sources"
CIA[CIA JSON API Gateway]
CSV[Legacy CSV Files]
end
subgraph "AWS Ingestion Layer"
EB[EventBridge]
Lambda1[Lambda Ingest]
S3[S3 Raw Data Lake]
end
subgraph "AWS Storage Layer"
Aurora[(Aurora Serverless v2<br/>PostgreSQL)]
Neptune[(Neptune Serverless<br/>Graph DB)]
DynamoDB[(DynamoDB<br/>NoSQL)]
OpenSearch[(OpenSearch Serverless<br/>Search + Vector)]
Timestream[(Timestream<br/>Time-Series)]
end
subgraph "AWS AI/ML Layer"
Bedrock[Bedrock Titan<br/>Embeddings v2]
BedrockKB[Bedrock<br/>Knowledge Bases]
end
subgraph "AWS API Layer"
AppSync[AWS AppSync<br/>GraphQL API]
Lambda2[Lambda Resolvers]
end
subgraph "Clients"
Web[Static Website]
Mobile[Mobile Apps]
API[External APIs]
end
CIA --> EB
CSV --> Lambda1
EB --> Lambda1
Lambda1 --> S3
Lambda1 --> Aurora
Lambda1 --> Neptune
Lambda1 --> DynamoDB
Lambda1 --> OpenSearch
Lambda1 --> Timestream
Lambda1 --> Bedrock
Aurora --> AppSync
Neptune --> AppSync
DynamoDB --> AppSync
OpenSearch --> AppSync
Timestream --> AppSync
Bedrock --> BedrockKB
OpenSearch --> BedrockKB
BedrockKB --> AppSync
AppSync --> Lambda2
AppSync --> Web
AppSync --> Mobile
AppSync --> API
style CIA fill:#D32F2F,color:#fff
style Aurora fill:#4CAF50,color:#fff
style Neptune fill:#FF9800,color:#fff
style DynamoDB fill:#FFC107,color:#000
style OpenSearch fill:#9E9E9E,color:#fff
style Bedrock fill:#455A64,color:#fff
style AppSync fill:#4CAF50,color:#fff
sequenceDiagram
participant CIA as CIA API Gateway
participant EB as EventBridge
participant Lambda as Lambda Function
participant Aurora as Aurora Serverless
participant Bedrock as Bedrock Titan
participant OpenSearch as OpenSearch Serverless
participant AppSync as AWS AppSync
participant Client as Static Site
CIA->>EB: New document published (webhook)
EB->>Lambda: Trigger ingestion function
Lambda->>Aurora: INSERT INTO documents
Lambda->>Bedrock: Generate embedding (1024-dim)
Bedrock-->>Lambda: Return embedding vector
Lambda->>OpenSearch: Index document + embedding
Lambda->>EB: Publish DocumentIndexed event
Client->>AppSync: GraphQL query (semantic search)
AppSync->>Lambda: Resolver function
Lambda->>Bedrock: Generate query embedding
Bedrock-->>Lambda: Query vector
Lambda->>OpenSearch: KNN vector search
OpenSearch-->>Lambda: Top 10 similar documents
Lambda->>Aurora: Fetch full document metadata
Aurora-->>Lambda: Document details
Lambda-->>AppSync: GraphQL response
AppSync-->>Client: Search results
graph LR
P1[Politician: Anna Svensson<br/>S, Stockholm<br/>Risk: MEDIUM]
P2[Politician: Johan Andersson<br/>M, Göteborg<br/>Risk: LOW]
P3[Politician: Maria Karlsson<br/>SD, Malmö<br/>Risk: HIGH]
Party_S[Party: Socialdemokraterna<br/>107 seats]
Party_M[Party: Moderaterna<br/>68 seats]
Party_SD[Party: Sverigedemokraterna<br/>73 seats]
D1[Document: Budget Bill<br/>H901FiU1]
V1[Vote: Ja<br/>2024-11-20]
C1[Committee: Finansutskottet]
P1 -->|MEMBER_OF| Party_S
P2 -->|MEMBER_OF| Party_M
P3 -->|MEMBER_OF| Party_SD
P1 -->|AUTHORED| D1
P1 -->|CAST_VOTE| V1
P1 -->|ASSIGNED_TO| C1
P2 -->|CAST_VOTE| V1
P3 -->|CAST_VOTE| V1
Party_M -->|COALITION_WITH| Party_SD
D1 -->|PROCESSED_BY| C1
D1 -->|TRIGGERED_VOTE| V1
style P1 fill:#4CAF50,color:#fff
style P2 fill:#4CAF50,color:#fff
style P3 fill:#D32F2F,color:#fff
style Party_S fill:#9E9E9E,color:#fff
style Party_M fill:#9E9E9E,color:#fff
style Party_SD fill:#9E9E9E,color:#fff
style D1 fill:#455A64,color:#fff
style V1 fill:#FFC107,color:#000
style C1 fill:#FF9800,color:#fff
graph TB
subgraph "Data Collection (Hourly)"
Collector[Lambda Collector]
CIA_API[CIA API]
end
subgraph "Amazon Timestream"
VT[Vote Trends Table]
PP[Party Popularity Table]
DT[Document Trends Table]
end
subgraph "Analytics"
QuickSight[QuickSight Dashboards]
Lambda_Analysis[Lambda Analytics]
end
CIA_API --> Collector
Collector --> VT
Collector --> PP
Collector --> DT
VT --> QuickSight
PP --> QuickSight
DT --> QuickSight
VT --> Lambda_Analysis
PP --> Lambda_Analysis
DT --> Lambda_Analysis
Lambda_Analysis --> Forecast[Forecast Models]
style VT fill:#4CAF50,color:#fff
style PP fill:#4CAF50,color:#fff
style DT fill:#4CAF50,color:#fff
style QuickSight fill:#455A64,color:#fff
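The weekly aggregation that the Vote Trends table feeds into dashboards can be prototyped without Timestream. The sketch below groups rows into fixed 7-day buckets counted from the Unix epoch and computes a rebel rate per party and week; the epoch alignment is an assumption about how the `BIN(vote_date, 7d)` query buckets data and should be confirmed against the Timestream documentation, and `weeklyRebelRate` is a hypothetical helper.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Groups rows by party and 7-day bucket, then derives the rebel rate in percent.
function weeklyRebelRate(rows) {
  const buckets = new Map();
  for (const row of rows) {
    // Round the timestamp down to the start of its 7-day bucket.
    const weekStart = Math.floor(Date.parse(row.vote_date) / WEEK_MS) * WEEK_MS;
    const key = `${row.party}|${weekStart}`;
    const agg = buckets.get(key) ?? {
      party: row.party,
      week: new Date(weekStart).toISOString(),
      total_votes: 0,
      total_rebels: 0,
    };
    agg.total_votes += row.vote_count;
    agg.total_rebels += row.rebel_count;
    buckets.set(key, agg);
  }
  return [...buckets.values()].map((b) => ({
    ...b,
    // Guard against empty buckets to avoid dividing by zero.
    rebel_rate: b.total_votes === 0 ? 0 : (b.total_rebels * 100) / b.total_votes,
  }));
}

// Illustrative rows (not real data)
const trend = weeklyRebelRate([
  { party: 'S', vote_date: '2024-11-19T00:00:00Z', vote_count: 10, rebel_count: 1 },
  { party: 'S', vote_date: '2024-11-20T00:00:00Z', vote_count: 30, rebel_count: 3 },
]);
console.log(trend);
```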
graph TB
subgraph "Data Sources"
Aurora_DB[(Aurora<br/>Documents)]
S3_Docs[S3 Document Storage]
end
subgraph "Embedding Generation"
Bedrock_Titan[Bedrock Titan<br/>Embeddings v2<br/>1024-dim]
end
subgraph "Vector Storage"
OpenSearch_VS[(OpenSearch Serverless<br/>Vector Index)]
end
subgraph "Bedrock Knowledge Base"
KB[Knowledge Base<br/>Riksdagsmonitor-KB]
Claude[Claude Opus 6.0<br/>Generation Model]
end
subgraph "Application"
AppSync_API[AppSync GraphQL]
Lambda_RAG[Lambda RAG Function]
Client[Static Site]
end
Aurora_DB --> Bedrock_Titan
S3_Docs --> Bedrock_Titan
Bedrock_Titan --> OpenSearch_VS
OpenSearch_VS --> KB
KB --> Claude
Client --> AppSync_API
AppSync_API --> Lambda_RAG
Lambda_RAG --> KB
KB --> Lambda_RAG
Lambda_RAG --> AppSync_API
AppSync_API --> Client
style Bedrock_Titan fill:#FF9800,color:#fff
style OpenSearch_VS fill:#4CAF50,color:#fff
style KB fill:#455A64,color:#fff
style Claude fill:#D32F2F,color:#fff
gantt
title Riksdagsmonitor Data Architecture Roadmap (2026-2032)
dateFormat YYYY-MM
section Phase 1: CSV → API Gateway
CIA API Integration :p1, 2026-01, 12M
Lambda Polling Functions :p1a, 2026-06, 6M
Data Validation Pipeline :p1b, 2026-09, 3M
section Phase 2: AWS Serverless Migration
Aurora Serverless v2 Setup :p2, 2027-01, 3M
Neptune Serverless Graph :p2a, 2027-04, 4M
DynamoDB Tables :p2b, 2027-06, 2M
OpenSearch Serverless :p2c, 2027-08, 3M
section Phase 3: AI/ML Integration
Bedrock Titan Embeddings :p3, 2028-01, 4M
Bedrock Knowledge Bases :p3a, 2028-05, 3M
Semantic Search :p3b, 2028-08, 4M
Predictive Models :p3c, 2029-01, 6M
section Phase 4: Advanced Analytics
Timestream Integration :p4, 2030-01, 3M
Real-Time Streaming :p4a, 2030-04, 4M
Advanced Forecasting :p4b, 2030-08, 6M
ML Model Optimization :p4c, 2031-01, 12M
| Quarter | Milestone | Deliverables |
|---|---|---|
| Q1 2026 | CIA API Integration Planning | API specification, authentication setup |
| Q2 2026 | Lambda Polling Functions | Automated data ingestion from CIA API |
| Q3 2026 | Data Validation Pipeline | Schema validation, error handling |
| Q4 2026 | Hybrid System | CSV + API Gateway dual sources |
Key Metrics:
- API uptime: 99.9%
- Data freshness: < 1 hour
- Error rate: < 0.1%
| Quarter | Milestone | Deliverables |
|---|---|---|
| Q1 2027 | Aurora Setup | Relational database with 2,494 politicians |
| Q2 2027 | Neptune Graph | Graph database with 5M relationships |
| Q3 2027 | DynamoDB + AppSync | NoSQL + GraphQL API layer |
| Q4 2027 | OpenSearch Indexing | Full-text search across 109K documents |
Key Metrics:
- Aurora ACU: 0.5-2 (auto-scaling)
- Neptune NCU: 2.5 (serverless)
- DynamoDB RCU/WCU: On-demand
- OpenSearch OCU: 2 (compute + indexing)
| Quarter | Milestone | Deliverables |
|---|---|---|
| Q1 2028 | Bedrock Titan Embeddings | 1024-dim vectors for all documents |
| Q2 2028 | Bedrock Knowledge Bases | RAG pipeline with Claude Opus 6.0 |
| Q3 2028 | Semantic Search | Vector similarity search |
| Q1 2029 | Predictive Models | Vote prediction, election forecasting |
Key Metrics:
- Embedding generation: 1000 docs/hour
- Semantic search latency: < 500ms
- RAG response time: < 3s
- Model accuracy: > 85%
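The embedding throughput target translates directly into a backfill duration for the existing corpus. A quick arithmetic sketch, assuming the rate is sustained and documents are processed sequentially (`backfillHours` is an illustrative helper):

```javascript
// Hours needed to embed `documentCount` documents at a sustained `docsPerHour` rate.
function backfillHours(documentCount, docsPerHour) {
  if (docsPerHour <= 0) {
    throw new RangeError('docsPerHour must be positive');
  }
  return documentCount / docsPerHour;
}

const currentCorpusHours = backfillHours(109000, 1000); // today's 109K documents
const phase2CorpusHours = backfillHours(500000, 1000);  // projected 500K documents
console.log(currentCorpusHours, currentCorpusHours / 24, phase2CorpusHours);
```

At 1000 documents per hour the current 109K documents take 109 hours, roughly four and a half days, which is worth factoring into the Q1 2028 milestone.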
| Quarter | Milestone | Deliverables |
|---|---|---|
| Q1 2030 | Timestream Integration | Historical trends, time-series analytics |
| Q2 2030 | Real-Time Streaming | EventBridge + Kinesis pipelines |
| Q3 2030 | Advanced Forecasting | Coalition prediction, risk assessment |
| 2031-2032 | Optimization | ML model tuning, cost optimization |
Key Metrics:
- Time-series queries: < 1s
- Real-time latency: < 100ms
- Forecast accuracy: > 90%
- Total AWS cost: < $5000/month
| Component | Current (2026) | Phase 2 (2028) | Phase 4 (2032) |
|---|---|---|---|
| Frontend | Static HTML/CSS/JS | Static HTML/CSS/JS | Static HTML/CSS/JS |
| Hosting | GitHub Pages | GitHub Pages | GitHub Pages |
| API Layer | None | AWS AppSync GraphQL | AppSync + Lambda |
| Database | GitHub files | Aurora Serverless v2 | Aurora Global DB |
| Graph DB | None | Neptune Serverless | Neptune Analytics |
| NoSQL | None | DynamoDB | DynamoDB Global Tables |
| Search | None | OpenSearch Serverless | OpenSearch + Bedrock KB |
| Time-Series | None | None | Timestream |
| Embeddings | None | Bedrock Titan v2 (8192-dim) | Bedrock Titan v3 |
| AI/ML | None | Bedrock Knowledge Bases | Bedrock + SageMaker |
| Compute | None | AWS Lambda | Lambda + Step Functions |
| Orchestration | GitHub Actions | EventBridge | EventBridge + SQS |
| Monitoring | None | CloudWatch | CloudWatch + X-Ray |
| Security | GitHub ISMS | AWS IAM + Secrets Manager | IAM + GuardDuty + Macie |
| Service | 2026 | 2028 | 2032 |
|---|---|---|---|
| Aurora Serverless v2 | $0 | $50/month | $200/month |
| Neptune Serverless | $0 | $100/month | $500/month |
| DynamoDB | $0 | $20/month | $100/month |
| OpenSearch Serverless | $0 | $150/month | $500/month |
| Timestream | $0 | $0 | $100/month |
| Bedrock (embeddings) | $0 | $200/month | $1000/month |
| Lambda | $0 | $50/month | $200/month |
| AppSync | $0 | $30/month | $100/month |
| EventBridge | $0 | $10/month | $50/month |
| S3 + Data Transfer | $0 | $20/month | $100/month |
| CloudWatch | $0 | $20/month | $50/month |
| Total Monthly Cost | $0 | $650/month | $2900/month |
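The totals in the table can be cross-checked against the Phase 4 cost ceiling of $5000/month with a few lines of arithmetic. The figures are copied from the table; the variable names are illustrative.

```python
# Monthly cost estimates in USD, copied from the table above as (2028, 2032).
MONTHLY_COST_USD = {
    "Aurora Serverless v2": (50, 200),
    "Neptune Serverless": (100, 500),
    "DynamoDB": (20, 100),
    "OpenSearch Serverless": (150, 500),
    "Timestream": (0, 100),
    "Bedrock (embeddings)": (200, 1000),
    "Lambda": (50, 200),
    "AppSync": (30, 100),
    "EventBridge": (10, 50),
    "S3 + Data Transfer": (20, 100),
    "CloudWatch": (20, 50),
}
BUDGET_CEILING_USD = 5000  # Phase 4 key metric: total AWS cost < $5000/month

total_2028 = sum(cost[0] for cost in MONTHLY_COST_USD.values())
total_2032 = sum(cost[1] for cost in MONTHLY_COST_USD.values())
headroom_2032 = BUDGET_CEILING_USD - total_2032
bedrock_share_2032 = MONTHLY_COST_USD["Bedrock (embeddings)"][1] / total_2032

print(f"2028 total: ${total_2028}/month")
print(f"2032 total: ${total_2032}/month (headroom ${headroom_2032})")
print(f"Bedrock share in 2032: {bedrock_share_2032:.0%}")
# prints:
# 2028 total: $650/month
# 2032 total: $2900/month (headroom $2100)
# Bedrock share in 2032: 34%
```

The per-service rows sum to the stated totals, and the 2032 estimate leaves $2100/month of headroom under the ceiling.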
| Metric | 2026 | 2028 | 2032 |
|---|---|---|---|
| Documents | 109K | 500K | 10M |
| Politicians | 2,494 | 10K | 50K |
| Votes | 3.5M | 10M | 100M |
| Graph Relationships | 0 | 5M | 100M |
| Vector Embeddings | 0 | 500K | 10M |
| API Requests/Day | 0 | 10K | 1M |
| Data Storage | 1GB | 100GB | 5TB |
| Concurrent Users | 100 | 1K | 10K |
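To put the scale targets in perspective, the implied compound annual growth rate between the 2026 and 2032 columns can be computed as below. This is plain arithmetic on the table values and assumes smooth year-over-year growth, which the roadmap itself does not claim.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to go from start to end in `years` years."""
    return (end / start) ** (1 / years) - 1


# (2026 value, 2032 value) copied from the table above.
PROJECTIONS = {
    "Documents": (109_000, 10_000_000),
    "Politicians": (2_494, 50_000),
    "Votes": (3_500_000, 100_000_000),
}
YEARS = 2032 - 2026

for metric, (start, end) in PROJECTIONS.items():
    rate = annual_growth_rate(start, end, YEARS)
    print(f"{metric}: {end / start:.1f}x overall, about {rate:.0%} per year")
```

The document count, for example, would have to roughly double every year to reach 10M by 2032.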
A.8 Asset Management:
- Aurora/Neptune/DynamoDB data classification (Public/Internal)
- Automated asset inventory via AWS Config
- Data retention policies (7 years for political data)
A.18 Compliance:
- GDPR Article 17 (Right to erasure) via Lambda deletion functions
- GDPR Article 20 (Data portability) via AppSync export queries
- Swedish Archive Act compliance for parliamentary records
ID.AM (Asset Management):
- AWS Systems Manager inventory
- Automated tagging strategy
PR.DS (Data Security):
- Aurora/Neptune encryption at rest (AWS KMS)
- TLS 1.3 for data in transit
- Bedrock model access controls
DE.CM (Continuous Monitoring):
- CloudWatch anomaly detection
- GuardDuty threat detection
- VPC Flow Logs analysis
Control 1 (Inventory):
- AWS Config tracking all resources
- Quarterly audit reports
Control 3 (Data Protection):
- S3 bucket versioning + lifecycle policies
- Aurora automated backups (35 days retention)
- Neptune backups (daily snapshots)
Control 11 (Data Recovery):
- Multi-region Aurora Global Database
- Neptune cross-region replication
- DynamoDB point-in-time recovery (PITR)
Data Classification:
| Data Type | Classification | Retention | Encryption |
|---|---|---|---|
| Politician personal data | Public | Permanent | KMS (at rest) |
| Voting records | Public | 7 years | KMS (at rest) |
| Documents | Public | Permanent | KMS (at rest) |
| Risk scores | Internal | 2 years | KMS (at rest), TLS 1.3 (in transit) |
| API access logs | Internal | 1 year | KMS (at rest) |
| Bedrock model inputs/outputs | Internal | 30 days | KMS (ephemeral) |
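One way to make the retention column enforceable is a lookup that maps each data type to its retention period. The sketch below is an assumption-laden illustration: the dictionary keys and function names are hypothetical, and a year is approximated as 365 days.

```python
from datetime import date, timedelta
from typing import Optional

# Retention periods from the Data Classification table; None means permanent.
# Keys are hypothetical identifiers; a year is approximated as 365 days.
RETENTION_DAYS = {
    "politician_personal_data": None,
    "voting_records": 7 * 365,
    "documents": None,
    "risk_scores": 2 * 365,
    "api_access_logs": 365,
    "bedrock_model_io": 30,
}


def expiry_date(data_type: str, created: date) -> Optional[date]:
    """Date on which a record leaves its retention window, or None if permanent."""
    days = RETENTION_DAYS[data_type]
    return None if days is None else created + timedelta(days=days)


def is_expired(data_type: str, created: date, today: date) -> bool:
    """True once the retention window for the record has passed."""
    expires = expiry_date(data_type, created)
    return expires is not None and today >= expires
```

A scheduled job could call `is_expired` for each record and route expired items to archival or deletion, depending on the data type.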
Data Lifecycle:
stateDiagram-v2
[*] --> Ingested: CIA API
Ingested --> Validated: Schema check
Validated --> Stored: Aurora/Neptune/DynamoDB
Stored --> Indexed: OpenSearch + Bedrock
Indexed --> Published: AppSync GraphQL
Published --> Archived: After 7 years
Archived --> [*]
Stored --> Deleted: GDPR request
Deleted --> [*]
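The lifecycle diagram can be read as a small state machine. The following sketch encodes exactly the transitions drawn above and rejects any other move; the names `State`, `TRANSITIONS`, and `advance` are illustrative, not taken from the codebase.

```python
from enum import Enum


class State(Enum):
    INGESTED = "Ingested"
    VALIDATED = "Validated"
    STORED = "Stored"
    INDEXED = "Indexed"
    PUBLISHED = "Published"
    ARCHIVED = "Archived"
    DELETED = "Deleted"


# Allowed transitions, one entry per arrow in the diagram.
# Archived and Deleted are terminal states.
TRANSITIONS = {
    State.INGESTED: {State.VALIDATED},
    State.VALIDATED: {State.STORED},
    State.STORED: {State.INDEXED, State.DELETED},
    State.INDEXED: {State.PUBLISHED},
    State.PUBLISHED: {State.ARCHIVED},
    State.ARCHIVED: set(),
    State.DELETED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

As drawn, only Stored records can move to Deleted; whether Indexed or Published records should also accept a GDPR deletion request is left open here.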
Privacy by Design:
- No PII beyond public records
- Anonymized analytics data
- GDPR-compliant deletion via Lambda functions
- Bedrock model data retention: 30 days max (AWS configuration)
- ARCHITECTURE.md - Current static site architecture
- DATA_MODEL.md - Current data model (CSV-based)
- FUTURE_FLOWCHART.md - Future data flow diagrams
- SECURITY_ARCHITECTURE.md - Current security controls
- FUTURE_SECURITY_ARCHITECTURE.md - Future AWS security architecture
- THREAT_MODEL.md - STRIDE threat analysis
- Hack23 ISMS - Organization-wide ISMS policies
- TRANSLATION_GUIDE.md - Multi-language support (14 languages)
- WORKFLOWS.md - GitHub Actions CI/CD pipelines
- LABELS.md - Issue management taxonomy
- AWS Neptune Serverless - Graph database documentation
- AWS Aurora Serverless v2 - Relational database
- Amazon OpenSearch Serverless - Search and vector store
- Amazon Bedrock - AI/ML services (Titan, Knowledge Bases)
- Amazon Timestream - Time-series database
- AWS AppSync - GraphQL API service
AI Model Update Cadence: Anthropic Opus minor updates every ~2.3 months, major versions annually
| Period | AI Model | Data Architecture Impact | New Data Entities |
|---|---|---|---|
| 2026-2027 | Opus 4.7-5.x | Enhanced embeddings, improved entity extraction | AI audit logs, model version tracking |
| 2028-2029 | Opus 6.x-7.x | Multi-modal data storage, video/audio political content | Media assets, content provenance records |
| 2030-2032 | Opus 8.x-10.x | Near-expert analysis data, global parliament schemas | Cross-parliament entities, policy impact models |
| 2033-2035 | Pre-AGI systems | Autonomous schema evolution, self-organizing knowledge graphs | Emergent relationship types, dynamic taxonomies |
| 2036-2037 | AGI / Post-AGI | Universal political data ontology, real-time global coverage | 195 parliament datasets, global democracy metrics |
Continuous Model Integration (Every ~2.3 Months):
- Embedding dimension upgrades (768 → 1024 → 2048 → 8192+) tracked in vector DB metadata
- Schema versioning aligned with AI model capabilities
- Backward-compatible data migration for each model update
- Automated data quality assessment using latest model capabilities
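Tracking embedding dimensions and model versions in vector metadata might look like the sketch below. The record layout, the field names, and the `needs_reembedding` helper are assumptions made for illustration; the actual vector DB schema is not defined in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingRecord:
    """Hypothetical metadata stored alongside each vector."""
    document_id: str
    model_id: str        # identifier of the model that produced the vector
    dimension: int       # declared embedding dimension
    schema_version: int  # data schema version the record was written under
    vector: tuple

    def __post_init__(self):
        # Reject records whose vector length disagrees with the declared dimension.
        if len(self.vector) != self.dimension:
            raise ValueError(
                f"{self.document_id}: expected {self.dimension} values, "
                f"got {len(self.vector)}"
            )


def needs_reembedding(record: EmbeddingRecord, current_model_id: str,
                      current_dimension: int) -> bool:
    """True if the record was produced by an older model or dimension."""
    return (record.model_id != current_model_id
            or record.dimension != current_dimension)
```

Storing the model identifier and dimension with every vector lets a migration job select only the stale records after each model update instead of re-embedding the whole corpus.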
Competitor Model Data Considerations:
- Multi-model embedding storage (separate vector spaces per model family)
- Model-agnostic entity extraction pipeline
- Cross-model consistency validation for political entity resolution
- Data portability across AI providers via standardized schemas
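Keeping a separate vector space per model family can be illustrated with a small in-memory store. This is a toy sketch under stated assumptions: the class `MultiModelVectorStore` and its methods are hypothetical, and a production system would delegate search to a vector engine such as OpenSearch rather than scan vectors in Python.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MultiModelVectorStore:
    """One isolated vector space per model family; no cross-family comparison."""

    def __init__(self):
        self._spaces = {}  # family -> {doc_id: vector}
        self._dims = {}    # family -> dimension fixed by the first vector added

    def add(self, family: str, doc_id: str, vector) -> None:
        dim = self._dims.setdefault(family, len(vector))
        if len(vector) != dim:
            raise ValueError(f"{family} expects {dim} dimensions, got {len(vector)}")
        self._spaces.setdefault(family, {})[doc_id] = list(vector)

    def search(self, family: str, query, top_k: int = 3):
        """Return up to top_k (doc_id, score) pairs from one family's space."""
        if family not in self._dims:
            raise KeyError(f"unknown model family: {family}")
        if len(query) != self._dims[family]:
            raise ValueError("query dimension does not match this model family")
        scored = [(doc_id, cosine(query, vec))
                  for doc_id, vec in self._spaces[family].items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

Because similarity scores from different models are not comparable, a query is always embedded with one family's model and searched only within that family's space.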
| Metric | 2026 | 2028 | 2030 | 2033 | 2037 |
|---|---|---|---|---|---|
| Politicians tracked | 2,494 | 5,000+ | 15,000+ | 50,000+ | 500,000+ |
| Documents indexed | 109K | 500K | 2M+ | 10M+ | 100M+ |
| Voting records | 3.5M | 10M+ | 25M+ | 100M+ | 1B+ |
| Languages | 14 | 30+ | 50+ | 100+ | All UN |
| Parliaments | 1 | 4 | 10+ | 50+ | 195 |
| AI model versions | 1 | 5+ | 10+ | 20+ | 30+ |
| Data refresh | Daily | Hourly | Real-time | Sub-second | Predictive |
Document Information:
- Repository: github.com/Hack23/riksdagsmonitor
- Path: /FUTURE_DATA_MODEL.md
- Format: Markdown with Mermaid diagrams
- Classification: Public
- Language: English (technical documentation)
Version History:
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-02-15 | CEO | Initial version - AWS Serverless architecture |
| 2.0 | 2026-02-24 | CEO | Extended to 2037 vision, AI/LLM data architecture, global scale projections |
Approval:
- Document Owner: CEO, Hack23 AB
- Approved Date: 2026-02-15
- Next Review: 2027-02-24 (Annual)
Distribution:
- Public repository: github.com/Hack23/riksdagsmonitor
- Documentation site: riksdagsmonitor.se/docs
🏢 Hack23 AB (Org.nr 5595347807)
📍 Stockholm, Sweden
🌐 hack23.com | riksdagsmonitor.se
📧 Contact: GitHub Issues
This document is part of Riksdagsmonitor's comprehensive documentation portfolio, demonstrating commitment to transparency, security, and technical excellence in Swedish political intelligence.
| Document | Focus | Description |
|---|---|---|
| 🏛️ Architecture | 🏗️ C4 Models | System context, containers, components |
| 📊 Data Model | 📊 Data | Current entity relationships and data dictionary |
| 📊 Future Data Model | 🔮 Data | Enhanced data architecture plans (this document) |
| 🔄 Flowchart | 🔄 Processes | Business and data flow diagrams |
| 📈 State Diagram | 📈 States | System state transitions and lifecycles |
| 🧠 Mindmap | 🧠 Concepts | System conceptual relationships |
| 💼 SWOT | 💼 Strategy | Strategic analysis and positioning |
| 🛡️ Security Architecture | 🔒 Security | Current security controls and design |
| 🎯 Threat Model | 🎯 Threats | STRIDE/MITRE ATT&CK analysis |
| 🚀 Future Architecture | 🔮 Evolution | Architectural evolution roadmap |
- 🛡️ Secure Development Policy — Architecture documentation requirements
- 🏷️ Classification Framework — CIA triad classification
- 📉 Risk Register — Enterprise risk management
📋 Document Control:
✅ Approved by: James Pether Sörling, CEO
📤 Distribution: Public
🏷️ Classification:
📅 Effective Date: 2026-02-24
⏰ Next Review: 2027-02-24
🎯 Framework Compliance: