A full-stack, large-scale imageboard platform and application that mirrors production locally as closely as possible.
The tech stack so far: A custom Python orchestrator, TypeScript, Next.js, Storybook, Node/Express services, shared Zod contracts, generated OpenAPI docs and Swagger UI, PostgreSQL, Kubernetes, Kind, Helm, Istio, NGINX, Prometheus, Grafana, Jaeger.
I plan to soon integrate Redis, Kafka, Cassandra, Elasticsearch, Logstash, and Sentry, then eventually CI/CD tools alongside testing.
This is how the project evolved:
Figma System/backend design reference -> Figma Frontend design reference -> Frontend implementation -> Backend implementation
The Figma design for the frontend covers the entire application and is feature-complete, including almost all different pages and states.
The backend design includes all databases, schemas, additional services I plan to integrate, and many implementation notes. Some of it changed during implementation, but its core remains the same.
The reusable component library is mostly finished. What's left for the frontend itself is wiring the components together based on the frontend design reference.
The backend implementation has been my focus lately.
Work in progress: the component library and platform/orchestration are mostly done. The current focus is the backend, followed by frontend wiring.
- Showcase
- Introduction
- Architecture
- Prerequisites
- Usage
- User-facing endpoints
- Orchestrator
- Future plans
Note: All videos are heavily compressed, sped up, and edited to stay under GitHub's 10MB upload limit.
Full-quality references are available in the Figma links and the screenshots below, or you can run the application locally.
orchestrator.mp4
This was made as an early design reference.
The frontend design is mostly up to date with the current implementation, and contains the entire planned frontend.
URL: https://www.figma.com/design/0T3UGwiZtPdJiPpDhgFCmf/Homepage?node-id=22-2&t=b0YfpkwNYqBQAepM-1
figma_frontend.mp4
All "errors" are intentional and expected behavior for testing scenarios.
Some component story interactions intentionally succeed or fail at random (50% each) to demonstrate debouncing on repeated like/dislike taps, client-state rollback on server-side errors, error and edge-case handling, and tooltips that update based on the interaction outcome.
storybook.mp4
frontend.mp4
api.mp4
monitoring.mp4
This was made as an early design reference.
The backend/system design differs in some areas from the current implementation: it lacks certain services I later implemented as well as the orchestrator itself, it includes databases that are not implemented yet, and parts of the request flow have changed.
URL: https://www.figma.com/design/0T3UGwiZtPdJiPpDhgFCmf/Homepage?node-id=2-37&t=2CoUf9Zsl0qgkKSw-1
figma_backend.mp4
The goal is to build a large-scale global imageboard website and to learn and integrate as many technologies as possible at once, all while replicating a production/real-world environment locally as closely as possible.
Everything has been mostly designed in Figma (as a reference and subject to adjustments in the implementation) since mid 2025, including the backend/system and frontend.
Frontend Figma reference: https://www.figma.com/design/0T3UGwiZtPdJiPpDhgFCmf/Homepage?node-id=22-2&t=b0YfpkwNYqBQAepM-1
Backend/System Figma reference: https://www.figma.com/design/0T3UGwiZtPdJiPpDhgFCmf/Homepage?node-id=2-37&t=2CoUf9Zsl0qgkKSw-1
These links are references mostly made 1 year ago. The frontend implementation is mostly the same, while the backend/system design has changed a little in some areas (though the core idea remains the same).
The reusable component library (and therefore most of the frontend) is mostly finished. As I have made it using Storybook, all that's left is wiring together the components into pages for the website and connecting to the backend.
My goal is for the application to be interactive and engaging, simple, intuitive, and lightweight, which is what the average user wants, while also being customizable for those who want a more personalized experience.
As the average user doesn't bother with too much customization, I decided to hide most of it by default on the homepage, where the user can reveal it by checking the "Display tags, filters, and preferences" box.
The following reference designs illustrate the idea behind the significance of customization and filtering features.
The application was developed desktop-first but with mobile in mind as well: it has a notification system in the plans (which will eventually include mobile push notifications), and it includes a feed-like swipe mode, clearly visible and separated icons to avoid tapping the wrong one by accident (as taps on a phone are less precise than a mouse pointer), etc.
A core area of the application will be auto-tagging images on upload using AI models, then also creating a recommendation algorithm based on how much a user engages with posts that have certain tags.
Another interesting feature will be grouping/searching images by color palette.
On a related note, the user is also able to arbitrarily pick theme colors for the application in real time through a custom theme picker at the bottom-right corner, as I purposefully used 4 color variables throughout the entire application (namely --primary, --secondary, --tertiary, and --accent).
I also plan to create an advanced filter for searching/filtering through images that is highly customizable and allows very specific searching and sorting from metadata.
My current priority is the backend.
This is what the backend consists of thus far:
- Docker and Kubernetes for containerization and service orchestration
- Kind for creating and running the Kubernetes cluster locally
- Helm charts for templating, sensible defaults and deployment
- A custom Python orchestrator for controlling the project, logging, and dynamically manipulating manifests
- NGINX as the external reverse proxy
- Istio as the cluster ingress, internal API gateway and service mesh
- Zod contracts shared between services
- OpenAPI spec generation from Zod contracts
- Swagger UI using the generated OpenAPI schema for API documentation
- Prometheus for metrics aggregation
- Grafana for visualization
- Jaeger for tracing requests
- PostgreSQL databases
Some things I had planned with production in mind are not feasible, practical, or reasonable locally, such as replicated pods/containers, sharding, failover, load balancing, worldwide server distribution, cross-region cluster synchronization, cold storage backups, CDNs, etc.
This is either because my machine can't handle everything or because the implementation/maintenance slows development down drastically for little to no benefit, or even to detriment.
In fact, my computer barely handles it right now, as it reaches 10-20GB+ RAM while idling under WSL2, which made me give up emulating many nodes with Kind, among other things.
That is to say, in this project I tried to emulate a large production environment as closely as possible without making extreme or unreasonable compromises.
The general idea is:
- The frontend is an application made alongside a reusable component library, with TypeScript, Next.js, and Storybook.
- Public API calls go through BFF (Backend for Frontend) services.
- Shared Zod contracts live under contracts and are used to validate requests and generate OpenAPI documentation.
- BFF services communicate with internal services such as the data-access layer and/or others.
- These services then perform internal actions, such as the data-access layer services which communicate with databases to retrieve query data.
- The whole environment is deployed locally through Kubernetes, Helm, Istio, NGINX, and the custom orchestrator.
I tried to emulate a real production request flow as closely as possible:
Client browser -> NGINX -> Kubernetes cluster/Istio ingress gateway -> Services
This makes the local setup overly complex compared to a normal development or even production environment, but that also means the project deals with many of the real-world challenges a real system would have and more: external routing, internal service communication, service discovery, generated API documentation, shared contracts, persistent storage, observability, deployment ordering, orchestration, permissions across host/container boundaries, etc.
Some relevant data can be modified through the .env file, such as the ports for the NGINX reverse proxy/Frontend Next.js dev server, Istio ingress, etc. More info in the Environment variables section.
This is what is currently implemented. More will be added over time as planned.
| Type | Technology | Usage |
|---|---|---|
| Frontend | TypeScript, Next.js, Node.js, Jest | Frontend application |
| Frontend | Storybook | Component library development and testing |
| Frontend | CSS Modules | Used everywhere besides global styling |
| Backend | TypeScript, Node.js, Express.js | BFF and DAL service runtime |
| Database | PostgreSQL | Used for databases such as users-auth which is used in user authentication |
| Contracts | TypeScript, Zod | Shared validation schemas for all services |
| Contracts | OpenAPI | Schema generated from Zod |
| Docs | Swagger UI | Browser UI for the generated OpenAPI schema |
| Monitoring | Prometheus | Metrics aggregation |
| Monitoring | Grafana | Metrics visualization |
| Monitoring | Jaeger | Distributed tracing |
| Networking | Istio | Cluster ingress, internal API gateway and service mesh |
| Networking | NGINX | External reverse proxy/load balancer |
| Cluster | Kubernetes, Kind | Local Kubernetes cluster |
| Containers | Docker | Containers and NGINX reverse proxy |
| Deployment | Helm | Managing releases and chart templating |
| Orchestration | Python, Typer | Custom orchestration CLI |
| Path | Description |
|---|---|
| frontend | Frontend service descriptor and values |
| frontend/src | Frontend project including component library and Storybook stories |
| frontend/src/src | Frontend source code |
| frontend/src/src/components | Reusable component library |
| backend | Backend services |
| backend/bff | Backend for Frontend services |
| backend/dal | Data-access layer services |
| contracts | Shared Zod contracts and generated OpenAPI schema |
| orchestrator | Project's custom Python orchestrator |
| orchestrator/post-renderer | Helm post renderer plugin for the orchestrator |
| platform | Platform services, charts, templates, generated deployment files |
| platform/deployment/charts | Shared base Helm charts used by multiple services |
| platform/deployment/templates | Base templates for YAML manifests used by the orchestrator for generating PVs, PVCs, and migration jobs |
| platform/deployment/generated | Generated deployment files for each service |
| install.sh | Helper shell script for installing the orchestrator |
| Namespace | Description |
|---|---|
| cert-manager | Where cert-manager lives. |
| istio-system | Where all Istio deployments go, including the ingress. |
| registry | Reserved for the future central registry API services, such as NPM packages, Docker images, etc. |
| messaging | Reserved for the future messaging services such as Kafka, RabbitMQ, etc. |
| monitoring | Where all monitoring services live, such as Prometheus, Grafana, Jaeger, etc. |
| databases | Where all database services live, such as PostgreSQL, Redis, Cassandra, etc. |
| apps | Where application services live, such as the frontend, Swagger UI, BFF, DAL, etc. |
You can see the Figma reference for the frontend here: https://www.figma.com/design/0T3UGwiZtPdJiPpDhgFCmf/Homepage?node-id=22-2&t=b0YfpkwNYqBQAepM-1
It includes pages and states for the homepage/imageboard, focused image/swipe mode, tags, filters, albums, comments, authentication, profile/settings, messages, notifications, create post, upload states, success/failure states, multiple hover/active/error states, etc.
My goal is for the application to be interactive, engaging, simple, intuitive, and lightweight, which is what the average user wants, while also being customizable for those who want a more personalized experience. As the average user doesn't bother with too much customization and it may clutter the interface, I decided to hide most of it by default on the homepage, where the user can reveal it by checking the "Display tags, filters, and preferences" box.
It was developed desktop-first but with mobile in mind as well: it has a notification system in the plans (which will eventually include mobile push notifications), and it includes a feed-like swipe mode, clearly visible and separated icons to avoid tapping the wrong one by accident (as taps on a phone are less precise than a mouse pointer), etc.
I decided to implement the frontend component-first instead of hardcoding every page at once.
The reusable component library lives under frontend/src/src/components and was developed with TypeScript and Next.js, using Storybook so they can be tested in isolation before being integrated into the final Next.js pages.
This includes buttons, icon buttons, text inputs, text areas, checkboxes, radio groups, toggles, dropdowns, modals, login/signup dialogs, album selection, comments overlay, post tags, create post states, linked accounts, notification counters, user placeholders, and other reusable UI components.
Most components are flexible, composable, and easily extensible, while also allowing for controlled and uncontrolled behavior.
Some component stories also intentionally test unsuccessful interaction states and some edge-cases as well.
I use those states to test behavior such as debounce (when the user spams the same button such as like/dislike), reverting state from unsuccessful actions on the server-side, successful/failed actions, different tooltip messages, and other edge cases.
Due to this, errors can be ignored on the showcase videos as they're intentional and expected, often having a 50% chance of success or failure for a given action.
The ThemePicker and CommentsOverlay components are a work in progress.
The ThemePicker lets the user change the app colors through its variables in real-time. It mostly works already, but this is not its final design and it's also missing some functionality.
It is meant to be available in all pages and opened from clicking/tapping the icon in the bottom-right corner of the app.
The current implementation is not persistent yet, but the future plan is to add reset/save/cancel behavior, allowing resetting the colors to the original palette, caching the current color palette in the user's browser, and potentially eventually coupling that to the user's account so they always have the same color palette from anywhere.
These databases and their schemas were planned before I had ever worked with any of these databases, let alone started implementing anything, so expect the field types of some of them to be pseudocode placeholders.
System/backend Figma design reference: https://www.figma.com/design/0T3UGwiZtPdJiPpDhgFCmf/Homepage?node-id=2-37&t=2CoUf9Zsl0qgkKSw-1
You can see all database schemas, as well as potential solutions/approaches to some of the bottlenecks and/or problems I could foresee with the design (especially at scale) on the system/backend Figma design above, or on its showcase video.
These are the planned database domains (not the same as schemas, this is essentially a "grouping" by behavior):
| Domain | Use case |
|---|---|
| Users | Credentials (authentication data) and personally identifiable information. |
| Profiles | Public and private profile data, and related engagement data such as profile views, follows, etc. |
| Images | Image metadata, ownership, dimensions, generated variants, blurhash/thumbnail/original references, and storage/cache strategy. |
| Posts | User-facing post data and relationships between users, posts, images, tags, comments, and related engagement data such as views and likes. |
| Comments/replies | Comments, comment/reply relationship, and related engagement metadata such as likes. |
| Albums | User-created collections of posts/images. |
| Tags/interests | Image tags, user interests, and recommendation weights. |
| Preferences/settings | User settings and personalization such as notification preferences, feed layout, UI language, post/profile visibility, profile settings, etc. |
| Central registry | Exposes data as the shared global source of truth, such as the metadata regarding AI models, countries, languages, etc. |
The planned AI tagging flow would tag images on upload.
Those tags would then later be used in the recommendation system, where whenever a user engages with posts containing a given tag, that user's interest in the tag increases.
The recommendation engine would then weight that proportionally for the feed.
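As a rough illustration of the idea above, the interest tracking could be sketched like this in Python. The engagement values and function names are hypothetical and not part of the design; the only thing taken from the text is that engaging with a tag raises the interest in it and that the feed weights stay proportional to that interest.

```python
from collections import defaultdict

# Hypothetical engagement values; the design does not specify them.
ENGAGEMENT_VALUE = {"view": 1.0, "like": 3.0, "comment": 5.0}


def record_engagement(interests, tags, kind):
    """Increase the user's interest in every tag of the post they engaged with."""
    for tag in tags:
        interests[tag] += ENGAGEMENT_VALUE[kind]


def feed_weights(interests):
    """Normalize interests so each tag's feed weight is proportional to its interest."""
    total = sum(interests.values())
    if total == 0:
        return {}
    return {tag: value / total for tag, value in interests.items()}


interests = defaultdict(float)
record_engagement(interests, ["cats", "nature"], "like")
record_engagement(interests, ["cats"], "view")
weights = feed_weights(interests)
print(weights)
```

Normalizing to a sum of 1 is just one possible choice; decay over time or per-tag caps would fit the same shape.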
There is also a planned optional color-palette search/grouping feature.
The idea is that images can be searched and/or grouped by color palette, where if only grouping (not searching), the ones that match the color palette the most are re-ordered to the top-left of the imageboard.
When searching, color similarity weights can also be combined with the recommendation weights.
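A minimal sketch of the grouping case, assuming each image stores its palette as a list of RGB tuples and that similarity is plain Euclidean distance in RGB space. Both are assumptions made for illustration; a perceptual color space would likely give better results.

```python
import math


def palette_distance(query, palette):
    """Average distance from each query color to its closest color in the palette."""
    return sum(min(math.dist(q, c) for c in palette) for q in query) / len(query)


def group_by_palette(images, query):
    """Order images so the closest palette matches come first (top-left of the board)."""
    return sorted(images, key=lambda image: palette_distance(query, image["palette"]))


# Hypothetical sample data.
images = [
    {"id": "ocean", "palette": [(10, 80, 200), (200, 230, 250)]},
    {"id": "sunset", "palette": [(250, 120, 30), (90, 20, 60)]},
    {"id": "forest", "palette": [(20, 120, 40), (10, 60, 20)]},
]
ordered = group_by_palette(images, query=[(255, 128, 0)])
print([image["id"] for image in ordered])
```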
The project uses shared Zod contracts which are defined under contracts and used across multiple services.
The idea is for the API shape to be described once through Zod schemas, then used where possible such as the BFF services, DAL services, generating OpenAPI documentation, and also eventually the frontend.
The project also generates an OpenAPI schema at contracts/generated/openapi.json every time it starts through the orchestrator up command, which is what Swagger UI uses.
This is relevant because otherwise the frontend, backend, and API documentation would easily drift apart as there would be no single source of truth, while also resulting in a lot of redundant code duplication, making it much harder to maintain, scale, validate, etc.
The frontend currently has some legacy local types under frontend/src/src/types, but I will eventually migrate them to use Zod contracts instead for any API related work (unless they are used for mock UI testing/prototyping).
The connection path is meant to mirror a real-world one. Thus, I separated it into two layers:
- External: What the end user sees - public-facing extra-cluster edge server.
  - This is the first barrier of entry, where a public-facing NGINX server runs, behaving as a load balancer/reverse proxy.
  - My approach was to run it directly on the host machine through a docker container - extra-cluster.
  - However, this is not where the request stops, as NGINX proxies it to the Istio ingress in the cluster (which is why Istio is also exposed through port-forwarding)
- Internal: What the user doesn't see - the cluster itself and internal services.
  - This is where the request from NGINX arrives in the cluster, at the cluster edge through Istio as an API gateway.
  - Istio also acts as a service mesh, so it is also responsible for (through Envoy sidecars) handling inter-service communication where sidecar injection is enabled.
This distinction is important as although you can access these services externally due to proxy routing and redirection at the aforementioned addresses and ports, this is not where or how service inter-communication and direct calls to services happen (only indirectly).
Internally, services have their own names and ports, likely distinct from those exposed to the users, and like in a real-world scenario, as a user who has access to these public-facing services/interfaces, you cannot directly query them without being part of the cluster (which the services themselves are, but NGINX isn't, and neither are you, unless you enter the pods/containers and/or execute commands from within).
All that to say: the service names and ports are usually different intra-cluster from those I've listed, and are not directly accessible.
You can only access them directly through kubectl run or kubectl exec, through proxying platform/networking/istio/istio-ingress-cfg/chart/templates/routing_table.yaml, or by querying from another service within the cluster itself, through app-svc:svc-port if on the same namespace, or app-svc.namespace.svc.cluster.local from any namespace.
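The two in-cluster address forms mentioned above can be illustrated with a small helper. This is only a sketch: the function name is made up, and the port number in the example is a placeholder rather than the real port of any service.

```python
def service_address(service, port, namespace=None):
    """Build an in-cluster address for a Kubernetes Service.

    Without a namespace, the short name is used, which only resolves from
    within the same namespace. With one, the fully qualified name is used,
    which resolves from any namespace.
    """
    host = service if namespace is None else f"{service}.{namespace}.svc.cluster.local"
    return f"{host}:{port}"


# 3000 is a placeholder port for illustration.
print(service_address("users-auth-dal", 3000))
print(service_address("users-auth-dal", 3000, namespace="apps"))
```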
Root access is necessary as the request/file path flows through multiple mutually exclusive environments:
Host (WSL2) -> Kind -> Nodes (Kubernetes) -> Pods (Kubernetes) -> Containers (Docker)
Folders are sometimes created by the user or the container and neither side usually has read/write access over the other. Some containers also expect specific user and group ids.
This is necessary due to the unconventional and relatively complex setup where I use a hacky combination of attempting to emulate production within the same development environment, while also using and handling WSL2, Kind, local storage file mounts, persistence, permission requirements within containers (which also often reset permissions through init containers or require explicit users/permissions), etc.
In this setup, permissions are shared bidirectionally in such a way that the system is practically write-agnostic.
This means for example, that hot-reloading works in servers that expect changes such as BFF, DAL, Frontend, Storybook, etc. whether the container itself modifies the files or I do from the host system. The same goes for shared files such as the Zod schemas under contracts.
Ideally, in a production environment you would instead use the service's VM host for persistence or a network volume.
It would also be preferred to build Docker images for each service, through a local development environment and a CI layer, where the production/staging image should already include the necessary files, avoiding the need for all of the cognitive, theoretical, and practical burden of synchronizing these mutually exclusive environments.
| Dependency | Version |
|---|---|
| Python | v3.12.3 |
| Docker | v29.4.0 |
| Kubernetes client | v1.35 |
| Kind | v0.31.0 |
| Helm | v4.1.4 |
(Optional for creating database migrations)
golang-migrate: v4.19.1
Run install.sh once, then call the orchestrator through the orchestrator command.
./install.sh
orchestrator up
See orchestrator for information on the available commands and options.
There is no need to reinstall when changes are made to the orchestrator, as they are picked up automatically.
Everything below assumes default settings.
| Service | URL |
|---|---|
| Frontend Next.js dev server / NGINX reverse proxy | localhost:8080 |
| Storybook | storybook.localhost:8080 |
| Public API | localhost:8080/api |
| Swagger UI | swagger.localhost:8080 |
| Prometheus | prometheus.localhost:8080 |
| Grafana | grafana.localhost:8080 |
| Jaeger | jaeger.localhost:8080 |
| Istio ingress/API gateway | localhost:5000 |
Grafana credentials:
| Key | Value |
|---|---|
| User | admin |
| Password | 12345 |
Besides the service implementations themselves, the APIs are described in:
- Zod schemas at contracts
- OpenAPI schema generated at contracts/generated/openapi.json
- Swagger UI as a frontend for the OpenAPI schema
Persistent data location can be modified in each service's descriptor. I have opted for colocating it with the service definition.
Persistent data location for monitoring services:
- Prometheus: platform/monitoring/prometheus/data
- Grafana: platform/monitoring/grafana/data
- Jaeger: platform/monitoring/jaeger/instance/data
The public API is exposed as a REST API through BFF (backend for frontend) services (e.g. bff-users), which acts as an intermediate layer between public API calls and internal API calls such as the DAL (data-access layer) services (e.g. users-auth-dal).
The orchestrator lives under orchestrator. It is a custom Python module that controls the lifecycle of the application, logging, deployment, and file/folder permissions.
It is installed by running install.sh. Source file changes are reflected automatically, so there's no need to reinstall.
It is controllable through the CLI, which exposes the following commands:
orchestrator up: Starts the application.
orchestrator down: Deletes the application (except persistent data).
orchestrator restart: Restarts the application (calls down then up).
orchestrator --help or orchestrator: Shows available commands and options.
--debug: Enable debug logs (more verbose). Position/order does not matter.
install.sh:
- Installs the post renderer as a Helm plugin
- Enables execute permission for orchestrator/post-renderer/run.sh
- Creates the Python virtual environment for executing the orchestrator
- Installs the orchestrator dependencies from requirements.txt
- Installs the orchestrator in editable mode within the virtual environment
- Symlinks that executable from the virtual environment to $HOME/.local/bin/orchestrator, where $HOME is the path to the user's home directory
- Adds $HOME/.local/bin to the user's PATH environment variable if not already there, so that the orchestrator executable is globally accessible.
  - Assumes bash and .bashrc are used for the user's shell, otherwise you can copy and paste the export command logged to the terminal in the terminal itself or your shell's configuration file.
The orchestrator executable lives in the virtual environment venv within the project root and is symlinked into $HOME/.local/bin, where $HOME is the path to the user's home folder. That folder is also added to the user's PATH if not already included, so that the orchestrator can be called from anywhere through orchestrator.
It loads its configuration from orchestrator/config.py, where some values are hardcoded and others are loaded from the .env file, or set to a default value if not given.
These can be modified in .env.
If not specified, they are set to the following default values in orchestrator/config.py:
| Key | Value |
|---|---|
| CLUSTER_NAME | cluster |
| SERVICE_DESCRIPTOR_NAME | deploy.yaml |
| DEFAULT_NAMESPACE | default |
| NGINX_PORT | 8080 |
| ISTIO_PORT | 5000 |
| DEFAULT_STORAGE_CAPACITY | 10Gi |
| DEFAULT_STORAGE_REQUEST | 10Gi |
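The fallback behavior can be sketched as follows. This is not the contents of orchestrator/config.py: it assumes the values from .env have already been loaded into the process environment, and the function and constant names are made up for illustration.

```python
import os

# Default values from the table above.
DEFAULTS = {
    "CLUSTER_NAME": "cluster",
    "SERVICE_DESCRIPTOR_NAME": "deploy.yaml",
    "DEFAULT_NAMESPACE": "default",
    "NGINX_PORT": "8080",
    "ISTIO_PORT": "5000",
    "DEFAULT_STORAGE_CAPACITY": "10Gi",
    "DEFAULT_STORAGE_REQUEST": "10Gi",
}
# Converting ports to integers is an assumption of this sketch.
INTEGER_KEYS = {"NGINX_PORT", "ISTIO_PORT"}


def load_config(env=None):
    """Return every setting, using the default when the variable is not set."""
    env = os.environ if env is None else env
    config = {}
    for key, default in DEFAULTS.items():
        value = env.get(key, default)
        config[key] = int(value) if key in INTEGER_KEYS else value
    return config


config = load_config({"NGINX_PORT": "9090"})
print(config["NGINX_PORT"], config["ISTIO_PORT"], config["CLUSTER_NAME"])
```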
POST_RENDERER_CONTEXT is also used internally for sharing the file path of the context file for the current service in the deployment pipeline with the post renderer. It is not meant to be manually configured in .env.
The orchestrator sets it for each service before calling Helm, so the post renderer knows which generated context.json to load.
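To make the hand-off more concrete, here is a heavily simplified sketch of a post renderer in Python. It is not the project's post renderer: the generatedManifests key is a hypothetical field of context.json, and instead of editing the rendered manifests it only appends extra YAML documents to them. What it does share with the real flow is the overall shape: read the rendered manifests from stdin, load the context file named by POST_RENDERER_CONTEXT, and write the result to stdout.

```python
import io
import json
import os
import sys


def load_context(env):
    """Load the context file whose path is given in POST_RENDERER_CONTEXT, if set."""
    path = env.get("POST_RENDERER_CONTEXT")
    if not path:
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def render(manifests, context):
    """Append extra YAML documents to the manifests rendered by Helm.

    "generatedManifests" is a hypothetical key used only in this sketch.
    """
    documents = [manifests] + list(context.get("generatedManifests", []))
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    return "\n---\n".join(cleaned) + "\n"


def main(stdin=sys.stdin, stdout=sys.stdout, env=os.environ):
    """Entry point: manifests come in on stdin and go out on stdout."""
    stdout.write(render(stdin.read(), load_context(env)))


# Demo with in-memory streams instead of a real Helm call.
output = io.StringIO()
main(io.StringIO("kind: Deployment\n"), output, env={})
print(output.getvalue(), end="")
```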
At a high-level, this is what calling orchestrator up does:
- Removes resources from the potentially running app.
- Takes ownership of the app root.
- Installs root and workspaces npm packages.
  - Needed for development environment syntax validation from Zod contracts.
- Generates an OpenAPI spec file from the Zod schemas at contracts to contracts/generated/openapi.json.
- Discovers enabled services from deploy.yaml files.
- Creates Service (orchestrator/models/service.py) and Mount (orchestrator/models/mount.py) objects from parsing deploy.yaml.
- Writes a Kind config file from a template to mount the project files onto the cluster nodes at /mnt.
- Creates the Kind cluster.
- Creates the cluster namespaces
- Sets up networking
  - Writes NGINX default.conf from template and environment variables.
  - Starts NGINX at the given port (default 8080) through a regular docker container.
  - Deploys all Istio services.
  - Starts Istio ingress at given port (default 5000).
- Deploys infrastructure services with the infra type.
- Deploys storage services with the storage type.
- Deploys monitoring services with the monitoring type.
- Deploys application services with the apps type.
- Logs success (or error) message to the terminal including the address of relevant frontend/service services.

Some of this is logged by default. More in-depth logs can be seen by including the --debug option in the call to orchestrator.
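The type-by-type deployment order from the steps above (infra, then storage, then monitoring, then apps) can be sketched like this. The function name and the dictionary shape are assumptions, and the cert-manager and prometheus entries are placeholder service names used only for the example.

```python
# Deployment groups in the order they are deployed.
DEPLOY_ORDER = ["infra", "storage", "monitoring", "apps"]


def group_by_type(services):
    """Return (type, [service names]) pairs in deployment order."""
    groups = {service_type: [] for service_type in DEPLOY_ORDER}
    for service in services:
        if service["type"] not in groups:
            raise ValueError(f"Unknown service type: {service['type']}")
        groups[service["type"]].append(service["name"])
    return [(service_type, groups[service_type]) for service_type in DEPLOY_ORDER]


services = [
    {"name": "users-auth-dal", "type": "apps"},
    {"name": "prometheus", "type": "monitoring"},
    {"name": "users-auth-postgres", "type": "storage"},
    {"name": "cert-manager", "type": "infra"},
]
for service_type, names in group_by_type(services):
    print(service_type, names)
```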
All dynamically generated service deployment files, such as YAML manifests for persistent volumes, persistent volume claims, migration jobs as well as context.json, are stored in platform/deployment/generated/${SERVICE_NAME}, where SERVICE_NAME is the name of the respective service.
At a high-level, this is how service deployment works (mostly implemented at orchestrator/core/deployment.py):
- Discovers enabled services and their configuration by parsing service descriptors, which are named deploy.yaml by default.
  - The expected service descriptor filename can be modified through the SERVICE_DESCRIPTOR_NAME environment variable in .env.
  - Services are stored internally as Service objects, defined in orchestrator/models/service.py.
  - Mounts are stored internally as Mount objects, defined in orchestrator/models/mount.py.
- Topologically sorts all services as a dependency graph using depth-first search, installing dependencies first and preventing circular dependencies.
- Fixes ownership and permissions for mount folders/files.
  - Some services need the host, Kind nodes, Kubernetes pods, and Docker containers to all be able to read/write shared paths.
- Dynamically generates Kubernetes manifests where needed.
  - The base templates are at platform/deployment/templates.
  - Persistent mounts generate a PV/PVC pair.
  - Source mounts use a direct hostPath.
  - PostgreSQL migration jobs are generated when migrations.enabled is set in the service descriptor.
- Generates the service context within the same generated folder.
  - It includes metadata about the service, such as generated documents and mounts.
  - It will be used by the post renderer.
- Calls Helm to install the service.
  - If the service descriptor has a file field, it is passed to Helm as a values.yaml override with -f.
  - If the service descriptor has wait: true, Helm is called with --wait.
  - The generated context.json path is passed to the post renderer through the POST_RENDERER_CONTEXT environment variable.
- The post renderer intercepts the Helm installation: it ingests the unmodified manifests which Helm sends to stdin, manipulates them to inject the generated data, then writes them to stdout.
- Helm then uses the manipulated manifests from stdout to perform the final deployment.
- The orchestrator waits for cluster resources if given in waitFor.
  - Supports waiting for CRD, deployment, endpoint and/or webhook.
- Logs success or error.
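The dependency ordering step can be sketched as a depth-first topological sort with cycle detection. This is an illustrative version and not the code from orchestrator/core/deployment.py; the function name and the input shape (service name mapped to its dependsOn list) are assumptions, and the bff-users dependency in the example is hypothetical.

```python
def deployment_order(dependencies):
    """Return service names so that every dependency comes before its dependents.

    dependencies maps each service name to the names listed in its dependsOn.
    Raises ValueError on circular dependencies and KeyError on unknown names.
    """
    order = []
    state = {}  # name -> "visiting" while on the current DFS path, "done" after

    def visit(name, path):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            cycle = " -> ".join(path + [name])
            raise ValueError(f"Circular dependency: {cycle}")
        if name not in dependencies:
            raise KeyError(f"Unknown service: {name}")
        state[name] = "visiting"
        for dependency in dependencies[name]:
            visit(dependency, path + [name])
        state[name] = "done"
        order.append(name)  # appended only after all of its dependencies

    for name in dependencies:
        visit(name, [])
    return order


dependencies = {
    "bff-users": ["users-auth-dal"],  # hypothetical dependency for illustration
    "users-auth-dal": ["users-auth-postgres"],
    "users-auth-postgres": [],
}
print(deployment_order(dependencies))
```

Tracking the current path makes the error message show the whole cycle, which is useful when a descriptor is misconfigured.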
The filename of a service descriptor is expected to be deploy.yaml by default. It can be changed through the environment variable SERVICE_DESCRIPTOR_NAME.
The descriptor is a small custom configuration layer mainly used by the orchestrator to understand the service needs.
Helm still owns templating, values and installation, but deploy.yaml tells the orchestrator where the chart is, which service group it belongs to, which mounts need to be injected, which service dependencies must be installed first, and which extra resources must be generated before Helm finishes the deployment.
The most relevant fields are:
| Field | Description |
|---|---|
| name | Service name. This is also used for the Helm release name and for the generated folder under platform/deployment/generated. |
| enabled | If false, the service is ignored. |
| type | Deployment group, such as infra, storage, monitoring, or apps. |
| namespace | Kubernetes namespace. Defaults to DEFAULT_NAMESPACE. |
| chart | Helm chart path. |
| file | Optional values file passed to Helm with -f. |
| dependsOn | Other service names that must be deployed first. |
| mounts | Source or persistent mounts that the post renderer injects into the rendered manifests. |
| migrations | Optional database migration job configuration. |
| wait | Whether Helm should wait for the release. |
| waitFor | Additional cluster resources the orchestrator should wait for after Helm finishes. |
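For illustration, a descriptor that has already been parsed into a dictionary could be turned into an object along these lines. This is a sketch and not the Service class from orchestrator/models/service.py: the attribute names are guesses, only a subset of the fields is covered, and the defaults for enabled and wait are assumptions (only the namespace default is stated above).

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_NAMESPACE = "default"


@dataclass
class Service:
    name: str
    type: str
    chart: str
    enabled: bool = True  # assumed default
    namespace: str = DEFAULT_NAMESPACE
    file: Optional[str] = None
    depends_on: list = field(default_factory=list)
    wait: bool = False  # assumed default


def service_from_descriptor(data):
    """Build a Service from a parsed descriptor, filling in defaults."""
    return Service(
        name=data["name"],
        type=data["type"],
        chart=data["chart"],
        enabled=data.get("enabled", True),
        namespace=data.get("namespace", DEFAULT_NAMESPACE),
        file=data.get("file"),
        depends_on=list(data.get("dependsOn", [])),
        wait=data.get("wait", False),
    )


service = service_from_descriptor(
    {
        "name": "users-auth-dal",
        "type": "apps",
        "chart": "platform/deployment/charts/node-app",
        "dependsOn": ["users-auth-postgres"],
    }
)
print(service)
```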
The accepted path formats for chart and file are:
- ./chart or ./values.yaml resolves relative to the folder where the descriptor lives.
- platform/deployment/charts/node-app resolves relative to the project root.
- Absolute paths are used as-is.
- Mount paths must be inside the project root, as we mount the entire project at /mnt through Kind onto the nodes.
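The path rules above could be expressed roughly as follows. This is a minimal sketch, not the orchestrator's implementation: the function names are hypothetical, and treating ../ the same way as ./ is an assumption, since only the ./ form is documented here.

```python
from pathlib import Path


def resolve_descriptor_path(value: str, descriptor_dir: Path, project_root: Path) -> Path:
    """Resolve a chart/file/mount path from deploy.yaml."""
    path = Path(value)
    if path.is_absolute():
        # Absolute paths are used as-is.
        return path
    if value.startswith(("./", "../")):
        # Explicitly relative paths resolve against the descriptor's folder.
        return (descriptor_dir / path).resolve()
    # Everything else resolves against the project root.
    return (project_root / path).resolve()


def ensure_inside_project(path: Path, project_root: Path) -> Path:
    """Reject mount paths that Kind could not see under /mnt."""
    resolved = path.resolve()
    if not resolved.is_relative_to(project_root.resolve()):
        raise ValueError(f"mount path {resolved} is outside the project root")
    return resolved
```

For example, ./values.yaml in backend/dal/users/users-auth resolves next to the descriptor, while platform/deployment/charts/node-app resolves from the project root.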
Mounts can be one of two types:
- source: Injected as a direct hostPath, useful for development source code, contracts, generated OpenAPI files, etc.
- persistent: Generates a PV/PVC pair and injects the PVC into the workload, useful for services which write data themselves, such as database and monitoring services.
Example deploy.yaml for users-auth-postgres at platform/storage/databases/postgresql/users-auth:
name: users-auth-postgres
enabled: true
type: storage
namespace: databases
chart: platform/storage/databases/postgresql/chart
file: ./values.yaml
mounts:
  - name: data
    enabled: true
    type: persistent
    readOnly: false
    hostPath: ./data
    containerPath: /var/lib/postgresql/data
Example values.yaml:
db:
  name: users-auth
  username: admin
  password: 12345
  database: users_auth
Example deploy.yaml for users-auth-dal at backend/dal/users/users-auth:
name: users-auth-dal
enabled: true
type: apps
namespace: apps
chart: platform/deployment/charts/node-app
file: ./values.yaml
dependsOn:
  - users-auth-postgres
mounts:
  - name: development
    type: source
    enabled: true
    readOnly: false
    hostPath: ./src
    containerPath: /app
  - name: contracts
    type: source
    enabled: true
    readOnly: true
    hostPath: contracts
    containerPath: /contracts
migrations:
  enabled: true
  target: users-auth-postgres
  hostPath: ./migrations
  containerPath: /migrations
Example values.yaml:
fullnameOverride: "users-auth-dal"
replicaCount: 1
image:
  repository: node
  tag: 24.15.0-bookworm
  # This sets the pull policy for images.
  pullPolicy: IfNotPresent
container:
  command: ["sh", "-c"]
  args:
    [
      "cd /app && npm install && mkdir -p node_modules/@app && ln -sf /contracts node_modules/@app/contracts && NODE_OPTIONS=--preserve-symlinks npm run dev",
    ]
# This is for setting up a service more information can be found here: https://kubernetes.io/docs/concepts/services-networking/service/
service:
  # This sets the service type more information can be found here: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
  type: ClusterIP
  # This sets the ports more information can be found here: https://kubernetes.io/docs/concepts/services-networking/service/#field-spec-ports
  port: 80

Migrations are currently PostgreSQL only.
When enabled, the orchestrator finds the target database service, reads its values file, generates a Helm hook Job using platform/deployment/templates/migration.template.yaml, waits for PostgreSQL to accept connections, then runs migrate/migrate (golang-migrate) against the target database.
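To make the last step more concrete, here is a hedged sketch of how the arguments for the migrate/migrate container might be assembled from the target database's values file. The function name build_migrate_args, the host argument, port 5432, and sslmode=disable are assumptions for illustration; the real Job is rendered from migration.template.yaml and may differ.

```python
from urllib.parse import quote


def build_migrate_args(
    db: dict,
    host: str,
    port: int = 5432,
    migrations_dir: str = "/migrations",
) -> list[str]:
    """Build golang-migrate CLI arguments for an "up" migration."""
    # Credentials are percent-encoded so special characters survive in the URL.
    user = quote(str(db["username"]), safe="")
    password = quote(str(db["password"]), safe="")
    # Assumption: plain in-cluster connection without TLS.
    url = f"postgres://{user}:{password}@{host}:{port}/{db['database']}?sslmode=disable"
    return ["-path", migrations_dir, "-database", url, "up"]
```

Here db corresponds to the db block of the PostgreSQL values.yaml shown earlier, and host would be whatever DNS name the target database Service has inside the cluster.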
context.json is the contract between the orchestrator and the post renderer.
It is generated per service at:
platform/deployment/generated/${SERVICE_NAME}/context.json
It contains:
| Key | Description |
|---|---|
| serviceName | Name of the service being deployed. |
| serviceType | Service group/type, such as infra, storage, monitoring, or apps. |
| generatedDir | Absolute path to the generated folder for that service. |
| documents | Generated manifests related to the service, such as PVs, PVCs and migration Jobs. |
| mounts | Mount metadata used by the post renderer. |
Each mount includes the original host path, the Kind path under /mnt, the container path, whether it is read-only, the generated volume name, and the Kubernetes volume source to inject.
For example:
- A source mount becomes a hostPath volume.
- A persistent mount becomes a persistentVolumeClaim volume, while the PV/PVC manifests themselves are generated into the same service's generated folder.
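The mapping from a descriptor mount to its context.json entry can be sketched like this. The naming pattern (service name plus mount name for the volume, with a -pvc suffix for the claim) is inferred from the generated examples in this section rather than taken from the orchestrator's code, the function name is hypothetical, and the capacity field is left out.

```python
from pathlib import PurePosixPath

# The project root is mounted at /mnt on the Kind nodes.
KIND_MOUNT_ROOT = PurePosixPath("/mnt")


def build_mount_context(service_name: str, mount: dict, host_path: str, project_root: str) -> dict:
    """Build one entry of the "mounts" list in context.json (illustrative only)."""
    # relative_to raises ValueError if the mount lies outside the project root.
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(project_root))
    kind_path = KIND_MOUNT_ROOT / relative
    volume_name = f"{service_name}-{mount['name']}"

    if mount["type"] == "source":
        # Source mounts point straight at the path visible on the Kind node.
        source = {"hostPath": {"path": str(kind_path)}}
    elif mount["type"] == "persistent":
        # Persistent mounts reference the generated PVC instead.
        source = {"persistentVolumeClaim": {"claimName": f"{volume_name}-pvc"}}
    else:
        raise ValueError(f"unknown mount type: {mount['type']!r}")

    return {
        "name": mount["name"],
        "enabled": mount.get("enabled", True),
        "type": mount["type"],
        "readOnly": mount.get("readOnly", False),
        "hostPath": host_path,
        "kindPath": str(kind_path),
        "containerPath": mount["containerPath"],
        "volumeName": volume_name,
        "source": source,
    }
```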
Example context.json generated for users-auth-postgres:
{
  "serviceName": "users-auth-postgres",
  "serviceType": "storage",
  "generatedDir": "/home/slins/Programming/Web/App/platform/deployment/generated/users-auth-postgres",
  "documents": [
    "/home/slins/Programming/Web/App/platform/deployment/generated/users-auth-postgres/data.pv.yaml",
    "/home/slins/Programming/Web/App/platform/deployment/generated/users-auth-postgres/data.pvc.yaml"
  ],
  "mounts": [
    {
      "name": "data",
      "enabled": true,
      "type": "persistent",
      "readOnly": false,
      "hostPath": "/home/slins/Programming/Web/App/platform/storage/databases/postgresql/users-auth/data",
      "kindPath": "/mnt/platform/storage/databases/postgresql/users-auth/data",
      "containerPath": "/var/lib/postgresql/data",
      "volumeName": "users-auth-postgres-data",
      "source": {
        "persistentVolumeClaim": {
          "claimName": "users-auth-postgres-data-pvc"
        }
      },
      "capacity": "10Gi"
    }
  ]
}

Example for users-auth-dal:
{
  "serviceName": "users-auth-dal",
  "serviceType": "apps",
  "generatedDir": "/home/slins/Programming/Web/App/platform/deployment/generated/users-auth-dal",
  "documents": [
    "/home/slins/Programming/Web/App/platform/deployment/generated/users-auth-dal/users-auth-dal.migrate-job.yaml"
  ],
  "mounts": [
    {
      "name": "development",
      "enabled": true,
      "type": "source",
      "readOnly": false,
      "hostPath": "/home/slins/Programming/Web/App/backend/dal/users/users-auth/src",
      "kindPath": "/mnt/backend/dal/users/users-auth/src",
      "containerPath": "/app",
      "volumeName": "users-auth-dal-development",
      "source": {
        "hostPath": {
          "path": "/mnt/backend/dal/users/users-auth/src"
        }
      },
      "capacity": "10Gi"
    },
    {
      "name": "contracts",
      "enabled": true,
      "type": "source",
      "readOnly": true,
      "hostPath": "/home/slins/Programming/Web/App/contracts",
      "kindPath": "/mnt/contracts",
      "containerPath": "/contracts",
      "volumeName": "users-auth-dal-contracts",
      "source": {
        "hostPath": {
          "path": "/mnt/contracts"
        }
      },
      "capacity": "10Gi"
    }
  ]
}

The post renderer lives under orchestrator/post-renderer and is installed as a Helm plugin through install.sh.
The Helm post renderer is needed to modify YAML manifests dynamically before they are installed.
It is used both to create entirely new manifests from the templates at platform/deployment/templates, such as migration jobs, persistent volumes and persistent volume claims, and to modify existing manifests, such as programmatically mounting the generated persistent volume claims into containers.
All of this is declared in each service's descriptor file (deploy.yaml).
The post renderer works by receiving the manifests as they are defined through stdin from Helm, then returning the final modified manifests through stdout.
Its flow is:
- Loads context.json as a dictionary using the file path set through the POST_RENDERER_CONTEXT environment variable.
  - If no context is found, it writes Helm's original stdin back to stdout without changing it.
- Loads generated YAML documents from the service's generated folder, such as PVs, PVCs, and migration Jobs.
- Reads the YAML manifests rendered by Helm from stdin as a list of dictionaries.
- Injects content into the manifests that need it.
  - For usual Kubernetes resources (Pod, Deployment, StatefulSet, DaemonSet, ReplicaSet, Job, CronJob), it finds the pod spec and injects volumes and volume mounts into the containers.
  - For Jaeger, it uses a special path as the Jaeger operator expects volumes and volume mounts directly under the Jaeger custom resource spec.
- If a mount already exists at the same container path, it is replaced instead of duplicated.
- Each manipulated document is added to the output document list.
- The output document list is deduplicated, based on an id using Kubernetes kind, metadata name and namespace.
- The final documents are written to stdout, where they are captured by Helm and subsequently installed.
This is why the post renderer currently has logging disabled: anything other than the manifests written to stdout corrupts the output Helm expects. I tried writing the logs to stderr, but that didn't work either, so for now the log calls in orchestrator/post-renderer/post_renderer.py are commented out.
- Finish ThemePicker component, make it draggable, implement save, cancel and reset state behavior.
- Finish CommentsOverlay component.
- Make all font size variables relative to the medium font size variable in globals.css, so that scaling the medium font size automatically scales all variants (independent of the operating system and browser's settings).
- Add a font size selector component for the entire application.
- Create a sidebar comments component for full-screen posts (similar to CommentsOverlay but opens/closes from the side).
- Finish wiring the pages together from the existing component library.
- Improve cross-browser consistency (mostly developed and tested on Firefox).
- Review which pages/components truly need "use client".
- Connect to the backend APIs.
- Introduce application-level client-side and server-side state management through Zustand and TanStack Query.
- Improve responsiveness across multiple devices, screens, and dimensions.
- Add unit and end-to-end tests.
- Implement AI image tagging on upload.
- Implement recommendation/interest tracking system based on user engagement with posts that contain specific tags.
- Implement color coding/search/grouping algorithm for images matching a given color palette, optionally mixing color similarity weights with recommendation weights.
- Migrate internal APIs to gRPC where applicable, such as DAL services. Use REST APIs only for public-facing services such as BFF services.
- Give more descriptive validation error messages in data-access layer API responses such as "Password too short", "Username too long", etc. from Zod validation, while avoiding leaking this internal information from the BFF to the public users in certain scenarios.
- Broaden the REST API coverage for the current and future BFF and DAL services.
- Turn contracts into a standalone NPM package registered in a central registry API.
- Introduce a domain aggregator service for aggregating, for example, database calls between BFF and DAL if they are often coupled such as users, posts, images, etc. This is so that we can retrieve multiple related data with a single query. GraphQL might be a good fit for this.
- Integrate Redis, Kafka, Cassandra, Elasticsearch, Logstash, and Sentry.
- Add CI/CD tools such as GitHub Actions, Jenkins, Argo, Terraform, etc.
- Add pre-commit hooks for formatting, linting, validation, etc.
- Add stress, unit, and end-to-end tests using Jest/Vitest, Cypress/Playwright, and tools such as Locust/k6/wrk, etc.
- Create jobs for periodically dumping databases (e.g. pg_dump users_auth from users-auth-postgres).
- Create a central registry API for push/pulling data shared by services such as custom Docker images, NPM packages, AI model versioning, etc.
- Integrate the central registry API with the CI/CD setup to automatically build files, making most file mounts unnecessary (the containers would already contain them built-in).
- Create database/API tests, such as checking whether services are responding, returning correctly formatted responses, and using correct status codes.
- Improve install script (install.sh) coverage to also install all project dependencies, including binaries from Helm, Kubernetes, etc.
- Find a way to make logs functional within the post renderer (orchestrator/post-renderer/post_renderer.py), as by definition it conflicts with the logger (orchestrator/core/log.py) since the post renderer directly manipulates data in stdin, stdout, and stderr.
- Improve error handling, validation, and coverage within the orchestrator.
- Improve code documentation through commenting and annotating functions, using specs such as TSDoc.
- Enable strict mTLS within the cluster by default, while leaving certain services as permissive (e.g. Kafka, RabbitMQ, etc.).
- Enable application-level TLS for all services or only services with permissive mTLS.
- Add/update credentials for sensitive services such as Prometheus, Jaeger, etc.
- Add service accounts and role-based access control for services and authentication.
- Set resource limits for services such as CPU usage, memory usage, storage usage, etc.
- Introduce rate limiting for each relevant service at each service layer.
- Create an ideal failover-safe setup with replication, high availability and global synchronization, including multi-region synchronization, eventual consistency, primary local and global writes, etc.
- Introduce database orchestration services for high availability and replication such as PgBouncer, ZooKeeper, Patroni, CloudNativePG, etc.
- Introduce distributed monitoring orchestrators, such as Thanos for Prometheus.
- Replace Grafana's built-in SQLite database with a dedicated PostgreSQL service.