| 1 | +--- |
| 2 | +title: High availability and resilience in application hosting |
| 3 | +description: This page explores best practices and architectural principles for achieving high availability and resilience in application hosting on Scaleway. |
| 4 | +tags: high-availability resilience fault-tolerance application-hosting cloud-infrastructure scalability reliability |
| 5 | +dates: |
| 6 | + validation: 2025-12-04 |
| 7 | + posted: 2025-12-04 |
| 8 | +--- |
| 9 | + |
| 10 | +To ensure high performance and reliability for your applications, design your infrastructure to be resilient and highly available. Resilience means designing your cloud infrastructure to withstand failures at every level, from individual servers to entire Availability Zones (AZs).
| 11 | + |
| 12 | +This page walks you through key strategies for deploying across multiple AZs, securing your data with reliable Block Storage snapshots, implementing automated failover with DNS management, and testing recovery procedures. |
| 13 | + |
| 14 | +Scaleway products are available in multiple regions and locations worldwide. |
| 15 | + |
| 16 | +- A **region** is a separate geographical area, such as Paris, Amsterdam, or Warsaw. Each region contains one or more Availability Zones.
| 17 | +- **Availability Zones (AZs)** are isolated locations within a region. Each Availability Zone provides its own services and infrastructure.
| 18 | + |
| 19 | +<Message type="tip"> |
| 20 | + Refer to the [product availability](/account/reference-content/products-availability/) documentation page for an up-to-date list of the products available in each region and AZ.
| 21 | +</Message> |
| 22 | + |
| 23 | +## Design for AZ redundancy |
| 24 | + |
| 25 | +Regions and AZs allow you to distribute the Scaleway resources that make up your infrastructure across different physical locations to protect against localized failures.
| 26 | + |
| 27 | +Depending on your application’s availability requirements, data sensitivity, and tolerance for downtime or data loss, you might distribute your resources across regions, AZs or both. |
| 28 | + |
| 29 | +### When to distribute resources across AZs |
| 30 | + |
| 31 | +We recommend using multiple AZs within the same region when: |
| 32 | + |
| 33 | +- You need **high availability** and **fault tolerance** against localized failures such as power outages, network issues or hardware failures. |
| 34 | +- Your application must meet **uptime SLAs** (99.9% or higher, for example). |
| 35 | +- You want **low-latency data replication** and **fast failover**. |
| 36 | +- You’re running **production workloads** that do not require global reach. |
| 37 | + |
| 38 | +**Example use-case:** A customer-facing web application hosted on Scaleway Instances, balanced by a Load Balancer, and connected to a highly available PostgreSQL Managed Database, with all components distributed across two Availability Zones for resilience. |
| 39 | + |
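To put the uptime figure above in concrete terms, the following minimal sketch converts an SLA percentage into a downtime budget and estimates the combined availability of a service replicated across several AZs. The function names are hypothetical, the 99.9% figure is only the example from the list above, and the second formula assumes that AZ failures are fully independent, which is a simplification.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Return the maximum downtime, in minutes, that still meets the SLA over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Estimate availability when any single AZ can serve all traffic.

    Assumption: AZ failures are independent, so the service is down only
    when every AZ is down at the same time.
    """
    return 1 - (1 - per_az_availability) ** az_count


if __name__ == "__main__":
    # A 99.9% target over a 30-day month leaves roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    # Two independent AZs at 99.9% each give roughly 99.9999% combined.
    print(f"{combined_availability(0.999, 2):.6f}")
```

In practice, shared dependencies (DNS, deployment tooling, a single database primary) reduce the real figure, so treat the result as an upper bound rather than a guarantee.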
| 40 | +### When to distribute resources across regions |
| 41 | + |
| 42 | +We recommend using multiple regions when: |
| 43 | + |
| 44 | +- You need protection against **regional outages** such as natural disasters and extended power or network failures. |
| 45 | +- You must comply with **data sovereignty laws**, for example if you need to store EU user data in France.
| 46 | +- Your users are **geographically dispersed**, and you want to reduce latency. |
| 47 | +- You require **backup and recovery** in a physically separate location. |
| 48 | + |
| 49 | +**Example use-case:** A static website hosted on Scaleway Object Storage in `fr-par` that replicates the content to a bucket in `nl-ams` using automated tools, ensuring the site remains deployable even during a regional disruption. |
| 50 | + |
| 51 | +### When to distribute resources across both |
| 52 | + |
| 53 | +We recommend setting up resources to be both multi-AZ and multi-region when: |
| 54 | + |
| 55 | +- Your application is **business-critical** and cannot tolerate extended downtime. |
| 56 | +- You need **both high availability and disaster recovery**. |
| 57 | +- You are subject to **strict compliance or regulatory requirements**. |
| 58 | +- You operate a **global service** requiring low latency and continuous uptime. |
| 59 | + |
| 60 | +**Example use-case:** A mission-critical SaaS platform running in active-passive mode across two regions, with each region operating in a multi-AZ configuration using Scaleway Instances, Load Balancers, and Managed Databases in HA mode, combined with cross-region DNS failover for uninterrupted service.
| 61 | + |
| 62 | +### Use cases |
| 63 | + |
| 64 | +The table below lists example use cases to help you decide how to design for resilience and high availability in your infrastructure.
| 65 | + |
| 66 | +| Use Case | Distribution Strategy | |
| 67 | +|--------|------------------------| |
| 68 | +| Basic production app | Multi-AZ within one region | |
| 69 | +| High availability and auto-failover | Multi-AZ with HA services (Managed Databases, Load Balancer) | |
| 70 | +| Data protection and compliance | Cross-region backups | |
| 71 | +| Global reach and low latency | Multi-region deployment | |
| 72 | +| Maximum resilience | Multi-AZ + Multi-region with failover | |
| 73 | + |
| 74 | + |
| 75 | +## Use multi-AZ services |
| 76 | + |
| 77 | +Some Scaleway products support multi-AZ deployments natively when configured for high availability.
| 78 | + |
| 79 | +The table below lists which products offer cross-AZ resilience and under what conditions: |
| 80 | + |
| 81 | +| Product | Multi-AZ Support | Description | |
| 82 | +|--------|------------------|--------------| |
| 83 | +| Managed Databases (PostgreSQL, MySQL) | Yes (HA Mode) | Deploys primary and replica nodes across two AZs with automatic failover and synchronous replication. | |
| 84 | +| Load Balancer | Yes | Distributes traffic across Instances in multiple AZs. Automatically detects unhealthy instances and reroutes traffic, ensuring resilience during AZ outages. | |
| 85 | +| Object Storage | Yes (regional) | Data is stored across multiple AZs within a region. Inherently resilient and durable, with no user configuration required. | |
| 86 | +| Managed MongoDB® | Yes (3-node Replica Set) | Requires the 3-node replica set option, which includes 1 primary and 2 standby nodes. Provides redundancy and automatic failover: if the primary fails, a standby is promoted. |
| 87 | + |
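The health-check behavior described for the Load Balancer can be illustrated with a small model. This is a sketch of the general technique (round-robin over backends that have not exceeded a failure threshold), not Scaleway's implementation; the `Backend` and `HealthAwarePool` names, the `max_failures` threshold, and the zone labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    zone: str
    consecutive_failures: int = 0


class HealthAwarePool:
    """Round-robin over backends, skipping those that failed too many checks in a row."""

    def __init__(self, backends, max_failures: int = 3):
        self.backends = list(backends)
        self.max_failures = max_failures
        self._next = 0

    def record_check(self, backend: Backend, ok: bool) -> None:
        # A successful check resets the counter; a failed one increments it.
        backend.consecutive_failures = 0 if ok else backend.consecutive_failures + 1

    def healthy(self):
        return [b for b in self.backends if b.consecutive_failures < self.max_failures]

    def pick(self) -> Backend:
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy backends available")
        backend = candidates[self._next % len(candidates)]
        self._next += 1
        return backend


if __name__ == "__main__":
    web_1 = Backend("web-1", "fr-par-1")
    web_2 = Backend("web-2", "fr-par-2")
    pool = HealthAwarePool([web_1, web_2])
    # Simulate an outage in the first zone: three failed checks in a row.
    for _ in range(3):
        pool.record_check(web_1, ok=False)
    print(pool.pick().zone)  # traffic now goes to the remaining zone
```

Because the backends sit in different AZs, losing one zone removes only part of the pool, and requests continue to be served from the other.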
| 88 | +## Implement resilience best practices |
| 89 | + |
| 90 | +A resilient architecture requires ongoing operational practices to maintain availability over time. |
| 91 | + |
| 92 | +We recommend following these best practices to ensure your applications stay available: |
| 93 | + |
| 94 | +- **Automate deployments** - Use infrastructure-as-code tools, such as [Terraform](https://registry.terraform.io/providers/scaleway/scaleway/latest/docs), to consistently deploy across AZs and regions.
| 95 | +- **Back up regularly and test restores** - [Enable automated backups](/managed-databases-for-postgresql-and-mysql/how-to/manage-backups/#how-to-set-up-autobackups) for managed services (Managed Databases, Managed MongoDB®), and regularly test that you can restore from them. |
| 96 | +- **Replicate critical data across regions** - Use scripts or CI/CD pipelines to copy backups, dumps, or static assets using tools like `awscli`, `rclone`, or `s3cmd`. |
| 97 | +- **Monitor health and performance** - [Set up alerts](/cockpit/how-to/send-metrics-logs-to-cockpit/) for key metrics: instance health, database replication lag, storage capacity, and DNS reachability. |
| 98 | +- **Test failover procedures** - Run regular recovery drills: simulate an AZ outage, restore a database from backup, or trigger DNS failover. Document the results to keep track of what works and what doesn't.
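As an illustration of the cross-region replication practice above, the following sketch builds an `rclone sync` command that a script or CI/CD job could run on a schedule. It assumes `rclone` is installed and that two remotes have already been configured for Object Storage in each region; the remote names (`scw-par`, `scw-ams`), the bucket names, and the helper functions are hypothetical.

```python
import shlex
import subprocess


def build_sync_command(src_remote: str, src_bucket: str,
                       dst_remote: str, dst_bucket: str,
                       dry_run: bool = True) -> list:
    """Build an `rclone sync` command that copies one bucket to another."""
    cmd = ["rclone", "sync",
           f"{src_remote}:{src_bucket}",
           f"{dst_remote}:{dst_bucket}"]
    if dry_run:
        # --dry-run reports what would change without modifying the destination.
        cmd.append("--dry-run")
    return cmd


def replicate(cmd: list) -> int:
    """Run the command and return its exit code (0 means success)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical remotes and buckets; replace them with your own configuration.
    command = build_sync_command("scw-par", "my-site", "scw-ams", "my-site-replica")
    print(shlex.join(command))
```

Note that `rclone sync` makes the destination match the source, including deleting destination files that no longer exist in the source, so it is worth keeping `dry_run=True` until the output looks right.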