This repository provides a turnkey, cross-cloud deployment of multi-node CockroachDB clusters using Terraform and Ansible. It includes provider-specific Terraform modules for AWS, Azure, and GCP (with optional multi-region replicator), plus Ansible roles to configure CockroachDB nodes, HAProxy (or cloud load balancers), Kafka, and an application tier. Out-of-the-box monitoring is set up via Prometheus and Grafana. Sample applications and programming tools are also installed on the application node. Advanced features like ChangeFeeds, replicator/molt for PostgreSQL migrations, and logical/physical replication are also scaffolded for zero-downtime migrations and multi-datacenter deployments.
- Security Notes
- Directory Structure
- Using the Terraform HCL
This currently supports Azure, AWS, and GCP. For AWS and Azure, the cloud provider load balancer can be used instead of haproxy. For GCP, the cloud provider load balancer is not yet supported. The goal is to keep the Ansible changes minimal across cloud providers. The subdirectories are:
- ansible contains the ansible scripts
- terraform-aws contains the aws terraform code
- multiregionAWS contains the AWS multi-region Terraform code; a smaller configuration that reuses the terraform-aws folder for multi-region deployments
- multiregionGCP contains the GCP multi-region Terraform code; a smaller configuration that reuses the terraform-gcp folder for multi-region deployments
- terraform-azure contains the azure terraform code
- terraform-gcp contains the gcp terraform code
- multiregionAzure will come soon...
Terraform HCL creates a multi-node CockroachDB cluster. The number of nodes should be a multiple of 3, and nodes are distributed evenly across three availability zones. Optionally, you can include:
- haproxy VM - the proxy will be configured to connect to the cluster
- app VM - application node that includes software for a multi-region demo
- load balancer - for AWS and Azure, the cloud provider load balancer can be used instead of haproxy
- firewalld has been disabled on all nodes (cluster, haproxy, and app).
- A security group is created and assigned with ports 22, 8080, 3000, and 26257 opened to a single IP address. The address is configurable as an input variable (my-ip-address).
To use the HCL, you will need to define an SSH key that will be used for all created VMs to provide SSH access. This is simple in both Azure and AWS but a bit more involved in GCP, where it is much easier to create the SSH key with the gcloud API than to do it in the UI. The main.tf file for each deployment has a key name as well as a full path to the SSH key file.
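For example, a key can be generated locally and, for GCP, pushed with the gcloud CLI. This is a minimal sketch; the key name crdb-key and the adminuser login are placeholders, so adjust them to match what main.tf expects:
# generate a key pair for the VMs (key name and comment are illustrative)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/crdb-key -C adminuser
# GCP only: add the public key to project-wide metadata with the gcloud CLI
# (note: this replaces any existing project-wide ssh-keys metadata)
gcloud compute project-info add-metadata \
  --metadata ssh-keys="adminuser:$(cat ~/.ssh/crdb-key.pub)"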
# See the appendix below to install Terraform, the cloud CLIs, and to log in to the cloud platforms
git clone https://github.com/jphaugla/crdb-terraform-ansible.git
cd crdb-terraform-ansible/
This has changed with the new enterprise license requirements. The cluster can now be used without adding a license for an initial time period.
Add the enterprise license and the cluster organization to the following files in the region subdirectory under provisioners/temp. For example, if the region is centralus, add the contents of your license key to a file at provisioners/temp/centralus/enterprise_license:
enterprise_license
cluster_organization
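A minimal sketch of populating those two files, following the centralus example above; the license key and organization values are placeholders:
mkdir -p provisioners/temp/centralus
# paste your own license key and organization name (placeholders shown)
echo "crl-0-xxxxxxxxxxxx" > provisioners/temp/centralus/enterprise_license
echo "Acme Company"       > provisioners/temp/centralus/cluster_organization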
- Run the Terraform/Ansible deployment from the appropriate subdirectory for the selected cloud provider and for single or multi-region
- Validate the parameters in main.tf in the chosen directory
- Enable or disable deployment of haproxy with the include_ha_proxy flag in main.tf (set it to "no" to skip haproxy)
- Enable or disable deployment of replicator with the start_replicator flag in main.tf
- Optionally, set install_enterprise_keys in main.tf
- Decide whether to deploy Kafka by setting include_kafka to yes or no in main.tf
- Look up the IP address of your client workstation and put that IP address in my_ip_address
- This allows your client workstation to access the nodes through their public IP address
- This access is needed for the ansible scripts to perform necessary operations
- NOTE: Inside the application node, this banking Java application will be deployed and configured
- If there is no need for the application to run, kill its PID; it is easy to find by grepping for java and then killing the application job
Make sure to be in the subdirectory chosen above before running these commands:
terraform init
terraform plan
terraform apply
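As an alternative to editing main.tf, the same flags can be overridden on the command line. This is a hedged sketch: the variable names are the ones listed above, but their exact definitions and accepted values come from main.tf, and ifconfig.me is just one way to look up your workstation's public IP:
MY_IP=$(curl -s ifconfig.me)          # public IP of your client workstation
terraform apply \
  -var="my_ip_address=${MY_IP}" \
  -var="include_ha_proxy=yes" \
  -var="include_kafka=no" \
  -var="start_replicator=no"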
Generic Grafana Prometheus plugin and Grafana dashboard:
- configure prometheus data source for grafana
- import grafana dashboards
Detailed steps are documented in the following Grafana links for CockroachDB and replicator.
Prometheus and Grafana are configured and started by the ansible scripts. Both are running as services on the haproxy node
- Look up the haproxy node address in the region subdirectory under provisioners/temp
- Start the Grafana interface using the grafana ui link
- The grafana ui is the haproxy node's external IP at port 3000
- Change the admin login password (original login is the installation default of admin/admin)
- configure prometheus data source for grafana
- really this is:
- adding the prometheus data source as documented in the link above
- entering http://localhost:9090 for the connection URL
- scrolling to the bottom of the UI window
- clicking Save & test
- import grafana dashboards
- really this is:
- From the same Grafana interface (the grafana ui above), click on Dashboards
- CockroachDB and replicator Grafana dashboards are available within the grafana dashboards folder
- These could be stale; refresh the folder using getGrafanaDashboards.sh
- Import all the dashboards. One of them is for replicator and the rest are CockroachDB dashboards
- NOTE: replicator.json is only needed if doing replicator
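If you prefer to script these Grafana steps rather than click through the UI, the Grafana HTTP API can do the same work. A minimal sketch, assuming the default admin/admin credentials have not yet been changed, jq is installed, and the dashboard JSON files are in the current directory:
GRAFANA=http://<haproxy-external-ip>:3000
# add the Prometheus data source (same settings as the UI steps above)
curl -s -u admin:admin -H "Content-Type: application/json" \
  -X POST "${GRAFANA}/api/datasources" \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy","isDefault":true}'
# import each dashboard JSON file
for f in *.json; do
  jq '{dashboard: ., overwrite: true}' "$f" | \
    curl -s -u admin:admin -H "Content-Type: application/json" \
      -X POST "${GRAFANA}/api/dashboards/db" -d @-
done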
terraform destroy
This is no longer a recommended pattern now that LDR and PCR have been released. Using this same two-region deployment, LDR can be set up with these scripts.
- Run terraform apply in each region directory (reference the steps noted above)
- Add the license and cluster organization to the files under provisioners/temp/
git clone https://github.com/jphaugla/crdb-terraform-ansible.git
cd crdb-terraform-ansible/terraform-azure/region1
terraform init
terraform apply
cd crdb-terraform-ansible/terraform-azure/region2
terraform init
terraform apply
This will deploy this Digital-Banking-CockroachDB github project onto the application node with connectivity to CockroachDB.
Additionally, replicator is deployed and running on the application node, also with connectivity to haproxy and CockroachDB in the same region.
cd ~/crdb-terraform-ansible/provisioners/temp/{region_name}
ssh -i path_to_ssh_file adminuser@`cat app_external_ip.txt`
ps -ef |grep replicator
# if it is not running, start it
cd /opt
./start.sh
- NOTE: compiling and starting the application has been automated in Terraform, so these steps are only for debugging/understanding
- The Java application can be started manually on the application node for each region. Set up the environment file:
- the ip addresses can be found in a subdirectory under temp for each deployed region
- Make sure to set the COCKROACH_HOST environment variable to the private IP address for the haproxy node
- If using kafka, KAFKA_HOST should be set to the internal IP address for kafka
- set the REGION to the correct region
- Do this in each region
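A hedged sketch of the environment settings described above; the authoritative variable list is whatever scripts/setEnv.sh in Digital-Banking-CockroachDB actually defines, and the IP addresses are placeholders taken from the temp subdirectory for the region:
# scripts/setEnv.sh (illustrative values only)
export COCKROACH_HOST=10.0.1.10   # private IP of the haproxy node
export KAFKA_HOST=10.0.1.20       # internal IP of the kafka node (only if kafka is deployed)
export REGION=centralus           # the region this application node belongs to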
# NOTE: this should already be running. If not running, check the log files in /mnt/datat1/bank-app
# steps below will rerun
cd ~/crdb-terraform-ansible/provisioners/temp/{region_name}
ssh -i path_to_ssh_file adminuser@`cat app_external_ip.txt`
cd Digital-Banking-CockroachDB
# edit scripts/setEnv.sh as documented above
source scripts/setEnv.sh
mvn clean package
java -jar target/cockroach-0.0.1-SNAPSHOT.jar
The necessary manual step is to deploy a CockroachDB changefeed across the regions to create active/active replication between the two otherwise independent regions.
- Port 30004 is open on both regions to allow the changefeed to communicate with the application server on the other region
- Start the changefeed on each side, with the changefeed pointing to the other side's application node external IP address
- The changefeed script is written on each of the CockroachDB nodes by the Terraform script. Log in to any of the CockroachDB nodes using the IP address in temp for each deployed region.
- As previously mentioned, the changefeed script must be modified to point to the application external IP address for the other region
- this is the step that reaches across to the other region as everything else is within region boundaries
- IMPORTANT NOTE: Must have enterprise license for the changefeed to be enabled
- Two different changefeeds are provided in the home directory for the adminuser on any of the cockroachDB nodes: Banking application or cockroach kv workload
- In either case, edit the corresponding SQL script using the external IP address of the other region's application node
- Banking application
- edit create-changefeed.sql, replacing the IP address before port number 30004 with the external IP address of the other region's application node
- create-changefeed.sh creates a changefeed for the banking application
- CockroachDB kv workload
- edit create-changefeed-kv.sql, replacing the IP address before port number 30004 with the external IP address of the other region's application node
- create-changefeed-kv.sh creates a changefeed for the CockroachDB kv workload
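For orientation only, a changefeed pointed at a webhook listener on port 30004 generally looks like the sketch below. The real statement (sink type, watched tables, and options) is whatever the provided create-changefeed.sql / create-changefeed-kv.sql contain; bank.accounts and the IP address here are purely illustrative:
# run from a CockroachDB node; 203.0.113.10 stands in for the other region's app node external IP
cockroach sql --certs-dir=certs -e "
CREATE CHANGEFEED FOR TABLE bank.accounts
  INTO 'webhook-https://203.0.113.10:30004/changefeed?insecure_tls_skip_verify=true'
  WITH updated, resolved = '10s';
"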
cd ~/crdb-terraform-ansible/provisioners/temp/{region_name}
ssh -i path_to_ssh_file adminuser@`cat crdb_external_ip{any ip_address}`
# edit create-changefeed.sh putting the app node external IP address for the other region
cockroach sql --certs-dir=certs
SET CLUSTER SETTING cluster.organization = 'Acme Company';
SET CLUSTER SETTING enterprise.license = 'xxxxxxxxxxxx';
exit
# two different changefeed scripts are provided
vi create-changefeed-kv.sql
# or
vi create-changefeed.sql
./create-changefeed-kv.sh
# or
./create-changefeed.sh
Verify rows are flowing across from either region by running additional test application steps, or run the sample kv workload from the adminuser home directory on the application node using the provided kv-workload.sh script.
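One quick check, assuming kv-workload.sh runs the standard cockroach kv workload (which writes to the kv.kv table): compare row counts on each region's cluster and watch them converge.
# run against each region's cluster (for example via its haproxy node)
cockroach sql --certs-dir=certs -e "SELECT count(*) FROM kv.kv;"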
az vm image list -p "Canonical"
az vm image list -p "Microsoft"
- Install Azure CLI using homebrew
- Install Azure CLI manually (RHEL 8):
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
sudo dnf install -y https://packages.microsoft.com/config/rhel/8/packages-microsoft-prod.rpm
sudo dnf install azure-cli
az upgrade
az version
az login   # directs you to a browser login with a code; once authenticated, your credentials will be displayed in the terminal
- Microsoft Terraform Docs: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-terraform
- Sizes for VM machines (not very helpful): https://learn.microsoft.com/en-us/azure/virtual-machines/sizes
- User data from a static shell script: https://github.com/guillermo-musumeci/terraform-azure-vm-bootstrapping-2/blob/master/linux-vm-main.tf
Install AWS CLI using homebrew
- AWS Terraform Docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs
- Amazon EC2 instance types: https://aws.amazon.com/ec2/instance-types/
aws ec2 describe-images \
--owners amazon \
--filters \
"Name=name,Values=al2023-ami-2023*" \
"Name=architecture,Values=x86_64" \
"Name=virtualization-type,Values=hvm" \
--query "sort_by(Images, &CreationDate)[-1].ImageId" \
--output text
aws ec2 describe-images \
--owners amazon \
--filters \
"Name=name,Values=al2023-ami-2023*" \
"Name=architecture,Values=arm64" \
"Name=virtualization-type,Values=hvm" \
--query "sort_by(Images, &CreationDate)[-1].ImageId" \
--output text
aws ec2 describe-images \
--owners 099720109477 \
--filters \
"Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-2025*" \
--query "sort_by(Images, &CreationDate)[-1].ImageId" \
--output text
Install gcloud CLI using homebrew install or manual install
- GCP Terraform Docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs
- GCP Compute Engine Types: https://cloud.google.com/compute/docs/machine-resource
gcloud compute images list \
--project=ubuntu-os-cloud \
--filter="family:( ubuntu-2204-lts )" \
--format="table[box](name, family, creationTimestamp)"configure prometheus data source for grafana import grafana dashboards
- terraform.tfvars and vars.tf have important parameters.
- Each node type has its own tf file
- Network components including security groups with port permissions are in network.tf
- Either of the region subdirectories can be used to kick off the deployment. Both regions are defined to enable replicator deployment
- These files connect terraform and ansible for azure
- template file at inventory.tpl
- provisioning.tf
- inventory.tf
- These files connect terraform and ansible for aws
- template file at inventory.tpl
- provisioning.tf
- inventory.tf
- These files connect terraform and ansible for gcp
- template file at inventory.tpl
- provisioning.tf
- inventory.tf
- Ansible code is in the provisioners/roles subdirectory
- playbook.yml
- Each node group has ansible code to export the node's private and public ip addresses to a region subdirectory under ansible/temp
- haproxy-node doesn't have any additional installation
- app-node creates an application node running replicator and a Digital Banking java application
- replicator creates replicator and molt deployment
- kafka-node
- crdb-node
- For using replicator, a changefeed script is created using a j2 template
- prometheus
- Under each of these node groups
- A vars/main.yml file has variable flags to enable/disable processing
- A tasks/main.yml calls the required tasks to do the actual processing
- A templates directory has j2 files allowing environment variable and other substitution
- playbook.yml
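Terraform normally drives these roles through provisioning.tf and the rendered inventory, but for debugging a role change the playbook can be run by hand. A minimal sketch, assuming the rendered inventory file sits next to playbook.yml under provisioners (adjust the path and name to whatever inventory.tf actually writes):
cd provisioners
ansible-playbook -i inventory playbook.yml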
- MOLT replicator is no longer used for two-region/DC deployments of CockroachDB but is part of zero-downtime migration with MOLT
- Two-region/DC deployments of CockroachDB use Logical Data Replication or Physical Cluster Replication (see below)
- This repository enables, but does not fully automate, migration and replication from PostgreSQL to CockroachDB
- A cloud storage bucket is created on each of the three cloud providers
- Scripts are created on the application node with the correct connection strings for an AWS deployment
- Eventually, these scripts will be customized for each cloud provider
To run molt-replicator
- Turn on the processing for molt-replicator with the terraform variable setup_migration in main.tf
- Loading the Postgres employee sample database may be turned off, as it is time consuming; to load it, enable install_employee sample in vars/main.tf
- Use the scripts created on the application node in /home/ec2-user/
NOTE: for each of these scripts, I have linked the ansible template (j2) or file that is used to create the shell script. Hopefully this helps the reader's understanding.
- Login to application node
- Dump the DDL for the already created employees database in postgres using pg_dump_employees.sh
./pg_dump_employees.sh
- Convert the resulting employees database DDL from PostgreSQL to CockroachDB using molt_convert.sh
./molt_convert.sh
- Edit the resulting file employees_converted.sql to use a new database, employees, instead of creating a new schema employees
- delete the line: CREATE SCHEMA employees;
- remove every occurrence of ALTER SCHEMA employees OWNER TO postgres;
- Create the employees schema in CockroachDB using create_employee_schema.sh
./create_employee_schema.sh
- Push the data from PostgreSQL through S3 to CockroachDB using molt_s3.sh
./molt_s3.sh
- Start replication of the data from PostgreSQL through S3 to CockroachDB using molt_s3_replicate.sh
./molt_s3_replicate.sh
- Insert a row of data in psql and see that it flows to CockroachDB
psql -U postgres -h '127.0.0.1' -d employees
insert into employee values (9000, '1989-12-13', 'Taylor', 'Swift', 'F', '2022-06-26');
exit
./sql.sh
use employees;
select * from employees where id=9000;
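The two manual edits to employees_converted.sql described earlier (dropping CREATE SCHEMA employees; and every ALTER SCHEMA employees OWNER TO postgres;) can also be scripted. A hedged convenience that keeps a .bak copy of the original file:
sed -i.bak \
  -e '/CREATE SCHEMA employees;/d' \
  -e '/ALTER SCHEMA employees OWNER TO postgres;/d' \
  employees_converted.sql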
- cockroachDB create changefeed
- Migration Overview
- replicator/replicator grafana dashboards
- MOLT schema conversion
- MOLT docker example
- MOLT Fetch
- Migrate from PostgreSQL
- Logical Data Replication blog
- Physical Cluster Replication Documentation
- Logical Data Replication Documentation
dbworkload is installed as part of the ansible setup. A script is also configured with the correct IP addresses for running dbworkload with a standard banking demo, as described at the dbworkload project home.
cd /opt/dbworkload
./dbworkload.sh
NOTE: on teardown, you may see failures deleting some Azure components. Re-running the destroy command is an option, but sometimes a force delete is needed on the OS disks of some nodes.
terraform destroy
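If an OS disk refuses to delete, it can be removed explicitly with the Azure CLI once its VM is gone. A hedged sketch; the resource group and disk names are placeholders taken from the az disk list output:
# list leftover managed disks, then delete the stuck one
az disk list --query "[].{name:name, rg:resourceGroup}" -o table
az disk delete --resource-group <resource-group> --name <os-disk-name> --yes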