crdb-terraform-ansible

Overview

This repository provides a turnkey, cross-cloud deployment of multi-node CockroachDB clusters using Terraform and Ansible. It includes provider-specific Terraform modules for AWS, Azure, and GCP (with optional multi-region replicator), plus Ansible roles to configure CockroachDB nodes, HAProxy (or cloud load balancers), Kafka, and an application tier. Out-of-the-box monitoring is set up via Prometheus and Grafana. Sample applications and programming tools are also installed on the application node. Advanced features like ChangeFeeds, replicator/molt for PostgreSQL migrations, and logical/physical replication are also scaffolded for zero-downtime migrations and multi-datacenter deployments.

Outline

Directory structure

Currently, this supports Azure, AWS, and GCP. For AWS and Azure, the cloud provider load balancer can be used instead of haproxy; for GCP, the cloud provider load balancer is not yet supported. The goal is to keep the ansible changes minimal across cloud providers. The subdirectories are:

  • ansible contains the ansible scripts
  • terraform-aws contains the aws terraform code
  • multiregionAWS contains the aws multi-region terraform code: a smaller terraform layer that reuses the terraform-aws folder for multi-region
  • multiregionGCP contains the gcp multi-region terraform code: a smaller terraform layer that reuses the terraform-gcp folder for multi-region
  • terraform-azure contains the azure terraform code
  • terraform-gcp contains the gcp terraform code
  • multiregionAzure will come soon...

Terraform HCL to create a multi-node CockroachDB cluster. The number of nodes can be a multiple of 3, and nodes will be evenly distributed between three availability zones. Optionally, you can include:

  • haproxy VM - the proxy will be configured to connect to the cluster
  • app VM - application node that includes software for a multi-region demo
  • load balancer - for AWS and Azure, can use cloud provider load balancer instead of haproxy

Security Notes

  • firewalld has been disabled on all nodes (cluster, haproxy and app).
  • A security group is created and assigned with ports 22, 8080, 3000 and 26257 opened to a single IP address. The address is configurable as an input variable (my_ip_address). A quick connectivity check is sketched below.
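
Once the cluster is up, a quick hedged check from the allowed workstation that the rules behave as expected (assumes nc is installed locally; substitute the public IP of the node that serves each port, e.g. the haproxy node for 3000 and 26257):

for port in 22 8080 3000 26257; do
  nc -zv <node-public-ip> "$port"    # should connect from my_ip_address and fail from anywhere else
done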

Using the Terraform HCL

To use the HCL, you will need to define an SSH key that will be used for all VMs created, to provide SSH access. This is simple in both Azure and AWS but a bit more involved in GCP, where creating the SSH key with the gcloud API is much easier than doing it in the UI. The main.tf file for each deployment has a key name as well as a full path to the SSH key file.
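
A minimal sketch of creating the key pair; the key name (crdb-cluster here) and file path are illustrative and must match what main.tf expects, and the gcloud step assumes OS Login is used for SSH access in GCP:

ssh-keygen -t rsa -b 4096 -f "$HOME/.ssh/crdb-cluster" -N ""
# GCP only: one way to register the public key with the gcloud CLI (assuming OS Login)
gcloud compute os-login ssh-keys add --key-file="$HOME/.ssh/crdb-cluster.pub"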

Run this Terraform Script

# See the appendix below to install Terraform, the Cloud CLIs and logging in to Cloud platforms

git clone https://github.com/jphaugla/crdb-terraform-ansible.git
cd crdb-terraform-ansible/

if you intend to use enterprise features of the database

This has changed with the new enterprise license requirements: the cluster can now be used without a license for an initial time period. To add a license, put the enterprise license and the cluster organization into the following files in the region subdirectory under provisioners/temp. For example, if the region is centralus, add the contents of your license key to provisioners/temp/centralus/enterprise_license.

enterprise_license
cluster_organization
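
A hedged sketch for a hypothetical centralus deployment (the license string and organization below are placeholders):

mkdir -p provisioners/temp/centralus
echo 'crl-0-REPLACE_WITH_LICENSE' > provisioners/temp/centralus/enterprise_license
echo 'Acme Company' > provisioners/temp/centralus/cluster_organization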

Prepare

  • Use the terraform/ansible deployment in the appropriate subdirectory for the selected cloud provider and single- or multi-region topology
  • Validate the parameters in main.tf in the chosen directory (a sketch of setting them from the command line follows this list)
  • Enable or disable deployment of haproxy by setting the include_ha_proxy flag to "yes" or "no" in main.tf
  • Enable or disable deployment of replicator using the start_replicator flag in main.tf
  • Optionally set install_enterprise_keys in main.tf
  • Depending on needs, decide whether to deploy kafka by setting include_kafka to yes or no in main.tf
  • Look up the IP address of your client workstation and put that IP address in my_ip_address
    • This allows your client workstation to access the nodes through their public IP addresses
    • This access is needed for the ansible scripts to perform the necessary operations
  • NOTE: Inside the application node, this banking java application will be deployed and configured
    • If the application does not need to run, kill its pid; it is easy to find by grepping for java, then kill that job
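
A hedged sketch of setting these flags without editing main.tf, assuming they are exposed as standard Terraform input variables (otherwise simply edit the values in main.tf):

# public IP of this workstation, used for my_ip_address
export TF_VAR_my_ip_address=$(curl -s https://checkip.amazonaws.com)
terraform plan \
  -var='include_ha_proxy=yes' \
  -var='include_kafka=no' \
  -var='start_replicator=no'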

Kick off terraform script

Make sure to be in the subdirectory chosen above before running these commands.

terraform init
terraform plan
terraform apply

Add Grafana Dashboards

Background and Links

  • Generic grafana prometheus plugin and grafana dashboard
  • configure prometheus data source for grafana
  • import grafana dashboards

Detailed steps are documented in these grafana links for cockroachDB and replicator.

Specific steps for github

Prometheus and Grafana are configured and started by the ansible scripts. Both run as services on the haproxy node.

  • Look up the haproxy node address in the region subdirectory under provisioners/temp
  • Open the grafana ui
    • The grafana ui is the haproxy external node IP at port 3000
  • Change the admin login password (the original login is the installation default of admin/admin)
  • Configure the prometheus data source for grafana (a scripted alternative is sketched after this list)
    • really this is:
      • adding the prometheus data source as documented in the link above
      • entering http://localhost:9090 for the connection URL
      • scrolling to the bottom of the UI window
      • clicking Save and test
  • Import the grafana dashboards
    • From the same grafana ui, click on Dashboards using the instructions above
    • CockroachDB and replicator grafana dashboards are available within the grafana dashboards folder
      • These could be stale. Refresh this folder using getGrafanaDashboards.sh
      • Import all the dashboards. One of them is for replicator and the rest are cockroachDB dashboards
      • NOTE: replicator.json is only needed if doing replicator
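
As an alternative to the UI steps above, a hedged sketch using Grafana's HTTP API (it assumes the default admin/admin credentials have not yet been changed and that Prometheus listens on localhost:9090 on the haproxy node):

curl -s -u admin:admin -X POST http://<haproxy-external-ip>:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy"}'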

clean up and remove everything that was created

terraform destroy

Deploy to 2 regions with replicator

This is no longer a recommended pattern now that LDR and PCR have been released. The same 2-region deployment can be used to set up LDR with these scripts.

Run Terraform

  • Run terraform apply in each region directory, following the steps noted above
  • Add the license and cluster org to the provisioners/temp/ region subdirectories
git clone https://github.com/jphaugla/crdb-terraform-ansible.git
cd crdb-terraform-ansible/terraform-azure/region1
terraform init
terraform apply
cd ../region2
terraform init
terraform apply

Verify deployment

  • This will deploy the Digital-Banking-CockroachDB github into the application node with connectivity to cockroachDB.
    Additionally, replicator is deployed and running on the application node, also with connectivity to haproxy and cockroachDB in the same region.

Ensure replicator is running on each region

cd ~/crdb-terraform-ansible/provisioners/temp/{region_name}
ssh -i path_to_ssh_file adminuser@`cat app_external_ip.txt`
ps -ef |grep replicator
# if it is not running, start it
cd /opt
./start.sh

Verify application running in each region

  • NOTE: compiling and starting the application has been automated in terraform, so these steps are only for debugging/understanding
  • The java application can be started manually on the application node in each region. Set up the environment file (a sketch of setEnv.sh follows the commands below):
    • The IP addresses can be found in a subdirectory under temp for each deployed region
    • Make sure to set the COCKROACH_HOST environment variable to the private IP address of the haproxy node
    • If using kafka, KAFKA_HOST should be set to the internal IP address of the kafka node
    • Set REGION to the correct region
  • Do this in each region
# NOTE: this should already be running.  If not running check log files in /mnt/datat1/bank-app
# steps below will rerun
cd ~/crdb-terraform-ansible/provisioners/temp/{region_name}
ssh -i path_to_ssh_file adminuser@`cat app_external_ip.txt`
cd Digital-Banking-CockroachDB
# edit scripts/setEnv.sh as documented above
source scripts/setEnv.sh
mvn clean package
java -jar target/cockroach-0.0.1-SNAPSHOT.jar
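
A hedged sketch of what scripts/setEnv.sh ends up containing; the real file ships with the application and only the values need to match this deployment (the IPs and region below are placeholders, looked up in provisioners/temp/{region_name}):

export COCKROACH_HOST=10.0.1.10    # private IP of the haproxy node
export KAFKA_HOST=10.0.1.20        # internal IP of the kafka node, only needed if kafka was deployed
export REGION=centralus            # region for this application instance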

Deploy changefeeds

  • The necessary manual step is to deploy a CockroachDB changefeed across the regions to make replicator active/active between the two otherwise independent regions
    • Port 30004 is open on both regions to allow the changefeed to communicate with the application server in the other region
  • Start the changefeed on each side, with the changefeed pointing to the other side's application node external IP address
  • The changefeed script is written to each of the cockroach database nodes by the terraform script. Log in to any of the cockroach nodes using the IP address in temp for each deployed region.
    • As previously mentioned, the changefeed script must be modified to point to the application external IP address of the other region
    • This is the step that reaches across to the other region, as everything else stays within region boundaries
  • IMPORTANT NOTE: An enterprise license is required for the changefeed to be enabled
  • Two different changefeeds are provided in the home directory of the adminuser on any of the cockroachDB nodes: the banking application or the cockroach kv workload
    • In either case, edit the corresponding sql script to use the external IP address of the other region's application node (a sed sketch follows the commands below)
    • Banking application
      • edit create-changefeed.sql, replacing the IP address before port number 30004 with the external IP address of the other region's application node
      • create-changefeed.sh creates a changefeed for the banking application
    • Cockroach kv workload
      • edit create-changefeed-kv.sql, replacing the IP address before port number 30004 with the external IP address of the other region's application node
      • create-changefeed-kv.sh creates a changefeed for the cockroachdb kv workload
cd ~/crdb-terraform-ansible/provisioners/temp/{region_name}
ssh -i path_to_ssh_file adminuser@`cat crdb_external_ip{any ip_address}`
# edit create-changefeed.sql (or create-changefeed-kv.sql), putting in the app node external IP address for the other region
cockroach sql --certs-dir=certs
SET CLUSTER SETTING cluster.organization = 'Acme Company';
SET CLUSTER SETTING enterprise.license = 'xxxxxxxxxxxx';
exit
# two different changefeed scripts are provided
vi create-changefeed-kv.sql
# or 
vi create-changefeed.sql
./create-changefeed-kv.sh
 # or 
 ./create-changefeed.sh
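
A hedged sketch of the edit described above, assuming the sink URL in create-changefeed.sql embeds the IP address immediately before :30004 (OTHER_APP_IP is a placeholder for the other region's application node external IP):

OTHER_APP_IP=203.0.113.10
sed -i "s#//[0-9.]*:30004#//${OTHER_APP_IP}:30004#" create-changefeed.sql
./create-changefeed.sh    # or edit and run the -kv variants for the kv workload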

Verify rows are flowing across from either region by running additional test application steps, or run the sample kv workload from the adminuser home on the application node using the provided kv-workload.sh script.
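
For example, a hedged check that the kv workload rows are replicating, assuming the workload writes to the standard kv.kv table: run this on a CockroachDB node in each region and compare the counts.

cockroach sql --certs-dir=certs -e "SELECT count(*) FROM kv.kv;"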

Technical Documentation

Azure Documentation

Finding images

az vm image list -p "Canonical"
az vm image list -p "Microsoft"

Install Azure CLI

az upgrade
az version
az login    # directs you to a browser login with a code -- once authenticated, your credentials will be displayed in the terminal

Azure Links:

  • Microsoft Terraform Docs: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-terraform
  • Sizes for VM machines (not very helpful): https://learn.microsoft.com/en-us/azure/virtual-machines/sizes
  • User Data that is a static SH: https://github.com/guillermo-musumeci/terraform-azure-vm-bootstrapping-2/blob/master/linux-vm-main.tf

AWS Documentation

Install AWS CLI

Install AWS CLI using homebrew

AWS Links

  • AWS Terraform Docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs
  • Amazon EC2 instance types: https://aws.amazon.com/ec2/instance-types/

Finding AWS images
# latest Amazon Linux 2023 AMI for x86_64
aws ec2 describe-images \
  --owners amazon \
  --filters \
    "Name=name,Values=al2023-ami-2023*" \
    "Name=architecture,Values=x86_64" \
    "Name=virtualization-type,Values=hvm" \
  --query "sort_by(Images, &CreationDate)[-1].ImageId" \
  --output text
# latest Amazon Linux 2023 AMI for arm64
aws ec2 describe-images \
  --owners amazon \
  --filters \
    "Name=name,Values=al2023-ami-2023*" \
    "Name=architecture,Values=arm64" \
    "Name=virtualization-type,Values=hvm" \
  --query "sort_by(Images, &CreationDate)[-1].ImageId" \
  --output text
# latest Ubuntu 22.04 (Jammy) AMI published by Canonical (owner 099720109477)
aws ec2 describe-images \
  --owners 099720109477 \
  --filters \
    "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-2025*" \
  --query "sort_by(Images, &CreationDate)[-1].ImageId" \
  --output text

GCP Documentation

Install GCP CLI

  • homebrew install
  • manual install

GCP Links

  • GCP Terraform Docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs
  • GCP Compute Engine Types: https://cloud.google.com/compute/docs/machine-resource

Finding GCP images

gcloud compute images list \
--project=ubuntu-os-cloud \
--filter="family:( ubuntu-2204-lts )" \
--format="table[box](name, family, creationTimestamp)"

CockroachDB Links

General Links

  • configure prometheus data source for grafana
  • import grafana dashboards

Terraform/Ansible Documentation

Molt-replicator

  • Molt replicator is no longer used for 2 region/DC deployments of CockroachDB, but it is part of zero-downtime migration with molt
  • 2 region/DC deployments of CockroachDB use Logical Data Replication or Physical Cluster Replication (see below)
  • This github enables, but does not fully automate, migration and replication from PostgreSQL to CockroachDB
    • A cloud storage bucket is created on each of the three cloud providers
    • Scripts are created on the application node with the correct connection strings for an AWS deployment
      • Eventually, these scripts will be customized for each cloud provider

Running molt-replicator

To run molt-replicator

  • Turn on the processing for molt-replicator with the terraform variable setup_migration in main.tf
  • Loading the postgres employee sample database may be turned off, as it is time consuming. Enable the install_employee sample flag in vars/main.tf
  • Use the scripts created on the application node in /home/ec2-user/. NOTE: for each of these scripts, the ansible template (j2) or file used to create the shell script is linked, which hopefully helps the reader's understanding.
    • Log in to the application node
    • Dump the DDL for the already created employees database in postgres using pg_dump_employees.sh
./pg_dump_employees.sh
  • Convert the resulting employees database DDL from PostgreSQL to CockroachDB using molt_convert.sh
./molt_convert.sh
  • Edit the resulting file employees_converted.sql to use a new database, employees, instead of creating a new schema named employees (a sed one-liner is sketched below)
    • delete the line: CREATE SCHEMA employees;
    • remove every occurrence of ALTER SCHEMA employees OWNER TO postgres;
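A hedged one-liner for this edit, assuming the converted DDL contains exactly those statements (review employees_converted.sql afterwards):
sed -i '/CREATE SCHEMA employees;/d; /ALTER SCHEMA employees OWNER TO postgres;/d' employees_converted.sql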
  • Create the employees schema in CockroachDB using create_employee_schema.sh
./create_employee_schema.sh
  • Push the data from postgreSQL through S3 to CockroachDB using molt_s3.sh
./molt_s3.sh
  • Start replication of the data from postgreSQL through S3 to CockroachDB using molt_s3_replicate.sh
./molt_s3_replicate.sh
  • Insert a row of data in psql and see that it flows to cockroachdb
psql -U postgres -h '127.0.0.1' -d employees
insert into employee values (9000, '1989-12-13', 'Taylor', 'Swift', 'F', '2022-06-26');
exit
./sql.sh
use employees;
select * from employee where id=9000;

Molt replicator links

Two Datacenter Solutions

Two Datacenter Links

dbworkload

dbworkload is installed as part of the ansible setup. A script is also configured with the correct IP addresses for running dbworkload with a standard banking demo, as described in the dbworkload project home.

using dbworkload

cd /opt/dbworkload
./dbworkload.sh

To tear it all down

NOTE: on teardown, you may see failures deleting some Azure components. Re-running the destroy command is an option, but sometimes a force delete is needed on the OS disk drives of some nodes.

terraform destroy
