fiskolini/clickhouse-datawarehouse-sample

ClickHouse Installation Examples

This repository contains two examples of ClickHouse installations, each with its own directory and configuration files:

1. Simple Installation of ClickHouse Server with Grafana

The clickhouse-server directory contains a simple installation of ClickHouse Server together with Grafana for monitoring. It is designed for a single ClickHouse instance. The containers are defined in the docker-compose.yml file, and the relevant configuration files are in the .docker directory.

Usage

To set up this installation, follow these steps:

  1. Navigate to the clickhouse-server directory.
  2. Review the configuration files in the .docker directory to customize the installation to your requirements.
  3. Start ClickHouse Server and Grafana with the provided Docker Compose file (for example, docker compose up -d).

Add sample data

The ClickHouse documentation describes several sample datasets that can be loaded to try out ClickHouse's capabilities. The steps below use the OnTime flight dataset.

How to set a sample dataset:

  1. Make sure the ClickHouse instance(s) are up and running.
  2. Download a single month of sample data by running wget --no-check-certificate --continue https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2022_1.zip
    1. The full dataset is huge (around 41GB). If you still want all of it, write a simple script that downloads the archives for every year from 1987 to 2022, 12 parts per year (https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_{1987..2022}_{1..12}.zip).
  3. Once the files are available locally, run the following command from the directory that contains them to ingest the data: ls -1 *.zip | xargs -I{} -P $(nproc) bash -c "echo {}; unzip -cq {} '*.csv' | sed 's/\.00//g' | clickhouse-client --input_format_csv_empty_as_default 1 --query='INSERT INTO ontime FORMAT CSVWithNames'" The command assumes that the ontime table already exists; create it first if it does not.
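
The "simple script" mentioned in the download step could look roughly like the sketch below. It assumes the URL pattern quoted above is still valid; the function names (ontime_urls, download_ontime), the --download switch, and the choice of four parallel downloads are illustrative and not part of this repository. Any month that is missing on the server will simply fail for that one file.

```shell
#!/usr/bin/env bash
# Sketch: list (and optionally download) every monthly OnTime archive.
set -euo pipefail

# URL prefix taken from the download step above.
BASE_URL="https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present"

# Print one archive URL per line for an inclusive year range.
ontime_urls() {
  local first_year=$1 last_year=$2 year month
  for ((year = first_year; year <= last_year; year++)); do
    for month in {1..12}; do
      printf '%s_%d_%d.zip\n' "$BASE_URL" "$year" "$month"
    done
  done
}

# Download the archives, four at a time (an arbitrary choice);
# --continue lets wget resume partially downloaded files.
download_ontime() {
  ontime_urls "$1" "$2" | xargs -n 1 -P 4 wget --no-check-certificate --continue
}

# Touch the network only when explicitly asked to.
if [[ "${1:-}" == "--download" ]]; then
  download_ontime "${2:-1987}" "${3:-2022}"
fi
```

Saved under a hypothetical name such as download-ontime.sh, it would be run as ./download-ontime.sh --download 2020 2022 to fetch three years, or with --download alone for the full 1987 to 2022 range. The ingestion command from step 3 can then be run unchanged in the same directory.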

2. High Availability Installation of ClickHouse

In the clickhouse-ha-server directory, you will find a more complex setup of ClickHouse with high availability. This configuration includes:

  • Two ClickHouse instances (clickhouse-01 and clickhouse-02).
  • Three dedicated ClickHouse Keepers.
  • A chproxy load balancer.
  • One shard with replication across clickhouse-01 and clickhouse-02.

The configuration files for this installation can be found in the .docker directory within the clickhouse-ha-server directory.

Usage

To set up this high availability ClickHouse installation, follow these steps:

  1. Navigate to the clickhouse-ha-server directory.
  2. Review the configuration files in the .docker directory to customize the installation to your requirements.
  3. Start the high availability ClickHouse setup with the provided Docker Compose file (for example, docker compose up -d).
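
After the containers start, a quick way to confirm that each ClickHouse instance is reachable is to call its HTTP /ping endpoint, which replies with Ok. when the server is healthy. The sketch below assumes the HTTP interface of each instance is published on the host; the helper names (ping_ok, check_endpoint) and the example ports are hypothetical, so check docker-compose.yml for the actual port mappings.

```shell
#!/usr/bin/env bash
# Sketch: report whether each given ClickHouse HTTP endpoint answers /ping.
set -euo pipefail

# Succeed only when the body is the reply a healthy server gives to /ping.
ping_ok() {
  [[ "$1" == "Ok." ]]
}

# Query <url>/ping and print UP or DOWN for that endpoint.
check_endpoint() {
  local url=$1 body
  # A failed request leaves the body empty instead of aborting the script.
  body=$(curl --silent --max-time 5 "$url/ping" || true)
  if ping_ok "$body"; then
    echo "UP   $url"
  else
    echo "DOWN $url"
  fi
}

# Check every endpoint passed on the command line.
for url in "$@"; do
  check_endpoint "$url"
done
```

For example, if clickhouse-01 and clickhouse-02 were published on host ports 8123 and 8124 (an assumption), the script could be called with http://localhost:8123 http://localhost:8124 as arguments and would print one UP or DOWN line per instance.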

3. Additional Notes

  • Both installations use Docker Compose to manage the containers and simplify deployment.
  • The provided configurations are examples and may need to be adjusted for specific production environments.
  • Refer to the official ClickHouse documentation for detailed configuration options and advanced usage scenarios.
  • The presentation slides are available on Google Slides, and a copy is included in the presentation.pdf file.

About

Sample Docker installations for setting up your own ClickHouse data warehouse
