ttl256/1brc

The One Billion Row Challenge

https://github.com/gunnarmorling/1brc

Generating input data

The official repo describes two ways to generate input data: 1) via create_measurements.sh, which runs a Java program, and 2) via src/main/python/create_measurements.py. The Python version of the generator does not create a proper 10K keyset, and its average temperature is not normally distributed.

This repo includes its own generator featuring:

  • Seeding. With a custom seed, the same dataset can be recreated on different machines without copying a ~16 GB file.
  • 10K keyset size.
  • No sampling data. Weather station names are randomly generated while maintaining the same 7th-order curve to produce names.
  • Average temperature is based on latitude and normally distributed, as in the official generation program.
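The last bullet can be illustrated with a minimal sketch. It is not this repo's generator code: the cosine model in meanTempForLatitude, the standard deviation of 10, and the helper names are assumptions chosen for illustration. The only parts taken from the description above are that the mean depends on latitude and that measurements are normally distributed around it.

```go
package main

import (
	"fmt"
	"math"
	"math/rand/v2"
)

// meanTempForLatitude is a hypothetical model (an assumption, not the
// repo's formula): warm at the equator, cold at the poles.
func meanTempForLatitude(latDeg float64) float64 {
	return 30*math.Cos(latDeg*math.Pi/180) - 10
}

// newGenerator returns a deterministic generator for a given seed.
func newGenerator(seed uint64) *rand.Rand {
	return rand.New(rand.NewPCG(seed, seed))
}

// sampleTemp draws one measurement from a normal distribution with the
// given mean and standard deviation, rounded to one decimal place.
func sampleTemp(r *rand.Rand, mean, stddev float64) float64 {
	t := r.NormFloat64()*stddev + mean
	return math.Round(t*10) / 10
}

func main() {
	r := newGenerator(42)
	mean := meanTempForLatitude(52.5) // a station at roughly 52.5 degrees north
	for i := 0; i < 3; i++ {
		fmt.Printf("%.1f\n", sampleTemp(r, mean, 10)) // stddev 10 is an assumption
	}
}
```

Because the generator is seeded, running the sketch twice prints the same three values.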

To build the executable and generate the input data run:

make create_measurements
bin/create_measurements
Usage of ./create_measurements:
  -c int
        Number of rows to generate (default 1000000000)
  -f string
        Path to output file with measurements data (default "./measurements.txt")
  -seed string
        Seed for random number generator. If len(seed) < 32 it's padded with zeroes, if len(seed) > 32 it's truncated to 32.
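The padding and truncation rule for -seed can be sketched as follows. This is an illustration under assumptions, not the repo's code: it assumes padding means zero bytes (not the character '0'), and it assumes the 32-byte seed feeds a ChaCha8 source from math/rand/v2, which takes a [32]byte seed. The names normalizeSeed and newRand are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// normalizeSeed turns an arbitrary string into a 32-byte seed:
// shorter inputs are padded with zero bytes, longer ones are truncated.
func normalizeSeed(s string) [32]byte {
	var seed [32]byte
	copy(seed[:], s) // copy stops at the shorter of the two lengths
	return seed
}

// newRand builds a deterministic generator from a seed string.
// Using ChaCha8 here is an assumption based on the 32-byte seed size.
func newRand(seedStr string) *rand.Rand {
	return rand.New(rand.NewChaCha8(normalizeSeed(seedStr)))
}

func main() {
	a := newRand("my-seed")
	b := newRand("my-seed")
	// Same seed string, same stream of values.
	fmt.Println(a.Uint64() == b.Uint64())
}
```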

Run

make 1brc

Baseline implementation for performance comparison and result verification.

bin/1brc -f $path_to_output_file -baseline -print

Author's current implementation.

bin/1brc -f $path_to_output_file -print
