18 lines (13 loc) · 864 Bytes

ClimateSpark

Overreview

ClimateSpark is a Spark-based distributed computing framework to support big climate data management and analytics. It has the following capabilities:

Natively support HDF4 and NetCDF4 datasets stored in HDFS
Spatiotemporal query of multi-dimensional array-based climate data with high efficiency, e.g. high data locality, no redundant data reading
ClimateRDD: a multi-dimensional array-based data model for Spark to organize climate data
Basic climate data analytics

Tutorial

https://docs.google.com/document/d/1JIMLhNzXA_Ay-0P6yxzGfvUhajMLfJFyyduillzpeSI/edit

Extract the metadata: hadoop jar sia-core/target/sia-core-0.1.0.jar properties/sia_merra2_preprocessor.properties
Build index: hadoop jar sia-indexer/target/sia-indexer-0.1.0.jar properties/sia_merra2_indexer.properties