From a9eb4f2f1d6a5ab1e9ee4805594bb6cbd4d4572a Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:02:27 -0600 Subject: [PATCH 01/12] Update README.md --- tutorial/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/README.md b/tutorial/README.md index 86d56bf..84eeaf4 100644 --- a/tutorial/README.md +++ b/tutorial/README.md @@ -1,7 +1,7 @@ Intro to MR4C Examples: =========== -###EXAMPLES +### EXAMPLES 1. hello world - set up a basic working algorithm 2. input output - learn how to read and write datasets @@ -13,7 +13,7 @@ Intro to MR4C Examples: 8. mbtiles - an example using the MR4C Geospatial library to export a mbtiles file from a Skysat scene. 9. random access - when you need to deal with large datasets, you will need to read in smaller chunks. This example shows you the various ways that MR4C can help you do this. -###Run in Hadoop +### Run in Hadoop * HDFS commands - use the format: From ed4e19cbc017bcdb01322d102b0440a9fd4816a1 Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:03:06 -0600 Subject: [PATCH 02/12] Update README.md --- tutorial/example1_HelloWorld/README.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/tutorial/example1_HelloWorld/README.md b/tutorial/example1_HelloWorld/README.md index bf8bc9c..5b8e3aa 100644 --- a/tutorial/example1_HelloWorld/README.md +++ b/tutorial/example1_HelloWorld/README.md @@ -1,19 +1,19 @@ -#MR4C "Hello World" Application +# MR4C "Hello World" Application Hello World application for MR4C -##Description +## Description This is the classic "HelloWorld" application that should verify that your build is working correctly. 
-##Installation +## Installation Navigate to the HelloWorld folder and make the project using the following commands: cd ~/MR4C/examples/example1_HelloWorld make -##Run Hello World +## Run Hello World To run our hello world application: @@ -21,24 +21,24 @@ To run our hello world application: This should print out "Hello World" in a block labeled "**NATIVE_OUTPUT**" to separate it from the mr4c messages. -##Concepts +## Concepts To fully understand how a basic MR4C application works, review the following files in your installation folder. -###/src/helloworld.cpp +### /src/helloworld.cpp defines the Example class that executes an algorithm (cout<<"hello world"<<endl) From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:03:22 -0600 Subject: [PATCH 03/12] Update README.md --- tutorial/example1_HelloWorld/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorial/example1_HelloWorld/README.md b/tutorial/example1_HelloWorld/README.md index 5b8e3aa..80c522a 100644 --- a/tutorial/example1_HelloWorld/README.md +++ b/tutorial/example1_HelloWorld/README.md @@ -31,7 +31,7 @@ when it is registered with MR4C_REGISTER_ALGORITHM(name, algoPtr); ### makefile instructions to build the HelloWorld shared object from helloworld.cpp -###/helloworld.json +### /helloworld.json mr4c configuration file that tells mr4c to call a "HelloWorld" object from the "NATIVEC" library called "HelloWorld" ### Execute From 47c46b6978deed2fe2ea495ac8c5b5bdf95d469d Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:23:21 -0600 Subject: [PATCH 04/12] Update README.md --- tutorial/example2_IO/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tutorial/example2_IO/README.md b/tutorial/example2_IO/README.md index 45d4fde..204237a 100644 --- a/tutorial/example2_IO/README.md +++ b/tutorial/example2_IO/README.md @@ -1,18 +1,18 @@ -#MR4C Change Image Example +# MR4C Change Image Example -##Description +## Description This example application illustrates simple
input/output using MR4C. Contact mr4c@googlegroups.com with any questions or comments. -##Build +## Build Navigate to the example2_IO folder and run the following command: make -##Running bbChangeImage +## Running bbChangeImage To run bbChangeImage, execute the following command from the cloned folder: From 4a4a73e95a0c899d87fa115c01c345c6d3d52a24 Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:23:46 -0600 Subject: [PATCH 05/12] Update README.md --- tutorial/example3_Dimensions/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/example3_Dimensions/README.md b/tutorial/example3_Dimensions/README.md index 218b2b1..8f9568b 100644 --- a/tutorial/example3_Dimensions/README.md +++ b/tutorial/example3_Dimensions/README.md @@ -1,8 +1,8 @@ -#Dimensions +# Dimensions This example builds on the IO example, but instead of specifying files for input and output, this algorithm reads/writes all files to/from a folder/URI using dimensions. This makes the algorithm very flexible and able to scale to very large datasets with many elements. -###Configuration: +### Configuration: We use the mappers in dimension.json to configure our dataset dimensions: "mapper" : { From 8078061f576b1fdc10a174f1fba1227eec75af9c Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:24:06 -0600 Subject: [PATCH 06/12] Update README.md --- tutorial/example4_ExternalLib/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorial/example4_ExternalLib/README.md b/tutorial/example4_ExternalLib/README.md index e6aca15..07d12ad 100644 --- a/tutorial/example4_ExternalLib/README.md +++ b/tutorial/example4_ExternalLib/README.md @@ -1,4 +1,4 @@ -##Introduction +## Introduction The ExternalLib example is our first example utilizing a third-party library in addition to our own algorithm. Additionally, we are using GDAL to read a GeoTIFF and report some of the important metadata.
This method can be used to convert an MR4C::Dataset into a GDALDataset From a16ce0af2c4427dc352149209b76efe545108bbe Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:24:25 -0600 Subject: [PATCH 07/12] Update README.md --- tutorial/example5_json/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tutorial/example5_json/README.md b/tutorial/example5_json/README.md index 0c77020..c4e54df 100644 --- a/tutorial/example5_json/README.md +++ b/tutorial/example5_json/README.md @@ -1,19 +1,19 @@ -#MR4C json Example +# MR4C json Example -##Description +## Description This example application illustrates how to read JSON-formatted data. We will use a SkySat metadata file as an example. Contact mr4c@googlegroups.com with any questions or comments. -##Build +## Build Navigate to the example5_json folder and run the following command: make -##Running json.sh +## Running json.sh To run the json example, execute the following command: @@ -23,7 +23,7 @@ This will read the file from the input folder, execute the algorithm, and output some of the important records stored in the json file to stdout in the section between the **NATIVE_OUTPUT** header/footer. -##Concepts +## Concepts If you open the SkySat metadata file from the ./input folder, you will quickly see that there is a lot of information that can be a little difficult to interpret with a simple read.
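An editorial aside on the JSON-reading patch above: MR4C's example code handles the actual parsing, so purely to illustrate the idea of pulling one record out of flat JSON metadata, here is a minimal standard-C++ sketch. The key names are hypothetical, and nested SkySat metadata needs a real JSON parser, not this string scan:

```cpp
#include <string>

// Extract the quoted value that follows "key": in a flat JSON string.
// Minimal illustration only -- use a proper JSON parser for real metadata.
std::string json_string_value(const std::string& json, const std::string& key) {
    std::string needle = "\"" + key + "\"";
    std::size_t k = json.find(needle);
    if (k == std::string::npos) return "";
    std::size_t colon = json.find(':', k + needle.size());
    std::size_t open = json.find('"', colon + 1);
    std::size_t close = json.find('"', open + 1);
    if (colon == std::string::npos || open == std::string::npos ||
        close == std::string::npos) return "";
    return json.substr(open + 1, close - open - 1);
}
```

For example, `json_string_value(meta, "satellite")` would return the quoted value if the (hypothetical) key is present, and an empty string otherwise.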
From c8f7f8870933037b565117c0e39adc4ba6dff1e9 Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:24:45 -0600 Subject: [PATCH 08/12] Update README.md --- tutorial/example6_ImageAlgo/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/example6_ImageAlgo/README.md b/tutorial/example6_ImageAlgo/README.md index 0cd5591..eefcc30 100644 --- a/tutorial/example6_ImageAlgo/README.md +++ b/tutorial/example6_ImageAlgo/README.md @@ -1,4 +1,4 @@ -##Introduction +## Introduction This example builds on everything we learned in the previous examples: 1. Read in several files from a directory @@ -9,7 +9,7 @@ This example builds on everything we learned in the previous examples: 6. Use MR4C dimensions to write output files from memory -##References +## References - Please refer to the [GDAL documentation](http://www.gdal.org/cpl__vsi_8h.html) for more info on how to use virtual files. - Please download and build [GDAL from source](http://trac.osgeo.org/gdal/wiki/BuildHints). From 79d2a5bfb61d369f7d8939b28400ccfa0389b148 Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:25:09 -0600 Subject: [PATCH 09/12] Update README.md --- tutorial/example7_yarn/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/example7_yarn/README.md b/tutorial/example7_yarn/README.md index 4f7fd51..8fd2453 100644 --- a/tutorial/example7_yarn/README.md +++ b/tutorial/example7_yarn/README.md @@ -1,5 +1,5 @@ -Using YARN Resource Allocation with MR4C -=========== +# Using YARN Resource Allocation with MR4C + The previous examples illustrate the fundamental interfaces that MR4C uses to connect algorithms to datasets, but you may notice that we have only worked on the local machine using the mr4c executable. This example will show you the true power of MR4C by executing the algorithms in Hadoop using the mr4c_hadoop executable. 
Additionally, we introduce a simple MapReduce workflow with two steps that work in conjunction: split the work across many processes, then reduce the results from each of those processes into a single answer. While we do that, we will also introduce the MR4C/YARN features for dynamic resource allocation. We use two algorithms configured with map.json and reduce.json. The resources are allocated in mapReduce.sh using the following parameters: From 02a2d8c82c851f406a78c6b553af6a122575c9b8 Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:25:26 -0600 Subject: [PATCH 10/12] Update README.md --- tutorial/example7_yarn/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorial/example7_yarn/README.md b/tutorial/example7_yarn/README.md index 8fd2453..ff346a5 100644 --- a/tutorial/example7_yarn/README.md +++ b/tutorial/example7_yarn/README.md @@ -25,7 +25,7 @@ The cluster will be queried for its resource limits: The cluster names can be configured in the $MR4C_HOME/bin/java_yarn/conf/site.json file. Alternatively, you can assign a site.json using the $MR4C_SITE variable. -##MapReduce +## MapReduce The two algorithms in this example illustrate basic mapper and reducer steps: * Mapper: collect individual histograms for the pixel values from a series of input images in a map step. * Reducer: combine all the histograms into a single histogram for all input images.
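To make the mapper/reducer split described in the YARN patches concrete, here is a generic sketch in plain C++ (the function names are illustrative, not the MR4C algorithm API): the map step histograms one image's 8-bit pixel values, and the reduce step combines per-image histograms by summing each bin.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// "Map" step: histogram the 8-bit pixel values of one image.
std::array<std::uint64_t, 256> histogram(const std::vector<std::uint8_t>& pixels) {
    std::array<std::uint64_t, 256> h{};
    for (std::uint8_t p : pixels) ++h[p];
    return h;
}

// "Reduce" step: combine per-image histograms by summing bin by bin.
std::array<std::uint64_t, 256> combine(
        const std::vector<std::array<std::uint64_t, 256>>& parts) {
    std::array<std::uint64_t, 256> total{};
    for (const auto& h : parts)
        for (int b = 0; b < 256; ++b) total[b] += h[b];
    return total;
}
```

Because the combine step is just bin-wise addition, it is associative and commutative, which is what lets the reducer merge partial histograms from the map tasks in any order.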
From d026e4325c74ac3bfc246b71111c18753e57bc2f Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:26:09 -0600 Subject: [PATCH 11/12] Update README.md --- tutorial/example8_mbtiles/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tutorial/example8_mbtiles/README.md b/tutorial/example8_mbtiles/README.md index 29a6be7..78d1c7c 100644 --- a/tutorial/example8_mbtiles/README.md +++ b/tutorial/example8_mbtiles/README.md @@ -1,9 +1,9 @@ -##Algorithm name: mbtiles +## Algorithm name: mbtiles -###Description: +### Description: MR4C now supports output in MBtiles format for upload to mapbox.com or any other compatible application. The following example illustrates how to output MBtiles using the MR4C Geospatial library. -###Parameters: +### Parameters: * name: required, becomes "name" in mbtiles metadata * description: optional, becomes description in mbtiles metadata * version: optional, integer, default=1, becomes version in mbtiles metadata @@ -13,7 +13,7 @@ MR4C now supports output in MBtiles format for upload to mapbox.com or any other * maxZoom: optional, if not provided, the algorithm will compute the max zoom so that the tile size is the largest that does not need to be down-sampled -###Input Data: +### Input Data: * GeoTIFF * 8- or 16-bit rendered values To reproject: gdalwarp -wm 2048 -r cubic -t_srs EPSG:3857 input.tif inputWeb.tif -###Run Script +### Run Script The run script is configured to get the input dataset from the local folder and put it into HDFS. After the output.mbtiles file is created, the script will copy the file back to the local folder.
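The maxZoom default above ("the largest tile that does not need to be down-sampled") can be made concrete. One common heuristic for Web Mercator tiling, offered here as an assumption rather than a description of MR4C's actual computation, picks the deepest zoom level whose per-pixel ground resolution is no finer than the source image's:

```cpp
#include <cmath>

// Ground resolution (m/px) of a 256-px Web Mercator tile at the equator, zoom 0.
constexpr double kZoom0ResMeters = 156543.03392804097;

// Hypothetical helper: the deepest zoom whose tile resolution is still at
// least as coarse as the source resolution, so tiles never have to be
// produced from a source that would need up-sampling.
int maxZoomFor(double sourceResMetersPerPx) {
    return static_cast<int>(
        std::floor(std::log2(kZoom0ResMeters / sourceResMetersPerPx)));
}
```

At zoom 0 the whole world fits in one 256-px tile; each additional zoom level halves the per-pixel resolution, hence the log2.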
From 85a6ad121fe8f083d310fad12d6c326a90c9a59e Mon Sep 17 00:00:00 2001 From: Ruslan Dautkhanov Date: Fri, 25 Oct 2019 10:26:34 -0600 Subject: [PATCH 12/12] Update README.md --- tutorial/example9_RandomAccess/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tutorial/example9_RandomAccess/README.md b/tutorial/example9_RandomAccess/README.md index 623a429..7b87e22 100644 --- a/tutorial/example9_RandomAccess/README.md +++ b/tutorial/example9_RandomAccess/README.md @@ -1,8 +1,8 @@ -Using Random Access within a DataFile -=========== +# Using Random Access within a DataFile + In the previous examples we have assumed that the dataset is made up of small files that can be read into memory one at a time. However, sometimes we need to read part of a file or write to a specific part of a file. This example illustrates how to use the MR4C RandomAccessFile class to deal with larger and/or more complex I/O. -###Sequential Read/Write +### Sequential Read/Write If all you need to do is read/write through a file using a small buffer to avoid loading the whole thing at once, you can use an MR4C::DataFile object and use the read, write, and skip functions. These methods will be more performant because they do not require a temporary file and can access data directly through HDFS. While this example is focused on random access, you can easily do the same thing directly with a DataFile object, only with the limitation that you have to move sequentially forward through the file: file->skip(buffersize); } -###Random Access +### Random Access If you need to use a variable-size buffer and read and write to any part of a file, then you will need to instantiate a RandomAccessFile object and use its similar member functions. In the example, we read an input dataset, iterate through the dataset keys, and instantiate a RandomAccessFile object for each file.
We then read some random blocks into a 100-byte buffer and print them to stdout. Finally, we create some output files and write some of the content that we extracted from the input files, as well as some modified content, to arbitrary locations within a 1000-byte file. Input and output datasets are stored in HDFS; please refer to the RandomAccess.json and RandomAccess.sh files to understand the staging process. -###Execution Example +### Execution Example To execute: ./RandomAccess.sh
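The sequential fixed-size-buffer pass that patch 12's README describes for DataFile is generic enough to sketch with a standard istream in place of the MR4C API. This is an illustration of the pattern, not MR4C code:

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <vector>

// Walk a stream front to back with a fixed-size buffer, never holding the
// whole payload in memory -- the same shape as the DataFile read/skip loop.
std::size_t count_bytes_in_chunks(std::istream& in, std::size_t buffersize) {
    std::vector<char> buffer(buffersize);
    std::size_t total = 0;
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        // <-- per-chunk processing would go here
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}
```

With an istringstream over 1000 bytes and a 64-byte buffer, the loop makes 15 full reads plus one partial read and returns 1000.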