This repository was archived by the owner on Dec 16, 2021. It is now read-only.
4 changes: 2 additions & 2 deletions tutorial/README.md
@@ -1,7 +1,7 @@
Intro to MR4C Examples:
===========

-###EXAMPLES
+### EXAMPLES

1. hello world - set up a basic working algorithm
2. input output - learn how to read and write datasets
@@ -13,7 +13,7 @@ Intro to MR4C Examples:
8. mbtiles - an example using the MR4C Geospatial library to export a mbtiles file from a Skysat scene.
9. random access - when you need to deal with large datasets, you will need to read in smaller chunks. This example shows you the various ways that MR4C can help you do this.

-###Run in Hadoop
+### Run in Hadoop

* HDFS commands - use the format:

20 changes: 10 additions & 10 deletions tutorial/example1_HelloWorld/README.md
@@ -1,44 +1,44 @@
-#MR4C "Hello World" Application
+# MR4C "Hello World" Application

Hello World application for MR4C


-##Description
+## Description

This is the classic "HelloWorld" application that should verify that your build is working correctly.

-##Installation
+## Installation
Navigate to the HelloWorld folder and make the project using the following commands:

cd ~/MR4C/examples/example1_HelloWorld
make

-##Run Hello World
+## Run Hello World

To run our hello world application:

./HelloWorld.sh

This should print out "Hello World" in a block labeled "**NATIVE_OUTPUT**" to separate it from the mr4c messages.

-##Concepts
+## Concepts
To fully understand how a basic MR4C application works, review the following files in your installation folder.

-###/src/helloworld.cpp
+### /src/helloworld.cpp
defines the Example class that executes an algorithm (cout<<"hello world"<<endl;)
when it is registered with MR4C_REGISTER_ALGORITHM(name, algoPtr);

-###makefile
+### makefile
instructions to build the HelloWorld shared object from helloworld.cpp

-###/helloworld.json
+### /helloworld.json
mr4c configuration file that tells mr4c to call a "HelloWorld" object from the "NATIVEC" library called "HelloWorld"

-###Execute
+### Execute
This bash script executes mr4c using the configuration file helloworld.json.
Additionally, it adds our /lib/helloworld.so to the LD_LIBRARY_PATH variable, making it available to mr4c.

-##Conclusion
+## Conclusion
This should have illustrated some of the basic ideas behind implementing an algorithm in MR4C.
Assuming that this example worked successfully, please try the ChangeImage example next.
If you have any trouble getting this working or have questions or comments, we would love to hear from you.
8 changes: 4 additions & 4 deletions tutorial/example2_IO/README.md
@@ -1,18 +1,18 @@
-#MR4C Change Image Example
+# MR4C Change Image Example


-##Description
+## Description

This example application illustrates simple input/output using MR4C.

Contact mr4c@googlegroups.com with any questions or comments.

-##Build
+## Build
Navigate to the example2_IO folder and run the following commands:

make

-##Running bbChangeImage
+## Running bbChangeImage

To run bbChangeImage execute the following command from the cloned folder:

4 changes: 2 additions & 2 deletions tutorial/example3_Dimensions/README.md
@@ -1,8 +1,8 @@
-#Dimensions
+# Dimensions

This example builds on the IO example, but instead of specifying files for input and output, this algorithm reads/writes all files to/from a folder/URI using dimensions. This makes the algorithm very flexible and lets it scale to very large datasets with a lot of elements.

-###Configuration:
+### Configuration:
We use the mappers in dimension.json to configure our dataset dimensions:

"mapper" : {
2 changes: 1 addition & 1 deletion tutorial/example4_ExternalLib/README.md
@@ -1,4 +1,4 @@
-##Introduction
+## Introduction
The ExternalLib example is our first example utilizing a third party library in addition to our own algorithm.
Additionally, we are using GDAL to read a geotiff and report some of the important metadata.
This method can be used to convert a MR4C::Dataset into a GDALDataset
10 changes: 5 additions & 5 deletions tutorial/example5_json/README.md
@@ -1,19 +1,19 @@
-#MR4C json Example
+# MR4C json Example

-##Description
+## Description

This example application illustrates how to read json formatted data.
We will use a SkySat metadata file as an example.

Contact mr4c@googlegroups.com with any questions or comments.

-##Build
+## Build

Navigate to the example5_json folder and run the following command:

make

-##Running json.sh
+## Running json.sh

To run the json example, execute the following command:

@@ -23,7 +23,7 @@ This will input the file from the input folder, execute the algorithm,
and output some of the important records stored in the json file to stdout
in the section between the **NATIVE_OUTPUT** header/footer.

-##Concepts
+## Concepts
If you open the SkySat metadata file from the ./input folder,
you will quickly see that there is a lot of information that
can be a little difficult to interpret with a simple read.
4 changes: 2 additions & 2 deletions tutorial/example6_ImageAlgo/README.md
@@ -1,4 +1,4 @@
-##Introduction
+## Introduction
This example builds on everything we learned in the previous examples:

1. Read in several files from a directory
@@ -9,7 +9,7 @@ This example builds on everything we learned in the previous examples:
6. Use MR4C dimensions to write output files from memory


-##References
+## References

- Please refer to the [GDAL documentation](http://www.gdal.org/cpl__vsi_8h.html) for more info on how to use virtual files.
- Please download and build [GDAL from source](http://trac.osgeo.org/gdal/wiki/BuildHints).
6 changes: 3 additions & 3 deletions tutorial/example7_yarn/README.md
@@ -1,5 +1,5 @@
-Using YARN Resource Allocation with MR4C
-===========
+# Using YARN Resource Allocation with MR4C

The previous examples illustrate the fundamental interfaces that MR4C uses to connect algorithms to datasets, but you may notice that we have only worked on the local machine using the mr4c executable. This example will show you the true power of MR4C by executing the algorithms in Hadoop using the mr4c_hadoop executable.

Additionally, we introduce a simple map reduce workflow including two steps that work in conjunction to allow you to split the work across many processes and reduce the results from each of those processes into a single answer. While we do that, we will also introduce the MR4C/YARN features using dynamic resource allocation. We use two algorithms configured with map.json and reduce.json. The resources are allocated in mapReduce.sh using the following parameters:
@@ -25,7 +25,7 @@ The cluster will be queried for its resource limits:

The Hcluster names can be configured in the $MR4C_HOME/bin/java_yarn/conf/site.json file. Alternatively, you can assign a site.json using the $MR4C_SITE variable.

-##MapReduce
+## MapReduce
The two algorithms in this example illustrate basic mapper and reducer steps:
* Mapper: collect individual histograms for the pixel values from a series of input images in a map step.
* Reducer: combine all the histograms into a single histogram for all input images.
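
The two steps above can be sketched in plain C++. This is only an illustration of the histogram map/reduce idea, not the MR4C API: the `Histogram` alias and the `mapHistogram` and `reduceHistograms` functions are hypothetical names, and the sketch assumes 8-bit pixel values already loaded into memory.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one bin per possible 8-bit pixel value.
using Histogram = std::array<std::uint64_t, 256>;

// Map step (illustrative, not the MR4C API): count the pixel values of one image.
Histogram mapHistogram(const std::vector<std::uint8_t>& pixels) {
    Histogram h{};  // all bins start at zero
    for (std::uint8_t p : pixels) {
        ++h[p];
    }
    return h;
}

// Reduce step (illustrative): add the per-image histograms bin by bin.
Histogram reduceHistograms(const std::vector<Histogram>& parts) {
    Histogram total{};
    for (const Histogram& h : parts) {
        for (std::size_t i = 0; i < total.size(); ++i) {
            total[i] += h[i];
        }
    }
    return total;
}
```

Because the reduce step is a plain bin-wise sum, the per-image histograms can be computed in any order and on any number of processes before being combined.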
10 changes: 5 additions & 5 deletions tutorial/example8_mbtiles/README.md
@@ -1,9 +1,9 @@
-##Algorithm name: mbtiles
+## Algorithm name: mbtiles

-###Description:
+### Description:
MR4C now supports output in MBtiles format for upload to mapbox.com or any other compatible application. The following example illustrates how to output MBtiles using the MR4C Geospatial library.

-###Parameters:
+### Parameters:
* name: required, becomes "name" in mbtiles metadata
* description: optional, becomes description in mbtiles metadata
* version: optional, integer, default=1, becomes version in mbtiles metadata
@@ -13,7 +13,7 @@ MR4C now supports output in MBtiles format for upload to mapbox.com or any other
* maxZoom: optional, if not provided algo will compute max zoom so tile size is the largest tile that does not need to be down-sampled


-###Input Data:
+### Input Data:

* Geotiff
* 8 or 16 bit rendered values
@@ -25,7 +25,7 @@ To reproject:
gdalwarp -wm 2048 -r cubic -t_srs EPSG:3857 input.tif inputWeb.tif


-###Run Script
+### Run Script

The run script is configured to get the input dataset from the local folder and put it to hdfs.
After the output.mbtiles file is created, the script will get the file to the local folder.
10 changes: 5 additions & 5 deletions tutorial/example9_RandomAccess/README.md
@@ -1,8 +1,8 @@
-Using Random Access within a DataFile
-===========
+# Using Random Access within a DataFile

In the previous examples we have assumed that the dataset is made up of small files that can be read into memory one at a time. However, sometimes we need to read part of a file or write to a specific part of a file. This example illustrates how to use the MR4C RandomAccessFile class to deal with larger and/or more complex I/O.

-###Sequential Read/Write
+### Sequential Read/Write
If all you need to do is read/write through a file using a small buffer to avoid loading the whole thing at once, you can use a MR4C::DataFile object and its read, write, and skip functions. These methods will be more performant because they do not require a temporary file and can access data directly through HDFS.

While this example is focused on random access, you can easily do the same thing directly with a DataFile object, only with the limitation that you have to move sequentially forward through the file:
@@ -19,14 +19,14 @@ While this example is focused on random access, you can easily do the same thing
file->skip(buffersize);
}

-###Random Access
+### Random Access
If you need to use a variable size buffer and read and write to any part of a file then you will need to instantiate a RandomAccessFile object and use the similar member functions.

In the example, we read an input dataset and iterate through the dataset keys and instantiate a RandomAccessFile object for each file. We then read some random blocks into a 100 byte buffer and print them to stdout. Finally, we create some output files and write some of the content that we extracted from the input files as well as some modified content to arbitrary locations within a 1000 byte file.

Input and output datasets are stored in HDFS, please refer to the RandomAccess.json and RandomAccess.sh files to understand the staging process.

-###Execution Example
+### Execution Example
To execute:

./RandomAccess.sh