
gandola/pig-extensions


pig-extensions

Utilities for Apache Pig.

  • MD5: UDF that computes the MD5 hash of a string and returns it as a 32-character hexadecimal string, keeping leading zeros.
  • CSVExcelStorageWithPath: CSVExcelStorage extension that prepends the full path of the source file to each tuple.
  • ExtendedMultiStorage: MultiStorage extension that can remove the key field from the output files.
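The "leading zeros" detail matters because a common shortcut, `new BigInteger(1, digest).toString(16)`, silently drops leading zero digits; the MD5 of "a", for example, starts with a 0. Below is a minimal plain-Java sketch of a conversion that always yields 32 characters. It assumes the input string is encoded as UTF-8 and that lowercase hex is wanted; the class `Md5Hex` is hypothetical and this is not the project's actual UDF code (a Pig UDF would additionally extend Pig's `EvalFunc`).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: a sketch of MD5-to-hex conversion, not the project's UDF.
public class Md5Hex {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Returns the MD5 digest of the input (UTF-8 encoding is an assumption)
    // as a lowercase hex string. Every byte always produces exactly two hex
    // digits, so leading zeros are preserved and the result is 32 characters.
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            char[] out = new char[digest.length * 2];
            for (int i = 0; i < digest.length; i++) {
                int b = digest[i] & 0xff;          // treat the byte as unsigned
                out[2 * i] = HEX[b >>> 4];         // high nibble
                out[2 * i + 1] = HEX[b & 0x0f];    // low nibble
            }
            return new String(out);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The leading 0 is kept here; the BigInteger shortcut would print 31 characters.
        System.out.println(md5Hex("a")); // 0cc175b9c0f1b6a831c399e269772661
    }
}
```

Building the string nibble by nibble avoids any dependence on number formatting, which is where the leading zeros usually get lost.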

## Examples

REGISTER pig-extensions-1.0.jar;

DEFINE CSVLoader pg.hadoop.pig.piggybank.CSVExcelStorageWithPath();
-- The first field of each tuple is the full path of the file the row was read from.
B = LOAD '$INPUT' USING CSVLoader AS (file_path:chararray, ...);

REGISTER pig-extensions-1.0.jar;

DEFINE MD5 pg.hadoop.pig.MD5();
...
B = FOREACH A GENERATE MD5(my_string) AS md5_str;
...

REGISTER pig-extensions-1.0.jar;
...
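B = FOREACH A GENERATE uid, createdAt, url;
-- Split the output on field 0 (uid), with no compression and ',' as the field delimiter;
-- the last argument ('false') presumably turns off writing the key field itself.
STORE B INTO '$OUTPUT' USING pg.hadoop.pig.piggybank.ExtendedMultiStorage('$OUTPUT', '0', 'none', ',', 'false');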

/**
The output files contain only createdAt and url (the uid key field is dropped), for example:
2016-01-10,http://mywebsite.com
2016-01-11,http://mywebsite1.com
...
*/
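Piggybank's MultiStorage writes each tuple into a directory named after the value of the chosen split field; the extension's difference is whether that field also appears in the written line. The plain-Java sketch below models that behavior, assuming the fifth constructor argument ('false') means "do not write the key field". The class `SplitWithoutKey`, its `split` method and the sample uid values are hypothetical, and no Hadoop or Pig APIs are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of MultiStorage-style splitting with optional key removal.
// Not the project's code; names are illustrative only.
public class SplitWithoutKey {

    // Groups records by the field at keyIndex (one group per output directory)
    // and renders each record as a delimited line. When keepKey is false, the
    // key field is removed from the line before it is joined.
    public static Map<String, List<String>> split(List<List<String>> records,
                                                  int keyIndex, String delimiter,
                                                  boolean keepKey) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (List<String> record : records) {
            String key = record.get(keyIndex);
            List<String> fields = new ArrayList<>(record); // copy so the input is untouched
            if (!keepKey) {
                fields.remove(keyIndex);                   // remove(int): drop by position
            }
            byKey.computeIfAbsent(key, k -> new ArrayList<>())
                 .add(String.join(delimiter, fields));
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
            Arrays.asList("u1", "2016-01-10", "http://mywebsite.com"),
            Arrays.asList("u2", "2016-01-11", "http://mywebsite1.com"));
        // {u1=[2016-01-10,http://mywebsite.com], u2=[2016-01-11,http://mywebsite1.com]}
        System.out.println(split(records, 0, ",", false));
    }
}
```

With keepKey set to true, each line would also start with the uid value, which corresponds to plain MultiStorage output.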
