Previously, we set up a Hadoop 2.8.1 cluster on Ubuntu 16.04. This tutorial builds a MapReduce development environment on Windows and remotely runs MR programs against that existing cluster.
- OS: Windows 10 Enterprise
- Version: 1803
- Processor: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
- Memory: 16.0 GB
- System type: 64-bit operating system, 64-bit processor
- JDK [download link] (need not match the cluster's JDK version exactly; the cluster runs jdk-8u131, while this tutorial uses jdk-8u181.)
- Hadoop [Baidu Netdisk] (must match the cluster's Hadoop version, but needs no configuration; just extract it.)
- winutils [download link] (download the files matching your Hadoop version.)
- IDEA [download link] (either Community or Ultimate edition works.)
Installing the JDK
- Run the downloaded JDK installer, choose an install path, and complete the installation;
- Open the advanced system settings and edit the environment variables;
- Add a `JAVA_HOME` variable pointing to the JDK install directory;
- Append `%JAVA_HOME%\bin\` to the `Path` variable;
- Open a command prompt and run `java -version`; if the Java version information is printed, the installation succeeded.
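Beyond `java -version`, the installation can also be confirmed from code. This minimal sketch (not part of the original tutorial) prints the version of the JVM that actually runs it, which should match the JDK that `JAVA_HOME` points to:

```java
public class CheckJava {
    public static void main(String[] args) {
        // java.version reflects the JDK running this class.
        System.out.println("java.version = " + System.getProperty("java.version"));
        // JAVA_HOME is read from the environment variable set above
        // (null if the variable is not set in this shell).
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
    }
}
```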
Installing Hadoop
- Extract the downloaded Hadoop archive;
- Open the advanced system settings and edit the environment variables;
- Add a `HADOOP_HOME` variable pointing to the Hadoop root directory;
- Append `%HADOOP_HOME%\bin\` to the `Path` variable.
Installing winutils
- Extract the downloaded winutils archive;
- Copy the files for your Hadoop version (2.8.1 in this tutorial) into `%HADOOP_HOME%\bin\`.
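To sanity-check this step, a small sketch (assuming `HADOOP_HOME` is set as described above) can compute where `winutils.exe` is expected and report whether it is present:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckWinutils {
    // Resolve the expected location of winutils.exe under a Hadoop root.
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        Path p = winutilsPath(home);
        System.out.println(p + (Files.exists(p) ? " - found" : " - missing"));
    }
}
```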
Installing IDEA
- Run the downloaded IDEA installer, choose an install path, and complete the installation;
- Start IDEA;
- Create a new project: choose a Java project, select the JDK, set the project name and directory, and finish;
- In the Project view, right-click the project and choose `Open Module Settings`;
- Select `Libraries`, click `New Project Library`, and choose `Java`;
- In the dialog, select all folders under `%HADOOP_HOME%\share\hadoop\` and click OK;
- Click `Add`, select the `lib` folder under `%HADOOP_HOME%\share\hadoop\common\`, and click OK;
- Change the Name to `Hadoop` and click OK;
- Right-click the `src` folder, choose `New Package`, enter a package name (e.g. `hemajun.mapred.example`), and click OK;
- Right-click the new package, choose `New Class`, enter a class name (e.g. `WordCount`), and click OK;
- Edit the code as follows:
```java
package hemajun.mapred.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

public final class WordCount {

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each line on commas, periods, and spaces; skip empty tokens.
            String[] words = value.toString().split("[,. ]");
            for (String word : words) {
                if (!"".equals(word)) {
                    context.write(new Text(word), new LongWritable(1));
                }
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is a display name; args[1] and args[2] are the HDFS input/output paths.
        if (args.length != 3) {
            System.err.println("Usage: WordCount <name> <in> <out>");
            System.exit(2);
        }

        System.setProperty("HADOOP_USER_NAME", "root");

        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://192.168.73.130:9000");

        Job job = Job.getInstance(configuration, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setNumReduceTasks(2);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
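One detail worth noting in the mapper: `String.split("[,. ]")` yields empty strings when two delimiters are adjacent, which is why the mapper checks `!"".equals(word)` before emitting. A standalone sketch of that behavior:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // The comma and the following space are adjacent delimiters,
        // producing an empty token between them; trailing empty
        // strings (after the final period) are removed by split().
        String[] words = "Hello, Hadoop.".split("[,. ]");
        System.out.println(Arrays.toString(words));  // [Hello, , Hadoop]
    }
}
```

Without the empty-string check, the job would count the empty token as a word of its own.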
- From the menu bar click `Run` and choose `Edit Configurations...`;
- In the dialog click `Add New Configuration` and choose `Application`;
- Name it "Hadoop", set the main class to the one just defined (e.g. `hemajun.mapred.example.WordCount`), and set the program arguments to "WordCount hdfs://192.168.73.130:9000/WordCount/input hdfs://192.168.73.130:9000/WordCount/output";
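Note that three program arguments are passed even though the job only needs two paths: the first argument is just a name, matching the `args.length != 3` check in `main`, which reads only `args[1]` and `args[2]`. A sketch of that convention (the paths mirror the run configuration above):

```java
public class ArgsDemo {
    // Same shape as the program arguments set in the run configuration.
    static final String[] PROGRAM_ARGS = {
        "WordCount",
        "hdfs://192.168.73.130:9000/WordCount/input",
        "hdfs://192.168.73.130:9000/WordCount/output"
    };

    public static void main(String[] args) {
        // Mirrors WordCount.main(): args[0] is a display name,
        // args[1] the input path, args[2] the output path.
        System.out.println("input  = " + PROGRAM_ARGS[1]);
        System.out.println("output = " + PROGRAM_ARGS[2]);
    }
}
```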
- Start Hadoop on the cluster: `./sbin/start-dfs.sh && ./sbin/start-yarn.sh`;
- Create the input directory: `./bin/hadoop fs -mkdir /WordCount/ && ./bin/hadoop fs -mkdir /WordCount/input/` (see "Basic HDFS Shell Operations" for more HDFS commands);
- Create an input file: `echo "Hello Hadoop" > text.txt`;
- Upload the input file to HDFS: `./bin/hadoop fs -put ./text.txt /WordCount/input/`;
- Back in IDEA, click `Run` in the menu bar and choose `Run 'Hadoop'`; the program compiles and executes, and the run log appears in the Run view;
- After the job finishes, check the logs and results on HDFS.
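`TextOutputFormat` writes each result record as the key, a tab character, then the value, so the output files on HDFS contain lines like `Hello<TAB>1`. A small sketch (not from the tutorial) of parsing one such line, as you would see it when viewing the output files:

```java
public class OutputLineDemo {
    // Parse a "word<TAB>count" line as written by TextOutputFormat.
    static long countOf(String line) {
        String[] parts = line.split("\t");
        return Long.parseLong(parts[1]);
    }

    public static void main(String[] args) {
        // For the input "Hello Hadoop", the job emits one line per word.
        System.out.println(countOf("Hadoop\t1"));  // 1
        System.out.println(countOf("Hello\t1"));   // 1
    }
}
```

With `setNumReduceTasks(2)`, the counts are spread across two output files in `/WordCount/output/`, one per reducer.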