Commit db9b3dc

Add a post: minikube-hadoop-cluster.md
1 parent 7ebe5b4 commit db9b3dc

1 file changed

+186
-0
lines changed

@@ -0,0 +1,186 @@
1+
---
2+
title: "Deploy a Hadoop Cluster with Minikube"
3+
date: 2025-05-05T17:48:50+08:00
4+
draft: false
5+
summary: A guide to deploying a Hadoop cluster on minikube with Helm
6+
tags: ["Hadoop", "Kubernetes", "Minikube", "Helm"]
7+
categories: ["English"]
8+
---
9+
10+
This post walks through deploying a Hadoop cluster on minikube with Helm, on both macOS and Ubuntu 24.04.
11+
12+
13+
# macOS
14+
15+
## Prerequisites
16+
17+
```shell
18+
$ brew install minikube qemu socket_vmnet helm kubectl
19+
```
20+
21+
## Start Minikube
22+
23+
```shell
24+
$ minikube start --driver qemu --network socket_vmnet \
25+
    --cpus "$(($(sysctl -n hw.ncpu) / 2))" --memory "$(sysctl -n hw.ncpu)g"
26+
```
27+
28+
# Ubuntu 24.04
29+
30+
## Prerequisites
31+
32+
- [Helm](https://github.com/helm/helm/releases/latest)
33+
- [kubectl](https://kubernetes.io/releases/download/)
34+
- [Docker](https://docs.docker.com/engine/install/ubuntu/)
35+
36+
```shell
37+
# https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user
38+
$ sudo groupadd docker
39+
40+
$ sudo usermod -aG docker $USER
41+
42+
$ newgrp docker
43+
```
44+
45+
## Start Minikube
46+
47+
```shell
48+
$ minikube start --driver docker \
49+
--cpus "$(($(nproc) / 2))" --memory "$(nproc)g"
50+
```
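
The `--cpus` and `--memory` arguments above are derived from the core count: half the cores, and one gigabyte of memory per core. A minimal sketch of that arithmetic follows. The helper name `minikube_resources` is hypothetical, and the clamp to 2 CPUs is an added assumption (as far as I know, minikube refuses to start with fewer than 2), not part of the original command.

```shell
# Print the flags the command above would compute for a given core count.
# Hypothetical helper; the 2-CPU floor is an assumption, not from the post.
minikube_resources() {
  cores="$1"
  cpus=$((cores / 2))
  if [ "${cpus}" -lt 2 ]; then
    cpus=2
  fi
  printf '%s %s %s %sg\n' --cpus "${cpus}" --memory "${cores}"
}

minikube_resources 8   # prints: --cpus 4 --memory 8g
```

On Linux, `minikube_resources "$(nproc)"` shows what the `minikube start` command will request before you run it.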
51+
52+
# Install Hadoop Cluster by Helm
53+
54+
```shell
55+
$ git clone https://github.com/adonis0147/helm-hadoop
56+
57+
$ cd helm-hadoop
58+
59+
$ bash docker/build_image.sh
60+
61+
$ helm install --name-template hadoop .
62+
```
63+
64+
# Check the Status of Hadoop Cluster
65+
66+
```shell
67+
$ kubectl get pods -o wide
68+
69+
NAME   READY   STATUS    RESTARTS        AGE     IP            NODE       NOMINATED NODE   READINESS GATES
70+
dn-0   1/1     Running   0               5m14s   10.244.0.21   minikube   <none>           <none>
71+
dn-1   1/1     Running   0               5m9s    10.244.0.27   minikube   <none>           <none>
72+
dn-2   1/1     Running   0               5m5s    10.244.0.30   minikube   <none>           <none>
73+
hs-0   1/1     Running   0               5m13s   10.244.0.22   minikube   <none>           <none>
74+
jn-0   1/1     Running   0               5m14s   10.244.0.24   minikube   <none>           <none>
75+
jn-1   1/1     Running   0               5m7s    10.244.0.29   minikube   <none>           <none>
76+
jn-2   1/1     Running   0               5m1s    10.244.0.31   minikube   <none>           <none>
77+
nm-0   1/1     Running   0               5m14s   10.244.0.20   minikube   <none>           <none>
78+
nm-1   1/1     Running   0               5m9s    10.244.0.25   minikube   <none>           <none>
79+
nm-2   1/1     Running   0               5m6s    10.244.0.28   minikube   <none>           <none>
80+
nn-0   1/1     Running   0               5m14s   10.244.0.19   minikube   <none>           <none>
81+
nn-1   1/1     Running   1 (4m38s ago)   5m9s    10.244.0.26   minikube   <none>           <none>
82+
rm-0   1/1     Running   0               5m14s   10.244.0.23   minikube   <none>           <none>
83+
```
84+
85+
**Hadoop Cluster**
86+
87+
- `Namenode`: 2
88+
- `Journalnode`: 3
89+
- `Datanode`: 3
90+
- `Resourcemanager`: 1
91+
- `Nodemanager`: 3
92+
- `Historyserver`: 1
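
The counts above can be cross-checked against the pod listing by grouping pod names on their prefix. The sketch below assumes the prefixes map to roles as the listing suggests (`nn` Namenode, `jn` Journalnode, `dn` Datanode, `rm` Resourcemanager, `nm` Nodemanager, `hs` Historyserver); that mapping is inferred from the counts, and `count_roles` is a hypothetical helper, not part of the chart.

```shell
# Count pods per name prefix from `kubectl get pods` output on stdin.
# The first line is skipped because it is the column header.
count_roles() {
  awk 'NR > 1 { split($1, parts, "-"); count[parts[1]]++ }
       END { for (role in count) print role, count[role] }' | sort
}

# Usage against a running cluster:
#   kubectl get pods | count_roles
```

For the listing shown earlier, this should report 3 `dn`, 1 `hs`, 3 `jn`, 3 `nm`, 2 `nn`, and 1 `rm`.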
93+
94+
# Access the Services
95+
96+
## macOS
97+
98+
```shell
99+
# Route the pod network (10.244.0.0/16) through the minikube VM
100+
$ sudo route -n delete 10.244.0.0/16
101+
$ sudo route -n add 10.244.0.0/16 "$(minikube ip)"
102+
103+
# Keep this process running (use a separate terminal)
104+
$ minikube tunnel
105+
```
106+
107+
## Ubuntu 24.04
108+
109+
```shell
110+
# Point DNS for cluster.local at the cluster's DNS service
111+
$ interface="$(netstat -nr | grep "$(minikube ip | sed -n 's/\(.*\)\..*/\1.0/p')" |
112+
awk '{print $NF}')"
113+
$ sudo resolvectl dns "${interface}" \
114+
    "$(kubectl get -n kube-system service kube-dns -o jsonpath='{.spec.clusterIP}')"
115+
$ sudo resolvectl domain "${interface}" cluster.local
116+
117+
# Route the pod network (10.244.0.0/16) through the minikube node
118+
$ sudo route del -net 10.244.0.0 netmask 255.255.0.0
119+
$ sudo route add -net 10.244.0.0 netmask 255.255.0.0 gw "$(minikube ip)"
120+
121+
# Keep this process running (use a separate terminal)
122+
$ minikube tunnel
123+
```
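
The `interface=` line above works in two steps: `sed` rewrites the minikube IP to its `/24` network address, and `grep` plus `awk` pick the last column (the interface name) of the matching routing-table row. Here is a small sketch of both steps on sample values. The function names are hypothetical, and the address `192.168.49.2` and bridge name `br-example` are illustrative only.

```shell
# Step 1: replace the last octet with 0, e.g. a.b.c.d -> a.b.c.0
subnet_of() {
  printf '%s\n' "$1" | sed -n 's/\(.*\)\..*/\1.0/p'
}

# Step 2: read a routing table on stdin and print the interface (last field)
# of the rows that mention the given network address.
interface_for() {
  grep -F "$1" | awk '{print $NF}'
}

subnet_of 192.168.49.2   # prints 192.168.49.0

# Illustrative routing-table row, not real output:
printf '%s\n' '192.168.49.0 0.0.0.0 255.255.255.0 U 0 0 0 br-example' |
  interface_for 192.168.49.0   # prints br-example
```

Using `grep -F` treats the dots in the address literally, which is slightly stricter than the plain `grep` in the original command.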
124+
125+
# Test
126+
127+
## Ping
128+
129+
```shell
130+
$ ping nn-0.namenode.default.svc.cluster.local
131+
132+
PING nn-0.namenode.default.svc.cluster.local (10.244.0.19) 56(84) bytes of data.
133+
64 bytes from nn-0.namenode.default.svc.cluster.local (10.244.0.19): icmp_seq=1 ttl=63 time=0.069 ms
134+
64 bytes from nn-0.namenode.default.svc.cluster.local (10.244.0.19): icmp_seq=2 ttl=63 time=0.079 ms
135+
```
136+
137+
## Access HDFS
138+
139+
```shell
140+
$ kubectl exec -it nn-0 -- hadoop fs -ls /
141+
142+
Found 1 items
143+
drwxrwx--- - root supergroup 0 2025-05-05 11:10 /tmp
144+
```
145+
146+
## MapReduce Wordcount
147+
148+
```shell
149+
$ kubectl exec -it rm-0 -- bash -c 'for i in {0..999}; do echo ${i} >>numbers; done'
150+
151+
$ kubectl exec -it rm-0 -- hadoop fs -put numbers /numbers
152+
153+
$ kubectl exec -it rm-0 -- hadoop jar \
154+
hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount \
155+
/numbers /output
156+
157+
2025-05-05 11:29:30,858 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at rm-0.resourcemanager.default.svc.cluster.local/10.244.0.23:8032
158+
2025-05-05 11:29:31,682 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1746443418771_0001
159+
2025-05-05 11:29:32,242 INFO input.FileInputFormat: Total input files to process : 1
160+
2025-05-05 11:29:32,492 INFO mapreduce.JobSubmitter: number of splits:1
161+
2025-05-05 11:29:32,711 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1746443418771_0001
162+
2025-05-05 11:29:32,712 INFO mapreduce.JobSubmitter: Executing with tokens: []
163+
2025-05-05 11:29:33,054 INFO conf.Configuration: resource-types.xml not found
164+
2025-05-05 11:29:33,055 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
165+
2025-05-05 11:29:33,723 INFO impl.YarnClientImpl: Submitted application application_1746443418771_0001
166+
2025-05-05 11:29:33,854 INFO mapreduce.Job: The url to track the job: http://rm-0.resourcemanager.default.svc.cluster.local:8088/proxy/application_1746443418771_0001/
167+
2025-05-05 11:29:33,856 INFO mapreduce.Job: Running job: job_1746443418771_0001
168+
2025-05-05 11:29:45,261 INFO mapreduce.Job: Job job_1746443418771_0001 running in uber mode : false
169+
2025-05-05 11:29:45,263 INFO mapreduce.Job: map 0% reduce 0%
170+
2025-05-05 11:29:51,384 INFO mapreduce.Job: map 100% reduce 0%
171+
2025-05-05 11:30:00,470 INFO mapreduce.Job: map 100% reduce 100%
172+
2025-05-05 11:30:00,494 INFO mapreduce.Job: Job job_1746443418771_0001 completed successfully
173+
2025-05-05 11:30:00,638 INFO mapreduce.Job: Counters: 54
174+
...
175+
176+
$ kubectl exec -it rm-0 -- hadoop fs -ls /output
177+
178+
Found 2 items
179+
-rw-r--r-- 3 root supergroup 0 2025-05-05 11:29 /output/_SUCCESS
180+
-rw-r--r-- 3 root supergroup 5890 2025-05-05 11:29 /output/part-r-00000
181+
```
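
As a sanity check, the 5890-byte size of `/output/part-r-00000` can be reproduced locally. The input holds the 1000 distinct integers 0 to 999, so, assuming the usual wordcount output of one `<word><TAB><count>` line per distinct word, every line ends in a tab, a `1`, and a newline. The helper name below is hypothetical.

```shell
# Byte count of "<n>\t1\n" for n = 0..999:
# 2890 digit characters (10*1 + 90*2 + 900*3) plus 3 bytes per line.
expected_output_size() {
  awk 'BEGIN { for (i = 0; i <= 999; i++) printf "%d\t1\n", i }' | wc -c | tr -d ' '
}

expected_output_size   # prints 5890
```

The result matches the size reported by `hadoop fs -ls /output`, which suggests every number was counted exactly once.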
182+
183+
# Reference
184+
185+
- [Accessing services in minikube via DNS](https://www.andreasgerstmayr.at/2022/11/23/accessing-services-in-minikube-via-dns.html)
186+
