5/04/2018
Set Up a Spark Standalone Mode Cluster With a Shell Script
I am using the spark-2.2.1-bin-hadoop2.7.tgz file from the Spark download site.
There are three virtual machines (151, 152, 153); both the Master and one of the slaves will run on 151.
Prerequisites: Java 1.8 installed on all virtual machines.
Configure SSH Key-Based Authentication on a Linux Server
1. Run the command below on all three nodes to generate the public/private key pair.
For each prompt just press Enter (leave the passphrase blank).
$ ssh-keygen -t rsa
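As an optional check, you can confirm the key pair was created; with the default answers, ~/.ssh should contain id_rsa and id_rsa.pub.
$ ls ~/.ssh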
2. Copy the Master's public key to the slave nodes. Enter the user password for each node.
$ ssh-copy-id dhanuka@10.163.134.151
$ ssh-copy-id dhanuka@10.163.134.152
$ ssh-copy-id dhanuka@10.163.134.153
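Before continuing, it is worth verifying that key-based login works. These commands (an optional check, using the same user and IPs as above) should print each hostname without asking for a password:
$ ssh dhanuka@10.163.134.152 hostname
$ ssh dhanuka@10.163.134.153 hostname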
Copy Spark to the slave nodes
$ scp -r $SPARK_HOME dhanuka@10.163.134.152:~
$ scp -r $SPARK_HOME dhanuka@10.163.134.153:~
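As an optional check (same user and paths as above), confirm the Spark directory arrived on each slave:
$ ssh dhanuka@10.163.134.152 'ls ~/spark-2.2.1-bin-hadoop2.7/bin'
$ ssh dhanuka@10.163.134.153 'ls ~/spark-2.2.1-bin-hadoop2.7/bin'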
Spark environment variable setup
1. Add the lines below to ~/.bash_profile and run the source command on each node.
export SPARK_HOME=/home/dhanuka/spark-2.2.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
2. $ source ~/.bash_profile
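As a quick check that the environment is in place on each node (optional, assuming the install path above):
$ echo $SPARK_HOME
$ spark-submit --version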
Bootstrap the master and slave nodes using the setup.sh shell script
$ ./setup.sh
Stop anything that is running
localhost: stopping org.apache.spark.deploy.worker.Worker
10.163.134.151: no org.apache.spark.deploy.worker.Worker to stop
10.163.134.152: stopping org.apache.spark.deploy.worker.Worker
10.163.134.153: no org.apache.spark.deploy.worker.Worker to stop
stopping org.apache.spark.deploy.master.Master
Start Master
starting org.apache.spark.deploy.master.Master, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.master.Master-1-bo3uxgmpxxxxnn.out
Start Workers
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
10.163.134.151: org.apache.spark.deploy.worker.Worker running as process 27165. Stop it first.
10.163.134.152: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
10.163.134.153: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
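The "running as process 27165. Stop it first." line for 10.163.134.151 appears because the slaves file ends up listing both localhost and 10.163.134.151 (the template already contains localhost), so the worker on that node is started once and then skipped on the second attempt. Once the master and workers are up, a simple way to exercise the cluster is the bundled SparkPi example; the jar name below matches the Spark 2.2.1 / Scala 2.11 distribution, but double-check it under $SPARK_HOME/examples/jars, and 7077 is the default standalone master port:
$ spark-submit --master spark://10.163.134.151:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar 100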
Script
#!/bin/bash
# Pick up SPARK_HOME and PATH set earlier in ~/.bash_profile.
source ~/.bash_profile

# Run the SSH commands used by the start/stop scripts in the foreground.
export SPARK_SSH_FOREGROUND="yes"

# Recreate spark-env.sh from the template and append the master settings.
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
sed -i '$ a\SPARK_MASTER_HOST=10.163.134.151' $SPARK_HOME/conf/spark-env.sh
sed -i '$ a\SPARK_MASTER_WEBUI_PORT=8085' $SPARK_HOME/conf/spark-env.sh

# Recreate the slaves file from the template and append the worker IPs.
# Note: the template already contains localhost, so node 151 is listed twice.
cp $SPARK_HOME/conf/slaves.template $SPARK_HOME/conf/slaves
sed -i '$ a\10.163.134.151\n10.163.134.152\n10.163.134.153' $SPARK_HOME/conf/slaves

echo " Stop anything that is running"
$SPARK_HOME/sbin/stop-all.sh
sleep 2

echo " Start Master"
$SPARK_HOME/sbin/start-master.sh

# Give the master time to come up before starting the workers.
sleep 20

echo " Start Workers"
SPARK_SSH_FOREGROUND=true $SPARK_HOME/sbin/start-slaves.sh
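After the script has run, the appended settings can be verified on the master. Based on the sed commands above, the last lines of the two files should look like this:
$ tail -n 2 $SPARK_HOME/conf/spark-env.sh
SPARK_MASTER_HOST=10.163.134.151
SPARK_MASTER_WEBUI_PORT=8085
$ tail -n 4 $SPARK_HOME/conf/slaves
localhost
10.163.134.151
10.163.134.152
10.163.134.153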
Master Admin Panel
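With SPARK_MASTER_WEBUI_PORT set to 8085 in the script, the admin panel should be reachable in a browser at http://10.163.134.151:8085, where the three workers should be listed as ALIVE. An optional check from the shell:
$ curl -s -o /dev/null -w '%{http_code}\n' http://10.163.134.151:8085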