5/04/2018

Set Up a Spark Standalone Mode Cluster With a Shell Script




I am using the spark-2.2.1-bin-hadoop2.7.tgz file from the Spark download site.
There are three virtual machines (151, 152, 153); the Master and one of the slaves will run on 151.

Prerequisite: Java 1.8 installed on all three virtual machines.
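
A quick way to confirm this on each node is to check the reported version; it should start with 1.8:

$ java -version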


Configure SSH Key-Based Authentication on a Linux Server

1. Run the command below on all three nodes to generate a public/private key pair.
   For each prompt just press Enter (leave the passphrase blank).

$ ssh-keygen -t rsa

2. Copy the Master's public key to all three nodes (including the Master itself, since a Worker will also run there). Enter the user password for each node.

$ ssh-copy-id dhanuka@10.163.134.151
$ ssh-copy-id dhanuka@10.163.134.152
$ ssh-copy-id dhanuka@10.163.134.153
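
Optionally, verify that key-based login now works from the Master. The loop below is only a quick check; it assumes the same dhanuka user on every node and should print each hostname without asking for a password:

$ for node in 10.163.134.151 10.163.134.152 10.163.134.153; do ssh dhanuka@$node hostname; done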

Copy Spark to the slave nodes

$ scp -r $SPARK_HOME dhanuka@10.163.134.152:~
$ scp -r $SPARK_HOME dhanuka@10.163.134.153:~
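
As an optional sanity check, confirm the directory landed directly under the home directory on each slave (the same path the ~/.bash_profile entries below point to):

$ ssh dhanuka@10.163.134.152 'ls ~/spark-2.2.1-bin-hadoop2.7/bin'
$ ssh dhanuka@10.163.134.153 'ls ~/spark-2.2.1-bin-hadoop2.7/bin'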


Spark environment variable setup

1. Add the two lines below to ~/.bash_profile on each node (the setup script also sources this file), then reload it as shown in step 2.


export SPARK_HOME=/home/dhanuka/spark-2.2.1-bin-hadoop2.7

export PATH=$PATH:$SPARK_HOME/bin

2. $  source ~/.bash_profile
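
To confirm the variables took effect, the following should print the Spark home directory and report version 2.2.1 on each node:

$ echo $SPARK_HOME
$ spark-submit --version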

Bootstrap the Master and the slave nodes using the setup.sh shell script (listed under Script below)

 $ ./setup.sh

Stop anything that is running
localhost: stopping org.apache.spark.deploy.worker.Worker
10.163.134.151: no org.apache.spark.deploy.worker.Worker to stop
10.163.134.152: stopping org.apache.spark.deploy.worker.Worker
10.163.134.153: no org.apache.spark.deploy.worker.Worker to stop
stopping org.apache.spark.deploy.master.Master
 Start Master
starting org.apache.spark.deploy.master.Master, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.master.Master-1-bo3uxgmpxxxxnn.out
 Start Workers
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
10.163.134.151: org.apache.spark.deploy.worker.Worker running as process 27165.  Stop it first.
10.163.134.152: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
10.163.134.153: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
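
If everything came up cleanly, 151 should be running both a Master and a Worker JVM, while 152 and 153 each run a Worker. The JDK's jps tool is a quick way to confirm this on each node:

$ jps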

 


Script

#!/bin/bash

source ~/.bash_profile

# Run the ssh commands issued by start-slaves.sh in the foreground (serially).
export SPARK_SSH_FOREGROUND="yes"

# Recreate spark-env.sh from the template, then append the master's bind
# address and a non-default web UI port.
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
sed -i  '$ a\SPARK_MASTER_HOST=10.163.134.151'   $SPARK_HOME/conf/spark-env.sh
sed -i  '$ a\SPARK_MASTER_WEBUI_PORT=8085'   $SPARK_HOME/conf/spark-env.sh

# Recreate the slaves file and list every node that should run a Worker
# (151 runs a Worker alongside the Master).
cp $SPARK_HOME/conf/slaves.template $SPARK_HOME/conf/slaves
sed -i  '$ a\10.163.134.151\n10.163.134.152\n10.163.134.153'   $SPARK_HOME/conf/slaves

echo " Stop anything that is running"
$SPARK_HOME/sbin/stop-all.sh

sleep 2

echo " Start Master"
$SPARK_HOME/sbin/start-master.sh

# Give the Master time to come up before the Workers try to register.
sleep 20

echo " Start Workers"
SPARK_SSH_FOREGROUND=true  $SPARK_HOME/sbin/start-slaves.sh
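
Once the Workers have registered, a simple smoke test is to submit the SparkPi example that ships with the distribution against the standalone master. The master URL below assumes the default port 7077, and the examples jar path is the one used by the 2.2.1 binary package (adjust the file name if it differs):

$ $SPARK_HOME/bin/spark-submit \
    --master spark://10.163.134.151:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar 100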
 



Master Admin Panel
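
The Master web UI should now be reachable at http://10.163.134.151:8085, the address and port set by SPARK_MASTER_HOST and SPARK_MASTER_WEBUI_PORT in the script above; the three Workers appear there once they register. A quick check from the command line (assuming curl is available):

$ curl -s http://10.163.134.151:8085 | grep -i 'Spark Master'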




References:

[1] https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2



