5/04/2018

Set Up a Spark Standalone Mode Cluster With a Shell Script




I am using the spark-2.2.1-bin-hadoop2.7.tgz file from the Spark download site.
There are three virtual machines (151, 152, 153); the Master and one of the slaves will run on 151.

Prerequisite: Java 1.8 installed on all three virtual machines.
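
A quick way to confirm this on each node is to check the reported version; it should start with 1.8:

$ java -version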


Configure SSH Key-Based Authentication on a Linux Server

1. Run the command below on all three nodes to generate a public/private key pair.
   For each prompt just press Enter (leave the passphrase blank).

$ ssh-keygen -t rsa

2. Copy the Master's public key to all three nodes (including the Master itself, since a Worker will also run there). Enter the user password for each node.

$ ssh-copy-id dhanuka@10.163.134.151
$ ssh-copy-id dhanuka@10.163.134.152
$ ssh-copy-id dhanuka@10.163.134.153
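
Optionally, verify that key-based login now works from the Master. The loop below is only a quick check; it assumes the same dhanuka user on every node and should print each hostname without asking for a password:

$ for node in 10.163.134.151 10.163.134.152 10.163.134.153; do ssh dhanuka@$node hostname; done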

Copy Spark to the slave nodes

$ scp -r $SPARK_HOME dhanuka@10.163.134.152:~
$ scp -r $SPARK_HOME dhanuka@10.163.134.153:~
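
As an optional sanity check, confirm the directory landed directly under the home directory on each slave (the same path the ~/.bash_profile entries below point to):

$ ssh dhanuka@10.163.134.152 'ls ~/spark-2.2.1-bin-hadoop2.7/bin'
$ ssh dhanuka@10.163.134.153 'ls ~/spark-2.2.1-bin-hadoop2.7/bin'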


Spark environment variable setup

1. Add the two lines below to ~/.bash_profile on each node (the setup script also sources this file), then reload it as shown in step 2.


export SPARK_HOME=/home/dhanuka/spark-2.2.1-bin-hadoop2.7

export PATH=$PATH:$SPARK_HOME/bin

2. $  source ~/.bash_profile
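
To confirm the variables took effect, the following should print the Spark home directory and report version 2.2.1 on each node:

$ echo $SPARK_HOME
$ spark-submit --version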

Bootstrap the Master and the slave nodes using the setup.sh shell script (listed under Script below)

 $ ./setup.sh

Stop anything that is running
localhost: stopping org.apache.spark.deploy.worker.Worker
10.163.134.151: no org.apache.spark.deploy.worker.Worker to stop
10.163.134.152: stopping org.apache.spark.deploy.worker.Worker
10.163.134.153: no org.apache.spark.deploy.worker.Worker to stop
stopping org.apache.spark.deploy.master.Master
 Start Master
starting org.apache.spark.deploy.master.Master, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.master.Master-1-bo3uxgmpxxxxnn.out
 Start Workers
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
10.163.134.151: org.apache.spark.deploy.worker.Worker running as process 27165.  Stop it first.
10.163.134.152: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
10.163.134.153: starting org.apache.spark.deploy.worker.Worker, logging to /home/dhanuka/spark-2.2.1-bin-hadoop2.7/logs/spark-dhanuka-org.apache.spark.deploy.worker.Worker-1-bo3uxgmpxxxxnn.out
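
If everything came up cleanly, 151 should be running both a Master and a Worker JVM, while 152 and 153 each run a Worker. The JDK's jps tool is a quick way to confirm this on each node:

$ jps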

 


Script

#!/bin/bash

source ~/.bash_profile

# Run the ssh commands issued by start-slaves.sh in the foreground (serially).
export SPARK_SSH_FOREGROUND="yes"

# Recreate spark-env.sh from the template, then append the master's bind
# address and a non-default web UI port.
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
sed -i  '$ a\SPARK_MASTER_HOST=10.163.134.151'   $SPARK_HOME/conf/spark-env.sh
sed -i  '$ a\SPARK_MASTER_WEBUI_PORT=8085'   $SPARK_HOME/conf/spark-env.sh

# Recreate the slaves file and list every node that should run a Worker
# (151 runs a Worker alongside the Master).
cp $SPARK_HOME/conf/slaves.template $SPARK_HOME/conf/slaves
sed -i  '$ a\10.163.134.151\n10.163.134.152\n10.163.134.153'   $SPARK_HOME/conf/slaves

echo " Stop anything that is running"
$SPARK_HOME/sbin/stop-all.sh

sleep 2

echo " Start Master"
$SPARK_HOME/sbin/start-master.sh

# Give the Master time to come up before the Workers try to register.
sleep 20

echo " Start Workers"
SPARK_SSH_FOREGROUND=true  $SPARK_HOME/sbin/start-slaves.sh
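
Once the Workers have registered, a simple smoke test is to submit the SparkPi example that ships with the distribution against the standalone master. The master URL below assumes the default port 7077, and the examples jar path is the one used by the 2.2.1 binary package (adjust the file name if it differs):

$ $SPARK_HOME/bin/spark-submit \
    --master spark://10.163.134.151:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar 100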
 



Master Admin Panel
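
The Master web UI should now be reachable at http://10.163.134.151:8085, the address and port set by SPARK_MASTER_HOST and SPARK_MASTER_WEBUI_PORT in the script above; the three Workers appear there once they register. A quick check from the command line (assuming curl is available):

$ curl -s http://10.163.134.151:8085 | grep -i 'Spark Master'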




References:

[1] https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2



