Friday, July 10, 2015

Installing Apache Spark in standalone mode on a cluster.

Hello all,

This post is about how to install Apache Spark on a cluster. Apache Spark supports the following cluster managers:
1) Standalone cluster manager, embedded in Spark itself.
2) Apache Mesos
3) Apache YARN

This post elaborates how to install Apache Spark in standalone mode on a cluster.

1) The key requirement for any cluster is passwordless SSH.

I am creating a cluster of 2 machines, a master (<master-ip>) and a worker (<worker-ip>), so I have to configure passwordless communication between them. Assume that we have a spark user on both machines.

Follow the steps given below.

 #on the master machine (<master-ip>)  
 spark@<master-ip>]# ssh-keygen -t rsa -P ""  
 #copy the public key to the worker  
 spark@<master-ip>]# ssh-copy-id spark@<worker-ip>  
 #on the worker machine (<worker-ip>)  
 spark@<worker-ip>]# ssh-keygen -t rsa -P ""  
 #copy the public key to the master  
 spark@<worker-ip>]# ssh-copy-id spark@<master-ip>  
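The `-P ""` flag is what makes the key passphrase-less, and `ssh-copy-id` is what appends the public key to the remote `authorized_keys`. As a local sanity check of what the key-generation step produces (this sketch writes to a scratch directory instead of your real ~/.ssh):

```shell
# Generate a passphrase-less RSA key pair in a scratch directory.
# The real steps above write to ~/.ssh on each machine instead.
tmpdir=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$tmpdir/id_rsa" -q
# id_rsa is the private key kept on the machine;
# id_rsa.pub is the public key that ssh-copy-id distributes.
ls "$tmpdir"
```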

2) Download the compiled binary of Apache Spark, then copy and untar it at the same location on both machines. In my case it is /opt/spark.

3) Go inside ${SPARK_HOME}/conf (in my case it is /opt/spark/conf).
It contains a slaves.template file. Just execute the following command:

 #on the master machine (<master-ip>)  
 spark@<master-ip> conf]# cp slaves.template slaves  

Modify this file and add the IP addresses of the worker nodes.
The content of the file will look like this:

 # A Spark Worker will be started on each of the machines listed below.  
 <worker-ip>  
4) Then execute the following command on both machines:

 #on the master machine (<master-ip>)  
 spark@<master-ip> conf]# cp spark-env.sh.template spark-env.sh  
 #on the worker machine (<worker-ip>)  
 spark@<worker-ip> conf]# cp spark-env.sh.template spark-env.sh  
5) Modify spark-env.sh on both machines and add SPARK_MASTER_IP=<master-ip>
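For reference, a minimal spark-env.sh might look like the fragment below. SPARK_MASTER_IP is the only required addition here; SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are optional tuning knobs also documented in the template, and <master-ip> is a placeholder for your master's address:

```shell
# spark-env.sh -- sourced by Spark's launch scripts on every node
SPARK_MASTER_IP=<master-ip>    # address the standalone master binds to
SPARK_WORKER_CORES=2           # optional: cores each worker may use
SPARK_WORKER_MEMORY=2g         # optional: memory each worker may use
```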
6) Now go inside the ${SPARK_HOME}/sbin directory on the master machine (in my case /opt/spark/sbin) and execute ./start-all.sh. It will start a master and a worker on the master node (<master-ip>), and a worker on the 2nd, slave node (<worker-ip>).
You can check it by executing the jps command on both machines.
7) Assume that we want to submit a Python script to the Spark cluster. Go inside
 ${SPARK_HOME}/bin and execute ./spark-submit --master spark://<master-ip>:7077 path-to-python-script
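The master URL for a standalone cluster has the form spark://host:port, where 7077 is the standalone master's default port. A small sketch of assembling the submit command (<master-ip> and wordcount.py are placeholders for your own master address and script; this only prints the command rather than running it, since running it needs a live cluster):

```shell
# <master-ip> and wordcount.py are placeholders, not real values.
# 7077 is the default port of the standalone master.
MASTER_URL="spark://<master-ip>:7077"
SCRIPT="/home/spark/wordcount.py"
echo "./spark-submit --master $MASTER_URL $SCRIPT"
```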
