Friday, July 10, 2015

Installing Apache Spark in standalone mode on a cluster.

Hello all,

This post is about how to install Apache Spark on a cluster. Apache Spark supports the following cluster managers:
1) Standalone cluster manager, embedded in Spark itself.
2) Apache Mesos
3) Apache Hadoop YARN

This post elaborates how to install Apache Spark in standalone mode on a cluster.

1) The key requirement for any cluster is passwordless SSH.

I am creating a cluster of 2 machines (192.168.5.11 and 192.168.9.159), so I have to configure passwordless communication between them. Assume that both machines have a spark user.

Follow the steps given below.

 #on machine 192.168.5.11  
 spark@192.168.5.11]# ssh-keygen -t rsa -P ""  
 #copy id_rsa.pub to 192.168.9.159  
 spark@192.168.5.11]# ssh-copy-id spark@192.168.9.159  
 #on machine 192.168.9.159  
 spark@192.168.9.159]# ssh-keygen -t rsa -P ""  
 #copy id_rsa.pub to 192.168.5.11  
 spark@192.168.9.159]# ssh-copy-id spark@192.168.5.11  
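The per-machine commands above can also be scripted for larger clusters. The sketch below (the NODES list and the bracketed-host convention are my own, not from the post) only prints the command each machine should run, rather than executing anything:

```shell
#!/bin/sh
# Sketch: emit the ssh-keygen/ssh-copy-id commands needed for full-mesh
# passwordless SSH among the listed nodes. Nothing is executed here;
# run each printed command on the machine named in the brackets.
NODES="192.168.5.11 192.168.9.159"   # assumed node list

gen_cmds() {
  for src in $NODES; do
    # generate a key pair with an empty passphrase on $src
    echo "[$src] ssh-keygen -t rsa -P ''"
    for dst in $NODES; do
      [ "$src" = "$dst" ] && continue
      # push the public key to every other node
      echo "[$src] ssh-copy-id spark@$dst"
    done
  done
}

CMDS=$(gen_cmds)
echo "$CMDS"
```

For a two-node cluster this is overkill, but the same loop scales to any number of workers without editing commands by hand.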

2) Download the compiled binary of Apache Spark, then copy and untar it at the same location on both machines. In my case it is /opt/spark.
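The download-and-untar step might look like the sketch below. The exact release and URL are assumptions (pick your version from the Apache Spark downloads/archive page); the commands are only printed here, since fetching them needs the network:

```shell
#!/bin/sh
# Sketch: fetch and unpack a Spark binary release under /opt/spark on one node.
# The version and URL are assumptions for illustration -- substitute the
# release you actually want.
URL="https://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz"
CMDS="wget $URL
tar -xzf spark-1.4.0-bin-hadoop2.6.tgz -C /opt
mv /opt/spark-1.4.0-bin-hadoop2.6 /opt/spark"
echo "$CMDS"
```

Repeat the same steps on every node so the path (/opt/spark here) is identical cluster-wide.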

3) Go inside /{$spark-home}/conf (in my case it is /opt/spark/conf).
It contains a slaves.template file. Just execute the following command:
 

 #on machine 192.168.9.159  as I am considering it as master machine.
 spark@192.168.9.159 conf]#cp slaves.template slaves  

Modify this file and add the IP addresses of the worker nodes.
The content of the file will look like this:

 # A Spark Worker will be started on each of the machines listed below.  
 192.168.9.159  
 192.168.5.11  
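Instead of editing the file by hand, the slaves file can be generated from a worker list. A minimal sketch (the WORKERS variable and the CONF_DIR path are assumptions; point CONF_DIR at your real conf directory):

```shell
#!/bin/sh
# Sketch: generate conf/slaves from a list of worker IPs.
# CONF_DIR defaults to a demo directory; set it to /opt/spark/conf in practice.
CONF_DIR=${CONF_DIR:-/tmp/spark-conf-demo}
WORKERS="192.168.9.159
192.168.5.11"

mkdir -p "$CONF_DIR"
{
  echo "# A Spark Worker will be started on each of the machines listed below."
  echo "$WORKERS"
} > "$CONF_DIR/slaves"

cat "$CONF_DIR/slaves"
```

Keeping the worker list in one variable makes it easy to regenerate the file when nodes are added or removed.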

4) Then execute the following commands:
 #on machine 192.168.5.11  
 spark@192.168.5.11 conf]#cp spark-env.sh.template spark-env.sh  
 #on machine 192.168.9.159  
 spark@192.168.9.159 conf]#cp spark-env.sh.template spark-env.sh  
5) Modify the spark-env.sh file on both machines and add SPARK_MASTER_IP=192.168.9.159
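This edit can be made idempotent so re-running your setup doesn't append the line twice. A sketch (SPARK_CONF defaults to a demo directory; point it at your real conf directory):

```shell
#!/bin/sh
# Sketch: append the master address to spark-env.sh only if it is not
# already set. SPARK_CONF is a demo path; use /opt/spark/conf in practice.
SPARK_CONF=${SPARK_CONF:-/tmp/spark-conf-demo}
mkdir -p "$SPARK_CONF"
touch "$SPARK_CONF/spark-env.sh"

# grep -q succeeds if the setting exists; otherwise append it
grep -q '^SPARK_MASTER_IP=' "$SPARK_CONF/spark-env.sh" ||
  echo 'SPARK_MASTER_IP=192.168.9.159' >> "$SPARK_CONF/spark-env.sh"

cat "$SPARK_CONF/spark-env.sh"
```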
6) Now go inside the /{$spark-home}/sbin directory on the master machine (in my case 192.168.9.159) and execute ./start-all.sh. It will start a master and a worker on the master node (192.168.9.159) and a worker on the second node (192.168.5.11).
You can verify this by executing the jps command on both machines.
7) Assume that we want to submit a Python script to the Spark cluster. Go inside
 /{$spark-home}/bin and execute ./spark-submit --master spark://192.168.9.159:7077 path-to-python-script
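As a concrete end-to-end example, here is a minimal PySpark word-count script (the filename and contents are my own illustration, not from the post). The sketch only writes the script and prints the submit command, since actually running it needs the live cluster started above:

```shell
#!/bin/sh
# Sketch: write a tiny PySpark script and show how it would be submitted.
# /tmp/wordcount.py is an illustrative name; the submit command is printed,
# not executed, because it requires the running standalone cluster.
cat > /tmp/wordcount.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")
counts = sc.parallelize(["a b", "a c"]) \
           .flatMap(lambda line: line.split()) \
           .map(lambda w: (w, 1)) \
           .reduceByKey(lambda a, b: a + b)
print(counts.collect())
sc.stop()
EOF

echo "./spark-submit --master spark://192.168.9.159:7077 /tmp/wordcount.py"
```

The spark:// URL must match the master address configured in spark-env.sh; 7077 is the standalone master's default port.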
