
Spark With Tachyon On Yarn

How do you run Spark on Yarn, with Tachyon as the in-memory file system, in an HA (high-availability) setup?


Test Environment

Ubuntu 14.04 LTS x64
Tachyon: tachyon-0.7.1-bin.tar.gz
Hadoop: hadoop-2.7.1.tar.gz
Spark: spark-1.5.2-bin-hadoop2.6.tgz
Maven: apache-maven-3.3.9-bin.tar.gz
Scala: scala-2.11.7.tgz

hostname      IP              role
spark-master  192.168.108.20  master & worker
spark-slave1  192.168.108.21  worker
spark-slave2  192.168.108.22  worker

Note: unless stated otherwise, all operations are performed as root.

Installing Scala


Scala environment variables

/**
 * Apply the following configuration on every host
 */
vim /etc/profile

export SCALA_HOME=/home/jabo/software/scala-2.11.7
export PATH=${SCALA_HOME}/bin:$PATH

source /etc/profile

Verify Scala

scala -version

Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL

Installing Java

See: Installing the JDK on Ubuntu


Setting Up a ZooKeeper Cluster

See: ZooKeeper Cluster Setup


Setting Up a Tachyon Cluster

See: Tachyon Cluster Deployment


Setting Up a Hadoop 2.x Cluster

See: Hadoop Cluster Setup


Tachyon Cluster High Availability

See: Tachyon Cluster High Availability


Installing the Spark Cluster


Downloading Spark

Download from the official Spark download page.

Before downloading, check the Tachyon/Spark version compatibility matrix.

Spark environment variables

vim /etc/profile

export SPARK_HOME=/home/jabo/software/spark-1.5.2-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

source /etc/profile

Directory permissions

sudo chmod -R 775 spark-1.5.2-bin-hadoop2.6/

The spark-env.sh configuration file

cp ./conf/spark-env.sh.template ./conf/spark-env.sh
vim ./conf/spark-env.sh

export SCALA_HOME=/home/jabo/software/scala-2.11.7
export JAVA_HOME=/usr/lib/jvm/java
export SPARK_MASTER_IP=spark-master
export SPARK_WORKER_MEMORY=1G
export SPARK_WORKER_PORT=7077
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Enable HA
export SPARK_JAVA_OPTS="
-Dtachyon.zookeeper.address=spark-master:2181,spark-slave1:2181,spark-slave2:2181
-Dtachyon.usezookeeper=true
$SPARK_JAVA_OPTS
"

Configure slaves

cp ./conf/slaves.template ./conf/slaves
vim ./conf/slaves

spark-master
spark-slave1
spark-slave2

Create core-site.xml

vim ./conf/core-site.xml 

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.tachyon-ft.impl</name>
    <value>tachyon.hadoop.TFSFT</value>
  </property>
</configuration>

Distribute the Spark Directory

Copy the Spark directory to every host.
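One way to do this is a loop of `rsync` over SSH. A minimal sketch follows; the use of `rsync`, passwordless root SSH, and the hard-coded worker list are all assumptions, so the commands are only printed here (drop the leading `echo` to actually run them):

```shell
# Sketch: print the rsync commands that would distribute Spark to the workers.
# Remove the leading "echo" to execute (assumes passwordless SSH as root).
SPARK_DIR=/home/jabo/software/spark-1.5.2-bin-hadoop2.6
WORKERS="spark-slave1 spark-slave2"

for host in $WORKERS; do
  echo rsync -az "$SPARK_DIR/" "root@$host:$SPARK_DIR/"
done
```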


Start the ZooKeeper Cluster

/**
 * Run on every host
 */
zkServer.sh start
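Instead of logging in to each machine, the per-host start can be scripted. This is a sketch only: it assumes passwordless root SSH and that `zkServer.sh` is on the PATH of every host, so it echoes the commands rather than running them:

```shell
# Sketch: start ZooKeeper on all three hosts over SSH.
# Remove the leading "echo" to execute (assumes passwordless SSH as root
# and zkServer.sh on each host's PATH).
ZK_HOSTS="spark-master spark-slave1 spark-slave2"

for host in $ZK_HOSTS; do
  echo ssh "root@$host" zkServer.sh start
done
```

After starting, `zkServer.sh status` on each host shows which node is the leader.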

Start the Hadoop Cluster

/**
 * Run on spark-master
 */
./sbin/start-all.sh

Start the Tachyon Cluster

/**
 * Run on spark-master
 */
./bin/tachyon format

./bin/tachyon-start.sh all NoMount

Start the Spark Cluster

./sbin/start-all.sh

Check the Status of Each Cluster

root@spark-master: jps
115633 Jps
93392 JournalNode
95446 TachyonMaster
92948 NameNode
93756 ResourceManager
115442 Worker
115246 Master
3072 QuorumPeerMain
93107 DataNode
93932 NodeManager
93642 DFSZKFailoverController
95643 TachyonWorker


root@spark-slave1: jps
85099 Worker
66267 JournalNode
3021 QuorumPeerMain
65967 NameNode
66621 NodeManager
85448 Jps
67496 TachyonWorker
66448 DFSZKFailoverController
66088 DataNode

root@spark-slave2: jps
11296 NodeManager
13100 Worker
11172 JournalNode
11050 DataNode
2976 QuorumPeerMain
13166 Jps
11794 TachyonWorker

/**
 * On spark-master
 */
// Hadoop IPC address (port 9000 is not a web UI)
http://spark-master:9000/

It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.

// Yarn address
http://spark-master:8088

Use Nodes in the left sidebar to check node status
Use Applications in the left sidebar to check application execution

// HDFS address
http://spark-master:50070

Use the Datanodes tab to check node status

// Tachyon address
http://spark-master:19999

Use the Workers tab to check node status
Use Browse File System to browse files

// Spark address
http://spark-master:8080/

Check node status
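These spot checks can also be scripted. The sketch below simply collects the Web UI endpoints used in this guide; in a live cluster each could be probed with `curl -sf` (that probing step is an assumption, not something the guide prescribes):

```shell
# Sketch: the Web UI endpoints from this guide, collected in one place.
# In a live cluster each could be probed with: curl -sf -o /dev/null "$url"
UI_URLS="http://spark-master:8088 http://spark-master:50070 http://spark-master:19999 http://spark-master:8080"

for url in $UI_URLS; do
  echo "check: $url"
done
```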

Testing Spark with Tachyon

// Upload a file to Tachyon
tachyon tfs copyFromLocal /home/test.txt /test

// Run
MASTER=spark://spark-master:7077 spark-shell

val s = sc.textFile("tachyon-ft://spark-master:19998/test")
s.saveAsTextFile("tachyon-ft://activeHost:19998/test_done")
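The `activeHost` placeholder above stands for whichever Tachyon master currently holds leadership (check the Tachyon Web UI to find it). A tiny helper, hypothetical and for illustration only, makes the `tachyon-ft://` URI shape explicit; 19998 is the master RPC port used throughout this guide:

```shell
# Hypothetical helper: build a tachyon-ft:// URI from the currently active
# master host and a Tachyon path (19998 = master RPC port in this guide).
tachyon_ft_uri() {
  local active_host=$1 path=$2
  echo "tachyon-ft://${active_host}:19998${path}"
}

tachyon_ft_uri spark-master /test
```

The `saveAsTextFile` call above would then target the URI built from the active master and `/test_done`.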

Testing Spark on Yarn


Cluster mode

/**
 * While this runs, you can watch the application in the Yarn Web UI under Applications
 */

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
lib/spark-examples*.jar \
2


// Output
16/01/21 11:06:15 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.108.20
ApplicationMaster RPC port: 0
queue: default
start time: 1453345452519
final status: SUCCEEDED
tracking URL: http://spark-master:8088/proxy/application_1453340102205_0006/
user: root
16/01/21 11:06:15 INFO util.ShutdownHookManager: Shutdown hook called
16/01/21 11:06:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-edf14341-3117-47c7-a96d-9741db4824bf

Client mode

./bin/spark-shell --master yarn --deploy-mode client

Please credit the source when reposting.
