3.4.8 Cluster Management

The cluster management module provides centralized management of the configuration of all clusters on the platform. Only users with the platform administrator role can access this module.

3.4.8.1 Adding a Cluster

As shown in the figure below, select a cluster type. A Spark cluster runs Spark jobs, including scheduled and recurring offline jobs as well as Spark Streaming jobs; a Flink cluster runs real-time Flink jobs. Compared with a Flink cluster, a Spark cluster has one additional configuration item: spark.conf.

3.4.8.2 Configuring a Cluster

  • Basic information

3.4.8.3 Detailed Cluster Configuration

A sample is shown below:

# Service port configured in jobserver-tomcat's conf/server.xml
server.port = 7002
spark.home = /home/admin/spark-2.4.0-bin-2.6.0-cdh5.15.0
spark.yarn.jars = /user/yarn_jars/spark_2.4.0_2.0.0/*
spark.jobserver.jar = /user/yarn_jars/spark_2.4.0_2.0.0/jobserver-yarn-2.4.0.jar

# datacompute address
datacompute.addr = http://10.57.26.5:8181

# Kerberos principal used by jobserver-control and jobserver-yarn
#hadoop.kerberos.user=tdkj1@HADOOP.COM

hadoop.yarn.webui = http://cdh173:8088
spark.create.table.enabled = true
jobserver.profile = dev
jobserver.console.url = http://cdh173:8088/proxy/
spark.jobserver.control.url = http://10.57.30.218:7002

datacompute.git = https://gitlab.fraudmetrix.cn
jobserver.parquet.write.users = jian.tang

spark.dc.column.authorization.enabled = true

spark.executor.extraJavaOptions = -Dfile.encoding=UTF-8
spark.driver.extraJavaOptions = -Dfile.encoding=UTF-8


# Cluster submission limits
spark.jobserver.maxAccept = 8
spark.jobserver.maxNum = 20
spark.jobserver.yarn.limitMemory = 1024
spark.jobserver.yarn.limitCore = 2
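
All of the settings above use a simple key = value properties format, with lines beginning with # treated as comments. As a minimal illustration of that format (not part of the platform itself; the file name cluster.conf and the helper function are hypothetical), such a file can be parsed in Python as follows:

def load_cluster_conf(path):
    # Parse a "key = value" config file, skipping blank lines and "#" comments.
    conf = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            conf[key.strip()] = value.strip()
    return conf

conf = load_cluster_conf("cluster.conf")
print(conf["spark.jobserver.maxNum"])  # prints: 20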

3.4.8.4 spark.conf Configuration

A sample is shown below:

spark.yarn.queue default
spark.master yarn
#spark.dc.url  http://cdh174:8181
spark.yarn.dist.jars hdfs://tdhdfs/user/yarn_jars/aspectjweaver-1.8.10.jar,hdfs://tdhdfs/user/yarn_jars/jvm_profiler/jvm-profiler-0.0.9.jar,hdfs://tdhdfs/user/yarn_jars/sql_extensions/spark-sql-extension-1.0.1.jar

# Spark scheduling parameters
spark.speculation  false
spark.speculation.interval  1000
spark.speculation.multiplier  1.5
spark.speculation.quantile  0.75
spark.task.cpus  1

# Spark execution parameters
spark.broadcast.blockSize  4m
spark.default.parallelism  8
spark.files.useFetchCache  true
spark.files.maxPartitionBytes  134217728
spark.storage.memoryMapThreshold  10m
spark.files.overwrite  true
spark.eventLog.logStageExecutorMetrics.enabled true
spark.eventLog.logStageExecutorProcessTreeMetrics.enabled true


# Spark launch parameters
spark.executor.instances  1
spark.executor.memory  1G
spark.driver.memory  1G
spark.driver.cores  1
spark.executor.memoryOverhead  512m
spark.driver.memoryOverhead  512m
spark.yarn.queue  root.user.admin
spark.hive.init  true
spark.tispark.pd.addresses  10.58.10.33:2379
spark.tispark.plan.allow_index_read  true
# SQL extensions, e.g. for permission control
#spark.sql.extensions  cn.tongdun.sql.TDExtensions

# Spark SQL parameters
spark.sql.codegen  false
spark.sql.shuffle.partitions  200
spark.sql.parquet.cacheMetadata  true
spark.sql.inMemoryColumnarStorage.compressed  true
spark.sql.inMemoryColumnarStorage.batchSize  10000
spark.sql.catalogImplementation  hive
#spark.sql.parquet.compression.codec zstd

# Spark dynamic allocation parameters
spark.dynamicAllocation.enabled  false
spark.shuffle.service.enabled  true
spark.dynamicAllocation.executorIdleTimeout  300s
spark.dynamicAllocation.minExecutors 6
spark.dynamicAllocation.initialExecutors  0
spark.dynamicAllocation.maxExecutors  30

spark.hadoop.hive.exec.compress.output  true
spark.hadoop.mapreduce.output.fileoutputformat.compress.codec  org.apache.hadoop.io.compress.SnappyCodec
spark.hadoop.hive.output.file.extension  .snappy.parquet
spark.hadoop.parquet.metadata.read.parallelism  8
spark.hadoop.parquet.compress  SNAPPY


spark.parquet.column.index.access true
spark.sql.parquet.mergeSchema  false
spark.submit.tasks.threshold.enabled  true
spark.submit.tasks.threshold  10000
spark.driver.extraLibraryPath   /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.executor.extraLibraryPath /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.eventLog.enabled          true
spark.eventLog.dir              hdfs://tdhdfs/tmp/spark
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.updatejar.enabled         true
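
Unlike the cluster configuration above, spark.conf follows the spark-defaults.conf convention of separating key and value with whitespace. Once a job submitted with this configuration is running, most of these values can be read back from the live Spark session; a minimal PySpark sketch (the app name conf-check is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("conf-check").getOrCreate()

# Read back a few of the settings supplied through spark.conf
for key in ("spark.sql.shuffle.partitions",
            "spark.serializer",
            "spark.dynamicAllocation.enabled"):
    print(key, "=", spark.conf.get(key))

Settings that only take effect at launch time, such as spark.driver.memory and spark.executor.instances, must be set in spark.conf before submission and cannot be changed from inside a running session.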

3.4.8.5 The Various xxx-site.xml Configuration Files

Download these from the CDH Hive management console, then paste them in one at a time:
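
For a CDH cluster these are typically the Hadoop/Hive client configuration files: core-site.xml, hdfs-site.xml, hive-site.xml, and yarn-site.xml. As an illustration only, a hive-site.xml fragment has the following shape (the host reuses cdh173 from the samples above, and 9083 is merely the Hive metastore default port; both are assumptions rather than values from the platform):

<configuration>
  <!-- Address of the Hive metastore; host and port are placeholders -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://cdh173:9083</value>
  </property>
</configuration>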

3.4.8.6 kerberos.conf Configuration

If Kerberos is enabled, copy the contents of krb5.conf here:
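
For reference, a minimal krb5.conf sketch is shown below. The realm HADOOP.COM matches the commented-out principal in 3.4.8.3; the KDC host is a placeholder, not an actual value from the platform:

[libdefaults]
  default_realm = HADOOP.COM
  dns_lookup_kdc = false

[realms]
  HADOOP.COM = {
    kdc = kdc.hadoop.com
    admin_server = kdc.hadoop.com
  }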
