Spark RDD coalesce, repartition


coalesce, repartition

Both coalesce and repartition are used to redistribute the data of an RDD into a new set of partitions.

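coalesce redistributes an RDD's data into a specified number of partitions (numPartitions). By default it does not shuffle (shuffle = false), so it can only merge existing partitions; asking for more partitions than the RDD already has leaves the count unchanged. repartition(numPartitions) is simply a wrapper for coalesce(numPartitions, shuffle = true).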
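As a hedged illustration of these rules, the sketch below applies four variants to a 10-partition RDD and compares the resulting partition counts. It assumes spark-core is on the classpath and runs in local mode; the object and method names (CoalesceVsRepartition, partitionCounts) are hypothetical. The expected counts follow Spark's documented behavior: without a shuffle, coalesce can merge partitions but never split them.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only
object CoalesceVsRepartition {
  // Partition counts for four variants applied to the same 10-partition RDD
  def partitionCounts(sc: SparkContext): Seq[Int] = {
    val y = sc.parallelize(1 to 10, 10)

    val fewer = y.coalesce(2)                 // no shuffle: 10 -> 2
    val notMore = y.coalesce(20)              // no shuffle: cannot grow, stays at 10
    val more = y.coalesce(20, shuffle = true) // with shuffle: 10 -> 20
    val rep = y.repartition(20)               // equivalent to coalesce(20, shuffle = true)

    Seq(fewer, notMore, more, rep).map(_.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    // Local mode for experimenting; assumes spark-core is on the classpath
    val conf = new SparkConf().setAppName("coalesce-vs-repartition").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(partitionCounts(sc).mkString(", ")) // 2, 10, 20, 20
    } finally {
      sc.stop()
    }
  }
}
```

In practice this suggests using coalesce when only reducing the partition count (it avoids a shuffle), and repartition (or shuffle = true) when more partitions are needed.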

Function signatures

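def coalesce(numPartitions: Int, shuffle: Boolean = false): RDD[T]
def repartition(numPartitions: Int): RDD[T]

(Simplified: in the actual RDD API both methods also take an implicit Ordering[T] parameter, and newer Spark versions add an optional partitionCoalescer parameter to coalesce.)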

Example:


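// spark-shell session; sc (the SparkContext) is predefined
val y = sc.parallelize(1 to 10, 10)  // 10 partitions
val z = y.coalesce(2, false)         // merge into 2 partitions without a shuffle
z.partitions.length
res9: Int = 2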
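To see which elements ended up in each of the two partitions, glom() can turn every partition into an array before collecting. The sketch below makes the same assumptions as before (spark-core on the classpath, local mode) and uses hypothetical names (InspectCoalesce, partitionContents). The exact grouping may vary with the Spark version and data locality, so no specific output is claimed here.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only
object InspectCoalesce {
  // Contents of each partition after a shuffle-free coalesce, one Seq per partition
  def partitionContents(sc: SparkContext, numPartitions: Int): Seq[Seq[Int]] = {
    val y = sc.parallelize(1 to 10, 10)
    // glom() turns each partition into a single Array, so collect() exposes the grouping
    y.coalesce(numPartitions, shuffle = false).glom().collect().map(_.toSeq).toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("inspect-coalesce").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      partitionContents(sc, 2).zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```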