Spark RDD keyBy

- 1 min

keyBy

对RDD item执行函数操作,生成KV pairs,返回KV对的RDD。对item执行函数的返回值为key,item本身的value为value。

函数原型:

def keyByK: RDD[(K, T)]

例子:


val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
b.collect
res26: Array[(Int, String)] = Array((3,dog), (6,salmon), (6,salmon), (3,rat), (8,elephant))
comments powered by Disqus
rss facebook twitter github youtube mail spotify lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora