1 Answer

Since the question does not include sample data, I assume the wordCounts variable was prepared with the following code.
from pyspark.context import SparkContext
sc = SparkContext('local', 'test')
pairs = sc.parallelize([("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1), ("d", 1), ("e", 1), ("a", 1), ("f", 1), ("c", 1)])
wordCounts = pairs.reduceByKey(lambda x, y: x + y)
You can print the values in wordCounts in any of the following ways:
print(wordCounts.collect()[:5])  # Collect all pairs to the driver, then keep the first 5
print(wordCounts.take(5))  # Fetch only 5 pairs
print(sorted(wordCounts.collect())[:5])  # Sort the tuples, then keep the first 5
print(sorted(wordCounts.collect(), key=lambda x: x[1], reverse=False)[:5])  # Sort by the second entry (the count) in ascending order, then keep the first 5
This produces:
[('a', 2), ('b', 5), ('d', 1), ('e', 1), ('f', 1)]
[('a', 2), ('b', 5), ('d', 1), ('e', 1), ('f', 1)]
[('a', 2), ('b', 5), ('c', 1), ('d', 1), ('e', 1)]
[('d', 1), ('e', 1), ('f', 1), ('c', 1), ('a', 2)]
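Note that in the last line of output the tied counts come out as d, e, f, c rather than alphabetically. Python's sorted() is stable, so pairs with equal counts keep whatever order collect() returned, and Spark does not guarantee that order. A minimal sketch of making the result deterministic, using a plain Python list as a stand-in for wordCounts.collect() (the list order is an assumption inferred from the output above, and the variable names are illustrative):

```python
# Stand-in for wordCounts.collect(); the order is assumed from the output above.
collected = [("a", 2), ("b", 5), ("d", 1), ("e", 1), ("f", 1), ("c", 1)]

# Sorting by count alone is stable, so ties keep the collect() order.
by_count = sorted(collected, key=lambda kv: kv[1])[:5]

# Adding the word as a secondary key breaks ties alphabetically.
by_count_then_word = sorted(collected, key=lambda kv: (kv[1], kv[0]))[:5]

print(by_count)
print(by_count_then_word)
```

On a large RDD, the same compound key could probably be passed to wordCounts.takeOrdered(5, key=lambda kv: (kv[1], kv[0])), which returns only the requested pairs instead of collecting everything to the driver first.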
I strongly recommend providing a minimal reproducible example next time.