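I've just started working with GraphX and was given a test dataset that resembles user-follow relationships. There are 10,000 user IDs, corresponding to 10,000 vertices. The number of follow relationships is unknown. The data format is as follows:

The graph is built with the following code:

    import org.apache.spark.graphx.Edge
    import org.apache.spark.graphx.Graph

    // Raw text files: one vertex or one edge per line, comma-separated
    val vertexRdd = sc.textFile("hdfs://ubt1:9820/WBNW/Vertex")
    val edgeRdd = sc.textFile("hdfs://ubt1:9820/WBNW/Edge")

    // Vertex line: id,attribute
    val users = vertexRdd.map(line => line.split(",")).map(parts => (parts(0).toLong, parts(1)))
    // Edge line: srcId,dstId,attribute
    val follow_relation = edgeRdd.map(line => line.split(",")).map(parts => new Edge(parts(0).toLong, parts(1).toLong, parts(2).toLong))

    val graph = Graph(users, follow_relation)

    // Counts from the raw RDDs versus counts from the graph
    val v_count = vertexRdd.count
    val e_count = edgeRdd.count
    val gv_count = graph.vertices.count
    val ge_count = graph.edges.count

The output is as follows:

The number of edges is the same in the RDD and in the Graph, but the number of vertices is clearly different. What could be the reason? Thanks, everyone.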
Why does the number of vertices increase when GraphX builds a graph? Please advise!
炎炎设计
2018-08-22 10:09:49
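
One possible cause of the vertex-count mismatch asked about above, offered as a guess since the actual data and output are not shown: `Graph(vertices, edges)` also creates a vertex for every ID that appears in the edge list but not in the vertex list, and gives it the default attribute. The sketch below reproduces that with made-up toy data and lists the IDs that exist only in the edges. The toy values, the names `declaredIds`, `edgeIds` and `missing`, and the `"MISSING"` marker are all assumptions for illustration, not taken from the original dataset.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuses the running context in spark-shell, or starts a local one otherwise
val conf = new SparkConf().setAppName("vertex-count-check").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical toy data: vertex 4 appears only in the edge list
val users = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val follows = sc.parallelize(Seq(Edge(1L, 2L, 1L), Edge(2L, 4L, 1L)))

// Vertices referenced by edges but absent from `users` get the default attribute
val graph = Graph(users, follows, defaultVertexAttr = "MISSING")

// IDs declared in the vertex data versus IDs referenced by any edge
val declaredIds = users.keys.distinct()
val edgeIds = follows.flatMap(e => Seq(e.srcId, e.dstId)).distinct()
val missing = edgeIds.subtract(declaredIds)

println(s"declared vertices: ${declaredIds.count()}")
println(s"graph vertices:    ${graph.vertices.count()}")
println(s"only in edges:     ${missing.collect().sorted.mkString(", ")}")
```

Applied to the real data, the same `subtract` check on `users` and `follow_relation` would show whether the edge file references IDs that the vertex file does not contain. Note that `vertexRdd.count` counts lines, so duplicate IDs in the vertex file would push the numbers the other way, because the graph keeps one vertex per ID.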