As far as I know, multiple columns in a Spark DataFrame can have the same name, as shown in the DataFrame snapshot below:

[Row(a=107831, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}), a=107831, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0})),
 Row(a=107831, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}), a=125231, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0047, 3: 0.0, 4: 0.0043})),
 Row(a=107831, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}), a=145831, f=SparseVector(5, {0: 0.0, 1: 0.2356, 2: 0.0036, 3: 0.0, 4: 0.4132})),
 Row(a=107831, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}), a=147031, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0})),
 Row(a=107831, f=SparseVector(5, {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}), a=149231, f=SparseVector(5, {0: 0.0, 1: 0.0032, 2: 0.2451, 3: 0.0, 4: 0.0042}))]

The result above was created by joining a DataFrame with itself, and you can see there are 4 columns: two named a and two named f. The problem is that when I try to do further computation with the a column, I can't find a way to select it. I tried both df[0] and df.select('a'), and both returned the error message below:

AnalysisException: Reference 'a' is ambiguous, could be: a#1333L, a#1335L.

Is there any way in the Spark API to tell columns with duplicated names apart again? Or some way to let me change the column names?
Spark Dataframe區(qū)分名稱重復(fù)的列
慕田峪4524236
2019-11-28 14:07:46