
How to split a Vector into columns - using PySpark

Python

瀟湘沐 2019-07-25 09:53:04

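Context: I have a DataFrame with 2 columns: word and vector, where the type of the "vector" column is VectorUDT. An example:

word | vector
assert | [435,323,324,212...]

I would like to get this:

word | v1 | v2 | v3 | v4 | v5 | v6 ......
assert | 435 | 5435| 698| 356|....

Question: how can I use PySpark to split a column containing vectors into multiple columns, one per dimension? Thanks in advance.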


2 Answers

鴻蒙傳說


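from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StructType

def splitVector(df, new_features=['f1', 'f2']):
    # new_features must have one name per dimension of the vector column,
    # which is assumed here to be named "features"
    spark = SparkSession.builder.getOrCreate()
    # Copy the field list so that add() does not mutate df.schema itself
    schema = StructType(list(df.schema.fields))
    cols = df.columns
    for col in new_features:
        schema = schema.add(col, DoubleType(), True)
    # Keep the original values and append each vector element as a float
    return spark.createDataFrame(
        df.rdd.map(lambda row: [row[i] for i in cols] + row.features.toArray().tolist()),
        schema)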

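This function converts the feature vector column into separate columns, one per vector dimension. The original vector column is kept in the result.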
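For comparison, here is a minimal sketch of the same split done without going through the RDD, assuming Spark 3.0 or later, where `pyspark.ml.functions.vector_to_array` is available. The helper names `dimension_names` and `split_vector_column`, the default column name `"vector"`, and the `v1, v2, ...` naming are illustrative choices taken from the question's example, not part of the answer above. It also assumes every row holds a vector of the same length and that the DataFrame is not empty.

```python
def dimension_names(dim, prefix="v"):
    """Return column names v1..v<dim>, matching the layout asked for in the question."""
    return [f"{prefix}{i}" for i in range(1, dim + 1)]


def split_vector_column(df, vector_col="vector", dim=None, prefix="v"):
    """Replace vector_col with one double column per dimension (hypothetical helper)."""
    # Imported lazily so dimension_names stays usable without Spark installed.
    from pyspark.ml.functions import vector_to_array  # requires Spark 3.0+
    from pyspark.sql import functions as F

    if dim is None:
        # Assumption: all vectors share one length, so the first row is representative.
        dim = len(df.select(vector_col).first()[0])

    # Convert the VectorUDT column to array<double>, then pick elements by index.
    arr = vector_to_array(F.col(vector_col))
    others = [F.col(c) for c in df.columns if c != vector_col]
    dims = [arr[i].alias(name) for i, name in enumerate(dimension_names(dim, prefix))]
    return df.select(*others, *dims)
```

With the DataFrame from the question, `split_vector_column(df, "vector")` would return the `word` column followed by `v1`, `v2`, and so on. Unlike the function above, this sketch drops the original vector column; add it back to the `select` if it is still needed.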

Replied 2019-07-25