Adding a new column to a DataFrame with a literal value of type set

Java

動漫人物 2021-12-10 16:51:34

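    Map<File, Dataset<Row>> allWords = ...

    StructField[] structFields = new StructField[] {
        new StructField("word", DataTypes.StringType, false, Metadata.empty()),
        new StructField("count", DataTypes.IntegerType, false, Metadata.empty()),
        new StructField("files", ???, false, Metadata.empty())
    };
    StructType structType = new StructType(structFields);
    Dataset<Row> allFilesWords = spark.createDataFrame(new ArrayList<>(), structType);

    for (Map.Entry<File, Dataset<Row>> entry : allWords.entrySet()) {
        Integer fileIndex = files.indexOf(entry.getKey());
        // Datasets are immutable, so the result of unionAll has to be kept.
        allFilesWords = allFilesWords.unionAll(
            entry.getValue().withColumn("files", ???)
        );
    }

In the code above, allWords maps each file to its word counts (Row: (string, integer)). I want to aggregate the results for all files into a single DataFrame while keeping track of the original files each word was mentioned in. Since a word may end up being mentioned in several files, the files column is designed as a set of integers (assuming files are mapped to integers).

I am trying to add a new column to each of the allWords DataFrames and then merge them all with unionAll. But I don't know how to define and initialize the new column (named files here) with a set that holds only one item, fileIndex.

Thanks to the link provided in the comments, I know I should use functions.typedLit, but this function asks for a second parameter and I don't know what to pass for it. I also don't know how to define the column in the schema. One last thing: the linked example is in Python, and I am looking for the Java API.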
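To make the target shape concrete, here is a minimal plain-Java sketch (no Spark involved) of the aggregation the question describes: each word ends up paired with the set of file indices it appears in. The class name WordFileIndex, the method aggregate, and the sample data are hypothetical and only serve as an illustration; the per-file counts are ignored here because the question is about the files column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class WordFileIndex {
    // perFileWords.get(i) holds the word counts of the file mapped to index i.
    // Returns, for every word, the sorted set of file indices that mention it.
    public static Map<String, Set<Integer>> aggregate(List<Map<String, Integer>> perFileWords) {
        Map<String, Set<Integer>> filesByWord = new TreeMap<>();
        for (int fileIndex = 0; fileIndex < perFileWords.size(); fileIndex++) {
            for (String word : perFileWords.get(fileIndex).keySet()) {
                // Start with an empty set the first time a word is seen, then add this file.
                filesByWord.computeIfAbsent(word, w -> new TreeSet<>()).add(fileIndex);
            }
        }
        return filesByWord;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two files, "spark" occurs in both.
        Map<String, Integer> file0 = new HashMap<>();
        file0.put("spark", 3);
        file0.put("java", 1);
        Map<String, Integer> file1 = new HashMap<>();
        file1.put("spark", 2);

        List<Map<String, Integer>> perFileWords = new ArrayList<>();
        perFileWords.add(file0);
        perFileWords.add(file1);

        System.out.println(aggregate(perFileWords)); // prints "{java=[0], spark=[0, 1]}"
    }
}
```

In the Spark version, the one-element set per file corresponds to the literal that each per-file DataFrame receives in its files column before the union.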

1 Answer

慕俠2389804

I found the solution myself (with some help):

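    import java.io.File;
    import java.util.ArrayList;
    import java.util.Map;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.functions;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.Metadata;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    Map<File, Dataset<Row>> allWords = ...

    StructField[] structFields = new StructField[] {
        new StructField("word", DataTypes.StringType, false, Metadata.empty()),
        new StructField("count", DataTypes.IntegerType, false, Metadata.empty()),
        // The "set" of file indices is stored as a nullable array of integers.
        new StructField("files", DataTypes.createArrayType(DataTypes.IntegerType), true, Metadata.empty())
    };
    StructType structType = new StructType(structFields);
    Dataset<Row> allFilesWords = spark.createDataFrame(new ArrayList<>(), structType);

    for (Map.Entry<File, Dataset<Row>> entry : allWords.entrySet()) {
        Integer fileIndex = files.indexOf(entry.getKey());
        // seq must be a Scala Seq<Integer> holding only fileIndex; the post does not show how it is built.
        // Datasets are immutable, so the result of unionAll has to be kept.
        allFilesWords = allFilesWords.unionAll(
            entry.getValue()
                .withColumn("files", functions.typedLit(seq, MyTypeTags.SeqInteger()))
        );
    }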

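The catch is that TypeTag is a compile-time artifact from Scala. Based on what I learned in another question, it has to be generated by the Scala compiler, and there is no way to generate one in Java. So I had to define the TypeTag for my custom data structure in a Scala file and add that file to my Maven Java project. To do that, I followed this article.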

Here is my MyTypeTags.scala file:

    import scala.reflect.runtime.universe._

    object MyTypeTags {
      // TypeTag for a Scala Seq of java.lang.Integer, passed from Java as MyTypeTags.SeqInteger()
      val SeqInteger = typeTag[Seq[Integer]]
    }
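As background for why typedLit asks for a TypeTag at all, the following plain-Java sketch shows type erasure: at run time a List<Integer> and a List<String> share the same class, so the element type cannot be recovered from the value alone. A Scala TypeTag carries that static type information (here Seq[Integer]), which is presumably what typedLit relies on to choose the column type. The class ErasureDemo and its method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Returns true when both lists have the same runtime class,
    // which means their element types are not visible at run time.
    public static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        System.out.println(sameRuntimeClass(ints, strings)); // prints "true"
        System.out.println(ints.getClass().getName());       // prints "java.util.ArrayList"
    }
}
```

Java has no compiler-generated equivalent of a TypeTag, which is consistent with the answer's workaround of defining the tag once in Scala and referencing it from Java.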

Answered 2021-12-10
