
夜流歌 · Updated 2020-09-28


Loading CSV Data with TensorFlow

In machine learning tasks, CSV is the dataset format we use most often. We therefore need to understand CSV files and also learn how to use CSV data in TensorFlow.

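1. Understanding the CSV Data Format

A CSV file, short for "comma-separated values" file, uses the extension ".csv" and stores a table. Unlike an Excel file, it represents the table as plain text, with cells separated by commas, which is where the name comes from.

A CSV file can be roughly divided into two parts:

the column-name part;
the data part.

Take the following CSV data as an example (shown as a table, with the row index in the first column):

      a   b   c           # column-name part
0    1   'a'   89         # data part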
1    3   'f'   88
2    8   'g'   99

This CSV file contains three records. Each record has three fields, a, b, and c; a and c are integers, while b is a string.

In practice, a CSV file usually contains many redundant fields, so we select only the fields we need before moving on to the next step.
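As a minimal sketch of the two parts described above, the snippet below parses a small in-memory CSV with Python's standard csv module. The sample text and the choice to keep only columns a and c are assumptions made for illustration.

```python
import csv
import io

# Hypothetical CSV text: the first line holds column names, the rest hold data.
csv_text = "a,b,c\n1,a,89\n3,f,88\n8,g,99\n"

reader = csv.DictReader(io.StringIO(csv_text))
print(reader.fieldnames)  # the column-name part

# Keep only the fields we need and convert them from text to int.
wanted = ["a", "c"]
rows = [{name: int(row[name]) for name in wanted} for row in reader]
print(rows)
```

Every value in a CSV file is read as text, which is why the sketch converts the selected fields explicitly.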

2. How to Use CSV Data in TensorFlow

To train on CSV data in TensorFlow, we go through roughly three steps:

obtain the CSV data file;
build a TensorFlow dataset from the CSV file;
process the dataset further so that it matches the model's input requirements.

2.1 Obtaining the CSV Data File

In this step we first obtain the data file; how to do so varies from case to case.

To fetch a CSV file from the network with TensorFlow's own API, we can use the following call:

file_path = tf.keras.utils.get_file("data.csv", DATA_URL)

The first argument is the name under which the downloaded file is saved, and the second argument, DATA_URL, is the URL of the CSV file. The function returns the local path of the saved file.

2.2 Building a tf.data Dataset from the CSV File

In this step we use a TensorFlow API function to convert the CSV data into a form TensorFlow can work with. Specifically, we can use the following API:

dataset = tf.data.experimental.make_csv_dataset(
      file_path, batch_size,
      label_name=label_name, na_value=na_value, num_epochs=num_epochs
    )

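We need to understand several of this API's parameters:

file_path: the path of the CSV data file;
batch_size: the number of records in each batch of the dataset;
label_name: the column we want to predict;
na_value: an additional string to be recognized as a missing (NA) value;
num_epochs: the number of times the dataset is repeated. It is usually set to 1, because we only need to read the data once; if it is left at its default of None, the dataset repeats indefinitely.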
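To make these parameters concrete, here is a small sketch that writes a tiny CSV file to a temporary directory and reads it back with make_csv_dataset. The file contents, the batch size of 2, and the use of shuffle=False (so that the batches are easy to inspect) are assumptions made for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny hypothetical CSV file so the example needs no download.
csv_path = os.path.join(tempfile.mkdtemp(), "tiny.csv")
with open(csv_path, "w") as f:
    f.write("a,b,c\n1,x,89\n3,f,88\n8,g,99\n")

dataset = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    label_name="c",   # the column to predict
    num_epochs=1,     # read the file once
    shuffle=False,    # keep file order so the batches are easy to inspect
)

# Each element is a (features, labels) pair; features maps column name -> tensor.
batches = list(dataset)
features, labels = batches[0]
print(sorted(features.keys()))
print(features["a"].numpy(), labels.numpy())
```

Note that the label column is removed from the feature dictionary and returned separately, which is the shape Keras expects when the dataset is passed to model.fit.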

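2.3 Processing the Dataset to Match the Model's Input Requirements

In this step we preprocess the data. Common preprocessing includes:

filtering out data we do not need;
specifying the scale of the data;
specifying which fields are categorical and which are continuous;
encoding the categorical values (for example, as one-hot vectors).

We will practice these preprocessing methods in the example that follows.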
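Before turning to the TensorFlow version, the following plain-Python sketch illustrates the first two kinds of preprocessing on a few made-up records. The records, the mean-fill strategy for missing values, and the reading of "scale" as min-max rescaling are all assumptions for illustration, not something prescribed by the lesson.

```python
# Hypothetical rows, as they might look after reading a CSV file.
rows = [
    {"name": "A", "age": 22.0, "fare": 7.25},
    {"name": "B", "age": None, "fare": 71.28},
    {"name": "C", "age": 26.0, "fare": 7.92},
]

# 1. Drop a column the model does not need.
rows = [{k: v for k, v in row.items() if k != "name"} for row in rows]

# 2. Fill missing ages with the mean of the known ages.
known = [row["age"] for row in rows if row["age"] is not None]
mean_age = sum(known) / len(known)
for row in rows:
    if row["age"] is None:
        row["age"] = mean_age

# 3. Rescale fares to the range [0, 1] (min-max scaling).
fares = [row["fare"] for row in rows]
lo, hi = min(fares), max(fares)
for row in rows:
    row["fare_scaled"] = (row["fare"] - lo) / (hi - lo)

print(rows)
```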

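3. CSV Data Processing Example

This example uses a public dataset: the Titanic survival dataset. It is stored in CSV format, and TensorFlow hosts a copy, so it is easy to obtain.

The dataset contains basic information about each passenger, such as name, sex, and age, together with whether that passenger survived. The model we train predicts from this basic information whether a passenger survived.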

import tensorflow as tf

# Download the dataset files
train_path = tf.keras.utils.get_file("train.csv",
            "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
valid_path = tf.keras.utils.get_file("eval.csv",
            "https://storage.googleapis.com/tf-datasets/titanic/eval.csv")

# Build datasets from the CSV files
train_data = tf.data.experimental.make_csv_dataset(train_path, batch_size=32,
        label_name='survived', na_value="?", num_epochs=1,
        ignore_errors=True )
valid_data = tf.data.experimental.make_csv_dataset(valid_path, batch_size=32,
        label_name='survived', na_value="?", num_epochs=1,
        ignore_errors=True )

# Define the categorical columns and their vocabularies
# (tf.feature_column is deprecated in recent TensorFlow releases; this
# example assumes a version that still provides it)
categories = {'sex': ['male', 'female'],
    'class' : ['First', 'Second', 'Third'],
    'deck' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'embark_town' : ['Cherbourg', 'Southampton', 'Queenstown'],
    'alone' : ['y', 'n']}
categorical_columns = []
for feature, vocab in categories.items():
    cat_col = tf.feature_column.categorical_column_with_vocabulary_list(
        key=feature, vocabulary_list=vocab)
    # indicator_column turns each category into a one-hot vector
    categorical_columns.append(tf.feature_column.indicator_column(cat_col))

# Define the continuous (numeric) columns
numerical_names = ['age',
    'n_siblings_spouses',
    'parch',
    'fare']
numerical_columns = []
for feature in numerical_names:
    num_col = tf.feature_column.numeric_column(feature)
    numerical_columns.append(num_col)

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures(categorical_columns + numerical_columns),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])

# Train the model
model.fit(train_data, epochs=20)

# Evaluate the model
model.evaluate(valid_data)

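In this example we first download the CSV data and build the datasets. We then take a crucial step: preprocessing the categorical and continuous data.

For the categorical data, we take the following steps:

define the categorical columns and their possible values (which we call the vocabulary);
use tf.feature_column.categorical_column_with_vocabulary_list to build categorical columns that TensorFlow recognizes, and wrap each one in tf.feature_column.indicator_column;
collect all categorical columns in one list for later use.

For the continuous data, we take similar steps:

define the continuous columns;
use tf.feature_column.numeric_column to build continuous columns that TensorFlow recognizes;
collect all continuous columns in one list for later use.

Finally, as the first layer of the model we add a feature preprocessing layer (DenseFeatures). Because some input features are continuous and others are categorical, this layer is needed to turn them into a single numeric input.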
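To give an intuition for what this preprocessing produces, here is a plain-Python sketch of the idea: each categorical value becomes a one-hot vector over its vocabulary, and the numeric values are appended unchanged. This is an illustration only, not TensorFlow's implementation; the helper one_hot, the shortened vocabularies, the sample passenger, and the column ordering are hypothetical.

```python
def one_hot(value, vocab):
    """Return a one-hot list for value; all zeros if value is not in vocab."""
    return [1.0 if value == item else 0.0 for item in vocab]

# Hypothetical vocabularies and one hypothetical passenger record.
vocabularies = {"sex": ["male", "female"], "class": ["First", "Second", "Third"]}
numeric_names = ["age", "fare"]
passenger = {"sex": "female", "class": "Third", "age": 26.0, "fare": 7.92}

# Encode each categorical field, then append the numeric fields unchanged.
vector = []
for name, vocab in vocabularies.items():
    vector.extend(one_hot(passenger[name], vocab))
vector.extend(passenger[name] for name in numeric_names)
print(vector)
```

In this sketch a value that is missing from the vocabulary is encoded as all zeros, so a misspelled vocabulary entry would silently discard that category's information.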

The output we get is:

Epoch 1/20
20/20 [==============================] - 0s 2ms/step - loss: 0.7948 - accuracy: 0.6459
......
Epoch 19/20
20/20 [==============================] - 0s 2ms/step - loss: 0.4619 - accuracy: 0.8022
Epoch 20/20
20/20 [==============================] - 0s 2ms/step - loss: 0.4661 - accuracy: 0.7990
9/9 [==============================] - 0s 2ms/step - loss: 0.4571 - accuracy: 0.7841
[0.45707935094833374, 0.7840909361839294]

As we can see, the model reaches an accuracy of 78.4% on the evaluation set.
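The accuracy figure is the fraction of passengers whose predicted class matches the true label. As a rough sketch with made-up numbers, assuming the sigmoid output is thresholded at 0.5 (the default for binary accuracy in Keras), it can be computed like this:

```python
# Hypothetical sigmoid outputs and true labels for five passengers.
probabilities = [0.91, 0.20, 0.65, 0.40, 0.08]
labels = [1, 0, 0, 1, 0]

# Threshold each probability at 0.5, then count matches with the labels.
predictions = [1 if p > 0.5 else 0 for p in probabilities]
correct = sum(int(p == y) for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)
```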

4. Summary

In this lesson we learned about the CSV file format and how to process such data in TensorFlow. We also practiced on the Titanic dataset and obtained a reasonable result.
