夜流歌 · Updated 2020-10-16

Using Text Data in TensorFlow

In earlier lessons we learned how to do text classification, but in every case we used TensorFlow's built-in APIs to obtain a ready-made Dataset; we never actually loaded a dataset from a text file.

In real machine learning tasks, TensorFlow will not supply every dataset we need, and most data has to be loaded by ourselves. For text, the most common format is the plain txt file, so in this lesson we learn how to load and use text data from a text file.

Using text data breaks down into roughly two steps:

Load the text with tf.data.TextLineDataset;
Encode the text as integers with an encoder.

1. Loading text data with tf.data.TextLineDataset

The most common way to load text data in TensorFlow is the built-in tf.data.TextLineDataset class.

With this API, loading a text file takes only a line or two of code.

Here we use a txt file hosted by Google as an example; you can substitute your own txt file to test.

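import tensorflow as tf

# Download the sample text file (cached locally after the first call)
txt_path = tf.keras.utils.get_file('derby.txt', origin='https://storage.googleapis.com/download.tensorflow.org/data/illiad/derby.txt')

# One element per line of the file; attach 0 as a placeholder label
dataset = tf.data.TextLineDataset(txt_path).map(lambda x: (x, 0))
# shuffle() and batch() return a new dataset, so keep the result under its own name;
# `dataset` itself stays unshuffled and unbatched for the steps below
batched_dataset = dataset.shuffle(1000).batch(32)

print(dataset)
for data in dataset.take(4):
  print(data)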

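A few points to note here:

First, we load the txt file with tf.data.TextLineDataset, which yields a tf.data.Dataset whose elements are the individual lines of the file;
Next, we map every element to a (text, label) pair, because a dataset used for training needs labels and our txt file has none, so we use 0 as a placeholder label;
Then we call shuffle to randomize the order and batch to group the data into batches of 32; both methods return a new dataset rather than modifying dataset in place, so the dataset we print and iterate over is still in file order and unbatched;
Finally, we look at the first four elements.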
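The points above can be tried without a network connection. The following is a minimal sketch, assuming a small three-line file written to a temporary directory; the file name sample.txt and its contents are made up for illustration. It shows that each line becomes one element and that batch returns a new dataset while leaving the original untouched.

```python
import os
import tempfile

import tensorflow as tf

# Write a small three-line text file (hypothetical sample data).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\nthird line\n")

# Each line of the file becomes one string element; attach a placeholder label 0.
lines = tf.data.TextLineDataset(sample_path).map(lambda x: (x, 0))

# batch() returns a new dataset; `lines` itself is unchanged.
batched = lines.batch(2)

# Collect the decoded lines and the size of each batch.
line_texts = [text.numpy().decode("utf-8") for text, _ in lines]
batch_sizes = [int(text.shape[0]) for text, _ in batched]

print(line_texts)
print(batch_sizes)
```

Three lines grouped in batches of two give one full batch and one partial batch, while iterating over lines still yields the three individual strings.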

This gives the following result:

<MapDataset shapes: ((), ()), types: (tf.string, tf.int32)>
(<tf.Tensor: shape=(), dtype=string, numpy=b"\xef\xbb\xbfOf Peleus' son, Achilles, sing, O Muse,">, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'The vengeance, deep and deadly; whence to Greece'>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'Unnumbered ills arose; which many a soul'>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'Of mighty warriors to the viewless shades'>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)

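As we can see, the dataset has been created successfully, but the text has not yet been encoded as numbers, so it cannot be fed directly to a machine learning model.

2. Encoding the text data

We can tokenize the text with the tensorflow_datasets.features.text.Tokenizer object, which splits each sentence it receives into tokens. We can then build an encoder from the resulting vocabulary with the tensorflow_datasets.features.text.TokenTextEncoder class. In newer releases of tensorflow_datasets the same classes are exposed under tfds.deprecated.text.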
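To make the idea of tokenizing and encoding concrete, here is a library-free sketch in plain Python. It is a simplified stand-in and not the tensorflow_datasets implementation: the regular-expression tokenizer, the sorted vocabulary and the choice to start ids at 1 (leaving 0 free for padding) are all assumptions made for illustration. The two sentences are the first two lines of the sample file shown earlier.

```python
import re

sentences = [
    "Of Peleus' son, Achilles, sing, O Muse,",
    "The vengeance, deep and deadly; whence to Greece",
]

def tokenize(text):
    # Split on runs of non-word characters and drop empty pieces.
    return [tok for tok in re.split(r"\W+", text) if tok]

# Vocabulary: every distinct token, in a fixed (sorted) order.
vocab = sorted({tok for s in sentences for tok in tokenize(s)})

# Assign ids starting at 1 so that 0 stays free for padding (an assumption of this sketch).
token_to_id = {tok: i + 1 for i, tok in enumerate(vocab)}

def encode(text):
    # Replace each token with its integer id.
    return [token_to_id[tok] for tok in tokenize(text)]

encoded = [encode(s) for s in sentences]

print(len(vocab))
print(encoded)
```

Each sentence becomes a list of integers whose length equals its number of tokens, which is the same shape of result the tfds encoder produces.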

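import tensorflow_datasets as tfds

# Splits a string into alphanumeric tokens
# (newer tensorflow_datasets releases expose this as tfds.deprecated.text.Tokenizer)
tokenizer = tfds.features.text.Tokenizer()

# Collect every distinct token in the dataset
vocab = set()
for text, _ in dataset:
  tokens = tokenizer.tokenize(text.numpy())
  vocab.update(tokens)

# Vocabulary size
print(len(vocab))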

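Running this prints the vocabulary size, that is, the number of distinct tokens in the file.

We can then carry out the encoding (the mapping approach below follows the official TensorFlow documentation):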

# Build the encoder from the vocabulary
encoder = tfds.features.text.TokenTextEncoder(vocab)

def encode(text, label):
  # Runs eagerly, so text.numpy() is available here
  encoded_text = encoder.encode(text.numpy())
  return encoded_text, label

# Wrap encode with tf.py_function so it can be used inside Dataset.map
def encode_map_fn(text, label):
  encoded_text, label = tf.py_function(encode, inp=[text, label], Tout=(tf.int32, tf.int32))

  # tf.py_function loses static shape information, so set the shapes by hand
  encoded_text.set_shape([None])
  label.set_shape([])

  return encoded_text, label

# Encode every element of the dataset
encoded_data_set = dataset.map(encode_map_fn)
print(encoded_data_set)
for data in encoded_data_set.take(4):
  print(data)

Here we carried out the following steps:

First, we built the encoder with the tfds.features.text.TokenTextEncoder class;
Then we mapped every element of the dataset through it;
Inside the mapping we call tf.py_function, because the function passed to map is traced as a graph, where tensors are symbolic and calling Tensor.numpy() raises an error; tf.py_function runs the wrapped Python function eagerly, so numpy() is available there;
Finally, because tf.py_function does not set the shape of its outputs, we set the shapes by hand.
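The tf.py_function pattern described above can be seen in isolation with a smaller example. This is a minimal sketch: the two-element dataset and the helper names count_words and count_words_map_fn are hypothetical, chosen only to show a Python function that needs .numpy() being wrapped for use in map.

```python
import tensorflow as tf

# A tiny in-memory dataset of strings (hypothetical sample data).
ds = tf.data.Dataset.from_tensor_slices(["a b", "c d e"])

def count_words(text):
    # Runs eagerly inside tf.py_function, so .numpy() is available.
    return len(text.numpy().decode("utf-8").split())

def count_words_map_fn(text):
    # Wrap the Python function so it can be called from the traced map function.
    n = tf.py_function(count_words, inp=[text], Tout=tf.int32)
    # py_function does not record the output shape; declare it as a scalar.
    n.set_shape([])
    return n

counts = [int(n) for n in ds.map(count_words_map_fn)]
print(counts)
```

The trade-off is that the wrapped function runs as ordinary Python, so it does not benefit from graph optimizations; it is best kept for steps that have no native TensorFlow equivalent.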

This gives the output:

<MapDataset shapes: ((None,), ()), types: (tf.int32, tf.int32)>
(<tf.Tensor: shape=(7,), dtype=int32, numpy=array([7755, 4839, 4383, 5722, 4996, 2065, 8059], dtype=int32)>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
(<tf.Tensor: shape=(8,), dtype=int32, numpy=array([ 855, 5184,  700, 8356, 5931, 5665, 4634, 7127], dtype=int32)>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
(<tf.Tensor: shape=(7,), dtype=int32, numpy=array([1620, 6817, 5649, 5461, 5505,  209, 3146], dtype=int32)>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
(<tf.Tensor: shape=(7,), dtype=int32, numpy=array([7755, 1810, 3656, 4634, 4920, 1136, 6789], dtype=int32)>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)

As we can see, the dataset has now been encoded successfully and can be used to train a model.
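One practical detail before training: the encoded sentences have different lengths (7 and 8 ids in the output above), so they cannot be stacked with a plain batch. The sketch below shows one possible way to handle this with padded_batch, which pads each batch to its longest sequence with 0. The ids are copied from the first two rows of the output above; building the dataset with from_generator and output_signature is an assumption of this sketch and needs a reasonably recent TensorFlow 2.x release.

```python
import tensorflow as tf

# Encoded ids copied from the first two rows of the output above.
encoded = [
    [7755, 4839, 4383, 5722, 4996, 2065, 8059],
    [855, 5184, 700, 8356, 5931, 5665, 4634, 7127],
]
labels = [0, 0]

def gen():
    # Yield (ids, label) pairs of varying length.
    for ids, label in zip(encoded, labels):
        yield ids, label

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Pad every sequence in a batch to the length of the longest one, using 0.
padded = ds.padded_batch(2)
batch_ids, batch_labels = next(iter(padded))

print(batch_ids.shape)
print(batch_ids[0].numpy())
```

The shorter sentence gains a trailing 0, so both rows have length 8 and the batch has a regular shape that a model can consume.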

3. Summary

In this lesson we learned how to use text data in TensorFlow. In most learning tasks we have to load text data ourselves: we load the text with tf.data.TextLineDataset, and then tokenize and encode it with tensorflow_datasets.features.text.Tokenizer and TokenTextEncoder.
