寫文章

首頁手記基于Tensorflow的LSTM-CNN文本分類模型

基于Tensorflow的LSTM-CNN文本分類模型

標簽：

深度學習

题记

前段时间再看QA方面的文章，读了一篇paper（《LSTM-based deep learning model for non-factoid answer selection》）中，使用了LSTM-CNN模型来做answer与question的语义抽取。受此启发，使用这个模型对文本语义信息进行抽取，加上一个softmax函数形成文本分类模型。

1.LSTM(Long Short-Term Memory)

LSTM在NLP中的应用实在太广泛了，在Machine Translation，Text Classification，QA等领域都有着成熟的应用，具体通过对RNN的结构进行改进，加入Memory Cell与三个门控单元，对历史信息进行有效的控制。而不是像RNN一样每次都将前一时刻的hidden state完全洗掉，从而增强了其处理长文本序列的能力，也解决了vanishing gradient的问题。

具体结构如图所示：

Input Gate决定当前时刻LSTM单元的Input vector对memory cell中信息的改变量，Forget Gate决定上一时刻历史信息对当前时刻memory cell中的信息的影响程度，Output Gate对memory cell中信息的输出量进行控制。

将Input Gate，Output Gate，Forget Gate表示为：，，，LSTM更新方法为：

当前时刻的cell state 为，为LSTM单元最终输出，使用sigmoid function 作为activation function，，，为LSTM的权重矩阵与偏置量。

2.CNN(Convolutional Neural Network)

CNN的结构类似Yoon Kim在《Convolutional neural networks for sentence classification》中提出的结构。

其中，卷积窗口的大小设置对最终的分类结果影响较大，借鉴N-gram语言模型的思想，通过提取相邻n个词进行局部特征的提取，从而捕捉上下文搭配词语的语义信息，对整个文本的语义进行表示。根据这种思想，将卷积窗口大小设置为n*m，n为窗口内词的个数，m为词向量维度。

同时使用多个卷积核生成feature maps，再进行max pooling操作，最后使用sotfmax进行分类。

3.LSTM-CNN Model

首先通过Embedding Layer将单词转化为词向量，再输入LSTM进行语义特征提取，由于原始语料处理时进行了padding的操作，所以在LSTM输出时乘以MASK矩阵来减小padding所带来的影响。下一步将LSTM的输出作为CNN的输入，进行进一步的特征提取。最后得到分类结果。

整个模型的结构如下：

4.代码

class LSTM_CNN_Model(object):



    def __init__(self,config,is_training=True):

        self.keep_prob=config.keep_prob
        self.batch_size = 64

        num_step=config.num_step
        self.input_data=tf.placeholder(tf.int32,[None,num_step])
        self.target = tf.placeholder(tf.int64,[None])
        self.mask_x = tf.placeholder(tf.float32,[num_step,None])

        class_num=config.class_num
        hidden_neural_size=config.hidden_neural_size
        vocabulary_size=config.vocabulary_size
        embed_dim=config.embed_dim
        hidden_layer_num=config.hidden_layer_num
 

        #build LSTM network

        lstm_cell = tf.contrib.rnn.BasicLSTMCell(hidden_neural_size,forget_bias=0.0,state_is_tuple=True)
        if self.keep_prob<1:
            lstm_cell =  tf.contrib.rnn.DropoutWrapper(
                lstm_cell,output_keep_prob=self.keep_prob
            )

        cell = tf.contrib.rnn.MultiRNNCell([lstm_cell]*hidden_layer_num,state_is_tuple=True)

        self._initial_state = cell.zero_state(self.batch_size,tf.float32)

        #embedding layer
        with tf.device("/cpu:0"),tf.name_scope("embedding_layer"):
            embedding = tf.get_variable("embedding",[vocabulary_size,embed_dim],dtype=tf.float32)
            inputs=tf.nn.embedding_lookup(embedding,self.input_data)

        if self.keep_prob<1:
            inputs = tf.nn.dropout(inputs,self.keep_prob)

        out_put=[]
        state=self._initial_state
        with tf.variable_scope("LSTM_layer"):
            for time_step in range(num_step):
                if time_step>0: tf.get_variable_scope().reuse_variables()
                (cell_output,state)=cell(inputs[:,time_step,:],state)
                out_put.append(cell_output)

        out_put=out_put*self.mask_x[:,:,None]

        with tf.name_scope("Conv_layer"):
            out_put = tf.transpose(out_put,[1,2,0])
            out_put = tf.reshape(out_put , [self.batch_size,hidden_neural_size,num_step,-1])

            W_conv = tf.get_variable(name="conv_w" , initializer=tf.truncated_normal(shape=[600,5,1,200],stddev=0.1))
            B_conv = tf.get_variable(name="conv_b", initializer=tf.constant(0.1,shape=[200]))

            conv_output = tf.nn.relu(tf.nn.conv2d(out_put , W_conv , strides=[1,1,1,1],padding='VALID') + B_conv)
            conv_output = tf.reshape(conv_output,[self.batch_size,36,200,1])
            max_pool_out = tf.nn.max_pool(conv_output,ksize=[1,36,1,1],strides=[1,1,1,1],padding='VALID')
            max_pool_out = tf.reshape(max_pool_out,[self.batch_size,200])


        with tf.name_scope("Softmax_layer_and_output"):
            softmax_w = tf.get_variable("softmax_w",[200,class_num],dtype=tf.float32)
            softmax_b = tf.get_variable("softmax_b",[class_num],dtype=tf.float32)
            self.logits = tf.matmul(max_pool_out,softmax_w)+softmax_b

        with tf.name_scope("loss"):
            self.loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.logits+1e-10,labels=self.target)
            self.cost = tf.reduce_mean(self.loss)

        with tf.name_scope("accuracy"):
            self.prediction = tf.argmax(self.logits,1)
            correct_prediction = tf.equal(self.prediction,self.target)
            self.correct_num=tf.reduce_sum(tf.cast(correct_prediction,tf.float32))
            self.accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32),name="accuracy")

        #add summary
        loss_summary = tf.summary.scalar("loss",self.cost)
        #add summary
        accuracy_summary=tf.summary.scalar("accuracy_summary",self.accuracy)

        if not is_training:
            return

        self.globle_step = tf.Variable(tf.constant(0),dtype=tf.int32,name="globle_step",trainable=False)
        self.lr = tf.Variable(tf.constant(0.8),dtype=tf.float32,trainable=False)

        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
                                      config.max_grad_norm)


        # Keep track of gradient values and sparsity (optional)
        grad_summaries = []
        for g, v in zip(grads, tvars):
            if g is not None:
                grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
                sparsity_summary = tf.summary.scalar("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
                grad_summaries.append(grad_hist_summary)
                grad_summaries.append(sparsity_summary)
        self.grad_summaries_merged = tf.summary.merge(grad_summaries)

        self.summary =tf.summary.merge([loss_summary,accuracy_summary,self.grad_summaries_merged])

        optimizer = tf.train.GradientDescentOptimizer(self.lr)
        optimizer.apply_gradients(zip(grads, tvars))
        self.train_op=optimizer.apply_gradients(zip(grads, tvars))

        self.new_lr = tf.placeholder(tf.float32,shape=[],name="new_learning_rate")
        self._lr_update = tf.assign(self.lr,self.new_lr)

    def assign_new_lr(self,session,lr_value):
        session.run(self._lr_update,feed_dict={self.new_lr:lr_value})

5.实验结果

实验环境如下

GPU：NVIDIA GeForce GTX 1080

OS：Ubuntu 16.04

开发环境：Anaconda 2.3.1，TensorFlow 1.5.0rc1

实验结果如下

训练集准确率：

Loss Function：

最终在测试集和验证集上准确率分别为：87.31%，91.17%

相较LSTM模型提高4%~5%

最后附上源代码：https://github.com/zjrn/LSTM-CNN_CLASSIFICATION

https://github.com/zjrn/LSTM-CNN_CLASSIFICATION.git

https://github.com/zjrn/LSTM-CNN_CLASSIFICATION.git

原文出处

點擊查看更多內(nèi)容

2人點贊

評論

共同學習，寫下你的評論

評論加載中...

作者其他優(yōu)質(zhì)文章

正在加載中

慕森卡

手記

篇

粉絲

121

獲贊與收藏

529

關(guān)注作者，訂閱最新文章

閱讀免費教程

今天注冊有機會得

100積分直接送

付費專欄免費學

大額優(yōu)惠券免費領(lǐng)

立即參與放棄機會

微信客服

購課補貼
聯(lián)系客服咨詢優(yōu)惠詳情

幫助反饋 APP下載

慕課網(wǎng)APP
您的移動學習伙伴

公眾號

掃描二維碼
關(guān)注慕課網(wǎng)微信公眾號

第七色在线视频,2021少妇久久久久久久久久,亚洲欧洲精品成人久久av18,亚洲国产精品特色大片观看完整版,孙宇晨将参加特朗普的晚宴

熱搜

最近搜索清空