keras

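Understanding tf.keras.layers.GRU

########## Example from the official TensorFlow documentation, with comments added from my own understanding ##########

```python
import tensorflow as tf

# input shape = [batch_size, encoder_max_len, embedding_dim] = [32, 10, 8]
inputs = tf.random.normal([32, 10, 8])

# return_sequences defaults to False: only the output of the last unit is returned.
# return_state defaults to False: the last unit's hidden state is not returned separately.
gru = tf.keras.layers.GRU(4)
output = gru(inputs)
print(output.shape)  # only the last unit's output, so (32, 4)

# return_sequences=True: return the output of every unit, a sequence of length encoder_max_len.
# return_state=True: also return the hidden state of the last unit.
gru = tf.keras.layers.GRU(4, return_sequences=True, return_state=True)
whole_sequence_output, final_state = gru(inputs)
print(whole_sequence_output.shape)  # output of every unit, so (32, 10, 4)
print(final_state.shape)            # last unit's hidden state, so (32, 4)
```

########## Does attention take the last unit's hidden state, the hidden states of the whole sequence, or the outputs of the whole sequence? ##########

I don't think there is a standard answer.

My first choice is the hidden states of the whole sequence. The instructor said in class that the output of each RNN unit is its hidden state, so the hidden states of the whole sequence would equal the outputs of the whole sequence (I was still skeptical about whether they are really equal).

The second choice is the hidden state of the last unit, since it carries information from the earlier part of the sequence; to use it, its dimensions have to be expanded (e.g. from [batch_size, units] to [batch_size, 1, units]) so that it can be combined with the per-step outputs. However, LSTM and GRU use gating, so the last unit's hidden state has already filtered out part of the information; personally, I don't recommend relying on it.

Are the hidden states of the whole sequence the same as the outputs of the whole sequence? In a GRU, definitely: the cell has no separate output transform, so the output at each time step is the hidden state h_t itself. In the code above, for example, whole_sequence_output[:, -1, :] is equal to final_state. This can be seen from the GRU cell diagram below.

[Figure: GRU cell diagram]

########## What should be used to compute attention in an LSTM? ##########

Attention is computed globally: it is recomputed for every output word. In an LSTM, the difference between h and c is mainly this: c can be thought of as the main memory line, while h is short-term memory, a gated signal produced from the current combination of inputs. h is per-step, whereas c is cumulative. c holds the memory passed on from earlier units plus what the current unit chooses to remember, so it does the remembering; h mainly holds the information that matters at the current unit. So, in my view, computing attention from c works better in an LSTM.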
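To check the claim that a GRU's per-step output is its hidden state, here is a minimal framework-free NumPy sketch of a single-layer GRU forward pass. It is not Keras's implementation: the parameter dict (keys `W`, `U`, `b`) and the random weights are hypothetical, and it uses the classic formulation where the reset gate is applied before the recurrent matmul (Keras's TF2 default, `reset_after=True`, applies it after). In either variant, the value appended as the step's output is exactly the new hidden state.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_forward(inputs, params):
    """Run a single-layer GRU over inputs of shape [batch, time, features].

    Returns (whole_sequence_output, final_state), the same pair that
    tf.keras.layers.GRU(units, return_sequences=True, return_state=True) returns.
    Hypothetical parameter layout: gates stacked as [z | r | h] along the last axis.
    """
    W, U, b = params["W"], params["U"], params["b"]
    units = U.shape[0]
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, units))
    outputs = []
    for t in range(steps):
        x_z, x_r, x_h = np.split(inputs[:, t, :] @ W + b, 3, axis=-1)
        h_z, h_r = np.split(h @ U[:, : 2 * units], 2, axis=-1)
        z = sigmoid(x_z + h_z)  # update gate
        r = sigmoid(x_r + h_r)  # reset gate
        h_tilde = np.tanh(x_h + (r * h) @ U[:, 2 * units:])  # candidate state
        h = z * h + (1.0 - z) * h_tilde  # new hidden state (Keras's z convention)
        outputs.append(h)  # the step's output IS the hidden state
    return np.stack(outputs, axis=1), h


rng = np.random.default_rng(0)
batch, steps, features, units = 32, 10, 8, 4
params = {
    "W": rng.normal(scale=0.1, size=(features, 3 * units)),
    "U": rng.normal(scale=0.1, size=(units, 3 * units)),
    "b": np.zeros(3 * units),
}
inputs = rng.normal(size=(batch, steps, features))
whole_sequence_output, final_state = gru_forward(inputs, params)
print(whole_sequence_output.shape)  # (32, 10, 4)
print(final_state.shape)            # (32, 4)
print(np.array_equal(whole_sequence_output[:, -1, :], final_state))  # True
```

Because nothing is applied to h between the state update and the output, the last slice of the output sequence and the final state are the same array, which mirrors what the Keras layer returns.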
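The discussion above mentions expanding the dimensions of a single hidden state so it can be used against the whole sequence of outputs. One common way to do this is additive (Bahdanau-style) attention, sketched below in NumPy as an illustration rather than the author's method: the query (e.g. a decoder hidden state, shape [batch, units]) is expanded to [batch, 1, units] and broadcast against the per-step encoder outputs. The weight names `W1`, `W2`, `v`, the size `attn_units`, and the random values are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def additive_attention(query, values, W1, W2, v):
    """Bahdanau-style additive attention.

    query:  one hidden state, shape [batch, units]
    values: outputs of every step, shape [batch, time, units]
    Returns (context, weights) with shapes [batch, units] and [batch, time, 1].
    """
    query_with_time_axis = np.expand_dims(query, 1)  # [batch, 1, units]
    # Broadcasting adds the single query to every time step: [batch, time, attn_units]
    score = np.tanh(values @ W1 + query_with_time_axis @ W2) @ v  # [batch, time, 1]
    weights = softmax(score, axis=1)  # normalise over the time steps
    context = (weights * values).sum(axis=1)  # weighted sum of the outputs: [batch, units]
    return context, weights


rng = np.random.default_rng(0)
batch, steps, units, attn_units = 32, 10, 4, 16
encoder_output = rng.normal(size=(batch, steps, units))  # stands in for whole_sequence_output
decoder_state = rng.normal(size=(batch, units))  # stands in for a single hidden state
W1 = rng.normal(scale=0.1, size=(units, attn_units))
W2 = rng.normal(scale=0.1, size=(units, attn_units))
v = rng.normal(scale=0.1, size=(attn_units, 1))
context, weights = additive_attention(decoder_state, encoder_output, W1, W2, v)
print(context.shape, weights.shape)  # (32, 4) (32, 10, 1)
```

The attention weights are recomputed for every new query, which matches the point that attention is calculated once per output word.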
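The contrast between h and c in an LSTM can be made concrete with the standard cell equations: c_t = f ⊙ c_{t-1} + i ⊙ g accumulates memory across steps, while h_t = o ⊙ tanh(c_t) is a gated, squashed view of that memory at the current step. The NumPy sketch below assumes a hypothetical weight layout (gates stacked as [i | f | g | o]) and random weights. Note that `tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)` returns the per-step outputs (which are the h states) plus only the final h and final c, so using c at every step, as suggested above, would need a custom loop like this one.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(inputs, W, U, b):
    """Single-layer LSTM over inputs of shape [batch, time, features].

    Hypothetical layout: gates stacked as [i | f | g | o] along the last axis.
    Returns the per-step h and c sequences, each of shape [batch, time, units].
    """
    units = U.shape[0]
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    hs, cs = [], []
    for t in range(steps):
        gates = inputs[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(gates, 4, axis=-1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
        g = np.tanh(g)  # candidate memory
        c = f * c + i * g  # cell state: keep part of the old memory, add new memory
        h = o * np.tanh(c)  # hidden state: gated, squashed view of c at this step
        hs.append(h)
        cs.append(c)
    return np.stack(hs, axis=1), np.stack(cs, axis=1)


rng = np.random.default_rng(0)
batch, steps, features, units = 32, 10, 8, 4
W = rng.normal(scale=0.5, size=(features, 4 * units))
U = rng.normal(scale=0.5, size=(units, 4 * units))
b = np.zeros(4 * units)
hs, cs = lstm_forward(rng.normal(size=(batch, steps, features)), W, U, b)
print(hs.shape, cs.shape)  # (32, 10, 4) (32, 10, 4)
print(np.abs(hs).max() < 1.0)  # True: h is bounded by the output gate and tanh
```

Either sequence (hs or cs) could be passed as the values to an attention function; which works better is the judgment call discussed above.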
