

Skip Deep LSTM (a trick for LSTM)

2023-02-23 02:51:45


First, network depth is critical to model performance.

In practice, however, stacks of more than three LSTM layers are hard to train: the gradients vanish or explode.

Therefore, borrowing an idea from GNMT (Google's neural machine translation system), this post proposes a deep LSTM with dense skip connections (Skip Deep LSTM).

Experiments show that on image-captioning tasks its training loss beats a conventional LSTM's, and on time-series forecasting tasks (e.g. power-generation forecasting) this design also outperforms the commonly used LSTM.

The model design follows the GNMT idea: the first layer is a bidirectional LSTM (BiLSTM), and a depth of 5-7 layers works best. It outperforms the standard LSTM on both image captioning and time-series forecasting problems.

The core code is as follows (implemented in TensorFlow 2):

 


# Deep LSTM with dense skip connections; the connection pattern and number of
# layers can be tuned per experiment -- 5-7 layers has worked best so far.
# (`se2` and `add1` are upstream tensors defined elsewhere in the full script.)
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, add

bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)
bi2 = LSTM(128, return_sequences=True)(bi1)
bi3 = LSTM(128, return_sequences=True)(bi2)
res1 = add([bi1, bi3])
bi4 = LSTM(128, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])
bi5 = LSTM(128, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(128, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
se3 = LSTM(256)(res4)

# Fused LSTM: a BiLSTM branch and a plain LSTM branch merged by dense skips
bi1_1 = Bidirectional(LSTM(16, return_sequences=True))(add1)
bi1_2 = Bidirectional(LSTM(16, return_sequences=True))(bi1_1)
bi1_3 = Bidirectional(LSTM(16, return_sequences=True))(bi1_2)
bi2_1 = LSTM(32, return_sequences=True)(se2)
bi2_2 = LSTM(32, return_sequences=True)(bi2_1)
bi2_3 = LSTM(32, return_sequences=True)(bi2_2)
res1 = add([bi1_1, bi2_1, bi1_3, bi2_3])
bi1_4 = Bidirectional(LSTM(16, return_sequences=True))(res1)
bi2_4 = LSTM(32, return_sequences=True)(res1)
res2 = add([bi1_1, bi2_1, bi1_2, bi2_2])
bi1_5 = Bidirectional(LSTM(16, return_sequences=True))(res2)
bi2_5 = LSTM(32, return_sequences=True)(res2)
res3 = add([bi1_1, bi2_1, bi1_2, bi2_2, bi1_3, bi2_3])
se3 = Bidirectional(LSTM(128))(res3)
decoder2 = Dense(256, activation='relu')(se3)
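As a sanity check, the dense-skip stack can be wired into a complete standalone model. This is a minimal runnable sketch: the input shape (36 timesteps, 8 features) and the reduced layer widths are illustrative assumptions, not values from the original experiments, and `inp` stands in for the upstream tensor `se2`.

```python
# Minimal end-to-end sketch of the dense-skip LSTM stack.
# Shapes and widths are illustrative assumptions only.
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Bidirectional, add
from tensorflow.keras.models import Model

inp = Input(shape=(36, 8))  # stands in for `se2` in the article
bi1 = Bidirectional(LSTM(8, return_sequences=True))(inp)  # width 16 after concat
bi2 = LSTM(16, return_sequences=True)(bi1)
bi3 = LSTM(16, return_sequences=True)(bi2)
res1 = add([bi1, bi3])           # first skip: layer 1 + layer 3
bi4 = LSTM(16, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])      # denser skips reach further back
bi5 = LSTM(16, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(16, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
out = LSTM(32)(res4)             # final summary vector
model = Model(inp, out)

pred = model.predict(np.zeros((2, 36, 8)), verbose=0)
print(pred.shape)  # (2, 32)
```

Note that every added tensor has the same width (16 here), which is the constraint the dense `add` skips impose on the layer sizes.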

 

Embedding a timestep-based attention mechanism
# Timestep-wise attention
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Permute, add, multiply

def attention_3d_block(inputs):
    a = Permute((2, 1))(inputs)  # swap the time and feature axes
    # The Dense width must equal the number of timesteps (in captioning, the
    # maximum sentence length; in time-series forecasting, the input window)
    a = Dense(36, activation='tanh')(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)
    output_attention_mul = multiply([inputs, a_probs], name='attention_mul')
    return output_attention_mul

# In most problems it works best to insert timestep attention between the first
# and second LSTM layers; experiment per problem to find the best position.
bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)
attention_mul = attention_3d_block(bi1)
bi2 = LSTM(128, return_sequences=True)(attention_mul)
bi3 = LSTM(128, return_sequences=True)(bi2)
res1 = add([bi1, bi3])
bi4 = LSTM(128, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])
bi5 = LSTM(128, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(128, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
se3 = LSTM(256)(res4)
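The attention block can be exercised on its own to confirm the shape bookkeeping. A minimal sketch, using an assumed toy shape of 4 timesteps and 3 features; the Dense width is passed in as a parameter here so it always matches the timestep count.

```python
# Standalone shape check for timestep attention; toy sizes are assumptions.
import numpy as np
from tensorflow.keras.layers import Input, Dense, Permute, multiply
from tensorflow.keras.models import Model

def attention_3d_block(inputs, n_steps):
    a = Permute((2, 1))(inputs)               # (batch, features, steps)
    a = Dense(n_steps, activation='tanh')(a)  # score each timestep per feature
    a_probs = Permute((2, 1))(a)              # back to (batch, steps, features)
    return multiply([inputs, a_probs])        # element-wise reweighting

inp = Input(shape=(4, 3))
out = attention_3d_block(inp, n_steps=4)
m = Model(inp, out)

y = m.predict(np.ones((2, 4, 3)), verbose=0)
print(y.shape)  # (2, 4, 3)
```

The output keeps the input's shape, so the block can be dropped between any two `return_sequences=True` LSTM layers without changing the rest of the stack.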
 
 
