Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks [closed]

Question

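I want to explain this with pictures from C3D.

In a nutshell, the convolution direction and the output shape are what matter!

↑↑↑↑↑ 1D convolutions - basic ↑↑↑↑↑

The convolution is computed along just one direction (the time axis)
input = [W], filter = [k], output = [W]
e.g. input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (ignoring zero-padding effects at the two ends)
The output shape is a 1D array
Example: graph smoothing

tf.nn.conv1d - toy example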

import tensorflow as tf
import numpy as np

# TensorFlow 1.x graph-mode API; this session is reused by all examples below
sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

input_1d   = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))  # [2. 3. 3. 3. 2.]
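
The smoothing example from the list above can also be checked without TensorFlow. The following is a minimal pure-Python sketch, assuming zero padding as with padding='SAME'. The helper name conv1d_same is made up for illustration and is not a TensorFlow function. Like tf.nn.conv1d, it computes a cross-correlation (the filter is not flipped).

```python
def conv1d_same(signal, kernel):
    """Slide `kernel` along `signal` in one direction, zero-padding the ends
    so the output has the same length as the input (odd kernel length assumed)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]

# Smoothing filter from the text.
smoothed = conv1d_same([1, 1, 1, 1, 1], [0.25, 0.5, 0.25])
print(smoothed)  # [0.75, 1.0, 1.0, 1.0, 0.75]

# All-ones filter, as in the TensorFlow toy example.
box = conv1d_same([1, 1, 1, 1, 1], [1, 1, 1])
print(box)  # [2.0, 3.0, 3.0, 3.0, 2.0]
```

The interior values are exactly 1, while the two ends drop to 0.75 because one filter tap falls on the zero padding.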

↑↑↑↑↑ 2D convolutions - basic ↑↑↑↑↑

The convolution is computed in 2 directions (x, y)
The output shape is a 2D matrix
input = [W,H], filter = [k,k], output = [W,H]
Example: Sobel edge filter

tf.nn.conv2d - toy example

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
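
To make the Sobel edge filter example concrete, here is a small pure-Python sketch of a 2D convolution with zero padding. The function conv2d_same and the 5x5 test image are illustrative assumptions, not part of the original answer.

```python
def conv2d_same(image, kernel):
    """Slide a k x k kernel over a 2D list in both x and y, zero-padding the
    border so the output keeps the input's height and width (odd k assumed)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out

# Horizontal-gradient Sobel kernel: responds to vertical edges.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 5x5 image: dark left part (0), bright right part (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

edges = conv2d_same(image, sobel_x)
for row in edges:
    print(row)
```

The response is zero in the flat dark region and positive around the 0-to-1 transition. The last column is also nonzero, because zero padding creates an artificial edge at the image border.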

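↑↑↑↑↑ 3D convolutions - basic ↑↑↑↑↑

The convolution is computed in 3 directions (x, y, z)
The output shape is a 3D volume
input = [W,H,L], filter = [k,k,d], output = [W,H,M]
d < L is important for producing a volume output
Example: C3D

tf.nn.conv3d - toy example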

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

input_3d   = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))
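
Why d < L matters can be seen from the output-size arithmetic alone. The sketch below uses the size rules TensorFlow documents for 'SAME' and 'VALID' padding; the helper name conv_output_size is an assumption made for illustration.

```python
import math

def conv_output_size(in_size, filter_size, stride=1, padding="SAME"):
    """Output length along one axis for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

W, H, L = 5, 5, 5   # input volume
k, d = 3, 3         # filter is k x k x d, with d < L

axes = ((W, k), (H, k), (L, d))
same_shape = [conv_output_size(n, f, padding="SAME") for n, f in axes]
valid_shape = [conv_output_size(n, f, padding="VALID") for n, f in axes]
print(same_shape)   # [5, 5, 5]
print(valid_shape)  # [3, 3, 3]

# If the filter depth equals the input depth, the filter cannot slide
# along that axis and the depth collapses to 1 (no padding).
collapsed = conv_output_size(L, L, padding="VALID")
print(collapsed)    # 1
```

With d < L the filter still has room to move along the depth axis, so the output keeps a depth M greater than 1 and remains a volume.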

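↑↑↑↑↑ 2D convolutions with 3D input - LeNet, VGG, ... ↑↑↑↑↑

Even though the input is 3D, e.g. 224x224x3 or 112x112x32,
the output shape is not a 3D volume but a 2D matrix,
because the filter depth L must match the number of input channels L
The convolution is computed in 2 directions (x, y), not 3
input = [W,H,L], filter = [k,k,L], output = [W,H]
The output shape is a 2D matrix
What if we want to train N filters (N is the number of filters)?
Then the output is N stacked 2D matrices, i.e. a 3D shape = 2D x N matrix.

conv2d - LeNet, VGG, ... for 1 filter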

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape whose depth equals in_channels
weight_3d = np.ones((3,3,in_channels)) 
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
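
The reason the output is a 2D matrix is that the filter moves in x and y only and sums over all channels at each position. Below is a pure-Python sketch of that computation with the same all-ones data as the toy example above; the function conv2d_multichannel is a hypothetical helper, not a library call.

```python
def conv2d_multichannel(volume, kernel):
    """2D convolution (zero 'SAME' padding) of an H x W x C input with one
    k x k x C filter. At each (x, y) position the products are summed over
    all C channels, giving one number per pixel."""
    h, w, c = len(volume), len(volume[0]), len(volume[0][0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        for ch in range(c):
                            acc += volume[yy][xx][ch] * kernel[dy][dx][ch]
            out[y][x] = acc
    return out

in_channels = 32
volume = [[[1.0] * in_channels for _ in range(5)] for _ in range(5)]  # 5 x 5 x 32
kernel = [[[1.0] * in_channels for _ in range(3)] for _ in range(3)]  # 3 x 3 x 32

out = conv2d_multichannel(volume, kernel)
print(len(out), len(out[0]))  # 5 5 -> a 2D matrix, the channel axis is gone
print(out[2][2])              # 288.0 = 3 * 3 * 32 (interior pixel)
print(out[0][0])              # 128.0 = 2 * 2 * 32 (corner, zero padded)
```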

conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
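
As a quick bookkeeping aid for the N-filter case, the sketch below lists the tensor shapes involved, assuming stride 1, 'SAME' padding, batch size 1 and the default NHWC layout of tf.nn.conv2d. The function conv2d_shapes is made up for this illustration.

```python
def conv2d_shapes(in_h, in_w, in_channels, k, out_channels):
    """Shapes for a stride-1 'SAME' 2D convolution with N = out_channels filters."""
    return {
        "input": (1, in_h, in_w, in_channels),          # batch, height, width, channels
        "kernel": (k, k, in_channels, out_channels),    # height, width, in, out
        "output": (1, in_h, in_w, out_channels),        # one 2D map per filter
        "weights": k * k * in_channels * out_channels,  # number of weights, no bias
    }

shapes = conv2d_shapes(in_h=5, in_w=5, in_channels=32, k=3, out_channels=64)
for name, value in shapes.items():
    print(name, value)
```

Each of the 64 filters produces one 5x5 map, so the stacked output is 5x5x64 even though every single filter still yields a 2D matrix.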

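↑↑↑↑↑ Bonus: 1x1 conv in CNNs - GoogLeNet, ... ↑↑↑↑↑

A 1x1 conv is confusing if you think of it as a 2D image filter like Sobel.
For a 1x1 conv in a CNN, the input has a 3D shape, as in the picture above.
It filters along the depth (channel) direction only: each output pixel is a weighted sum over the channels of the same input pixel
input = [W,H,L], filter = [1,1,L], output = [W,H]
With N filters, the stacked output shape is 3D = 2D x N matrix.

tf.nn.conv2d - special case: 1x1 conv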

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# one 1x1 filter of depth in_channels per output channel = 4D
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
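
Seen per pixel, a 1x1 convolution is just a matrix product between that pixel's channel vector and a C x N weight matrix. The following pure-Python sketch illustrates this with small made-up numbers; conv1x1 is a hypothetical helper, not a TensorFlow function.

```python
def conv1x1(volume, weights):
    """1x1 convolution: for every pixel, map its C input channels to N output
    channels with the same C x N weight matrix. No spatial neighbours are used."""
    n_out = len(weights[0])
    return [
        [
            [sum(pixel[c] * weights[c][n] for c in range(len(pixel)))
             for n in range(n_out)]
            for pixel in row
        ]
        for row in volume
    ]

# 2 x 2 image with C = 3 channels, mapped to N = 2 channels.
volume = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [0, 1, 0]]]
weights = [[1, 0],   # input channel 0 -> (output 0, output 1)
           [1, 0],   # input channel 1
           [1, 1]]   # input channel 2

out = conv1x1(volume, weights)
print(out)  # [[[6, 3], [15, 6]], [[24, 9], [1, 0]]]
```

The spatial size stays 2x2 and only the channel count changes from 3 to 2, which is why 1x1 convolutions are commonly used to shrink or expand the number of channels.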

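Animation (2D conv with 3D inputs)

Original link: LINK
Author: Martin Görner
Twitter: @martin_gorner
Google+: plus.google.com/+MartinGorne

Bonus: 1D convolutions with 2D input

↑↑↑↑↑ 1D convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D convolutions with 2D input ↑↑↑↑↑

Even though the input is 2D, e.g. 20x14,
the output shape is not 2D but a 1D matrix,
because the filter height L must match the input height L
The convolution is computed in 1 direction (x), not 2
input = [W,L], filter = [k,L], output = [W]
The output shape is a 1D matrix
What if we want to train N filters (N is the number of filters)?
Then the output is N stacked 1D arrays, i.e. a 2D shape = 1D x N matrix.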
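
The 20x14 case can be sketched in pure Python as follows, assuming zero padding and all-ones data. The function conv1d_2d_input is an illustrative helper, not a library function.

```python
def conv1d_2d_input(seq, kernel):
    """1D convolution (zero 'SAME' padding) of a W x L input with one k x L
    filter. The filter slides along W only and sums over all L features."""
    w, feat = len(seq), len(seq[0])
    k = len(kernel)
    pad = k // 2
    out = []
    for x in range(w):
        acc = 0.0
        for dx in range(k):
            xx = x + dx - pad
            if 0 <= xx < w:
                acc += sum(seq[xx][f] * kernel[dx][f] for f in range(feat))
        out.append(acc)
    return out

W, L, k = 20, 14, 3
seq = [[1.0] * L for _ in range(W)]     # 20 positions, 14 features each
kernel = [[1.0] * L for _ in range(k)]  # filter height matches L = 14

out = conv1d_2d_input(seq, kernel)
print(len(out))        # 20 -> a 1D array
print(out[0], out[1])  # 28.0 42.0
```

The first value is 2 x 14 because one filter row falls on the zero padding, while interior values are 3 x 14.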

Bonus C3D

in_channels = 32 # 3, 32, 64, 128, ... 
out_channels = 64 # 3, 32, 64, 128, ... 
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])

filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

input_4d   = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))

sess.close()

Input & output in TensorFlow

Summary
