[PyTorch] A Program for Building and Training a Model [Fully Connected Layers]

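This post collects the code for implementing and training a model built from fully connected layers in PyTorch.

The implementation is based on nn.Module.

The approach here is to write things by hand, relying on the nn library as little as possible.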

Implementing the relu and softmax functions

First, implement the relu and softmax functions.

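import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


def relu(x):
    # Keep positive entries and replace the rest with zeros
    x = torch.where(x > 0, x, torch.zeros_like(x))
    return x


def softmax(x):
    # Subtract the row-wise max for numerical stability (out of place, so the input is not modified)
    x = x - x.max(dim=1, keepdim=True).values
    x_exp = torch.exp(x)
    return x_exp / x_exp.sum(dim=1, keepdim=True)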
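
As a quick sanity check, the sketch below restates the two functions using broadcasting and compares them with PyTorch's built-in torch.relu and torch.softmax. The tensor `sample` and its values are illustrative assumptions, chosen so that the second row would overflow without the max subtraction.

```python
import torch


def relu(x):
    # Keep positive entries, zero out the rest
    return torch.where(x > 0, x, torch.zeros_like(x))


def softmax(x):
    # Subtract the row-wise max for numerical stability (broadcasts over columns)
    x = x - x.max(dim=1, keepdim=True).values
    x_exp = torch.exp(x)
    return x_exp / x_exp.sum(dim=1, keepdim=True)


# Illustrative input; the large values in row 2 exercise the stability trick
sample = torch.tensor([[-1.0, 0.0, 2.0], [1000.0, 1001.0, 1002.0]])

relu_matches = torch.allclose(relu(sample), torch.relu(sample))
softmax_matches = torch.allclose(softmax(sample), torch.softmax(sample, dim=1))
row_sums = softmax(sample).sum(dim=1)  # each row of a softmax output sums to 1
print(relu_matches, softmax_matches, row_sums)
```

Because the subtraction is written out of place, the tensor passed in is left unchanged, which avoids surprising the caller.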

Implementing a fully connected layer (Dense)

Next, implement a fully connected layer.

This class inherits from nn.Module.

The weights use He initialization.

# Assumption: rng is a NumPy RandomState; the seed value is arbitrary
rng = np.random.RandomState(1234)


class Dense(nn.Module):  # inherit from nn.Module
    def __init__(self, in_dim, out_dim, function=lambda x: x):
        super().__init__()
        # He initialization (uniform variant)
        # in_dim: input dimension, out_dim: output dimension
        self.W = nn.Parameter(torch.tensor(rng.uniform(
                        low=-np.sqrt(6/in_dim),
                        high=np.sqrt(6/in_dim),
                        size=(in_dim, out_dim)
                    ).astype('float32')))
        self.b = nn.Parameter(torch.tensor(np.zeros([out_dim]).astype('float32')))
        self.function = function  # activation applied to the affine output

    def forward(self, x):  # override forward
        return self.function(torch.matmul(x, self.W) + self.b)
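
The bound sqrt(6/in_dim) can be checked numerically: a uniform distribution on (-a, a) has variance a^2/3, so this bound gives a variance of 2/in_dim, which is the He initialization target. The following is a minimal sketch of that check; the seed and the layer sizes are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)  # assumed seed, for reproducibility only
in_dim, out_dim = 512, 256      # illustrative layer sizes

limit = np.sqrt(6 / in_dim)
W = rng.uniform(low=-limit, high=limit, size=(in_dim, out_dim))

# Uniform(-a, a) has variance a^2 / 3, so a = sqrt(6 / in_dim) gives 2 / in_dim
empirical_var = W.var()
target_var = 2 / in_dim
print(empirical_var, target_var)
```

The two printed values should agree closely, since the sample contains more than a hundred thousand weights.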

Defining a multi-layer network with Sequential

nn provides Sequential, which stacks previously defined layers to build a multi-layer network.

mlp = nn.Sequential(
    Dense(2, 3, relu),  # stack the hand-written fully connected layers into a two-layer network
    Dense(3, 2, softmax)
)

# mlp = MLP(2, 3, 2) would define the same network

print(mlp)
print()

x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
y = mlp(x)  # calls forward(x)
print("feedforward:")
print(y)
print()

print("Model parameters via mlp.parameters():")
print(mlp.parameters())

# Output
# Sequential(
#   (0): Dense()
#   (1): Dense()
# )

# feedforward:
# tensor([[0.5000, 0.5000],
#         [0.2238, 0.7762],
#         [0.5246, 0.4754],
#         [0.5803, 0.4197]], grad_fn=<DivBackward0>)
#
# Model parameters via mlp.parameters():
# <generator object Module.parameters at 0x7f5f9f6d1c50>
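
The code in this post calls MLP(2, 3, 2), but the class definition itself does not appear. The saved state_dict shown later has the keys linear1.W, linear1.b, linear2.W and linear2.b, so one plausible reconstruction is a module holding two Dense layers under those attribute names. The sketch below is a hypothetical reconstruction under that assumption, not the author's code; the argument name `hid_dim` and the seed are also assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.RandomState(1234)  # assumed seed


def relu(x):
    return torch.where(x > 0, x, torch.zeros_like(x))


def softmax(x):
    x = x - x.max(dim=1, keepdim=True).values
    x_exp = torch.exp(x)
    return x_exp / x_exp.sum(dim=1, keepdim=True)


class Dense(nn.Module):
    def __init__(self, in_dim, out_dim, function=lambda x: x):
        super().__init__()
        limit = np.sqrt(6 / in_dim)  # He (uniform) initialization bound
        self.W = nn.Parameter(torch.tensor(
            rng.uniform(low=-limit, high=limit, size=(in_dim, out_dim)).astype('float32')))
        self.b = nn.Parameter(torch.zeros(out_dim))
        self.function = function

    def forward(self, x):
        return self.function(torch.matmul(x, self.W) + self.b)


class MLP(nn.Module):  # hypothetical reconstruction, inferred from the state_dict keys
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.linear1 = Dense(in_dim, hid_dim, relu)
        self.linear2 = Dense(hid_dim, out_dim, softmax)

    def forward(self, x):
        return self.linear2(self.linear1(x))


mlp = MLP(2, 3, 2)
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
y = mlp(x)
param_names = [name for name, _ in mlp.named_parameters()]
print(param_names)
```

Parameter names are derived from attribute names, which is why a module written this way would produce keys of the form linear1.W rather than the 0.W style that nn.Sequential yields.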

Defining the optimizer (optimization)

torch.optim implements the commonly used optimizers.

Gradients are reset with .zero_grad(), and parameters are updated with .step().

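# Define the optimizer over the parameters to be trained
optimizer = optim.SGD(mlp.parameters(), lr=0.1)

# Reset the gradients
optimizer.zero_grad()

# (loss.backward() is called here to compute the gradients)

# Update the parameters
optimizer.step()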
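
To make concrete what .step() does for plain SGD (no momentum or weight decay), the sketch below compares it with the hand-computed update w - lr * grad. The parameter `w` and the toy loss are illustrative assumptions.

```python
import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))  # illustrative parameter
lr = 0.1
optimizer = optim.SGD([w], lr=lr)

loss = (w ** 2).sum()  # toy loss whose gradient is 2 * w
optimizer.zero_grad()  # clear any stale gradients
loss.backward()        # populate w.grad

# Plain SGD update computed by hand: w <- w - lr * grad
expected = (w - lr * w.grad).detach().clone()
optimizer.step()       # update w in place

print(w.data, expected)
```

Calling .zero_grad() before .backward() matters because PyTorch accumulates gradients into .grad rather than overwriting them.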

Training the model (XOR)

Now train the MLP model.

Here it learns XOR.

# Learn XOR with an MLP
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
t = torch.tensor([0, 1, 1, 0], dtype=torch.long)

# Define the model
mlp = MLP(2, 3, 2)

# Define the optimizer
optimizer = optim.SGD(mlp.parameters(), lr=0.1)  # a Module's parameters are available via .parameters()

# Put the model in training mode (matters for Dropout and similar layers)
mlp.train()

for i in range(1000):

    t_hot = torch.eye(2)[t]  # convert the target labels to one-hot vectors

    # Forward pass
    y_pred = mlp(x)

    # Compute the loss (cross-entropy)
    loss = -(t_hot*torch.log(y_pred)).sum(dim=1).mean()

    # Backward pass: reset the gradients, then backpropagate
    optimizer.zero_grad()
    loss.backward()

    # Update the parameters
    optimizer.step()

    if i % 100 == 0:
        print(i, loss.item())

# Output
# 0 0.7283064126968384
# 100 0.5940693616867065
# 200 0.4906403124332428
# 300 0.387647807598114
# 400 0.24243196845054626
# 500 0.13766250014305115
# 600 0.08652915805578232
# 700 0.05950736626982689
# 800 0.04409634321928024
# 900 0.034435056149959564

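Saving and loading the model

Saving the model

Models are saved with torch.save(). The common practice is not to save the model instance directly but to save its state_dict, which holds the model's parameters. When loading, the state_dict is read back and loaded into a model instance.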

print(list(mlp.parameters()))
print()

# Get the state_dict
state_dict = mlp.state_dict()
print(state_dict)

# Save the model
torch.save(state_dict, './model.pth')

# Output
# [Parameter containing:
# tensor([[ 1.7305,  2.7831, -0.4495],
#         [-1.6474, -2.7843, -1.6844]], requires_grad=True), Parameter containing:
# tensor([ 1.6473e+00, -1.5183e-03,  0.0000e+00], requires_grad=True), Parameter containing:
# tensor([[ 2.9200, -1.0661],
#         [-2.7757,  2.7199],
#         [-1.2010, -0.3710]], requires_grad=True), Parameter containing:
# tensor([-1.3526,  1.3526], requires_grad=True)]
# 
# OrderedDict([('linear1.W', tensor([[ 1.7305,  2.7831, -0.4495],
#         [-1.6474, -2.7843, -1.6844]])), ('linear1.b', tensor([ 1.6473e+00, -1.5183e-03,  0.0000e+00])), ('linear2.W', tensor([[ 2.9200, -1.0661],
#         [-2.7757,  2.7199],
#         [-1.2010, -0.3710]])), ('linear2.b', tensor([-1.3526,  1.3526]))])

Loading the model

The model is loaded as follows.

# Define the model
mlp2 = MLP(2, 3, 2)
print(list(mlp2.parameters()))  # random initial values
print()

# Load the trained parameters
state_dict = torch.load('./model.pth')
mlp2.load_state_dict(state_dict)
print(list(mlp2.parameters()))  # trained parameters

# Output
# [Parameter containing:
# tensor([[ 1.5004,  0.5244, -0.3561],
#         [ 1.0002, -0.6345,  0.2359]], requires_grad=True), Parameter containing:
# tensor([0., 0., 0.], requires_grad=True), Parameter containing:
# tensor([[ 1.0440, -0.1805],
#         [ 0.8546, -1.0076],
#         [ 0.5777,  0.5786]], requires_grad=True), Parameter containing:
# tensor([0., 0.], requires_grad=True)]

# [Parameter containing:
# tensor([[ 1.7305,  2.7831, -0.4495],
#         [-1.6474, -2.7843, -1.6844]], requires_grad=True), Parameter containing:
# tensor([ 1.6473e+00, -1.5183e-03,  0.0000e+00], requires_grad=True), Parameter containing:
# tensor([[ 2.9200, -1.0661],
#         [-2.7757,  2.7199],
#         [-1.2010, -0.3710]], requires_grad=True), Parameter containing:
# tensor([-1.3526,  1.3526], requires_grad=True)]
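
The save and load steps can be exercised end to end in a self-contained way. In the sketch below, nn.Linear stands in for the post's MLP so that the snippet runs on its own, and a temporary directory replaces './model.pth'; both are assumptions made for illustration. It also passes map_location='cpu' to torch.load and calls .eval(), two steps that are commonly added before inference.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, for reproducibility only

# nn.Linear stands in for the post's MLP so the snippet is self-contained
source = nn.Linear(2, 3)
target = nn.Linear(2, 3)  # starts from different random initial values

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.pth')
    torch.save(source.state_dict(), path)
    # map_location places the tensors on the CPU regardless of where they were saved
    state_dict = torch.load(path, map_location='cpu')

target.load_state_dict(state_dict)
target.eval()  # switch to evaluation mode before inference

params_match = all(
    torch.equal(p_src, p_dst)
    for p_src, p_dst in zip(source.parameters(), target.parameters())
)
print(params_match)
```

load_state_dict matches entries by key name, so the receiving model must be built from the same class with the same layer sizes as the one that was saved.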