Implementing the parameter optimization algorithm Adam [Python]


Adam Class Definition

Adam's weight update rule is as follows:
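The formulas below restate the standard Adam update (as introduced by Kingma & Ba), which is exactly what the code in this article implements; α is the learning rate, g_t the gradient at step t, and ε a small constant to avoid division by zero (1e-7 in the code):

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```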

Implementing this directly gives the following code.

Define the Adam class:

# Adam
import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr          # learning rate
        self.beta1 = beta1    # decay rate for the first-moment estimate
        self.beta2 = beta2    # decay rate for the second-moment estimate
        self.iter = 0
        self.m = None         # first-moment estimates
        self.v = None         # second-moment estimates

    def update(self, params, grads):
        # Initialize the moment estimates on the first call
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)

        self.iter += 1

        for key in params.keys():
            # Exponential moving averages of the gradient and its square
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * (grads[key]**2)
            # Bias-corrected estimates
            m_unbias = self.m[key] / (1 - self.beta1**self.iter)
            v_unbias = self.v[key] / (1 - self.beta2**self.iter)
            # Parameter update (1e-7 avoids division by zero)
            params[key] -= self.lr * m_unbias / (np.sqrt(v_unbias) + 1e-7)
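As a quick sanity check, the same update rule can be applied by hand to a toy one-dimensional problem, minimizing f(w) = (w − 3)²; the learning rate 0.1 and 1000 iterations here are illustrative choices, not values from the article:

```python
# Adam update rule applied to a toy problem: minimize f(w) = (w - 3)^2
import numpy as np

lr, beta1, beta2 = 0.1, 0.9, 0.999
params = {'w': np.array([0.0])}
m = {'w': np.zeros(1)}  # first-moment estimate
v = {'w': np.zeros(1)}  # second-moment estimate

for t in range(1, 1001):
    grads = {'w': 2 * (params['w'] - 3.0)}  # gradient of (w - 3)^2
    for key in params:
        m[key] = beta1 * m[key] + (1 - beta1) * grads[key]
        v[key] = beta2 * v[key] + (1 - beta2) * grads[key]**2
        m_unbias = m[key] / (1 - beta1**t)   # bias correction
        v_unbias = v[key] / (1 - beta2**t)
        params[key] -= lr * m_unbias / (np.sqrt(v_unbias) + 1e-7)

print(params['w'])  # approaches the minimum at w = 3
```

Because the bias-corrected ratio m̂/√v̂ starts out near ±1, the early steps move by roughly the learning rate regardless of gradient scale, which is one of Adam's practical advantages over plain SGD.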

Using Adam in the train function

Incorporating Adam into the train function used for model training gives the following code.

def train(x, t, eps=0.005):
    global W, b  # weights and biases
    batch_size = x.shape[0]

    # Forward pass
    t_hat = softmax(np.matmul(x, W) + b)

    # Cross-entropy loss (np_log is a log helper defined elsewhere)
    cost = (- t * np_log(t_hat)).sum(axis=1).mean()
    delta = t_hat - t

    # Gradients of the loss with respect to W and b
    dW = np.matmul(x.T, delta) / batch_size
    db = np.matmul(np.ones(shape=(batch_size,)), delta) / batch_size

    # Update the parameters with Adam
    params = {'W': W, 'b': b}
    grads = {'W': dW, 'b': db}
    adam.update(params, grads)
    return cost
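For completeness, here is a self-contained sketch of how this train function might be driven end to end. The softmax and np_log helpers, the toy two-class data, and the hyperparameters are all illustrative assumptions (np_log is assumed to be a log clipped away from zero), not definitions from the original article:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def np_log(x):
    # Assumed helper: log clipped away from zero to avoid log(0)
    return np.log(np.clip(x, 1e-10, 1.0))

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.iter, self.m, self.v = 0, None, None

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(v) for k, v in params.items()}
            self.v = {k: np.zeros_like(v) for k, v in params.items()}
        self.iter += 1
        for key in params:
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * grads[key]**2
            m_unbias = self.m[key] / (1 - self.beta1**self.iter)
            v_unbias = self.v[key] / (1 - self.beta2**self.iter)
            params[key] -= self.lr * m_unbias / (np.sqrt(v_unbias) + 1e-7)

# Toy 2-class data: the class is decided by the sign of the first feature
rng = np.random.default_rng(0)
x_all = rng.normal(size=(200, 2))
labels = (x_all[:, 0] > 0).astype(int)
t_all = np.eye(2)[labels]  # one-hot targets

W = np.zeros((2, 2))
b = np.zeros(2)
adam = Adam(lr=0.01)

def train(x, t):
    global W, b
    batch_size = x.shape[0]
    t_hat = softmax(np.matmul(x, W) + b)
    cost = (- t * np_log(t_hat)).sum(axis=1).mean()
    delta = t_hat - t
    dW = np.matmul(x.T, delta) / batch_size
    db = np.matmul(np.ones(shape=(batch_size,)), delta) / batch_size
    adam.update({'W': W, 'b': b}, {'W': dW, 'b': db})
    return cost

costs = [train(x_all, t_all) for _ in range(300)]
print(costs[0], costs[-1])  # the loss should decrease over training
```

Note that Adam mutates W and b in place (`params[key] -= ...` is an in-place NumPy operation on the arrays stored in the dict), which is why passing them through the params dict is enough to update the globals.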

The basic model-training program is described in detail in the article below:

Reference
