A Simple Tutorial on Deep Convolutional Neural Networks

By Zhao Kai



Torch (http://torch.ch) is a deep learning framework developed and maintained by Facebook AI Research.

PyTorch (https://pytorch.org) is the Python version of Torch.

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
from scipy.io import loadmat, savemat

The MNIST handwritten digit dataset

MNIST is a handwritten digit recognition dataset maintained by Yann LeCun. The training set has 60k images and the test set has 10k.

* Every sample in the training set comes with a ground-truth label and is used for training

* Every sample in the test set comes with a ground-truth label and is used to evaluate model performance

The original dataset is stored as binary files; here we use a copy that I have converted to .mat format.
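The raw MNIST files use the IDX binary format (big-endian header, then uint8 pixel data). The actual conversion script is not shown here; as a sketch of what reading the raw files involves, a minimal parser (with `parse_idx_images` a hypothetical helper, demonstrated on synthetic bytes) might look like:

```python
import struct
import numpy as np

def parse_idx_images(raw):
    """Parse the idx3-ubyte image format used by the raw MNIST files.

    Header: magic number 0x00000803, image count, rows, cols,
    all big-endian 32-bit ints, followed by uint8 pixel data.
    """
    magic, n, rows, cols = struct.unpack('>IIII', raw[:16])
    assert magic == 0x803, 'not an idx3-ubyte image file'
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(n, rows, cols)

# Synthetic demo: two blank 28x28 "images" packed in IDX format
header = struct.pack('>IIII', 0x803, 2, 28, 28)
data = header + bytes(2 * 28 * 28)
imgs = parse_idx_images(data)
print(imgs.shape)  # (2, 28, 28)
```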

In [2]:
mnist = loadmat('../data/mnist.mat')
print("train images shape:", mnist['train_imgs'].shape)
print("train labels shape:", mnist['train_label'].shape)
print("test images shape:", mnist['test_imgs'].shape)
print("test labels shape:", mnist['test_label'].shape)
train images shape: (28, 28, 60000)
train labels shape: (60000, 1)
test images shape: (28, 28, 10000)
test labels shape: (10000, 1)

Below are example images from the training and test sets.

* Training set

In [3]:
fig, axarr = plt.subplots(4,6)
fig.set_figheight(8)
fig.set_figwidth(12)
for i in range(24):
    ax = axarr[i%4, int(i/4)]
    im = mnist['train_imgs'][:,:,i]
    ax.imshow(im, cmap=cm.Greys_r)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
    ax.set_title(str(mnist['train_label'][i]))

* Test set

In [4]:
fig, axarr = plt.subplots(4,6)
fig.set_figheight(8)
fig.set_figwidth(12)
for i in range(24):
    ax = axarr[i%4, int(i/4)]
    im = mnist['test_imgs'][:,:,i]
    ax.imshow(im, cmap=cm.Greys_r)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
    ax.set_title(str(mnist['test_label'][i]))

Below we define a simple convolutional neural network. It has two convolutional layers and two fully connected layers, and each convolutional layer is followed by a pooling layer:

In [5]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        return F.log_softmax(x)
model = Net()
print(model)
Net (
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d (p=0.5)
  (fc1): Linear (320 -> 50)
  (fc2): Linear (50 -> 10)
)
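The value 320 in `x.view(-1, 320)` and `nn.Linear(320, 50)` comes from tracing the feature-map shapes through the network; a quick standalone sanity check:

```python
# Trace the spatial size through the network to see where 320 comes from.
# A 5x5 convolution without padding shrinks each side by 4;
# a 2x2 max-pool halves it.
size = 28
size = (size - 4) // 2   # conv1 (5x5) + max_pool2d(2): 28 -> 24 -> 12
size = (size - 4) // 2   # conv2 (5x5) + max_pool2d(2): 12 -> 8 -> 4
channels = 20            # output channels of conv2
print(channels * size * size)  # 320, the input size of fc1
```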

In [6]:
img0 = mnist['train_imgs'][:,:,0].astype(np.float32)
label0 = list(mnist['train_label'][0].astype(int))
plt.imshow(img0, cmap=cm.Greys_r)
plt.title(str(label0))
if len(img0.shape) == 2:
    img0 = np.expand_dims(img0, axis=0)
    img0 = np.expand_dims(img0, axis=0)
img0 = torch.from_numpy(img0)
label0 = torch.LongTensor(label0)
data0 = Variable(img0)
label0 = Variable(label0)

We feed the image of the digit 5 into the network and see what it outputs:

In [7]:
output = model(data0)
print(output.data.numpy())
[[-2.32512116 -2.34897757 -2.19635177 -2.34897757 -2.26928759 -2.34897757
  -2.29915118 -2.23123384 -2.32250404 -2.34897757]]

At this point the network's outputs are roughly equal across all labels, because the network is randomly initialized.

We first convert the network's output into probabilities:

$$ p_i = \frac{e^{x_i}}{\sum_{j=1}^K e^{x_j}}$$

where $x_j$ is the network's raw output and $K=10$ means the network produces 10 outputs per sample.

Then we compute the network's negative log-likelihood loss:

$$nll\_loss = -\log(p_{y})$$

where $y$ is the ground-truth label of the sample and $x$ is the model output, i.e. `output` in the code. Note that `forward` already applies `log_softmax`, so `F.nll_loss` simply picks out $-\log(p_y)$.

In plain words, the whole loss computation has two steps:

1. Normalize the model's outputs into the interval 0-1 such that $\sum_i p_i = 1$

2. Given the sample's class label $y$, set $loss = -\log(p_y)$
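These two steps can be checked numerically with a standalone NumPy sketch (the input values here are made up for illustration):

```python
import numpy as np

x = np.array([2.0, 1.0, 0.1])       # hypothetical raw network outputs
y = 0                               # ground-truth class label

# Step 1: normalize the outputs to probabilities with softmax
p = np.exp(x) / np.sum(np.exp(x))
assert abs(p.sum() - 1.0) < 1e-9

# Step 2: negative log-likelihood of the true class
loss = -np.log(p[y])
print(round(loss, 3))  # ≈ 0.417
```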

In [8]:
loss = F.nll_loss(output, label0)
print("loss = %f" % loss.data.numpy())
loss = 2.348978
In [9]:
def softmax(x):
    assert len(x.shape) == 2, 'require 2d array'
    return np.divide(np.exp(x), np.sum(np.exp(x), 1, keepdims=True))
In [10]:
plt.bar(range(10), softmax(output.data.numpy()).reshape(10))
print(softmax(output.data.numpy()).reshape(10))
[ 0.09777158  0.0954667   0.11120812  0.0954667   0.10338579  0.0954667
  0.10034396  0.10739583  0.0980278   0.0954667 ]

First, let's check the accuracy of the untrained model on the test set:

In [11]:
def test_model():
    batch_size=100
    model.cpu()
    pred = np.zeros((10000, 10), dtype=np.float32)    
    for i in range(10000 // batch_size):
        idx = range(i*batch_size,(i+1)*batch_size)
        data = mnist['test_imgs'].astype(np.float32)[:,:,idx]
        data = np.transpose(data, (2,0,1))
        data = np.expand_dims(data, axis=1)
        data = Variable(torch.from_numpy(data))
        pred[idx, :] = model(data).data.numpy()
    label = np.squeeze(mnist['test_label'])
    pred = np.argmax(pred, 1)
    return np.mean(label == pred)
In [12]:
test_acc = [test_model()]
print("Accuracy = %f" % test_acc[-1])
Accuracy = 0.103600

The expected initial accuracy is around 0.1, since an untrained network is essentially guessing uniformly among 10 classes.
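This 0.1 baseline is what random guessing over 10 balanced classes gives; a standalone simulation with made-up labels illustrates it:

```python
import numpy as np

rng = np.random.RandomState(0)
labels = rng.randint(0, 10, size=10000)   # stand-in for the test labels
guesses = rng.randint(0, 10, size=10000)  # an "untrained model" guessing
acc = np.mean(labels == guesses)
print(acc)  # close to 0.1
```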

In [13]:
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

We first run 100 iterations, feeding 100 images per iteration:

In [14]:
loss1 = []
for i in range(100):
    # draw 100 random samples for this iteration
    idx = np.random.choice(mnist['train_imgs'].shape[2], 100)
    data = mnist['train_imgs'].astype(np.float32)[:,:,idx]
    data = np.transpose(data, (2,0,1))
    data = np.expand_dims(data, axis=1)
    label = list(mnist['train_label'].astype(int)[idx])
    data = Variable(torch.from_numpy(data))
    label = Variable(torch.LongTensor(label)).resize(100)
    # zero the gradients left over from the previous iteration
    optimizer.zero_grad()
    # feed the training data through the network to get its output
    output = model(data)
    # compute the loss from the network output and the sample labels
    loss = F.nll_loss(output, label)
    # backpropagate to compute the gradients
    loss.backward()
    # update the parameters using the gradients
    optimizer.step()
    if (i+1)%10 == 0 or i == 0:
        print("iter %d, loss=%f, learning_rate=%f" % (i+1, loss.data.cpu().numpy(), optimizer.param_groups[0]['lr']))
    loss1.append(float(loss.data.numpy()))
iter 1, loss=2.296437, learning_rate=0.100000
iter 10, loss=2.294449, learning_rate=0.100000
iter 20, loss=2.200367, learning_rate=0.100000
iter 30, loss=1.811251, learning_rate=0.100000
iter 40, loss=1.502604, learning_rate=0.100000
iter 50, loss=1.230188, learning_rate=0.100000
iter 60, loss=1.218298, learning_rate=0.100000
iter 70, loss=0.842343, learning_rate=0.100000
iter 80, loss=0.642054, learning_rate=0.100000
iter 90, loss=1.000200, learning_rate=0.100000
iter 100, loss=0.702945, learning_rate=0.100000

Let's see how the loss changes over the training iterations:

In [15]:
plt.plot(loss1, 'r')
plt.title("loss vs iter")
plt.xlabel('iter')
plt.ylabel('loss')

As we can see, the loss gradually decreases as training proceeds, though with oscillations, because the samples fed in each iteration are drawn randomly. After a certain number of iterations the loss stabilizes, at which point we say the model has converged.

After these 100 iterations, let's again feed the image of the digit 5 into the network and see the result:

In [16]:
output = model(data0)
loss = F.nll_loss(output, label0)
loss = loss.data.numpy()
print(softmax(output.data.numpy()))
print("loss=%f" % loss)
plt.bar(range(10), softmax(output.data.numpy()).reshape(10))
[[ 0.00630555  0.00282529  0.00468245  0.06712719  0.00282529  0.47379625
   0.00597206  0.00282529  0.42578837  0.00785228]]
loss=0.746978

As we can see, the prediction is no longer uniform, though it is not necessarily correct (every training run has some randomness).

Let's check the accuracy on the test set:

In [17]:
test_acc.append(test_model())
print("Accuracy = %f" % test_acc[-1])
Accuracy = 0.782800

Now train for another 2000 iterations and check the accuracy again:

In [18]:
model.cuda()
test_acc = []
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
for i in range(2000):
    idx = np.random.choice(mnist['train_imgs'].shape[2], 100)
    data = mnist['train_imgs'].astype(np.float32)[:,:,idx]
    data = np.transpose(data, (2,0,1))
    data = np.expand_dims(data, axis=1)
    label = list(mnist['train_label'].astype(int)[idx])
    data = Variable(torch.from_numpy(data).cuda())
    label = Variable(torch.LongTensor(label).cuda()).resize(100)
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, label)
    loss.backward()
    optimizer.step()
    if (i+101)%100 == 0 or i == 0:
        test_acc.append(test_model())
        print("iter %d, loss=%f, learning_rate=%f, test acc=%f" % (i+101, loss.data.cpu().numpy(), optimizer.param_groups[0]['lr'], test_acc[-1]))
        model.cuda()
    if (i+101)%1000 == 0:
        # divide the learning rate by 10 every 1000 iterations
        for param_group in optimizer.param_groups:
            param_group['lr'] = param_group['lr'] / 10
    loss1.append(float(loss.data.cpu().numpy()))
print("Accuracy = %f" % test_model())
iter 101, loss=0.629804, learning_rate=0.100000, test acc=0.773800
iter 200, loss=0.576913, learning_rate=0.100000, test acc=0.825000
iter 300, loss=0.488319, learning_rate=0.100000, test acc=0.842100
iter 400, loss=0.395977, learning_rate=0.100000, test acc=0.856900
iter 500, loss=0.588810, learning_rate=0.100000, test acc=0.870200
iter 600, loss=0.360592, learning_rate=0.100000, test acc=0.880000
iter 700, loss=0.548169, learning_rate=0.100000, test acc=0.880400
iter 800, loss=0.132859, learning_rate=0.100000, test acc=0.891200
iter 900, loss=0.313449, learning_rate=0.100000, test acc=0.885800
iter 1000, loss=0.347670, learning_rate=0.100000, test acc=0.883700
iter 1100, loss=0.311989, learning_rate=0.010000, test acc=0.913600
iter 1200, loss=0.135406, learning_rate=0.010000, test acc=0.930100
iter 1300, loss=0.498950, learning_rate=0.010000, test acc=0.932800
iter 1400, loss=0.106556, learning_rate=0.010000, test acc=0.930100
iter 1500, loss=0.345995, learning_rate=0.010000, test acc=0.935300
iter 1600, loss=0.345517, learning_rate=0.010000, test acc=0.935300
iter 1700, loss=0.287954, learning_rate=0.010000, test acc=0.936900
iter 1800, loss=0.279675, learning_rate=0.010000, test acc=0.938300
iter 1900, loss=0.227744, learning_rate=0.010000, test acc=0.938300
iter 2000, loss=0.161694, learning_rate=0.010000, test acc=0.940100
iter 2100, loss=0.183835, learning_rate=0.001000, test acc=0.941300
Accuracy = 0.935900
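The manual learning-rate division in the loop above implements a step-decay schedule: divide the rate by 10 every 1000 iterations. Ignoring the +101 offset in the loop counter, the schedule can be written as a standalone helper (PyTorch also provides this behavior as `torch.optim.lr_scheduler.StepLR`):

```python
def step_decay(base_lr, iteration, step=1000, gamma=0.1):
    """Learning rate after multiplying by `gamma` every `step` iterations."""
    return base_lr * gamma ** (iteration // step)

for it in (0, 999, 1000, 1999, 2000):
    print(it, step_decay(0.1, it))
```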

The loss curve at this point:

In [20]:
p1, = plt.plot(loss1, 'r', label='Training Loss')
p2, = plt.plot(np.arange(0,len(test_acc))*100, test_acc , 'b--', label='Testing Accuracy')
plt.title("loss and acc VS iter")
plt.xlabel('iter')
plt.ylabel('loss')
plt.legend()
In [21]:
model.cpu()
output = model(data0)
loss = F.nll_loss(output, label0)
loss = loss.data.numpy()
print(softmax(output.data.numpy()))
print("loss=%f" % loss)
plt.bar(range(10), softmax(output.data.numpy()).reshape(10))
[[  5.66594899e-05   2.38073189e-05   2.38073189e-05   8.17918181e-02
    2.38073189e-05   9.09501135e-01   3.76468706e-05   2.67608830e-05
    2.42880406e-03   6.08565100e-03]]
loss=0.094859

Now feed a random noise image and see what the network outputs:

In [22]:
rd = np.random.randn(28,28).astype(np.float32)
plt.imshow(rd, cmap=cm.Greys_r)
rd = np.expand_dims(rd, axis=0)
rd = np.expand_dims(rd, axis=0)
rd = torch.from_numpy(rd)
rd = Variable(rd)
In [23]:
rd_output = model(rd)
print(rd_output.data.numpy())
plt.bar(range(10),softmax(rd_output.data.numpy()).reshape(10))
[[-3.8664391  -0.41642177 -2.20800877 -3.36911607 -4.16607761 -4.2404933
  -3.99021435 -3.06137681 -2.71522784 -4.27561426]]

That covers the whole training process of the network. The main steps are:

1. Feed data through the model to obtain its output, usually called the prediction; this pass is called the forward pass

2. Compute the loss from the prediction and the sample label

3. Backpropagate the loss to obtain the gradient of the loss with respect to the network parameters

4. Update the network parameters with the gradients from Step 3

Steps 1-4 together form one iteration; we run many iterations (often tens of thousands) until the model converges.
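The four steps above can be sketched end to end with plain NumPy gradient descent on a toy least-squares problem (a self-contained illustration, not the MNIST network):

```python
import numpy as np

# Toy data: y = X @ true_w, with no noise
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)  # initial parameters
lr = 0.1
for it in range(200):
    pred = X @ w                          # Step 1: forward pass
    loss = np.mean((pred - y) ** 2)       # Step 2: compute the loss
    grad = 2 * X.T @ (pred - y) / len(y)  # Step 3: gradient of loss w.r.t. w
    w -= lr * grad                        # Step 4: update the parameters
print(np.round(w, 3))  # close to [ 1.  -2.   0.5]
```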




Finally, let me recommend this tool: Jupyter

You can use it to:

* write a piece of code: print("Hello world")

* run a Python script:

In [24]:
sum1 = 0
for i in range(1, 101):
    sum1 = sum1 + i
print("Sum from 1 to 100 =", sum1)
Sum from 1 to 100 = 5050

* write a LaTeX formula:

$$ f(x) = \sum_{i=1}^{i=N} x_i $$

* write more complex formulas:

$$ \begin{equation} \mathbf{J} = \frac{d \mathbf{f}}{d \mathbf{x}} = \left[ \frac{\partial \mathbf{f}}{\partial x_1} \cdots \frac{\partial \mathbf{f}}{\partial x_n} \right] = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \end{equation} $$

* insert an image

* draw a chart:

In [25]:
x = np.arange(1,10,0.01)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.plot(x, np.cos(x-np.pi/4))
plt.xlabel('x')
plt.ylabel('sin(x)/cos(x)')
In [ ]: