
[Artificial Intelligence Elective] Lab Report

MNIST Handwritten Digit Recognition with a Neural Network

Just the lab assignment for the "Artificial Intelligence" elective course.

[TOC]

I. Objectives

  • Learn to apply neural network models to supervised learning problems
  • Learn the model training and testing methods commonly used in machine learning
  • Understand how the choice of training method affects the test results

II. Experiment Content

The MNIST Dataset

The MNIST dataset used in this experiment is a collection of handwritten digit images together with their labels. Every image is 28x28 pixels, and the digits have been preprocessed so that they sit in the center of the image. The dataset is stored in binary format: each image is a flattened vector of length 784 (28x28x1, i.e. a single-channel 28x28 grayscale image), and each label is a one-hot vector of length 10.
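As a quick sanity check of these shapes, the dataset can be loaded with the `input_data` helper that ships with TensorFlow 1.x. The loading code actually used in this report is not shown, so this is only a sketch:

```python
# Minimal sketch (TF 1.x): load MNIST as flattened 784-d images with one-hot labels
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)   # (55000, 784)  -- each row is a flattened 28x28 image
print(mnist.train.labels.shape)   # (55000, 10)   -- each row is a one-hot label
```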

Stratified Sampling

Stratified sampling divides the overall data into several classes and then samples within each class separately. Because sampling is done per class, the class distribution of the sample resembles that of the whole dataset, which makes the sample more representative. In this experiment the MNIST data contains the 10 digits 0–9, so stratified sampling first splits the data into 10 classes by digit and then samples each class in the same way.
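A minimal sketch of a stratified split with scikit-learn, assuming integer class labels (the hold-out code in Section III.4 uses the same `stratify` argument):

```python
# Minimal sketch: stratified train/test split (assumes integer class labels 0-9)
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.repeat(np.arange(10), 100)        # toy labels: 100 samples per digit
images = np.random.rand(len(labels), 784)     # toy "images"

_, _, _, test_labels = train_test_split(
    images, labels, train_size=0.8, stratify=labels)

# each digit keeps roughly its original share of the data in the test split
print(np.bincount(test_labels))               # -> [20 20 20 ... 20]
```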

Evaluating a Neural Network Model

Typically, we estimate a neural network model's error through empirical testing: a test set is used to measure the model's ability to classify new samples, and the test error on that set is taken as an approximation of the generalization error. Two common ways of splitting the data into a training set and a test set are:

The hold-out method splits the dataset directly into two disjoint sets according to a given ratio. To keep the data distribution as consistent as possible, stratified sampling can be used so that the class proportions in the training and test sets are similar. Note that if the test split does not represent the overall distribution well, extra bias is introduced, so a single hold-out estimate is often not stable or reliable enough. In practice, the hold-out method is repeated over several random splits and the average of the repeated evaluations is reported.

k-fold cross validation first partitions the dataset into k mutually exclusive subsets of similar size, each keeping the data distribution as consistent as possible (i.e. again using stratified sampling). In each round, the union of k-1 subsets is used as the training set and the remaining subset as the test set, giving k training/test pairs and hence k rounds of training and testing; the mean of the k test results is reported. The stability and fidelity of the estimate depend largely on the value of k. The most common choice is k = 10; 5 and 20 are also frequently used.
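If each fold should also preserve the class proportions, scikit-learn's `StratifiedKFold` can be used instead of plain `KFold`. A minimal sketch with integer labels (the cross-validation code later in this report uses the unstratified `KFold`):

```python
# Minimal sketch: stratified k-fold splitting (assumes integer class labels 0-9)
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.repeat(np.arange(10), 100)
images = np.random.rand(len(labels), 784)

skf = StratifiedKFold(n_splits=10, shuffle=True)
for train_index, test_index in skf.split(images, labels):
    # every fold keeps ~10 samples of each digit in its test part
    print(np.bincount(labels[test_index]))
    break
```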

III. Experiment Design

This section describes the overall program design and the implementation approach for the key steps, including:

1) Program design for model construction (pseudocode or source code screenshot) and explanation (10 points)

A fully connected neural network is built, with layer sizes 784 -> 128 -> 128 -> 10.

The Adam optimizer is used, and the loss is computed as the softmax cross-entropy between the network outputs and the labels.
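For reference, the loss computed below by `softmax_cross_entropy_with_logits` corresponds to the batch-averaged cross-entropy

$$
\mathcal{L} = -\frac{1}{B}\sum_{b=1}^{B}\sum_{i=0}^{9} y_{b,i}\,\log\big(\mathrm{softmax}(z_b)_i\big),
\qquad
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}},
$$

where $z_b$ are the logits (`forward`), $y_b$ the one-hot labels, and $B$ the batch size.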

See the code comments for details.

```python
# Build and train the model (TensorFlow 1.x)
import tensorflow as tf

def train_and_test(images_train, labels_train, images_test, labels_test,
                   images_validation, labels_validation):
    x = tf.placeholder(tf.float32, [None, 784], name="X")  # input images (flattened)
    y = tf.placeholder(tf.float32, [None, 10], name="Y")   # one-hot labels
    h1 = fcn_layer(inputs=x,
                   input_dim=784,
                   output_dim=128,
                   activation=tf.nn.relu)

    h2 = fcn_layer(inputs=h1,
                   input_dim=128,
                   output_dim=128,
                   activation=tf.nn.relu)
    forward = fcn_layer(inputs=h2,
                        input_dim=128,
                        output_dim=10,
                        activation=None)
    pred = tf.nn.softmax(forward)

    loss_function = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=forward, labels=y))

    # learning_rate is defined with the other training parameters below
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)  # optimizer
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))  # compare predictions with labels
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```

The `fcn_layer` function:

```python
def fcn_layer(inputs,            # input tensor
              input_dim,         # number of input neurons
              output_dim,        # number of output neurons
              activation=None):  # activation function
    W = tf.Variable(tf.truncated_normal(
        [input_dim, output_dim], stddev=0.1))  # initialize weights from a truncated normal
    b = tf.Variable(tf.zeros([output_dim]))    # initialize biases to 0
    XWb = tf.matmul(inputs, W) + b
    return XWb if activation is None else activation(XWb)
```

2) Program design for iterative model training (pseudocode or source code screenshot) and explanation (10 points)

See the code comments for details.

```python
train_epochs = 32      # number of training epochs
batch_size = 64        # number of samples per training step (batch size)
display_step = 4096    # logging/evaluation interval (in steps)
learning_rate = 0.001  # learning rate

optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)  # optimizer
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))  # compare predictions with labels
# accuracy: cast the booleans to floats and take the mean
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


with tf.Session() as sess:
    init = tf.global_variables_initializer()  # initialize variables
    sess.run(init)

    step = 0
    for (batchImages, batchLabels) in batch_iter(images_train, labels_train,
                                                 batch_size, train_epochs, shuffle=True):
        sess.run(optimizer, feed_dict={x: batchImages, y: batchLabels})
```
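The `batch_iter` generator used above is not listed in this report. A minimal sketch of such a helper, with the name and signature taken from the call above and the body being an assumption, could look like this:

```python
import numpy as np

def batch_iter(images, labels, batch_size, num_epochs, shuffle=True):
    """Hypothetical helper: yield (images, labels) mini-batches for num_epochs passes over the data."""
    n = len(images)
    for _ in range(num_epochs):
        idx = np.random.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            batch_idx = idx[start:start + batch_size]
            yield images[batch_idx], labels[batch_idx]
```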

3) Program design for periodic testing during training (pseudocode or source code screenshot) and explanation (periodic testing means evaluating the model every n training steps to obtain its accuracy and loss) (10 points)

See the code comments for details.

```python
display_step = 4096  # logging/evaluation interval (in steps)
with tf.Session() as sess:
    init = tf.global_variables_initializer()  # initialize variables
    sess.run(init)

    step = 0
    for (batchImages, batchLabels) in batch_iter(images_train, labels_train,
                                                 batch_size, train_epochs, shuffle=True):
        sess.run(optimizer, feed_dict={x: batchImages, y: batchLabels})

        if step % display_step == 0:
            loss, acc = sess.run([loss_function, accuracy],
                                 feed_dict={x: images_validation, y: labels_validation})  # evaluate
            print(f"step: {step+1} Loss={loss} accuracy={acc}")
        step += 1
```

Output:

```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
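The final line comes from evaluating the trained model once on the held-out test set after the training loop finishes. The exact statement is not shown in the listing above; presumably it looks roughly like this (a sketch, still inside the same session):

```python
    # after the training loop, evaluate once on the test set
    test_acc = sess.run(accuracy, feed_dict={x: images_test, y: labels_test})
    print(f"=== test accuracy: {test_acc} ===")
```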

4) Program design for stratified sampling (pseudocode or source code screenshot) and explanation (10 points)

Implemented with `train_test_split` from sklearn. The training and test sets are drawn at random ten times and the results are averaged.

See the code comments for details.

```python
# Hold-out method
from sklearn.model_selection import train_test_split

def hold_out(images, labels, train_percentage):
    accu = []
    # draw ten random train/test splits and evaluate each, then average
    for _ in range(10):
        train_images, test_images, train_labels, test_labels = \
            train_test_split(images,
                             labels,
                             train_size=train_percentage,  # fraction used for training
                             stratify=labels)              # keep the class distribution
        accu.append(train_and_test(train_images, train_labels,
                                   test_images, test_labels,
                                   test_images, test_labels))
    print("hold-out accuracy:", accu)
```

5) Program design for k-fold cross validation (pseudocode or source code screenshot) and explanation (10 points)

Implemented with `KFold` from sklearn (note that plain `KFold` does not stratify the folds). The mean accuracy over the k different splits is computed.

See the code comments for details.

```python
# k-fold cross validation
import numpy as np
from sklearn.model_selection import KFold

def cross_validation(images, labels, k):
    accu = []
    kf = KFold(n_splits=k, shuffle=True)
    for train_index, test_index in kf.split(images):
        images_train, images_test = images[train_index], images[test_index]
        labels_train, labels_test = labels[train_index], labels[test_index]
        accu.append(train_and_test(images_train, labels_train,
                                   images_test, labels_test,
                                   images_test, labels_test))
    print("cross-validation accuracy:", np.mean(accu))
```

IV. Results

This section presents the program output, the results, and the related analysis, including:

1) Model accuracy on the validation set (output and screenshot) (10 points)

```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```

2) Effect of different model parameters (number of hidden layers, number of hidden nodes) on accuracy, with analysis (10 points)

Different numbers of hidden layers

  • 0 hidden layers:
```
step: 1 Loss=2.5073938369750977 accuracy=0.0729999914765358
step: 4097 Loss=0.27769413590431213 accuracy=0.9217997789382935
step: 8193 Loss=0.26662880182266235 accuracy=0.9259997010231018
step: 12289 Loss=0.263393372297287 accuracy=0.9231997728347778
step: 16385 Loss=0.26742368936538696 accuracy=0.9237997531890869
step: 20481 Loss=0.26651620864868164 accuracy=0.9251997470855713
step: 24577 Loss=0.26798802614212036 accuracy=0.9247996807098389
=== test accuracy: 0.9248 ===
0.92479974
```
  • 1 hidden layer:
```
step: 1 Loss=2.4127447605133057 accuracy=0.09719999134540558
step: 4097 Loss=0.08607088774442673 accuracy=0.9745997190475464
step: 8193 Loss=0.07784661650657654 accuracy=0.9785997271537781
step: 12289 Loss=0.095745749771595 accuracy=0.9759998321533203
step: 16385 Loss=0.09472983330488205 accuracy=0.9799997210502625
step: 20481 Loss=0.09713517129421234 accuracy=0.9787996411323547
step: 24577 Loss=0.0993366464972496 accuracy=0.9801996946334839
=== test accuracy: 0.9802 ===
0.98019964
```
  • 2 hidden layers:
```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```

From these results: adding hidden layers lengthens training but improves accuracy over having no hidden layer at all; however, the gain from adding further layers quickly becomes negligible (here the 2-layer network is no more accurate than the 1-layer one). The comparison can be automated with the small helper sketched below.
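A sketch of how the model-construction code could be parameterized by a list of hidden-layer sizes instead of hard-coding two layers. It reuses the `fcn_layer` helper above; `build_mlp` and `hidden_dims` are hypothetical names, not from the original code:

```python
def build_mlp(x, hidden_dims, input_dim=784, num_classes=10):
    """Stack a variable number of ReLU hidden layers, e.g. hidden_dims = [], [128] or [128, 128]."""
    h, dim = x, input_dim
    for out_dim in hidden_dims:
        h = fcn_layer(inputs=h, input_dim=dim, output_dim=out_dim, activation=tf.nn.relu)
        dim = out_dim
    # output layer: raw logits, no activation (softmax is applied inside the loss)
    return fcn_layer(inputs=h, input_dim=dim, output_dim=num_classes, activation=None)
```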

Different numbers of hidden nodes

  • Hidden node counts 10*10:
```
step: 1 Loss=2.300844669342041 accuracy=0.12519998848438263
step: 4097 Loss=0.2754775583744049 accuracy=0.9239997863769531
step: 8193 Loss=0.24036210775375366 accuracy=0.9319997429847717
step: 12289 Loss=0.22833241522312164 accuracy=0.9349997639656067
step: 16385 Loss=0.22694511711597443 accuracy=0.9351996779441833
step: 20481 Loss=0.2160138636827469 accuracy=0.9395997524261475
step: 24577 Loss=0.20927678048610687 accuracy=0.9417997598648071
=== test accuracy: 0.9392 ===
0.93919969
```
  • Hidden node counts 16*16:
```
step: 1 Loss=2.302095890045166 accuracy=0.10459998995065689
step: 4097 Loss=0.24206139147281647 accuracy=0.9285997152328491
step: 8193 Loss=0.19353719055652618 accuracy=0.9429997801780701
step: 12289 Loss=0.18354550004005432 accuracy=0.9491997361183167
step: 16385 Loss=0.18149533867835999 accuracy=0.9485996961593628
step: 20481 Loss=0.1877274215221405 accuracy=0.9493997097015381
step: 24577 Loss=0.1913667917251587 accuracy=0.951799750328064
=== test accuracy: 0.9548 ===
0.95479971
```
  • Hidden node counts 128*128:
```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```

From these results:

The more hidden nodes, the faster the parameter count grows, the longer training takes, and the more accurate the results.

(However, once the parameter count is large enough, the accuracy gain flattens out and the model may even start to overfit.)
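Counting weights and biases makes this growth concrete for a 784 -> n -> n -> 10 fully connected network (a quick calculation for the three configurations above):

```python
# weights + biases of a 784 -> n -> n -> 10 fully connected network
def num_params(n):
    return (784 * n + n) + (n * n + n) + (n * 10 + 10)

for n in (10, 16, 128):
    print(n, num_params(n))   # 10 -> 8,070   16 -> 13,002   128 -> 118,282
```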

3) Effect of different training parameters (batch size, epoch num, learning rate) on accuracy, with analysis (10 points)

Different batch sizes

  • batch size = 64
```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
  • batch size = 4096
```
step: 1 Loss=2.2310731410980225 accuracy=0.16619999706745148
=== test accuracy: 0.9718 ===
0.97179973
```
  • batch size = 32786
```
step: 1 Loss=2.2919344902038574 accuracy=0.15059998631477356
=== test accuracy: 0.8782 ===
0.87819982
```

From these results: the larger the batch size, the faster training runs, since a fixed number of epochs then requires far fewer optimizer steps.

However, an overly large batch size consumes too much GPU memory (and can even overflow it), and it also weakens the stochasticity that stochastic gradient descent benefits from, which shows in the lower test accuracy at batch size 32786. A rough illustration of the step-count effect follows.
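A quick calculation of steps per epoch for the three batch sizes (55,000 is only an example training-set size; the actual number depends on the split used):

```python
import math

n_train = 55000  # example training-set size (assumption)
for batch_size in (64, 4096, 32786):
    print(batch_size, math.ceil(n_train / batch_size), "steps per epoch")
# 64 -> 860, 4096 -> 14, 32786 -> 2 steps per epoch
```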

Different epoch numbers

  • epoch num = 1
```
step: 1 Loss=2.343087673187256 accuracy=0.11779998987913132
step: 129 Loss=0.330795556306839 accuracy=0.9037997722625732
step: 257 Loss=0.24847447872161865 accuracy=0.925399661064148
step: 385 Loss=0.198109969496727 accuracy=0.9409997463226318
step: 513 Loss=0.17987975478172302 accuracy=0.9479997754096985
step: 641 Loss=0.1628917008638382 accuracy=0.9541996717453003
step: 769 Loss=0.14910820126533508 accuracy=0.9547997117042542
=== test accuracy: 0.9526 ===
0.95259976
```
  • epoch num = 32
```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```

From these results: the epoch num determines how much data the model is trained on before training stops. Training should ideally stop only after the model's accuracy has stabilized; stopping earlier leaves the accuracy below what the model could otherwise reach.

Different learning rates

  • learning rate = 0.0001
```
step: 1 Loss=2.328317642211914 accuracy=0.09679999947547913
step: 129 Loss=1.5215153694152832 accuracy=0.7011998891830444
step: 257 Loss=0.8109210133552551 accuracy=0.8205997943878174
step: 385 Loss=0.5582057237625122 accuracy=0.863199770450592
step: 513 Loss=0.4527219235897064 accuracy=0.8837997317314148
step: 641 Loss=0.39591166377067566 accuracy=0.8925997614860535
step: 769 Loss=0.3588014245033264 accuracy=0.8997997641563416
=== test accuracy: 0.9004 ===
0.9003998
```
  • learning rate = 0.001
```
step: 1 Loss=2.343087673187256 accuracy=0.11779998987913132
step: 129 Loss=0.330795556306839 accuracy=0.9037997722625732
step: 257 Loss=0.24847447872161865 accuracy=0.925399661064148
step: 385 Loss=0.198109969496727 accuracy=0.9409997463226318
step: 513 Loss=0.17987975478172302 accuracy=0.9479997754096985
step: 641 Loss=0.1628917008638382 accuracy=0.9541996717453003
step: 769 Loss=0.14910820126533508 accuracy=0.9547997117042542
=== test accuracy: 0.9526 ===
0.95259976
```
  • learning rate = 0.01
```
step: 1 Loss=2.3110170364379883 accuracy=0.2709999680519104
step: 129 Loss=0.2452460527420044 accuracy=0.9277997016906738
step: 257 Loss=0.215981587767601 accuracy=0.9361997246742249
step: 385 Loss=0.21104326844215393 accuracy=0.9363997578620911
step: 513 Loss=0.172766774892807 accuracy=0.9469997882843018
step: 641 Loss=0.14438582956790924 accuracy=0.9573997855186462
step: 769 Loss=0.15849816799163818 accuracy=0.9527996778488159
=== test accuracy: 0.9558 ===
0.95579976
```
  • learning rate = 0.1
```
step: 1 Loss=43.770484924316406 accuracy=0.10619999468326569
step: 129 Loss=1.7850791215896606 accuracy=0.2809999883174896
step: 257 Loss=1.7752128839492798 accuracy=0.3105999827384949
step: 385 Loss=1.719871997833252 accuracy=0.3147999942302704
step: 513 Loss=1.6704318523406982 accuracy=0.3511999845504761
step: 641 Loss=1.6277217864990234 accuracy=0.34059998393058777
step: 769 Loss=1.8401107788085938 accuracy=0.2733999788761139
=== test accuracy: 0.2738 ===
0.27379999
```

From these results:

A smaller learning rate eventually gives stable, accurate results, but convergence is slow (at 0.0001 the model has clearly not converged within one epoch).

A larger learning rate speeds up learning but easily causes oscillation; at 0.1 training oscillates so badly that it fails to converge at all.

4) Effect of different hold-out split ratios on the results, with analysis (10 points)

print("===== hold-out =====")
print("train_percentage: 0.8: ", end='')
hold_out(total_images, total_labels, 0.8)
print("train_percentage: 0.9: ", end='')
hold_out(total_images, total_labels, 0.9)
print("train_percentage: 0.5: ", end='')
hold_out(total_images, total_labels, 0.5)
print("train_percentage: 0.2: ", end='')
hold_out(total_images, total_labels, 0.2)

Results:

```
===== hold-out =====

train_percentage: 0.8:
hold-out accuracy: [0.97072774, 0.974455, 0.97690958, 0.97781873, 0.97545499, 0.96990955, 0.97654593, 0.97209132, 0.97390956, 0.97772777]

train_percentage: 0.9:
hold-out accuracy: [0.97945446, 0.97327256, 0.97381806, 0.97654533, 0.97981811, 0.97472721, 0.97472715, 0.97327256, 0.97436351, 0.97163624]

train_percentage: 0.5:
hold-out accuracy: [0.97501898, 0.97083724, 0.97414637, 0.97454625, 0.97087359, 0.97469169, 0.97520077, 0.96894628, 0.97367346, 0.97367346]

train_percentage: 0.2:
hold-out accuracy: [0.96202344, 0.95893252, 0.95929617, 0.95911437, 0.95968258, 0.95965987, 0.96059167, 0.96009171, 0.96056885, 0.95806891]
```

From these results: an extreme train_percentage makes the evaluation less convincing (at 0.2 the accuracy drops noticeably); a ratio around 0.8 works best.

5) Effect of different k values in k-fold cross validation on the results, with analysis (10 points)

print("===== cross-validation =====")
print("k=5: ", end='')
cross_validation(total_images, total_labels, 5)
print("k=10: ", end='')
cross_validation(total_images, total_labels, 10)
print("k=20: ", end='')
cross_validation(total_images, total_labels, 20)
print("k=2: ", end='')
cross_validation(total_images, total_labels, 2)

Results:

```
===== cross-validation =====
k=5:
cross-validation accuracy: 0.975146

k=10:
cross-validation accuracy: 0.976927

k=20:
cross-validation accuracy: 0.977491

k=2:
cross-validation accuracy: 0.973474
```

From these results: k-fold cross validation is somewhat more robust to the choice of k than the hold-out method is to the choice of split ratio.

V. Summary and Reflections

Using the MNIST handwritten digit dataset to train a simple digit-recognition neural network as an example, I learned the techniques of training a fully connected neural network with TensorFlow and explored how the network's various parameters affect the training process and the final results.

I also tried the two model-evaluation methods, hold-out and k-fold cross validation, and explored how their parameters affect the evaluation results.

References

Installing TensorFlow cleanly with anaconda (no need to install cuda, cudnn, etc. manually):

conda create --name tf_gpu_env python=3.6 anaconda tensorflow-gpu

不踩坑:Ubuntu 下安装 TensorFlow 的最简单方法(无需手动安装 CUDA 和 cuDNN) - 知乎 (zhihu.com)

Solution to a problem encountered when running jupyter:

彻底解决:AttributeError:type object IOLoop has no attribute initialized_Joyyang_c 的博客-CSDN 博客