# MNIST Handwritten Digit Recognition with a Neural Network

Just a lab assignment for the "Artificial Intelligence" elective course.
[TOC]
## 1. Objectives

- Learn to apply neural network models to supervised learning problems.
- Learn the model training and testing methods commonly used in machine learning.
- Understand how different training and evaluation choices affect the test results.
## 2. Experiment Content

### The MNIST dataset

The dataset used in this experiment is MNIST, a collection of handwritten digit images together with their labels. Every image is 28x28 pixels, and the digits have been preprocessed so that they sit in the center of the image. The dataset is stored in a binary format: each image is a flattened vector of length 784 (28x28x1, i.e. a single-channel 28x28 grayscale image), and each label is a one-hot vector of length 10.
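Before training, it helps to load the data and confirm these shapes. A minimal sketch, assuming the TensorFlow 1.x tutorial reader and a local `MNIST_data/` directory (the loading code actually used in this experiment may differ):

```python
from tensorflow.examples.tutorials.mnist import input_data

# Download (if needed) and load MNIST with one-hot encoded labels
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape)  # (55000, 784): flattened 28x28 grayscale images
print(mnist.train.labels.shape)  # (55000, 10):  one-hot encoded digit labels
```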
### Stratified sampling

Stratified sampling divides the population into several classes (strata) and then samples within each class separately. Because every class is sampled individually, the class distribution of the sample matches that of the population, which makes the sample more representative. In this experiment the MNIST data contains the ten digits 0-9, so stratified sampling first splits the dataset into ten classes by digit and then samples each class in the same way.
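A minimal sketch of the idea in NumPy, assuming one-hot labels as described above (the helper name and structure are illustrative, not the experiment's actual code):

```python
import numpy as np

def stratified_sample(images, labels, fraction, seed=0):
    """Sample the same fraction of examples from every digit class."""
    rng = np.random.RandomState(seed)
    digits = np.argmax(labels, axis=1)              # one-hot -> class index 0..9
    chosen = []
    for d in range(10):
        idx = np.where(digits == d)[0]              # all examples of digit d
        rng.shuffle(idx)
        chosen.append(idx[:int(len(idx) * fraction)])
    chosen = np.concatenate(chosen)
    return images[chosen], labels[chosen]
```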
### Evaluating a neural network model

The error of a neural network model is usually estimated experimentally: a test set is used to measure how well the model handles unseen samples, and the test error on that set is taken as an approximation of the true error. Two common ways to split the data into a training set and a test set are:
#### Hold-out

The dataset is split directly into two disjoint sets according to a fixed ratio. To keep the data distribution as consistent as possible, the split can be done with stratified sampling so that the class proportions in the training and test sets stay similar. Note that if the test set does not cover the data distribution evenly, the split itself introduces extra bias, so the estimate obtained from a single hold-out split is often not stable or reliable. In practice, hold-out evaluation is repeated over several random splits and the results are averaged.
#### k-fold cross-validation (k-fold CV)

The dataset is first divided into k disjoint subsets of similar size, each preserving the overall data distribution as much as possible (again via stratified sampling). In each round, the union of k-1 subsets is used for training and the remaining subset for testing, which yields k training/test pairs and therefore k rounds of training and testing. The final result is the mean of the k test results. The stability and fidelity of the estimate clearly depend on the choice of k; the most common value is 10, with 5 and 20 also used frequently.
## 3. Design and Implementation

This section describes the overall program design and the key implementation steps, including:
### 1) Program design for model construction (pseudocode or source code screenshot) and explanation (10 points)

A fully connected network is built with layer sizes 784 -> 128 -> 128 -> 10.
The Adam optimizer is used, and the loss is computed with softmax cross-entropy.
See the code comments for details.
```python
def train_and_test(images_train, labels_train, images_test, labels_test,
                   images_validation, labels_validation):
    # Placeholders for a batch of flattened 28x28 images and their one-hot labels
    x = tf.placeholder(tf.float32, [None, 784], name="X")
    y = tf.placeholder(tf.float32, [None, 10], name="Y")

    # Two hidden layers with ReLU activation, followed by a linear output layer
    h1 = fcn_layer(inputs=x, input_dim=784, output_dim=128, activation=tf.nn.relu)
    h2 = fcn_layer(inputs=h1, input_dim=128, output_dim=128, activation=tf.nn.relu)
    forward = fcn_layer(inputs=h2, input_dim=128, output_dim=10, activation=None)
    pred = tf.nn.softmax(forward)

    # Softmax cross-entropy loss, computed from the raw logits
    loss_function = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=forward, labels=y))

    # Adam optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)

    # Accuracy: fraction of samples whose predicted class matches the label
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
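For reference, the trainable parameter count of this 784->128->128->10 network (weights plus biases per layer) works out as follows:

```python
# weights + biases of each fully connected layer in the 784->128->128->10 network
n_params = (784 * 128 + 128) + (128 * 128 + 128) + (128 * 10 + 10)
print(n_params)  # 100480 + 16512 + 1290 = 118282
```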
The `fcn_layer` helper used above is:
```python
def fcn_layer(inputs, input_dim, output_dim, activation=None):
    # Weights initialized from a truncated normal distribution, biases at zero
    W = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1))
    b = tf.Variable(tf.zeros([output_dim]))
    XWb = tf.matmul(inputs, W) + b
    # Apply the activation function if one is given, otherwise return the logits
    return XWb if activation is None else activation(XWb)
```
### 2) Program design for iterative model training (pseudocode or source code screenshot) and explanation (10 points)

See the code comments for details.
```python
# Training hyperparameters
train_epochs = 32
batch_size = 64
display_step = 4096
learning_rate = 0.001

optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    step = 0
    # batch_iter yields shuffled mini-batches for the requested number of epochs
    for (batchImages, batchLabels) in batch_iter(images_train, labels_train,
                                                 batch_size, train_epochs, shuffle=True):
        # One optimization step on the current mini-batch
        sess.run(optimizer, feed_dict={x: batchImages, y: batchLabels})
```
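`batch_iter` is a small batching helper that is not listed here. A possible sketch of such a generator, shown only as an assumption of what it does (shuffle once per epoch, then yield mini-batches):

```python
import numpy as np

def batch_iter(images, labels, batch_size, num_epochs, shuffle=True):
    """Yield (batch_images, batch_labels) mini-batches for num_epochs passes over the data."""
    n = len(images)
    for _ in range(num_epochs):
        order = np.random.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield images[idx], labels[idx]
```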
### 3) Program design for periodic evaluation during training (pseudocode or source code screenshot) and explanation (10 points)

"Periodic evaluation" means that the model is evaluated once every n training steps, producing an accuracy and a loss value. See the code comments for details.
```python
display_step = 4096

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    step = 0
    for (batchImages, batchLabels) in batch_iter(images_train, labels_train,
                                                 batch_size, train_epochs, shuffle=True):
        sess.run(optimizer, feed_dict={x: batchImages, y: batchLabels})
        # Every display_step steps, evaluate loss and accuracy on the validation set
        if step % display_step == 0:
            loss, acc = sess.run([loss_function, accuracy],
                                 feed_dict={x: images_validation, y: labels_validation})
            print(f"step: {step+1} Loss={loss} accuracy={acc}")
        step += 1
```
Output:
```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
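The final `=== test accuracy ... ===` line is produced by evaluating the trained model once on the test split after the training loop finishes; a minimal sketch of that step, assuming it runs inside the same session:

```python
# After the training loop: evaluate the trained model once on the test split
test_acc = sess.run(accuracy, feed_dict={x: images_test, y: labels_test})
print("=== test accuracy:", round(float(test_acc), 4), "===")
```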
### 4) Program design for stratified sampling (pseudocode or source code screenshot) and explanation (10 points)

Implemented with `train_test_split` from `sklearn`. The training and test sets are drawn at random ten times; the per-split accuracies are reported and their average serves as the hold-out estimate.
See the code comments for details.
```python
from sklearn.model_selection import train_test_split

def hold_out(images, labels, train_percentage):
    accu = []
    for _ in range(10):
        # Stratified split: the digit proportions are preserved in both subsets
        train_images, test_images, train_labels, test_labels = \
            train_test_split(images, labels,
                             train_size=train_percentage,
                             stratify=labels)
        accu.append(train_and_test(train_images, train_labels,
                                   test_images, test_labels,
                                   test_images, test_labels))
    print("hold-out accuracy:", accu)
```
### 5) Program design for k-fold cross-validation (pseudocode or source code screenshot) and explanation (10 points)

Implemented with `KFold` from `sklearn`. The mean accuracy over the k different splits is computed.
See the code comments for details.
```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validation(images, labels, k):
    accu = []
    kf = KFold(n_splits=k, shuffle=True)
    for train_index, test_index in kf.split(images):
        # Use k-1 folds for training and the remaining fold for testing
        images_train, images_test = images[train_index], images[test_index]
        labels_train, labels_test = labels[train_index], labels[test_index]
        accu.append(train_and_test(images_train, labels_train,
                                   images_test, labels_test,
                                   images_test, labels_test))
    print("cross-validation accuracy:", np.mean(accu))
```
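Note that `KFold` splits purely by index, while the method description in Section 2 asks for stratified folds. A possible stratified variant using sklearn's `StratifiedKFold` (which expects integer class labels, hence the `argmax`); this is a sketch, not the code used for the results below:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def stratified_cross_validation(images, labels, k):
    accu = []
    digits = np.argmax(labels, axis=1)               # one-hot -> digit class
    skf = StratifiedKFold(n_splits=k, shuffle=True)
    for train_index, test_index in skf.split(images, digits):
        accu.append(train_and_test(images[train_index], labels[train_index],
                                   images[test_index], labels[test_index],
                                   images[test_index], labels[test_index]))
    print("stratified cross-validation accuracy:", np.mean(accu))
```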
## 4. Results

This section presents the program output, the experimental results, and the corresponding analysis, including:
### 1) Accuracy of the model on the validation set (output and screenshot) (10 points)

```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
### 2) Effect of model parameters (number of hidden layers, hidden-layer size) on accuracy, with analysis (10 points)

**Different numbers of hidden layers**
```
step: 1 Loss=2.5073938369750977 accuracy=0.0729999914765358
step: 4097 Loss=0.27769413590431213 accuracy=0.9217997789382935
step: 8193 Loss=0.26662880182266235 accuracy=0.9259997010231018
step: 12289 Loss=0.263393372297287 accuracy=0.9231997728347778
step: 16385 Loss=0.26742368936538696 accuracy=0.9237997531890869
step: 20481 Loss=0.26651620864868164 accuracy=0.9251997470855713
step: 24577 Loss=0.26798802614212036 accuracy=0.9247996807098389
=== test accuracy: 0.9248 ===
0.92479974
```

```
step: 1 Loss=2.4127447605133057 accuracy=0.09719999134540558
step: 4097 Loss=0.08607088774442673 accuracy=0.9745997190475464
step: 8193 Loss=0.07784661650657654 accuracy=0.9785997271537781
step: 12289 Loss=0.095745749771595 accuracy=0.9759998321533203
step: 16385 Loss=0.09472983330488205 accuracy=0.9799997210502625
step: 20481 Loss=0.09713517129421234 accuracy=0.9787996411323547
step: 24577 Loss=0.0993366464972496 accuracy=0.9801996946334839
=== test accuracy: 0.9802 ===
0.98019964
```

```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
Conclusion: adding hidden layers lengthens training but yields more accurate results; however, the gain from each additional layer becomes progressively smaller.
**Different hidden-layer sizes**
```
step: 1 Loss=2.300844669342041 accuracy=0.12519998848438263
step: 4097 Loss=0.2754775583744049 accuracy=0.9239997863769531
step: 8193 Loss=0.24036210775375366 accuracy=0.9319997429847717
step: 12289 Loss=0.22833241522312164 accuracy=0.9349997639656067
step: 16385 Loss=0.22694511711597443 accuracy=0.9351996779441833
step: 20481 Loss=0.2160138636827469 accuracy=0.9395997524261475
step: 24577 Loss=0.20927678048610687 accuracy=0.9417997598648071
=== test accuracy: 0.9392 ===
0.93919969
```

```
step: 1 Loss=2.302095890045166 accuracy=0.10459998995065689
step: 4097 Loss=0.24206139147281647 accuracy=0.9285997152328491
step: 8193 Loss=0.19353719055652618 accuracy=0.9429997801780701
step: 12289 Loss=0.18354550004005432 accuracy=0.9491997361183167
step: 16385 Loss=0.18149533867835999 accuracy=0.9485996961593628
step: 20481 Loss=0.1877274215221405 accuracy=0.9493997097015381
step: 24577 Loss=0.1913667917251587 accuracy=0.951799750328064
=== test accuracy: 0.9548 ===
0.95479971
```

```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
Conclusion: increasing the number of hidden units makes the parameter count grow rapidly (the hidden-to-hidden weight matrix grows quadratically with the layer width), so training takes longer, but the results also become more accurate.
Beyond a certain size, however, the accuracy gain levels off and overfitting may even occur. A sketch of how these depth and width variants can be generated appears below.
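To run such variants, the layer-construction part of `train_and_test` can be parameterized by a list of hidden-layer sizes. The helper below (using the `fcn_layer` function from Section 3) is only an illustrative refactoring, not the code used for the runs above:

```python
def build_network(x, hidden_sizes):
    """Stack fully connected ReLU layers of the given sizes on top of the 784-dim input."""
    layer, in_dim = x, 784
    for size in hidden_sizes:
        layer = fcn_layer(inputs=layer, input_dim=in_dim, output_dim=size,
                          activation=tf.nn.relu)
        in_dim = size
    # Linear output layer producing the 10-class logits
    return fcn_layer(inputs=layer, input_dim=in_dim, output_dim=10, activation=None)

# e.g. forward = build_network(x, [128, 128]) reproduces the 784->128->128->10 model
```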
### 3) Effect of training parameters (batch size, number of epochs, learning rate) on accuracy, with analysis (10 points)

**Different batch sizes**
```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```

```
step: 1 Loss=2.2310731410980225 accuracy=0.16619999706745148
=== test accuracy: 0.9718 ===
0.97179973
```

```
step: 1 Loss=2.2919344902038574 accuracy=0.15059998631477356
=== test accuracy: 0.8782 ===
0.87819982
```
Conclusion: the larger the batch size, the faster the training run completes (fewer optimization steps per epoch).
However, an overly large batch size consumes too much GPU memory (possibly overflowing it) and also weakens the stochasticity that benefits stochastic gradient descent. See the worked example after this paragraph.
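The speed-up is simply a matter of step count: the total number of optimization steps is epochs x ceil(N_train / batch_size). A small worked example, assuming roughly 50,000 training images and the hypothetical larger batch sizes 1024 and 8192 (the exact values used in the runs above are not recorded here):

```python
import math

n_train, epochs = 50_000, 32              # assumed training-set size and epoch count
for batch_size in (64, 1024, 8192):
    steps = epochs * math.ceil(n_train / batch_size)
    print(batch_size, steps)              # 64 -> 25024, 1024 -> 1568, 8192 -> 224
```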
**Different numbers of epochs**
```
step: 1 Loss=2.343087673187256 accuracy=0.11779998987913132
step: 129 Loss=0.330795556306839 accuracy=0.9037997722625732
step: 257 Loss=0.24847447872161865 accuracy=0.925399661064148
step: 385 Loss=0.198109969496727 accuracy=0.9409997463226318
step: 513 Loss=0.17987975478172302 accuracy=0.9479997754096985
step: 641 Loss=0.1628917008638382 accuracy=0.9541996717453003
step: 769 Loss=0.14910820126533508 accuracy=0.9547997117042542
=== test accuracy: 0.9526 ===
0.95259976
```

```
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
```
Conclusion: the number of epochs determines how much data the model sees before training stops. Training should ideally stop only after the accuracy has stabilized; stopping earlier leaves the accuracy short of the achievable value.
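This can be automated with a simple early-stopping rule: keep training while the validation accuracy still improves and stop once it has stalled for several consecutive evaluations. A minimal sketch of such a check, meant to be called from the periodic evaluation in Section 3 (the `patience` value is an arbitrary choice, not one used in the experiments):

```python
class EarlyStopping:
    """Signal a stop once validation accuracy has not improved for `patience` checks."""
    def __init__(self, patience=5):
        self.patience, self.best, self.stale = patience, 0.0, 0

    def should_stop(self, val_acc):
        if val_acc > self.best:
            self.best, self.stale = val_acc, 0   # new best: reset the counter
        else:
            self.stale += 1                      # no improvement this check
        return self.stale >= self.patience

# usage inside the evaluation branch:  if stopper.should_stop(acc): break
```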
**Different learning rates**
```
step: 1 Loss=2.328317642211914 accuracy=0.09679999947547913
step: 129 Loss=1.5215153694152832 accuracy=0.7011998891830444
step: 257 Loss=0.8109210133552551 accuracy=0.8205997943878174
step: 385 Loss=0.5582057237625122 accuracy=0.863199770450592
step: 513 Loss=0.4527219235897064 accuracy=0.8837997317314148
step: 641 Loss=0.39591166377067566 accuracy=0.8925997614860535
step: 769 Loss=0.3588014245033264 accuracy=0.8997997641563416
=== test accuracy: 0.9004 ===
0.9003998
```

```
step: 1 Loss=2.343087673187256 accuracy=0.11779998987913132
step: 129 Loss=0.330795556306839 accuracy=0.9037997722625732
step: 257 Loss=0.24847447872161865 accuracy=0.925399661064148
step: 385 Loss=0.198109969496727 accuracy=0.9409997463226318
step: 513 Loss=0.17987975478172302 accuracy=0.9479997754096985
step: 641 Loss=0.1628917008638382 accuracy=0.9541996717453003
step: 769 Loss=0.14910820126533508 accuracy=0.9547997117042542
=== test accuracy: 0.9526 ===
0.95259976
```

```
step: 1 Loss=2.3110170364379883 accuracy=0.2709999680519104
step: 129 Loss=0.2452460527420044 accuracy=0.9277997016906738
step: 257 Loss=0.215981587767601 accuracy=0.9361997246742249
step: 385 Loss=0.21104326844215393 accuracy=0.9363997578620911
step: 513 Loss=0.172766774892807 accuracy=0.9469997882843018
step: 641 Loss=0.14438582956790924 accuracy=0.9573997855186462
step: 769 Loss=0.15849816799163818 accuracy=0.9527996778488159
=== test accuracy: 0.9558 ===
0.95579976
```

```
step: 1 Loss=43.770484924316406 accuracy=0.10619999468326569
step: 129 Loss=1.7850791215896606 accuracy=0.2809999883174896
step: 257 Loss=1.7752128839492798 accuracy=0.3105999827384949
step: 385 Loss=1.719871997833252 accuracy=0.3147999942302704
step: 513 Loss=1.6704318523406982 accuracy=0.3511999845504761
step: 641 Loss=1.6277217864990234 accuracy=0.34059998393058777
step: 769 Loss=1.8401107788085938 accuracy=0.2733999788761139
=== test accuracy: 0.2738 ===
0.27379999
```
Conclusion:

- A smaller learning rate eventually reaches higher accuracy, but it converges slowly.
- A larger learning rate converges faster but oscillates easily; an excessively large one (as in the last run) keeps the loss high and prevents the model from converging at all.
### 4) Effect of different hold-out split ratios on the results, with analysis (10 points)

```python
print("===== hold-out =====")
print("train_percentage: 0.8: ", end='')
hold_out(total_images, total_labels, 0.8)
print("train_percentage: 0.9: ", end='')
hold_out(total_images, total_labels, 0.9)
print("train_percentage: 0.5: ", end='')
hold_out(total_images, total_labels, 0.5)
print("train_percentage: 0.2: ", end='')
hold_out(total_images, total_labels, 0.2)
```
Results:
```
===== hold-out =====
train_percentage: 0.8: hold-out accuracy: [0.97072774, 0.974455, 0.97690958, 0.97781873, 0.97545499, 0.96990955, 0.97654593, 0.97209132, 0.97390956, 0.97772777]
train_percentage: 0.9: hold-out accuracy: [0.97945446, 0.97327256, 0.97381806, 0.97654533, 0.97981811, 0.97472721, 0.97472715, 0.97327256, 0.97436351, 0.97163624]
train_percentage: 0.5: hold-out accuracy: [0.97501898, 0.97083724, 0.97414637, 0.97454625, 0.97087359, 0.97469169, 0.97520077, 0.96894628, 0.97367346, 0.97367346]
train_percentage: 0.2: hold-out accuracy: [0.96202344, 0.95893252, 0.95929617, 0.95911437, 0.95968258, 0.95965987, 0.96059167, 0.96009171, 0.96056885, 0.95806891]
```
Conclusion: extreme values of train_percentage make the evaluation less convincing (with a training share of 0.2 the accuracy drops noticeably); a ratio around 0.8 works best.
### 5) Effect of different k values in k-fold cross-validation, with analysis (10 points)

```python
print("===== cross-validation =====")
print("k=5: ", end='')
cross_validation(total_images, total_labels, 5)
print("k=10: ", end='')
cross_validation(total_images, total_labels, 10)
print("k=20: ", end='')
cross_validation(total_images, total_labels, 20)
print("k=2: ", end='')
cross_validation(total_images, total_labels, 2)
```
Results:
```
===== cross-validation =====
k=5: cross-validation accuracy: 0.975146
k=10: cross-validation accuracy: 0.976927
k=20: cross-validation accuracy: 0.977491
k=2: cross-validation accuracy: 0.973474
```
Conclusion: k-fold cross-validation is somewhat more robust to the choice of its parameter k than the hold-out method is to the choice of split ratio; all tested k values give estimates in the narrow range 0.973-0.977, with larger k slightly higher.
## 5. Summary and Reflections

Using the MNIST handwritten-digit dataset to train a simple digit-recognition network as an example, I learned the basic techniques for training a fully connected neural network with TensorFlow and investigated how the various network and training parameters affect both the training process and the final results.
I also tried out the two model evaluation methods, hold-out and k-fold cross-validation, and explored how their parameters influence the evaluation results.
## References

Installing TensorFlow cleanly with Anaconda (no manual installation of CUDA, cuDNN, etc.):

```
conda create --name tf_gpu_env python=3.6 anaconda tensorflow-gpu
```
不踩坑:Ubuntu 下安装 TensorFlow 的最简单方法(无需手动安装 CUDA 和 cuDNN) - 知乎 (zhihu.com)
Fix for a problem encountered when running Jupyter:
彻底解决:AttributeError:type object IOLoop has no attribute initialized_Joyyang_c 的博客-CSDN 博客