Handwritten Digit Recognizer
ECE 539 Project
By Ryan Bambrough
University of Wisconsin–Madison

Introduction:

For my project this semester I designed a neural network to recognize handwritten digits. The inspiration for this project stems from the Kaggle digit recognizer competition, whose goal is to take an image of a handwritten digit between 0 and 9 and determine which digit is present. The data set provided by Kaggle is based on the MNIST data set, a well-known collection of handwritten digits. Kaggle uses a slightly modified version of this data set consisting of 42,000 training images and 28,000 testing images. The images are 28 by 28 pixels in grayscale, with values between 0 and 255, and the digit appears in different parts of the image and in different writing styles.

I wanted to do this project to get a better grasp of how neural networks are applied to practical problems, i.e. those outside the academic realm. Before starting work I researched a few of the applications that a solution to this problem would be well suited for, and I was surprised by how prevalent it is in our daily lives. Handwritten digit recognition is used by the USPS to read the addresses written on letters and by digital ink-to-text programs like OneNote. Throughout my work I tried to follow the idea of K.I.S.S. (keep it simple, stupid), that is, to achieve the best possible results with the simplest implementation. My goal for this project was to get above a 90% classification rate using as simple a network as possible. There are many approaches to this problem, and each of them suffers from the law of diminishing returns; more on that topic later. I believe I achieved my goals and learned a lot along the way.

Project Process:

Starting this project was a bit daunting because of the huge variety of possible paths toward a solution. There are many different neural network types, each with its own pros and cons. Deciding to stick with a simple solution, I went with a feedforward network trained by backpropagation, with one hidden layer and one output layer. At this point I still needed to determine the best sub-configuration of this network to solve the problem. After deciding what type of network I would be using, I tried to use Matlab's Neural Network Toolbox to solve the problem. However, I faced a few difficulties in doing so, which I describe below, and I ended up using a Python implementation.
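To make the chosen network type concrete, below is a minimal NumPy sketch of a single forward pass through a network of this shape: 784 inputs, one sigmoid hidden layer, and 10 outputs. The weights here are random placeholders rather than trained values; training by backpropagation is handled by the code base described in the sections that follow.

    import numpy as np

    def sigmoid(z):
        # Squash each value into (0, 1); this is the activation used throughout.
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 784, 10, 10    # one input per pixel, one output per digit

    # Untrained placeholder weights and biases for the hidden and output layers.
    W1, b1 = rng.standard_normal((n_hidden, n_in)), np.zeros((n_hidden, 1))
    W2, b2 = rng.standard_normal((n_out, n_hidden)), np.zeros((n_out, 1))

    x = rng.random((n_in, 1))              # stand-in for one flattened 28x28 image
    hidden = sigmoid(W1 @ x + b1)          # hidden-layer activations
    output = sigmoid(W2 @ hidden + b2)     # 10 outputs; the largest is the predicted digit
    print("predicted digit:", int(np.argmax(output)))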
Matlab Implementation

I started off trying to create the neural network using the tools Matlab provides. I was able to read in the .csv file provided by Kaggle and set up a basic network with 784 input nodes, 10 hidden nodes, and 10 output nodes, but when I tried to train it on the training data I ran into a lot of issues. Primary among them was the time it took to run. I am not sure whether it was the setup I used or the amount of data I was trying to process, but training took roughly 5 to 6 hours on my laptop, which has an i7 and 16 GB of RAM, and the results were not very good once it finished. After a few runs I abandoned Matlab and moved on to a Python implementation, after posts online suggested that this type of implementation in Python would train in under 45 minutes. I didn't spend much more time trying to get the Matlab implementation working, deciding instead to focus on getting a Python implementation working.

Python Implementation

I had previous experience with Python, so it wasn't that big of a switch, and I was helped along the way when I found a package of Python code for the type of network I wanted to implement. For my Python implementation I used the base code provided by the online book Neural Networks and Deep Learning by Michael Nielsen. Running this code with the provided data set takes roughly 30 minutes, depending on the setup. After establishing that the Python implementation would be much faster, I wrote a program to take the data provided by Kaggle and pre-process it before piping it into the code base. This took a bit longer to implement than I expected because of some Python setup issues on my computer. Once I had the program written, I realized that the test data provided by Kaggle isn't labeled. Kaggle lets you check the network's output by submitting a document of predictions to their website, but you can only submit five times per day, which wasn't ideal; given my schedule I could only work on this project one day per week. So instead of using the Kaggle data I went with the data that the code base provided. That data set had the testing data already labeled and properly formatted for the network; it is the same MNIST data, just formatted a little differently than Kaggle's version.

Once I was able to properly test different network configurations and had the data formatted, I started testing different values for the network's parameters. I created a simple feedforward, backpropagation network with a configurable number of neurons in the input, hidden, and output layers. The implementation also had parameters for mini-batch training (taking a small random sample of the training data for each weight update), the learning rate, and the number of epochs. I then started the long process of trying different network configurations and seeing which ones provided the best solution. I was able to change the learning rate, number of epochs, batch size, number of neurons in each layer, and activation function. For all the tests I used 784 input neurons, one for each pixel, and 10 output neurons, one for each possible digit. The output was configured to be one-hot: only one output is high for a given input.
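A simplified sketch of the kind of pre-processing described above is shown below: it reads the Kaggle training .csv, scales the 0-255 grayscale pixels to [0, 1], and converts each label to a one-hot 10-element column vector in the (input, label) pair format the network code expects. The file name and the exact row layout (a label followed by 784 pixel values) are assumptions based on the Kaggle data set, not the exact program used for the project.

    import csv
    import numpy as np

    def one_hot(digit):
        # 10x1 column vector with a 1.0 in the position of the digit (one-hot output).
        y = np.zeros((10, 1))
        y[digit] = 1.0
        return y

    def load_kaggle_training_data(path="train.csv"):
        # Assumed layout: a header row, then one image per row as
        # label, pixel0, ..., pixel783 with grayscale values 0-255.
        samples = []
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)                                        # skip the header row
            for row in reader:
                label = int(row[0])
                pixels = np.array(row[1:], dtype=float) / 255.0  # scale to [0, 1]
                samples.append((pixels.reshape(784, 1), one_hot(label)))
        return samples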
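Training itself was driven by the code base from Nielsen's book. A typical run, corresponding to one of the rows in the results table in the next section, looked roughly like the sketch below; it assumes the network.py and mnist_loader.py modules that accompany the book are available, and the exact module and function names may differ slightly between versions of that code.

    import mnist_loader
    import network

    # Load the MNIST data in the format the code base expects: 50,000 training
    # examples and 10,000 test examples, each image flattened into a
    # 784-element column vector, with training labels one-hot encoded.
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

    # 784 input neurons (one per pixel), 30 hidden neurons, 10 output neurons
    # (one per digit); every layer uses the sigmoid activation.
    net = network.Network([784, 30, 10])

    # Stochastic gradient descent: 30 epochs, mini-batches of 100 samples,
    # learning rate of 3.0 (one of the configurations in the results table).
    net.SGD(training_data, 30, 100, 3.0, test_data=test_data)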
Results and Discussions:

After I had the network set up and ready to be tested, I started investigating the effect each parameter had on the network's performance. I created a baseline network against which to measure all improvements: 784 input nodes, one for each pixel, 10 hidden nodes, and 10 output nodes, one for each of the possible outputs 0 through 9. The data set I ended up using had 50,000 images for training and 10,000 images for testing; images from the original test set were reallocated to the training set to provide more samples for training. From what I was seeing, this problem has a fairly small solution space. Listed below are a few of the main configurations I tried. Certain setups performed really well, above the 90% mark I was looking for, while a few just weren't able to find the solution space at all.

Hidden Neurons   Learning Rate   Batch Size   Epochs   Result
10               1.0             100          30       88.9 %
10               3.0             100          30       90.6 %
30               1.0             100          30       91.7 %
30               3.0             100          30       84.8 %
10               1.0             1000         30       64.2 %
30               1.0             1000         30       84.3 %
30               3.0             100          100      94.5 %
100              1.0             100          30       81.6 %
100              3.0             100          30       74.4 %
30               10.0            100          30       94.0 %
50               10.0            100          30       95.3 %

The results listed above all use a sigmoid activation function. I tried other activation functions, but they didn't provide any benefit, so I stuck with the sigmoid.

One of the big things I noticed is that batch size and epoch count didn't matter much beyond a certain threshold; past it they produced roughly the same results. As I expected at the beginning, the law of diminishing returns shows up in this network: you can throw an enormous amount of computing power at the problem, process every sample, and thoroughly work through the data set, but there won't be much improvement for doing so. The learning rate and the number of neurons in the hidden layer were what really determined whether a network performed well. Each parameter had a range in which it performed best, and moving it outside that "sweet zone" hurt performance; anything too far from those zones destroyed it completely. Additionally, the more hidden neurons, the longer the network takes to train; the 100-hidden-neuron networks took roughly 60 minutes.

After running through all the tests I found that simplicity really was best. More complicated networks with more hidden nodes seemed to over-fit the data and performed poorly, and the opposite was also true: too simple a network wasn't able to properly classify the data. The KISS principle I tried to keep to only applied above a certain necessary level of complexity.

Conclusions:

Going through the entire process of this project, I got a glimpse of the power that artificial neural networks provide. They are fantastic tools for problem solving, especially when you have a large set of pre-classified data, as I did. Their power comes from their ability to work through the data and modify themselves, searching for a solution space that works for all of the data they are given.

If I were to do this project again, I would have loved to create a more complicated network and compare it to the basic one. It would be interesting to see how much computation is required to reach a certain level of performance and to compare the efficiency of different networks at solving different types of problems. Even though I was dealing with a relatively simple problem, I saw the complexity it takes to get a network, even a simple one, optimized to solve it. Problems can have an extremely small solution space that is hard to get into, and finding a setup that is even capable of reaching that solution space can be a problem in and of itself. I can only imagine the complexity that experts working with artificial neural networks go through to train their networks. Overall, I loved doing this project and wish I could have put more time into exploring a more complex network. Learning the basics of neural networks has taught me that they are tough to get right, but when you do, they provide an invaluable problem-solving tool.

References:

Neural Networks and Deep Learning, by Michael Nielsen.
Lecture slides on the class website.