TensorFlow is an open-source library for Machine Learning introduced by Google, and Keras provides a high-level API (wrapper) around it. In this tutorial, we build a simple neural network in Keras, using TensorFlow as the backend. (Click here to learn how to install TensorFlow.)
We recommend using ‘Virtualenv’ to minimise dependency and version conflicts.
In this tech-guide, we implement simple XOR (exclusive OR) logic in a Neural Network. XOR is interesting because it is not a linearly separable problem: we cannot draw a single straight line that separates the inputs whose output is 1 from those whose output is 0.
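A quick way to see this in code: the sketch below (plain NumPy, not Keras; `train_perceptron` is our own illustrative helper) trains a single-layer perceptron, a model that can only learn one straight decision boundary, on the four XOR cases. Because no single line separates the 1s from the 0s, at least one of the four points always ends up on the wrong side, so accuracy can never exceed 75%.

```python
import numpy as np

# The four XOR input pairs and their expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic perceptron learning rule: learns a single linear decision boundary."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = int(xi @ w + b > 0)
            # Move the line only when this point is misclassified
            w += lr * (target - prediction) * xi
            b += lr * (target - prediction)
    return w, b

w, b = train_perceptron(X, y)
predictions = (X @ w + b > 0).astype(int)
accuracy = (predictions == y).mean()
print(f"Perceptron accuracy on XOR: {accuracy:.2f}")  # never reaches 1.00
```

A network with a hidden layer, like the one built in this guide, removes that limitation.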
To illustrate, let’s look at a very simple example:
Example: You are a shop owner with four customer segments, and you want a method to separate them. High-earning older customers and middle-income younger customers are your most loyal customers, and you are planning to offer discounts to these two segments.
Is there a way to draw a single line that separates your most loyal customers from the rest of your customers?
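This is exactly the scenario in XOR: the two loyal segments sit in opposite corners of the grid, so no single straight line can separate them from the other two.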
(Fig 1.1 – XOR representation in a graph)
(Fig 1.2 – Segmenting customers into four quadrants)
Here we attempt to solve this with Neural Networks (NN).
Let’s start by importing the packages we need to build the NN.
(Fig 2 – Code segment for imports)
NumPy is a Python library that makes array manipulation very easy, and Keras accepts NumPy arrays directly as inputs.
Keras provides two different APIs for building models: the Functional API and the Sequential model. We have chosen the simpler of the two, the Sequential model, which can be seen as a linear stack of layers.
A neural network consists of layers through which the input data flows, being transformed along the way. Keras provides several kinds of layers (for example Dense, Dropout and Merge layers), and combining different types lets us model different kinds of neural networks for various machine learning tasks. In our scenario, we only need the Dense layer, in which every neuron is connected to every input.
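Conceptually, a Dense layer multiplies its inputs by a weight matrix, adds a bias and applies an activation function, and a Sequential model simply feeds each layer's output into the next. The NumPy sketch below illustrates that idea with randomly drawn weights; the `dense` helper is our own illustration, not Keras code, and the 2 → 16 → 1 layer sizes are chosen to match the XOR network.

```python
import numpy as np

def dense(x, weights, bias, activation):
    """Illustrative fully connected layer: activation(x @ weights + bias)."""
    return activation(x @ weights + bias)

rng = np.random.default_rng(seed=0)
x = np.array([[0.0, 1.0]])  # one sample with two inputs

# A linear stack: the hidden layer's output is the output layer's input
hidden = dense(x, rng.normal(size=(2, 16)), np.zeros(16),
               lambda z: np.maximum(0.0, z))          # 2 inputs -> 16 neurons
output = dense(hidden, rng.normal(size=(16, 1)), np.zeros(1),
               lambda z: 1.0 / (1.0 + np.exp(-z)))    # 16 -> 1 neuron

print(hidden.shape, output.shape)  # (1, 16) (1, 1)
```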
(Fig 3 – Code segment for creating training and test data)
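The input is created as a two-dimensional array, with one row for each of the four input combinations and two columns for the two XOR inputs. The target (output) array is also two-dimensional, but it has a single column holding the expected output for each row.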
(Table 1 – Two dimensional data array)
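For reference, here is one way to build these arrays with NumPy. The variable names `training_data` and `target_data` and the `float32` dtype are our own choices and may differ from Fig 3; since XOR has only four possible inputs, the same four rows serve for both training and checking the model.

```python
import numpy as np

# Every combination of the two XOR inputs: 4 rows x 2 columns
training_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")

# Expected XOR output for each row: 4 rows x 1 column
target_data = np.array([[0], [1], [1], [0]], dtype="float32")

print(training_data.shape, target_data.shape)  # (4, 2) (4, 1)
```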
Now the interesting part begins: we start creating the model.
(Fig 4 – Figure of the Neural Network)
(Fig 5 – Code segment for creating the Keras model)
As mentioned before, we have chosen to use the Dense layer. In the first layer, the first parameter states the number of neurons in that layer, and the second states the number of inputs given to the model (two, for the two XOR inputs). (Don't worry too much about the activation function yet; we'll come to it shortly.) The next step is to add a second layer, which has only one neuron.
You may have noticed that the input dimension is not stated for the second layer. It doesn't need to be: Keras takes the output of the first layer as the input to the final layer and infers its size automatically.
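The model described above can be sketched as follows, assuming TensorFlow 2.x with its bundled Keras (`from tensorflow import keras`). Older Keras code typically passes the input size to the first layer as `input_dim=2`; current Keras versions recommend declaring it with `keras.Input(shape=(2,))` as the first entry instead, which is what this sketch does. The 16 hidden neurons match the configuration discussed later in this guide.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(2,)),                      # each sample has two inputs
    keras.layers.Dense(16, activation="relu"),    # hidden layer: 16 neurons
    keras.layers.Dense(1, activation="sigmoid"),  # output layer: 1 neuron
])
model.summary()

# Even untrained, the model outputs a value between 0 and 1 for an input pair
print(model.predict(np.array([[0, 1]], dtype="float32"), verbose=0))
```

`model.summary()` should report 65 trainable parameters: 2 × 16 weights plus 16 biases in the hidden layer, and 16 weights plus 1 bias in the output layer.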
(Fig 6 – Activation graph of Sigmoid and ReLU)
As shown in the model above, we use two types of activation function: ReLU (Rectified Linear Unit) in the hidden layer and a Sigmoid function in the output layer. Just like neurons in our brains, these neurons have to be activated in order to pass a message on to the next neuron, and the activation function decides how strongly each one fires. ReLU acts as a threshold at zero: it passes positive inputs through unchanged and outputs zero otherwise. The Sigmoid squashes any input into the range 0 to 1, which suits a single output that should be read as 0 or 1.
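Both activation functions are one-liners. This NumPy sketch (our own helper names, not the Keras implementations) shows how each transforms a few sample inputs.

```python
import numpy as np

def relu(z):
    # Passes positive values through unchanged; anything at or below 0 becomes 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # approx. [0.119 0.5 0.953]
```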
(Fig 7 – Code segment to compile the model)
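A sketch of the compile step, using the three settings explained below (mean squared error loss, the Adam optimiser and the binary accuracy metric), passed as the string identifiers Keras accepts. It rebuilds the same assumed 2 → 16 → 1 model so that it runs on its own; the exact code in Fig 7 may differ.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    loss="mean_squared_error",    # how badly the model is performing
    optimizer="adam",             # adjusts the weights to reduce the loss
    metrics=["binary_accuracy"],  # share of outputs that round to the right 0/1
)
```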
To compile the model, we focus on the following areas:
1.) Loss Function: For the NN to learn the correct weights, we need to tell it how well it is performing. This measure is known as the ‘loss’ (how badly the model performed). For this purpose we pick ‘mean_squared_error’ as the loss function.
2.) Optimisation Function: The second factor is the optimising function. Its job is to find the right values for the weights and thereby reduce the loss. For optimisation we use the ‘Adam’ optimiser.
You may ask whether these are the only loss or optimisation functions that can be used.
The answer is no. If we use ‘binary_crossentropy’ as the loss function instead, the NN still works without any fuss. That does not mean all loss functions are interchangeable, though: finding out which function fits which scenario takes some trial and error.
3.) Measuring Accuracy: The third setting is the metric used to measure accuracy. In this case we use ‘binary_accuracy’, which rounds each prediction at a 0.5 threshold and checks whether it matches the target.
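To make these measures concrete, the NumPy sketch below computes mean squared error, binary cross-entropy and binary accuracy by hand for four made-up predictions (the values in `y_pred` are hypothetical, not outputs from the tutorial's model). They follow the standard definitions; Keras's built-in versions may differ in small numerical details, so treat this as an illustration rather than a reimplementation.

```python
import numpy as np

y_true = np.array([0.0, 1.0, 1.0, 0.0])  # XOR targets
y_pred = np.array([0.1, 0.8, 0.4, 0.3])  # hypothetical sigmoid outputs

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: heavily penalises confident wrong answers
eps = 1e-7
p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Binary accuracy: threshold predictions at 0.5, then compare with the targets
binary_accuracy = np.mean((y_pred > 0.5) == (y_true == 1))

print(f"MSE={mse:.3f}  BCE={bce:.3f}  accuracy={binary_accuracy:.2f}")
```

Here one prediction (0.4 for a target of 1) falls on the wrong side of 0.5, so binary accuracy is 0.75, while the MSE is 0.125.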
(Fig 8 – Code segment to train the model)
Next, we start the training process by calling the ‘fit’ function. In our run, the model reaches 100% accuracy in the 54th iteration (epoch). If we run it several times, this number changes slightly, because the weights are initialised randomly before the first iteration.
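The training step might look like the sketch below, which repeats the data and model from the earlier steps so it runs on its own. The 500-epoch budget is our assumption (the tutorial does not state its epoch count here). Because of random initialisation, the epoch at which accuracy first reaches 100% varies between runs and may not arrive within the budget at all; in that case, train for longer.

```python
import numpy as np
from tensorflow import keras

training_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
target_data = np.array([[0], [1], [1], [0]], dtype="float32")

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="mean_squared_error", optimizer="adam",
              metrics=["binary_accuracy"])

# Each epoch is one pass over the four XOR rows
history = model.fit(training_data, target_data, epochs=500, verbose=0)

# First epoch (counting from 1) at which every prediction rounded correctly
accuracies = history.history["binary_accuracy"]
first_full = next((i + 1 for i, acc in enumerate(accuracies) if acc >= 1.0), None)
print("100% accuracy first reached at epoch:", first_full)
print(model.predict(training_data, verbose=0).round())
```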
(Fig 9 – Results of the epochs)
Tweaking the model
We have built a very simple neural network, and it can be tweaked. For example, we can increase the number of hidden layers, as visualised in the image below (Fig 10).
Increasing the number of hidden layers
(Fig 10 – Visualising the Neural Network Model with more hidden layers)
Let’s try adding another hidden layer and check whether we can reach 100% accuracy even faster. Adding a new layer takes just one more line in the model definition.
(Fig 11 – Code segment for adding additional layers to the model)
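Appending a second hidden layer is one extra line in the layer list. In this sketch, the size and activation of the new layer (16 neurons, ReLU) are our assumptions, as the tutorial does not state them here.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),    # the one extra line: a second hidden layer
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="mean_squared_error", optimizer="adam",
              metrics=["binary_accuracy"])
model.summary()
```

The extra layer adds 16 × 16 weights plus 16 biases, bringing the total from 65 to 337 parameters.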
The results are really good: the model reaches 100% accuracy by the 11th iteration.
(Fig 12 – Accuracy level per epoch)
Increasing the number of neurons in a layer
Now let’s go back to a single hidden layer, increase the number of neurons in it, and see whether the model reaches 100% accuracy any sooner.
(Fig 13 – Code segment for adding additional neurons to the model)
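A sketch of the wider variant: a single hidden layer of 32 neurons, the size discussed in the results below. The rest of the setup is assumed to match the original model.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(32, activation="relu"),    # one hidden layer, now 32 neurons wide
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="mean_squared_error", optimizer="adam",
              metrics=["binary_accuracy"])
model.summary()
```

This model has 129 parameters: 2 × 32 weights plus 32 biases in the hidden layer, and 32 weights plus 1 bias in the output layer.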
(Fig 14 – Accuracy level per epoch)
As illustrated above (Fig 14), the number of iterations needed to reach 100% accuracy actually increases. Previously, with 16 neurons, the model converged in around 55 iterations; with 32 neurons in the layer, it now takes 86. So increasing the number of neurons does not always yield better results.
We used ‘Hyper Parameter Tuning’, the automated model-optimisation feature provided by Google Cloud Machine Learning Engine, to help us decide how many hidden layers we need and how many neurons each layer requires. Through this trial and error, we can find better-optimised values for our problem set.
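Google Cloud Machine Learning Engine automates this search. As a rough local stand-in (not the Cloud ML Engine API), the hypothetical `epochs_to_full_accuracy` helper below trains each configuration tried in this guide and reports the first epoch with 100% binary accuracy. The 500-epoch cap is our assumption, and because the weights start random, results vary between runs and may come back as `None`.

```python
import numpy as np
from tensorflow import keras

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

def epochs_to_full_accuracy(hidden_layers, units, max_epochs=500):
    """Train one configuration; return the first epoch with 100% binary accuracy, or None."""
    model = keras.Sequential([keras.Input(shape=(2,))])
    for _ in range(hidden_layers):
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss="mean_squared_error", optimizer="adam",
                  metrics=["binary_accuracy"])
    history = model.fit(X, y, epochs=max_epochs, verbose=0)
    accuracies = history.history["binary_accuracy"]
    return next((i + 1 for i, acc in enumerate(accuracies) if acc >= 1.0), None)

# The three configurations tried in this guide: (hidden layers, neurons per layer)
results = {
    (layers, units): epochs_to_full_accuracy(layers, units)
    for layers, units in [(1, 16), (2, 16), (1, 32)]
}
for (layers, units), epoch in results.items():
    print(f"{layers} hidden layer(s) x {units} neurons -> first 100% epoch: {epoch}")
```

A managed tuning service explores the same kind of search space, but more systematically and in parallel.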
Through this tutorial, we have shown that developers do not have to be mathematicians to build Machine Learning models or conduct Machine Learning. A basic understanding of how things work is enough to get started. Machine Learning theory, on the other hand, does require more in-depth knowledge of mathematics.
Thank you for reading our latest Mitra Innovation ‘How-To-Tech-Guide’. We hope you will also read our next tutorial, where we help you solve some more interesting problems.
About the Author
Written by Nirojan Selvanathan – Senior Software Engineer at Mitra Innovation