AI & Algorithms: Neural Networks

This blog post on Neural Networks is part of the article series Understanding AI algorithms.

Artificial neural networks (ANNs) are the most complex of the algorithms we will cover in this article series, so our explanation has been simplified to avoid getting lost in the details.

Simply put, they are inspired by the way human brains process information through a large number of interconnected neurons. Even though artificial neural networks can’t measure up to real brains, the method has proven effective for solving difficult problems such as voice recognition, image recognition, and handwriting recognition.

Each neuron, also known as a node, receives input from one or multiple nodes within the network, or from an external source. Each input has its own individual weight, or importance. When it receives input, the node applies a non-linear function, called an activation function, to the weighted sum of its inputs and produces an output. In simpler terms, all nodes perform a computation on the information they receive.
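To make the computation inside a single node more concrete, here is a minimal sketch in Python. It assumes a sigmoid activation function, which is only one of several common choices, and the input values and weights are made up for illustration.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights):
    """Compute the weighted sum of the inputs, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum)


# Hypothetical values: two inputs, the first one weighted more heavily.
output = neuron_output(inputs=[1.0, 0.5], weights=[0.8, -0.4])
print(output)
```

The weights decide how much each input matters: a large positive weight pushes the output up, a negative weight pushes it down.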

[Image: neural network]

Nodes are divided into three different layers, as shown in the image above. The first layer, called the input layer, receives input (data) from an external source. The outputs from the nodes in the input layer are then weighted and passed on to the second layer, called the hidden layer, where intermediate computation is performed.

This step is complex, but for our purposes just know that the computation being done in the hidden layer uses weighted outputs from several nodes – that is, information from several places in the network. There can be a number of hidden layers and the complexity increases with the number of hidden layers.

When a neural network contains more than one hidden layer, it is called a deep neural network, and working with such networks is referred to as deep learning. The outputs from the nodes in the hidden layer(s) are also weighted and passed on to the output layer, where the result is computed. By performing all these computations in different steps, the algorithm can find hidden patterns in the data.
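The flow from input layer to hidden layer to output layer can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: the sigmoid activation, the layer sizes, and all weight values are assumptions chosen for the example, and a real network would learn its weights during training.

```python
import math


def sigmoid(z):
    """Activation function assumed for this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weight_rows):
    """Each row of weights belongs to one node in the layer."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]


def forward(inputs, layers):
    """Pass the data through each layer in turn; the last one is the output layer."""
    activations = inputs
    for weight_rows in layers:
        activations = layer_output(activations, weight_rows)
    return activations


# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]]
output_weights = [[1.0, -1.0, 0.5]]

result = forward([1.0, 2.0], [hidden_weights, output_weights])
print(result)
```

Adding another hidden layer only means adding one more set of weights to the list passed to forward.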

There are several computations in different steps, in which the output from one computation is “merged” with the output from other computations, and these are computed together to get a new output.

In other words, information doesn’t go through a neural network in a straightforward way, as we saw with other systems. Here, the information is used for calculations at several points, and the results are then passed on through the network for further computation.

With this complex structure, neural networks can loosely imitate how a brain processes information, though not nearly at the same scale.

Once the network has been trained, its nodes have learned to react to different inputs. For example, if we input the information that a customer has been loyal to your company for ten years and lives in a big city, the nodes in the input layer will react to this information and send the reaction forward to the hidden layer.

This will trigger a reaction (computation) in the hidden layer. The reactions in the hidden layer are then gathered in the output layer, which also creates a reaction. In this case, the output may be that the customer is likely to use the online shop.
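The customer example can be written out as a small sketch. Everything specific here is hypothetical: the function name, the scaling of loyalty years against an assumed cap of 20 years, the two hidden nodes, and the weight values, which stand in for what a trained network would have learned from real data.

```python
import math

# Hypothetical weights standing in for a trained network.
HIDDEN_WEIGHTS = [[1.5, 0.5], [0.2, 2.0]]  # two hidden nodes, two inputs each
OUTPUT_WEIGHTS = [[1.2, 0.8]]              # one output node


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_layer(inputs, weight_rows):
    """Every node reacts to the weighted sum of what it receives."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]


def online_shop_score(years_loyal, lives_in_big_city):
    """Return a score between 0 and 1; higher means more likely to shop online."""
    # Input layer: turn the customer facts into numbers (assumed cap of 20 years).
    inputs = [min(years_loyal, 20) / 20.0, 1.0 if lives_in_big_city else 0.0]
    hidden = weighted_layer(inputs, HIDDEN_WEIGHTS)   # hidden layer reacts
    return weighted_layer(hidden, OUTPUT_WEIGHTS)[0]  # output layer gathers


loyal_city_customer = online_shop_score(years_loyal=10, lives_in_big_city=True)
new_rural_customer = online_shop_score(years_loyal=0, lives_in_big_city=False)
print(loyal_city_customer, new_rural_customer)
```

With these made-up weights, the loyal big-city customer gets a higher score than the new customer outside the city, which mirrors the kind of conclusion described above.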

Artificial neural networks are complex and hard to grasp at first sight, and we can only scratch the surface of them here. Using this type of algorithm demands a lot of trust in data and in the data scientist.

The issue is that as a business executive, you cannot count on understanding why the network arrived at a certain outcome.

However, this is also the advantage of using an artificial neural network: it can handle relationships that are too complex to capture with simpler, easier-to-explain methods.

Artificial neural networks have a complex structure that gives them the ability to detect hidden patterns and non-linear relationships, and to work well with classification.

Due to their complex structure, artificial neural networks can find relationships that would be hard, or even impossible, to anticipate in advance and that could therefore be missed when using a simpler algorithm.

Although neural networks may work well for some problems, they are not suitable for all. Sometimes a task requires interpretation and explanation, and since these types of networks are based on many connections and computations, it is hard to explain what the results are based on.

Even if the task does not require explanation, a model whose inner workings are too complex to understand can cast doubt on its own output. Data scientists can test the model by comparing its predictions with observed outcomes, but this isn’t always enough to offset the model’s downsides.
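Comparing predictions with observed outcomes can be as simple as counting how often they match. The sketch below uses accuracy as the measure, which is one common choice among several; the function name and the sample data are made up for illustration.

```python
def accuracy(predictions, observed):
    """Share of cases where the model's prediction matches what was observed."""
    if not observed or len(predictions) != len(observed):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for p, o in zip(predictions, observed) if p == o)
    return correct / len(observed)


# Hypothetical data: 1 = customer used the online shop, 0 = did not.
predicted = [1, 0, 1, 1, 0]
actual = [1, 0, 0, 1, 0]

score = accuracy(predicted, actual)  # 4 of the 5 predictions match
print(score)
```

A high score shows that the model works on the cases tested, but it still says nothing about why the model made each prediction.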

Another issue is that it takes time to train a model, and as the number of hidden layers increases, the model becomes increasingly complex. This creates the risk of overfitting, where the model memorizes its training data instead of learning patterns that hold for new data, and it increases the required investment in time and money.
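One rough way to see how complexity grows is to count the weights the network has to learn. The sketch below assumes every node is connected to every node in the next layer and ignores any extra per-node parameters; the layer sizes are hypothetical.

```python
def count_weights(layer_sizes):
    """Number of connections between consecutive, fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical networks: 10 inputs, hidden layers of 20 nodes, 1 output.
shallow = count_weights([10, 20, 1])       # one hidden layer
deep = count_weights([10, 20, 20, 20, 1])  # three hidden layers

print(shallow, deep)
```

Every additional weight is something the training process has to adjust, which is part of why deeper networks take longer to train and need more data to avoid overfitting.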

Clearly, neural networks are an enormous topic unto themselves, one that fills whole books. Here, we’ve looked at some of the key attributes of these systems to help show how they differ from other forms of classification.

Next, we’ll look at a concept called clustering, which is another way to group and analyze data used by AI systems.

If you want to read all the related articles on the topic of AI algorithms, here is the list of all blog posts in this article series: