Exploring the Power and Limitations of Multi-Layer Perceptron (MLP) in Machine Learning

Perceptron: Unveiling the Essence of Neural Computation

Shekhar Banerjee
7 min read · Apr 6, 2024

Introduction

In the realm of artificial neural networks (ANNs), the Multi-Layer Perceptron (MLP) stands as a foundational model and a pioneer of deep learning. With roots in the mid-20th-century perceptron, the MLP has helped revolutionise domains from computer vision to natural language processing. In this article, we delve into the inner workings of the MLP: its evolution, strengths, weaknesses, and its place in modern machine learning. Broadly, perceptron-based architectures come in several variants:

1. Single-Layer Perceptron (SLP): Basic architecture with a single layer of input neurons connected directly to output neurons.
2. Multi-Layer Perceptron (MLP): Contains one or more hidden layers between input and output layers, enabling complex pattern recognition.
3. Feedforward Perceptron: Information flows in one direction, from the input to the output layer, without cycles or feedback loops.
4. Recurrent Perceptron: Utilises feedback connections, allowing information to loop back within the network, and enabling temporal dynamics and memory.
5. Convolutional Perceptron: Designed specifically to process grid-like data, like images, by employing spatial hierarchies and convolutional layers for feature extraction.
6. Radial Basis Function (RBF) Perceptron: Employs radial basis functions as activation functions, often used for function approximation tasks.
7. Probabilistic Perceptron: Incorporates probabilistic models or activations, useful for uncertainty estimation and probabilistic reasoning.

Today we will be going through Single-Layer Perceptron (SLP) and Multi-Layer Perceptron (MLP).

Development of the Perceptron Model:

The concept of artificial neural networks dates back to the 1940s, with McCulloch and Pitts' model of the artificial neuron; the perceptron itself was introduced by Frank Rosenblatt in the late 1950s.

A single-layer perceptron (SLP) is the simplest form of artificial neural network (ANN), comprising only one layer of output neurons. Key points:

1. Architecture: SLP has one layer of output neurons directly connected to input features.

2. Activation Function: Typically, it uses a step function, outputting 1 if the weighted sum of inputs exceeds a threshold, 0 otherwise.

3. Training: Trained using the perceptron learning algorithm, adjusting weights based on the error between predicted and actual outputs (a minimal sketch of this rule follows the list below).

4. Linear Separability: It can only handle linearly separable patterns, limiting its classification capabilities to data separated by a straight line or hyperplane.

5. Applications: Used in binary classification and simple pattern-recognition tasks. However, its simplicity and limited capability led to more sophisticated neural network architectures like multi-layer perceptrons (MLPs) and deep learning models.
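To make points 2 and 3 concrete, here is a minimal NumPy sketch of the step activation and the perceptron learning rule, trained on a small, linearly separable AND-style toy problem. The learning rate and epoch count are arbitrary illustrative choices, not prescriptions.

```python
import numpy as np

def step(z):
    # Step activation: 1 if the weighted sum reaches the threshold (0 here), else 0
    return np.where(z >= 0.0, 1, 0)

def train_perceptron(X, y, lr=0.1, epochs=20):
    # X: (n_samples, n_features), y: binary labels in {0, 1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = step(np.dot(w, xi) + b)
            # Perceptron learning rule: nudge weights by the prediction error
            update = lr * (yi - y_hat)
            w += update * xi
            b += update
    return w, b

# A linearly separable AND-style problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(step(X @ w + b))   # expected: [0 0 0 1]
```

Because the data is linearly separable, the perceptron convergence theorem guarantees this rule finds a separating boundary in a finite number of updates.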

However, the perceptron had limitations in handling nonlinear data.

In the 1980s, researchers such as Rumelhart, Hinton, and Williams proposed the backpropagation algorithm, enabling efficient training of the Multi-Layer Perceptron (MLP). This breakthrough paved the way for the development of deeper neural networks, unlocking their potential for solving complex tasks.

What is Multi-Layer Perceptron (MLP)?

The Multi-Layer Perceptron (MLP) is a class of feedforward artificial neural networks characterised by multiple layers of interconnected nodes, or neurons. Unlike its predecessors, such as the perceptron, MLP consists of an input layer, one or more hidden layers, and an output layer. Each layer comprises multiple neurons, and connections between layers are weighted, allowing the model to learn complex patterns from input data.

Multi-Layer Perceptron
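A forward pass through such a network is simply a chain of weighted sums followed by non-linear activations. Below is a minimal NumPy sketch; the layer sizes, ReLU hidden activation, and sigmoid output are illustrative assumptions rather than a fixed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Arbitrary layer sizes for illustration: 7 inputs -> 16 hidden -> 8 hidden -> 1 output
sizes = [7, 16, 8, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    a = x
    # Hidden layers: weighted sum followed by a non-linear activation
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # Output layer: sigmoid squashes the result into a binary-class probability
    z = a @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=7)   # one example with 7 features
print(forward(x))        # a probability between 0 and 1
```

Training then consists of adjusting all of these weights and biases with backpropagation, which is what libraries such as scikit-learn do internally.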

Multi-Layer Perceptron (MLP) overcomes the limitations of Single Layer Perceptron (SLP) in several ways:

1. Complex Patterns: MLP can learn complex patterns and relationships in data thanks to its hidden layers, allowing it to capture nonlinearities that SLPs cannot handle (the XOR sketch below makes this concrete).

2. Non-linear Activation Functions: MLPs can use a variety of non-linear activation functions in hidden layers, enabling them to model complex mappings between inputs and outputs.

3. Universal Approximator: MLP with a single hidden layer containing many neurons can approximate any continuous function, making it more versatile than SLPs for a wide range of tasks.

4. Feature Representation: MLPs can automatically learn hierarchical representations of features through multiple layers, facilitating better feature extraction and representation learning.

5. Improved Performance: In practice, MLPs often achieve higher accuracy and better performance on various tasks compared to SLPs, especially for datasets with non-linear relationships between input features and target variables.

In summary, MLPs are more powerful and flexible than SLPs, capable of capturing complex patterns and relationships in data, leading to better performance across a wide range of tasks.
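The classic illustration of point 1 is the XOR problem, whose labels no single straight line can separate. Here is a small, hedged comparison using scikit-learn; the hidden-layer size, activation, and solver are choices made only so this toy example converges reliably.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# XOR: the labels cannot be separated by any single straight line
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

slp = Perceptron(max_iter=1000).fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                    solver='lbfgs', max_iter=5000, random_state=0).fit(X, y)

print("SLP accuracy on XOR:", slp.score(X, y))  # well below 1.0 -- no linear boundary fits XOR
print("MLP accuracy on XOR:", mlp.score(X, y))  # typically 1.0 with a small hidden layer
```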

Code Implementation:

Let’s see this in action on a real-world dataset.

We will use a perceptron to classify raisin varieties based on their physical characteristics.

Raisins: Necessary for OatMeal !!
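Below is a minimal sketch of the kind of pipeline that produces outputs like those shown next. The data appears to be the UCI Raisin dataset (900 samples, Kecimen and Besni varieties); the file name Raisin_Dataset.xlsx, the 80/20 split, the StandardScaler step, and the label encoding are assumptions, not the article's exact code.

```python
import pandas as pd
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the raisin data (file name is an assumption; adjust to your copy)
df = pd.read_excel("Raisin_Dataset.xlsx")

# Inspect the data: column types, missing values, shape, and class balance
df.info()
print(df.isnull().sum())
print(df.shape)

# Encode the two varieties as 0/1 and split features from the target
df["Class"] = df["Class"].map({"Kecimen": 0, "Besni": 1})
print(df["Class"].value_counts())
X, y = df.drop(columns="Class"), df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardise features so no single measurement dominates the weighted sum
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Single-layer perceptron
slp = Perceptron(random_state=42)
slp.fit(X_train_s, y_train)

y_pred = slp.predict(X_test_s)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(X.columns)
print((slp.intercept_, slp.coef_))   # per-feature weights, interpreted below
```

Standardising the features also makes the learned coefficients comparable across features, which is what allows the per-feature "contribution" reading further below.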
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 900 entries, 0 to 899
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Area             900 non-null    int64
 1   MajorAxisLength  900 non-null    float64
 2   MinorAxisLength  900 non-null    float64
 3   Eccentricity     900 non-null    float64
 4   ConvexArea       900 non-null    int64
 5   Extent           900 non-null    float64
 6   Perimeter        900 non-null    float64
 7   Class            900 non-null    object
dtypes: float64(5), int64(2), object(1)
memory usage: 56.4+ KB

Area               0
MajorAxisLength    0
MinorAxisLength    0
Eccentricity       0
ConvexArea         0
Extent             0
Perimeter          0
Class              0
dtype: int64

(900, 8)

1    450
0    450
Name: Class, dtype: int64
0.9055555555555556
              precision    recall  f1-score   support

           0       0.90      0.89      0.90        83
           1       0.91      0.92      0.91        97

    accuracy                           0.91       180
   macro avg       0.91      0.90      0.90       180
weighted avg       0.91      0.91      0.91       180

array([[74,  9],
       [ 8, 89]], dtype=int64)

Index(['Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity',
       'ConvexArea', 'Extent', 'Perimeter'],
      dtype='object')

(array([-2.]),
 array([[-4.51271857,  5.06436408,  1.31414625, -2.25418641, -1.70158398,
          0.01505407, -6.82419078]]))

We see that ‘Perimeter’ and ‘MajorAxisLength’ have the largest coefficient magnitudes and therefore contribute most to the model’s predictions.

Let’s now try a Multi-Layer Perceptron on the same data and compare its performance.
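Continuing from the sketch above and reusing its scaled train/test splits, an MLP can be fitted and evaluated the same way. The hidden-layer sizes and iteration budget below are illustrative assumptions, not the article's exact settings.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.neural_network import MLPClassifier

# Multi-layer perceptron on the same standardised features as above
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation='relu',
                    max_iter=1000, random_state=42)
mlp.fit(X_train_s, y_train)

print("Train Accuracy:", mlp.score(X_train_s, y_train))
print("Test Accuracy:", mlp.score(X_test_s, y_test))

# Per-split classification reports and confusion matrices
for label, X_part, y_part in [("Train", X_train_s, y_train), ("Test", X_test_s, y_test)]:
    y_pred = mlp.predict(X_part)
    print(f"{label}:")
    print(classification_report(y_part, y_pred))
    print(confusion_matrix(y_part, y_pred))
```

Feature scaling matters for the MLP as well, since its gradient-based training converges poorly on unscaled inputs.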

Train Accuracy: 0.8680555555555556
Test Accuracy: 0.9
Train Accuracy:
              precision    recall  f1-score   support

           0       0.89      0.85      0.87       368
           1       0.85      0.89      0.87       352

    accuracy                           0.87       720
   macro avg       0.87      0.87      0.87       720
weighted avg       0.87      0.87      0.87       720

Test Accuracy:
              precision    recall  f1-score   support

           0       0.93      0.84      0.88        82
           1       0.88      0.95      0.91        98

    accuracy                           0.90       180
   macro avg       0.90      0.90      0.90       180
weighted avg       0.90      0.90      0.90       180

Train Accuracy:
[[312  56]
 [ 39 313]]

Test Accuracy:
[[69 13]
 [ 5 93]]

Based on the results, the multi-layer perceptron is slightly less accurate on the test set than the single-layer perceptron (0.90 vs. roughly 0.906).

Additionally, the MLP takes more time to train than the single-layer version.

Next, let’s look at the broader advantages and limitations of the MLP.

Advantages of MLP

1. Nonlinear Mapping: MLPs can approximate nonlinear functions, making them suitable for modelling complex relationships in data.
2. Feature Learning: Through hidden layers, MLPs can automatically learn hierarchical representations of features from raw input data.
3. Versatility: MLPs have demonstrated effectiveness across various domains, including image recognition, speech recognition, and time series prediction.
4. Scalability: With advancements in hardware and optimisation techniques, MLPs can scale to accommodate large datasets and complex architectures.

Weaknesses and Limitations:

1. Overfitting: MLPs are prone to overfitting, especially when trained on small datasets or with excessive model capacity.
2. Gradient Vanishing/Exploding: Deep MLPs may suffer from vanishing or exploding gradients during training, hindering convergence.
3. Hyperparameter Sensitivity: The performance of MLPs is sensitive to hyperparameters such as learning rate, batch size, and network architecture, requiring careful tuning.
4. Computationally Intensive: Training deep MLPs can be computationally expensive, especially without access to high-performance hardware or parallel processing capabilities.
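Some of these issues can be partially mitigated at the estimator level. For example, scikit-learn's MLPClassifier exposes an L2 penalty (alpha), early stopping, and the initial learning rate as hyperparameters; the values below are illustrative choices, not recommendations.

```python
from sklearn.neural_network import MLPClassifier

# Illustrative settings aimed at curbing overfitting and easing tuning
mlp = MLPClassifier(
    hidden_layer_sizes=(32,),
    alpha=1e-3,               # L2 regularisation strength
    early_stopping=True,      # stop when the validation score stops improving
    validation_fraction=0.1,  # share of training data held out for validation
    learning_rate_init=1e-3,  # a key hyperparameter to tune
    max_iter=2000,
    random_state=0,
)
# mlp.fit(X_train_s, y_train)  # fit exactly as in the earlier sketches
```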

Conclusion:

The Multi-Layer Perceptron (MLP) has played a pivotal role in shaping the machine learning landscape, offering powerful capabilities for modelling complex data. Despite their strengths, MLPs have limitations, and researchers continue to explore methods to mitigate these challenges. As deep learning evolves, MLPs remain a cornerstone model, driving innovations in artificial intelligence and advancing our understanding of neural computation.
