Classification

Class MLPClassifier implements a multi-layer perceptron (MLP) algorithm that trains using Backpropagation.

MLP trains on two arrays: array X of size (n_samples, n_features), which holds the training samples represented as floating point feature vectors; and array y of size (n_samples,), which holds the target values (class labels) for the training samples:


>>> from sklearn.neural_network import MLPClassifier
>>> X = [[0., 0.], [1., 1.]]
>>> y = [0, 1]
>>> clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
...                     hidden_layer_sizes=(5, 2), random_state=1)
...
>>> clf.fit(X, y)                         
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False,
       epsilon=1e-08, hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

After fitting (training), the model can predict labels for new samples:


>>> clf.predict([[2., 2.], [-1., -2.]])
array([1, 0])

MLP can fit a non-linear model to the training data. clf.coefs_ contains the weight matrices that constitute the model parameters:


>>> [coef.shape for coef in clf.coefs_]
[(2, 5), (5, 2), (2, 1)]

Currently, MLPClassifier supports only the Cross-Entropy loss function, which yields probability estimates through the predict_proba method.

MLP trains using Backpropagation. More precisely, it trains using some form of gradient descent and the gradients are calculated using Backpropagation. For classification, it minimizes the Cross-Entropy loss function, giving a vector of probability estimates P(y|x) per sample x:


>>> clf.predict_proba([[2., 2.], [1., 2.]])
array([[  1.967...e-04,   9.998...e-01],
       [  1.967...e-04,   9.998...e-01]])

MLPClassifier supports multi-class classification by applying Softmax as the output function.
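
As a minimal sketch on hypothetical three-class toy data (X3 and y3 are assumptions, not part of the examples above), each row of predict_proba holds one softmax probability per class and sums to 1:

>>> X3 = [[0., 0.], [1., 1.], [2., 2.]]
>>> y3 = [0, 1, 2]
>>> clf3 = MLPClassifier(solver='lbfgs', alpha=1e-5,
...                      hidden_layer_sizes=(5,), random_state=1).fit(X3, y3)
>>> clf3.predict_proba([[1., 1.]]).shape                     # one column per class
(1, 3)
>>> round(float(clf3.predict_proba([[1., 1.]]).sum()), 6)    # softmax rows sum to 1
1.0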

Further, the model supports multi-label classification, in which a sample can belong to more than one class. For each class, the raw output passes through the logistic function; values greater than or equal to 0.5 are rounded to 1, otherwise to 0. In the predicted output for a sample, the indices where the value is 1 represent the assigned classes of that sample:


>>> X = [[0., 0.], [1., 1.]]
>>> y = [[0, 1], [1, 1]]
>>> clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
...                     hidden_layer_sizes=(15,), random_state=1)
...
>>> clf.fit(X, y)                         
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False,
       epsilon=1e-08, hidden_layer_sizes=(15,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)
>>> clf.predict([[1., 2.]])
array([[1, 1]])
>>> clf.predict([[0., 0.]])
array([[0, 1]])

Multi-layer Perceptron

Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function f(\cdot): R^m \rightarrow R^o by training on a dataset, where m is the number of dimensions for input and o is the number of dimensions for output. Given a set of features X = {x_1, x_2, ..., x_m} and a target y, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression, in that between the input and the output layer, there can be one or more non-linear layers, called hidden layers. Figure 1 shows a one hidden layer MLP with scalar output.

../_images/multilayerperceptron_network.png

Figure 1 : One hidden layer MLP.

The leftmost layer, known as the input layer, consists of a set of neurons \{x_i | x_1, x_2, ..., x_m\} representing the input features. Each neuron in the hidden layer transforms the values from the previous layer with a weighted linear summation w_1x_1 + w_2x_2 + ... + w_mx_m, followed by a non-linear activation function g(\cdot):R \rightarrow R – like the hyperbolic tangent function. The output layer receives the values from the last hidden layer and transforms them into output values.

The module contains the public attributes coefs_ and intercepts_. coefs_ is a list of weight matrices, where weight matrix at index i represents the weights between layer i and layer i+1. intercepts_ is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.
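
To make the layer-by-layer computation concrete, the following sketch (not scikit-learn's internal code) replays the forward pass stored in coefs_ and intercepts_, assuming the multi-label clf fitted above with its default relu hidden activation and a logistic output layer:

>>> import numpy as np
>>> def forward(x, coefs, intercepts):
...     for W, c in zip(coefs[:-1], intercepts[:-1]):
...         x = np.maximum(x @ W + c, 0)      # hidden layer: relu(x W + c)
...     z = x @ coefs[-1] + intercepts[-1]    # output layer: linear part
...     return 1 / (1 + np.exp(-z))           # logistic output activation
>>> out = forward(np.array([[1., 2.]]), clf.coefs_, clf.intercepts_)
>>> (out >= 0.5).astype(int)                  # agrees with clf.predict above
array([[1, 1]])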

The advantages of Multi-layer Perceptron are:

  • Capability to learn non-linear models.
  • Capability to learn models in real-time (on-line learning) using partial_fit, as sketched below.
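
A minimal sketch of on-line learning with partial_fit (the mini-batches are hypothetical; all classes must be declared on the first call, and partial_fit requires the 'sgd' or 'adam' solver, the latter being the default):

>>> from sklearn.neural_network import MLPClassifier
>>> clf_online = MLPClassifier(hidden_layer_sizes=(5,), random_state=1)
>>> batches = [([[0., 0.], [1., 1.]], [0, 1]),
...            ([[2., 2.], [-1., -1.]], [1, 0])]
>>> for X_batch, y_batch in batches:
...     clf_online.partial_fit(X_batch, y_batch, classes=[0, 1])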

The disadvantages of Multi-layer Perceptron (MLP) include:

  • MLPs with hidden layers have a non-convex loss function with more than one local minimum. Different random weight initializations can therefore lead to different validation accuracy.
  • MLP requires tuning a number of hyperparameters such as the number of hidden neurons, layers, and iterations.
  • MLP is sensitive to feature scaling, as addressed in the sketch below.
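
Because of the sensitivity to feature scaling, a common remedy is to standardize the inputs, for example with StandardScaler in a pipeline; a minimal sketch on toy data:

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neural_network import MLPClassifier
>>> pipe = make_pipeline(StandardScaler(),
...                      MLPClassifier(solver='lbfgs', alpha=1e-5,
...                                    hidden_layer_sizes=(5,), random_state=1))
>>> _ = pipe.fit([[0., 0.], [10., 10.], [20., 20.]], [0, 1, 1])
>>> pred = pipe.predict([[15., 15.]])    # inputs are standardized inside the pipeline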

Neuron

Schematic of a neuron:

Ncell.png
  • a_1, ..., a_n are the components of the input vector
  • w_1, ..., w_n are the synaptic weights of the neuron's synapses
  • b is the bias
  • f is the transfer function, usually a non-linear function; common choices include tansig() and hardlim(). In what follows, hardlim() is assumed.
  • t is the neuron's output

Mathematically, t = f(\vec{W}' \vec{A} + b), where:

  • \vec{W} is the weight vector and \vec{W}' is its transpose
  • \vec{A} is the input vector
  • b is the bias
  • f is the transfer function

In other words, a single neuron computes the inner product of the input vector and the weight vector, then passes the result through a non-linear transfer function to produce a scalar output.

The role of a single neuron is to split an n-dimensional vector space into two parts with a hyperplane (called the decision boundary): given an input vector, the neuron reports on which side of the hyperplane the vector lies.

The equation of this hyperplane is \vec{W}' \vec{p} + b = 0, where:

  • \vec{W} is the weight vector
  • b is the bias
  • \vec{p} is a vector on the hyperplane
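
As a sketch in NumPy (taking hardlim to be the unit step function, consistent with the default above; the weights, bias, and inputs are illustrative values only), the neuron output t = f(\vec{W}' \vec{A} + b) directly reports the side of the hyperplane on which an input lies:

>>> import numpy as np
>>> def hardlim(z):                          # step transfer function
...     return 1 if z >= 0 else 0
>>> W = np.array([1.0, -1.0])                # weight vector
>>> b = 0.5                                  # bias
>>> hardlim(W @ np.array([2.0, 1.0]) + b)    # positive side of W'p + b = 0
1
>>> hardlim(W @ np.array([0.0, 2.0]) + b)    # negative side
0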

Artificial Neural Network

An artificial neural network (ANN) is often referred to simply as a neural network (NN).

Modern neural networks are non-linear statistical data modeling tools. A typical neural network has the following three parts:

  • Architecture: the architecture specifies the variables in the network and their topological relationships. For example, the variables in a neural network may be the weights of the connections between neurons and the activities of the neurons.
  • Activity Rule: most neural network models have a short-time-scale dynamics rule that defines how each neuron changes its activity in response to the activities of other neurons. The activity rule generally depends on the weights (i.e., the parameters) of the network.
  • Learning Rule: the learning rule specifies how the weights of the network adjust over time; this is generally viewed as a long-time-scale dynamics rule. In general, the learning rule depends on the activities of the neurons; it may also depend on target values supplied by a supervisor and on the current values of the weights. A classic instantiation of all three parts is sketched below.
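
As a concrete (and deliberately simple) instantiation of these three parts, the following sketch implements a perceptron: the architecture is a weight vector w and a bias b, the activity rule is a unit step on w'x + b, and the learning rule is the classic perceptron update w <- w + lr * (t - output) * x. The toy data and learning rate are illustrative assumptions:

>>> import numpy as np
>>> w, b, lr = np.zeros(2), 0.0, 0.1        # architecture: weights and bias
>>> def activity(x):                        # activity rule: step on w'x + b
...     return 1 if w @ x + b >= 0 else 0
>>> X = np.array([[0., 0.], [1., 1.], [2., 2.], [-1., -1.]])
>>> y = np.array([0, 1, 1, 0])              # toy, linearly separable targets
>>> for _ in range(20):                     # learning rule: perceptron update
...     for x, t in zip(X, y):
...         err = t - activity(x)
...         w, b = w + lr * err * x, b + lr * err
>>> [activity(x) for x in X]                # reproduces y on this toy set
[0, 1, 1, 0]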