With stochastic gradient descent, however, the loss function being optimized is usually non-convex. The choice of initial parameters is therefore quite important for a neural network: it has a direct effect on the performance of gradient-based algorithms.
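A minimal sketch of this sensitivity, using a hypothetical 1-D non-convex "loss" (real network losses are high-dimensional, but the same effect applies): two different initializations lead plain gradient descent to two different local minima.

```python
import numpy as np

# Toy non-convex loss for illustration: f(w) = sin(3w) + 0.5 * w^2
def loss(w):
    return np.sin(3 * w) + 0.5 * w ** 2

def grad(w):
    return 3 * np.cos(3 * w) + w

def gradient_descent(w0, lr=0.01, steps=1000):
    """Run plain gradient descent from initial parameter w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Different starting points converge to different local minima,
# so the final loss depends on the initialization.
w_a = gradient_descent(w0=-2.0)
w_b = gradient_descent(w0=2.0)
print(w_a, loss(w_a))  # one local minimum
print(w_b, loss(w_b))  # a different, here better, local minimum
```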
ReLU is differentiable at every point except 0; the left derivative at z = 0 is 0 and the right derivative is 1. This may seem to make g ineligible for use in gradient-based optimization algorithms. In practice, however, gradient descent still performs well enough for these models to be used for machine learning tasks. This is in part because neural network training algorithms do not usually arrive at a local minimum of the cost function anyway (such points are generally never reached). Hence it is acceptable for the minima of the cost function to correspond to points with an undefined gradient.
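In software, this is handled by simply picking one of the one-sided derivatives at z = 0 (commonly the left derivative, 0). A minimal NumPy sketch of this convention:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# At z == 0 we report the left derivative, 0, instead of leaving
# the gradient undefined; most deep learning libraries do the same.
def relu_grad(z):
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(relu_grad(z))  # [0. 0. 1.]  <- gradient at exactly 0 taken as 0
```

Since an input landing exactly on z = 0 is rare in floating-point arithmetic, this arbitrary choice has essentially no effect on training.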