# [Learning Deep Learning Together] Chinese Academy of Sciences Deep Learning Course: 2018-2019 Final Exam Review, Question 2: Computing Gradients

Final exam, second semester of the 2018-2019 academic year

Overview of the chain rule in compact notation (the indices are made precise in the derivation below). Because the label $y$ sums to 1, the leading factor $\hat{y}_i \sum_j y_j - y_i$ equals $\hat{y}_i - y_i$.

$$\frac{\partial J}{\partial W_{2}}=\frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial h_{2}} \frac{\partial h_{2}}{\partial z_{2}} \frac{\partial z_{2}}{\partial W_{2}}=\Big(\hat{y}_{i} \sum_{j} y_{j}-y_{i}\Big)\, \operatorname{ReLU}'\left(z_{2}\right)\, h_{1}$$

$$\frac{\partial J}{\partial b_{2}}=\frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial h_{2}} \frac{\partial h_{2}}{\partial z_{2}} \frac{\partial z_{2}}{\partial b_{2}}=\Big(\hat{y}_{i} \sum_{j} y_{j}-y_{i}\Big)\, \operatorname{ReLU}'\left(z_{2}\right)$$

$$\frac{\partial J}{\partial W_{1}}=\frac{\partial J}{\partial h_{2}} \frac{\partial h_{2}}{\partial h_{1}} \frac{\partial h_{1}}{\partial z_{1}} \frac{\partial z_{1}}{\partial W_{1}}=\Big(\hat{y}_{i} \sum_{j} y_{j}-y_{i}\Big)\, \operatorname{ReLU}'\left(z_{2}\right)\, W_{2}\, \operatorname{ReLU}'\left(z_{1}\right)\, x$$

$$\frac{\partial J}{\partial b_{1}}=\frac{\partial J}{\partial h_{2}} \frac{\partial h_{2}}{\partial h_{1}} \frac{\partial h_{1}}{\partial z_{1}} \frac{\partial z_{1}}{\partial b_{1}}=\Big(\hat{y}_{i} \sum_{j} y_{j}-y_{i}\Big)\, \operatorname{ReLU}'\left(z_{2}\right)\, W_{2}\, \operatorname{ReLU}'\left(z_{1}\right)$$

1. Expand each layer's computation in scalar form. A parenthesized subscript gives the position of an entry in its matrix or vector; $d$ is the input dimension, $h$ the number of hidden units, and $g$ the number of classes:

$$z_{1(j)}=\sum_{m=1}^{d} W_{1(jm)}\, x_{(m)}+b_{1(j)}$$

$$h_{1(j)}=\operatorname{ReLU}\left(z_{1(j)}\right)$$

$$z_{2(i)}=\sum_{n=1}^{h} W_{2(in)}\, h_{1(n)}+b_{2(i)}$$

$$h_{2(i)}=\operatorname{ReLU}\left(z_{2(i)}\right)$$

$$\hat{y}_{(i)}=\operatorname{softmax}\left(h_{2}\right)_{(i)}=\frac{e^{h_{2(i)}}}{\sum_{i'=1}^{g} e^{h_{2(i')}}}$$

$$J=\operatorname{CE}(y, \hat{y})=-\sum_{i'=1}^{g} y_{(i')} \log \hat{y}_{(i')}$$
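The scalar equations above vectorize directly. Below is a minimal NumPy sketch of one forward pass for a single sample, assuming NumPy is available; the function names (`relu`, `softmax`, `forward`) and the small random sizes are illustrative choices, not part of the exam.

```python
import numpy as np

def relu(z):
    # element-wise ReLU(z) = max(z, 0)
    return np.maximum(z, 0.0)

def softmax(h):
    # shifting by max(h) avoids overflow and leaves the result unchanged
    e = np.exp(h - np.max(h))
    return e / np.sum(e)

def forward(x, W1, b1, W2, b2, y):
    """One sample: x (d,), W1 (h, d), b1 (h,), W2 (g, h), b2 (g,), y (g,) label."""
    z1 = W1 @ x + b1                 # z_{1(j)} = sum_m W_{1(jm)} x_(m) + b_{1(j)}
    h1 = relu(z1)                    # h_{1(j)} = ReLU(z_{1(j)})
    z2 = W2 @ h1 + b2                # z_{2(i)} = sum_n W_{2(in)} h_{1(n)} + b_{2(i)}
    h2 = relu(z2)                    # h_{2(i)} = ReLU(z_{2(i)})
    y_hat = softmax(h2)              # y_hat_(i) = exp(h_{2(i)}) / sum_i' exp(h_{2(i')})
    J = -np.sum(y * np.log(y_hat))   # J = -sum_i' y_(i') log y_hat_(i')
    return z1, h1, z2, h2, y_hat, J

# hypothetical sizes: d inputs, n_hidden hidden units (h in the text), g classes
rng = np.random.default_rng(0)
d, n_hidden, g = 4, 5, 3
x = rng.normal(size=d)
W1, b1 = rng.normal(size=(n_hidden, d)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(g, n_hidden)), rng.normal(size=g)
y = np.eye(g)[1]                     # one-hot label for class 1
z1, h1, z2, h2, y_hat, J = forward(x, W1, b1, W2, b2, y)
```

The sketch follows this problem's network, which applies ReLU to the logits before the softmax ($h_2=\operatorname{ReLU}(z_2)$); many networks feed $z_2$ to the softmax directly.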

2. Derivative of the cross-entropy:

$$\frac{\partial J}{\partial \hat{y}_{(i')}}=-\frac{y_{(i')}}{\hat{y}_{(i')}}$$
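A quick way to sanity-check this derivative is to compare it with central finite differences. A minimal NumPy sketch, treating each $\hat{y}_{(i')}$ as an independent variable (as the partial derivative does); the label and prediction values and the helper names are made up for illustration:

```python
import numpy as np

def cross_entropy(y, y_hat):
    # J = -sum_i' y_(i') log(y_hat_(i'))
    return -np.sum(y * np.log(y_hat))

def cross_entropy_grad(y, y_hat):
    # analytic derivative: dJ/dy_hat_(i') = -y_(i') / y_hat_(i')
    return -y / y_hat

def numerical_grad(f, v, eps=1e-6):
    # central differences, perturbing one coordinate at a time
    grad = np.zeros_like(v)
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        grad[k] = (f(v + step) - f(v - step)) / (2 * eps)
    return grad

y = np.array([0.0, 1.0, 0.0])        # one-hot label
y_hat = np.array([0.2, 0.5, 0.3])    # some predicted distribution
analytic = cross_entropy_grad(y, y_hat)
numeric = numerical_grad(lambda p: cross_entropy(y, p), y_hat)
print(analytic, numeric)
```

Only the true-class entry is nonzero, since $y_{(i')}=0$ for every other class.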

3. Softmax derivative, split into two cases.

Case $k = i$:

$$\frac{\partial \hat{y}_{(i)}}{\partial h_{2(i)}}=\frac{e^{h_{2(i)}} \sum_{i'=1}^{g} e^{h_{2(i')}}-\left(e^{h_{2(i)}}\right)^{2}}{\left(\sum_{i'=1}^{g} e^{h_{2(i')}}\right)^{2}}=\hat{y}_{(i)}\left(1-\hat{y}_{(i)}\right)$$

Case $k \neq i$:

$$\frac{\partial \hat{y}_{(i)}}{\partial h_{2(k)}}=-\frac{e^{h_{2(i)}} e^{h_{2(k)}}}{\left(\sum_{i'=1}^{g} e^{h_{2(i')}}\right)^{2}}=-\hat{y}_{(i)} \hat{y}_{(k)}$$
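Both cases combine into one Jacobian matrix, $\partial \hat{y}/\partial h_2 = \operatorname{diag}(\hat{y}) - \hat{y}\hat{y}^{\top}$. The NumPy sketch below builds that matrix and compares it with a finite-difference Jacobian; the input vector is arbitrary and the helper names are illustrative assumptions:

```python
import numpy as np

def softmax(h):
    e = np.exp(h - np.max(h))  # max-shift for numerical stability
    return e / np.sum(e)

def softmax_jacobian(h):
    # entry [i, k] = d y_hat_(i) / d h_(k)
    #   = y_hat_(i) (1 - y_hat_(i))   if k == i
    #   = -y_hat_(i) y_hat_(k)        if k != i
    p = softmax(h)
    return np.diag(p) - np.outer(p, p)

def numerical_jacobian(f, v, eps=1e-6):
    # column k holds the central difference d f / d v_(k)
    cols = []
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        cols.append((f(v + step) - f(v - step)) / (2 * eps))
    return np.stack(cols, axis=1)

h2 = np.array([1.0, -0.5, 2.0])
analytic = softmax_jacobian(h2)
numeric = numerical_jacobian(softmax, h2)
print(np.allclose(analytic, numeric, atol=1e-8))
```

Each row of the Jacobian sums to zero, because the softmax outputs always sum to 1.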
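4. ReLU derivative:

$$\frac{\partial \operatorname{ReLU}(x)}{\partial x}=u(x)$$

where $u(x)$ is the unit step function: $u(x)=1$ for $x>0$ and $u(x)=0$ for $x<0$. ReLU is not differentiable at $x=0$, so the value there (commonly $u(0)=0$) is fixed by convention.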
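In code, $u$ is just a comparison mask. A minimal NumPy sketch, assuming the convention $u(0)=0$ at the kink (the derivation itself does not fix this value); `relu_grad` is an illustrative name:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    # step function u(z): 1.0 where z > 0, else 0.0
    # assumption: u(0) = 0 at the non-differentiable point
    return (z > 0).astype(float)

z = np.array([-2.0, -0.3, 0.0, 0.4, 3.0])
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```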

(1)

$$\begin{aligned}
\frac{\partial J}{\partial W_{2(in)}} &=\sum_{i'=1}^{g} \frac{\partial J}{\partial \hat{y}_{(i')}} \frac{\partial \hat{y}_{(i')}}{\partial h_{2(i)}} \frac{\partial h_{2(i)}}{\partial z_{2(i)}} \frac{\partial z_{2(i)}}{\partial W_{2(in)}} \\
&=-\left[\frac{y_{(i)}}{\hat{y}_{(i)}} \hat{y}_{(i)}\left(1-\hat{y}_{(i)}\right)-\sum_{k \neq i} \frac{y_{(k)}}{\hat{y}_{(k)}} \hat{y}_{(i)} \hat{y}_{(k)}\right] u\left(z_{2(i)}\right) h_{1(n)} \\
&=u\left(z_{2(i)}\right) h_{1(n)}\left[-y_{(i)}\left(1-\hat{y}_{(i)}\right)+\sum_{k \neq i} y_{(k)} \hat{y}_{(i)}\right] \\
&=u\left(z_{2(i)}\right) h_{1(n)}\left[-y_{(i)}+\hat{y}_{(i)} \sum_{k=1}^{g} y_{(k)}\right] \\
&=u\left(z_{2(i)}\right) h_{1(n)}\left(\hat{y}_{(i)}-y_{(i)}\right)
\end{aligned}$$

The last step uses $\sum_{k=1}^{g} y_{(k)}=1$. Similarly, since $\partial z_{2(i)}/\partial b_{2(i)}=1$:

$$\frac{\partial J}{\partial b_{2(i)}}=u\left(z_{2(i)}\right)\left(\hat{y}_{(i)}-y_{(i)}\right)$$
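In vector form, (1) reads $\partial J/\partial W_2 = \delta_2 h_1^{\top}$ and $\partial J/\partial b_2 = \delta_2$, writing $\delta_{2(i)} = u(z_{2(i)})(\hat{y}_{(i)} - y_{(i)})$ (the name $\delta_2$ is introduced here for convenience). The NumPy sketch below computes these and checks every entry against central finite differences; the helper names, random sizes, and the convention $u(0)=0$ are assumptions for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(h):
    e = np.exp(h - np.max(h))  # max-shift for numerical stability
    return e / np.sum(e)

def loss(x, W1, b1, W2, b2, y):
    # J as a function of all parameters (used only for the numerical check)
    h1 = relu(W1 @ x + b1)
    y_hat = softmax(relu(W2 @ h1 + b2))
    return -np.sum(y * np.log(y_hat))

def grads_layer2(x, W1, b1, W2, b2, y):
    z1 = W1 @ x + b1
    h1 = relu(z1)
    z2 = W2 @ h1 + b2
    y_hat = softmax(relu(z2))
    delta2 = (z2 > 0) * (y_hat - y)  # u(z_{2(i)}) (y_hat_(i) - y_(i)), with u(0) = 0
    dW2 = np.outer(delta2, h1)       # dJ/dW_{2(in)} = delta2_(i) h_{1(n)}
    db2 = delta2                     # dJ/db_{2(i)} = delta2_(i)
    return dW2, db2

def numerical_grad(f, P, eps=1e-6):
    # central differences over every entry of P (perturbed in place, then restored)
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
d, n_hidden, g = 4, 5, 3             # hypothetical sizes
x = rng.normal(size=d)
W1, b1 = rng.normal(size=(n_hidden, d)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(g, n_hidden)), rng.normal(size=g)
y = np.eye(g)[0]                     # one-hot label

dW2, db2 = grads_layer2(x, W1, b1, W2, b2, y)

def f():
    return loss(x, W1, b1, W2, b2, y)

print(np.allclose(dW2, numerical_grad(f, W2), atol=1e-6))
print(np.allclose(db2, numerical_grad(f, b2), atol=1e-6))
```

The outer product `np.outer(delta2, h1)` fills all $g \times h$ entries $W_{2(in)}$ at once, which is exactly the scalar formula applied for every $(i, n)$.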
(2)

$$\begin{aligned}
\frac{\partial J}{\partial W_{1(jm)}} &=\sum_{i'=1}^{g} \frac{\partial J}{\partial z_{2(i')}} \frac{\partial z_{2(i')}}{\partial h_{1(j)}} \frac{\partial h_{1(j)}}{\partial z_{1(j)}} \frac{\partial z_{1(j)}}{\partial W_{1(jm)}} \\
&=\sum_{i'=1}^{g} u\left(z_{2(i')}\right)\left(\hat{y}_{(i')}-y_{(i')}\right) W_{2(i'j)}\, u\left(z_{1(j)}\right) x_{(m)} \\
&=u\left(z_{1(j)}\right) x_{(m)} \sum_{i'=1}^{g} u\left(z_{2(i')}\right)\left(\hat{y}_{(i')}-y_{(i')}\right) W_{2(i'j)}
\end{aligned}$$

Here $\partial J/\partial z_{2(i')}=u(z_{2(i')})(\hat{y}_{(i')}-y_{(i')})$ is the same factor that appears in $\partial J/\partial b_{2(i')}$ in (1), and $\partial z_{1(j)}/\partial W_{1(jm)}=x_{(m)}$ because only the $m$-th term of the sum defining $z_{1(j)}$ contains $W_{1(jm)}$.

Similarly, since $\partial z_{1(j)}/\partial b_{1(j)}=1$:

$$\frac{\partial J}{\partial b_{1(j)}}=u\left(z_{1(j)}\right) \sum_{i'=1}^{g} u\left(z_{2(i')}\right)\left(\hat{y}_{(i')}-y_{(i')}\right) W_{2(i'j)}$$
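Stacking (2) over all $j, m$ gives $\delta_1 = u(z_1) \odot (W_2^{\top}\delta_2)$, $\partial J/\partial W_1 = \delta_1 x^{\top}$ and $\partial J/\partial b_1 = \delta_1$, where $\delta_{2(i')} = u(z_{2(i')})(\hat{y}_{(i')}-y_{(i')})$ and the names $\delta_1, \delta_2$ are introduced here for convenience. Below is a minimal, self-contained NumPy sketch of the full backward pass with a finite-difference check on all four parameter arrays; the function names, sizes, and the convention $u(0)=0$ are assumptions. Note that each entry of $\partial J/\partial W_1$ uses the single input component $x_{(m)}$:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(h):
    e = np.exp(h - np.max(h))  # max-shift for numerical stability
    return e / np.sum(e)

def loss(x, W1, b1, W2, b2, y):
    h1 = relu(W1 @ x + b1)
    y_hat = softmax(relu(W2 @ h1 + b2))
    return -np.sum(y * np.log(y_hat))

def backward(x, W1, b1, W2, b2, y):
    # forward pass, keeping the pre-activations z1, z2
    z1 = W1 @ x + b1
    h1 = relu(z1)
    z2 = W2 @ h1 + b2
    y_hat = softmax(relu(z2))
    # layer 2: delta2_(i) = u(z_{2(i)}) (y_hat_(i) - y_(i)), assuming u(0) = 0
    delta2 = (z2 > 0) * (y_hat - y)
    dW2, db2 = np.outer(delta2, h1), delta2
    # layer 1: delta1_(j) = u(z_{1(j)}) sum_i' delta2_(i') W_{2(i'j)}
    delta1 = (z1 > 0) * (W2.T @ delta2)
    dW1 = np.outer(delta1, x)        # dJ/dW_{1(jm)} = delta1_(j) x_(m)
    db1 = delta1                     # dJ/db_{1(j)} = delta1_(j)
    return dW1, db1, dW2, db2

def numerical_grad(f, P, eps=1e-6):
    # central differences over every entry of P (perturbed in place, then restored)
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(1)
d, n_hidden, g = 4, 5, 3             # hypothetical sizes
x = rng.normal(size=d)
W1, b1 = rng.normal(size=(n_hidden, d)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(g, n_hidden)), rng.normal(size=g)
y = np.eye(g)[2]                     # one-hot label

dW1, db1, dW2, db2 = backward(x, W1, b1, W2, b2, y)

def f():
    return loss(x, W1, b1, W2, b2, y)

checks = [np.allclose(a, numerical_grad(f, P), atol=1e-6)
          for a, P in [(dW1, W1), (db1, b1), (dW2, W2), (db2, b2)]]
print(checks)
```

Column $m$ of `dW1` is `db1 * x[m]`, which mirrors the factor $x_{(m)}$ in the scalar result.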