Matrix Differentiation for Least Squares
statistic
Word count: 998 | Reading time ≈ 5 min

1. Differentiation Convention

This post uses the denominator layout for matrix derivatives: the numerator runs horizontally, the denominator runs vertically.

2. Two Common Examples

Example 1

$f(x) = A^{T}X$, where $A^{T} = \begin{pmatrix} a_{1} & a_{2} & \cdots & a_{n} \end{pmatrix}$ and $X^{T} = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{pmatrix}$

$$
\begin{aligned}
\text{sol:}\quad f(x) & = A^{T}X = \sum_{i=1}^{n}a_{i}x_{i} \\
\frac{df(x)}{dx} & = \begin{pmatrix} \frac{df(x)}{dx_{1}} \\ \frac{df(x)}{dx_{2}} \\ \vdots \\ \frac{df(x)}{dx_{n}} \end{pmatrix}
= \begin{pmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{pmatrix}
= A
\end{aligned}
$$

Therefore: $\frac{dA^{T}X}{dx} = \frac{dX^{T}A}{dx} = A$

Example 2

$f(x) = X^{T}AX$, where $X^{T} = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{pmatrix}$ and $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$

$$
\text{sol:}\quad f(x) = X^{T}AX =
\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{pmatrix}
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{pmatrix}
= \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_{i}x_{j}
$$

Differentiating component-wise and collecting terms:

$$
\begin{aligned}
\frac{df(x)}{dx} & =
\begin{pmatrix} \frac{df(x)}{dx_{1}} \\ \frac{df(x)}{dx_{2}} \\ \vdots \\ \frac{df(x)}{dx_{n}} \end{pmatrix}
= \begin{pmatrix} \sum_{j=1}^{n}a_{1j}x_{j}+\sum_{i=1}^{n}a_{i1}x_{i} \\ \sum_{j=1}^{n}a_{2j}x_{j}+\sum_{i=1}^{n}a_{i2}x_{i} \\ \vdots \\ \sum_{j=1}^{n}a_{nj}x_{j}+\sum_{i=1}^{n}a_{in}x_{i} \end{pmatrix} \\
& = \begin{pmatrix} \sum_{j=1}^{n}a_{1j}x_{j} \\ \sum_{j=1}^{n}a_{2j}x_{j} \\ \vdots \\ \sum_{j=1}^{n}a_{nj}x_{j} \end{pmatrix} +
\begin{pmatrix} \sum_{i=1}^{n}a_{i1}x_{i} \\ \sum_{i=1}^{n}a_{i2}x_{i} \\ \vdots \\ \sum_{i=1}^{n}a_{in}x_{i} \end{pmatrix}
= AX+A^{T}X
\end{aligned}
$$

Therefore: $\frac{dX^{T}AX}{dx} = AX+A^{T}X$

The two examples above give two conclusions:

  • $\frac{dA^{T}X}{dx} = \frac{dX^{T}A}{dx} = A$
  • $\frac{dX^{T}AX}{dx} = AX+A^{T}X$

We will use these two results in what follows.
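As a quick sanity check (not part of the original post), both identities can be verified numerically with central finite differences; the data below are random placeholders, and `M` plays the role of the square matrix $A$ in example 2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal(n)        # the vector A of example 1
M = rng.standard_normal((n, n))   # the (generally non-symmetric) matrix of example 2
x = rng.standard_normal(n)

def grad_fd(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# d(A^T x)/dx = A
g1 = grad_fd(lambda v: A @ v, x)
assert np.allclose(g1, A, atol=1e-5)

# d(x^T M x)/dx = M x + M^T x
g2 = grad_fd(lambda v: v @ M @ v, x)
assert np.allclose(g2, M @ x + M.T @ x, atol=1e-4)
```

Note that the second identity collapses to $2AX$ only when $A$ is symmetric, which is why the check uses a non-symmetric `M`.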

3. Least Squares

3.1 Unweighted Regression

The quantities involved take the following forms:

$$
Y = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{pmatrix}_{n\times 1}~~~
X = \begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{n}^{T} \end{pmatrix}_{n\times p}~~~
w = \begin{pmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{p} \end{pmatrix}_{p\times 1}
$$

Write the least-squares objective in matrix form:

$$
\begin{aligned}
L(w) & = \sum_{i=1}^{n}(y_{i}-x_{i}^{T}w)^{2} \\
& = \|Y-Xw\|^{2} \\
& = (Y-Xw)^{T}(Y-Xw) \\
& = (Y^{T}-w^{T}X^{T})(Y-Xw) \\
& = Y^{T}Y-Y^{T}Xw-w^{T}X^{T}Y+w^{T}X^{T}Xw
\end{aligned}
$$

Differentiate using the two results above and set the derivative to zero:

$$
\begin{aligned}
\frac{dL(w)}{dw} & = \frac{d(Y^{T}Y)}{dw} - \frac{d(Y^{T}Xw)}{dw} - \frac{d(w^{T}X^{T}Y)}{dw} + \frac{d(w^{T}X^{T}Xw)}{dw} \\
& = 0 - X^{T}Y - X^{T}Y + 2X^{T}Xw \\
& = 0
\end{aligned}
$$

Rearranging gives $-X^{T}Y-X^{T}Y+2X^{T}Xw=0$, so $w^{*}=(X^{T}X)^{-1}X^{T}Y$. Substituting $w^{*}$ back into the model:

$$
\begin{aligned}
Xw^{*} & = X(X^{T}X)^{-1}X^{T}Y = \hat Y = \hat H Y \\
\hat H & = X(X^{T}X)^{-1}X^{T}
\end{aligned}
$$
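A minimal sketch of the closed form $w^{*} = (X^{T}X)^{-1}X^{T}Y$ and the hat matrix on hypothetical random data, checked against NumPy's least-squares solver (`np.linalg.solve` on the normal equations is used instead of an explicit inverse, which is numerically preferable):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
w_true = np.array([1.0, -2.0, 0.5])        # placeholder ground truth
Y = X @ w_true + 0.1 * rng.standard_normal(n)

# Normal equations: (X^T X) w* = X^T Y
w_star = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X^T X)^{-1} X^T maps Y to the fitted values Y_hat
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y

assert np.allclose(Y_hat, X @ w_star)
assert np.allclose(w_star, np.linalg.lstsq(X, Y, rcond=None)[0])
```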

3.2 Weighted Regression

The quantities are the same as in the unweighted case, with an added diagonal weight matrix $r = \mathrm{diag}(r_{1}, r_{2}, \dots, r_{n})$.
Write the weighted least-squares objective in matrix form:

$$
\begin{aligned}
L(w) & = \sum_{i=1}^{n}r_{i}(y_{i}-x_{i}^{T}w)^{2} \\
& = \|Y-Xw\|_{r}^{2} \\
& = (Y-Xw)^{T}r(Y-Xw) \\
& = (Y^{T}-w^{T}X^{T})r(Y-Xw) \\
& = Y^{T}rY-Y^{T}rXw-w^{T}X^{T}rY+w^{T}X^{T}rXw
\end{aligned}
$$

Differentiate (using the symmetry of $r$) and set the derivative to zero:

$$
\begin{aligned}
\frac{dL(w)}{dw} & = \frac{d(Y^{T}rY)}{dw} - \frac{d(Y^{T}rXw)}{dw} - \frac{d(w^{T}X^{T}rY)}{dw} + \frac{d(w^{T}X^{T}rXw)}{dw} \\
& = 0 - X^{T}rY - X^{T}rY + 2X^{T}rXw \\
& = 0
\end{aligned}
$$

Rearranging gives $-X^{T}rY - X^{T}rY + 2X^{T}rXw = 0$, so $w^{*} = (X^{T}rX)^{-1}X^{T}rY$. Substituting $w^{*}$ back into the model:

$$
\begin{aligned}
Xw^{*} & = X(X^{T}rX)^{-1}X^{T}rY = \hat Y = \hat H Y \\
\hat H & = X(X^{T}rX)^{-1}X^{T}r
\end{aligned}
$$
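The weighted solution $w^{*} = (X^{T}rX)^{-1}X^{T}rY$ can be sketched the same way, on placeholder data; it is equivalent to ordinary least squares after scaling each row of $X$ and entry of $Y$ by $\sqrt{r_{i}}$, which the check below exploits:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.standard_normal((n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
r_diag = rng.uniform(0.5, 2.0, size=n)   # per-sample weights r_i > 0
r = np.diag(r_diag)

# Weighted normal equations: (X^T r X) w* = X^T r Y
w_star = np.linalg.solve(X.T @ r @ X, X.T @ r @ Y)

# Same answer from OLS after scaling each row by sqrt(r_i)
s = np.sqrt(r_diag)
w_scaled = np.linalg.lstsq(X * s[:, None], Y * s, rcond=None)[0]
assert np.allclose(w_star, w_scaled)
```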
