harry's blog

1. Linear

torch.nn.Linear(in_features, out_features, bias=True)

in_features: dimensionality of each input sample
out_features: dimensionality of each output sample

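A linear map between spaces. Suppose we have a batch of data $x$ made up of 128 samples, each of them 20-dimensional, and we want to map the 20-dimensional $x$ to $y$ in a 30-dimensional space. The computation is given below, where $W$ is the weight of the Linear layer and $b$ is its bias.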

$$
{
y = xW^{T}+b
}
$$

where $x=\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,20} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ x_{128,1} & x_{128,2} & \cdots & x_{128,20} \end{pmatrix}_{128\times 20}$ and $W=\begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,20} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ w_{30,1} & w_{30,2} & \cdots & w_{30,20} \end{pmatrix}_{30\times 20}$

$$
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,20} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,20} \\
\vdots & \vdots & \ddots & \vdots \\
x_{128,1} & x_{128,2} & \cdots & x_{128,20}
\end{pmatrix}_{128\times 20}
\begin{pmatrix}
w_{1,1} & w_{2,1} & \cdots & w_{30,1} \\
w_{1,2} & w_{2,2} & \cdots & w_{30,2} \\
\vdots & \vdots & \ddots & \vdots \\
w_{1,20} & w_{2,20} & \cdots & w_{30,20}
\end{pmatrix}_{20\times 30} + b =
\begin{pmatrix}
y_{1,1} & y_{1,2} & \cdots & y_{1,30} \\
y_{2,1} & y_{2,2} & \cdots & y_{2,30} \\
\vdots & \vdots & \ddots & \vdots \\
y_{128,1} & y_{128,2} & \cdots & y_{128,30}
\end{pmatrix}_{128\times 30}
$$

>>> import torch
>>> x = torch.randn(128, 20)  # the input has shape (128, 20)
>>> linear = torch.nn.Linear(20, 30)  # 20 input features, 30 output features
>>> output = linear(x)
>>> output.shape
torch.Size([128, 30])
>>> linear.weight.shape
torch.Size([30, 20])
>>> linear.bias.shape
torch.Size([30])
>>> ans = torch.mm(x, linear.weight.t()) + linear.bias
>>> ans.shape
torch.Size([128, 30])
>>> torch.allclose(ans, output, atol=1e-5)  # a tolerance absorbs rounding differences
True
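As a small cross-check of $y = xW^{T}+b$, the sketch below recomputes a single output row by hand and also shows that the layer only acts on the last dimension of its input. The seed, the row index `row`, and the extra 3-D tensor `x3d` are illustrative choices that do not appear in the original example.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

x = torch.randn(128, 20)
linear = torch.nn.Linear(20, 30)
output = linear(x)

# One sample at a time: y_i = W x_i + b, with W of shape (30, 20) and b of shape (30,)
row = 5  # arbitrary row index chosen for illustration
manual_row = linear.weight @ x[row] + linear.bias

# The summation order may differ from the batched call, so compare with a tolerance
row_matches = torch.allclose(manual_row, output[row], atol=1e-5)

# nn.Linear transforms the last dimension only; leading dimensions are kept as they are
x3d = torch.randn(4, 128, 20)
output3d_shape = tuple(linear(x3d).shape)
print(row_matches, output3d_shape)
```

The per-row form makes it clear that the same bias vector is added to every sample in the batch.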

2. Conv1d

PyTorch official documentation

torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

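in_channels: dimensionality of the input word vectors (number of input channels)
out_channels: dimensionality of the output word vectors (number of output channels, i.e. the number of kernels)
kernel_size: size of the convolution kernel
stride: step size
padding: amount of padding; `padding` positions are added on each side of the word sequence
groups: number of groups for grouped convolution, with the same meaning as in 2-D convolution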

1. First example

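Let's go straight to an example. 2-D convolution is mostly used on images, while 1-D convolution is more common in NLP, where it is applied to word vectors.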

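>>> import torch
>>> from torch import nn
>>> x = torch.randn(1, 8, 6)  # (batch, word-vector dimension, number of words)
>>> m = nn.Conv1d(8, 16, 3, padding=2)
>>> y = m(x)
>>> y.shape
torch.Size([1, 16, 8])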

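Suppose the sentence has 6 words and each word is encoded as an 8-dimensional vector. The first argument of Conv1d is the dimensionality of the input word vectors, 8. The second is the output dimensionality, 16, which means there are 16 kernels. The third is the kernel size, 3. A kernel has two dimensions, and the second one is fixed by the word-vector dimensionality; since that is 8, each kernel has size $3\times 8$. The fourth argument is padding: as the figure below shows, padding=2 pads two positions on each side of the sentence. The output length is therefore $(6 + 2\times 2 - 3)/1 + 1 = 8$, which matches the shape printed above.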
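The shape arithmetic in this example can be checked with a short sketch. The helper `conv1d_out_len` is a hypothetical name introduced here; it follows the output-length formula given in the PyTorch Conv1d documentation. The `embedded` tensor is a stand-in for the output of an embedding layer and is also an assumption, not part of the original post.

```python
import torch
from torch import nn


def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of Conv1d (hypothetical helper, formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


m = nn.Conv1d(8, 16, 3, padding=2)
x = torch.randn(1, 8, 6)               # (batch, word-vector dimension, number of words)
y = m(x)

expected_len = conv1d_out_len(6, kernel_size=3, padding=2)
weight_shape = tuple(m.weight.shape)   # stored as (out_channels, in_channels, kernel_size)

# Embedding layers usually produce (batch, words, word-vector dimension), so the
# last two axes have to be swapped before the tensor is passed to Conv1d.
embedded = torch.randn(1, 6, 8)        # stand-in for an embedding output
y_from_embedded = m(embedded.transpose(1, 2))
print(expected_len, tuple(y.shape), weight_shape)
```

Note that PyTorch stores each kernel as (in_channels, kernel_size), that is $8\times 3$, which is the $3\times 8$ kernel of the text with its axes swapped.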

2. Second example

Now let's look at what the groups parameter does.

>>> x = torch.randn(1, 8, 6)
>>> n = nn.Conv1d(8, 16, 3, padding=2, groups=2)
>>> z = n(x)
>>> z.shape
torch.Size([1, 16, 8])

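Here we only add groups=2, and the output shape is the same as before. As with 2-D convolution, groups=2 means the convolution is not taken over all dimensions at once, so the kernel is no longer $3\times 8$. Instead, the word-vector dimension is split into two halves and each kernel has size $3\times 4$; in other words, the kernels become smaller. As the figure below shows, the result would normally be computed over all dimensions, whereas here the word-vector dimension is first split into two parts and each part is convolved separately. The benefit is fewer parameters: $2\times 8\times 3\times 4 = 192$ weights are needed, where the leading 2 is the number of groups and 8, 3, 4 stand for 8 kernels per group, each of size $3\times 4$. Example 1 needs $16\times 3\times 8 = 384$ weights, twice as many (bias terms are not counted in either case).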
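The parameter counts and the "split into two halves" description can be verified with a minimal sketch. It rebuilds the grouped layer from two independent convolutions, one per half of the channels; the loop and the names `halves` and `manual` are illustrative and not part of the original post.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

m = nn.Conv1d(8, 16, 3, padding=2)             # ordinary convolution
n = nn.Conv1d(8, 16, 3, padding=2, groups=2)   # grouped convolution

m_weights = m.weight.numel()   # 16 * 8 * 3
n_weights = n.weight.numel()   # 16 * 4 * 3, i.e. 2 * 8 * 3 * 4

# Reproduce the grouped layer with two independent convolutions
x = torch.randn(1, 8, 6)
halves = []
for g in range(2):
    x_g = x[:, 4 * g:4 * (g + 1)]          # the 4 input channels of group g
    w_g = n.weight[8 * g:8 * (g + 1)]      # the 8 kernels of group g, each (4, 3)
    b_g = n.bias[8 * g:8 * (g + 1)]
    halves.append(nn.functional.conv1d(x_g, w_g, b_g, padding=2))
manual = torch.cat(halves, dim=1)          # stack the two groups along channels

same_as_grouped = torch.allclose(manual, n(x), atol=1e-5)
print(m_weights, n_weights, same_as_grouped)
```

Each output channel of the grouped layer therefore sees only half of the input channels, which is where the saving in weights comes from.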

3. nn.init

PyTorch documentation for nn.init initialization

3.1. uniform_

nn.init.uniform_(tensor, a=0, b=1): uniform distribution

a, b: lower and upper bounds of the uniform distribution

import torch
from torch import nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(3, 3)
        self._init_parameters()  # initialize the model's parameters

    def _init_parameters(self):
        # Fill every parameter (weight and bias) with samples from U(0, 1)
        for p in self.parameters():
            nn.init.uniform_(p, a=0, b=1)

    def forward(self, x):
        return self.l1(x)


model = MyModel()
for param in model.parameters():
    print(param)

'''
Parameter containing:
tensor([[0.0213, 0.8163, 0.0422],
        [0.9847, 0.6568, 0.3481],
        [0.1649, 0.3403, 0.9780]], requires_grad=True)
Parameter containing:
tensor([0.7987, 0.0152, 0.0960], requires_grad=True)
'''

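As you can see, the initialized values lie between 0 and 1, and with enough samples they follow a uniform distribution. The other initialization methods are used in the same way, so no further examples are given.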
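The claim that the values follow a uniform distribution once there are enough of them can be checked empirically. The sketch below is only an illustration: the sample size of 100,000, the seed, and the ten-bin histogram are arbitrary choices made here.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

samples = torch.empty(100_000)
nn.init.uniform_(samples, a=0, b=1)   # fills the tensor in place

lowest, highest = samples.min().item(), samples.max().item()
mean = samples.mean().item()          # close to (a + b) / 2 for a large sample

# Fraction of values in ten equal-width bins; each should be near 0.1
bin_fractions = torch.histc(samples, bins=10, min=0, max=1) / samples.numel()
print(lowest, highest, mean)
print(bin_fractions)
```

With only nine weights and three biases, as in the model above, the values look scattered; the flat histogram only becomes visible with a much larger tensor.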

To be continued…

4. Sigmoid

torch.nn.Sigmoid()

$$
\mathrm{Sigmoid}(x) = \frac{1}{1+\exp(-x)}
$$

The graph of the Sigmoid function is shown below.

>>> x = torch.tensor([1., 2., 3.])
>>> x
tensor([1., 2., 3.])
>>> m = torch.nn.Sigmoid()
>>> y = m(x)
>>> y
tensor([0.7311, 0.8808, 0.9526])
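To tie the module back to the formula, the following sketch evaluates $1/(1+\exp(-x))$ directly and compares it with the module and with the function form `torch.sigmoid`. The variable names are illustrative.

```python
import torch

x = torch.tensor([1., 2., 3.])

module_result = torch.nn.Sigmoid()(x)        # module form used in the text
manual_result = 1 / (1 + torch.exp(-x))      # the formula written out
functional_result = torch.sigmoid(x)         # function form

all_agree = (torch.allclose(module_result, manual_result)
             and torch.allclose(module_result, functional_result))
print(all_agree)
```

The module form is convenient inside `nn.Sequential`, while the function form is handy inside a `forward` method; both compute the same values.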

5. torch.nn.AdaptiveAvgPool2d

AdaptiveAvgPool2d is 2-D adaptive average pooling: you only specify the output size, and a matching kernel size and stride are determined automatically.

torch.nn.AdaptiveAvgPool2d(output_size)

output_size: either a tuple (H, W) or a single number H, which stands for (H, H). H and W may be int or None; None means that dimension keeps the size of the input
Input: (N, C, H_in, W_in) or (C, H_in, W_in)
Output: (N, C, S_0, S_1) or (C, S_0, S_1), where S = output_size

>>> import torch
>>> from torch import nn
>>> input = torch.tensor([[1, 2, 3],
...                       [4, 5, 6],
...                       [7, 8, 9]], dtype=torch.float64)
>>> input = torch.unsqueeze(input, 0)  # add a channel dimension: (1, 3, 3)
>>> input
tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]], dtype=torch.float64)
>>> m = nn.AdaptiveAvgPool2d((2,2))
>>> output = m(input)
>>> output
tensor([[[3., 4.],
         [6., 7.]]], dtype=torch.float64)

# Replace AdaptiveAvgPool2d((2,2)) with AdaptiveAvgPool2d(2):
# the output still has size (2, 2), unchanged
>>> p = nn.AdaptiveAvgPool2d(2)
>>> output = p(input)
>>> output
tensor([[[3., 4.],
         [6., 7.]]], dtype=torch.float64)

# Set one dimension to None: that dimension keeps the size of the input
>>> q = nn.AdaptiveAvgPool2d((None, 2))
>>> output = q(input)
>>> output
tensor([[[1.5000, 2.5000],
         [4.5000, 5.5000],
         [7.5000, 8.5000]]], dtype=torch.float64)

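Below is how the first program is evaluated. As for the other two, my guess is that the kernel is not necessarily square but is adjusted to a rectangle according to the output size, and that the stride depends on the output size and the kernel size.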
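The guess above can be tested against a hand-written version. The sketch assumes that, along each axis, output position $i$ averages the input indices from $\lfloor i \cdot in / out \rfloor$ up to but not including $\lceil (i+1) \cdot in / out \rceil$. This rule is not stated in the original post; it is an assumption about how the layer chooses its windows, and the code checks it only against the three examples above. The helper names `adaptive_windows` and `manual_adaptive_avg_pool2d` are hypothetical.

```python
import math
import torch
from torch import nn


def adaptive_windows(in_size, out_size):
    """Half-open index ranges [start, end) averaged for each output position."""
    return [(math.floor(i * in_size / out_size),
             math.ceil((i + 1) * in_size / out_size))
            for i in range(out_size)]


def manual_adaptive_avg_pool2d(t, out_h, out_w):
    """Average each window of a (C, H, W) tensor; a sketch, not PyTorch's own code."""
    rows = adaptive_windows(t.shape[-2], out_h)
    cols = adaptive_windows(t.shape[-1], out_w)
    out = torch.empty(t.shape[0], out_h, out_w, dtype=t.dtype)
    for i, (r0, r1) in enumerate(rows):
        for j, (c0, c1) in enumerate(cols):
            out[:, i, j] = t[:, r0:r1, c0:c1].mean(dim=(-2, -1))
    return out


inp = torch.arange(1., 10., dtype=torch.float64).reshape(1, 3, 3)

windows_3_to_2 = adaptive_windows(3, 2)              # windows along one axis
manual_2x2 = manual_adaptive_avg_pool2d(inp, 2, 2)
manual_3x2 = manual_adaptive_avg_pool2d(inp, 3, 2)   # the (None, 2) case

matches_2x2 = torch.allclose(manual_2x2, nn.AdaptiveAvgPool2d(2)(inp))
matches_3x2 = torch.allclose(manual_3x2, nn.AdaptiveAvgPool2d((None, 2))(inp))
print(windows_3_to_2, matches_2x2, matches_3x2)
```

Under this assumption the windows can overlap and can have different heights and widths, which is consistent with the rectangular kernel suspected above: in the (None, 2) case each window is one row high and two columns wide.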

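This article was written by Yonghui Wang and is licensed under the Creative Commons Attribution 4.0 International License
Unless marked as reposted or otherwise attributed, articles on this site are original or translated by this site; please credit the source before reposting
Last edited: Dec 19, 2024 12:13 pm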

torch.nn vector operations