Starting with "What is 1 plus 2?"

I actually wanted to start with the "hello world" that everyone knows, but the basic values you will work with most are simply integers and real numbers. So let's begin with code that computes "1 + 2".



In [1]:

import torch

# Two one-element float tensors
a = torch.tensor([1.0])
b = torch.tensor([2.0])

print(a + b)

tensor([3.])

It really is that simple.

Now let's use tensors to compute 1+2, 1+3, and 1+4 in one step.
All we need is to add the vector [1, 1, 1] to the vector [2, 3, 4]. The answer should be [3, 4, 5].
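For comparison, the tensor addition above pairs up elements position by position. The plain-Python sketch below (not how PyTorch implements it) does the same thing with lists; `add_elementwise` is a hypothetical helper name, and it simply requires both lists to have the same length.

```python
def add_elementwise(a, b):
    """Add two equal-length lists position by position (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    # Pair the i-th elements and add them, like a + b on two tensors of the same shape
    return [ai + bi for ai, bi in zip(a, b)]


print(add_elementwise([1, 1, 1], [2, 3, 4]))  # → [3, 4, 5]
```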



In [2]:

import torch

# Elementwise addition of two length-3 vectors
a = torch.tensor([1.0, 1.0, 1.0])
b = torch.tensor([2.0, 3.0, 4.0])

print(a + b)

tensor([3., 4., 5.])

It really is that simple.

Next, let's compute the derivative of y = x*x.
The derivative is dy/dx = 2x. Since x is 1, the derivative 2x equals 2.
(For ease of notation, d here denotes the partial derivative, not the total derivative.)
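The value 2 can also be checked without autograd by using a central finite difference, (f(x + h) - f(x - h)) / 2h. This is a framework-independent sketch: `central_difference` and `square` are hypothetical helper names, the step size h = 1e-5 is an arbitrary choice, and the estimate is only approximately 2 because of floating-point rounding.

```python
def central_difference(f, x, h=1e-5):
    """Approximate df/dx at x with (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    """y = x * x"""
    return x * x


# Analytic answer: dy/dx = 2x, so 2 at x = 1
print(round(central_difference(square, 1.0), 6))  # → 2.0
```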



In [3]:

import torch

# requires_grad=True makes autograd record the operations applied to x
x = torch.tensor([1.0], requires_grad=True)
y = x * x

y.backward()  # computes dy/dx and stores it in x.grad

print('x = ', x)
print('dy/dx', x.grad)

x =  tensor([1.], requires_grad=True)
dy/dx tensor([2.])

It really is that simple.

What happens when the output y depends on two variables, x1 and x2?
Let's compute the derivatives of y = (x1^2) + (x2^2).
They are dy/dx1 = 2(x1) and dy/dx2 = 2(x2). With x1 = 1 and x2 = 2, they are 2 and 4, respectively.
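The same finite-difference idea gives each partial derivative by nudging one variable while holding the other fixed. This is a framework-independent sketch; `f` and `partial_derivatives` are hypothetical helper names and h = 1e-5 is an arbitrary step size.

```python
def f(x1, x2):
    """y = x1^2 + x2^2"""
    return x1 * x1 + x2 * x2


def partial_derivatives(func, x1, x2, h=1e-5):
    """Central-difference estimates of dy/dx1 and dy/dx2 at (x1, x2)."""
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)  # vary x1 only
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)  # vary x2 only
    return d1, d2


d1, d2 = partial_derivatives(f, 1.0, 2.0)
print(round(d1, 6), round(d2, 6))  # → 2.0 4.0
```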



In [4]:

import torch

x1 = torch.tensor([1.0], requires_grad=True)
x2 = torch.tensor([2.0], requires_grad=True)

y = x1 * x1 + x2 * x2
y.backward()  # fills both x1.grad and x2.grad
print('dy/dx1', x1.grad)
print('dy/dx2', x2.grad)

dy/dx1 tensor([2.])
dy/dx2 tensor([4.])

It really is that simple.

Now let's represent x1 and x2 from the example above as a single tensor x.



In [5]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()  # sum() reduces y to a scalar, so backward() needs no argument

y.backward()
print('dy/dx', x.grad)

dy/dx tensor([2., 4.])

Note that the value you call backward() on must be a scalar, which is why y is defined with sum() here. If you instead call backward() on the vector y = x^2, y.backward() raises an error.

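If you really want to call backward() on the vector y itself, you can pass the gradient of the sum with respect to y as the argument, as in the next cell.
With z = sum(y) = y1 + y2, where y1 and y2 are the elements of y, the gradient is dz/dy = [dz/dy1, dz/dy2] = [1, 1]. By the chain rule, y.backward(dz/dy) then leaves dz/dx in x.grad, the same result as z.backward().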



In [6]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * x  # y is a vector, not a scalar

# Pass dz/dy = [1, 1] for z = sum(y); x.grad becomes dz/dx
y.backward(torch.ones(2))
print('dy/dx', x.grad)

dy/dx tensor([2., 4.])
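What the argument of y.backward() means can be sketched in plain Python: the result equals the vector-Jacobian product vᵀJ, where v is the vector passed to backward() and J is the Jacobian of y with respect to x. For y = x*x (elementwise), J is diagonal with entries 2·x_i. PyTorch computes this product without building J explicitly; the sketch below builds J only for illustration, and `jacobian_of_square` and `vector_jacobian_product` are hypothetical helper names.

```python
def jacobian_of_square(x):
    """Jacobian of y = x*x (elementwise): a diagonal matrix with 2*x_i on the diagonal."""
    n = len(x)
    return [[2 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def vector_jacobian_product(v, jac):
    """Compute v^T J: entry j is sum over i of v_i * dy_i/dx_j."""
    n_out = len(jac)
    n_in = len(jac[0])
    return [sum(v[i] * jac[i][j] for i in range(n_out)) for j in range(n_in)]


x = [1.0, 2.0]
v = [1.0, 1.0]  # dz/dy for z = sum(y)
print(vector_jacobian_product(v, jacobian_of_square(x)))  # → [2.0, 4.0]
```

Any other v weights the elements of y differently; for example, v = [1.0, 2.0] gives [2.0, 8.0].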

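There is no session to run here (unlike TensorFlow), so what happens if we keep adding to the graph?
Because backward() adds its result into x.grad instead of overwriting it, calling backward() on y, y1, y2, ... one by one as they are added
gives the same x.grad as building z = y + y1 + y2 and calling backward() on z alone.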
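The accumulation rule can be sketched with a tiny, dependency-free reverse-mode autodiff class. `Value`, `grad_separate_backward`, and `grad_summed_backward` are hypothetical names, and this is not how PyTorch is implemented; it only mimics the rule that backward() adds into .grad. Each call works on one scalar, so x0 = 1.0 and 2.0 correspond to the two elements of x in the cells below.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical teaching class)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # Pairs of (parent Value, local derivative d(self)/d(parent))
        self._parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))

    def backward(self, upstream=1.0):
        # Chain rule: each parent receives upstream * local derivative.
        # The += is the point: gradients accumulate instead of overwriting.
        for parent, local in self._parents:
            parent.grad += upstream * local
            parent.backward(upstream * local)


def grad_separate_backward(x0):
    """backward() on y and y1 separately, like the first cell below."""
    x = Value(x0)
    y = x * x
    y1 = x * x
    y.backward()
    y1.backward()
    return x.grad


def grad_summed_backward(x0):
    """backward() once on z = y + y1, like the second cell below."""
    x = Value(x0)
    z = x * x + x * x
    z.backward()
    return x.grad


for x0 in (1.0, 2.0):
    print(x0, grad_separate_backward(x0), grad_summed_backward(x0))
# → 1.0 4.0 4.0
# → 2.0 8.0 8.0
```

Walking every path recursively, as backward() does here, is fine for graphs this small but grows exponentially when subexpressions are shared; a real engine visits each node once, in reverse topological order.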



In [7]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()
y1 = (x * x).sum()

# Each backward() adds into x.grad: [2, 4] + [2, 4]
y.backward()
y1.backward()

print(x.grad)

tensor([4., 8.])



In [8]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()
y1 = (x * x).sum()

# One backward() through the combined scalar z
z = y + y1
z.backward()

print(x.grad)

tensor([4., 8.])

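Now let's modify the expressions above slightly by weighting y1 by 2. Since y = y1 = sum(x^2), each version below computes the gradient of y + 2*y1 = 3*sum(x^2), which is 6x = [6, 12]. All you need to understand is how each version calls backward() and obtains grad, and how the code structure differs.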
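The target value [6, 12] can be checked independently of autograd by estimating the gradient of z = y + 2*y1 with central differences. This is a framework-free sketch; `z_of` and `numerical_gradient` are hypothetical helper names, and h = 1e-5 is an arbitrary step size.

```python
def z_of(x):
    """z = y + 2*y1, with y = y1 = sum(x_i^2), as in the cells below."""
    y = sum(xi * xi for xi in x)
    y1 = sum(xi * xi for xi in x)
    return y + 2 * y1


def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at the point x."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h   # nudge only coordinate i up
        minus[i] -= h  # and down
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


print([round(g, 6) for g in numerical_gradient(z_of, [1.0, 2.0])])  # → [6.0, 12.0]
```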



In [9]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()
y1 = (x * x).sum()

# The argument scales each gradient (weights 1 and 2); y and y1 are 0-dim, so the weights are too
y.backward(torch.tensor(1.0))
y1.backward(torch.tensor(2.0))

print(x.grad)

tensor([ 6., 12.])



In [10]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()
y1 = (x * x).sum()

# Put the weight into the graph and backpropagate one scalar
z = y + 2 * y1
z.backward()

print(x.grad)

tensor([ 6., 12.])



In [11]:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()
y1 = (x * x).sum()

# y and y1 are 0-dim scalars, so stack (not cat) them into a length-2 vector
z = torch.stack((y, y1))
z.backward(torch.tensor([1.0, 2.0]))  # weights for y and y1

print(x.grad)

tensor([ 6., 12.])

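Rather than declaring one of these better, the goal is to understand each structure and adapt it to the situation. That said, I recommend the second form (In [10], z = y + 2*y1 followed by z.backward()): the code is the easiest to follow, and the loss you ultimately optimize is a scalar anyway.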