[ ML ] TensorFlow for Everyone (3): Gradient descent algorithm basics

peter_yun

Feb 17, 2017

This post is based on the lectures by Professor Sung Kim (김성훈) of HKUST.
(Reference: https://hunkim.github.io/ml/ )

Gradient descent algorithm

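Plotting the cost function against W gives a parabola (a quadratic curve) like the one above.
"Gradient descent" literally means going down (descent) along the slope (gradient).
Gradient Descent Algorithm
- Widely used for minimization problems
- Also works when the cost depends on many variables, as in cost(w1, w2, w3 … )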
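To illustrate the multi-variable case, here is a minimal sketch in plain Python that minimizes a cost over two variables at once. The data, the names `gradients` and `fit`, the intercept `b`, and the learning rate are all assumptions made for this example; they do not come from the lecture.

```python
# Hypothetical toy data generated from y = 2x + 1 (not from the lecture).
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]


def gradients(w, b):
    """Partial derivatives of cost(w, b) = 1/(2m) * sum((w*x + b - y)^2)."""
    m = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / m
    grad_b = sum(errors) / m
    return grad_w, grad_b


def fit(learning_rate=0.1, steps=2000):
    w, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b)
        # Update both variables together, each along its own partial derivative.
        w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
    return w, b


w, b = fit()
print(w, b)  # both should end up close to the values used to generate the data
```

The same loop extends to any number of variables: compute one partial derivative per variable and update them all in the same step.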

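Alpha (α) is a constant called the learning rate.
At each step, alpha is multiplied by the derivative (slope) of the cost function, and the result is subtracted from W.
When the slope is positive, W decreases;
when the slope is negative, W increases. Either way, W moves toward lower cost.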
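The rule described here can be written in standard notation as follows. This is a reconstruction from the description, using α for the learning rate and W for the weight, rather than a copy of the lecture slide.

```latex
W := W - \alpha \, \frac{\partial}{\partial W} \, cost(W)
```

Because the slope is subtracted, a positive derivative makes W smaller and a negative derivative makes W larger. The size of each move is scaled by α.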

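When the cost function has a shape like the one above, the minimum that gradient descent reaches depends on the starting point. It can settle in a local minimum instead of the global one, so the algorithm is not guaranteed to work properly.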
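The dependence on the starting point can be seen in a small experiment. The sketch below uses a made-up non-convex function, f(w) = w^4 - 3w^2 + w, chosen only because it has two separate valleys; the function, the name `descend`, and the learning rate are assumptions for illustration.

```python
def slope(w):
    """Derivative of the hypothetical non-convex cost f(w) = w**4 - 3*w**2 + w."""
    return 4 * w ** 3 - 6 * w + 1


def descend(w, learning_rate=0.01, steps=1000):
    """Plain gradient descent on f, starting from w."""
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w


left = descend(-2.0)   # starts on the left side of the curve
right = descend(2.0)   # starts on the right side of the curve
print(left, right)     # the two runs end in different valleys
```

Starting at -2.0 ends in the left valley (roughly w = -1.30), while starting at 2.0 ends in the right valley (roughly w = 1.13). Only the left one is the global minimum of this function.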

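A convex function is one whose graph is bowl-shaped, like the one above.
When the cost function is convex, gradient descent reaches the same minimum from any starting point, provided the learning rate is small enough.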

Minimizing Cost

Now let's look at the actual code. The full listing comes first.

Installing matplotlib

Before going through the code, here is how to install matplotlib on Windows.

http://matplotlib.org/users/installing.html

The official site above lists the installation steps for each operating system. On Windows, run the following commands in PowerShell.

python -m pip install -U pip setuptools
python -m pip install matplotlib

Walking through the code

import tensorflow as tf
import matplotlib.pyplot as plt

X = [1., 2., 3.]
Y = [1., 2., 3.]
m = n_samples = len(X)

# Set model weights
W = tf.placeholder(tf.float32)
# Construct a linear model (tf.multiply replaces tf.mul, which was removed in TensorFlow 1.0)
hypothesis = tf.multiply(X, W)
# Cost function
cost = tf.reduce_sum(tf.pow(hypothesis - Y, 2)) / m
# Initializing the variables
init = tf.global_variables_initializer()

The code above implements the formula ‘cost = Σ(predicted − actual)² / m’. The hypothesis is simplified to W*X.

# For graph
W_val = []
cost_val = []

These are the lists that the for loop below appends to. They record the W and cost values, which are used later as the data for the plot.

# Launch the graph
sess = tf.Session()
sess.run(init)
for i in range(-30, 50):
    w = i * 0.1
    c = sess.run(cost, feed_dict={W: w})  # evaluate the cost once per W
    print(w, c)

    # Append the data for the plot
    W_val.append(w)
    cost_val.append(c)

# Graphic display
plt.plot(W_val, cost_val, 'ro')
plt.ylabel('Cost')
plt.xlabel('W')
plt.show()

As a result, the plot shows a convex function like the one below.

The cost function is lowest when W is 1
A curve of this shape is called a convex function
This shows that the cost is well suited to the gradient descent algorithm
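The same sweep can be checked without TensorFlow. The sketch below recomputes the cost in plain Python over the same range of W and picks out the lowest point; the names `cost`, `sweep`, and `best_w` are introduced only for this example.

```python
X = [1.0, 2.0, 3.0]
Y = [1.0, 2.0, 3.0]


def cost(w):
    """Sum of squared errors divided by the number of samples, for hypothesis w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(X, Y)) / len(X)


# Same sweep as the TensorFlow loop: W from -3.0 up to 4.9 in steps of 0.1.
sweep = [(i * 0.1, cost(i * 0.1)) for i in range(-30, 50)]

# Pick the (W, cost) pair with the smallest cost.
best_w, best_cost = min(sweep, key=lambda pair: pair[1])
print(best_w, best_cost)
```

The lowest cost in the sweep sits at W = 1, and the cost rises on both sides of it, which matches the bowl shape in the plot.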

Gradient Descent algorithm (code)

Now that we have seen the shape of the cost function, let's implement the algorithm directly in code.

Source code

Explanation

import tensorflow as tf

x_data = [1., 2., 3.]
y_data = [1., 2., 3.]
# Give W a random value between -10 and 10.
# tf.random_uniform([1]) generates one random value;
# tf.random_uniform([5]) would generate five.
W = tf.Variable(tf.random_uniform([1], -10.0, 10.0))
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
# Our hypothesis, simplified with no intercept
hypothesis = W * X
# Cost function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

This is the initial setup.

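# minimize
# Gradient Descent algorithm part
# The learning rate is 0.1
# (tf.multiply replaces tf.mul, which was removed in TensorFlow 1.0)
descent = W - tf.multiply(0.1, tf.reduce_mean(tf.multiply((tf.multiply(W, X) - Y), X)))
# This is only an operation; W changes only when it is assigned in the session.
# update = W.assign(descent)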

This part writes out the formula exactly as it is, so you need to understand the gradient descent formula before reading it.
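For reference, the `descent` expression corresponds to the following derivation. This is a sketch that assumes the common convention of putting 1/(2m) in front of the cost, so that the factor 2 from the derivative cancels; here m is the number of samples and α = 0.1.

```latex
\begin{aligned}
cost(W) &= \frac{1}{2m} \sum_{i=1}^{m} \left( W x_i - y_i \right)^2 \\
\frac{\partial}{\partial W} \, cost(W) &= \frac{1}{m} \sum_{i=1}^{m} \left( W x_i - y_i \right) x_i \\
W &:= W - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( W x_i - y_i \right) x_i
\end{aligned}
```

Note that the `cost` printed by the code uses 1/m rather than 1/(2m). Its exact derivative is twice the expression above, which only rescales the effective learning rate and does not move the minimum.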

# Initialize the variables
init = tf.global_variables_initializer()
# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the line; build the assign op once and reuse it on every step
update = W.assign(descent)
for step in range(20):
    sess.run(update, feed_dict={X: x_data, Y: y_data})
    print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W))

Running this prints the step, the cost, and W on each iteration, so you can watch how they change.
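To follow the same loop without a TensorFlow installation, here is a minimal plain-Python sketch of the update. It mirrors the data, the learning rate of 0.1, and the 20 steps from the code above; the function names `descent_step` and `train` are assumptions made for this example.

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [1.0, 2.0, 3.0]
learning_rate = 0.1


def cost(w):
    """Mean squared error of the hypothesis w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)


def descent_step(w):
    """One update: W minus the learning rate times mean((W*x - y) * x)."""
    gradient = sum((w * x - y) * x for x, y in zip(x_data, y_data)) / len(x_data)
    return w - learning_rate * gradient


def train(w, steps=20):
    for step in range(steps):
        w = descent_step(w)
        print(step, cost(w), w)  # step, cost, and W, as in the TensorFlow loop
    return w


# Start from a random W between -10 and 10, like tf.random_uniform([1], -10.0, 10.0).
final_w = train(random.uniform(-10.0, 10.0))
```

With this data each step shrinks the distance between W and 1 by a constant factor (about 0.53), so the cost falls toward 0 and W approaches 1 within the 20 steps, whatever the random start.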