[Study] Machine Learning - Basic

Doyun's Journey
Published in Doyun's Lab · Oct 23, 2020

AI != Machine Learning != Deep Learning

AI ⊃ Machine Learning ⊃ Deep Learning

AI and big data: AI techniques can be applied at the final analysis stage of a big data pipeline

โ€‹๐–ฃ๐—‚๐–ฟ๐–ฟ๐–พ๐—‹๐–พ๐—‡๐— ๐—„๐—‚๐—‡๐–ฝ๐—Œ ๐—ˆ๐–ฟ ๐–ซ๐–พ๐–บ๐—‹๐—‡๐—‚๐—‡๐—€

โ˜† 4๊ฐ€์ง€ ๋ฐฉ๋ฒ•๊ณผ ๊ฐ๊ฐ์˜ ์ฐจ์ด์ 

ยท Supervised Learning

ยท Unsupervised Learning

ยท Semi-supervised Learning

ยท Reinforcement Learning

โ€‹

  • Supervised Learning

- Training requires data that comes with 'answers' (labels)

- Mainly used to 'classify' the given data (Classification)

- Representative classification models: Decision Trees, Neural Networks, Support Vector Machines (SVM)
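
The supervised setting above can be sketched in a few lines. This is a hypothetical toy example (a single-feature threshold rule, simpler than the models listed above): the training data carries labels, and fitting means choosing the rule that best reproduces them.

```python
# Supervised learning sketch: labeled data in, decision rule out.
# The data and the threshold-rule model are hypothetical toy choices.

def fit_threshold(samples, labels):
    """Learn a single cut point that best separates two classes."""
    best_t, best_acc = None, -1.0
    for t in sorted(samples):
        preds = [1 if x >= t else 0 for x in samples]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# labeled training data: feature value -> class ('answer')
X = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y = [0, 0, 0, 1, 1, 1]

t = fit_threshold(X, y)
print(t)  # 8.0 (a cut point that separates the two groups)
```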

  • Unsupervised Learning

- No 'answers' are needed for training; the data alone is enough

- Mainly used to 'cluster' the given data (Clustering)

- Representative clustering models: Hierarchical Clustering, K-means Clustering

* Caution: quantitative evaluation of the model still requires 'answers' (ground truth)
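
A minimal 1-D sketch of K-means, one of the clustering models named above: no labels are used, only the data itself. The toy data and initial centers are hypothetical.

```python
# K-means in 1-D: alternate assignment and update steps.

def kmeans_1d(data, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for x in data:
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[i].append(x)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
final = kmeans_1d(data, centers=[0.0, 5.0])
print(final)  # centers converge near 1.0 and 9.0
```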

  • Semi-supervised Learning

- Only some of the data carries 'answers' (labels)

- Assumes that the unlabeled data will also help the model learn

  • Reinforcement Learning

- Action (choice) → reward (or penalty)

- Learning proceeds in the direction that earns larger rewards

๐–ฅ๐–พ๐–บ๐—๐—Ž๐—‹๐–พโ€‹

== Attribute, ์—ฌ๋Ÿฌ๊ฐ€์ง€ Data type

ex) ์‚ฌ๋žŒ์˜ Feature : ํ‚ค, ์„ฑ๋ณ„, ๋‚˜์ด ๋“ฑ

โ€‹

Feature ์ •์˜๋Š” ์™œ ์ค‘์š”ํ•œ ๊ฒƒ์ธ๊ฐ€์š” ?

- ์ ์ ˆํ•˜๊ฒŒ ์ •์˜ํ•˜๋Š” ๊ฒƒ์€ ๋งค์šฐ ์ค‘์š”. ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์ขŒ์šฐํ•  ์ˆ˜ ์žˆ์Œ

- Domain Knowledge ํ˜น์€ ์ „๋ฌธ์ง€์‹ ์š”๊ตฌ

ex) ์ผ๋ณธ์–ด ๋ฌธ์žฅ์— ๋Œ€ํ•œ ๊ฐ์ • ๋ถ„์„ โ€” ์ผ๋ณธ์–ด ์ง€์‹ ํ•„์š”

Is it better to have many features or few?

- Each model needs an appropriate number; too many is not good

- Adding features indiscriminately causes the Curse of Dimensionality

* Curse of Dimensionality = as features are added, the number of possible data combinations (the feature space) grows explosively, so the amount of data needed grows with it; when the data is too sparse for the dimensionality, the chance of overfitting increases
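
The explosive growth is easy to see numerically. Assuming (hypothetically) that each feature axis is split into 10 bins, the number of cells in the feature space, and hence the data needed to cover it, grows exponentially with the feature count:

```python
# Curse of dimensionality: cells in feature space vs. number of features,
# assuming 10 discretization bins per feature axis (a hypothetical choice).
bins = 10
cells = {d: bins ** d for d in (1, 2, 3, 10)}
for d, c in cells.items():
    print(f"{d} feature(s) -> {c} cells")
# 10 features already give 10**10 = 10,000,000,000 cells to cover
```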

  • Number of features

- Depending on the problem, it can range from hundreds to tens of thousands or more

- For classification, it is typically larger than the number of labels (classes) and smaller than the number of data points

ex) Labels (classes) when classifying dogs vs. cats: dog, cat

Can defining features be skipped?

- In deep learning, features are recognized automatically at each layer, so defining them is easy

But, deep learning offers no interpretation of the individual features, and the number of parameters must still be specified

๐–ฌ๐—ˆ๐–ฝ๐–พ๐—…

= Method

- Model์€ ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด, ์ž„์˜์˜ task๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•œ Hypothesis

์•„๋ž˜ ๋ชจ๋ธ ์˜ˆ์‹œ ์ค‘, ๋” ์ ํ•ฉํ•œ ๋ชจ๋ธ์€ ?

- y = ax + c

- y = ax^ + bx + c
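
Which hypothesis is more suitable depends on the data it must explain. A minimal sketch (with hypothetical, hand-picked parameters rather than fitted ones): when the data follows a quadratic trend, the quadratic hypothesis attains lower error.

```python
# Compare the two hypotheses on data generated from y = x^2.
# Parameters are hand-picked for illustration, not learned.

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]              # data from a quadratic relation

linear    = lambda x: 1 * x + 2       # y = ax + c      with a=1, c=2
quadratic = lambda x: 1 * x * x       # y = ax^2 + bx + c with a=1, b=0, c=0

e_lin = mse(linear, xs, ys)
e_quad = mse(quadratic, xs, ys)
print(e_lin, e_quad)  # the quadratic hypothesis fits this data exactly
```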

  • Complexity

- Computational Complexity

= How much the amount of computation grows as the data grows

- Data (Sample Complexity / Consistency)

= Whether the model's results improve as the data grows

* When two models produce the same results, the one with lower computational complexity and higher data complexity is preferable

Of the example models below, which is more complex?

- y = ax + c

- y = ax^2 + bx + c

* The second has more features (more multiplications and additions to carry out)

* From a computational standpoint, its three parameters (a, b, c) increase the amount of computation

  • Parametric, Non-Parametric

- Machine learning is a methodology for solving tasks

- Parametric vs. non-parametric concerns how one chooses to approach a task

- The distinction matters especially for unsupervised learning (though it applies to supervised learning as well)

- A 'parameter' is what the model optimizes (bias, weights, etc.)

- Parametric methods summarize the data with a parameter set of fixed size (no matter how much data is processed, the number of parameters does not grow)

- Non-parametric methods are used when a parametric model is hard to apply, e.g., when there are many features or nothing is known about the data's distribution

ex) Purely frequency-based approaches (rank-sum test), the kNN algorithm
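
kNN illustrates the non-parametric idea well: instead of summarizing the data into a fixed set of parameters, it keeps the entire training set, so its "size" grows with the data. A minimal sketch with hypothetical 1-D toy data:

```python
# kNN: predict by majority vote among the k nearest training points.
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature, label) pairs; the whole set is retained."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [(1.0, "cat"), (1.5, "cat"), (2.0, "cat"),
         (8.0, "dog"), (8.5, "dog"), (9.0, "dog")]
print(knn_predict(train, 1.2))  # "cat"
print(knn_predict(train, 8.7))  # "dog"
```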

  • Generative, Discriminative

- Generative approach = learns the joint probability, P(x, y)

- Discriminative approach = learns the conditional probability, P(y|x), directly

* Generative models are more powerful in that the joint distribution can serve a comparatively wide range of purposes

But, for classification, Bayes' rule is used to compute P(y|x)
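
Concretely, once the joint P(x, y) has been learned, Bayes' rule gives P(y|x) = P(x, y) / P(x), with P(x) obtained by summing the joint over y. The joint table below is a hypothetical example:

```python
# From a learned joint distribution to a classification decision.
joint = {                            # hypothetical P(x, y) table
    ("rain", "umbrella"): 0.30,
    ("rain", "no_umbrella"): 0.10,
    ("sun", "umbrella"): 0.05,
    ("sun", "no_umbrella"): 0.55,
}

def p_y_given_x(x, y):
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(x)
    return joint[(x, y)] / p_x

print(p_y_given_x("rain", "umbrella"))  # 0.3 / 0.4 = 0.75
```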

𝖭𝗈 𝖥𝗋𝖾𝖾 𝖫𝗎𝗇𝖼𝗁 𝗏𝗌. 𝖮𝖼𝖼𝖺𝗆’𝗌 𝖱𝖺𝗓𝗈𝗋

- The former: 'No single algorithm is best at everything!'

- The latter: 'Simplicity may, surprisingly, be the truth'

๐–ค๐—๐–บ๐—…๐—Ž๐–บ๐—๐—‚๐—ˆ๐—‡ ๐—ˆ๐–ฟ ๐–ฌ๐—ˆ๐–ฝ๐–พ๐—…

= ์ •์„ฑํ‰๊ฐ€, ์ •๋Ÿ‰ํ‰๊ฐ€

- ์ •์„ฑํ‰๊ฐ€ : ๋ˆˆ์œผ๋กœ ์ง์ ‘ ํ™•์ธ ํ•˜๋Š” ๊ฒƒ

ex) Case๋ฅผ ์ง์ ‘ ๋ถ„์„, ํŠน์ •๊ฒฐ๊ณผ๋“ค์˜ ์›์ธ์„ ์ถ”๋ก 

- ์ •๋Ÿ‰ํ‰๊ฐ€ : ์ •๋‹ต(Truth Ground)์™€ ๋น„๊ตํ•˜์—ฌ ์„ฑ๋Šฅ์„ ๋Œ€์ˆ˜๋กœ ํ‘œํ˜„ํ•˜๊ณ  ๋น„๊ต

  • Quantitative evaluation methods

- Accuracy = 1 - Error

- Precision, Recall, F1 score …

- Area Under the Curve (AUC) of the ROC curve

- BLEU metric (score) = BiLingual Evaluation Understudy (a weighted geometric mean)

But, BLEU is not fully objective (it counts repetitions even when they are meaningless); Clipping Precision Counts appeared as a remedy

ex) In evaluating a machine-translated sentence, the article 'the' may simply be repeated

But, even an unfinished sentence can still score high, so the Brevity Penalty appeared as a further remedy (it takes the overall sentence length into account)
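
The classification metrics in the list above follow directly from the confusion-matrix counts. A minimal sketch using a hypothetical binary example (1 = positive class):

```python
# Precision, Recall, and F1 from predictions vs. ground truth.

def prf1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)          # of predicted positives, how many right
    recall = tp / (tp + fn)             # of true positives, how many found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
p, r, f1 = prf1(y_true, y_pred)
print(p, r, f1)  # (0.75, 0.75, 0.75)
```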

  • Splitting the data for evaluation

- Train/Test = not recommended; one may end up making decisions based on the test results (the optimization stage must not be influenced by the test results)

- Train/Validation/Test = data for tuning the model, i.e., data that indirectly influences training → Validation Data

- k-fold Cross-Validation = the most objective; split the training data into k parts and use the first fold as the Validation or Test Data

> then the second fold as the Validation or Test Data with the remaining k-1 folds as Training Data … (until every fold has been used)
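
The k-fold rotation described above can be sketched as a splitter: each fold serves exactly once as the validation/test set while the remaining k-1 folds train the model. (This sketch assumes the data length divides evenly by k.)

```python
# k-fold cross-validation splits (no shuffling, for clarity).

def kfold_splits(data, k):
    fold_size = len(data) // k
    for i in range(k):
        val = data[i * fold_size:(i + 1) * fold_size]      # held-out fold
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, val

data = list(range(6))
splits = list(kfold_splits(data, k=3))
for train, val in splits:
    print("val:", val, "train:", train)
# val: [0, 1] train: [2, 3, 4, 5]
# val: [2, 3] train: [0, 1, 4, 5]
# val: [4, 5] train: [0, 1, 2, 3]
```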

  • Bias and Variance

ex) When putting out a fire, the direction the fire hose points is the bias; how widely the water sprays from the hose is the variance

- Left figure: the model's output when model complexity is low; raising the complexity lets the decision boundary carve up the space more finely

But, pushing this too far fits the training set too closely, and performance on the test set can drop

* When model complexity gets too high, the probability of overfitting increases (model complexity is related to the model's features)

- Right figure: as model complexity increases, the prediction error on the training samples decreases

But, the prediction error on the test samples rises again at some point; this is Overfitting

* Terms

- Generalization = a model that works well on unseen data generalizes well

- Capacity = the space of decision boundaries the model can represent

- Remedy for a high-variance model: add more data, or make the model lighter

- Remedy for a high-bias model: raise the model complexity

☆ A high-variance model overfits, a high-bias model underfits; the two are in a trade-off
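
The trade-off can be caricatured with two extreme (hypothetical) models: a lookup table that memorizes the training set (maximal variance) and a constant that ignores the input (maximal bias). The true relation y = 2x and all data below are toy assumptions.

```python
# Extreme variance vs. extreme bias on data from y = 2x.
train = [(1, 2), (2, 4), (3, 6)]
test = [(4, 8), (5, 10)]

table = dict(train)                   # high variance: memorize the train set
memorize = lambda x: table.get(x, 0)  # unseen inputs fall back to 0
mean_y = sum(y for _, y in train) / len(train)
constant = lambda x: mean_y           # high bias: ignores x entirely

def err(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(err(memorize, train), err(memorize, test))  # 0 on train, huge on test
print(err(constant, train), err(constant, test))  # mediocre on both
```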

𝖠𝗅𝗀𝗈𝗋𝗂𝗍𝗁𝗆 𝖿𝗈𝗋 𝖳𝗋𝖺𝗂𝗇

= The 'purpose of learning': update the values of a given model's parameters to fit the given data

- Parameter optimization: find the parameter values that optimize the value of the target function

* Terms

- Loss Function: a function defining the prediction penalty on a single data instance; when the model is wrong, how wrong is it

ex) Square Loss, Hinge Loss, 0/1 Loss …

- Cost Function: a more general concept than the loss function; the sum of the loss over the whole dataset, plus a regularization term

ex) MSE (Mean Squared Error)

- Objective Function: the most general term; any kind of function that training tries to 'optimize'

ex) MLE

# Loss is a part of cost, which is a type of objective
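
The loss/cost distinction above in code: the loss scores one data instance, while the cost aggregates the loss over the whole dataset (here MSE; the regularization term is omitted for simplicity).

```python
# Loss (per instance) vs. cost (over the dataset).

def square_loss(y_true, y_pred):
    """Loss: penalty on a single data instance."""
    return (y_true - y_pred) ** 2

def mse_cost(ys_true, ys_pred):
    """Cost: mean of the per-instance losses (no regularization term)."""
    losses = [square_loss(t, p) for t, p in zip(ys_true, ys_pred)]
    return sum(losses) / len(losses)

one = square_loss(3.0, 2.5)
total = mse_cost([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(one, total)  # 0.25 and 1/3
```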

โ€‹

* ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜

- MLE (Maximum Likelihood Estimation)

- MAP (Maximum a Posteriori Estimation)

- EM (Expectation-Maximiazation)

- Gibbs Sampling, Gradient Descent, Variational Inference, Laplace Approximation โ€ฆ

โ€‹โ€‹

๐–ฃ๐–บ๐—๐–บ

= ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๊ธฐ๊ณ„ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ Parameter๋“ค์ด ํ•™์Šต

โ€˜์š”๋ฆฌ์— ๋น„์œ ํ•˜๋ฉด ์–ด๋–ป๊ฒŒ ๋ ๊นŒ ?

- ์žฌ๋ฃŒ = Data, ์žฌ๋ฃŒ์†์งˆ = Feeature Engineering, ์š”๋ฆฌ๋ฐฉ๋ฒ• = Algorithm, ์‹œ์‹ = Evaluation

What if the data is insufficient, or its distribution is biased?

- Sampling: Down-sampling = take fewer examples from the over-represented class; Up-sampling = take more examples from the under-represented class

- Distant Supervision: a semi-supervised approach; treats data as labeled on the basis of an 'assumption'

ex) If hair length is 30cm or more, label as Girl

- Bagging (Bootstrap Aggregating): repeatedly sample from the full dataset and train/test
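
The two re-sampling options above, sketched on a hypothetical imbalanced dataset (90 examples of class "A", 10 of class "B"):

```python
# Re-sampling an imbalanced dataset: down-sample the majority class,
# or up-sample the minority class with replacement.
import random

random.seed(0)                         # for reproducibility
majority = ["A"] * 90
minority = ["B"] * 10

# Down-sampling: draw fewer majority examples (without replacement)
down = random.sample(majority, len(minority)) + minority

# Up-sampling: draw minority examples with replacement
up = majority + random.choices(minority, k=len(majority))

print(len(down), len(up))  # 20 and 180; both are now balanced
```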
