๋จธ์‹ ๋Ÿฌ๋‹ ์ง€๋„ํ•™์Šต์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•˜๊ณ  ์ง๊ด€์ ์ธ ๋ถ„๋ฅ˜ ๋ฐ ํšŒ๊ท€ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ธ KNN ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์ •๋ฆฌ

KNN (K-Nearest Neighbors) Overview

  • K-์ตœ๊ทผ์ ‘ ์ด์›ƒ(KNN, K-Nearest Neighbors)์€ ์ง€๋„ํ•™์Šต(Supervised Learning)์—์„œ ์ž์ฃผ ์‚ฌ์šฉ๋˜๋Š” ๊ฐ„๋‹จํ•˜์ง€๋งŒ ๊ฐ•๋ ฅํ•œ ๋ถ„๋ฅ˜(Classification) ๋ฐ ํšŒ๊ท€(Regression) ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ, ํ•™์Šต ๊ณผ์ •์ด ๊ฑฐ์˜ ์—†๊ณ , ์˜ˆ์ธก ์‹œ์ ์— ๊ณ„์‚ฐ์ด ์ด๋ฃจ์–ด์ง€๋Š” Lazy Learning ๋ฐฉ์‹์˜ ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์ด๋‹ค.
  • ์•„์ด๋””์–ด: ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ํ•™์Šต ๋ฐ์ดํ„ฐ ์ค‘ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด K๊ฐœ์˜ ์ด์›ƒ์„ ์ฐพ๊ณ , ์ด์›ƒ๋“ค์˜ ๋ ˆ์ด๋ธ”์„ ๋ฐ”ํƒ•์œผ๋กœ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•œ๋‹ค.
  • ์šฉ๋„: ๋ถ„๋ฅ˜(Classification), ํšŒ๊ท€(Regression), ์ด์ƒ์น˜ ํƒ์ง€ ๋“ฑ

KNN ๋ฐฉ๋ฒ•

1. ๊ฑฐ๋ฆฌ ์ธก์ •

  • ๋ฐ์ดํ„ฐ ๊ฐ„ ๊ฑฐ๋ฆฌ(์œ ์‚ฌ์„ฑ)๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ๊ฑฐ๋ฆฌ ํ•จ์ˆ˜๊ฐ€ ์‚ฌ์šฉ๋˜๋ฉฐ, ๋Œ€ํ‘œ์ ์œผ๋กœ ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ(Euclidean Distance) ๊ฐ€ ์žˆ๋‹ค.

2. ๋ถ„๋ฅ˜(Classification)์—์„œ์˜ KNN

  • ๋ถ„๋ฅ˜ ๋ฌธ์ œ์—์„œ KNN ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์ƒˆ๋กœ์šด ์ƒ˜ํ”Œ ์— ๋Œ€ํ•ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋™์ž‘ํ•œ๋‹ค:
    1. ๋ชจ๋“  ํ•™์Šต ์ƒ˜ํ”Œ ์™€์˜ ๊ฑฐ๋ฆฌ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค.
    2. ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๊ฐœ์˜ ์ƒ˜ํ”Œ์„ ์„ ํƒํ•œ๋‹ค.
    3. ํด๋ž˜์Šค ๋ ˆ์ด๋ธ”์„ ํ™•์ธํ•˜์—ฌ ๋‹ค์ˆ˜๊ฒฐ(Voting) ๋กœ ์ตœ์ข… ํด๋ž˜์Šค๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค.

3. ํšŒ๊ท€(Regression)์—์„œ์˜ KNN

  • ํšŒ๊ท€ ๋ฌธ์ œ์—์„œ KNN ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ฐœ์˜ ์ด์›ƒ ์ƒ˜ํ”Œ์˜ ํƒ€๊ฒŸ ๊ฐ’ ํ‰๊ท ์œผ๋กœ ์˜ˆ์ธก ๊ฐ’์„ ๊ณ„์‚ฐํ•œ๋‹ค.

4. KNN ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ

  • K (์ด์›ƒ ์ˆ˜): ์ž‘์„์ˆ˜๋ก ๋ฏผ๊ฐ, ํด์ˆ˜๋ก ๋ถ€๋“œ๋Ÿฌ์šด ๊ฒฐ์ • ๊ฒฝ๊ณ„
  • ๊ฑฐ๋ฆฌ ์ฒ™๋„: ์œ ํด๋ฆฌ๋“œ, ๋งจํ•ดํŠผ ๋“ฑ
  • ๊ฐ€์ค‘์น˜(ํ•„์š”์‹œ):
    • ๊ท ๋“ฑ ๊ฐ€์ค‘์น˜: ๋ชจ๋“  ์ด์›ƒ์ด ๋™์ผํ•œ ์˜ํ–ฅ
    • ๊ฑฐ๋ฆฌ ๊ฐ€์ค‘์น˜: ๊ฐ€๊นŒ์šด ์ด์›ƒ์— ๋” ํฐ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌ

KNN ์žฅ๋‹จ์ 

์žฅ์ ๋‹จ์ 
- ๊ตฌํ˜„์ด ๊ฐ„๋‹จํ•จ
- ๋น„์„ ํ˜• ๋ฐ์ดํ„ฐ์—๋„ ์ž˜ ์ž‘๋™
- ํ•™์Šต ๊ณผ์ •์ด ํ•„์š” ์—†์Œ
- ์˜ˆ์ธก ์‹œ ๊ณ„์‚ฐ๋Ÿ‰์ด ํผ (๋ฉ”๋ชจ๋ฆฌ ์š”๊ตฌ๋Ÿ‰ ํผ)
- ๊ณ ์ฐจ์› ๋ฐ์ดํ„ฐ์—์„œ ์„ฑ๋Šฅ ์ €ํ•˜ (์ฐจ์›์˜ ์ €์ฃผ)
- ์ด์ƒ์น˜์— ๋ฏผ๊ฐํ•˜๊ณ  ์ ์ ˆํ•œ ์„ ํƒ ํ•„์š”

KNN ์‚ฌ์šฉ ์‹œ ๊ณ ๋ คํ•ด์•ผํ•  ์ 

  • ๋ฐ์ดํ„ฐ ์ •๊ทœํ™”: ์„œ๋กœ ๋‹ค๋ฅธ ๋ฒ”์œ„์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ณต์กดํ•  ๊ฒฝ์šฐ ๊ฑฐ๋ฆฌ ์ธก์ •์ด ์˜๋ฏธ๋ฅผ ๊ฐ€์ง€์ง€ ๋ชปํ•˜๋ฏ€๋กœ, ๊ณตํ‰ํ•œ ๊ฑฐ๋ฆฌ ์ธก์ •์„ ์œ„ํ•ด ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ํŠน์ • ๋ฒ”์œ„๋‚ด๋กœ ์ •๊ทœํ™”๊ฐ€ ํ•„์š”ํ•จ.
  • ํ™€์ˆ˜ K: ๋ถ„๋ฅ˜ ๋ฌธ์ œ์—์„œ ๊ฐœ ์ƒ˜ํ”Œ์„ ์ด์šฉํ•ด ๋‹ค์ˆ˜๊ฒฐ๋กœ ๋ ˆ์ด๋ธ” ์„ ํƒ ์‹œ, ์ง์ˆ˜๊ฐœ์˜ ๊ฒฝ์šฐ ๋™์  ์ƒํ™ฉ์ด ๋ฐœ์ƒํ•จ. ์ด๋ฅผ ๋ฐฉ์ง€ํ•˜๊ณ ์ž ๋Š” ํ™€์ˆ˜๋กœ ์ง€์ •ํ•˜๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•จ.

Python ์˜ˆ์‹œ ์ฝ”๋“œ

  • KNN practice code using Scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
 
# ๋ฐ์ดํ„ฐ์…‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
X, y = load_iris(return_X_y=True)
 
# ํ•™์Šต/ํ…Œ์ŠคํŠธ ๋ถ„ํ• 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
 
# KNN ๋ชจ๋ธ ํ•™์Šต
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# fit() does not actually train anything here: it simply stores the given X_train (feature vectors)
# and y_train (target labels) internally, and the stored data is used later at prediction time.
 
# ์˜ˆ์ธก ๋ฐ ์ •ํ™•๋„ ํ‰๊ฐ€
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

์ฐธ๊ณ